ChatGPT And Elasticsearch The RAG Really Tied The App Together
ChatGPT and Elasticsearch: The RAG Really Tied the App Together
This notebook will show you how to:
- Create an Elastic Serverless project
- Set up an Inference API endpoint
  - This will download and deploy ELSER for embedding inference
- Create an index template
  - This will use semantic_text, which will auto-chunk and embed the body of text
- Use the Elastic Open Crawler to crawl the Elastic Search/Observability/Security Labs
The accompanying blog takes it further by showing you how to:
- Use Playground to test chat prompts and configurations
  - Then generate queries for our RAG app
- Use the queries from Playground to finish out a RAG Chatbot app
  - Python FastAPI backend with React frontend
Project Setup
Enter your Cloud API Key
Generate your secret API key at https://cloud.elastic.co/account/keys
[3]
Enter your API key: ·········· API key successfully entered!
Create Elasticsearch project
[4]
Waiting for project to be ready. Current status:initializing - Loop 7
Sleeping 10 seconds
Project is ready
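The wait loop behind the output above can be sketched as a small polling helper. This is a minimal sketch, not the notebook's own cell: `wait_until_ready`, its parameters, and the `"initialized"` ready value are assumptions, and `get_status` stands in for whatever call returns the project's current status.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    ready_status: str = "initialized",  # assumed name of the "ready" state
    sleep_seconds: float = 10.0,
    max_loops: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns ready_status.

    Returns the number of polls that were needed and raises TimeoutError
    if the status never becomes ready within max_loops polls.
    """
    for loop in range(1, max_loops + 1):
        status = get_status()
        if status == ready_status:
            print("Project is ready")
            return loop
        print(f"Waiting for project to be ready. Current status:{status} - Loop {loop}")
        print(f"Sleeping {sleep_seconds:g} seconds")
        sleep(sleep_seconds)
    raise TimeoutError(f"Project not ready after {max_loops} polls")


# Usage with a fake status source, so nothing is actually contacted or slept on.
fake_statuses = iter(["initializing", "initializing", "initialized"])
polls_needed = wait_until_ready(lambda: next(fake_statuses), sleep=lambda _: None)
```

Injecting `sleep` keeps the helper testable; in a real notebook you would leave the default and pass a `get_status` that queries the project status.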
Create elasticsearch client
[5]
Project API Key
Create a Project level API key
[6]
full_access_key has been created
Inference API and Index Setup
Inference API
This will:
- Create an inference API endpoint
- Download ELSER model (if not already downloaded)
- Deploy the ELSER model with the service_settings configs
Note - This will wait for ELSER to be downloaded and deployed
[7]
Waiting for inference model to be fully deployed
Inference API created and Inference model is fully deployed.
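For orientation, the request that creates an ELSER inference endpoint can be sketched as below. This is a hedged sketch rather than the notebook's cell: the endpoint ID `elastic-labs-elser`, the helper name, and the allocation and thread counts are assumptions. The path and body follow the Elasticsearch create-inference-endpoint REST API for the `elser` service.

```python
import json


def build_elser_endpoint_request(
    inference_id: str,
    num_allocations: int = 1,  # assumed value, tune for your workload
    num_threads: int = 1,      # assumed value
) -> tuple[str, str, dict]:
    """Return (method, path, body) for creating a sparse-embedding ELSER endpoint."""
    path = f"/_inference/sparse_embedding/{inference_id}"
    body = {
        "service": "elser",
        "service_settings": {
            "num_allocations": num_allocations,
            "num_threads": num_threads,
        },
    }
    return "PUT", path, body


# "elastic-labs-elser" is a hypothetical endpoint ID used only for illustration.
method, path, body = build_elser_endpoint_request("elastic-labs-elser")
print(method, path)
print(json.dumps(body, indent=2))
```

The resulting method, path, and body can be sent with any HTTP client or with the Python Elasticsearch client; the first request may take a while because the model has to be downloaded and deployed.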
Create index template
The two key fields here are:
- body
  - The field with the body of text; we use it as the source to copy to our semantic text field, semantic_body
- semantic_body
  - This field will automatically handle chunking and generating embeddings
[8]
{'acknowledged': True}
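The relationship between the two fields can be illustrated with a template body like the one below. It is a sketch under assumptions: the index pattern `elastic-labs*`, the endpoint ID, and the helper name are hypothetical, and the notebook's real template may define more fields. Only `body` (copied into `semantic_body`) and `semantic_body` (a `semantic_text` field bound to an inference endpoint) come from the text above.

```python
import json


def build_index_template(index_pattern: str, inference_id: str) -> dict:
    """Build an index template body where body is copied into a semantic_text field."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            "mappings": {
                "properties": {
                    # Raw crawled text; copy_to feeds the semantic field.
                    "body": {"type": "text", "copy_to": "semantic_body"},
                    # Chunking and embedding happen automatically at index time.
                    "semantic_body": {
                        "type": "semantic_text",
                        "inference_id": inference_id,
                    },
                }
            }
        },
    }


# Both arguments are hypothetical names chosen for this example.
template = build_index_template("elastic-labs*", "elastic-labs-elser")
print(json.dumps(template, indent=2))
```

Sent as the body of a put-index-template request, a successful call returns an acknowledgement like the one shown above.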
Crawl the docs
Open Crawler
This HAS TO BE RUN on a Linux/Mac/Windows host or VM, NOT in Colab
The blog details the steps below running on a MacBook
You can also review the Open Crawler setup.
High level steps to configure and run crawler
- Clone the repo
    git clone git@github.com:elastic/crawler.git
- Build the Open Crawler Docker container
    docker build -t crawler-image . && docker run -i -d --name crawler crawler-image
- Create a new config file
    vi config/elastic-labs.yml
- Run the generate config cell below, then paste the output into the config file and save
- Copy the new local config into the container
    docker cp config/elastic-labs.yml crawler:/app/config/elastic-labs.yml
- Run the crawler
    docker exec -it crawler bin/crawler crawl config/elastic-labs.yml
Generate Config
Run the cell below to generate the YAML config file
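As a rough illustration of what such a generated config could look like, the sketch below assembles a crawler YAML file as a string. Treat it as an assumption-laden example, not the notebook's cell: the key names (`domains`, `seed_urls`, `crawl_rules`, `output_sink`, `output_index`, `elasticsearch`) follow my reading of Open Crawler's example config and should be checked against the crawler repository, and the host, API key, and index name are placeholders.

```python
def generate_crawler_config(
    es_host: str,
    es_api_key: str,
    output_index: str = "elastic-labs",  # hypothetical index name
    labs: tuple[str, ...] = ("search-labs", "observability-labs", "security-labs"),
    es_port: int = 443,
) -> str:
    """Return an Open Crawler style YAML config as text (key names are assumptions)."""
    lines = [
        "domains:",
        "  - url: https://www.elastic.co",
        "    seed_urls:",
    ]
    # One seed URL per Labs site.
    lines += [f"      - https://www.elastic.co/{lab}" for lab in labs]
    lines.append("    crawl_rules:")
    # Allow only paths under each Labs site ...
    for lab in labs:
        lines += [
            "      - policy: allow",
            "        type: begins",
            f'        pattern: "/{lab}"',
        ]
    # ... and deny everything else on the domain.
    lines += [
        "      - policy: deny",
        "        type: regex",
        '        pattern: ".*"',
        "",
        "output_sink: elasticsearch",
        f"output_index: {output_index}",
        "",
        "elasticsearch:",
        f"  host: {es_host}",
        f"  port: {es_port}",
        f"  api_key: {es_api_key}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder host and key; substitute your project's endpoint and API key.
print(generate_crawler_config("https://my-project.es.example.com", "<project-api-key>"))
```

Building the text by hand avoids a YAML dependency; the printed output is what you would paste into config/elastic-labs.yml.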
Confirm the docs have been crawled
First, look at the count of docs for each Labs site
[20]
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
'aggregations': {'url_path_dir1': {'buckets': [{'doc_count': 216,
'key': 'search-labs'},
{'doc_count': 214,
'key': 'security-labs'},
{'doc_count': 158,
'key': 'observability-labs'}],
'doc_count_error_upper_bound': 0,
'sum_other_doc_count': 0}},
'hits': {'hits': [],
'max_score': None,
'total': {'relation': 'eq', 'value': 588}},
'timed_out': False,
'took': 6}
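A response of this shape typically comes from a terms aggregation with `size` set to 0. The sketch below shows one way to build such a query and to sanity-check the result. The helper names are hypothetical, and aggregating on a field literally named `url_path_dir1` is an assumption inferred from the aggregation name in the output. The sample response reuses the numbers shown above.

```python
def build_labs_count_query(field: str = "url_path_dir1") -> dict:
    """Search body that returns no hits, only per-site document counts."""
    return {
        "size": 0,
        "aggs": {"url_path_dir1": {"terms": {"field": field}}},
    }


def counts_by_site(response: dict) -> dict:
    """Flatten the terms buckets into a {site: doc_count} mapping."""
    buckets = response["aggregations"]["url_path_dir1"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed copy of the response printed above.
sample_response = {
    "aggregations": {
        "url_path_dir1": {
            "buckets": [
                {"doc_count": 216, "key": "search-labs"},
                {"doc_count": 214, "key": "security-labs"},
                {"doc_count": 158, "key": "observability-labs"},
            ],
            "sum_other_doc_count": 0,
        }
    },
    "hits": {"hits": [], "total": {"relation": "eq", "value": 588}},
}

counts = counts_by_site(sample_response)
total_hits = sample_response["hits"]["total"]["value"]
# Every crawled document falls into one of the three buckets.
print(counts, sum(counts.values()) == total_hits)
```

The bucket counts add up to the total hit count (216 + 214 + 158 = 588), which is consistent with `sum_other_doc_count` being 0.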
Next, review a sample doc
[23]
Streaming output truncated to the last 5000 lines.
'autoscaling metrics API exposes a list of ingestion load values, one for '
'each indexing node. Note that as the write thread pools (which handle '
'indexing requests) are sized based on the number of CPU cores on the node, '
'this essentially determines the total number of cores that is needed in '
'the cluster to handle the indexing workload. The ingestion load on each '
'indexing node consists of two components: Thread pool utilization: the '
'average number of threads in the write thread pool processing indexing '
'requests during that sampling period. Queued ingestion load: the '
'estimated number of threads needed to handle queued write requests. The '
'ingestion load of each indexing node is calculated as the sum of these '
'two values for all the three write thread pools . The total ingestion '
'load of the Elasticsearch cluster is the sum of the ingestion load of the '
'individual nodes. node_ingestion_load = ∑(th'},
{'embeddings': {'##est': 1.3433179,
'##estinal': 0.5916747,
'##ical': 0.21335103,
'##ing': 0.66160166,
'##ion': 1.223692,
'##l': 0.06755174,
'##ler': 0.34178317,
'##line': 0.6707441,
'##ling': 1.0343578,
'##load': 0.9880499,
'##mat': 0.01314945,
'##rch': 1.3459072,
'##s': 0.25005433,
'##sca': 1.6867673,
'##scu': 0.028700678,
'##sea': 1.6748068,
'_': 0.28835136,
'access': 0.116686985,
'accounting': 0.15865436,
'algorithm': 1.0487378,
'algorithms': 0.2763102,
'allocation': 0.1481772,
'amazon': 0.9099395,
'among': 0.04313716,
'anal': 0.025087006,
'analysis': 0.64178395,
'analyze': 0.18673302,
'and': 0.19101046,
'apache': 0.6617465,
'api': 1.4468017,
'approximate': 0.026616694,
'are': 0.19081613,
'arithmetic': 0.12217364,
'ass': 0.12156314,
'auto': 1.4633765,
'automatic': 0.73048806,
'availability': 0.20461462,
'average': 0.58710635,
'bot': 0.12357169,
'buffer': 0.14556783,
'calculate': 0.02387442,
'calculated': 0.2452304,
'calculation': 0.81089926,
'called': 0.2972479,
'capacity': 0.60224617,
'catalog': 0.078262925,
'category': 0.21683785,
'checkpoint': 0.012995078,
'chess': 0.41694775,
'chip': 0.10178017,
'class': 0.5914888,
'classification': 0.17686933,
'cluster': 1.4369037,
'clusters': 0.21254443,
'comply': 0.131236,
'component': 0.37191656,
'components': 0.87235415,
'computation': 0.47024545,
'compute': 0.14372817,
'computer': 0.397558,
'constant': 0.09540719,
'consumption': 0.123454005,
'cope': 0.7024604,
'core': 0.62535626,
'cores': 1.0230916,
'cpu': 0.874175,
'crawl': 0.23010625,
'current': 0.5516459,
'data': 0.25792596,
'database': 0.4601695,
'determine': 0.3844099,
'determined': 0.41348428,
'diagram': 0.025166756,
'dimensions': 0.07042265,
'disk': 0.07931721,
'each': 0.22229394,
'elastic': 1.8257822,
'enter': 0.058845505,
'equation': 0.43812877,
'es': 0.8055687,
'estimate': 0.03608101,
'estimated': 0.46266982,
'execution': 0.05638616,
'factors': 0.12973839,
'forest': 0.3904727,
'formula': 0.016075172,
'framework': 0.34186286,
'g': 0.08017753,
'gage': 0.30852094,
'gene': 0.27250904,
'handle': 0.9037246,
'handling': 0.69093794,
'implement': 0.053764082,
'index': 1.3896008,
'indexed': 0.25086805,
'ing': 1.5002296,
'integration': 0.20222682,
'interface': 0.25386703,
'inventory': 0.5645011,
'is': 0.05772473,
'java': 1.2391971,
'l': 0.048691455,
'lake': 0.24773102,
'lane': 0.25919613,
'lang': 0.039321195,
'learning': 0.033810128,
'library': 0.14143226,
'list': 0.10985089,
'lists': 0.12752165,
'load': 1.7350225,
'loaded': 0.057171866,
'loading': 0.75305617,
'loads': 0.12072936,
'log': 0.06388949,
'machine': 0.47294563,
'mass': 0.092697844,
'math': 0.7472431,
'matrix': 0.045127213,
'maximum': 0.094020285,
'measure': 0.32414404,
'memories': 0.03024405,
'memory': 1.2586498,
'method': 0.016832462,
'metric': 1.1439759,
'mining': 0.40203753,
'mp': 0.09331862,
'multi': 0.031247457,
'multiple': 0.38688186,
'n': 0.33228758,
'need': 0.19645856,
'network': 0.42359397,
'new': 0.041632555,
'node': 1.3807943,
'nodes': 0.63807905,
'number': 0.4450389,
'o': 0.50335085,
'operation': 0.008523868,
'order': 0.08601924,
'pattern': 0.11067777,
'percent': 0.13746342,
'performance': 0.41614294,
'period': 0.49507552,
'pool': 1.3188534,
'poole': 0.3433027,
'pools': 1.2800426,
'predict': 0.23377013,
'processing': 1.0733001,
'processor': 0.10840816,
'pure': 0.11351536,
'quantity': 0.109573685,
'queue': 1.1129105,
'ram': 0.14691876,
'rank': 0.36504152,
'ratio': 0.011385939,
'read': 0.13304754,
'represent': 0.42444453,
'representation': 0.058323957,
'request': 0.755568,
'requests': 0.7039498,
'routing': 0.060857404,
'sample': 0.62170815,
'sampling': 0.8610632,
'scala': 0.25192302,
'scale': 0.5968038,
'sea': 0.20613533,
'search': 0.4318061,
'semi': 0.33687106,
'sequence': 0.23863083,
'serial': 0.15801017,
'server': 0.16233677,
'si': 0.2002626,
'sid': 0.44975162,
'size': 0.8577202,
'sized': 0.21010487,
'sizes': 0.4059122,
'small': 0.09116832,
'software': 0.09232291,
'sort': 0.35720947,
'sorting': 0.06234357,
'spectrum': 0.07792632,
'sql': 0.116530605,
'statistical': 0.0852167,
'statistics': 0.22820702,
'stomach': 0.018201118,
'sum': 0.89766365,
'swarm': 0.20437151,
'table': 0.007837142,
'task': 0.37974054,
'taste': 0.053832427,
'taylor': 0.10206632,
'thread': 1.5052487,
'threads': 1.2515007,
'three': 0.27322263,
'total': 0.64918166,
'tree': 0.098200426,
'unit': 0.15584692,
'used': 0.56170344,
'useful': 0.34977943,
'utilization': 1.0091052,
'value': 0.7453479,
'values': 0.63835937,
'vector': 0.3917736,
'weaving': 0.11804886,
'web': 0.46383187,
'work': 0.29207155,
'write': 1.1660185,
'writing': 0.25973478,
'z': 0.3776876},
'text': 'that are used for ingest autoscaling in Elasticsearch are '
        'ingestion load and memory. Ingestion load Ingestion load represents '
        'the number of threads that is needed to cope with the current '
        'indexing load. The autoscaling metrics API exposes a list of '
        'ingestion load values, one for each indexing node. Note that as the '
        'write thread pools (which handle indexing requests) are sized based '
        'on the number of CPU cores on the node, this essentially determines '
        'the total number of cores that is needed in the cluster to handle '
        'the indexing workload. The ingestion load on each indexing node '
        'consists of two components: Thread pool utilization: the average '
        'number of threads in the write thread pool processing indexing '
        'requests during that sampling period. Queued ingestion load: the '
        'estimated number of threads needed to handle queued write requests. '
        'The ingestion load of each indexing node is calculated as the sum '
        'of these two values for all the three write thread pools . The '
        'total ingestion load of the Elasticsearch cluster is the sum of the '
        'ingestion load of the individual nodes. '
        '\\small node\\_ingestion\\_load = '
        '\\sum(thread\\_pool\\_utilization + queued\\_ingestion\\_load) '
        '\\newline total\\_ingestion\\_load = \\sum(node\\_ingestion\\_load) '
        'Figure 2 : ingestion'},
{'embeddings': {'##able': 0.5624876,
'##ba': 0.10684605,
'##d': 0.12233314,
'##est': 0.84587747,
'##ima': 0.2508807,
'##ing': 0.57414246,
'##ion': 1.1121849,
'##line': 1.1430916,
'##ma': 1.1706055,
'##w': 1.3673741,
'##ws': 0.33763555,
'10': 0.51392806,
'200': 0.73087466,
'30': 0.45019,
'60': 1.3045075,
'[UNK]': 0.2956499,
'_': 0.33742356,
'acceptable': 0.29635867,
'access': 0.23300913,
'accounting': 0.1906402,
'achieve': 0.19722655,
'algorithm': 1.1037958,
'algorithms': 0.26360378,
'allocation': 0.53156596,
'analysis': 0.41347402,
'apache': 0.54295164,
'api': 0.21713388,
'approximate': 0.51163644,
'arithmetic': 0.005784557,
'availability': 0.4917338,
'average': 0.8478212,
'batch': 0.08666975,
'blocking': 0.02501016,
'bot': 0.06050198,
'buffer': 0.40386045,
'bug': 0.055751722,
'busy': 1.3026394,
'calculate': 0.26999432,
'calculation': 0.74316484,
'capacity': 0.6725085,
'chess': 0.25134456,
'class': 0.328252,
'client': 0.23896244,
'clock': 1.125488,
'cluster': 0.5103067,
'component': 0.2536751,
'components': 0.78435194,
'computation': 0.62016183,
'compute': 0.06482519,
'computer': 0.32330835,
'concurrency': 0.011380989,
'configuration': 0.6887391,
'configured': 0.26263618,
'constant': 0.29082793,
'consumption': 0.16989039,
'cpu': 0.3717718,
'database': 0.13461274,
'e': 0.7789312,
'effect': 0.09419204,
'effort': 0.055172946,
'employee': 0.3274528,
'employees': 0.14320064,
'ensemble': 0.19942468,
'equation': 0.3787911,
'equivalent': 0.050270963,
'error': 0.12898737,
'es': 0.043630168,
'est': 0.20599021,
'estimate': 1.0792123,
'estimated': 0.39457676,
'estimates': 0.465428,
'estimation': 0.080784135,
'every': 0.16873945,
'excess': 1.0022457,
'excessive': 0.451759,
'execute': 0.59175754,
'executing': 0.091966435,
'execution': 1.3065349,
'existing': 0.6437884,
'exponential': 1.1467187,
'extra': 0.26056916,
'figure': 0.019528389,
'finish': 0.012790194,
'finished': 0.21236378,
'flow': 0.10995065,
'g': 0.43504617,
'gage': 0.4229588,
'group': 0.43960038,
'guild': 0.014967873,
'handle': 0.80899215,
'handling': 0.7681083,
'heap': 0.3867438,
'hours': 0.7462872,
'http': 0.20072725,
'implement': 0.16245411,
'implementation': 0.2408709,
'improve': 0.10136651,
'index': 1.2976965,
'indexed': 0.10614389,
'ing': 1.2063053,
'inventory': 0.25356865,
'java': 1.2153534,
'l': 0.48968774,
'lake': 0.27167574,
'lane': 0.54473066,
'length': 0.64622724,
'library': 0.08392323,
'line': 0.5581907,
'load': 1.5088638,
'loading': 0.5335804,
'machine': 0.3173762,
'manage': 0.5220977,
'managed': 0.45824686,
'management': 0.3230387,
'mass': 0.15742503,
'math': 0.81244004,
'maximum': 0.34374076,
'measure': 0.25600985,
'memory': 0.5085309,
'mining': 0.4451848,
'minute': 0.39483455,
'minutes': 0.22895378,
'moving': 0.76410496,
'mp': 0.046217,
'multiple': 0.10666605,
'n': 0.5416694,
'network': 0.3097243,
'new': 0.49582836,
'node': 1.1907045,
'number': 0.47905272,
'o': 0.47123736,
'operation': 0.19577809,
'optimal': 0.1733028,
'par': 0.09612937,
'percent': 0.1152151,
'performance': 0.74001515,
'pool': 1.7006081,
'poole': 0.36192703,
'pools': 1.0764378,
'predict': 0.38117534,
'probe': 0.2430691,
'process': 0.12230635,
'processing': 0.47061718,
'proportion': 0.2145018,
'proportional': 1.1204233,
'proposal': 0.1401456,
'q': 0.3259466,
'queue': 1.580318,
'r': 0.14266703,
'rank': 0.13613336,
'rate': 0.39469108,
'request': 1.1001134,
'requests': 0.63539153,
'resolution': 0.055606272,
'resource': 0.21417612,
'resources': 0.7937882,
'routing': 0.14261606,
'sample': 1.0720835,
'sampled': 1.0306277,
'samples': 1.2079935,
'sampling': 0.6740413,
'scala': 0.07395835,
'script': 0.10171158,
'second': 0.18827602,
'seconds': 0.817573,
'sequence': 0.49634397,
'serial': 0.033651996,
'server': 0.32002103,
'share': 0.27626935,
'sid': 0.27850676,
'size': 0.11843514,
'small': 0.75451213,
'speed': 0.30091006,
'sql': 0.31397846,
'statistical': 0.0100006005,
'strategy': 0.08963276,
'stream': 0.028335843,
'sum': 1.1407199,
'surplus': 0.15598625,
'swarm': 0.054142684,
'task': 1.2177191,
'tasks': 1.0780356,
'taylor': 0.24217507,
'technique': 0.0030198945,
'thread': 1.7842301,
'threads': 0.9916815,
'time': 0.9839317,
'timer': 0.19039534,
'times': 0.5299459,
'total': 0.40682667,
'traffic': 0.28910428,
'universe': 0.013594781,
'usage': 0.5520448,
'utilization': 1.6104044,
'value': 0.6036144,
'values': 0.33944046,
'w': 0.4972394,
'wait': 0.005872378,
'wall': 1.1351137,
'weaving': 0.13777943,
'web': 0.2821159,
'weighted': 1.1533256,
'worker': 1.0417976,
'workers': 1.2245823,
'z': 0.29032487},
'text': '\\small node\\_ingestion\\_load = '
        '\\sum(thread\\_pool\\_utilization + queued\\_ingestion\\_load) '
        '\\newline total\\_ingestion\\_load = \\sum(node\\_ingestion\\_load) '
        'Figure 2 : ingestion load components The thread pool utilization is '
        'an exponentially weighted moving average (EWMA) of the number of '
        'busy threads in the thread pool, sampled every second. The EWMA of '
        'the sampled thread pool utilization values is configured such that '
        'the sampled values of the past 10 seconds have the most effect on '
        'the thread pool utilization component of the ingestion load and '
        'samples older than 60 seconds have very negligible impact. To '
        'estimate the resources required to handle the queued indexing '
        'requests in the thread pool, we need to have an estimate for how '
        'long each queued task can take to execute. To achieve this, each '
        'thread pool also provides an EWMA of the request execution time. '
        'The request execution time for an indexing request is the '
        '(wall-clock) time taken for the request to finish once it is out of '
        'the queue and a worker thread starts executing it. As some queueing '
        'is acceptable and should be manageable by the thread pool, we try to '
        'estimate the resources needed to handle the excess queueing. We '
        'consider up to 30s worth of tasks in the queue manageable by the '
        'existing number of workers and account for an extra thread '
        'proportional to this value. For example, if the average task '
        'execution time is 200ms, we estimate that'},
{'embeddings': {'##d': 0.06352329,
'##est': 0.89852107,
'##estinal': 0.13183321,
'##ima': 0.40056115,
'##ing': 0.61320734,
'##ion': 0.72260284,
'##ling': 0.8949169,
'##load': 0.57369965,
'##m': 0.23721623,
'##ma': 1.4438714,
'##mas': 0.24820994,
'##mat': 0.24343531,
'##sca': 0.92204034,
'##w': 1.6598973,
'##ws': 0.6782139,
'10': 0.7749067,
'150': 1.2471286,
'200': 0.58304185,
'30': 1.076181,
'60': 1.1588365,
'_': 0.17651597,
'acceptable': 0.0395143,
'access': 0.05357292,
'accounting': 0.22549874,
'achieve': 0.040418815,
'algorithm': 0.9928478,
'algorithms': 0.08838318,
'allocation': 0.7647576,
'analysis': 0.428812,
'apache': 0.5859765,
'api': 0.016843364,
'approximate': 0.21684457,
'arithmetic': 0.053462975,
'array': 0.066098064,
'auto': 0.53497416,
'automatic': 0.20355695,
'availability': 0.6690054,
'average': 1.0341543,
'blocking': 0.1431715,
'buffer': 0.46087772,
'bug': 0.23163809,
'busy': 1.3082193,
'calculate': 0.2015065,
'calculation': 0.71491575,
'capacity': 0.8027149,
'checkpoint': 0.10162155,
'chess': 0.26765594,
'class': 0.5377411,
'client': 0.028412435,
'clock': 0.81897706,
'cluster': 0.6336233,
'component': 1.2550238,
'components': 1.4753778,
'computation': 0.5360401,
'compute': 0.09496682,
'computer': 0.48583803,
'computers': 0.082595915,
'computing': 0.0053236387,
'concept': 0.09244595,
'concurrency': 0.080570355,
'configuration': 0.63552403,
'configured': 0.49945095,
'constant': 0.15874276,
'consumption': 0.3705247,
'count': 0.15291668,
'cpu': 0.4727478,
'data': 0.5534523,
'database': 0.24513115,
'definition': 0.25252765,
'dew': 0.027248075,
'disadvantage': 0.043538865,
'disk': 1.0258542,
'during': 0.024176076,
'e': 1.3067937,
'each': 0.01788934,
'ec': 0.5695534,
'ee': 0.08090695,
'effect': 0.33151782,
'employee': 0.14918438,
'employees': 0.026578736,
'equation': 0.42684066,
'es': 0.18498634,
'est': 0.098570675,
'estimate': 0.83097947,
'estimated': 0.19130428,
'estimates': 0.04933924,
'every': 0.384432,
'excess': 0.44124436,
'execute': 0.56965685,
'execution': 1.092663,
'exponential': 1.2772857,
'extra': 0.3341091,
'finish': 0.47172138,
'finished': 0.5516902,
'flow': 0.1065439,
'fra': 0.5131407,
'gage': 0.41627494,
'group': 0.40121686,
'handle': 0.76723486,
'handling': 0.8265911,
'hardware': 0.007931168,
'heap': 0.055197764,
'hours': 0.5783272,
'http': 0.16334121,
'implement': 0.20851848,
'improve': 0.033503063,
'index': 1.351592,
'indexed': 1.2516088,
'ing': 1.2539797,
'inventory': 0.26884475,
'io': 0.49151403,
'is': 0.67021686,
'items': 0.30828458,
'java': 1.233984,
'lake': 0.37700737,
'lane': 0.35798323,
'lang': 0.11334816,
'length': 0.39039937,
'library': 0.0020271246,
'load': 1.839116,
'loading': 0.52925104,
'log': 0.026120221,
'ma': 0.37466413,
'machine': 0.41295668,
'managed': 0.016499385,
'management': 0.24261811,
'many': 0.0001822544,
'map': 0.16712263,
'mat': 0.08338378,
'math': 0.69625205,
'maximum': 0.34880605,
'mb': 0.37918818,
'measure': 0.14309268,
'memory': 0.58699423,
'metric': 0.113157846,
'mill': 0.087879546,
'minimum': 0.042228475,
'mining': 0.31173173,
'minute': 0.2855463,
'minutes': 0.037687548,
'mm': 0.04705554,
'move': 0.24638273,
'moving': 1.068798,
'mp': 0.339956,
'mt': 0.18115476,
'multi': 0.045562405,
'multiple': 0.2256053,
'n': 0.20722932,
'network': 0.2870649,
'node': 0.74391615,
'nodes': 0.40956134,
'number': 0.5414315,
'object': 0.36274558,
'old': 0.026420968,
'older': 0.14505674,
'operation': 0.137978,
'optimal': 0.03703803,
'par': 0.0058114612,
'parts': 0.011510156,
'past': 0.25731233,
'percent': 0.35817072,
'performance': 0.801656,
'pool': 1.8708751,
'poole': 0.2727913,
'pools': 1.2964886,
'population': 0.11810607,
'predict': 0.18177378,
'probe': 0.21369988,
'processing': 0.4105097,
'proportional': 0.6098035,
'q': 0.13568267,
'queue': 1.2824515,
'rank': 0.40675223,
'rate': 0.46714726,
'request': 0.949167,
'requests': 0.6644938,
'requirements': 0.3288823,
'resource': 0.4609863,
'resources': 0.9455237,
'routing': 0.18650433,
'sample': 1.0472832,
'sampled': 0.8309003,
'samples': 1.1415888,
'sampling': 0.45636305,
'scala': 0.12271185,
'scale': 0.3144392,
'second': 0.49777645,
'seconds': 0.7695267,
'sequence': 0.21608938,
'serial': 0.049026124,
'server': 0.37191278,
'share': 0.19251333,
'si': 0.020900367,
'sid': 0.41317028,
'size': 0.7470095,
'sizes': 0.060290556,
'small': 0.015217632,
'speed': 0.21846266,
'sql': 0.39542097,
'stack': 0.047259662,
'start': 0.15702806,
'statistical': 0.031916108,
'statistics': 0.08593676,
'storage': 0.034532573,
'store': 0.053150244,
'survey': 0.1747176,
'system': 0.08567025,
'table': 0.006464522,
'task': 1.1504556,
'tasks': 0.7951614,
'taylor': 0.14394312,
'term': 0.63525033,
'thirty': 0.26077473,
'thread': 2.0543768,
'threads': 1.1089593,
'tier': 1.207179,
'time': 0.68932414,
'timer': 0.14907645,
'times': 0.32087305,
'total': 0.22359692,
'traffic': 0.26179498,
'trial': 0.2198535,
'u': 0.064360306,
'unit': 0.13278264,
'usage': 0.6241088,
'utilization': 1.6971744,
'value': 0.66488856,
'values': 0.2064584,
'w': 0.81893605,
'wait': 0.103130125,
'wall': 1.0635448,
'weaving': 0.07162173,
'web': 0.23646998,
'weight': 0.030211551,
'weighted': 1.2184887,
'work': 0.23164386,
'worker': 0.7420831,
'workers': 1.0619413,
'ze': 0.40276462},
'text': 'load components The thread pool utilization is an exponentially '
        'weighted moving average (EWMA) of the number of busy threads in the '
        'thread pool, sampled every second. The EWMA of the sampled thread '
        'pool utilization values is configured such that the sampled values '
        'of the past 10 seconds have the most effect on the thread pool '
        'utilization component of the ingestion load and samples older than '
        '60 seconds have very negligible impact. To estimate the resources '
        'required to handle the queued indexing requests in the thread pool, '
        'we need to have an estimate for how long each queued task can take '
        'to execute. To achieve this, each thread pool also provides an EWMA '
        'of the request execution time. The request execution time for an '
        'indexing request is the (wall-clock) time taken for the request to '
        'finish once it is out of the queue and a worker thread starts '
        'executing it. As some queueing is acceptable and should be '
        'manageable by the thread pool, we try to estimate the resources '
        'needed to handle the excess queueing. We consider up to 30s worth '
        'of tasks in the queue manageable by the existing number of workers '
        'and account for an extra thread proportional to this value. For '
        'example, if the average task execution time is 200ms, we estimate '
        'that each thread is able to handle 150 indexing requests within '
        '30s, and therefore account for one extra thread for each 150 queued '
        'items. '
        '\\small queued\\_ingestion\\_load = \\frac{queue\\_size \\times '
        'average\\_request\\_execution\\_time}{30s} '
        'Note that since the indexing nodes rely on pushing indexed data '
        'into the object store periodically, we do not need to scale the '
        'indexing tier based on the total size of the indexed data. However, '
        'the disk IO requirements of the indexing workload needs to be '
        'considered for the autoscaling decisions. The ingestion load '
        'represents'},
{'embeddings': {'##d': 0.38506436,
'##est': 0.8363302,
'##frame': 0.039107077,
'##ing': 1.0441189,
'##ion': 1.1721121,
'##ler': 1.0595164,
'##ling': 0.99718106,
'##load': 0.8622203,
'##s': 0.26257822,
'##sca': 1.4883617,
'(': 0.04112861,
'120': 0.10787471,
'150': 1.5649581,
'200': 0.78864884,
'30': 1.3745978,
'300': 0.21148267,
'50': 0.031711366,
'500': 0.8493792,
'_': 0.24777141,
'accounting': 0.64968836,
'additional': 0.3232339,
'algorithm': 1.0360106,
'algorithms': 0.20798434,
'analysis': 0.25909927,
'analyze': 0.18533573,
'apache': 0.8096589,
'api': 1.3224775,
'approximate': 0.0154337585,
'array': 0.23401959,
'auto': 1.4535567,
'automatic': 0.7868701,
'availability': 0.21982048,
'available': 0.030020691,
'average': 0.098859586,
'basic': 0.2743477,
'blocking': 0.10501332,
'bot': 0.07765888,
'buffer': 0.36042303,
'calculate': 0.21506485,
'calculation': 0.81758976,
'capacity': 0.58354694,
'cassandra': 0.22208737,
'checkpoint': 0.031537656,
'chess': 0.6237735,
'class': 0.439471,
'clock': 0.54654706,
'cluster': 1.4933486,
'cod': 0.12783043,
'computation': 0.39954206,
'compute': 0.042445127,
'computer': 0.13797997,
'constant': 0.2067099,
'cpu': 0.5182024,
'crawl': 0.22104222,
'data': 0.51176333,
'database': 0.440294,
'determined': 0.23795621,
'disk': 0.5893501,
'e': 0.05990428,
'each': 0.46478215,
'equation': 0.008288982,
'er': 0.43452957,
'es': 0.14311427,
'estimate': 0.25439763,
'every': 0.1305604,
'execution': 0.7186893,
'exposed': 0.23602542,
'extra': 0.7385199,
'fixed': 0.11877214,
'forum': 0.3137529,
'fra': 1.0726693,
'fragment': 0.030604606,
'g': 0.026902322,
'gage': 0.12548852,
'guild': 0.27722847,
'handle': 0.8976072,
'handling': 0.69513077,
'heap': 0.26846212,
'hours': 0.7121461,
'http': 0.10318518,
'index': 1.6740144,
'indexed': 1.1180266,
'indices': 0.88624585,
'ing': 1.10228,
'integer': 0.2208937,
'inventory': 0.44952998,
'io': 0.85926545,
'item': 0.48019466,
'items': 0.7935411,
'java': 1.237859,
'lane': 0.39564016,
'length': 0.47680393,
'limit': 0.4967848,
'load': 1.2765044,
'loading': 0.25379905,
'm': 0.06343312,
'machine': 0.19301167,
'maintenance': 0.23043938,
'map': 0.07359305,
'mass': 0.08436136,
'master': 1.1724675,
'matching': 0.044185776,
'math': 0.71257645,
'max': 0.16343911,
'maximum': 0.8216195,
'mb': 0.74474645,
'measure': 0.22327076,
'memory': 1.4785702,
'metadata': 0.8341058,
'metric': 0.9043063,
'minimal': 0.36312523,
'minimum': 1.0762551,
'mining': 0.6374103,
'mp': 0.18194582,
'multi': 0.19790418,
'multiple': 0.08082614,
'n': 0.2315838,
'network': 0.5508067,
'node': 1.3963627,
'nodes': 0.73737425,
'number': 0.082121976,
'o': 0.11493757,
'object': 0.5812754,
'par': 0.023205614,
'per': 0.23101303,
'performance': 0.23446344,
'pool': 0.8049336,
'pools': 0.15594147,
'predict': 0.024841096,
'processing': 0.36487442,
'pushing': 0.20726342,
'q': 0.8291657,
'quarterly': 0.13623458,
'queue': 1.481917,
'rail': 0.078313634,
'ram': 0.28152135,
'rank': 0.3435108,
'ratio': 0.06241234,
're': 0.2784615,
'regional': 0.34884617,
'request': 0.99899644,
'requests': 0.99197084,
'requirement': 0.62241584,
'requirements': 0.674187,
'resolution': 0.02591185,
'routing': 0.19566713,
'scala': 0.17918167,
'scale': 0.15746343,
'seconds': 0.13917202,
'semi': 0.23686175,
'sequence': 0.5461212,
'ser': 0.08773902,
'serial': 0.29184434,
'server': 0.5091232,
'shards': 1.1462573,
'sid': 0.5460215,
'size': 0.5671189,
'small': 0.1666983,
'sort': 0.20719269,
'sql': 0.21473138,
'stack': 0.042597417,
'statistics': 0.019139726,
'storage': 0.11576759,
'strategy': 0.06358851,
'swarm': 0.08892168,
't': 0.15734711,
'task': 0.2625412,
'taylor': 0.059171513,
'thirty': 0.59235644,
'thread': 1.7254765,
'threads': 1.1326298,
'tier': 2.0103586,
'time': 0.5197543,
'times': 0.19328791,
'total': 0.9341554,
'trial': 1.0915743,
'ur': 0.041876547,
'value': 0.39162463,
'values': 0.10083909,
'wall': 0.93653333,
'web': 0.1397472,
'weeks': 0.027450949,
'within': 0.38789856,
'work': 0.1474287,
'workers': 0.30503651,
'write': 0.33134767,
'x': 0.027046092,
'z': 0.06591661,
'ze': 0.69916034},
'text': 'each '
'thread '
'is '
'able '
'to '
'handle '
'150 '
'indexing '
'requests '
'within '
'30s, '
'and '
'therefore '
'account '
'for '
'one '
'extra '
'thread '
'for '
'each '
'150 '
'queued '
'items. '
'\\small '
'queued\\_ingestion\\_load '
'= '
'\\frac{queue\\_size '
'\\times '
'average\\_request\\_execution\\_time}{30s} '
'Note '
'that '
'since '
'the '
'indexing '
'nodes '
'rely '
'on '
'pushing '
'indexed '
'data '
'into '
'the '
'object '
'store '
'periodically, '
'we '
'do '
'not '
'need '
'to '
'scale '
'the '
'indexing '
'tier '
'based '
'on '
'the '
'total '
'size '
'of '
'the '
'indexed '
'data. '
'However, '
'the '
'disk '
'IO '
'requirements '
'of '
'the '
'indexing '
'workload '
'need '
'to '
'be '
'considered '
'for '
'the '
'autoscaling '
'decisions. '
'The '
'ingestion '
'load '
'represents '
'both '
'CPU '
'requirements '
'of '
'the '
'indexing '
'nodes '
'as '
'well '
'as '
'disk '
'IO '
'since '
'both '
'CPU '
'and '
'IO '
'work '
'is '
'done '
'by '
'the '
'write '
'thread '
'pool '
'workers '
'and '
'we '
'rely '
'on '
'the '
'wall '
'clock '
'time '
'to '
'estimate '
'the '
'required '
'time '
'to '
'handle '
'the '
'queued '
'requests. '
'Each '
'indexing '
'node '
'calculates '
'its '
'ingestion '
'load '
'and '
'publishes '
'this '
'value '
'to '
'the '
'master '
'node '
'periodically. '
'The '
'master '
'node '
'serves '
'the '
'per '
'node '
'ingestion '
'load '
'values '
'via '
'the '
'autoscaling '
'metrics '
'API '
'to '
'the '
'autoscaler. '
'Memory '
'The '
'memory '
'metrics '
'exposed '
'by '
'the '
'autoscaling '
'metrics '
'API '
'are '
'node '
'memory '
'and '
'tier '
'memory. '
'The '
'node '
'memory '
'represents '
'the '
'minimum '
'memory '
'requirement '
'for '
'each '
'indexing '
'node '
'in '
'the '
'cluster. '
'The '
'tier '
'memory '
'metric '
'represents '
'the '
'minimum '
'total '
'memory '
'that '
'should '
'be '
'available '
'in '
'the '
'indexing '
'tier. '
'Note '
'that '
'these '
'values '
'only '
'indicate '
'the '
'minimum '
'to '
'ensure '
'that '
'each '
'node '
'is '
'able '
'to '
'handle '
'the '
'basic '
'indexing '
'workload '
'and '
'hold '
'the '
'cluster '
'and '
'indices '
'metadata, '
'while '
'ensuring '
'that '
'the '
'tier '
'includes '
'enough '
'nodes '
'to '
'accommodate '
'all '
'index '
'shards. '
'Node '
'memory '
'must '
'have '
'a '
'minimum '
'of '
'500MB '
'to '
'be '
'able '
'to '
'handle '
'indexing '
'workloads, '
'as '
'well '
'as '
'a '
'fixed '
'amount '
'of '
'memory '
'per '
'index. '
'This '
'ensures '
'all '
'nodes '
'can '
'hold '
'metadata '
'for '
'the '
'cluster, '
'which '
'includes '
'metadata '
'for '
'every '
'index. '
'Tier '
'memory '
'is '
'determined '
'by '
'accounting '
'for '
'the '
'memory'},
{'embeddings': {'##d': 0.055720266,
'##est': 0.87620574,
'##ging': 0.12167851,
'##id': 0.007303444,
'##ing': 1.0664626,
'##ion': 0.5800176,
'##ler': 1.1925261,
'##ling': 1.0163201,
'##load': 0.81047934,
'##mb': 0.41285288,
'##rch': 0.9021695,
'##rd': 1.5396098,
'##rds': 0.47700712,
'##s': 0.033316635,
'##sca': 1.5766962,
'##sea': 1.0991455,
'500': 0.8151243,
'6': 0.5519658,
'accounting': 0.74103206,
'algorithm': 1.0231093,
'algorithms': 0.065428115,
'allocated': 0.19617477,
'amazon': 0.31502825,
'analysis': 0.5597703,
'analyze': 0.30770445,
'apache': 0.8908353,
'api': 1.1461797,
'approximate': 0.21645284,
'archive': 0.013153568,
'array': 0.047213156,
'auto': 1.3802772,
'automatic': 0.7499421,
'availability': 0.10610637,
'basic': 0.5700848,
'blocking': 0.03154505,
'bot': 0.2956401,
'brain': 0.13824557,
'brick': 0.34880513,
'broken': 0.1587869,
'buffer': 0.27810082,
'bug': 0.019329984,
'cad': 0.010832788,
'calculate': 0.71264565,
'calculated': 0.19991197,
'calculation': 0.90854484,
'capacity': 0.13310817,
'cassandra': 0.269642,
'checkpoint': 0.33004454,
'chess': 0.6517597,
'class': 0.40205157,
'clock': 1.2123855,
'cluster': 1.5899432,
'clusters': 0.21755162,
'computation': 0.3360238,
'compute': 0.15521479,
'computer': 0.4586727,
'computers': 0.09730453,
'core': 0.18051882,
'cores': 0.54003507,
'cpu': 1.4255431,
'data': 0.7048903,
'database': 0.5640705,
'depend': 0.08640857,
'deploy': 0.116062716,
'deployed': 0.16281521,
'deployment': 1.375697,
'dev': 0.16744493,
'disk': 1.2671278,
'display': 0.10427013,
'done': 0.057584852,
'each': 0.44890955,
'elastic': 1.3546548,
'estimate': 1.1541563,
'estimated': 0.4820726,
'estimates': 0.68956727,
'execution': 0.025004579,
'expose': 0.3791655,
'exposed': 1.4152902,
'exposing': 0.2018034,
'exposure': 0.22712028,
'field': 0.43335024,
'fixed': 0.3727484,
'fragment': 0.3541149,
'fragments': 0.19871251,
'framework': 0.0067325183,
'gage': 0.062432837,
'gb': 0.23573099,
'guild': 0.06864197,
'handle': 0.6664566,
'handling': 0.79544353,
'hardware': 0.15463935,
'hash': 0.056183893,
'host': 0.49334934,
'hours': 0.23847345,
'hu': 0.12027907,
'index': 1.84248,
'indexed': 0.5543888,
'indices': 0.8364849,
'ing': 1.1731079,
'integration': 0.43307945,
'interface': 0.13424914,
'inventory': 0.43660846,
'io': 1.1710184,
'java': 1.1948129,
'kb': 0.275635,
'lane': 0.065143116,
'lang': 0.07760714,
'length': 0.19545008,
'limit': 0.14939034,
'load': 1.068046,
'loading': 0.3452746,
'machine': 0.28579098,
'maintenance': 0.24792214,
'management': 0.016834572,
'mandatory': 0.09757359,
'map': 0.33999705,
'mapped': 0.4253768,
'mapping': 0.7739739,
'master': 1.514614,
'math': 0.62235314,
'maximum': 0.4592383,
'mb': 0.8386821,
'measure': 0.35868418,
'memory': 1.4037786,
'metadata': 0.57345796,
'metric': 1.0478114,
'minimal': 0.55310273,
'minimum': 1.1779544,
'mining': 0.60987383,
'monitor': 0.41601682,
'monitoring': 0.80379987,
'multiple': 0.0046412363,
'need': 0.13691676,
'needs': 0.09020152,
'network': 0.5226748,
'node': 1.5207812,
'nodes': 0.9873411,
'number': 0.08917359,
'o': 0.47437057,
'open': 0.9998891,
'operation': 0.059715636,
'parameters': 0.06929999,
'per': 1.2698478,
'performance': 0.27903107,
'pool': 1.1343037,
'pools': 0.5005684,
'predict': 0.15172759,
'processing': 0.34928247,
'processor': 0.06942589,
'provided': 0.33421612,
'published': 0.35502988,
'queue': 1.4328028,
'ram': 0.07832895,
'rank': 0.09849679,
'regional': 0.023943441,
'request': 0.58130133,
'requests': 0.4985438,
'require': 0.054292977,
'required': 0.20457663,
'requirement': 0.9255918,
'requirements': 1.1021699,
'resolution': 0.2503146,
'resource': 0.22062841,
'resources': 0.7977981,
'scala': 0.046379413,
'scale': 0.34393448,
'scaling': 0.5871495,
'script': 0.07091305,
'search': 0.2748066,
'semi': 0.19345926,
'sequence': 0.2634719,
'serial': 0.281783,
'serve': 0.3122354,
'server': 0.62030464,
'sha': 1.412181,
'shards': 1.2690446,
'sid': 0.5395205,
'size': 0.37528938,
'software': 0.2301807,
'sql': 0.28173122,
'storage': 0.17134488,
'sum': 0.48667532,
'swarm': 0.09873215,
'task': 0.15503421,
'thread': 1.2720325,
'threads': 0.5098314,
'tier': 2.0405457,
'time': 0.691699,
'timer': 0.3272765,
'total': 0.853305,
'trial': 0.75489986,
'value': 0.55824566,
'values': 0.18979663,
'wall': 1.5562296,
'walls': 0.57668746,
'web': 0.12833436,
'workers': 0.30275372,
'write': 0.8986184},
'text': 'both '
'CPU '
'requirements '
'of '
'the '
'indexing '
'nodes '
'as '
'well '
'as '
'disk '
'IO '
'since '
'both '
'CPU '
'and '
'IO '
'work '
'is '
'done '
'by '
'the '
'write '
'thread '
'pool '
'workers '
'and '
'we '
'rely '
'on '
'the '
'wall '
'clock '
'time '
'to '
'estimate '
'the '
'required '
'time '
'to '
'handle '
'the '
'queued '
'requests. '
'Each '
'indexing '
'node '
'calculates '
'its '
'ingestion '
'load '
'and '
'publishes '
'this '
'value '
'to '
'the '
'master '
'node '
'periodically. '
'The '
'master '
'node '
'serves '
'the '
'per '
'node '
'ingestion '
'load '
'values '
'via '
'the '
'autoscaling '
'metrics '
'API '
'to '
'the '
'autoscaler. '
'Memory '
'The '
'memory '
'metrics '
'exposed '
'by '
'the '
'autoscaling '
'metrics '
'API '
'are '
'node '
'memory '
'and '
'tier '
'memory. '
'The '
'node '
'memory '
'represents '
'the '
'minimum '
'memory '
'requirement '
'for '
'each '
'indexing '
'node '
'in '
'the '
'cluster. '
'The '
'tier '
'memory '
'metric '
'represents '
'the '
'minimum '
'total '
'memory '
'that '
'should '
'be '
'available '
'in '
'the '
'indexing '
'tier. '
'Note '
'that '
'these '
'values '
'only '
'indicate '
'the '
'minimum '
'to '
'ensure '
'that '
'each '
'node '
'is '
'able '
'to '
'handle '
'the '
'basic '
'indexing '
'workload '
'and '
'hold '
'the '
'cluster '
'and '
'indices '
'metadata, '
'while '
'ensuring '
'that '
'the '
'tier '
'includes '
'enough '
'nodes '
'to '
'accommodate '
'all '
'index '
'shards. '
'Node '
'memory '
'must '
'have '
'a '
'minimum '
'of '
'500MB '
'to '
'be '
'able '
'to '
'handle '
'indexing '
'workloads, '
'as '
'well '
'as '
'a '
'fixed '
'amount '
'of '
'memory '
'per '
'index. '
'This '
'ensures '
'all '
'nodes '
'can '
'hold '
'metadata '
'for '
'the '
'cluster, '
'which '
'includes '
'metadata '
'for '
'every '
'index. '
'Tier '
'memory '
'is '
'determined '
'by '
'accounting '
'for '
'the '
'memory '
'overhead '
'of '
'the '
'field '
'mappings '
'of '
'the '
'indices '
'and '
'the '
'amount '
'of '
'memory '
'needed '
'for '
'each '
'open '
'shard '
'allocated '
'on '
'a '
'node '
'in '
'the '
'cluster. '
'Currently, '
'the '
'per-shard '
'memory '
'requirement '
'uses '
'a '
'fixed '
'estimate '
'of '
'6MB. '
'We '
'plan '
'to '
'refine '
'this '
'value. '
'The '
'estimate '
'for '
'the '
'memory '
'requirements '
'for '
'the '
'mappings '
'of '
'each '
'index '
'is '
'calculated '
'by '
'one '
'of '
'the '
'data '
'nodes '
'that '
'hosts '
'a '
'shard '
'of '
'the '
'index. '
'The '
'calculated '
'estimates '
'are '
'sent '
'to '
'the '
'master '
'node. '
'Whenever '
'there '
'is '
'a '
'mapping '
'change '
'this '
'estimate '
'is '
'updated '
'and '
'published '
'to '
'the '
'master '
'node '
'again. '
'The '
'master '
'node '
'serves '
'the '
'node '
'and '
'total '
'memory '
'metrics '
'based '
'on '
'this '
'information '
'via '
'the '
'autoscaling '
'metrics '
'API '
'to '
'the '
'autoscaler. '
'Scaling '
'the '
'cluster '
'The '
'autoscaler '
'is '
'responsible '
'for '
'monitoring '
'the '
'Elasticsearch '
'cluster '
'via '
'the '
'exposed '
'metrics, '
'calculating '
'the '
'desirable '
'cluster '
'size '
'to '
'adapt '
'to '
'the '
'indexing '
'workload, '
'and '
'updating '
'the '
'deployment '
'accordingly. '
'This '
'is '
'done '
'by '
'calculating '
'the '
'total '
'required '
'CPU '
'and '
'memory '
'resources '
'based '
'on '
'the '
'ingestion '
'load '
'and '
'memory '
'metrics. '
'The '
'sum '
'of '
'all '
'the '
'ingestion '
'load '
'per '
'node '
'values '
'determines '
'the '
'total '
'number '
'of '
'CPU '
'cores '
'needed '
'for '
'the '
'indexing '
'tier. '
'The '
'calculated '
'CPU '
'requirement '
'and '
'the '
'provided '
'minimum '
'node '
'and '
'tier '
'memory '
'resources '
'are '
'mapped '
'to '
'a '
'predetermined '
'set'},
{'embeddings': {'##ber': 0.9460652,
'##d': 0.10023495,
'##es': 0.14341043,
'##gb': 0.6906553,
'##ine': 0.9458122,
'##ing': 0.42145026,
'##ler': 1.2356958,
'##ling': 0.63835293,
'##load': 0.2904571,
'##mb': 0.6970242,
'##net': 0.7010928,
'##pu': 1.0257086,
'##rch': 1.0700952,
'##rd': 1.6493205,
'##rds': 0.6754141,
'##rt': 0.12942569,
'##sca': 1.4853197,
'##sea': 1.4192088,
'##vc': 1.405061,
'100': 0.26849923,
'16': 0.19268984,
'160': 0.2302431,
'1600': 0.8732733,
'32': 1.2120824,
'6': 0.70548016,
'64': 1.202607,
'algorithm': 0.937971,
'allocated': 0.73692024,
'allocation': 0.4625666,
'amazon': 0.86137766,
'analysis': 0.58160084,
'analyze': 0.023657316,
'apache': 0.85805637,
'api': 0.9369967,
'approximate': 0.15172462,
'auto': 1.225151,
'automatic': 0.7224918,
'availability': 0.3053787,
'bot': 0.33649588,
'brick': 0.28021842,
'buffer': 0.27807808,
'bug': 0.12689802,
'calculate': 0.56475216,
'calculated': 0.2805605,
'calculating': 0.18157567,
'calculation': 1.0562031,
'capacity': 0.19689727,
'certification': 0.030283952,
'checkpoint': 0.1251825,
'chess': 0.38721076,
'class': 0.044428803,
'closed': 0.20298174,
'cluster': 1.8217679,
'clusters': 0.40412048,
'computation': 0.27228907,
'compute': 0.157462,
'computer': 0.07424284,
'cores': 0.28018573,
'cpu': 0.874331,
'criteria': 0.20424062,
'cube': 0.078070216,
'currently': 0.26391146,
'data': 0.57366157,
'database': 0.5346718,
'deploy': 0.31853938,
'deployed': 0.23235346,
'deployment': 1.38996,
'desirable': 0.25084683,
'desired': 0.05757945,
'determine': 0.07967118,
'determined': 0.38774973,
'dimensions': 0.3834306,
'disk': 0.7686433,
'display': 0.044948753,
'domain': 0.05484484,
'each': 0.026949435,
'elastic': 1.7217911,
'equation': 0.07899539,
'estimate': 1.0816743,
'estimated': 0.2908085,
'estimates': 0.7743369,
'existing': 0.50358754,
'exposed': 0.91814655,
'field': 1.4176838,
'fields': 0.56111515,
'fixed': 0.653671,
'forest': 0.088545434,
'gage': 0.23066506,
'gb': 0.7216355,
'hardware': 0.5457616,
'honey': 0.13710178,
'host': 0.32896483,
'hu': 0.022061992,
'implement': 0.19801763,
'index': 1.5813339,
'indexed': 0.33440682,
'indicator': 0.07646061,
'indices': 1.0497515,
'ing': 0.44711637,
'integration': 0.38794386,
'inventory': 0.55072165,
'java': 1.0091366,
'kb': 0.31603098,
'ku': 1.2214607,
'largest': 0.55517995,
'length': 0.1961873,
'limit': 0.12602727,
'linear': 0.13019355,
'load': 0.7046929,
'map': 0.6723943,
'mapped': 0.6155787,
'mapping': 0.95820665,
'maps': 0.19839133,
'master': 1.3583598,
'math': 0.52316844,
'maximum': 0.17016214,
'mb': 0.8793483,
'measure': 0.37326512,
'memory': 1.3331418,
'metric': 0.9261499,
'minimum': 0.4176075,
'mining': 0.42999497,
'monitor': 0.34513482,
'monitoring': 0.6307714,
'multi': 0.3034215,
'network': 0.67814016,
'node': 1.2861586,
'nodes': 0.6710798,
'open': 1.3986069,
'optimal': 0.0624708,
'overhead': 0.69991654,
'parameters': 0.11732358,
'pattern': 0.005440311,
'per': 1.2889819,
'performance': 0.14103872,
'poll': 0.52450436,
'polling': 0.3777002,
'polls': 0.60389787,
'predict': 0.038165692,
'published': 0.06970011,
'radar': 0.004892402,
'ram': 0.1705884,
'rank': 0.1464829,
'ratio': 0.6063533,
'reconciliation': 0.4469912,
'ref': 0.5476266,
'requirement': 0.92776734,
'requirements': 1.1151919,
'resolution': 0.34558743,
'resource': 0.21023308,
'resources': 0.925664,
'scale': 1.1254972,
'scaled': 0.25958243,
'scaling': 1.3571583,
'scope': 0.007439173,
'script': 0.108936414,
'search': 0.4840181,
'serial': 0.38776705,
'server': 0.36229628,
'sha': 1.6222633,
'sid': 0.4845318,
'since': 0.0958648,
'size': 1.1212213,
'sizes': 0.8831621,
'software': 0.10655975,
'sort': 0.23242046,
'specification': 0.36318856,
'specifications': 0.36570984,
'storage': 0.16639474,
'swarm': 0.012647891,
'target': 0.097013876,
'tier': 1.3347368,
'total': 0.2700686,
'trial': 0.48382765,
'up': 0.009041203,
'value': 0.5148574,
'version': 0.00331044,
'vote': 0.19521642,
'voting': 0.32694972,
'web': 0.43445045,
'which': 0.22146864},
'text': 'overhead '
'of '
'the '
'field '
'mappings '
'of '
'the '
'indices '
'and '
'the '
'amount '
'of '
'memory '
'needed '
'for '
'each '
'open '
'shard '
'allocated '
'on '
'a '
'node '
'in '
'the '
'cluster. '
'Currently, '
'the '
'per-shard '
'memory '
'requirement '
'uses '
'a '
'fixed '
'estimate '
'of '
'6MB. '
'We '
'plan '
'to '
'refine '
'this '
'value. '
'The '
'estimate '
'for '
'the '
'memory '
'requirements '
'for '
'the '
'mappings '
'of '
'each '
'index '
'is '
'calculated '
'by '
'one '
'of '
'the '
'data '
'nodes '
'that '
'hosts '
'a '
'shard '
'of '
'the '
'index. '
'The '
'calculated '
'estimates '
'are '
'sent '
'to '
'the '
'master '
'node. '
'Whenever '
'there '
'is '
'a '
'mapping '
'change '
'this '
'estimate '
'is '
'updated '
'and '
'published '
'to '
'the '
'master '
'node '
'again. '
'The '
'master '
'node '
'serves '
'the '
'node '
'and '
'total '
'memory '
'metrics '
'based '
'on '
'this '
'information '
'via '
'the '
'autoscaling '
'metrics '
'API '
'to '
'the '
'autoscaler. '
'Scaling '
'the '
'cluster '
'The '
'autoscaler '
'is '
'responsible '
'for '
'monitoring '
'the '
'Elasticsearch '
'cluster '
'via '
'the '
'exposed '
'metrics, '
'calculating '
'the '
'desirable '
'cluster '
'size '
'to '
'adapt '
'to '
'the '
'indexing '
'workload, '
'and '
'updating '
'the '
'deployment '
'accordingly. '
'This '
'is '
'done '
'by '
'calculating '
'the '
'total '
'required '
'CPU '
'and '
'memory '
'resources '
'based '
'on '
'the '
'ingestion '
'load '
'and '
'memory '
'metrics. '
'The '
'sum '
'of '
'all '
'the '
'ingestion '
'load '
'per '
'node '
'values '
'determines '
'the '
'total '
'number '
'of '
'CPU '
'cores '
'needed '
'for '
'the '
'indexing '
'tier. '
'The '
'calculated '
'CPU '
'requirement '
'and '
'the '
'provided '
'minimum '
'node '
'and '
'tier '
'memory '
'resources '
'are '
'mapped '
'to '
'a '
'predetermined '
'set '
'of '
'cluster '
'sizes. '
'Each '
'cluster '
'size '
'determines '
'the '
'number '
'of '
'nodes '
'and '
'the '
'CPU, '
'memory '
'and '
'disk '
'size '
'of '
'each '
'node. '
'All '
'nodes '
'within '
'a '
'certain '
'cluster '
'size '
'have '
'the '
'same '
'hardware '
'specification. '
'There '
'is '
'a '
'fixed '
'ratio '
'between '
'CPU, '
'memory '
'and '
'disk, '
'thus '
'always '
'scaling '
'all '
'3 '
'resources '
'linearly. '
'The '
'existing '
'cluster '
'sizes '
'for '
'the '
'indexing '
'tier '
'are '
'based '
'on '
'node '
'sizes '
'starting '
'from '
'4GB/2vCPU/100GB '
'disk '
'to '
'64GB/32vCPU/1600GB '
'disk. '
'Once '
'the '
'Elasticsearch '
'cluster '
'scales '
'up '
'to '
'the '
'largest '
'node '
'size '
'(64GB '
'memory), '
'any '
'further '
'scale-up '
'adds '
'new '
'64GB '
'nodes, '
'allowing '
'a '
'cluster '
'to '
'scale '
'up '
'to '
'32 '
'nodes '
'of '
'64GB. '
'Note '
'that '
'this '
'is '
'not '
'a '
'hard '
'upper '
'bound '
'on '
'the '
'number '
'of '
'Elasticsearch '
'nodes '
'in '
'the '
'cluster '
'and '
'can '
'be '
'increased '
'if '
'necessary. '
'Every '
'5 '
'seconds '
'the '
'autoscaler '
'polls '
'metrics '
'from '
'the '
'master '
'node, '
'calculates '
'the '
'desirable '
'cluster '
'size '
'and '
'if '
'it '
'is '
'different '
'from '
'the '
'current '
'cluster '
'size, '
'it '
'updates '
'the '
'Elasticsearch '
'Kubernetes '
'Deployment '
'accordingly. '
'Note '
'that '
'the '
'actual '
'reconciliation '
'of '
'the '
'deployment '
'towards '
'the '
'desired '
'cluster '
'size '
'and '
'adding '
'and '
'removing '
'the '
'Elasticsearch '
'nodes '
'to '
'achieve '
'this '
'is '
'done '
'by '
'Kubernetes. '
'In '
'order '
'to '
'avoid '
'very '
'short-lived '
'changes '
'to '
'the'},
{'embeddings': {'##ber': 0.03804658,
'##es': 0.1512185,
'##gb': 0.6443679,
'##hi': 0.36000288,
'##ika': 0.07467539,
'##ing': 0.6129379,
'##ler': 1.1574837,
'##less': 0.5735957,
'##ling': 1.1661593,
'##load': 0.62337583,
'##net': 0.58226395,
'##oya': 1.7074469,
'##pu': 1.1345644,
'##rch': 1.0119687,
'##sca': 1.5153302,
'##sea': 1.4253823,
'##vc': 1.4631956,
'100': 0.55265766,
'15': 0.052379817,
'16': 0.33394203,
'160': 0.118766,
'1600': 0.8028694,
'32': 1.1772103,
'4': 0.16181825,
'64': 1.4588842,
'algorithm': 0.94727564,
'always': 0.38941032,
'amazon': 0.89331883,
'analysis': 0.4050502,
'analyze': 0.023668261,
'andersen': 0.49676144,
'apache': 0.80054885,
'ariel': 0.4422102,
'auto': 1.2729144,
'automatic': 0.7698037,
'automatically': 0.04643825,
'availability': 0.49544457,
'available': 0.19981025,
'blog': 0.50581634,
'boat': 0.4211383,
'bot': 0.44343898,
'bug': 0.16439897,
'calculate': 0.44946215,
'calculating': 0.21078831,
'calculation': 0.91136605,
'calculations': 0.35172287,
'capacity': 0.32551798,
'certification': 0.96537966,
'certified': 0.86568826,
'change': 0.091490604,
'checkpoint': 0.13703609,
'chess': 0.30361477,
'class': 0.12189255,
'cloud': 0.36273655,
'cluster': 2.1554685,
'clusters': 0.84253734,
'competition': 0.0070358375,
'component': 0.16093102,
'components': 0.688979,
'computation': 0.0109849,
'computer': 0.37449652,
'computers': 0.29611063,
'constant': 0.21192689,
'cpu': 0.9483953,
'crawl': 0.061979044,
'data': 0.29847682,
'database': 0.53361094,
'define': 0.30592072,
'deployment': 1.1050912,
'desirable': 0.28776327,
'determination': 0.25265238,
'determine': 0.4538456,
'determined': 0.5666302,
'determines': 0.02666208,
'dimensions': 0.43506965,
'disadvantage': 0.40544793,
'disk': 1.0043706,
'domain': 0.08386699,
'down': 1.1079221,
'each': 0.20502539,
'elastic': 2.0313072,
'engineer': 0.41261968,
'engineering': 0.43656224,
'existing': 0.82118076,
'expensive': 0.10213457,
'factors': 0.04067958,
'fernandez': 1.1611929,
'fixed': 0.6458474,
'forest': 0.07132318,
'francisco': 1.0563725,
'garcia': 0.13344267,
'gb': 0.6862939,
'global': 0.0054082987,
'hardware': 0.7944886,
'hen': 0.9853478,
'honey': 0.081156164,
'hour': 0.0074544367,
'hours': 0.24539681,
'hu': 0.06941744,
'implement': 0.23772681,
'implementation': 0.07986039,
'improve': 0.2981144,
'increase': 0.7570058,
'increasing': 0.25063965,
'index': 1.358504,
'indexed': 0.29916498,
'ing': 0.49232894,
'integration': 0.20372295,
'inventory': 0.49392712,
'java': 0.96544707,
'jose': 0.014233379,
'ku': 1.0064884,
'large': 0.009199611,
'largest': 0.5853634,
'latest': 0.075750045,
'learning': 0.14278692,
'length': 0.2575359,
'limit': 0.27284575,
'linear': 0.99686086,
'load': 0.78078943,
'loading': 0.09809506,
'log': 0.053032227,
'lopez': 0.37077188,
'machine': 0.1154489,
'maintenance': 0.24795005,
'management': 0.28454626,
'map': 0.12368915,
'master': 1.0599743,
'math': 0.39245087,
'maximum': 0.37043598,
'mb': 0.65867126,
'measure': 0.401138,
'mechanism': 0.5363481,
'memory': 1.0781962,
'metric': 0.9361899,
'mining': 0.4610803,
'minute': 0.7122368,
'minutes': 0.03330799,
'multiple': 0.28440112,
'network': 0.70334154,
'new': 0.36585885,
'node': 1.1508181,
'nodes': 0.6786249,
'number': 0.46848533,
'online': 0.10060778,
'operation': 0.013929884,
'optimal': 0.052087568,
'overhead': 0.12910955,
'performance': 0.10508823,
'po': 0.030801829,
'poll': 0.032789562,
'polling': 0.08606442,
'polls': 0.31255096,
'predict': 0.038815167,
'process': 0.32648584,
'processing': 0.13010792,
'quan': 0.30870175,
'rank': 0.23912333,
'ratio': 1.1149174,
'ratios': 0.17480499,
'ready': 0.7220055,
'reconciliation': 0.03476886,
'reduce': 0.48650545,
'regulation': 0.14490134,
'requirements': 0.26383802,
'resource': 0.48044914,
'resources': 0.99925154,
'sale': 0.23320372,
'same': 0.04602473,
'scala': 0.34763098,
'scale': 1.3520039,
'scaled': 0.373489,
'scales': 0.23150739,
'scaling': 1.3547646,
'scope': 0.24351352,
'sea': 0.012636473,
'search': 0.5437506,
'seconds': 0.21717648,
'serial': 0.084758565,
'server': 0.66100806,
'si': 0.13631321,
'sid': 0.4065147,
'size': 1.4813008,
'sizes': 1.1315687,
'software': 0.053653706,
'sort': 0.34857363,
'specification': 0.47748893,
'specifications': 0.54209507,
'square': 0.0464906,
'storage': 0.2826658,
'strategy': 0.105019435,
'swarm': 0.08799058,
'three': 0.0456386,
'tier': 1.2590698,
'torre': 0.033106416,
'total': 0.15115097,
'trainer': 0.28730983,
'training': 0.91525143,
'trial': 0.40092948,
'unit': 0.12670164,
'up': 0.48489103,
'user': 0.5006898,
'users': 0.35868,
'vote': 0.16288216,
'voting': 0.2478986,
'web': 0.44947043},
'text': 'of '
'cluster '
'sizes. '
'Each '
'cluster '
'size '
'determines '
'the '
'number '
'of '
'nodes '
'and '
'the '
'CPU, '
'memory '
'and '
'disk '
'size '
'of '
'each '
'node. '
'All '
'nodes '
'within '
'a '
'certain '
'cluster '
'size '
'have '
'the '
'same '
'hardware '
'specification. '
'There '
'is '
'a '
'fixed '
'ratio '
'between '
'CPU, '
'memory '
'and '
'disk, '
'thus '
'always '
'scaling '
'all '
'3 '
'resources '
'linearly. '
'The '
'existing '
'cluster '
'sizes '
'for '
'the '
'indexing '
'tier '
'are '
'based '
'on '
'node '
'sizes '
'starting '
'from '
'4GB/2vCPU/100GB '
'disk '
'to '
'64GB/32vCPU/1600GB '
'disk. '
'Once '
'the '
'Elasticsearch '
'cluster '
'scales '
'up '
'to '
'the '
'largest '
'node '
'size '
'(64GB '
'memory), '
'any '
'further '
'scale-up '
'adds '
'new '
'64GB '
'nodes, '
'allowing '
'a '
'cluster '
'to '
'scale '
'up '
'to '
'32 '
'nodes '
'of '
'64GB. '
'Note '
'that '
'this '
'is '
'not '
'a '
'hard '
'upper '
'bound '
'on '
'the '
'number '
'of '
'Elasticsearch '
'nodes '
'in '
'the '
'cluster '
'and '
'can '
'be '
'increased '
'if '
'necessary. '
'Every '
'5 '
'seconds '
'the '
'autoscaler '
'polls '
'metrics '
'from '
'the '
'master '
'node, '
'calculates '
'the '
'desirable '
'cluster '
'size '
'and '
'if '
'it '
'is '
'different '
'from '
'the '
'current '
'cluster '
'size, '
'it '
'updates '
'the '
'Elasticsearch '
'Kubernetes '
'Deployment '
'accordingly. '
'Note '
'that '
'the '
'actual '
'reconciliation '
'of '
'the '
'deployment '
'towards '
'the '
'desired '
'cluster '
'size '
'and '
'adding '
'and '
'removing '
'the '
'Elasticsearch '
'nodes '
'to '
'achieve '
'this '
'is '
'done '
'by '
'Kubernetes. '
'In '
'order '
'to '
'avoid '
'very '
'short-lived '
'changes '
'to '
'the '
'cluster '
'size, '
'we '
'account '
'for '
'a '
'10% '
'headroom '
'when '
'calculating '
'the '
'desired '
'cluster '
'size '
'during '
'a '
'scale '
'down '
'and '
'a '
'scale '
'down '
'takes '
'effect '
'only '
'if '
'all '
'desired '
'cluster '
'size '
'calculations '
'within '
'the '
'past '
'15 '
'minutes '
'have '
'indicated '
'a '
'scale-down. '
'Currently, '
'the '
'time '
'that '
'it '
'takes '
'for '
'an '
'increase '
'in '
'the '
'metrics '
'to '
'lead '
'to '
'the '
'first '
'Elasticsearch '
'node '
'being '
'added '
'to '
'the '
'cluster '
'and '
'ready '
'to '
'process '
'indexing '
'load '
'is '
'under '
'1 '
'minute. '
'Conclusion '
'In '
'this '
'blog '
'post, '
'we '
'explained '
'how '
'ingest '
'autoscaling '
'works '
'in '
'Elasticsearch, '
'the '
'different '
'components '
'involved, '
'and '
'the '
'metrics '
'used '
'to '
'quantify '
'the '
'resources '
'needed '
'to '
'handle '
'the '
'indexing '
'workload. '
'We '
'believe '
'that '
'such '
'an '
'autoscaling '
'mechanism '
'is '
'crucial '
'to '
'reduce '
'the '
'operational '
'overhead '
'of '
'an '
'Elasticsearch '
'cluster '
'for '
'the '
'users '
'by '
'automatically '
'increasing '
'the '
'available '
'resources '
'in '
'the '
'cluster '
'when '
'necessary. '
'Furthermore, '
'it '
'leads '
'to '
'cost '
'reduction '
'by '
'scaling '
'down '
'the '
'cluster '
'when '
'the '
'available '
'resources '
'in '
'the '
'cluster '
'are '
'not '
'required '
'anymore. '
'Ready '
'to '
'try '
'this '
'out '
'on '
'your '
'own? '
'Start '
'a '
'free '
'trial. '
'Want '
'to '
'get '
'Elastic '
'certified? '
'Find '
'out '
'when '
'the '
'next '
'Elasticsearch '
'Engineer '
'training '
'is '
'running! '
'Pooya '
'Salehi '
'Henning '
'Andersen '
'Francisco '
'Fernández '
'Castaño '
'11 '
'min '
'read '
'29 '
'July '
'2024 '
'Elastic '
'Cloud '
'Serverless '
'Recommended '
'Articles '
'Elastic '
'Cloud'},
{'embeddings': {'##4': 0.5609497,
'##down': 0.011559885,
'##est': 1.1421111,
'##hi': 0.0060656513,
'##ing': 0.48465544,
'##ler': 0.12595108,
'##less': 1.3963115,
'##lessly': 0.76121324,
'##ling': 1.03232,
'##load': 0.6918682,
'##oya': 0.56508857,
'##rch': 0.94580704,
'##room': 1.397477,
'##sca': 1.4164101,
'##sea': 1.4075159,
'10': 0.005647892,
'15': 1.0004816,
'16': 0.0726173,
'202': 0.79451597,
'account': 0.054787852,
'accounting': 0.2977837,
'advantage': 0.13797385,
'after': 0.04746113,
'algorithm': 0.84724355,
'amazon': 0.7599511,
'analysis': 0.4048887,
'analyze': 0.12881227,
'andersen': 0.091110215,
'anya': 0.031511437,
'apache': 0.8387389,
'architect': 0.57877886,
'archive': 0.027499544,
'august': 0.523268,
'auto': 1.4506402,
'automatic': 0.94025064,
'availability': 0.348747,
'available': 0.05306761,
'blog': 0.8397168,
'bot': 0.38508278,
'bug': 0.1267487,
'build': 0.776895,
'building': 0.7504958,
'built': 0.19563165,
'calculate': 0.3598465,
'calculating': 0.11605539,
'calculation': 0.8540975,
'calculations': 0.57275534,
'capacity': 0.3109483,
'cave': 0.29021654,
'certification': 0.64684826,
'certified': 0.26541537,
'checkpoint': 0.06267695,
'chess': 0.22270066,
'class': 0.044449553,
'client': 0.05088419,
'cloud': 0.9856347,
'cluster': 1.8377897,
'clustered': 0.18159664,
'clusters': 0.79538465,
'collapse': 0.29267746,
'component': 0.012821147,
'components': 0.50653857,
'computer': 0.22416146,
'cost': 0.06086615,
'crawl': 0.27863678,
'data': 0.23600358,
'database': 0.386357,
'decrease': 0.29198787,
'deployment': 0.4085412,
'desired': 0.04168813,
'development': 0.0050133946,
'dimensions': 0.10934332,
'disadvantage': 0.33458805,
'domain': 0.16470446,
'down': 1.343148,
'downs': 0.2709486,
'drop': 0.19782026,
'during': 0.4177895,
'effect': 0.39730436,
'elastic': 1.9854976,
'engineer': 0.58167315,
'engineering': 0.5884908,
'ensemble': 0.007619722,
'facebook': 0.3225428,
'fernandez': 0.42895493,
'fifteen': 0.10546452,
'first': 0.50220585,
'forest': 0.14911638,
'framework': 0.047809396,
'free': 0.3561092,
'global': 0.09408311,
'group': 0.14574468,
'handling': 0.30345336,
'head': 0.117694445,
'hour': 0.3250166,
'hours': 0.70438623,
'implement': 0.13235687,
'implementation': 0.13236406,
'important': 0.055658367,
'improve': 0.2550515,
'increase': 0.74923754,
'increasing': 0.3597461,
'index': 1.4273754,
'indexed': 0.2932871,
'ing': 1.2874681,
'introduced': 0.10785041,
'inventory': 0.65916276,
'java': 0.88944626,
'july': 0.14186577,
'large': 0.06278902,
'latest': 0.068817586,
'learning': 0.12424224,
'length': 0.030345708,
'limit': 0.14073928,
'load': 1.0610044,
'loading': 0.39865428,
'loss': 0.11432742,
'machine': 0.029201662,
'maintenance': 0.15768714,
'management': 0.31734702,
'math': 0.406777,
'maximum': 0.13483465,
'measure': 0.5081328,
'mechanism': 0.8204686,
'memory': 1.0461255,
'metric': 0.9943368,
'mining': 0.5402124,
'minute': 0.92393905,
'minutes': 0.3759728,
'moment': 0.11160666,
'morris': 0.060925715,
'network': 0.51853234,
'node': 0.99145895,
'online': 0.36771652,
'operation': 0.28533393,
'overhead': 0.086819395,
'patience': 0.11310515,
'perfect': 0.12382903,
'performance': 0.06312573,
'process': 0.5356137,
'processing': 0.55718875,
'production': 0.05736718,
'project': 0.14496073,
'prototype': 0.31378728,
'quan': 0.22408743,
'ready': 0.25202373,
'reduce': 0.5264253,
'reduction': 0.037918843,
'research': 0.0142833255,
'resource': 0.09839988,
'resources': 0.7532266,
'rights': 0.08338795,
'room': 0.84089494,
'rs': 0.47752637,
'scala': 0.17796026,
'scale': 1.6349432,
'scaled': 0.39957505,
'scales': 0.24761787,
'scaling': 1.3751862,
'scope': 0.009172562,
'search': 0.6669978,
'seconds': 0.11594447,
'serial': 0.21314114,
'server': 1.1875997,
'servers': 0.3761195,
'share': 0.21588095,
'shrink': 0.08177304,
'si': 0.039096646,
'sid': 0.26323187,
'site': 0.27832702,
'size': 1.2518198,
'sizes': 0.68347317,
'small': 0.021309003,
'software': 0.21712899,
'sort': 0.46309024,
'step': 0.13614927,
'storage': 0.33423752,
'strategy': 0.2746019,
'swarm': 0.18959516,
'task': 0.12210263,
'time': 0.3716685,
'traffic': 0.0044686934,
'training': 0.56078845,
'trial': 0.30781624,
'tutor': 0.18126883,
'twitter': 0.7352328,
'useful': 0.07486964,
'user': 0.61840165,
'users': 0.5178945,
'wait': 0.12994274,
'weaving': 0.09568315,
'web': 0.3402482,
'website': 0.17116618,
'work': 0.38590312,
'working': 0.040917397,
'works': 0.2640411,
'years': 0.057129644},
'text': 'cluster '
'size, '
'we '
'account '
'for '
'a '
'10% '
'headroom '
'when '
'calculating '
'the '
'desired '
'cluster '
'size '
'during '
'a '
'scale '
'down '
'and '
'a '
'scale '
'down '
'takes '
'effect '
'only '
'if '
'all '
'desired '
'cluster '
'size '
'calculations '
'within '
'the '
'past '
'15 '
'minute '
'have '
'indicated '
'a '
'scale-down. '
'Currently, '
'the '
'time '
'that '
'it '
'takes '
'for '
'an '
'increase '
'in '
'the '
'metrics '
'to '
'lead '
'to '
'the '
'first '
'Elasticsearch '
'node '
'being '
'added '
'to '
'the '
'cluster '
'and '
'ready '
'to '
'process '
'indexing '
'load '
'is '
'under '
'1 '
'minute. '
'Conclusion '
'In '
'this '
'blog '
'post, '
'we '
'explained '
'how '
'ingest '
'autoscaling '
'works '
'in '
'Elasticsearch, '
'the '
'different '
'components '
'involved, '
'and '
'the '
'metrics '
'used '
'to '
'quantify '
'the '
'resources '
'needed '
'to '
'handle '
'the '
'indexing '
'workload. '
'We '
'believe '
'that '
'such '
'an '
'autoscaling '
'mechanism '
'is '
'crucial '
'to '
'reduce '
'the '
'operational '
'overhead '
'of '
'an '
'Elasticsearch '
'cluster '
'for '
'the '
'users '
'by '
'automatically '
'increasing '
'the '
'available '
'resources '
'in '
'the '
'cluster '
'when '
'necessary. '
'Furthermore, '
'it '
'leads '
'to '
'cost '
'reduction '
'by '
'scaling '
'down '
'the '
'cluster '
'when '
'the '
'available '
'resources '
'in '
'the '
'cluster '
'are '
'not '
'required '
'anymore. '
'Ready '
'to '
'try '
'this '
'out '
'on '
'your '
'own? '
'Start '
'a '
'free '
'trial '
'. '
'Want '
'to '
'get '
'Elastic '
'certified? '
'Find '
'out '
'when '
'the '
'next '
'Elasticsearch '
'Engineer '
'training '
'is '
'running! '
'Pooya '
'Salehi '
'Henning '
'Andersen '
'Francisco '
'Fernández '
'Castaño '
'11 '
'min '
'read '
'29 '
'July '
'2024 '
'Elastic '
'Cloud '
'Serverless '
'Share '
'Twitter '
'Facebook '
'LinkedIn '
'Recommended '
'Articles '
'Elastic '
'Cloud '
'Serverless '
'• '
'15 '
'May '
'2024 '
'Building '
'Elastic '
'Cloud '
'Serverless '
'Explore '
'the '
'architectural '
'decisions '
'we '
'made '
'along '
'the '
'journey '
'of '
'building '
'Elastic '
'Cloud '
'Serverless. '
'Jason '
'Tedor '
'Pooya '
'Salehi '
'Henning '
'Andersen '
'Francisco '
'Fernández '
'Castaño '
'11 '
'min '
'read '
'29 '
'July '
'2024 '
'Elastic '
'Cloud '
'Serverless '
'Share '
'Twitter '
'Facebook '
'LinkedIn '
'Jump '
'to '
'Ingest '
'autoscaling '
'overview '
'Metrics '
'Ingestion '
'load '
'Memory '
'Scaling '
'the '
'cluster '
'Show '
'more '
'Sitemap '
'RSS '
'Feed '
'Search '
'Labs '
'Repo '
'Elastic.co '
'©2024. '
'Elasticsearch '
'B.V. '
'All '
'Rights '
'Reserved.'}],
'inference_id': 'my-elser-model',
'model_settings': {'task_type': 'sparse_embedding'}}},
'title': 'Elasticsearch ingest autoscaling — '
'Search Labs',
'url': 'https://www.elastic.co/search-labs/blog/elasticsearch-ingest-autoscaling',
'url_host': 'www.elastic.co',
'url_path': '/search-labs/blog/elasticsearch-ingest-autoscaling',
'url_path_dir1': 'search-labs',
'url_path_dir2': 'blog',
'url_path_dir3': 'elasticsearch-ingest-autoscaling',
'url_port': 443,
'url_scheme': 'https'}}],
'max_score': 1.2861483,
'total': {'relation': 'eq', 'value': 228}},
'timed_out': False,
'took': 2}
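The raw response above is hard to scan because each hit carries its full chunk text and ELSER token weights. The following is a minimal sketch, not part of the original notebook, of one way to reduce a response of this shape to a short summary per hit. The function name `summarize_response` and the `sample` dict are hypothetical. The sketch assumes each hit keeps its document under `_source` and that the semantic text field is a dict with an `inference` key holding `chunks`, as in the output shown above.

```python
from typing import Any


def summarize_response(resp: dict[str, Any], top_n: int = 5) -> list[dict[str, Any]]:
    """Return one compact summary per hit in a search response dict."""
    summaries = []
    for hit in resp["hits"]["hits"]:
        source = hit.get("_source", {})
        chunks = []
        # Collect chunks from any field shaped like the semantic text output
        # above, so the field name does not need to be known in advance.
        for value in source.values():
            if isinstance(value, dict) and "inference" in value:
                chunks.extend(value["inference"].get("chunks", []))
        top_tokens = []
        if chunks:
            # Sparse embeddings map token -> weight; keep the heaviest tokens
            # of the first chunk only.
            weights = chunks[0].get("embeddings", {})
            top_tokens = sorted(weights, key=weights.get, reverse=True)[:top_n]
        summaries.append(
            {
                "title": source.get("title"),
                "url": source.get("url"),
                "chunk_count": len(chunks),
                "top_tokens": top_tokens,
            }
        )
    return summaries


# Hand-built, trimmed stand-in for the response above (hypothetical field
# name "semantic_body"; weights copied from the output for illustration).
sample = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "semantic_body": {
                        "inference": {
                            "chunks": [
                                {
                                    "embeddings": {
                                        "elastic": 1.9854976,
                                        "scale": 1.6349432,
                                        "index": 1.4273754,
                                        "wait": 0.12994274,
                                    },
                                    "text": "cluster size, we account for ...",
                                }
                            ],
                            "inference_id": "my-elser-model",
                        }
                    },
                    "title": "Elasticsearch ingest autoscaling",
                    "url": "https://www.elastic.co/search-labs/blog/elasticsearch-ingest-autoscaling",
                }
            }
        ],
        "max_score": 1.2861483,
        "total": {"relation": "eq", "value": 228},
    },
    "timed_out": False,
    "took": 2,
}

result = summarize_response(sample, top_n=2)
for row in result:
    print(row["title"], row["url"], row["chunk_count"], row["top_tokens"])
```

Passing the real response in place of `sample` should work the same way, provided it supports dict-style access. In practice it is often simpler to keep the embeddings out of the response in the first place by filtering `_source` in the query.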