Hugging Face Integration: Millions of Documents with Cohere Reranking


Semantic Search using the Inference API with the Hugging Face Inference Endpoints Service

Learn how to use the Inference API with the Hugging Face Inference Endpoint service for semantic search.

🧰 Requirements

For this example, you will need:

  • An Elastic Cloud deployment or serverless project

If you don't have an Elastic Cloud deployment, sign up here for a free trial.

Install packages and connect with Elasticsearch Client

To get started, we'll need to connect to our Elastic deployment using the Python client (version 8.12.0 or above). Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.

First we need to pip install the following packages:

  • elasticsearch
[ ]

Next, we need to import the modules we need. 🔐 NOTE: getpass enables us to securely prompt the user for credentials without echoing them to the terminal or storing them in the notebook.

[1]

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their password and Cloud ID. Then we create the client object, an instance of the Elasticsearch class.

[2]

Test the Client

Before you continue, confirm that the client has connected with this test.

[3]
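
A minimal sketch of the connectivity check (the helper name is hypothetical; `client.info()` is the call that produces the output below):

```python
def check_connection(client) -> dict:
    # client.info() raises if the deployment is unreachable;
    # .body returns the response as a plain dict
    return client.info().body
```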
{'name': 'serverless', 'cluster_name': 'd3ae40d244564c39961aa942d9d47f84', 'cluster_uuid': 'poKWeRbiS--nyD43R_NROw', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'}

Refer to the documentation to learn how to connect to a self-managed deployment.

Read this page to learn how to connect using API keys.

Create the inference endpoint object

Let's create the inference endpoint by using the Create inference API.

You'll need a Hugging Face API key (access token) for this, which you can find in your Hugging Face account under Access Tokens.

You will also need to have created a Hugging Face Inference Endpoints service instance and noted the URL of your instance. For this notebook, we deployed the multilingual-e5-small model.

[5]
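
A sketch of this cell, assuming the `client.inference.put` method of the 8.x Python client (the helper name and parameter plumbing are illustrative; the endpoint URL is the one you noted when deploying the model):

```python
def create_hf_endpoint(client, hf_api_key: str, hf_endpoint_url: str):
    # Registers a text_embedding inference endpoint backed by the
    # Hugging Face Inference Endpoints service instance
    return client.inference.put(
        task_type="text_embedding",
        inference_id="my_hf_endpoint_object",
        inference_config={
            "service": "hugging_face",
            "service_settings": {
                "api_key": hf_api_key,
                "url": hf_endpoint_url,
            },
        },
    )
```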
ObjectApiResponse({'inference_id': 'my_hf_endpoint_object', 'task_type': 'text_embedding', 'service': 'hugging_face', 'service_settings': {'url': 'https://yb0j0ol2xzvro0oc.us-east-1.aws.endpoints.huggingface.cloud', 'similarity': 'dot_product', 'dimensions': 384, 'rate_limit': {'requests_per_minute': 3000}}, 'task_settings': {}})
[6]
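
The embedding test can be sketched as follows (helper name hypothetical; `client.inference.inference` is assumed from the 8.x client, and returns a vector like the one in the output below):

```python
def embed(client, text: str):
    # Run a single text through the new endpoint to verify it works
    return client.inference.inference(
        inference_id="my_hf_endpoint_object",
        input=text,
    )
```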
ObjectApiResponse({'text_embedding': [{'embedding': [0.026027203, -0.011120652, -0.048804738, -0.108695105, 0.06134937, -0.003066093, 0.053232085, 0.103629395, 0.046043355, 0.0055427994, 0.036174323, 0.022110537, 0.084891565, -0.008215214, -0.017915571, 0.041923355, 0.048264034, -0.0404355, -0.02609504, -0.023076748, 0.0077286777, 0.023034474, 0.010379155, 0.06257496, 0.025658935, 0.040398516, -0.059809092, 0.032451782, 0.020798752, -0.053219322, -0.0447653, -0.033474423, 0.085040554, -0.051343303, 0.081006914, 0.026895791, -0.031822708, -0.06217641, 0.069435075, -0.055062667, -0.014967285, -0.0040517864, 0.03874908, 0.07854211, 0.017526977, 0.040629108, -0.023190023, 0.056913305, -0.06422566, -0.009403182, -0.06666503, 0.035270344, 0.004515737, 0.07347306, 0.011125566, -0.07184689, -0.08095445, -0.04214626, -0.108447045, -0.019494658, 0.06303337, 0.019757038, -0.014584281, 0.060923614, 0.06465893, 0.108431116, 0.04072316, 0.03705652, -0.06975359, -0.050562095, -0.058487326, 0.05989619, 0.008454561, -0.02706363, -0.017974045, 0.030698266, 0.046484154, -0.06212431, 0.009513307, -0.056369964, -0.052940592, -0.05834985, -0.02096531, 0.03910419, -0.054484386, 0.06231919, 0.044607673, -0.064030685, 0.067746714, -0.0291515, 0.06992093, 0.06300958, -0.07530936, -0.06167211, -0.0681666, -0.042375665, -0.05200085, 0.058336657, 0.039630838, -0.03444309, 0.030615594, -0.042388055, 0.03127304, -0.059075136, -0.05925558, 0.019864058, 0.0311022, -0.11285156, 0.02264027, -0.0676216, 0.011842404, -0.0157365, 0.06580391, 0.023665493, -0.05072435, -0.039492164, -0.06390325, -0.067074455, 0.032680944, -0.05243909, 0.06721114, -0.005195616, -0.0458316, -0.046202496, -0.07942237, -0.011754681, 0.026515028, 0.04761297, 0.08130492, 0.0118014645, 0.025956452, 0.039976373, 0.050196614, 0.052609406, 0.063223615, 0.06121741, -0.028745022, 0.0008677591, 0.038760003, -0.021240402, -0.073974326, 0.0548761, -0.047403768, 0.025582938, 0.0585596, 0.056284837, 0.08381001, -0.02149303, 0.09447917, 
-0.04940235, 0.018470071, -0.044996567, 0.08062048, 0.05162519, 0.053831138, -0.052980945, -0.08226773, -0.068137355, 0.028439872, 0.049932946, -0.07633764, -0.08649836, -0.07108301, 0.017650153, -0.065348, -0.038191773, 0.040068675, 0.05870959, -0.04707911, -0.04340612, -0.044621766, 0.030800574, -0.042227603, 0.0604754, 0.010891958, 0.057460006, -0.046362966, 0.046009373, 0.07293652, 0.09398854, -0.017035728, -0.010618687, -0.09326647, -0.03877647, -0.026517635, -0.047411792, -0.073266074, 0.033911563, 0.0642687, -0.02208107, 0.0040624263, -0.003194478, -0.082016475, -0.088730805, -0.084694624, -0.03364641, -0.05026475, 0.051665384, 0.058177516, 0.02759865, -0.034461632, 0.0027396793, 0.013807217, 0.040009033, 0.06346369, 0.05832441, -0.07451158, 0.028601868, -0.022494016, 0.04229324, 0.027883757, -0.0673137, -0.07119014, 0.047188714, -0.033077974, -0.028302893, -0.028704679, 0.043902606, -0.05147592, 0.045782477, 0.08077521, -0.01782404, 0.0242885, -0.0711172, -0.023565968, 0.041291755, 0.084907316, -0.101972945, -0.038989857, 0.025122978, -0.014144972, -0.010975231, -0.0357049, -0.09243826, -0.023552464, -0.08525497, -0.018912667, 0.049455214, 0.06532829, -0.031223357, -0.013451132, -0.00037671064, 0.04600707, -0.057603396, 0.08035837, -0.026429964, -0.0962299, 0.022606302, -0.0116137, 0.062264528, 0.033446472, -0.06123555, -0.09909991, -0.07459225, -0.018707436, 0.028753517, 0.06808565, 0.023965191, -0.04717076, 0.026551146, 0.019655682, -0.009233348, 0.10465723, 0.046420176, 0.03295103, 0.053024694, -0.03854051, -0.0058735567, -0.061238136, -0.048678573, -0.05362055, 0.048028357, 0.003013557, -0.06505121, -0.020536456, -0.020093206, 0.014102229, 0.10254222, -0.027084326, -0.061477777, 0.03478813, -0.00029115603, 0.053552967, 0.056773122, 0.048566766, 0.027371235, -0.015398839, 0.0511229, -0.03932426, -0.043879736, -0.03872225, -0.08171432, 0.01703992, -0.04535995, 0.03194781, 0.011413799, 0.036786903, 0.021306055, -0.06722324, 0.034231987, -0.027529748, 
-0.059552487, 0.050244797, 0.08905617, -0.071323626, 0.05047076, 0.003429174, 0.034673557, 0.009984501, 0.056842286, 0.0683513, 0.023990847, -0.04053898, -0.022724004, 0.026175855, 0.027319307, -0.055451974, -0.053907238, -0.05359307, -0.035025068, -0.03776361, -0.02973751, -0.037610233, -0.051089168, 0.04428633, 0.06276192, -0.03754498, -0.060270913, 0.043127347, 0.016669549, 0.024885416, -0.027190097, -0.011614101, 0.077848606, -0.007924398, -0.061833344, -0.015071012, 0.023127502, -0.07634841, -0.015780756, 0.031652045, 0.0031123296, -0.032643825, 0.05640234, -0.02685534, -0.04942714, 0.048498664, 0.00043902535, -0.043975227, 0.017389799, 0.07734344, -0.090009265, 0.019997133, 0.10055134, -0.05671741, 0.048755262, -0.02514076, -0.011394784, 0.049053214, 0.04264309, -0.06451125, -0.029034287, 0.07762039, 0.06809162, 0.059983794, 0.035379365, -0.007960272, 0.019705113, -0.02518122, -0.05767321, 0.038523413, 0.081652805, -0.032829504, -0.0023197657, -0.018218426, -0.0885769, -0.094963886, 0.057851806, -0.041729856, -0.045802936, 0.0570079, 0.047811687, 0.017810043, 0.09373594]}]})

IMPORTANT: If you use Elasticsearch 8.12, you must change inference_id in the snippet above to model_id!

Create an ingest pipeline with an inference processor

Create an ingest pipeline with an inference processor by using the put_pipeline method. Reference the inference_id created above as model_id to infer on the data that is being ingested by the pipeline.

[7]
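
A sketch of the pipeline cell; the pipeline name and field names (`text`, `text_embedding`) are assumptions for illustration:

```python
def create_pipeline(client):
    # An inference processor runs every ingested document through the
    # inference endpoint and stores the embedding in output_field
    return client.ingest.put_pipeline(
        id="my_hf_endpoint_pipeline",  # hypothetical pipeline name
        processors=[
            {
                "inference": {
                    "model_id": "my_hf_endpoint_object",
                    "input_output": {
                        "input_field": "text",
                        "output_field": "text_embedding",
                    },
                }
            }
        ],
    )
```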
ObjectApiResponse({'acknowledged': True})

Let's note a few important parameters from that API call:

  • inference: A processor that performs inference using a machine learning model.
  • model_id: Specifies the ID of the inference endpoint to be used. In this example, the inference ID is set to my_hf_endpoint_object. Use the inference ID you defined when you created the inference task.
  • input_output: Specifies input and output fields.
  • input_field: Field name from which the dense_vector representation is created.
  • output_field: Field name which contains inference results.

Create index

The mapping of the destination index - the index that contains the embeddings that the model will create based on your input text - must be created. The destination index must have a field with the dense_vector field type to index the output of the model we deployed in Hugging Face (multilingual-e5-small).

Let's create an index named hf-endpoint-index with the mappings we need.

[8]
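
A sketch of the index-creation cell. The 384 dimensions and dot_product similarity follow the endpoint settings shown earlier; the field names and the default-pipeline setting are assumptions:

```python
def create_index(client):
    return client.indices.create(
        index="hf-endpoint-index",
        settings={
            # Route all ingested documents through the inference pipeline
            # (hypothetical pipeline name)
            "index": {"default_pipeline": "my_hf_endpoint_pipeline"}
        },
        mappings={
            "properties": {
                "text": {"type": "text"},
                "text_embedding": {
                    "type": "dense_vector",
                    "dims": 384,  # multilingual-e5-small output size
                    "similarity": "dot_product",
                },
            }
        },
    )
```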
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'hf-endpoint-index'})

If you are using Elasticsearch serverless or v8.15+, you will have access to the new semantic_text field type.

semantic_text has significantly faster ingest times and is recommended.

https://github.com/elastic/elasticsearch/blob/main/docs/reference/mapping/types/semantic-text.asciidoc

[9]
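
The semantic_text variant can be sketched like this; no ingest pipeline or explicit dense_vector mapping is needed, since the field handles chunking and embedding itself (field name is an assumption):

```python
def create_semantic_index(client):
    return client.indices.create(
        index="hf-semantic-text-index",
        mappings={
            "properties": {
                "text": {
                    "type": "semantic_text",
                    # Embeddings are generated by the endpoint created earlier
                    "inference_id": "my_hf_endpoint_object",
                }
            }
        },
    )
```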
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'hf-semantic-text-index'})

Insert Documents

In this example, we want to show the power of using GPUs in Hugging Face's Inference Endpoints service by indexing millions of multilingual documents from the miracl corpus. The speed at which these documents ingest will depend on whether you use a semantic_text field (faster) or an ingest pipeline (slower), and on how much hardware you rent for your Hugging Face inference endpoint. Using a semantic_text field with a single T4 GPU, it may take about 3 hours to index 1 million documents.

[10]
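
Loading the corpus can be sketched with the Hugging Face datasets library; the language split ("sw", Swahili, judging by the results further down) and the streaming flag are assumptions:

```python
def load_corpus(lang: str = "sw"):
    # Lazy import keeps the heavy dependency optional until used
    from datasets import load_dataset

    # Streaming avoids materializing millions of documents on disk
    return load_dataset("miracl/miracl-corpus", lang, split="train", streaming=True)
```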
miracl-corpus.py:   0%|          | 0.00/3.15k [00:00<?, ?B/s]
README.md:   0%|          | 0.00/6.85k [00:00<?, ?B/s]
Loading dataset shards:   0%|          | 0/28 [00:00<?, ?it/s]
[11]
Docs uploaded: 1000
Docs uploaded: 2000
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[11], line 27
     17             # if you are using an ingest pipeline instead of a
     18             # semantic text field, use this instead:
     19             # documents.append(
   (...)
     23             #     }
     24             # )
     26 try:
---> 27     response = helpers.bulk(client, documents, raise_on_error=False, timeout="60s")
     28     print("Docs uploaded:", (j + 1) * MAX_BULK_SIZE)
     30 except Exception as e:

    (... full library traceback elided; ingestion was interrupted manually ...)

KeyboardInterrupt: 

Semantic search

After the dataset has been enriched with the embeddings, you can query the data using semantic search. Pass a query_vector_builder to the k-nearest neighbor (kNN) vector search API, and provide the query text and the model you have used to create the embeddings.

[12]
[13]
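
The query cells can be sketched with the search API's knn option; the field name follows the dense_vector mapping sketched earlier, and per the note at the end of this page, model_id must match the inference_id created in the first step:

```python
def semantic_search(client, query: str, index: str = "hf-endpoint-index"):
    return client.search(
        index=index,
        knn={
            "field": "text_embedding",
            "k": 10,
            "num_candidates": 100,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": "my_hf_endpoint_object",
                    "model_text": query,
                }
            },
        },
    )
```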

ID: DDbC4pEBhYre9Ocn7zIr
Score: 0.92574656
Text: Orodha ya nchi kufuatana na wakazi

ID: bjbC4pEBhYre9OcnzC3U
Score: 0.9159906
Text: Intercontinental Cup

ID: njbC4pEBhYre9OcnzC3U
Score: 0.91523564
Text: รายการจัดเรียงตามทวีปและประเทศ

ID: bDbC4pEBhYre9Ocn3jBM
Score: 0.9142189
Text: a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z

ID: 8jbD4pEBhYre9OcnDTSL
Score: 0.9127883
Text: With Australia:
With Adelaide United:

ID: MzbC4pEBhYre9Ocn_TQ1
Score: 0.9116771
Text: Más información en .

ID: _DbC4pEBhYre9Ocn7zEr
Score: 0.9106927
Text: (AS)= Asia (AF)= Afrika (NA)= Amerika ya kaskazini (SA)= Amerika ya kusini (A)= Antaktika (EU)= Ulaya na (AU)= Australia na nchi za Pasifiki.

ID: fDbC4pEBhYre9Ocn7zEr
Score: 0.9096315
Text: Stadi za lugha ya mazungumzo ni kuzungumza na kusikiliza.

ID: DDbC4pEBhYre9Ocn3jBL
Score: 0.90771043
Text: "*(Meksiko mara nyingi huhesabiwa katika Amerika ya Kati kwa sababu za kiutamaduni)"

ID: IjbC4pEBhYre9Ocn3i9L
Score: 0.9070151
Text: Englan is a small village in the district of Wokha, in the Nagaland state of India. Its name literally means "The Path of the Sun". It is one of the main centers of the district and is an active center of the Lotha language and culture.
[17]
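
The Cohere reranking endpoint below can be sketched the same way as the embedding endpoint; the model and task settings match the response shown, while the helper name and api_key parameter are illustrative:

```python
def create_rerank_endpoint(client, cohere_api_key: str):
    return client.inference.put(
        task_type="rerank",
        inference_id="my_cohere_rerank_endpoint",
        inference_config={
            "service": "cohere",
            "service_settings": {
                "api_key": cohere_api_key,
                "model_id": "rerank-english-v3.0",
            },
            # Rerank the top 100 hits and return the document text
            "task_settings": {"top_n": 100, "return_documents": True},
        },
    )
```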
ObjectApiResponse({'inference_id': 'my_cohere_rerank_endpoint', 'task_type': 'rerank', 'service': 'cohere', 'service_settings': {'model_id': 'rerank-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {'top_n': 100, 'return_documents': True}})
[18]
[19]
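
The reranked query can be sketched with a text_similarity_reranker retriever wrapping the kNN search (the retriever API is assumed to be available on serverless/8.14+; field and endpoint names follow the earlier sketches):

```python
def rerank_search(client, query: str, index: str = "hf-endpoint-index"):
    return client.search(
        index=index,
        retriever={
            "text_similarity_reranker": {
                # First stage: vector retrieval via the embedding endpoint
                "retriever": {
                    "knn": {
                        "field": "text_embedding",
                        "k": 100,
                        "num_candidates": 100,
                        "query_vector_builder": {
                            "text_embedding": {
                                "model_id": "my_hf_endpoint_object",
                                "model_text": query,
                            }
                        },
                    }
                },
                # Second stage: Cohere reranks the candidates by text similarity
                "field": "text",
                "inference_id": "my_cohere_rerank_endpoint",
                "inference_text": query,
            }
        },
    )
```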

ID: _DbC4pEBhYre9Ocn7zEr
Score: 0.1766716
Text: (AS)= Asia (AF)= Afrika (NA)= Amerika ya kaskazini (SA)= Amerika ya kusini (A)= Antaktika (EU)= Ulaya na (AU)= Australia na nchi za Pasifiki.

ID: zDbC4pEBhYre9OcnzC7V
Score: 0.06394842
Text: Waingereza nao wakatawala Afrika Mashariki na Kusini, na kuwa sehemu ya Sudan na Somalia, Uganda, Kenya, Tanzania (chini ya jina la Tanganyika), Zanzibar, Nyasaland, Rhodesia, Bechuanaland, Basutoland na Swaziland chini ya utawala wao na baada ya kushinda katika vita huko Afrika ya Kusini walitawala Transvaal, Orange Free State, Cape Colony na Natal, na huko Afrika ya Magharibi walitawala Gambia, Sierra Leone, the Gold Coast na Nigeria.

ID: bDbC4pEBhYre9Ocn3jBM
Score: 0.013532149
Text: a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z

ID: LDbD4pEBhYre9OcnHje5
Score: 0.010130412
Text: Mifano maarufu ya bunge ni Majumba ya Bunge mjini London, Kongresi mjini Washingtin D.C., Bundestag mjini Berlin na Duma nchini Moscow, Parlamento Italiano mjini Roma na "Assemblée nationale" mjini Paris. Kwa kanuni ya serikali wakilishi watu hupigia kura wanasiasa ili watimize "matakwa" yao. Ingawa nchi kama Israeli, Ugiriki, Uswidi na Uchina zina nyumba moja ya bunge, nchi nyingi zina nyumba mbili za bunge, kumaanisha kuwa zina nyumba mbili za kibunge zinazochaguliwa tofauti. Katika 'nyumba ya chini' wanasiasa wanachaguliwa kuwakilisha maeneo wakilishi bungeni. 'Nymba ya juu' kawaida huchaguliwa kuwakilisha majimbo katika mfumo wa majimbo (kama vile nchii Australia, Ujerumani au Marekani) au upigaji kura tofauti katika katika mfumo wa umoja (kama vile nchini Ufaransa). Nchini Uingereza nyumba ya juu inachaguliwa na na serikali kama nyumba ya marudio. Ukosoaji mmoja wa mifumo yenye nyumba mbili yenye nyumba mbili zilizochaguliwa ni kuwa nyumba ya juu na ya chini huenda zikafanana. Utetezi wa tangu jadi wa mifumo ya nyumba mbili nni kuwa chumba cha juu huwa kama nyumba ya marekebisho. Hili linaweza kupunguza uonevu na dhuluma katika hatua ya kiserikali", 101

ID: lzbC4pEBhYre9Ocn7zIr
Score: 0.0033897832
Text: इसके अलावा हिन्दी और संस्कृत में

ID: wDbC4pEBhYre9Ocn7zIr
Score: 0.0025311112
Text: 2. التزام بريطانيا وفرنسا وفيما بعد إيطاليا بإدارة دولية لفلسطين.

ID: IjbC4pEBhYre9Ocn3i9L
Score: 0.0023596606
Text: Englan is a small village in the district of Wokha, in the Nagaland state of India. Its name literally means "The Path of the Sun". It is one of the main centers of the district and is an active center of the Lotha language and culture.

ID: jTbD4pEBhYre9OcnDTWL
Score: 0.0022694687
Text: ఇండియా గేటు

ID: 4zbC4pEBhYre9Ocn_TM0
Score: 0.0018458483
Text: Más información en la web de la Generalidad Valenciana o en la web de la FEDME

ID: 8jbD4pEBhYre9OcnDTSL
Score: 0.0016875096
Text: With Australia:
With Adelaide United:

NOTE: The value of model_id in the query_vector_builder must match the value of inference_id you created in the first step.