Evaluation Of Representation Capacity Retention With Mongodb Voyageai


AI Developer's Guide: Efficient Vector Search in MongoDB Atlas with Automatic Quantization and Voyage AI Embeddings



Introduction

What's included in this notebook?

  • Data loading and preparation
  • Vector search index creation
  • Data ingestion
  • Vector search operation
  • Retrieving documents and analysing results
  • Representational Capacity Retention
  • Evaluating metrics such as recall, retention, and latency
  • Visualizing search performance and trade-offs

Glossary:

  • Vector Search: A technique used to search for documents in a vector database by comparing the query vector to the vectors in the database.
  • Embedding: A vector representation of a text or image.
  • Quantization: A technique that compresses a vector by storing each of its values at a lower numeric precision.
  • ENN: Exact Nearest Neighbour search, which compares the query against every vector in the collection.
  • ANN: Approximate Nearest Neighbour search, which trades a small amount of accuracy for speed by examining only a subset of candidate vectors.
  • Float32: A 32-bit floating-point number; the full-precision format of the original embeddings.
  • Scalar Quantization: Quantization that maps each float32 value to an 8-bit integer (int8), shrinking the vector values by 4x.
  • Binary Quantization: Quantization that maps each float32 value to a single bit (int1), shrinking the vector values by 32x.
  • Representational Capacity Retention: How much of the retrieval quality of the original float32 vectors is preserved after quantization.
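
To make the last glossary entry concrete, here is a minimal sketch of one way retention could be scored. It assumes retention is the fraction of the float32 ENN baseline's top-k results that a quantized or ANN search also returns; the function name and the document ids are hypothetical and are not taken from the notebook.

```python
def retention_at_k(baseline_ids, candidate_ids, k):
    """Fraction of the top-k baseline (float32 ENN) results that the
    candidate search (e.g. a quantized index) also returned in its top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    baseline_top = set(baseline_ids[:k])
    if not baseline_top:
        return 0.0
    candidate_top = set(candidate_ids[:k])
    return len(baseline_top & candidate_top) / len(baseline_top)


# Hypothetical document ids, for illustration only.
enn_ids = [11, 42, 7, 93, 5]
quantized_ids = [42, 11, 7, 18, 5]
print(retention_at_k(enn_ids, quantized_ids, k=5))  # 4 of 5 ids overlap -> 0.8
```

A score of 1.0 would mean the quantized search returned exactly the same set of documents as the exact float32 search, regardless of order.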

In this guide, we demonstrate how to leverage MongoDB Atlas Vector Search with automatic quantization and Voyage AI embeddings to build a scalable, high-performance vector search pipeline.

By compressing the embedding space—whether through scalar or binary quantization—you can dramatically reduce memory usage while retaining the vast majority of retrieval accuracy compared to a float32 baseline.

These techniques not only cut operational costs but also improve throughput, allowing you to handle larger workloads or more complex queries.

Furthermore, MongoDB Atlas’s integration of indexing, querying, and storage provides a unified environment for rapid prototyping, testing, and production deployment, all backed by robust, enterprise-ready infrastructure.
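
The two compression schemes mentioned above can be illustrated with a small, simplified sketch. Atlas applies quantization automatically when the index is built, and this notebook does not describe its internal algorithm, so the code below is only a conceptual illustration under stated assumptions: a fixed value range for scalar quantization and a threshold of zero for binary quantization. Both function names are hypothetical.

```python
def scalar_quantize(vector, lo, hi):
    """Map each float in [lo, hi] to an int8 value in [-128, 127]."""
    scale = 255.0 / (hi - lo)
    quantized = []
    for value in vector:
        clipped = min(max(value, lo), hi)  # clamp outliers into the range
        quantized.append(int(round((clipped - lo) * scale)) - 128)
    return quantized


def binary_quantize(vector, threshold=0.0):
    """Keep one bit per dimension: 1 if the value exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in vector]


# Illustrative values on the same scale as the embeddings in this dataset.
embedding = [-0.027, 0.024, 0.003, -0.003, 0.026]
print(scalar_quantize(embedding, lo=-0.03, hi=0.03))
print(binary_quantize(embedding))
```

Scalar quantization keeps a coarse version of each value's magnitude, while binary quantization keeps only which side of the threshold it falls on, which is why binary is smaller but loses more information.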

Step 1: Importing Libraries

Install the necessary libraries for the notebook:

  • pymongo: The MongoDB Python driver, used to connect to the MongoDB Atlas cluster.
  • voyageai: The Voyage AI Python client, used to generate the embeddings for the Wikipedia data.
  • pandas: A data manipulation and analysis library, used to load the Wikipedia data and prepare it for the vector search.
  • datasets: A library for loading and managing datasets, used to load the Wikipedia data.
  • matplotlib: A plotting and visualization library, used to visualize the data.
[1]

Next, we create a helper function, set_env_securely, to securely get and set environment variables.
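
A minimal sketch of such a helper is shown below. Only the function name comes from this notebook; the signature, the use of getpass and the prompt_fn parameter (added here so the helper can be exercised without typing) are assumptions.

```python
import getpass
import os


def set_env_securely(var_name, prompt, prompt_fn=getpass.getpass):
    """Ask for a secret without echoing it to the screen and store it
    in os.environ under var_name for the rest of the session."""
    value = prompt_fn(prompt)
    os.environ[var_name] = value


# Example usage (interactive, so left commented out):
# set_env_securely("VOYAGE_API_KEY", "Enter your Voyage AI API key: ")
```

Reading the value with getpass keeps the key out of the notebook source and out of the cell output.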

[2]

Step 2: Data Loading and Preparation

The dataset used in this notebook is wikipedia-22-12-en-voyage-embed, which contains Wikipedia data along with an embedding for each document.

The structure of the dataset is as follows:

	
{
  "_id": {
    "$oid": "67b850ebf6f7ad9038cd4ece"
  },
  "id": 1,
  "title": "YouTube",
  "text": "YouTube is a global online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim ... videos were being uploaded at a rate of more than 500 hours of content per minute.",
  "url": "https://en.wikipedia.org/wiki?curid=3524766",
  "wiki_id": 3524766,
  "views": 5409.56103515625,
  "paragraph_id": 0,
  "langs": 184,
  "embedding": [
    -0.027068108320236206,
    0.023762645199894905,
    ...
    0.002724801190197468,
    -0.003213807474821806,
    0.025735605508089066
  ]
}


  • _id: The unique identifier MongoDB assigns to the document (an ObjectId).
  • id: The numeric identifier of the record within the dataset.
  • title: The title of the document.
  • text: The text of the document.
  • url: The URL of the document.
  • wiki_id: The Wikipedia ID of the document.
  • views: The number of views of the document.
  • paragraph_id: The ID of the paragraph within the document.
  • langs: The number of languages the article is available in.
  • embedding: The embedding for the document, a 1024-dimensional vector.

This notebook also uses the wikipedia-22-12-en-annotation dataset, which contains annotations for the Wikipedia data. The annotations provide the ground truth used to evaluate the performance of the vector search.

The structure of the annotation data is as follows:

	

{
  "_id": {
    "$oid": "67890c9fb0e20ecbe725655c"
  },
  "id": 1,
  "wiki_id": 3524766,
  "queries": {
    "sentences": [
      "YouTube is a global online video sharing and social media platform headquartered in San Bruno, California.",
      "It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim.",
      "It is owned by Google, and is the second most visited website, after Google Search.",
      "YouTube has more than 2.5 billion monthly users who collectively watch more than one billion hours of videos each day.",
      ", videos were being uploaded at a rate of more than 500 hours of content per minute."
    ],
    "key_phrases": [
      "second visited website",
      "billion hours videos",
      "video sharing social",
      "owned google",
      "search youtube"
    ],
    "questions": [
      "How many users does YouTube have?",
      "What is Google Search?",
      "What is Google?",
      "What is YouTube?",
      "When was YouTube launched?",
      "Where does YouTube rank among most visited websites?",
      "Who is Chad Hurley?",
      "Who is Jawed Karim?",
      "Who is Steve Chen?",
      "Who owns YouTube?"
    ],
    "partial_info": [
      "YouTube has users",
      "YouTube is a",
      "headquartered in San Bruno, California"
    ]
  }
}

  • _id: The unique identifier MongoDB assigns to the document (an ObjectId).
  • id: The numeric identifier of the record within the dataset.
  • wiki_id: The Wikipedia ID of the document.
  • queries: The evaluation queries for the document, grouped into sentences, key phrases, questions and partial information.
  • key_phrases: Key phrases from the document, used as queries to evaluate the performance of the vector search.
  • questions: Questions about the document, used as queries to evaluate the performance of the vector search.
  • partial_info: Partial fragments of information from the document, used as queries to evaluate the performance of the vector search.
  • sentences: Sentences from the document, used as queries to evaluate the performance of the vector search.
[3]
[4]
[5]
[6]
[7]
[8]

Step 3: Data Preparation

  1. Identify all the embedding columns
  2. Create a new column 'embedding' that is a list of all the embedding values for each row
  3. Drop the original embedding columns
  4. Convert the embedding field in the dataset to a list of BSON objects
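
The first three steps can be sketched as follows. The example works on plain dictionaries so that it runs without pandas; the same idea carries over to DataFrame columns. It assumes the raw data exposes one column per dimension named embedding_0, embedding_1, and so on, which is a hypothetical naming scheme, as is the function name.

```python
def merge_embedding_columns(rows, prefix="embedding_"):
    """Collapse per-dimension columns into a single 'embedding' list per row."""
    merged_rows = []
    for row in rows:
        # 1. Identify the embedding columns, ordered by dimension index.
        columns = sorted(
            (key for key in row if key.startswith(prefix)),
            key=lambda name: int(name[len(prefix):]),
        )
        # 3. Drop the original per-dimension columns...
        merged = {key: value for key, value in row.items() if key not in columns}
        # 2. ...and add one 'embedding' column holding all the values.
        merged["embedding"] = [row[column] for column in columns]
        merged_rows.append(merged)
    return merged_rows


rows = [{"id": 1, "embedding_1": 0.2, "embedding_0": 0.1, "embedding_10": 0.3}]
print(merge_embedding_columns(rows))
```

Sorting by the numeric suffix rather than by the column name matters here: a plain string sort would place embedding_10 before embedding_2 and scramble the vector.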
[9]
[10]

Since we are creating scenarios for the optimization of vector search, which includes the optimization of vector data storage and retrieval, we need to convert the embedding field in the dataset to the BSON binData vector format.

BSON is the binary representation that MongoDB uses to store data in the database.

More specifically, we recommend the BSON binData vector subtype for the following use cases:

  • You need to index quantized vector output from embedding models.
  • You have a large number of float vectors but want to reduce the storage footprint (such as disk and memory usage) of the database.

Benefits:

  • The BinData vector format requires about three times less disk space in your cluster compared to arrays of elements.
  • It allows you to index your vectors with alternate types such as int1 or int8 vectors, reducing the memory needed to build the Atlas Vector Search index for your collection.
  • It reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds (HNSW) graph itself doesn't shrink.
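
The 4x and 32x figures follow directly from the size of each value. The short calculation below is a sketch that counts only the raw vector payload for one 1024-dimensional embedding; it ignores BSON overhead and the HNSW graph, which is why the overall RAM savings quoted above (3.75x and 24x) are somewhat smaller.

```python
import struct

NUM_DIMENSIONS = 1024  # dimensionality of the embeddings in this dataset

float32_bytes = struct.calcsize(f"<{NUM_DIMENSIONS}f")  # 4 bytes per value
int8_bytes = struct.calcsize(f"<{NUM_DIMENSIONS}b")     # 1 byte per value
int1_bytes = NUM_DIMENSIONS // 8                        # 1 bit per value

print(float32_bytes, int8_bytes, int1_bytes)            # 4096 1024 128
print(float32_bytes / int8_bytes)                       # 4.0
print(float32_bytes / int1_bytes)                       # 32.0
```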

In this notebook, we will convert the embeddings to the BSON binData vector format by using the bson.binary module.

[11]
[12]
[13]

Step 4: Embedding Generation with Voyage AI

In this step, we will generate the embeddings for the wikipedia data using the Voyage AI API.

We will use the voyage-3-large model to generate the embeddings.

One important thing to note is that although you are expected to have a credit card on file for the Voyage AI API, the first 200 million tokens are free for every account, and subsequent usage is priced on a per-token basis.

Refer to the Voyage AI documentation for more information on getting your API key and setting it in the environment variables.

[14]

The get_embedding function is used to generate the embeddings for the text using the voyage-3-large model.

The function takes a text string and a task prefix as input and returns the embedding vector as a list of floats.

The function also takes an optional argument input_type which can be set to "document" or "query" to specify the type of input to the model.
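
A minimal sketch of such a function is shown below. It assumes the Voyage AI client is passed in as an argument and that the task prefix is simply prepended to the text; both are illustrative choices rather than the notebook's confirmed implementation.

```python
def get_embedding(text, client, task_prefix="", input_type="document",
                  model="voyage-3-large"):
    """Return the embedding for `text` as a list of floats."""
    # Assumption: the task prefix is prepended to the text before embedding.
    full_text = f"{task_prefix} {text}".strip() if task_prefix else text
    response = client.embed([full_text], model=model, input_type=input_type)
    return list(response.embeddings[0])


# Example usage (requires the voyageai package and an API key):
# import voyageai
# client = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
# vector = get_embedding("What is YouTube?", client, input_type="query")
```

Using input_type="query" for search queries and input_type="document" for the stored text lets the model treat the two sides of the retrieval task differently.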

[15]

Step 5: MongoDB (Operational and Vector Database)

MongoDB acts as both an operational and vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.

Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

[16]
[17]
[18]
Connection to MongoDB successful
Collection 'wikipedia-22-12-en 2' already exists.
Collection 'wikipedia-22-12-en-annotation 2' already exists.
[19]
DeleteResult({'n': 87200, 'electionId': ObjectId('7fffffff0000000000000047'), 'opTime': {'ts': Timestamp(1749471793, 950), 't': 71}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1749471793, 950), 'signature': {'hash': b'\xb9A\xe6\xdc\xbdS^\xb5\x01\x0c\x990@!\x19\xe6\x81\xd6\xc4\xd1', 'keyId': 7456452796569616385}}, 'operationTime': Timestamp(1749471793, 950)}, acknowledged=True)

Step 6: Vector Search Index Creation

In this step, we will create the vector search index for the wikipedia data.

We will create 3 vector search indexes:

  1. Scalar Quantized Index
  2. Binary Quantized Index
  3. Float32 ANN Index

The scalar quantized index uses scalar quantization to compress the embeddings.

The binary quantized index uses binary quantization to compress the embeddings.

The float32 ANN index applies no quantization: it indexes the full-precision float32 embeddings and serves as the approximate nearest neighbour baseline.
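
The three index definitions could look roughly like the sketch below. The index names match the ones printed by this notebook and the 1024 dimensions match the dataset; the "embedding" path and the cosine similarity are assumptions, and the helper function is hypothetical.

```python
def vector_index_definition(path="embedding", num_dimensions=1024,
                            similarity="cosine", quantization=None):
    """Build an Atlas Vector Search index definition as a plain dict."""
    field = {
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,  # assumed; not stated in this notebook
    }
    if quantization is not None:   # "scalar" or "binary"
        field["quantization"] = quantization
    return {"fields": [field]}


index_definitions = {
    "vector_index_scalar_quantized": vector_index_definition(quantization="scalar"),
    "vector_index_binary_quantized": vector_index_definition(quantization="binary"),
    "vector_index_float32_ann": vector_index_definition(),
}

for name, definition in index_definitions.items():
    print(name, definition)
```

With PyMongo, each definition would typically be wrapped in a SearchIndexModel with type="vectorSearch" and passed to the collection's create_search_index method.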

[20]
[21]
[22]
[23]
Creating index 'vector_index_scalar_quantized'...
Waiting for 60 seconds to allow index 'vector_index_scalar_quantized' to be created...
60-second wait completed for index 'vector_index_scalar_quantized'.
Creating index 'vector_index_binary_quantized'...
Waiting for 60 seconds to allow index 'vector_index_binary_quantized' to be created...
60-second wait completed for index 'vector_index_binary_quantized'.
'vector_index_binary_quantized'
[24]
Creating index 'vector_index_float32_ann'...
Waiting for 60 seconds to allow index 'vector_index_float32_ann' to be created...
60-second wait completed for index 'vector_index_float32_ann'.
'vector_index_float32_ann'

Step 7: Data Ingestion

[25]
Data ingestion into MongoDB completed

Step 8: Vector Search Operation

In this step, we will perform the vector search operation on the wikipedia data.

We will use the custom_vector_search function to perform the vector search operation.

The function takes a user query, a collection, an embedding path, a vector search index name, a top_k value, a num_candidates value and a use_full_precision value as input and returns the results of the vector search operation.

Note that use_full_precision is set to False by default, which means the function performs an approximate nearest neighbour (ANN) search. Set use_full_precision to True to run an exact nearest neighbour (ENN) search instead.
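
The core of such a function is the aggregation pipeline it sends to MongoDB. The sketch below builds only the pipeline (the query embedding and the collection.aggregate call are left out); the function name and the projected fields are assumptions. In the $vectorSearch stage, exact search is requested with exact set to true, in which case numCandidates is omitted.

```python
def build_vector_search_pipeline(query_vector, index_name, path="embedding",
                                 top_k=5, num_candidates=25,
                                 use_full_precision=False):
    """Build a $vectorSearch aggregation pipeline for ANN or ENN search."""
    stage = {
        "index": index_name,
        "path": path,
        "queryVector": query_vector,
        "limit": top_k,
    }
    if use_full_precision:
        stage["exact"] = True                    # exact nearest neighbour (ENN)
    else:
        stage["numCandidates"] = num_candidates  # approximate search (ANN)
    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 0, "title": 1, "text": 1, "wiki_id": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3],
                                        "vector_index_scalar_quantized")
print(pipeline[0])
```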

[26]

Step 8 (continued): Retrieving Documents and Analysing Results

In this step, we retrieve documents with the custom_vector_search function described above and analyse the results it returns.

[27]
[28]
[29]

The format_time function is used to format the latency_ms to a human-readable format.
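
A minimal sketch of such a formatter is shown below. It reproduces the three styles that appear in this notebook's output (milliseconds, seconds, and minutes plus seconds) but not the column padding; the exact thresholds are an assumption.

```python
def format_time(latency_ms):
    """Format a latency given in milliseconds as ms, s, or m + s."""
    if latency_ms < 1000:
        return f"{latency_ms:.3f}ms"
    total_seconds = latency_ms / 1000
    if total_seconds < 60:
        return f"{total_seconds:.3f}s"
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)}m {seconds:.3f}s"


print(format_time(8.102367))   # 8.102ms
print(format_time(6822.45))    # 6.822s
print(format_time(80264.08))   # 1m 20.264s
```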

[30]

Step 9: Measuring Latency with Varying Top-K and NumCandidates

In this step, we will measure the latency of the vector search operation with varying top-k and num_candidates.

We will use the measure_latency_with_varying_topk function to measure the latency of the vector search operation.

The function takes a user query, a collection, a vector search index name, a use_full_precision value, a top_k_values value and a num_candidates_values value as input and returns the results of the vector search operation.

We expect latency to increase as the top_k and num_candidates values increase, because the vector search operation has to consider a larger number of documents.

We also expect latency to be higher for full-fidelity search (use_full_precision=True) than for approximate search (use_full_precision=False). Full-fidelity search uses exact nearest neighbour search over the full-precision float32 vectors, so it has to search the entire dataset.

Finally, we expect quantized search to be faster than full-fidelity search, because it combines approximate search with the smaller, quantized vectors.
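
The measurement loop can be sketched as follows. This is an illustrative version, not the notebook's measure_latency_with_varying_topk: it takes any callable that runs a search, times it on the client with time.perf_counter (so network time is included), and skips combinations where num_candidates is smaller than top_k, since numCandidates must be at least as large as the limit.

```python
import time


def measure_latency(search_fn, top_k_values, num_candidates_values):
    """Time search_fn(top_k, num_candidates) for every valid combination."""
    results = []
    for top_k in top_k_values:
        for num_candidates in num_candidates_values:
            if num_candidates < top_k:
                continue  # numCandidates must be >= limit
            start = time.perf_counter()
            search_fn(top_k, num_candidates)
            latency_ms = (time.perf_counter() - start) * 1000
            results.append({
                "top_k": top_k,
                "num_candidates": num_candidates,
                "latency_ms": latency_ms,
            })
    return results


# Stand-in search function so the sketch runs without a database.
def fake_search(top_k, num_candidates):
    return list(range(top_k))


for row in measure_latency(fake_search, [5, 50], [25, 50, 100]):
    print(row["top_k"], row["num_candidates"])
```

A single timed run per combination is noisy, so repeating each measurement and reporting the median is a common way to obtain more stable numbers.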

[31]
[32]
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 25, Latency:    8.102ms (raw: 8.102367 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 50, Latency:  264.387ms (raw: 264.387028 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 100, Latency:  303.772ms (raw: 303.772164 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 200, Latency:  202.445ms (raw: 202.444986 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 500, Latency:  546.927ms (raw: 546.926669 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 1000, Latency:  323.732ms (raw: 323.731974 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 2000, Latency:  201.237ms (raw: 201.237178 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 5000, Latency:  128.343ms (raw: 128.342857 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 10000, Latency:  119.334ms (raw: 119.33442 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 25, Latency:    4.462ms (raw: 4.461643 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 50, Latency:    6.712ms (raw: 6.711822 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 100, Latency:    7.822ms (raw: 7.822359 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 200, Latency:   10.573ms (raw: 10.573314 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 500, Latency:   66.002ms (raw: 66.001868 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 1000, Latency:   30.830ms (raw: 30.829953 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 2000, Latency:  117.028ms (raw: 117.02757 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 5000, Latency:   61.688ms (raw: 61.687681 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 10000, Latency:  171.554ms (raw: 171.554356 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 50, Latency:   10.435ms (raw: 10.434606 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 100, Latency:   30.230ms (raw: 30.229553 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 200, Latency:   20.659ms (raw: 20.658992 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 500, Latency:   23.669ms (raw: 23.66949 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 1000, Latency:   40.447ms (raw: 40.44702 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 2000, Latency:   50.370ms (raw: 50.370444 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 5000, Latency:   60.009ms (raw: 60.009454 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 10000, Latency:   81.870ms (raw: 81.870338 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 100, Latency:    6.891ms (raw: 6.890629 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 200, Latency:    9.140ms (raw: 9.140325 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 500, Latency:   26.801ms (raw: 26.800929 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 1000, Latency:   52.601ms (raw: 52.600895 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 2000, Latency:   54.940ms (raw: 54.940293 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 5000, Latency:   99.185ms (raw: 99.184904 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 10000, Latency:  155.201ms (raw: 155.201213 ms), Precision: _float32_ann
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 25, Latency:    4.164ms (raw: 4.164131 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 50, Latency:  194.091ms (raw: 194.090833 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 100, Latency:   82.622ms (raw: 82.621924 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 200, Latency:   95.155ms (raw: 95.155332 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 500, Latency:   58.143ms (raw: 58.142574 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 1000, Latency:   45.272ms (raw: 45.271535 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 2000, Latency:   94.707ms (raw: 94.707237 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 5000, Latency:   82.077ms (raw: 82.077434 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 10000, Latency:  101.233ms (raw: 101.233297 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 25, Latency:    2.972ms (raw: 2.971775 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 50, Latency:    5.299ms (raw: 5.299045 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 100, Latency:    7.559ms (raw: 7.558593 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 200, Latency:   10.669ms (raw: 10.669066 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 500, Latency:   19.261ms (raw: 19.261089 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 1000, Latency:   37.835ms (raw: 37.834576 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 2000, Latency:   75.095ms (raw: 75.095169 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 5000, Latency:  262.643ms (raw: 262.6434 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 10000, Latency:   80.700ms (raw: 80.700224 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 50, Latency:    5.482ms (raw: 5.482333 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 100, Latency:    7.834ms (raw: 7.834495 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 200, Latency:   11.187ms (raw: 11.18685 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 500, Latency:   22.131ms (raw: 22.130558 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 1000, Latency:   33.811ms (raw: 33.811001 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 2000, Latency:   45.426ms (raw: 45.425795 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 5000, Latency:   65.002ms (raw: 65.002094 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 10000, Latency:   82.072ms (raw: 82.072382 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 100, Latency:    8.346ms (raw: 8.34633 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 200, Latency:   11.413ms (raw: 11.413226 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 500, Latency:   19.647ms (raw: 19.646612 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 1000, Latency:   32.943ms (raw: 32.942568 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 2000, Latency:   45.591ms (raw: 45.590646 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 5000, Latency:   65.565ms (raw: 65.565393 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 10000, Latency:  114.636ms (raw: 114.635506 ms), Precision: _scalar_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 25, Latency:   11.947ms (raw: 11.946616 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 50, Latency:   15.638ms (raw: 15.638395 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 100, Latency:   17.503ms (raw: 17.503189 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 200, Latency:   33.095ms (raw: 33.095378 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 500, Latency:   47.318ms (raw: 47.317649 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 1000, Latency:   64.461ms (raw: 64.460698 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 2000, Latency:  114.577ms (raw: 114.577266 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 5000, Latency:  139.052ms (raw: 139.051991 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 10000, Latency:  339.793ms (raw: 339.792947 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 25, Latency:   21.652ms (raw: 21.651937 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 50, Latency:   11.997ms (raw: 11.997145 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 100, Latency:   15.662ms (raw: 15.661963 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 200, Latency:   28.806ms (raw: 28.805801 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 500, Latency:   42.907ms (raw: 42.907482 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 1000, Latency:   70.720ms (raw: 70.719694 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 2000, Latency:  116.457ms (raw: 116.456809 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 5000, Latency:  145.429ms (raw: 145.429363 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 10000, Latency:  229.288ms (raw: 229.287517 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 50, Latency:   18.681ms (raw: 18.681198 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 100, Latency:   16.066ms (raw: 16.066314 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 200, Latency:   31.516ms (raw: 31.516289 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 500, Latency:   42.799ms (raw: 42.799341 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 1000, Latency:  142.828ms (raw: 142.827674 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 2000, Latency:  113.050ms (raw: 113.050431 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 5000, Latency:  139.300ms (raw: 139.300291 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 10000, Latency:  201.480ms (raw: 201.480052 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 100, Latency:   16.316ms (raw: 16.316202 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 200, Latency:   37.335ms (raw: 37.335221 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 500, Latency:   43.856ms (raw: 43.855881 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 1000, Latency:  111.402ms (raw: 111.40233 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 2000, Latency:  221.530ms (raw: 221.529805 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 5000, Latency:  276.394ms (raw: 276.394373 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 10000, Latency:  256.698ms (raw: 256.697625 ms), Precision: _binary_
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 25, Latency:     6.822s (raw: 6822.45 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 50, Latency: 1m 20.264s (raw: 80264.08 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 100, Latency:     9.075s (raw: 9074.83 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 200, Latency:     5.632s (raw: 5631.68 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 500, Latency:     7.734s (raw: 7733.92 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 1000, Latency:     5.667s (raw: 5666.58 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 2000, Latency:     6.600s (raw: 6600.35 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 5000, Latency:    13.391s (raw: 13391.310000000001 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 5, NumCandidates: 10000, Latency:     7.313s (raw: 7312.530000000001 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 25, Latency:     5.320s (raw: 5319.57 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 50, Latency:     4.816s (raw: 4815.58 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 100, Latency:     3.641s (raw: 3640.9700000000003 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 200, Latency:     2.734s (raw: 2734.28 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 500, Latency:     2.692s (raw: 2691.76 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 1000, Latency:     2.711s (raw: 2711.0699999999997 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 2000, Latency:     2.694s (raw: 2693.95 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 5000, Latency:     2.573s (raw: 2573.31 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 10, NumCandidates: 10000, Latency:     3.368s (raw: 3367.8599999999997 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 50, Latency:     2.704s (raw: 2703.93 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 100, Latency:     2.528s (raw: 2527.7699999999995 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 200, Latency:     2.464s (raw: 2463.52 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 500, Latency:     2.890s (raw: 2889.86 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 1000, Latency:     2.727s (raw: 2727.1800000000003 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 2000, Latency:     2.318s (raw: 2318.26 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 5000, Latency:     3.760s (raw: 3760.2999999999997 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 50, NumCandidates: 10000, Latency:     2.561s (raw: 2560.71 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 100, Latency:     2.639s (raw: 2638.67 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 200, Latency:     2.697s (raw: 2696.77 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 500, Latency:     2.773s (raw: 2773.32 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 1000, Latency:     3.155s (raw: 3154.5499999999997 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 2000, Latency:     2.833s (raw: 2832.8599999999997 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 5000, Latency:     2.645s (raw: 2645.02 ms), Precision: _float32_ENN
Conducting vector search operation with the following parameters:
Top-K: 100, NumCandidates: 10000, Latency:     2.717s (raw: 2717.08 ms), Precision: _float32_ENN
[33]

We use logarithmic scaling on both axes in the latency analysis because the data spans several orders of magnitude. Across the precision types compared here (scalar, binary, float32_ann, and the float32_ENN baseline), latencies range from tens of milliseconds to tens of seconds, while the number of candidates runs from 25 to 10,000.

Linear plots compress the smaller values and make it difficult to observe performance trends across the full range (as the plots above show).

On log-log axes, power-law relationships appear as straight lines, which makes it easier to identify proportional changes, compare relative performance improvements, and detect patterns that would otherwise be obscured.

This visualization approach is particularly valuable for understanding how each precision type scales with increasing workload and for identifying the optimal operating ranges where certain methods outperform others.
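As a rough illustration of why log-log axes help, the sketch below fits a straight line to log10(latency) against log10(numCandidates) for the binary-precision run at Top-K 100, using the values logged earlier. The helper name `loglog_slope` is hypothetical and not part of the notebook, and each latency is a single measurement, so the resulting slope is only indicative.

```python
import math

# Binary-precision latencies at Top-K = 100, copied from the logged run output.
num_candidates = [100, 200, 500, 1000, 2000, 5000, 10000]
latency_ms = [16.316, 37.335, 43.856, 111.402, 221.530, 276.394, 256.698]


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x) (hypothetical helper)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# A slope near 1 would mean latency grows roughly in proportion to numCandidates;
# a slope below 1 means it grows more slowly than that.
slope = loglog_slope(num_candidates, latency_ms)
print(f"log-log slope: {slope:.2f}")
```

On a linear plot the same seven points would crowd into the left-hand corner, whereas their logarithms are spread fairly evenly, which is what makes the trend readable.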

[ ]

Step 10: Measuring Representational Capacity and Retention

In this step, we measure the representational capacity retention of the vector search operation, using the measure_representational_capacity_retention_against_float_enn function.

We first run the baseline search, using the full-precision float32 vectors and exact nearest neighbor (ENN) search.

We then run the quantized search, using the quantized vectors and approximate nearest neighbor (ANN) search.

Finally, we compute the retention of the quantized search relative to the baseline search.

We should observe that retention stays close to 1.0 for the quantized search, meaning it returns nearly all of the documents that the baseline returns.
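The retention values in the run output below are consistent with a simple set-overlap ratio: the number of baseline IDs that also appear in the compared result set, divided by the number of baseline IDs. The sketch below is a minimal illustration of that calculation, not the notebook's actual implementation; the function name `retention` and the handling of an empty baseline are assumptions.

```python
def retention(baseline_ids, quantized_ids):
    """Fraction of baseline IDs that also appear in the compared results."""
    baseline = set(baseline_ids)
    if not baseline:
        return 0.0  # assumption: an empty baseline counts as zero retention
    return len(baseline & set(quantized_ids)) / len(baseline)


# IDs copied from the first logged comparison (top_k=5, num_candidates=25).
baseline_ids = [523032, 3524766, 12561015, 42652013, 45492650]
ann_ids = [3524766, 8135890, 12561015, 42652013, 45492650]

print(f"Retention: {retention(baseline_ids, ann_ids):.4f}")  # Retention: 0.8000
```

Four of the five baseline IDs are recovered, which gives the 0.8000 reported for that configuration. Because the ratio is taken over sets, the ordering of results does not affect the score.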

For example, if retention is low, the quantized search is failing to return the documents that the full-precision baseline considers most relevant, so its results are less accurate. This suggests that quantization is discarding too much information, and it may further indicate that the embeddings produced by the chosen model do not tolerate quantization well.

This is why it is worth considering quantization-aware embedding models: models that are specifically optimized during training to produce embeddings that keep their semantic properties after quantization.

Quantization-aware training incorporates the quantization process directly into the training loop, allowing the model to learn parameters that minimize information loss when vectors are later compressed. This approach creates embeddings with distributions that are more resilient to quantization effects, preserving more of the critical semantic relationships between vectors even at lower bit representations.

In practice, this means the model learns to distribute information across the embedding dimensions in ways that anticipate how quantization will affect the vector space.
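To make the idea of information loss under compression concrete, here is a toy sketch of scalar and binary quantization applied to a single vector. It is an illustration under stated assumptions only: the per-vector min/max binning, the sign-based binarization, the function names, and the example vector are all hypothetical, and none of this describes how MongoDB Atlas or Voyage AI implement quantization.

```python
import math


def scalar_quantize(vec, levels=256):
    """Map each float to one of `levels` evenly spaced bins over [min, max]."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / (levels - 1) or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((v - lo) / scale) for v in vec]
    return codes, lo, scale


def scalar_dequantize(codes, lo, scale):
    """Approximate the original floats from the integer codes."""
    return [lo + c * scale for c in codes]


def binary_quantize(vec):
    """Keep only the sign of each dimension: one bit per dimension."""
    return [1 if v > 0 else 0 for v in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 8-dimensional vector, for illustration only.
vec = [0.12, -0.48, 0.33, 0.05, -0.91, 0.77, -0.02, 0.40]

codes, lo, scale = scalar_quantize(vec)
restored = scalar_dequantize(codes, lo, scale)
bits = binary_quantize(vec)

print(f"cosine(original, scalar-restored) = {cosine(vec, restored):.6f}")
print(f"binary code = {bits}")
```

The scalar version keeps each value to within half a bin width, while the binary version discards magnitude entirely. A quantization-aware model aims to produce embeddings whose relative ordering of neighbors survives this kind of coarsening.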

[34]
[35]
Loaded 1 ground truth annotations
Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 0.8000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 0.8333

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _float32_ann: [523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _float32_ann: [31460, 523032, 3524766, 4429395, 8135890, 9988187, 12561015, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.8000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _float32_ann: [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _float32_ann: [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _float32_ann: [31460, 142056, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Quantized IDs: _float32_ann: [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Quantized IDs: _float32_ann: [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Quantized IDs: _float32_ann: [3382, 30274, 31460, 55523, 57317, 142056, 192481, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _float32_ann: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _float32_ann: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _float32_ann: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _float32_ann: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _float32_ann: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _float32_ann: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _float32_ann: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _float32_ann: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _float32_ann: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _float32_ann: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 32727796, 33039125, 42652013, 45492650, 48370461]
  Retention: 0.8462

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.9333

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _float32_ann: [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _float32_ann: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _float32_ann: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _float32_ann: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _float32_ann: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _float32_ann: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 2844938, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650]
  Retention: 0.8889

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 2844938, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650]
  Retention: 0.8889

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 0.9167

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _float32_ann: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 0.9000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 0.9000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 0.9000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 0.9412

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 0.9412

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _float32_ann: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _float32_ann: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 3524766, 5710507, 32727796, 42652013, 45492650, 59872594]
  Retention: 0.8750

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 42652013, 45492650, 59872594]
  Retention: 0.8000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Retention: 0.9333

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _float32_ann: [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Retention: 1.0000

Overall Average Retention for top_k 5, num_candidates 25: 0.9600
Overall Average Retention for top_k 5, num_candidates 50: 1.0000
Overall Average Retention for top_k 5, num_candidates 100: 1.0000
Overall Average Retention for top_k 5, num_candidates 200: 1.0000
Overall Average Retention for top_k 5, num_candidates 500: 1.0000
Overall Average Retention for top_k 10, num_candidates 25: 0.9167
Overall Average Retention for top_k 10, num_candidates 50: 1.0000
Overall Average Retention for top_k 10, num_candidates 100: 1.0000
Overall Average Retention for top_k 10, num_candidates 200: 1.0000
Overall Average Retention for top_k 10, num_candidates 500: 1.0000
Overall Average Retention for top_k 50, num_candidates 50: 0.8620
Overall Average Retention for top_k 50, num_candidates 100: 0.9578
Overall Average Retention for top_k 50, num_candidates 200: 0.9800
Overall Average Retention for top_k 50, num_candidates 500: 1.0000
Overall Average Retention for top_k 100, num_candidates 100: 0.9182
Overall Average Retention for top_k 100, num_candidates 200: 0.9749
Overall Average Retention for top_k 100, num_candidates 500: 1.0000
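
The per-query retention values logged above are consistent with a simple set-overlap ratio: the share of unique baseline (float32 ENN) wiki IDs that also appear in the result set being evaluated. The sketch below is an assumption reconstructed from the printed numbers, not the notebook's own implementation; the function name `retention` and its parameters are hypothetical. IDs returned only by the evaluated index do not lower the score, which matches the logged entries where the quantized list contains an extra ID yet retention is still 1.0000.

```python
def retention(baseline_ids, candidate_ids):
    """Fraction of unique baseline IDs that also appear in candidate_ids.

    Assumed formula: |baseline ∩ candidate| / |baseline|.
    Returns 0.0 for an empty baseline to avoid division by zero.
    """
    baseline = set(baseline_ids)
    if not baseline:
        return 0.0
    return len(baseline & set(candidate_ids)) / len(baseline)


# Values copied from the log above (top_k: 10, num_candidates: 25, float32 ANN).
baseline = [3524766, 3829005, 42652013, 45492650]
ann = [3524766, 42652013, 45492650]
print(f"Retention: {retention(baseline, ann):.4f}")  # Retention: 0.7500

# An extra ID in the evaluated set does not reduce retention.
print(f"Retention: {retention([1, 2, 3], [1, 2, 3, 4]):.4f}")  # Retention: 1.0000
```

Averaging this ratio over all queries for a given `top_k` and `num_candidates` pair would give figures of the kind shown in the "Overall Average Retention" lines.
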
Loaded 1 ground truth annotations
Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _scalar_: [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.9412

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _scalar_: [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _scalar_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _scalar_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _scalar_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _scalar_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _scalar_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _scalar_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _scalar_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _scalar_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _scalar_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _scalar_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 31460, 1923870, 3524766, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 0.8462

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 18839, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.9333

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 18839, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.9333

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _scalar_: [14539, 18839, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.9333

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _scalar_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _scalar_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _scalar_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _scalar_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _scalar_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650]
  Retention: 0.7778

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650]
  Retention: 0.7778

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 18839, 3524766, 3829005, 5710507, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 0.8333

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _scalar_: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 59872594]
  Retention: 0.9000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _scalar_: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _scalar_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 3524766, 32727796, 42652013, 45492650, 59872594]
  Retention: 0.7500

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 42652013, 45492650, 59872594]
  Retention: 0.8667

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 30274, 1923870, 3524766, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Retention: 0.9333

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _scalar_: [14539, 30274, 1923870, 3524766, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Retention: 0.9333
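
The retention values logged above are consistent with a simple set-overlap ratio: the share of float32 baseline IDs that also appear in the quantized result set. The following is a minimal sketch of that calculation, inferred from the printed numbers rather than taken from the notebook's source. The function name `retention` and the handling of an empty baseline are assumptions made for illustration.

```python
def retention(baseline_ids, quantized_ids):
    """Fraction of baseline (float32) IDs that also appear in the quantized results.

    Assumption: an empty baseline yields 0.0; the log does not show this case.
    """
    baseline = set(baseline_ids)
    if not baseline:
        return 0.0
    return len(baseline & set(quantized_ids)) / len(baseline)


# ID lists copied from the log: "Hurley and Chen ..." query, top_k 50, num_candidates 50.
baseline = [14539, 2844938, 3524766, 5710507, 7529378,
            12561015, 32727796, 42652013, 45492650]
scalar = [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650]

print(f"Retention: {retention(baseline, scalar):.4f}")  # Retention: 0.7778
```

Because the denominator is the baseline size, extra IDs that appear only in the quantized list do not lower the score; only baseline IDs missing from the quantized results do.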

Overall Average Retention for top_k 5, num_candidates 25: 1.0000
Overall Average Retention for top_k 5, num_candidates 50: 1.0000
Overall Average Retention for top_k 5, num_candidates 100: 1.0000
Overall Average Retention for top_k 5, num_candidates 200: 1.0000
Overall Average Retention for top_k 5, num_candidates 500: 1.0000
Overall Average Retention for top_k 10, num_candidates 25: 1.0000
Overall Average Retention for top_k 10, num_candidates 50: 1.0000
Overall Average Retention for top_k 10, num_candidates 100: 1.0000
Overall Average Retention for top_k 10, num_candidates 200: 1.0000
Overall Average Retention for top_k 10, num_candidates 500: 1.0000
Overall Average Retention for top_k 50, num_candidates 50: 0.8430
Overall Average Retention for top_k 50, num_candidates 100: 0.9556
Overall Average Retention for top_k 50, num_candidates 200: 1.0000
Overall Average Retention for top_k 50, num_candidates 500: 1.0000
Overall Average Retention for top_k 100, num_candidates 100: 0.9267
Overall Average Retention for top_k 100, num_candidates 200: 0.9733
Overall Average Retention for top_k 100, num_candidates 500: 0.9733
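
The "Overall Average Retention" lines appear to be the mean of the per-query retention values for each `(top_k, num_candidates)` pair. Below is a minimal sketch of that aggregation, assuming a plain arithmetic mean. The helper name `average_retention` and the record layout are hypothetical, and the sample records cover only three of the queries in the log, so the printed averages need not match the logged ones.

```python
from collections import defaultdict


def average_retention(records):
    """Average retention per (top_k, num_candidates) pair.

    `records` is an iterable of (top_k, num_candidates, retention) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for top_k, num_candidates, value in records:
        key = (top_k, num_candidates)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Per-query scalar retention values copied from the log (three queries only).
records = [
    (50, 50, 0.7778),   # "Hurley and Chen ..." query
    (50, 50, 0.9000),   # "They created posts on Craigslist ..." query
    (50, 50, 0.7500),   # "Difficulty in finding enough dating videos ..." query
    (50, 200, 1.0000),
    (50, 200, 1.0000),
    (50, 200, 1.0000),
]

averages = average_retention(records)
for (top_k, num_candidates), avg in sorted(averages.items()):
    print(f"Average retention for top_k {top_k}, "
          f"num_candidates {num_candidates}: {avg:.4f}")
```

Grouping by the parameter pair keeps each configuration's average independent, which is what makes the logged trend visible: retention at a given `top_k` tends to rise as `num_candidates` grows.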

Loaded 1 ground truth annotations
Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 12561015, 45492650]
  Retention: 0.7500

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _binary_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 45492650]
  Quantized IDs: _binary_: [69323, 523032, 3524766, 45492650]
  Retention: 1.0000

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 33039125, 42652013, 45492650, 58920328]
  Retention: 0.5714

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 33039125, 42652013, 45492650, 51021695]
  Retention: 0.5714

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _binary_: [69323, 3524766, 12561015, 42652013, 45492650]
  Retention: 0.7143

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _binary_: [69323, 523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 0.8571

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [69323, 523032, 3524766, 8135890, 12561015, 42652013, 45492650]
  Quantized IDs: _binary_: [69323, 523032, 3524766, 12561015, 42652013, 45492650]
  Retention: 0.8571

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [31460, 69323, 408743, 608982, 3524766, 33039125, 42652013, 45492650, 51021695, 56822861, 58920328]
  Retention: 0.5294

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [30274, 31460, 69323, 408743, 3524766, 12561015, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.5882

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [31460, 69323, 408743, 523032, 3524766, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.7647

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [31460, 69323, 142056, 408743, 523032, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [31460, 69323, 408743, 523032, 3524766, 4429395, 6479315, 9988187, 12561015, 14995351, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.8824

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [14539, 18839, 30274, 31460, 69323, 408743, 608982, 3524766, 6667920, 9988187, 12561015, 33039125, 33548254, 42652013, 45492650, 50654292, 51021695, 56822861, 58920328]
  Retention: 0.4800

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [14539, 30274, 31460, 55523, 69323, 408743, 523032, 608982, 3340088, 3524766, 6479315, 8135890, 9988187, 12561015, 14995351, 31591547, 32727796, 33039125, 33548254, 42652013, 45492650, 51021695, 58920328, 59872594]
  Retention: 0.7200

Query: 'Karim said the inspiration for YouTube first came from the Super Bowl XXXVIII halftime show controversy when Janet Jackson's breast was briefly exposed by Justin Timberlake during the halftime show.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3382, 30274, 31460, 57317, 69323, 142056, 192481, 408743, 523032, 3340088, 3524766, 4429395, 5710507, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 32727796, 33039125, 42652013, 45492650, 51021695, 58920328]
  Quantized IDs: _binary_: [3382, 30274, 31460, 55523, 57317, 69323, 142056, 408743, 523032, 3340088, 3524766, 4429395, 6479315, 8135890, 9988187, 12561015, 14995351, 16477368, 33039125, 42652013, 45492650, 51021695, 58920328]
  Retention: 0.8800

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _binary_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _binary_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _binary_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _binary_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766]
  Quantized IDs: _binary_: [3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _binary_: [1923870, 3524766, 16161443, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _binary_: [14539, 1923870, 3524766, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _binary_: [14539, 1923870, 3524766, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _binary_: [14539, 1923870, 3524766, 48370461]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [1923870, 3524766]
  Quantized IDs: _binary_: [1923870, 3524766]
  Retention: 1.0000

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 31460, 302167, 1923870, 3524766, 12561015, 16161443, 28430970, 32727796, 33039125, 42652013, 45492650, 48370461, 51021695, 56822861]
  Retention: 0.8462

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 31460, 1923870, 3524766, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 0.9231

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 31460, 1923870, 3524766, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 0.9231

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 31460, 1923870, 3524766, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Retention: 0.9231

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 19457, 31460, 44534, 302167, 1494648, 1923870, 3524766, 3829005, 6891537, 12561015, 16161443, 18618509, 28430970, 31591547, 32727796, 32826316, 33039125, 33548254, 42652013, 44382466, 45492650, 48370461, 50654292, 51021695, 56822861, 58920328]
  Retention: 0.8667

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 31460, 1494648, 1923870, 3524766, 3829005, 12561015, 16161443, 19344654, 28430970, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.8667

Query: 'Karim could not easily find video clips of the incident and the 2004 Indian Ocean Tsunami online, which led to the idea of a video-sharing site.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 31460, 1923870, 3524766, 3829005, 5710507, 12561015, 16161443, 32727796, 33039125, 33548254, 42652013, 45492650, 48370461]
  Quantized IDs: _binary_: [14539, 18839, 31460, 1923870, 3524766, 3829005, 12561015, 16161443, 19344654, 32727796, 32826316, 33039125, 33548254, 42652013, 45492650, 48370461, 56822861]
  Retention: 0.9333

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _binary_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _binary_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _binary_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 45492650]
  Quantized IDs: _binary_: [3524766, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [2844938, 3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 2844938, 3524766, 3829005, 12561015, 33548254, 42652013, 44382466, 45492650, 56822861, 59872594]
  Retention: 0.6667

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 18839, 2844938, 3524766, 3829005, 32727796, 33039125, 42652013, 45492650]
  Retention: 0.6667

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 18839, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 2844938, 3524766, 5710507, 7529378, 12561015, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 18839, 1092923, 2844938, 3524766, 3829005, 5710507, 12561015, 22409046, 31591547, 32727796, 33039125, 33548254, 42652013, 44382466, 45492650, 56822861, 59872594]
  Retention: 0.9167

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 18839, 1092923, 2844938, 3340088, 3524766, 3829005, 5710507, 7529378, 12561015, 22409046, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'Hurley and Chen said that the original idea for YouTube was a video version of an online dating service, and had been influenced by the website Hot or Not.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Quantized IDs: _binary_: [14539, 18839, 2844938, 3524766, 3829005, 5710507, 7529378, 12561015, 32727796, 33039125, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [42652013, 45492650]
  Retention: 0.6667

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 42652013, 45492650]
  Retention: 0.7500

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 3829005, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 3829005, 42652013, 45492650]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _binary_: [195809, 3524766, 32727796, 42652013, 44554748, 45492650, 53039739, 57545953, 59872594]
  Retention: 0.6000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _binary_: [30274, 3524766, 3829005, 31591547, 32727796, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 0.9000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _binary_: [30274, 3524766, 3829005, 5710507, 9988187, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Quantized IDs: _binary_: [30274, 3524766, 3829005, 5710507, 31591547, 42652013, 44554748, 45492650, 53039739, 59872594]
  Retention: 1.0000

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _binary_: [14539, 30274, 195809, 3524766, 3829005, 5710507, 7529378, 22409046, 31591547, 32727796, 33039125, 42652013, 44554748, 44972049, 45492650, 53039739, 53594450, 56822861, 57545953, 58920328, 59872594]
  Retention: 0.7647

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _binary_: [30274, 3524766, 3829005, 5710507, 7529378, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 0.9412

Query: 'They created posts on Craigslist asking attractive women to upload videos of themselves to YouTube in exchange for a $100 reward.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Quantized IDs: _binary_: [30274, 3524766, 3829005, 5710507, 8135890, 9988187, 12561015, 31591547, 32727796, 40712897, 42652013, 44554748, 44972049, 45492650, 53039739, 56822861, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 5, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 25
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 10, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [3524766, 32727796, 42652013, 45492650]
  Quantized IDs: _binary_: [3524766, 32727796, 42652013, 45492650]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 50
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [3524766, 5710507, 12561015, 32727796, 32826316, 33039125, 42652013, 45492650, 56822861, 58920328, 59872594]
  Retention: 0.8750

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 1923870, 3524766, 9988187, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 0.8750

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 50, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 3524766, 5710507, 12561015, 32727796, 42652013, 45492650, 59872594]
  Retention: 1.0000

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 100
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 31591547, 32727796, 32826316, 33039125, 42652013, 44554748, 45492650, 56822861, 58920328, 59872594]
  Retention: 0.8667

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 200
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 30274, 1923870, 2844938, 3524766, 3829005, 5710507, 9988187, 12561015, 32727796, 32826316, 33548254, 42652013, 45492650, 51021695, 59872594]
  Retention: 0.9333

Query: 'Difficulty in finding enough dating videos led to a change of plans, with the site's founders deciding to accept uploads of any video.' | top_k: 100, num_candidates: 500
  Ground Truth wiki_id: 3524766
  Baseline IDs (Float32): [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 59872594]
  Quantized IDs: _binary_: [14539, 30274, 1923870, 3524766, 3829005, 5710507, 9988187, 12561015, 22409046, 32727796, 32826316, 33548254, 42652013, 45492650, 51021695, 59872594]
  Retention: 1.0000

Overall Average Retention for top_k 5, num_candidates 25: 0.8833
Overall Average Retention for top_k 5, num_candidates 50: 0.9500
Overall Average Retention for top_k 5, num_candidates 100: 0.9500
Overall Average Retention for top_k 5, num_candidates 200: 1.0000
Overall Average Retention for top_k 5, num_candidates 500: 1.0000
Overall Average Retention for top_k 10, num_candidates 25: 0.8643
Overall Average Retention for top_k 10, num_candidates 50: 0.8643
Overall Average Retention for top_k 10, num_candidates 100: 0.8929
Overall Average Retention for top_k 10, num_candidates 200: 0.9714
Overall Average Retention for top_k 10, num_candidates 500: 0.9714
Overall Average Retention for top_k 50, num_candidates 50: 0.7034
Overall Average Retention for top_k 50, num_candidates 100: 0.7906
Overall Average Retention for top_k 50, num_candidates 200: 0.9376
Overall Average Retention for top_k 50, num_candidates 500: 0.9611
Overall Average Retention for top_k 100, num_candidates 100: 0.7789
Overall Average Retention for top_k 100, num_candidates 200: 0.8922
Overall Average Retention for top_k 100, num_candidates 500: 0.9627
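
The retention values in the log above are consistent with a simple set overlap: the share of baseline (float32) IDs that also appear in the quantized result list. The following is a minimal sketch of that calculation and of the per-configuration averaging, not the notebook's actual implementation. The function names, the record layout, and the choice to return 0.0 for an empty baseline are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def retention(baseline_ids, quantized_ids):
    """Fraction of baseline IDs that also appear in the quantized results."""
    baseline = set(baseline_ids)
    if not baseline:
        # Assumption: an empty baseline counts as zero retention.
        return 0.0
    return len(baseline & set(quantized_ids)) / len(baseline)


def average_retention(records):
    """Average per-query retention for each (top_k, num_candidates) pair.

    Each record is a dict with the hypothetical keys "top_k",
    "num_candidates", "baseline_ids" and "quantized_ids".
    """
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["top_k"], rec["num_candidates"])
        grouped[key].append(retention(rec["baseline_ids"], rec["quantized_ids"]))
    return {key: mean(values) for key, values in grouped.items()}


# Worked example copied from the log above (Craigslist query,
# top_k: 5, num_candidates: 25).
baseline_ids = [3524766, 42652013, 45492650]
quantized_ids = [42652013, 45492650]
print(f"Retention: {retention(baseline_ids, quantized_ids):.4f}")  # Retention: 0.6667
```

Extra IDs that appear only in the quantized list do not lower the score, which is why a quantized result with more IDs than the baseline can still report a retention of 1.0000.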

Detailed Average Retention Results:

_float32_ann Embedding:

Top-K: 5
   NumCandidates: 25, Retention: 0.9600
   NumCandidates: 50, Retention: 1.0000
   NumCandidates: 100, Retention: 1.0000
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 10
   NumCandidates: 25, Retention: 0.9167
   NumCandidates: 50, Retention: 1.0000
   NumCandidates: 100, Retention: 1.0000
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 50
   NumCandidates: 50, Retention: 0.8620
   NumCandidates: 100, Retention: 0.9578
   NumCandidates: 200, Retention: 0.9800
   NumCandidates: 500, Retention: 1.0000

Top-K: 100
   NumCandidates: 100, Retention: 0.9182
   NumCandidates: 200, Retention: 0.9749
   NumCandidates: 500, Retention: 1.0000

_scalar_ Embedding:

Top-K: 5
   NumCandidates: 25, Retention: 1.0000
   NumCandidates: 50, Retention: 1.0000
   NumCandidates: 100, Retention: 1.0000
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 10
   NumCandidates: 25, Retention: 1.0000
   NumCandidates: 50, Retention: 1.0000
   NumCandidates: 100, Retention: 1.0000
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 50
   NumCandidates: 50, Retention: 0.8430
   NumCandidates: 100, Retention: 0.9556
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 100
   NumCandidates: 100, Retention: 0.9267
   NumCandidates: 200, Retention: 0.9733
   NumCandidates: 500, Retention: 0.9733

_binary_ Embedding:

Top-K: 5
   NumCandidates: 25, Retention: 0.8833
   NumCandidates: 50, Retention: 0.9500
   NumCandidates: 100, Retention: 0.9500
   NumCandidates: 200, Retention: 1.0000
   NumCandidates: 500, Retention: 1.0000

Top-K: 10
   NumCandidates: 25, Retention: 0.8643
   NumCandidates: 50, Retention: 0.8643
   NumCandidates: 100, Retention: 0.8929
   NumCandidates: 200, Retention: 0.9714
   NumCandidates: 500, Retention: 0.9714

Top-K: 50
   NumCandidates: 50, Retention: 0.7034
   NumCandidates: 100, Retention: 0.7906
   NumCandidates: 200, Retention: 0.9376
   NumCandidates: 500, Retention: 0.9611

Top-K: 100
   NumCandidates: 100, Retention: 0.7789
   NumCandidates: 200, Retention: 0.8922
   NumCandidates: 500, Retention: 0.9627
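
The top_k and num_candidates values in the results above correspond to the `limit` and `numCandidates` fields of the `$vectorSearch` aggregation stage. As a rough sketch of how one such stage might be assembled for each configuration, the helper below builds the stage as a plain dictionary. The helper name, the index name `vector_index`, and the field path `embedding` are hypothetical placeholders, not names taken from this notebook. Omitting `num_candidates` produces an exact (ENN) search, which is presumably how a float32 baseline would be queried.

```python
def build_vector_search_stage(query_vector, top_k, num_candidates=None,
                              index_name="vector_index", path="embedding"):
    """Build a $vectorSearch stage for an ANN or ENN query.

    index_name and path are placeholder values; replace them with the
    names used in your own collection and index.
    """
    stage = {
        "index": index_name,
        "path": path,
        "queryVector": list(query_vector),
        "limit": top_k,
    }
    if num_candidates is None:
        # Exact nearest neighbour search: no candidate pool is specified.
        stage["exact"] = True
    else:
        # Approximate search: the candidate pool cannot be smaller than limit.
        if num_candidates < top_k:
            raise ValueError("num_candidates must be >= top_k")
        stage["numCandidates"] = num_candidates
    return {"$vectorSearch": stage}


# One ANN configuration from the grid above, plus an exact-search variant.
ann_stage = build_vector_search_stage([0.1, 0.2, 0.3], top_k=50, num_candidates=200)
enn_stage = build_vector_search_stage([0.1, 0.2, 0.3], top_k=50)
print(ann_stage)
print(enn_stage)
```

The check on `num_candidates` mirrors the grid used in the evaluation, where each top_k is only paired with candidate counts at least as large.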

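In this guide, we demonstrated how to use MongoDB Atlas Vector Search with automatic quantization and Voyage AI embeddings to build a scalable vector search pipeline. Compressing the embedding space, whether through scalar or binary quantization, substantially reduces memory usage while retaining most of the retrieval accuracy of the float32 baseline. In the results above, with num_candidates set to 500, average retention ranged from 0.97 to 1.00 for scalar quantization and from 0.96 to 1.00 for binary quantization across all tested top_k values. Binary quantization is more sensitive to a small candidate pool: its average retention fell to about 0.70 at top_k 50 with num_candidates 50.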

These techniques not only cut operational costs but also improve throughput, allowing you to handle larger workloads or more complex queries.
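
To put the memory savings in concrete terms, the sketch below estimates the raw vector payload for each representation: float32 uses 32 bits per dimension, scalar (int8) quantization uses 8, and binary quantization uses 1. The vector count and dimensionality are illustrative assumptions, not measurements from this notebook, and the estimate ignores index graph overhead and any full-fidelity vectors retained for rescoring.

```python
import math

# Bits stored per dimension for each representation.
BITS_PER_DIMENSION = {"float32": 32, "scalar": 8, "binary": 1}


def vector_payload_bytes(num_vectors, num_dimensions, representation):
    """Raw bytes needed to hold the vectors, excluding index overhead."""
    bits_per_vector = BITS_PER_DIMENSION[representation] * num_dimensions
    bytes_per_vector = math.ceil(bits_per_vector / 8)
    return num_vectors * bytes_per_vector


# Illustrative assumption: one million 1024-dimensional embeddings.
NUM_VECTORS = 1_000_000
NUM_DIMENSIONS = 1024

baseline = vector_payload_bytes(NUM_VECTORS, NUM_DIMENSIONS, "float32")
for name in BITS_PER_DIMENSION:
    size = vector_payload_bytes(NUM_VECTORS, NUM_DIMENSIONS, name)
    print(f"{name:8s} {size / 1e6:>10,.1f} MB  ({baseline / size:.0f}x smaller than float32)")
```

Under these assumptions the scalar representation is 4 times smaller than float32 and the binary representation is 32 times smaller, which is the trade-off the retention figures above are weighed against.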

Furthermore, MongoDB Atlas’s integration of indexing, querying, and storage provides a unified environment for rapid prototyping, testing, and production deployment, all backed by robust, enterprise-ready infrastructure.