Binary Quantization with Qdrant
This notebook demonstrates and evaluates the search performance of Qdrant with Binary Quantization. We use Qdrant Cloud to index and search the embeddings, and the demo also runs on a free-tier Qdrant cluster.
Set Up Binary Quantization
Let's install the two Python packages we'll work with.
For the demo, we use samples from the Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K dataset, which includes embeddings generated with OpenAI's text-embedding-3-small model.
You can use your own datasets for this evaluation by adjusting the config values below.
We select 100 records at random from the dataset and use their embeddings as queries to search for the nearest neighbors in the dataset.
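A minimal sketch of the sampling step, assuming the embeddings are already loaded into a NumPy array. Here a random matrix named `corpus` is a hypothetical stand-in for the dataset's embeddings, and the ground truth comes from brute-force cosine similarity. Because each query is itself a record in the dataset, its own ID comes first in its exact results.

```python
import numpy as np

# Hypothetical stand-in for the dataset's embeddings: random vectors with the
# same dimensionality as text-embedding-3-small (1536). Replace with real data.
rng = np.random.default_rng(seed=42)
corpus = rng.standard_normal((5_000, 1536)).astype(np.float32)
# Normalize rows so that a dot product equals cosine similarity.
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)


def sample_queries(n_records: int, n: int, rng: np.random.Generator) -> np.ndarray:
    """Return n distinct record indices chosen uniformly at random."""
    return rng.choice(n_records, size=n, replace=False)


def exact_top_k(query: np.ndarray, vectors: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k nearest neighbors by cosine similarity (rows pre-normalized)."""
    scores = vectors @ query
    return np.argsort(-scores)[:k]  # highest similarity first


query_ids = sample_queries(len(corpus), 100, rng)
# Ground truth: exact top-100 neighbors for every sampled query.
ground_truth = {int(i): exact_top_k(corpus[i], corpus, 100) for i in query_ids}
```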
Configure Credentials
Set Up a Qdrant Collection
Let's create a Qdrant collection to index our vectors. We set on_disk to True in the vectors config to offload the original vectors to disk and save memory.
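To see why offloading the originals saves memory, here is a conceptual NumPy sketch of binary quantization, not Qdrant's internal implementation: each component becomes one bit (1 if positive, 0 otherwise), packed eight to a byte, and codes are compared by Hamming distance. The helper names `binarize` and `hamming_distance` are illustrative.

```python
import numpy as np


def binarize(vectors: np.ndarray) -> np.ndarray:
    """Map each float component to one bit (1 if > 0, else 0) and pack 8 bits per byte."""
    return np.packbits(vectors > 0, axis=-1)


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed bit vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 1536)).astype(np.float32)
codes = binarize(vectors)

print(vectors.nbytes // len(vectors))  # bytes per original float32 vector: 6144
print(codes.nbytes // len(codes))      # bytes per binary code: 192
```

For 1536 dimensions this is a 32x reduction, which is why the binary codes can stay in RAM while the original vectors live on disk.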
Evaluate Results
Parameters: Oversampling, Rescoring, and Search Limits
For each record, we run a parameter sweep over oversampling factors, rescoring settings, and search limits to understand how these parameters affect search accuracy and efficiency. The experiment assesses the impact of Binary Quantization under various conditions, based on the following parameters:
- Oversampling: Oversampling limits the loss of information inherent in quantization. We experimented with different oversampling factors and measured their impact on the accuracy and efficiency of search. Spoiler: higher oversampling factors tend to improve search accuracy, but they usually require more computational resources.
- Rescoring: Rescoring refines the results of an initial binary search by re-ranking them with the original high-dimensional vectors, which improves accuracy. We toggled rescoring on and off to measure its effectiveness in combination with Binary Quantization, as well as its impact on search performance.
- Search Limits: The search limit sets the number of results the search returns. We experimented with various search limits to measure their impact on accuracy and efficiency, and explored the trade-offs between search depth and performance. The results provide insight for applications with different precision and speed requirements.
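The interplay of oversampling and rescoring can be sketched as a two-stage search. This is a simplified model of the idea, assuming brute-force Hamming ranking instead of Qdrant's HNSW index; `bq_search` is a hypothetical helper. In this sketch, oversampling only changes the result when rescoring is enabled, because without rescoring the top `limit` candidates by Hamming distance are returned unchanged.

```python
import numpy as np


def bq_search(query, vectors, codes, limit, oversampling=1.0, rescore=True):
    """Hypothetical two-stage search over binary codes.

    1. Rank every point by Hamming distance between binary codes.
    2. Keep ceil(limit * oversampling) candidates.
    3. If rescore is True, re-rank the candidates by dot product with the
       original float vectors; otherwise keep the Hamming order.
    """
    q_code = np.packbits(query > 0)
    # XOR the query code against every stored code, then count differing bits.
    hamming = np.unpackbits(np.bitwise_xor(codes, q_code), axis=1).sum(axis=1)
    n_candidates = min(len(vectors), int(np.ceil(limit * oversampling)))
    candidates = np.argsort(hamming, kind="stable")[:n_candidates]
    if rescore:
        exact_scores = vectors[candidates] @ query
        candidates = candidates[np.argsort(-exact_scores, kind="stable")]
    # Without rescoring, the extra candidates are simply cut off here.
    return candidates[:limit]


rng = np.random.default_rng(7)
vectors = rng.standard_normal((5_000, 256)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
codes = np.packbits(vectors > 0, axis=1)  # 256 bits -> 32 bytes per vector

query = vectors[0]
top = bq_search(query, vectors, codes, limit=10, oversampling=3.0, rescore=True)
```

Setting `oversampling` high enough to cover the whole collection makes the rescored result match exact search, which is a handy sanity check for the sketch.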
Parameterized Search
We compare the results of exact search with those of approximate search using Binary Quantization.
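One way to score each configuration is the fraction of the exact top-k neighbors that the approximate search also returns. The sketch below is stdlib-only; `exact_search` and `approx_search` are hypothetical callables you would supply. In Qdrant, the exact baseline would presumably pass `models.SearchParams(exact=True)` and the approximate runs `models.SearchParams(quantization=models.QuantizationSearchParams(rescore=..., oversampling=...))`. The oversampling values in the grid are assumptions; the limits match those reported in the results.

```python
from itertools import product


def accuracy(exact_ids, approx_ids) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    exact = {int(i) for i in exact_ids}
    return len(exact & {int(i) for i in approx_ids}) / len(exact)


LIMITS = [10, 20, 50, 100]  # search limits reported in the results
OVERSAMPLING = [1, 2, 3]    # assumed oversampling factors
RESCORE = [False, True]


def sweep(query_ids, exact_search, approx_search):
    """Run every (limit, oversampling, rescore) combination for every query."""
    records = []
    for qid, limit, oversampling, rescore in product(query_ids, LIMITS, OVERSAMPLING, RESCORE):
        exact_ids = exact_search(qid, limit)
        approx_ids = approx_search(qid, limit, oversampling, rescore)
        records.append({
            "query": qid,
            "limit": limit,
            "oversampling": oversampling,
            "rescore": rescore,
            "accuracy": accuracy(exact_ids, approx_ids),
        })
    return records
```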
View The Results
We can now tabulate our results across the ranges of oversampling and rescoring.
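A minimal, stdlib-only sketch of the tabulation, assuming each result is a dict with the hypothetical keys `limit`, `oversampling`, `rescore`, and `accuracy`. It averages accuracy over queries and lays out one row per search limit and one column per (oversampling, rescoring) pair; a pandas `pivot_table` would do the same in a single call.

```python
from collections import defaultdict
from statistics import mean


def tabulate(records):
    """Return mean accuracy per (limit, oversampling, rescore) plus a text table."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["limit"], r["oversampling"], r["rescore"])].append(r["accuracy"])
    table = {key: mean(vals) for key, vals in groups.items()}

    limits = sorted({key[0] for key in table})
    columns = sorted({(key[1], key[2]) for key in table})
    lines = ["limit".ljust(8) + "".join(f"os={o},rs={r}".rjust(16) for o, r in columns)]
    for limit in limits:
        cells = "".join(
            f"{table[(limit, o, r)]:16.4f}" if (limit, o, r) in table else "-".rjust(16)
            for o, r in columns
        )
        lines.append(str(limit).ljust(8) + cells)
    return table, "\n".join(lines)
```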
Results
Here are some key observations on the impact of rescoring (True or False):
- Significantly Improved Accuracy:
  - Enabling rescoring (True) consistently results in higher accuracy scores than disabling it (False).
  - The accuracy improvement holds across the tested search limits (10, 20, 50, 100).
- Model and Dimension Specific Observations:
  - The results suggest diminishing returns in accuracy from higher oversampling in lower-dimensional spaces.
- Influence of Search Limit:
  - The performance gain from rescoring appears relatively stable across search limits, suggesting that rescoring consistently improves accuracy regardless of the number of top results considered.
In summary, enabling rescoring dramatically improves search accuracy across all tested configurations, making it a crucial feature for applications where precision is paramount. The consistent boost from rescoring underscores its value in refining search results, particularly when working with complex, high-dimensional data. This matters for applications that demand high accuracy, such as semantic search, content discovery, and recommendation systems, where the quality of search results directly affects user experience and satisfaction.
Leveraging Binary Quantization: Best Practices
We recommend the following best practices for leveraging Binary Quantization:
- Oversampling: Use an oversampling factor of 3 for the best balance between accuracy and efficiency. This factor is suitable for a wide range of applications.
- Rescoring: Enable rescoring to improve the accuracy of search results.
- RAM: Store the full vectors and payload on disk, and load only the binary quantization index into memory. This reduces the memory footprint and improves the overall efficiency of the system. The incremental latency from the disk read is negligible compared to the latency savings from binary scoring in Qdrant, which uses SIMD instructions where possible.
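As a back-of-the-envelope check on the RAM recommendation, the sketch below estimates raw vector storage for a 100K-record, 1536-dimensional dataset like the one used here. It is a rough estimate under stated assumptions: it ignores the HNSW graph, payloads, and any storage overhead in Qdrant, and `memory_estimate` is a hypothetical helper.

```python
def memory_estimate(num_vectors: int, dim: int) -> dict:
    """Rough raw-vector storage only (no HNSW graph, payloads, or overhead)."""
    float32_bytes = num_vectors * dim * 4          # original vectors: 4 bytes per component
    binary_bytes = num_vectors * ((dim + 7) // 8)  # 1 bit per component, padded to whole bytes
    return {
        "original_mb": float32_bytes / 1e6,
        "binary_mb": binary_bytes / 1e6,
        "ratio": float32_bytes / binary_bytes,
    }


print(memory_estimate(100_000, 1536))
# {'original_mb': 614.4, 'binary_mb': 19.2, 'ratio': 32.0}
```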