Hybrid Search

🔍 Hybrid Search with LanceDB Cloud

🚀 If you haven't signed up for LanceDB Cloud yet, sign up first to get started!

This notebook demonstrates how to implement hybrid search using LanceDB Cloud, combining the power of vector embeddings and full-text search with custom business logic. Designed for real-world search applications, this example leverages:

  • OpenAI Embeddings for semantic understanding
  • LanceDB Cloud for managed vector storage
  • BeIR Benchmark Dataset for scientific document retrieval evaluation

🚀 Key Features

| Component | Implementation |
| --- | --- |
| Hybrid Search | Vector + FTS with RRF Reranking |
| Custom Filters | Domain-specific result filtering |
| Managed Infrastructure | LanceDB Cloud |
| Scientific Focus | SCIDOCS Dataset |

Step 1: Install Required Libraries

[ ]

Step 2: Obtain the API key from the dashboard and Connect to LanceDB Cloud

  • Get the db uri

The db uri starts with db:// and is shown on the project page of the dashboard. For example: db://test-sfifxz.

  • Get the API key

Obtain a LanceDB Cloud API key by clicking GENERATE API KEY on the table page.

💡 Copy the code block for connecting to LanceDB Cloud that is shown at the last step of API key generation.

  • Connect to LanceDB Cloud

Copy and paste the db uri and the api key from the previous steps, or directly paste the code block for LanceDB Cloud connection.

[ ]
[ ]
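A minimal sketch of the connection step, assuming the `lancedb` Python package is installed. `lancedb.connect` accepts the db:// uri together with `api_key` and `region`. The region value `us-east-1`, the environment variable names, and both helper function names are assumptions for illustration; use the values shown on your dashboard.

```python
import os


def cloud_connect_kwargs(uri, api_key, region="us-east-1"):
    """Validate the settings and return keyword arguments for lancedb.connect."""
    if not uri.startswith("db://"):
        raise ValueError(f"LanceDB Cloud uri must start with 'db://', got {uri!r}")
    if not api_key:
        raise ValueError("A LanceDB Cloud API key is required")
    return {"uri": uri, "api_key": api_key, "region": region}


def connect_to_cloud(uri=None, api_key=None, region="us-east-1"):
    """Open a LanceDB Cloud connection, falling back to environment variables."""
    import lancedb  # imported lazily so the validation helper works without it

    # Hypothetical variable names; adjust to however you store your secrets.
    uri = uri or os.environ.get("LANCEDB_URI", "")
    api_key = api_key or os.environ.get("LANCEDB_API_KEY", "")
    return lancedb.connect(**cloud_connect_kwargs(uri, api_key, region))
```

Keeping the key in an environment variable rather than in the notebook avoids committing it by accident; `db = connect_to_cloud()` then yields the handle used in the later steps.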

Paste your OpenAI API key (OPEN_AI_KEY):

[ ]

Step 3: Import libraries

[ ]

Step 4: Load Chunks of data from BeIR Dataset

Note: This dataset is built specifically for retrieval tasks, so it lets you measure how well your search performs.

[ ]
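One way this loading step might look, assuming the Hugging Face `datasets` package and that the SCIDOCS corpus is published as the `corpus` config and split of `BeIR/scidocs` with `_id`, `title`, and `text` fields. The helper names, the character cap, and the default of 64 documents (chosen to mirror the row count in the index statistics later in this notebook) are illustrative assumptions, not the notebook's actual code.

```python
def load_scidocs_corpus(limit=64):
    """Download the first `limit` SCIDOCS documents from the Hugging Face Hub."""
    from datasets import load_dataset  # lazy import: requires `pip install datasets`

    # Assumption: the corpus lives in the "corpus" config and split of BeIR/scidocs.
    return load_dataset("BeIR/scidocs", "corpus", split=f"corpus[:{limit}]")


def to_chunks(records, max_chars=1000):
    """Turn raw records into {"id", "title", "text"} rows, skipping empty texts."""
    chunks = []
    for record in records:
        text = (record.get("text") or "").strip()
        if not text:
            continue  # nothing to embed or index
        chunks.append({
            "id": str(record.get("_id", len(chunks))),
            "title": record.get("title", ""),
            "text": text[:max_chars],  # crude cap so each chunk stays small
        })
    return chunks
```

Typical use would be `chunks = to_chunks(load_scidocs_corpus())`, which needs network access for the download.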

Step 5: Connect to LanceDB Cloud and store embeddings

[ ]
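A hedged sketch of embedding the chunks and writing them to a table. It assumes `client` is an `openai.OpenAI()` instance and `db` is the LanceDB Cloud connection from Step 2. The model name `text-embedding-3-small`, the table name `hybrid_search_demo`, and the function names are assumptions; the `text` column matches the `text_idx` index name used later, and `vector` is the column name LanceDB looks for by default.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding vector per input text using an OpenAI client."""
    response = client.embeddings.create(model=model, input=list(texts))
    return [item.embedding for item in response.data]


def store_chunks(db, client, chunks, table_name="hybrid_search_demo"):
    """Attach a `vector` field to each chunk and create a LanceDB table from them."""
    vectors = embed_texts(client, [chunk["text"] for chunk in chunks])
    # Copy each chunk so the caller's dicts are left untouched.
    rows = [dict(chunk, vector=vector) for chunk, vector in zip(chunks, vectors)]
    return db.create_table(table_name, data=rows)
```

Sending all texts in one `embeddings.create` call keeps the example short; a larger corpus would need batching to stay within request limits.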

Step 6: Build a Full Text Search (FTS) index

ℹ️ Note that a FTS index is required for performing a hybrid search

[ ]

⚠️ WARNING: create_fts_index is asynchonous so it returns when indexing is in progress. We provide the list_indices and index_stats APIs to check index status. The index name is formed by appending β€œ_idx” to the column name. Note that list_indices will not return any information until the index has fully ingested and indexed all available data.

[ ]
⏳ Waiting for text_idx to be ready...
⏳ Waiting for text_idx to be ready...
⏳ Waiting for text_idx to be ready...
⏳ Waiting for text_idx to be ready...
βœ… text_idx is ready!
IndexStatistics(num_indexed_rows=64, num_unindexed_rows=0, index_type='FTS', distance_type=None, num_indices=None)
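The waiting loop described in the warning can be sketched as follows. It assumes each entry returned by `list_indices` exposes a `name` attribute and that `index_stats` returns an object with `num_unindexed_rows`, as in the output above. The function name, the polling interval, the timeout, and the extra check on `num_unindexed_rows` are choices made for this sketch.

```python
import time


def wait_for_index(table, index_name, poll_seconds=5.0, timeout_seconds=600.0):
    """Block until `index_name` is listed and fully built, then return its stats."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        names = [index.name for index in table.list_indices()]
        if index_name in names:
            stats = table.index_stats(index_name)
            # Treat the index as ready only once no rows are left unindexed.
            if stats is not None and stats.num_unindexed_rows == 0:
                print(f"{index_name} is ready!")
                return stats
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{index_name} was not ready after {timeout_seconds} seconds")
        print(f"Waiting for {index_name} to be ready...")
        time.sleep(poll_seconds)
```

After `table.create_fts_index("text")`, calling `wait_for_index(table, "text_idx")` would block until searches can rely on the index. The timeout prevents a notebook cell from hanging forever if indexing fails.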

Step 7: Search with a random text

Let's first perform a full-text search.

[ ]
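A full-text query might be issued as in the sketch below, assuming `table` is the LanceDB table from Step 5 with an FTS index on `text`. The function name and the tuple return shape are illustrative.

```python
def full_text_search(table, query, limit=5):
    """Run a keyword search against the table's FTS index."""
    hits = table.search(query, query_type="fts").limit(limit).to_list()
    # FTS hits carry a `_score` column; higher means a better keyword match.
    return [(hit.get("_score"), hit["text"]) for hit in hits]
```

Full-text search matches on the literal terms in the query, so it does well on exact names and rare keywords but misses paraphrases.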

Now let's perform a vector search

[ ]
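A matching sketch for the vector query, assuming `client` is an `openai.OpenAI()` instance and that the table was built with the same embedding model. The model name `text-embedding-3-small` and the function name are assumptions.

```python
def vector_search(table, client, query, limit=5, model="text-embedding-3-small"):
    """Embed the query with OpenAI, then return the nearest stored chunks."""
    response = client.embeddings.create(model=model, input=[query])
    query_vector = response.data[0].embedding
    hits = table.search(query_vector).limit(limit).to_list()
    # Vector hits carry a `_distance` column; smaller means closer to the query.
    return [(hit.get("_distance"), hit["text"]) for hit in hits]
```

The query has to be embedded with the model used at ingestion time; vectors from different models are not comparable.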

Now let's perform a hybrid search to combine the results from the full-text search and vector search.

[ ]
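In the LanceDB Python API a hybrid query is typically issued with `table.search(query_type="hybrid")` followed by `.vector(...)` and `.text(...)`, and the two result lists can be fused with `RRFReranker` from `lancedb.rerankers`. To show what Reciprocal Rank Fusion (RRF) does with those two lists, here is a small standalone illustration in plain Python. It is not LanceDB's implementation, and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties fall back to the id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fts_ranking = ["d3", "d1", "d7"]      # ids ordered by full-text relevance
vector_ranking = ["d1", "d4", "d3"]   # ids ordered by vector distance
fused = reciprocal_rank_fusion([fts_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])  # → ['d1', 'd3', 'd4', 'd7']
```

Because RRF uses only rank positions, it never has to reconcile FTS scores with vector distances, which live on different scales. Documents that appear in both lists (`d1`, `d3`) rise above those found by only one search.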

Next, let's define a custom reranker to rank the hybrid search results.

[ ]
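In LanceDB a custom reranker is usually written by subclassing `lancedb.rerankers.Reranker` and implementing `rerank_hybrid(query, vector_results, fts_results)`, which receives the two result sets as PyArrow tables. The notebook's own reranker is not shown here, so the sketch below only illustrates the kind of business logic such a method could apply, using plain lists of dicts. The function name, the use of `text` as the row identity, and the keyword rule are all hypothetical.

```python
def rerank_with_filter(vector_hits, fts_hits, required_term, k=60):
    """Fuse two ranked hit lists with RRF, then drop hits lacking `required_term`."""
    scores, rows = {}, {}
    for hits in (vector_hits, fts_hits):
        for rank, hit in enumerate(hits, start=1):
            key = hit["text"]            # use the chunk text as the row identity
            rows.setdefault(key, hit)    # keep the first copy of each row
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    term = required_term.lower()
    # Custom business rule: keep only chunks that mention the required term.
    kept = [key for key in scores if term in key.lower()]
    kept.sort(key=lambda key: scores[key], reverse=True)
    return [dict(rows[key], _relevance_score=scores[key]) for key in kept]
```

Filtering after fusion keeps the ranking logic and the domain rule separate, so either can be changed without touching the other.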