Notebooks
L
LanceDB
Anthropic Contextual RAG

Anthropic Contextual RAG

agentsllmsvector-databaselancedbgptopenaiAImultimodal-aimachine-learningembeddingsfine-tuningContextual-RAGexamplesdeep-learninggpt-4-visionllama-indexragmultimodallangchainlancedb-recipes

Contextual RAG

In this notebook, we'll explore Contextual Retrieval, a technique to improve the accuracy of vector search by providing additional context for the chunks of a document, by inputting both the document and the chunk to an LLM and asking it to provide a succinct context for the chunk within the document.

This is a way to combat the lost context problem that occurs in chunking, e.g., if a text is split into sentences, the context of later sentences as they relate to earlier sentences is lost.

The idea here is to do these things:

  1. For each document, make chunks (Nothing new. Just like Vanilla RAG)
  2. For each Chunk you created, as an LLM create a context of that Chunk (You see this is new!)
  3. Append that context to the original chunk
  4. Create BM-25 and Vector Index based on those chunks for Hybrid Search (New to you? See this amazing blog by LanceDB on hybrid search)
  5. Search as usual!

Change Runtime with GPU to run this notebook

Install Dependencies

[1]
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 44.4/44.4 kB 3.8 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 383.5/383.5 kB 17.4 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 58.1 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.2/29.2 MB 47.1 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 245.3/245.3 kB 20.5 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 72.9 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 471.6/471.6 kB 28.8 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.5/4.5 MB 92.8 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 10.6 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.4/76.4 kB 7.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.0/78.0 kB 7.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 318.9/318.9 kB 24.7 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 94.3 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 11.7 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 17.2 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.7/98.7 kB 9.4 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 5.4 MB/s eta 0:00:00
[2]
--2024-10-07 09:03:31--  https://raw.githubusercontent.com/anthropics/anthropic-cookbook/refs/heads/main/skills/contextual-embeddings/data/codebase_chunks.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1126046 (1.1M) [text/plain]
Saving to: ‘./data/codebase_chunks.json’

codebase_chunks.jso 100%[===================>]   1.07M  --.-KB/s    in 0.03s   

2024-10-07 09:03:32 (41.2 MB/s) - ‘./data/codebase_chunks.json’ saved [1126046/1126046]

Set OPENAI and Anthropic API KEY as env variable

[3]

Data Loading and Chunking

[4]
Debugging Mode: Using few doc samples only 
Processing 29 chunks from 5 docs:   0%|          | 0/5 [00:00<?, ?it/s]

Vanilla RAG

[5]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:89: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]
config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]
README.md:   0%|          | 0.00/94.8k [00:00<?, ?B/s]
sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]
model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]
tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]
vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]
1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]
[6]
[7]

Contextual Retrieval with Prompt Caching

[9]
[10]

Let's search with Contextual Retrieval and see the difference

[11]

Here we are seeing the difference between the results while using normal retrieval and contextual retrieval with prompt caching and Hybrid search and LanceDB reranking API.

[ ]