Building Multimodal RAG Applications with MongoDB and Voyage AI
In this notebook, you will learn how to build multimodal RAG applications using Voyage AI's multimodal embedding models and Google's multimodal LLMs.
You will also evaluate Voyage AI's VLM-based embedding model against CLIP-based embedding models on the notebook's dataset.
Step 1: Install required libraries
- pymongo: Python driver for MongoDB
- voyageai: Python client for Voyage AI
- google-genai: Python library to access Google's embedding models and LLMs via Google AI Studio
- google-cloud-storage: Python client for Google Cloud Storage
- sentence-transformers: Python library to use open-source ML models from Hugging Face
- PyMuPDF: Python library for analyzing and manipulating PDFs
- Pillow: A Python imaging library
- tqdm: Show progress bars for loops in Python
- tenacity: Python library for easily adding retries to functions
[1]
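The install cell's code was not preserved in this export. A typical install command for the libraries listed above (exact versions and flags are my assumption) looks like:

```shell
pip install --quiet --upgrade pymongo voyageai google-genai \
    google-cloud-storage sentence-transformers PyMuPDF Pillow tqdm tenacity
```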
Step 2: Set up prerequisites
- Set the MongoDB connection string: Follow the steps here to get the connection string from the Atlas UI.
- Set the Voyage AI API key: Follow the steps here to get a Voyage AI API key.
- Set a Gemini API key: Follow the steps here to get a Gemini API key via Google AI Studio.
- [In a separate terminal] Set up Application Default Credentials (ADC): Follow the steps here to configure ADC via the Google Cloud CLI.
[2]
MongoDB
[3]
Enter your MongoDB connection string: ········
Voyage AI
[4]
Enter your Voyage AI API key: ········
Gemini
[5]
Enter your Gemini API key: ········
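The masked prompts above come from `getpass`-style input cells. A minimal sketch of collecting these secrets, assuming the helper name `get_secret` and the environment-variable fallback (both my additions, not from the original notebook):

```python
import getpass
import os


def get_secret(name: str) -> str:
    """Return a secret from the environment if set, otherwise prompt securely."""
    value = os.environ.get(name)
    if value:
        return value
    return getpass.getpass(f"Enter your {name}: ")


# Usage (prompts only when the variable is not already set):
# MONGODB_URI = get_secret("MONGODB_URI")
# VOYAGE_API_KEY = get_secret("VOYAGE_API_KEY")
# GEMINI_API_KEY = get_secret("GEMINI_API_KEY")
```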
Step 3: Read PDF from URL
[6]
[7]
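The code for this step is missing from the export. A sketch of reading a PDF from a URL and rendering each page to an image with PyMuPDF, assuming a helper name `load_pdf_page_images` and a 3x zoom factor (both my choices):

```python
import urllib.request


def load_pdf_page_images(url: str, zoom: float = 3.0) -> list[bytes]:
    """Download a PDF and render each page to PNG bytes with PyMuPDF."""
    import fitz  # PyMuPDF

    data = urllib.request.urlopen(url).read()
    doc = fitz.open(stream=data, filetype="pdf")
    pages = []
    for page in doc:
        # Render at `zoom`x resolution so small text in figures stays legible.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        pages.append(pix.tobytes("png"))
    doc.close()
    return pages
```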
Step 4: Store PDF images in GCS and extract metadata for MongoDB
[8]
[9]
[10]
[11]
[12]
[13]
100%|██████████| 22/22 [00:10<00:00, 2.18it/s]
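The loop above processed 22 pages. A sketch of uploading one rendered page to GCS and building the metadata document that later lands in MongoDB; the `gcs_key` format matches the keys seen in this notebook's search results, while the function names and bucket parameter are my assumptions:

```python
def page_metadata(page_number: int, width: int, height: int) -> dict:
    """Metadata document with the fields used later in this notebook."""
    return {
        "gcs_key": f"multimodal-rag/{page_number}.png",
        "width": width,
        "height": height,
    }


def upload_page(image_bytes: bytes, page_number: int, bucket_name: str) -> dict:
    """Upload one rendered page to GCS and return its metadata document."""
    import io

    from PIL import Image
    from google.cloud import storage

    doc = page_metadata(page_number, *Image.open(io.BytesIO(image_bytes)).size)
    client = storage.Client()  # uses the ADC configured in Step 2
    client.bucket(bucket_name).blob(doc["gcs_key"]).upload_from_string(
        image_bytes, content_type="image/png"
    )
    return doc
```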
Step 5: Add embeddings to the MongoDB documents
[14]
[15]
[16]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[17]
[18]
[19]
[20]
100%|██████████| 22/22 [00:29<00:00, 1.33s/it]
[21]
dict_keys(['gcs_key', 'width', 'height', 'voyage_embedding', 'clip_embedding'])
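The `dict_keys` output above shows each document carrying both a `voyage_embedding` and a `clip_embedding`. A sketch of attaching them, assuming the model names `voyage-multimodal-3` and `clip-ViT-B-32` and the helper names below (not confirmed by the export):

```python
def batched(items: list, size: int) -> list[list]:
    """Split a list into chunks for batched embedding calls."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def add_embeddings(docs: list[dict], images: list, voyage_api_key: str) -> None:
    """Attach Voyage AI and CLIP embeddings to each metadata document in place."""
    import voyageai
    from sentence_transformers import SentenceTransformer

    vo = voyageai.Client(api_key=voyage_api_key)
    clip = SentenceTransformer("clip-ViT-B-32")
    for doc, image in zip(docs, images):
        # Voyage's multimodal API takes a list of inputs, each itself a
        # list of text strings and/or PIL images.
        result = vo.multimodal_embed(inputs=[[image]], model="voyage-multimodal-3")
        doc["voyage_embedding"] = result.embeddings[0]
        doc["clip_embedding"] = clip.encode(image).tolist()
```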
Step 6: Ingest documents into MongoDB
[22]
[23]
{'ok': 1.0,
 '$clusterTime': {'clusterTime': Timestamp(1743655584, 1),
  'signature': {'hash': b'\xcf1\xccO*\\\xd2\x08\xbf\x147\xe0h\x8b{\xfb \xf5$?',
   'keyId': 7456513059255746561}},
 'operationTime': Timestamp(1743655584, 1)}
[24]
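The `DeleteResult` and `InsertManyResult` outputs below suggest the collection is cleared before the 22 documents are inserted. A sketch of that ingest, assuming the database and collection names (`multimodal_rag`, `documents`), which are not visible in this export:

```python
def ingest_documents(docs: list[dict], mongo_uri: str):
    """Replace the collection contents with the embedded documents."""
    from pymongo import MongoClient

    collection = MongoClient(mongo_uri)["multimodal_rag"]["documents"]
    collection.delete_many({})  # clear any previous run before re-inserting
    return collection.insert_many(docs)
```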
[25]
[26]
DeleteResult({'n': 22, 'electionId': ObjectId('7fffffff0000000000000027'), 'opTime': {'ts': Timestamp(1743655585, 21), 't': 39}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1743655585, 22), 'signature': {'hash': b'\x7f\x9f\x93\xc9zJ\x8f\xf9\xafO\xeb.\x04\xf3{t"}\xf5\xe4', 'keyId': 7456513059255746561}}, 'operationTime': Timestamp(1743655585, 21)}, acknowledged=True)
[27]
InsertManyResult([ObjectId('67ee12a1a224539fbe2019e5'), ObjectId('67ee12a1a224539fbe2019e6'), ObjectId('67ee12a1a224539fbe2019e7'), ObjectId('67ee12a1a224539fbe2019e8'), ObjectId('67ee12a1a224539fbe2019e9'), ObjectId('67ee12a1a224539fbe2019ea'), ObjectId('67ee12a1a224539fbe2019eb'), ObjectId('67ee12a1a224539fbe2019ec'), ObjectId('67ee12a1a224539fbe2019ed'), ObjectId('67ee12a1a224539fbe2019ee'), ObjectId('67ee12a1a224539fbe2019ef'), ObjectId('67ee12a1a224539fbe2019f0'), ObjectId('67ee12a1a224539fbe2019f1'), ObjectId('67ee12a1a224539fbe2019f2'), ObjectId('67ee12a1a224539fbe2019f3'), ObjectId('67ee12a1a224539fbe2019f4'), ObjectId('67ee12a1a224539fbe2019f5'), ObjectId('67ee12a1a224539fbe2019f6'), ObjectId('67ee12a1a224539fbe2019f7'), ObjectId('67ee12a1a224539fbe2019f8'), ObjectId('67ee12a1a224539fbe2019f9'), ObjectId('67ee12a1a224539fbe2019fa')], acknowledged=True)
Step 7: Create a vector search index
[28]
[29]
'vector_index'
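The `'vector_index'` output above is the name returned by index creation. A sketch of defining one Atlas Vector Search index over both embedding fields; the dimensions assume `voyage-multimodal-3` (1024) and `clip-ViT-B-32` (512), and the cosine similarity choice is mine:

```python
index_fields = [
    {"type": "vector", "path": "voyage_embedding", "numDimensions": 1024, "similarity": "cosine"},
    {"type": "vector", "path": "clip_embedding", "numDimensions": 512, "similarity": "cosine"},
]


def create_vector_index(collection) -> str:
    """Create the Atlas Vector Search index and return its name."""
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition={"fields": index_fields},
        name="vector_index",
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)
```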
Step 8: Retrieve documents using vector search
[30]
[31]
[32]
0.7585833072662354
0.7482262253761292
0.7399106025695801
0.7107774019241333
0.69964599609375
['multimodal-rag/1.png', 'multimodal-rag/13.png', 'multimodal-rag/14.png', 'multimodal-rag/7.png', 'multimodal-rag/4.png']
[33]
0.6344423294067383
0.6320553421974182
0.6312342882156372
0.6270501017570496
0.6267095804214478
['multimodal-rag/1.png', 'multimodal-rag/7.png', 'multimodal-rag/14.png', 'multimodal-rag/8.png', 'multimodal-rag/5.png']
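The scores and `gcs_key` lists above come from running the same query against each embedding field. A sketch of the retrieval helper using a `$vectorSearch` aggregation stage; `numCandidates=150` and the function name are my choices:

```python
def vector_search(collection, query_embedding: list[float], path: str, limit: int = 5):
    """Return the top pages by vector similarity for one embedding field."""
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": path,  # "voyage_embedding" or "clip_embedding"
                "queryVector": query_embedding,
                "numCandidates": 150,
                "limit": limit,
            }
        },
        # Project the GCS key and the similarity score printed above.
        {"$project": {"_id": 0, "gcs_key": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
    return list(collection.aggregate(pipeline))
```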
Step 9: Create a multimodal RAG app
[34]
[35]
[36]
[37]
[38]
'DeepSeek-R1 achieves a Pass@1 accuracy of 79.8% on AIME 2024, 97.3% on MATH-500, 90.8% on MMLU, and 71.5% on GPQA Diamond. It outperforms OpenAI-o1-1217 on MATH-500 and Codeforces. It performs slightly better than DeepSeek-V3 on SWE-bench Verified. However, its performance is slightly below that of OpenAI-o1-1217 on benchmarks like MMLU-Pro and GPQA Diamond.\n'
[39]
"Based on the provided context, here's a summary of the Pass@1 accuracy of Deepseek R1 against other models:\n\n* **DeepSeek-R1 vs. OpenAI-01-mini and OpenAI-01-0912:** DeepSeek-R1 outperforms both OpenAI-01-mini and OpenAI-01-0912 on AIME 2024, MATH-500, and GPOA Diamond benchmarks.\n* **DeepSeek-R1 vs. Distilled Models:** DeepSeek-R1-Distill-Qwen-7B outperforms Qwen32B-Preview on all evaluation metrics. DeepSeek-R1-14B surpasses QwQ32B-Preview on all evaluation metrics. DeepSeek-R1-Distill-Llama-70B exceeds OpenAI-01-1217 on entire non benchmark tasks.\n* **DeepSeek-R1 vs. DeepSeek-V3:** DeepSeek-R1 demonstrates outstanding performance on tasks requiring long-context understanding, substantially outperforming DeepSeek-V3 on long-context benchmarks.\n* **DeepSeek-R1 vs. OpenAI-1217:** DeepSeek-R1 performs comparably on par with OpenAI-1217, surpassing other models by a large margin."
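The answers above are generated by passing the retrieved page images plus the question to a Gemini model. A sketch with the google-genai client; the model name `gemini-2.0-flash` and the prompt wording are my assumptions:

```python
def answer_question(question: str, images: list, gemini_api_key: str) -> str:
    """Generate an answer grounded in the retrieved page images."""
    from google import genai

    client = genai.Client(api_key=gemini_api_key)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        # Contents may mix PIL images and text; the images are the
        # retrieved PDF pages, the string is the user question.
        contents=images + [f"Answer the question using only the provided pages: {question}"],
    )
    return response.text
```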
Step 10: Evaluating retrieval and generation
[40]
[41]
[42]
[43]
[44]
[45]
[ ]
[53]
Model: voyage
100%|██████████| 5/5 [00:25<00:00, 5.20s/it]
MRR: 1.0 Avg. Recall @5: 0.68 Avg. alignment: 3.2
[54]
Model: clip
100%|██████████| 5/5 [00:24<00:00, 5.00s/it]
MRR: 0.8 Avg. Recall @5: 0.56 Avg. alignment: 3.2
[55]
Model: voyage
100%|██████████| 5/5 [00:25<00:00, 5.05s/it]
MRR: 0.8666666666666666 Avg. Recall @5: 0.52 Avg. alignment: 3.8
[56]
Model: clip
100%|██████████| 5/5 [00:22<00:00, 4.58s/it]
MRR: 0.8 Avg. Recall @5: 0.32 Avg. alignment: 2.4
[57]
Model: voyage
100%|██████████| 10/10 [00:49<00:00, 4.97s/it]
MRR: 0.9333333333333332 Avg. Recall @5: 0.6 Avg. alignment: 3.5
[58]
Model: clip
100%|██████████| 10/10 [00:46<00:00, 4.67s/it]
MRR: 0.8 Avg. Recall @5: 0.4400000000000001 Avg. alignment: 2.7
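The MRR and Recall@5 figures above can be computed with standard retrieval metrics. A sketch of both, assuming per-query ranked result lists and sets of relevant pages (the alignment score, an LLM-judged rating, is omitted here):

```python
def mean_reciprocal_rank(ranked: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if none)."""
    total = 0.0
    for hits, rel in zip(ranked, relevant):
        for rank, doc in enumerate(hits, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)


def recall_at_k(ranked: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Average fraction of each query's relevant pages found in the top k."""
    scores = [len(set(hits[:k]) & rel) / len(rel) for hits, rel in zip(ranked, relevant)]
    return sum(scores) / len(scores)
```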