
Building Multimodal RAG Applications with MongoDB and Voyage AI

In this notebook, you will learn how to build multimodal RAG applications using Voyage AI's multimodal embedding models and Google's multimodal LLMs.

Additionally, you will evaluate Voyage AI's VLM-based embedding model against a CLIP-based embedding model on the dataset used in this notebook.

Step 1: Install required libraries

  • pymongo: Python driver for MongoDB
  • voyageai: Python client for Voyage AI
  • google-genai: Python library to access Google's embedding models and LLMs via Google AI Studio
  • google-cloud-storage: Python client for Google Cloud Storage
  • sentence-transformers: Python library to use open-source ML models from Hugging Face
  • PyMuPDF: Python library for analyzing and manipulating PDFs
  • Pillow: A Python imaging library
  • tqdm: Show progress bars for loops in Python
  • tenacity: Python library for easily adding retries to functions
[1]
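The install cell is a single `pip` command; a one-line equivalent of the list above would be:

```shell
pip install --quiet pymongo voyageai google-genai google-cloud-storage \
    sentence-transformers PyMuPDF Pillow tqdm tenacity
```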

Step 2: Set up prerequisites

  • Set the MongoDB connection string: Follow the steps here to get the connection string from the Atlas UI.
  • Set the Voyage AI API key: Follow the steps here to get a Voyage AI API key.
  • Set a Gemini API key: Follow the steps here to get a Gemini API key via Google AI Studio.
  • [In a separate terminal] Set up Application Default Credentials (ADC): Follow the steps here to configure ADC via the Google Cloud CLI.
[2]

MongoDB

[3]
Enter your MongoDB connection string:  ········

Voyage AI

[4]
Enter your Voyage AI API key:  ········

Google

[5]
Enter your Gemini API key:  ········
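The three prompts above can be wired up with `getpass`; a minimal sketch (the env-var fallback and the helper name are assumptions, not from the notebook):

```python
import getpass
import os


def read_secret(env_var: str, prompt: str) -> str:
    """Return a credential from the environment if set, otherwise prompt without echoing."""
    return os.environ.get(env_var) or getpass.getpass(prompt)


# e.g. MONGODB_URI = read_secret("MONGODB_URI", "Enter your MongoDB connection string: ")
```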

Step 3: Read PDF from URL

[6]
[7]

Step 4: Store PDF images in GCS and extract metadata for MongoDB

[8]
[9]
[10]
[11]
[12]
[13]
100%|██████████| 22/22 [00:10<00:00,  2.18it/s]

Step 5: Add embeddings to the MongoDB documents

[14]
[15]
[16]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[17]
[18]
[19]
[20]
100%|██████████| 22/22 [00:29<00:00,  1.33s/it]
[21]
dict_keys(['gcs_key', 'width', 'height', 'voyage_embedding', 'clip_embedding'])
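Each document ends up with both embedding fields, matching the `dict_keys` output above. The sketch below injects the two embedders as callables so the shape of the logic is clear; the real calls would be along the lines of `voyageai.Client().multimodal_embed(inputs=[[image]], model="voyage-multimodal-3", input_type="document").embeddings[0]` and `SentenceTransformer("clip-ViT-B-32").encode(image)` (that wiring is an assumption, not shown in this export):

```python
from typing import Callable, Sequence


def attach_embeddings(
    doc: dict,
    image,
    voyage_embed: Callable[[object], Sequence[float]],
    clip_embed: Callable[[object], Sequence[float]],
) -> dict:
    """Add `voyage_embedding` and `clip_embedding` fields to one page document."""
    doc["voyage_embedding"] = list(voyage_embed(image))
    doc["clip_embedding"] = list(clip_embed(image))
    return doc
```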

Step 6: Ingest documents into MongoDB

[22]
[23]
{'ok': 1.0,
 '$clusterTime': {'clusterTime': Timestamp(1743655584, 1),
  'signature': {'hash': b'\xcf1\xccO*\\\xd2\x08\xbf\x147\xe0h\x8b{\xfb \xf5$?',
   'keyId': 7456513059255746561}},
 'operationTime': Timestamp(1743655584, 1)}
[24]
[25]
[26]
DeleteResult({'n': 22, 'electionId': ObjectId('7fffffff0000000000000027'), 'opTime': {'ts': Timestamp(1743655585, 21), 't': 39}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1743655585, 22), 'signature': {'hash': b'\x7f\x9f\x93\xc9zJ\x8f\xf9\xafO\xeb.\x04\xf3{t"}\xf5\xe4', 'keyId': 7456513059255746561}}, 'operationTime': Timestamp(1743655585, 21)}, acknowledged=True)
[27]
InsertManyResult([ObjectId('67ee12a1a224539fbe2019e5'), ObjectId('67ee12a1a224539fbe2019e6'), ObjectId('67ee12a1a224539fbe2019e7'), ObjectId('67ee12a1a224539fbe2019e8'), ObjectId('67ee12a1a224539fbe2019e9'), ObjectId('67ee12a1a224539fbe2019ea'), ObjectId('67ee12a1a224539fbe2019eb'), ObjectId('67ee12a1a224539fbe2019ec'), ObjectId('67ee12a1a224539fbe2019ed'), ObjectId('67ee12a1a224539fbe2019ee'), ObjectId('67ee12a1a224539fbe2019ef'), ObjectId('67ee12a1a224539fbe2019f0'), ObjectId('67ee12a1a224539fbe2019f1'), ObjectId('67ee12a1a224539fbe2019f2'), ObjectId('67ee12a1a224539fbe2019f3'), ObjectId('67ee12a1a224539fbe2019f4'), ObjectId('67ee12a1a224539fbe2019f5'), ObjectId('67ee12a1a224539fbe2019f6'), ObjectId('67ee12a1a224539fbe2019f7'), ObjectId('67ee12a1a224539fbe2019f8'), ObjectId('67ee12a1a224539fbe2019f9'), ObjectId('67ee12a1a224539fbe2019fa')], acknowledged=True)
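The outputs above correspond to a collection reset (the `DeleteResult` with `n: 22`) followed by a bulk insert of the 22 page documents. A minimal sketch, assuming the collection handle comes from something like `MongoClient(MONGODB_URI)["db"]["collection"]` (names hypothetical):

```python
def refresh_and_ingest(collection, docs: list):
    """Empty the target collection, then bulk-insert the embedded page documents."""
    collection.delete_many({})          # yields a DeleteResult
    return collection.insert_many(docs)  # yields an InsertManyResult
```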

Step 7: Create a vector search index

[28]
[29]
'vector_index'
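The `'vector_index'` output above comes from creating an Atlas Vector Search index over both embedding fields. A sketch of the index definition; the dimensions (1024 for voyage-multimodal-3, 512 for clip-ViT-B-32) and `cosine` similarity are assumptions based on the models' published sizes, and the actual call would be along the lines of `collection.create_search_index(SearchIndexModel(definition=..., name="vector_index", type="vectorSearch"))` from `pymongo.operations`:

```python
def vector_index_definition(field_dims: dict) -> dict:
    """Build a vectorSearch index definition covering each embedding field."""
    return {
        "fields": [
            {"type": "vector", "path": path, "numDimensions": dims, "similarity": "cosine"}
            for path, dims in field_dims.items()
        ]
    }


definition = vector_index_definition({"voyage_embedding": 1024, "clip_embedding": 512})
```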

Step 8: Retrieve documents using vector search

[30]
[31]
[32]
0.7585833072662354
[image output]

0.7482262253761292
[image output]

0.7399106025695801
[image output]

0.7107774019241333
[image output]

0.69964599609375
[image output]

['multimodal-rag/1.png',
 'multimodal-rag/13.png',
 'multimodal-rag/14.png',
 'multimodal-rag/7.png',
 'multimodal-rag/4.png']
[33]
0.6344423294067383
[image output]

0.6320553421974182
[image output]

0.6312342882156372
[image output]

0.6270501017570496
[image output]

0.6267095804214478
[image output]

['multimodal-rag/1.png',
 'multimodal-rag/7.png',
 'multimodal-rag/14.png',
 'multimodal-rag/8.png',
 'multimodal-rag/5.png']
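The scores and `gcs_key` lists above come from running a `$vectorSearch` aggregation once per embedding field. A sketch of the pipeline builder; `numCandidates` and the exact projection are assumptions consistent with the fields shown:

```python
def vector_search_pipeline(query_vector: list, path: str,
                           index: str = "vector_index", limit: int = 5) -> list:
    """Build a $vectorSearch pipeline returning gcs_key plus the similarity score."""
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,  # "voyage_embedding" or "clip_embedding"
                "queryVector": query_vector,
                "numCandidates": 150,
                "limit": limit,
            }
        },
        {"$project": {"_id": 0, "gcs_key": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# results = list(collection.aggregate(vector_search_pipeline(qvec, "voyage_embedding")))
```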

Step 9: Create a multimodal RAG app

[34]
[35]
[36]
[37]
[38]
'DeepSeek-R1 achieves a Pass@1 accuracy of 79.8% on AIME 2024, 97.3% on MATH-500, 90.8% on MMLU, and 71.5% on GPQA Diamond. It outperforms OpenAI-o1-1217 on MATH-500 and Codeforces. It performs slightly better than DeepSeek-V3 on SWE-bench Verified. However, its performance is slightly below that of OpenAI-o1-1217 on benchmarks like MMLU-Pro and GPQA Diamond.\n'
[39]
"Based on the provided context, here's a summary of the Pass@1 accuracy of Deepseek R1 against other models:\n\n*   **DeepSeek-R1 vs. OpenAI-01-mini and OpenAI-01-0912:** DeepSeek-R1 outperforms both OpenAI-01-mini and OpenAI-01-0912 on AIME 2024, MATH-500, and GPOA Diamond benchmarks.\n*   **DeepSeek-R1 vs. Distilled Models:** DeepSeek-R1-Distill-Qwen-7B outperforms Qwen32B-Preview on all evaluation metrics. DeepSeek-R1-14B surpasses QwQ32B-Preview on all evaluation metrics. DeepSeek-R1-Distill-Llama-70B exceeds OpenAI-01-1217 on entire non benchmark tasks.\n*   **DeepSeek-R1 vs. DeepSeek-V3:** DeepSeek-R1 demonstrates outstanding performance on tasks requiring long-context understanding, substantially outperforming DeepSeek-V3 on long-context benchmarks.\n*   **DeepSeek-R1 vs. OpenAI-1217:** DeepSeek-R1 performs comparably on par with OpenAI-1217, surpassing other models by a large margin."
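The generation step passes the user question plus the retrieved page images to a multimodal Gemini model. A sketch with the client injected; the model name `gemini-2.0-flash` and the prompt wording are assumptions, and a real client would come from `google.genai.Client(api_key=GEMINI_API_KEY)`:

```python
def answer_from_pages(client, question: str, images: list,
                      model: str = "gemini-2.0-flash") -> str:
    """Ask a multimodal Gemini model to answer using only the retrieved page images."""
    contents = [
        "Answer the question based only on the attached document pages.",
        question,
        *images,  # PIL images of the retrieved pages
    ]
    response = client.models.generate_content(model=model, contents=contents)
    return response.text
```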

Step 10: Evaluating retrieval and generation

[40]
[41]
[42]
[43]
[44]
[45]
[ ]
[53]
Model: voyage
100%|██████████| 5/5 [00:25<00:00,  5.20s/it]
MRR: 1.0
Avg. Recall @5: 0.68
Avg. alignment: 3.2

[54]
Model: clip
100%|██████████| 5/5 [00:24<00:00,  5.00s/it]
MRR: 0.8
Avg. Recall @5: 0.56
Avg. alignment: 3.2

[55]
Model: voyage
100%|██████████| 5/5 [00:25<00:00,  5.05s/it]
MRR: 0.8666666666666666
Avg. Recall @5: 0.52
Avg. alignment: 3.8

[56]
Model: clip
100%|██████████| 5/5 [00:22<00:00,  4.58s/it]
MRR: 0.8
Avg. Recall @5: 0.32
Avg. alignment: 2.4

[57]
Model: voyage
100%|██████████| 10/10 [00:49<00:00,  4.97s/it]
MRR: 0.9333333333333332
Avg. Recall @5: 0.6
Avg. alignment: 3.5

[58]
Model: clip
100%|██████████| 10/10 [00:46<00:00,  4.67s/it]
MRR: 0.8
Avg. Recall @5: 0.4400000000000001
Avg. alignment: 2.7
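The MRR and Recall@5 figures above follow the standard definitions over retrieved vs. expected page keys; the alignment score comes from an LLM judge and is not reproduced here. A straightforward implementation of the two retrieval metrics (function names are mine):

```python
def reciprocal_rank(retrieved: list, relevant: set) -> float:
    """1/rank of the first relevant hit, or 0.0 if none is retrieved."""
    for rank, key in enumerate(retrieved, start=1):
        if key in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of the relevant keys found in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list) -> float:
    """Average reciprocal rank over a list of (retrieved, relevant) query runs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```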