Retrieval Strategies MongoDB LlamaIndex

Optimizing for relevance using MongoDB and LlamaIndex

In this notebook, we will explore and tune different retrieval options in MongoDB's LlamaIndex integration to get the most relevant results.

Step 1: Install libraries

- **pymongo**: Python package to interact with MongoDB databases and collections

- **llama-index**: Python package for the LlamaIndex LLM framework

- **llama-index-llms-openai**: Python package to use OpenAI models via their LlamaIndex integration

- **llama-index-vector-stores-mongodb**: Python package for MongoDB’s LlamaIndex integration
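
The install cell itself is not reproduced here. As a minimal sketch, the helper below checks which of the four packages listed above are already installed; the names `REQUIRED_PACKAGES` and `installed_versions` are illustrative and not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the list above.
REQUIRED_PACKAGES = [
    "pymongo",
    "llama-index",
    "llama-index-llms-openai",
    "llama-index-vector-stores-mongodb",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Anything reported as missing can be installed from a notebook cell with `pip install -qU pymongo llama-index llama-index-llms-openai llama-index-vector-stores-mongodb`.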

[1]


Step 2: Set up prerequisites

  • Set the MongoDB connection string: Follow the steps here to get the connection string from the Atlas UI.

  • Set the OpenAI API key: Follow the steps here to obtain an API key.
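
One way to supply both secrets without hard-coding them is sketched below. The helper `get_secret` and the variable name `MONGODB_URI` are assumptions made for illustration; `OPENAI_API_KEY` is the environment variable the OpenAI client reads by default.

```python
import getpass
import os


def get_secret(env_var, prompt):
    """Return the value of env_var, prompting for it once if it is unset."""
    value = os.environ.get(env_var)
    if not value:
        value = getpass.getpass(prompt)
        os.environ[env_var] = value  # cache it for later cells in the same session
    return value


# Usage in a notebook (prompts only when the variable is not already set):
# MONGODB_URI = get_secret("MONGODB_URI", "Enter your MongoDB connection string: ")
# OPENAI_API_KEY = get_secret("OPENAI_API_KEY", "Enter your OpenAI API key: ")
```

Reading from the environment first keeps the notebook runnable unattended, while the hidden prompt avoids leaving the key in cell output.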

[3]
[18]
[4]

Step 3: Load and process the dataset

[19]
[20]
[21]
[22]
[23]
[24]
Title: The Perils of Pauline
Plot: Young Pauline is left a lot of money when her wealthy uncle dies. However, her uncle's secretary has been named as her guardian until she marries, at which time she will officially take possession of her inheritance. Meanwhile, her "guardian" and his confederates constantly come up with schemes to get rid of Pauline so that he can get his hands on the money himself.
Cast: Pearl White, Crane Wilbur, Paul Panzer, Edward Josè
Genres: Action
Languages: English
Rating: 7.6
[26]
{'title': 'The Perils of Pauline', 'rating': 7.6, 'languages': ['English']}
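
The two outputs above show the shape each movie takes after processing: a text block to embed and a small metadata dictionary. A minimal sketch of that transformation follows. The input field names (`plot`, `cast`, `genres`, `languages`, `rating`) are assumptions about the raw record, and the sample plot is abridged from the output above.

```python
def movie_to_text_and_metadata(movie):
    """Build the text to embed and the metadata to store for one movie record."""
    text = "\n".join(
        [
            f"Title: {movie['title']}",
            f"Plot: {movie['plot']}",
            f"Cast: {', '.join(movie['cast'])}",
            f"Genres: {', '.join(movie['genres'])}",
            f"Languages: {', '.join(movie['languages'])}",
            f"Rating: {movie['rating']}",
        ]
    )
    # Only the fields needed for display and filtering are kept as metadata.
    metadata = {
        "title": movie["title"],
        "rating": movie["rating"],
        "languages": movie["languages"],
    }
    return text, metadata


# Sample record assembled from the output above (plot abridged).
SAMPLE_MOVIE = {
    "title": "The Perils of Pauline",
    "plot": "Young Pauline is left a lot of money when her wealthy uncle dies.",
    "cast": ["Pearl White", "Crane Wilbur", "Paul Panzer", "Edward Josè"],
    "genres": ["Action"],
    "languages": ["English"],
    "rating": 7.6,
}

text, metadata = movie_to_text_and_metadata(SAMPLE_MOVIE)
print(text)
print(metadata)
```

Each `(text, metadata)` pair can then be wrapped in a LlamaIndex `Document(text=text, metadata=metadata)` before indexing.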

Step 4: Create MongoDB Atlas vector store

[27]
[28]
[29]
[31]
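
A hedged sketch of this step is shown below. The database, collection, and index names are hypothetical, and the keyword arguments reflect recent releases of `llama-index-vector-stores-mongodb` as I understand them, so check them against your installed version. The LlamaIndex and PyMongo imports are deferred so the configuration can be inspected without a live cluster.

```python
# Hypothetical names: replace with your own database, collection, and index names.
DB_NAME = "movies_db"
COLLECTION_NAME = "movies"
VS_INDEX_NAME = "vector_index"
FTS_INDEX_NAME = "fts_index"


def vector_store_kwargs():
    """Keyword arguments for the vector store, kept separate so they are easy to inspect."""
    return {
        "db_name": DB_NAME,
        "collection_name": COLLECTION_NAME,
        "vector_index_name": VS_INDEX_NAME,
        "fulltext_index_name": FTS_INDEX_NAME,
        "embedding_key": "embedding",  # document field holding the vector
        "text_key": "text",  # document field holding the raw text
    }


def build_index(mongodb_uri, documents):
    """Embed the documents, store them in MongoDB Atlas, and return a VectorStoreIndex."""
    # Imported lazily so the configuration above works without the packages installed.
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
    from pymongo import MongoClient

    client = MongoClient(mongodb_uri)
    vector_store = MongoDBAtlasVectorSearch(mongodb_client=client, **vector_store_kwargs())
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

Calling `build_index` embeds every document with the configured embedding model, so it needs a valid OpenAI API key and incurs API usage.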

Step 5: Create Atlas Search indexes

[32]
[33]
[37]
Duplicate index found for model <pymongo.operations.SearchIndexModel object at 0x31d4c33d0>. Skipping index creation.
Duplicate index found for model <pymongo.operations.SearchIndexModel object at 0x31d4c1c60>. Skipping index creation.
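
The output above refers to `pymongo.operations.SearchIndexModel`, which describes one Atlas Search index. The sketch below builds plausible definitions for a vector index and a full-text index and creates them one at a time. The index names, the 1536-dimension setting, and the filter paths are assumptions: the dimension count must match your embedding model, and the filter paths must match your stored metadata.

```python
EMBEDDING_DIMENSIONS = 1536  # assumption: must equal the embedding model's output size


def vector_index_definition(num_dimensions=EMBEDDING_DIMENSIONS):
    """Vector index over the embedding field, plus fields usable as pre-filters."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            },
            {"type": "filter", "path": "metadata.rating"},
            {"type": "filter", "path": "metadata.languages"},
        ]
    }


def fulltext_index_definition():
    """Full-text index that maps only the text field."""
    return {"mappings": {"dynamic": False, "fields": {"text": {"type": "string"}}}}


def create_indexes(collection, vs_name="vector_index", fts_name="fts_index"):
    """Create both indexes, reporting (not raising) when the server rejects one."""
    # Imported lazily so the definitions above can be inspected without PyMongo.
    from pymongo.errors import OperationFailure
    from pymongo.operations import SearchIndexModel

    models = {
        vs_name: SearchIndexModel(
            definition=vector_index_definition(), name=vs_name, type="vectorSearch"
        ),
        fts_name: SearchIndexModel(
            definition=fulltext_index_definition(), name=fts_name, type="search"
        ),
    }
    for name, model in models.items():
        try:
            collection.create_search_index(model=model)
        except OperationFailure as exc:
            # A duplicate index is the usual cause on re-runs; print the server's reason.
            print(f"Skipping index '{name}': {exc}")
```

Atlas builds search indexes asynchronously, so queries may return no results for a short time after creation.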

Step 6: Get movie recommendations

[35]
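
The helper behind the recommendation outputs in this step is not reproduced here. A minimal sketch of such a helper follows, assuming `index` is a LlamaIndex `VectorStoreIndex` and that each retrieved node exposes `metadata` and `score`. The function names are illustrative, and the stand-in node at the end exists only to show the output format.

```python
from types import SimpleNamespace


def format_results(nodes):
    """Render nodes in the 'Title | Rating | Relevance Score' layout used in this step."""
    return [
        f"Title: {n.metadata['title']} | Rating: {n.metadata['rating']} "
        f"| Relevance Score: {n.score}"
        for n in nodes
    ]


def get_recommendations(index, query, mode="default", **kwargs):
    """Retrieve the top 5 nodes for a query using the given vector store query mode.

    mode is passed through as vector_store_query_mode: "default" (vector search),
    "text_search" (full-text search), or "hybrid". Extra keyword arguments such
    as alpha or filters are forwarded to the retriever.
    """
    retriever = index.as_retriever(
        similarity_top_k=5, vector_store_query_mode=mode, **kwargs
    )
    nodes = retriever.retrieve(query)
    for line in format_results(nodes):
        print(line)
    return nodes


# Stand-in node (not a real retrieval result) to demonstrate the formatting.
demo_nodes = [
    SimpleNamespace(metadata={"title": "The Matrix", "rating": 8.7}, score=4.387373924255371)
]
print(format_results(demo_nodes)[0])
```

Keeping the mode as a parameter lets the same query be replayed against full-text, vector, and hybrid search for a side-by-side comparison.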

Full-text search

[36]
Title: Hellboy II: The Golden Army | Rating: 7.0 | Relevance Score: 5.93734884262085
Title: The Matrix Revolutions | Rating: 6.7 | Relevance Score: 4.574477195739746
Title: The Matrix | Rating: 8.7 | Relevance Score: 4.387373924255371
Title: Go with Peace Jamil | Rating: 6.9 | Relevance Score: 3.5394840240478516
Title: Terminator Salvation | Rating: 6.7 | Relevance Score: 3.3378987312316895
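
Full-text relevance scores are not normalized to a fixed range, which is why the values above exceed 1. For intuition, the sketch below builds an Atlas Search aggregation pipeline of roughly the kind a full-text retrieval runs. It is an illustration and not the exact pipeline the LlamaIndex integration emits. The index name, field paths, and query string are hypothetical.

```python
def text_search_pipeline(query, index_name="fts_index", text_path="text", limit=5):
    """Aggregation pipeline for a keyword query against an Atlas Search index."""
    return [
        # Match the query terms against the indexed text field.
        {"$search": {"index": index_name, "text": {"query": query, "path": text_path}}},
        {"$limit": limit},
        # Surface the relevance score alongside the fields to display.
        {
            "$project": {
                "_id": 0,
                "title": "$metadata.title",
                "rating": "$metadata.rating",
                "score": {"$meta": "searchScore"},
            }
        },
    ]


# Hypothetical query string, for illustration only.
pipeline = text_search_pipeline("action movies about humans fighting machines")
print(pipeline[0])
```

With a PyMongo collection handle, `collection.aggregate(pipeline)` would run this directly against the cluster.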

Vector search

[53]
Title: Death Machine | Rating: 5.7 | Relevance Score: 0.7407287359237671
Title: Real Steel | Rating: 7.1 | Relevance Score: 0.7364246845245361
Title: Soldier | Rating: 5.9 | Relevance Score: 0.7282171249389648
Title: Terminator 3: Rise of the Machines | Rating: 6.4 | Relevance Score: 0.7266112565994263
Title: Last Action Hero | Rating: 6.2 | Relevance Score: 0.7250100374221802
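
Unlike the full-text scores, these values fall between 0 and 1. To my understanding, Atlas Vector Search reports cosine similarity rescaled as (1 + cosine) / 2; treat that formula as an assumption to check against the Atlas documentation. The sketch below shows a `$vectorSearch` pipeline of the general shape involved and that rescaling. The index name, field paths, and the `oversampling_factor` parameter are illustrative.

```python
import math


def vector_search_pipeline(
    query_vector, index_name="vector_index", path="embedding", limit=5, oversampling_factor=10
):
    """Aggregation pipeline for an approximate nearest-neighbor query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                # Consider more candidates than are returned to improve recall.
                "numCandidates": limit * oversampling_factor,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": "$metadata.title",
                "rating": "$metadata.rating",
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def normalized_cosine_score(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] as (1 + cos) / 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return (1 + dot / (norm_a * norm_b)) / 2
```

Under this rescaling, a score near 0.74 corresponds to a raw cosine similarity of about 0.48.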

Hybrid search

[54]
Title: Hellboy II: The Golden Army | Rating: 7.0 | Relevance Score: 0.5
Title: Death Machine | Rating: 5.7 | Relevance Score: 0.5
Title: The Matrix Revolutions | Rating: 6.7 | Relevance Score: 0.25
Title: Real Steel | Rating: 7.1 | Relevance Score: 0.25
Title: Soldier | Rating: 5.9 | Relevance Score: 0.16666666666666666
[55]
Title: Death Machine | Rating: 5.7 | Relevance Score: 0.7
Title: Real Steel | Rating: 7.1 | Relevance Score: 0.35
Title: Hellboy II: The Golden Army | Rating: 7.0 | Relevance Score: 0.30000000000000004
Title: Soldier | Rating: 5.9 | Relevance Score: 0.2333333333333333
Title: Terminator 3: Rise of the Machines | Rating: 6.4 | Relevance Score: 0.175
[56]
Title: Hellboy II: The Golden Army | Rating: 7.0 | Relevance Score: 0.7
Title: The Matrix Revolutions | Rating: 6.7 | Relevance Score: 0.35
Title: Death Machine | Rating: 5.7 | Relevance Score: 0.3
Title: The Matrix | Rating: 8.7 | Relevance Score: 0.2333333333333333
Title: Go with Peace Jamil | Rating: 6.9 | Relevance Score: 0.175
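
The three hybrid outputs above are consistent with a weighted reciprocal-rank scheme: each result scores `alpha / rank` from the vector ranking and `(1 - alpha) / rank` from the full-text ranking. The printed scores match `alpha` values of 0.5, 0.7, and 0.3 respectively. This is inferred from the scores, not taken from the integration's source, so read the sketch below as an approximation. The ranked lists are copied from the full-text and vector search outputs earlier in this step.

```python
# Rankings copied from the full-text and vector search outputs above.
TEXT_RANKED = [
    "Hellboy II: The Golden Army",
    "The Matrix Revolutions",
    "The Matrix",
    "Go with Peace Jamil",
    "Terminator Salvation",
]
VECTOR_RANKED = [
    "Death Machine",
    "Real Steel",
    "Soldier",
    "Terminator 3: Rise of the Machines",
    "Last Action Hero",
]


def fuse_rankings(text_ranked, vector_ranked, alpha=0.5, top_k=5):
    """Weighted reciprocal-rank fusion: alpha weights vector search, 1 - alpha full-text."""
    scores = {}
    for rank, title in enumerate(text_ranked, start=1):
        scores[title] = scores.get(title, 0.0) + (1 - alpha) / rank
    for rank, title in enumerate(vector_ranked, start=1):
        scores[title] = scores.get(title, 0.0) + alpha / rank
    # Ties keep insertion order here; a real system may break them differently.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


for alpha in (0.5, 0.7, 0.3):
    print(f"alpha={alpha}")
    for title, score in fuse_rankings(TEXT_RANKED, VECTOR_RANKED, alpha=alpha):
        print(f"  {title}: {score}")
```

Raising `alpha` favors semantic matches from vector search, while lowering it favors exact keyword matches. Equal scores at `alpha=0.5` can appear in either order.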

Combining metadata filters with search

[57]
[58]
[59]
Title: Real Steel | Rating: 7.1 | Relevance Score: 0.7
Title: T2 3-D: Battle Across Time | Rating: 7.8 | Relevance Score: 0.35
Title: The Matrix | Rating: 8.7 | Relevance Score: 0.30000000000000004
Title: Predator | Rating: 7.8 | Relevance Score: 0.2333333333333333
Title: Transformers | Rating: 7.1 | Relevance Score: 0.175
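
Every result above has a rating greater than 7, which suggests a rating filter combined with hybrid search. A sketch of how such filters might be expressed with LlamaIndex's `MetadataFilters` follows. The threshold, the language value, and the `metadata.`-prefixed keys are assumptions, and whether the prefix is required depends on the integration version. The pure-Python check is only a local sanity test and plays no part in retrieval.

```python
# Hypothetical filter specification: rating above 7 and English among the languages.
FILTER_SPECS = [
    {"key": "metadata.rating", "value": 7, "operator": ">"},
    {"key": "metadata.languages", "value": "English", "operator": "=="},
]


def build_filters(specs=FILTER_SPECS):
    """Translate the plain specs into LlamaIndex MetadataFilters joined with AND."""
    # Imported lazily so the specs can be inspected without LlamaIndex installed.
    from llama_index.core.vector_stores import (
        FilterCondition,
        FilterOperator,
        MetadataFilter,
        MetadataFilters,
    )

    return MetadataFilters(
        filters=[
            MetadataFilter(
                key=spec["key"], value=spec["value"], operator=FilterOperator(spec["operator"])
            )
            for spec in specs
        ],
        condition=FilterCondition.AND,
    )


def passes_locally(metadata, min_rating=7, language="English"):
    """Check one metadata dict against the same two conditions, without a database."""
    return metadata["rating"] > min_rating and language in metadata["languages"]


print(passes_locally({"title": "The Perils of Pauline", "rating": 7.6, "languages": ["English"]}))
```

The filters would then be passed to the retriever, for example `index.as_retriever(similarity_top_k=5, vector_store_query_mode="hybrid", alpha=0.7, filters=build_filters())`.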