Notebooks
M
MongoDB
Rag With Gemma2 Mongodb Open Models

Rag With Gemma2 Mongodb Open Models

agentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag

RAG Pipeline With Gemma 2, MongoDB and Hugging Face [Open Models]

Open In Colab

Set Up Libraries

[4]
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 547.8/547.8 kB 7.2 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 41.5 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 53.9 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 227.1/227.1 kB 28.7 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 MB 40.8 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 17.4 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 kB 9.7 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 25.1 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 20.0 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.7/307.7 kB 36.8 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.3/21.3 MB 70.5 MB/s eta 0:00:00
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which is incompatible.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 16.1.0 which is incompatible.
google-colab 1.0.0 requires pandas==2.0.3, but you have pandas 2.2.2 which is incompatible.
google-colab 1.0.0 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 16.1.0 which is incompatible.
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.3/9.3 MB 23.8 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 309.4/309.4 kB 5.8 MB/s eta 0:00:00

Data Loading

[5]
Downloading readme:   0%|          | 0.00/6.17k [00:00<?, ?B/s]

Data Cleaning

[6]

Number of missing values in each column after removal:
plot                    0
runtime                14
genres                  0
fullplot                0
directors              12
writers                13
countries               0
poster                 78
languages               1
cast                    1
title                   0
num_mflix_comments      0
rated                 279
imdb                    0
awards                  0
type                    0
metacritic            893
plot_embedding          1
dtype: int64

Embedding Generation

[7]
modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]
README.md:   0%|          | 0.00/67.9k [00:00<?, ?B/s]
sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/619 [00:00<?, ?B/s]
model.safetensors:   0%|          | 0.00/670M [00:00<?, ?B/s]
tokenizer_config.json:   0%|          | 0.00/342 [00:00<?, ?B/s]
vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]
1_Pooling/config.json:   0%|          | 0.00/191 [00:00<?, ?B/s]
Generating embeddings: 100%|██████████| 1452/1452 [00:36<00:00, 39.52it/s]
[8]

MongoDB Vector Database and Connection Setup

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
  3. Create the database: movie_rec_sys.
  4. Within the database movie_collection, create the collection ‘listings_reviews’.
  5. Create a vector search index named vector_index for the ‘listings_reviews’ collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the data collection vector search index.

Your vector search index created on MongoDB Atlas should look like below:

	{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}


Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

[13]
[18]
Connection to MongoDB successful
[19]
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff000000000000002a'), 'opTime': {'ts': Timestamp(1719600641, 43), 't': 42}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1719600641, 43), 'signature': {'hash': b'\tS\xb0Ja\xd5\x90\xf6\xed`\xef>\x94\xab\x05\xd2[\xdbM\xc1', 'keyId': 7320226449804230662}}, 'operationTime': Timestamp(1719600641, 43)}, acknowledged=True)

Data Ingestion

[20]
Data ingestion into MongoDB completed

Vector Search Operation

[21]

Handle User Results

[22]
[23]
Query: What is the best romantic movie to watch and why?
Continue to answer the query by using the Search Results:
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Pearl Harbor, Plot: Pearl Harbor is a classic tale of romance set during a war that complicates everything. It all starts when childhood friends Rafe and Danny become Army Air Corps pilots and meet Evelyn, a Navy nurse. Rafe falls head over heels and next thing you know Evelyn and Rafe are hooking up. Then Rafe volunteers to go fight in Britain and Evelyn and Danny get transferred to Pearl Harbor. While Rafe is off fighting everything gets completely whack and next thing you know everybody is in the middle of an air raid we now know as "Pearl Harbor."
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: China Girl, Plot: A modern day Romeo & Juliet story is told in New York when an Italian boy and a Chinese girl become lovers, causing a tragic conflict between ethnic gangs.
.

Load Gemma 2

[24]
tokenizer_config.json:   0%|          | 0.00/40.6k [00:00<?, ?B/s]
tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]
special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]
config.json:   0%|          | 0.00/857 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/39.1k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/4 [00:00<?, ?it/s]
model-00001-of-00004.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]
model-00002-of-00004.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]
model-00003-of-00004.safetensors:   0%|          | 0.00/4.96G [00:00<?, ?B/s]
model-00004-of-00004.safetensors:   0%|          | 0.00/3.67G [00:00<?, ?B/s]
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/173 [00:00<?, ?B/s]
[25]
<bos>Query: What is the best romantic movie to watch and why?
Continue to answer the query by using the Search Results:
Title: Shut Up and Kiss Me!, Plot: Ryan and Pete are 27-year old best friends in Miami, born on the same day and each searching for the perfect woman. Ryan is a rookie stockbroker living with his psychic Mom. Pete is a slick surfer dude yet to find commitment. Each meets the women of their dreams on the same day. Ryan knocks heads in an elevator with the gorgeous Jessica, passing out before getting her number. Pete falls for the insatiable Tiara, but Tiara's uncle is mob boss Vincent Bublione, charged with her protection. This high-energy romantic comedy asks to what extent will you go for true love?
Title: Pearl Harbor, Plot: Pearl Harbor is a classic tale of romance set during a war that complicates everything. It all starts when childhood friends Rafe and Danny become Army Air Corps pilots and meet Evelyn, a Navy nurse. Rafe falls head over heels and next thing you know Evelyn and Rafe are hooking up. Then Rafe volunteers to go fight in Britain and Evelyn and Danny get transferred to Pearl Harbor. While Rafe is off fighting everything gets completely whack and next thing you know everybody is in the middle of an air raid we now know as "Pearl Harbor."
Title: Titanic, Plot: The plot focuses on the romances of two couples upon the doomed ship's maiden voyage. Isabella Paradine (Catherine Zeta-Jones) is a wealthy woman mourning the loss of her aunt, who reignites a romance with former flame Wynn Park (Peter Gallagher). Meanwhile, a charming ne'er-do-well named Jamie Perse (Mike Doyle) steals a ticket for the ship, and falls for a sweet innocent Irish girl on board. But their romance is threatened by the villainous Simon Doonan (Tim Curry), who has discovered about the ticket and makes Jamie his unwilling accomplice, as well as having sinister plans for the girl.
Title: China Girl, Plot: A modern day Romeo & Juliet story is told in New York when an Italian boy and a Chinese girl become lovers, causing a tragic conflict between ethnic gangs.
.

Based on the provided movie plots, which one would you recommend as the best romantic movie and why?

I would recommend **Shut Up and Kiss Me!** as the best romantic movie from this list. Here's