Notebooks
M
MongoDB
Rag Pipeline Kerasnlp Mongodb Gemma2

Rag Pipeline Kerasnlp Mongodb Gemma2

agentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag

RAG Pipeline With Keras NLP, MongoDB and OpenAI

Open In Colab

Set Up Libraries

[5]
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 327.5/327.5 kB 6.0 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 8.3 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 5.5 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 5.3 MB/s eta 0:00:00

Set Up Environment Variables

[53]

Data Loading

[23]
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.

Data Cleaning

[24]

Embedding Generation

[25]
Generating embeddings for columns: abstract, title:   0%|          | 0/4000 [00:00<?, ?it/s]
[30]

MongoDB Vector Database and Connection Setup

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
  3. Create the database: knowledge.
  4. Within the database research_papers, create the collection ‘listings_reviews’.
  5. Create a vector search index named vector_index for the ‘listings_reviews’ collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the data collection vector search index.

Your vector search index created on MongoDB Atlas should look like below:

	{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}


Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

This guide uses Google Colab, which offers a feature for securely storing environment secrets. These secrets can then be accessed within the development environment. Specifically, the line mongo_uri = userdata.get('MONGO_URI') retrieves the URI from the secure storage.

[ ]
[31]
Connection to MongoDB successful
[32]
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff000000000000002a'), 'opTime': {'ts': Timestamp(1719597926, 1), 't': 42}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1719597926, 1), 'signature': {'hash': b'\xb3\xc2\xbaK\x7f\x82\xe0m`\xea\xfa\x94H\x15/\xc7M!*i', 'keyId': 7320226449804230662}}, 'operationTime': Timestamp(1719597926, 1)}, acknowledged=True)

Data Ingestion

[33]
Data ingestion into MongoDB completed

Vector Search Operation

[34]

Handle User Results

[35]
[49]
Query: Give me a recommended paper on machine learning
Continue to answer the query by using the Search Results:
Title: Using Access Data for Paper Recommendations on ArXiv.org, Plot: N/A
Title: Missing Data: A Comparison of Neural Network and Expectation
  Maximisation Techniques, Plot: N/A
Title: An Adaptive Strategy for the Classification of G-Protein Coupled
  Receptors, Plot: N/A
Title: A multivariate approach to heavy flavour tagging with cascade training, Plot: N/A
.

Keras Config and Markdown

[39]

Handle Response Generation and History

[40]

Gemma2 Model Initalisation

[41]
metadata.json:   0%|          | 0.00/143 [00:00<?, ?B/s]
task.json:   0%|          | 0.00/2.25k [00:00<?, ?B/s]
model.weights.h5:   0%|          | 0.00/18.5G [00:00<?, ?B/s]
vocabulary.spm:   0%|          | 0.00/4.24M [00:00<?, ?B/s]
[42]
CPU times: user 56.1 s, sys: 741 ms, total: 56.8 s
Wall time: 36.7 s

Query Gemma 2 with Retrieved Data

[50]
[51]

View Chat History

[52]
<start_of_turn>user
Query: What is the best romantic movie to watch and why?
Continue to answer the query by using the Search Results:
Title: Non-Associativity of Lorentz Transformation and Associative Reflection
  Symmetric Transformation, Plot: N/A
Title: Erwin Schroedinger, Francis Crick and epigenetic stability, Plot: N/A
Title: Time and motion in physics: the Reciprocity Principle, relativistic
  invariance of the lengths of rulers and time dilatation, Plot: N/A
Title: XMM-Newton X-ray Observations of the Wolf-Rayet Binary System WR 147, Plot: N/A
.<end_of_turn>

<start_of_turn>model


It seems like you've provided me with some scientific research papers rather than movie titles!  

To recommend a great romantic movie, I need information about what kind of romance you're looking for. 

For example, do you prefer:

* **Classic romantic comedies?**
* **Heartbreaking dramas?**
* **Something lighthearted and fun?**
* **A movie with a historical setting?**


Tell me more about your taste, and I'll give you a personalized recommendation! 
<end_of_turn>
<end_of_turn>

<start_of_turn>user
Query: What is the best romantic movie to watch and why?
Continue to answer the query by using the Search Results:
Title: Non-Associativity of Lorentz Transformation and Associative Reflection
  Symmetric Transformation, Plot: N/A
Title: Erwin Schroedinger, Francis Crick and epigenetic stability, Plot: N/A
Title: Time and motion in physics: the Reciprocity Principle, relativistic
  invariance of the lengths of rulers and time dilatation, Plot: N/A
Title: XMM-Newton X-ray Observations of the Wolf-Rayet Binary System WR 147, Plot: N/A
.<end_of_turn>

<start_of_turn>model


You've given me a list of scientific papers again!  Those are fascinating, but they won't help me pick a romantic movie.  

To give you a good recommendation, I need to know what kind of romantic movie you'd enjoy.  

Do you like:

* **Funny romances?**
* **Tearjerkers?**
* **Movies set in a particular time period?**
* **Something with a unique plot?**



Tell me more about what you're looking for! 
<end_of_turn>
<end_of_turn>

<start_of_turn>user
Query: Give me a recommended paper on machine learning
Continue to answer the query by using the Search Results:
Title: Using Access Data for Paper Recommendations on ArXiv.org, Plot: N/A
Title: Missing Data: A Comparison of Neural Network and Expectation
  Maximisation Techniques, Plot: N/A
Title: An Adaptive Strategy for the Classification of G-Protein Coupled
  Receptors, Plot: N/A
Title: A multivariate approach to heavy flavour tagging with cascade training, Plot: N/A
.<end_of_turn>

<start_of_turn>model


Based on your search results, I'd recommend **"Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques"**. 

Here's why:

* **Relevance to Machine Learning:** This paper directly addresses a common challenge in machine learning: handling missing data. 
* **Comparison of Methods:** It compares two popular approaches for dealing with missing data – neural networks and Expectation Maximisation – which is valuable for understanding the strengths and weaknesses of different techniques.


Let me know if you'd like to explore other papers based on specific aspects of machine learning! 
<end_of_turn>
<end_of_turn>