Rag Pipeline Kerasnlp Mongodb Gemma2
Set Up Libraries
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 327.5/327.5 kB 6.0 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.6/75.6 kB 8.3 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.9/77.9 kB 5.5 MB/s eta 0:00:00 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 kB 5.3 MB/s eta 0:00:00
Set Up Environment Variables
Data Loading
Repo card metadata block was not found. Setting CardData to empty. WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
Data Cleaning
Embedding Generation
Generating embeddings for columns: abstract, title: 0%| | 0/4000 [00:00<?, ?it/s]
MongoDB Vector Database and Connection Setup
MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.
Creating a database and collection within MongoDB is made simple with MongoDB Atlas.
- First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
- Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
- Create the database:
knowledge. - Within the database
research_papers, create the collection ‘listings_reviews’. - Create a vector search index named vector_index for the ‘listings_reviews’ collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the data collection vector search index.
Your vector search index created on MongoDB Atlas should look like below:
{
"fields": [
{
"numDimensions": 1536,
"path": "embedding",
"similarity": "cosine",
"type": "vector"
}
]
}
Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.
This guide uses Google Colab, which offers a feature for securely storing environment secrets. These secrets can then be accessed within the development environment. Specifically, the line mongo_uri = userdata.get('MONGO_URI') retrieves the URI from the secure storage.
Connection to MongoDB successful
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff000000000000002a'), 'opTime': {'ts': Timestamp(1719597926, 1), 't': 42}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1719597926, 1), 'signature': {'hash': b'\xb3\xc2\xbaK\x7f\x82\xe0m`\xea\xfa\x94H\x15/\xc7M!*i', 'keyId': 7320226449804230662}}, 'operationTime': Timestamp(1719597926, 1)}, acknowledged=True) Data Ingestion
Data ingestion into MongoDB completed
Vector Search Operation
Handle User Results
Query: Give me a recommended paper on machine learning Continue to answer the query by using the Search Results: Title: Using Access Data for Paper Recommendations on ArXiv.org, Plot: N/A Title: Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques, Plot: N/A Title: An Adaptive Strategy for the Classification of G-Protein Coupled Receptors, Plot: N/A Title: A multivariate approach to heavy flavour tagging with cascade training, Plot: N/A .
Keras Config and Markdown
Handle Response Generation and History
Gemma2 Model Initalisation
metadata.json: 0%| | 0.00/143 [00:00<?, ?B/s]
task.json: 0%| | 0.00/2.25k [00:00<?, ?B/s]
model.weights.h5: 0%| | 0.00/18.5G [00:00<?, ?B/s]
vocabulary.spm: 0%| | 0.00/4.24M [00:00<?, ?B/s]
CPU times: user 56.1 s, sys: 741 ms, total: 56.8 s Wall time: 36.7 s
Query Gemma 2 with Retrieved Data
View Chat History
<start_of_turn>user Query: What is the best romantic movie to watch and why? Continue to answer the query by using the Search Results: Title: Non-Associativity of Lorentz Transformation and Associative Reflection Symmetric Transformation, Plot: N/A Title: Erwin Schroedinger, Francis Crick and epigenetic stability, Plot: N/A Title: Time and motion in physics: the Reciprocity Principle, relativistic invariance of the lengths of rulers and time dilatation, Plot: N/A Title: XMM-Newton X-ray Observations of the Wolf-Rayet Binary System WR 147, Plot: N/A .<end_of_turn> <start_of_turn>model It seems like you've provided me with some scientific research papers rather than movie titles! To recommend a great romantic movie, I need information about what kind of romance you're looking for. For example, do you prefer: * **Classic romantic comedies?** * **Heartbreaking dramas?** * **Something lighthearted and fun?** * **A movie with a historical setting?** Tell me more about your taste, and I'll give you a personalized recommendation! <end_of_turn> <end_of_turn> <start_of_turn>user Query: What is the best romantic movie to watch and why? Continue to answer the query by using the Search Results: Title: Non-Associativity of Lorentz Transformation and Associative Reflection Symmetric Transformation, Plot: N/A Title: Erwin Schroedinger, Francis Crick and epigenetic stability, Plot: N/A Title: Time and motion in physics: the Reciprocity Principle, relativistic invariance of the lengths of rulers and time dilatation, Plot: N/A Title: XMM-Newton X-ray Observations of the Wolf-Rayet Binary System WR 147, Plot: N/A .<end_of_turn> <start_of_turn>model You've given me a list of scientific papers again! Those are fascinating, but they won't help me pick a romantic movie. To give you a good recommendation, I need to know what kind of romantic movie you'd enjoy. Do you like: * **Funny romances?** * **Tearjerkers?** * **Movies set in a particular time period?** * **Something with a unique plot?** Tell me more about what you're looking for! <end_of_turn> <end_of_turn> <start_of_turn>user Query: Give me a recommended paper on machine learning Continue to answer the query by using the Search Results: Title: Using Access Data for Paper Recommendations on ArXiv.org, Plot: N/A Title: Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques, Plot: N/A Title: An Adaptive Strategy for the Classification of G-Protein Coupled Receptors, Plot: N/A Title: A multivariate approach to heavy flavour tagging with cascade training, Plot: N/A .<end_of_turn> <start_of_turn>model Based on your search results, I'd recommend **"Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques"**. Here's why: * **Relevance to Machine Learning:** This paper directly addresses a common challenge in machine learning: handling missing data. * **Comparison of Methods:** It compares two popular approaches for dealing with missing data – neural networks and Expectation Maximisation – which is valuable for understanding the strengths and weaknesses of different techniques. Let me know if you'd like to explore other papers based on specific aspects of machine learning! <end_of_turn> <end_of_turn>