Notebooks
M
MongoDB
Rag Chatbot With Cohere And Mongodb

Rag Chatbot With Cohere And Mongodb

agentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag

Build Chatbots That Know Your Business (with MongoDB and Cohere)

Open In Colab

What you will learn:

  • How to empower leverage semantic search on customer or operational data in MongoDB Atlas.
  • Pass retrieved data to Cohere’s Command R+ generative model for retrieval-augmented generation (RAG).
  • Develop and deploy a RAG-optimized user interface for your app.
  • Create a conversation data store for your RAG chatbot using MongoDB

Use Case: Develop an advanced chatbot assistant that provides asset managers with information and actionable insights on technology company market reports.

Introduction

  • What is Cohere?
  • What is MongoDB?
  • How Cohere and MongoDB work together?

What is Cohere?

cohere_overview.png

What is MongoDB?

Screenshot 2024-07-24 at 11.44.56.png

What exactly are we showing today?

image.png

Step 1: Install libaries and Set Environment Variables

🚨Critical Security Reminder: Safeguard your production environment by never committing sensitive information, such as environment variable values, to public repositories. This practice is essential for maintaining the security and integrity of your systems.

Libraries:

  • cohere: A Python library for accessing Cohere's large language models, enabling natural language processing tasks like text generation, classification, and embedding.
  • pymongo: The recommended Python driver for MongoDB, allowing Python applications to interact with MongoDB databases for data storage and retrieval.
  • datasets: A library by Hugging Face that provides easy access to a wide range of datasets for machine learning and natural language processing tasks. *tqdm: A fast, extensible progress bar library for Python, useful for displaying progress in long-running operations or loops.
[26]
[27]

Step 2: Data Loading and Preparation

Dataset Information

This dataset contains detailed information about multiple technology companies in the Information Technology sector. For each company, the dataset includes:

  1. Company name and stock ticker symbol
  2. Market analysis reports for recent years (typically 2023 and 2024), which include:
  • Title and author of the report
  • Date of publication
  • Detailed content covering financial performance, product innovations, market position, challenges, and future outlook
  • Stock recommendations and price targets
  1. Key financial metrics such as:
  • Current stock price
  • 52-week price range
  • Market capitalization
  • Price-to-earnings (P/E) ratio
  • Dividend yield
  1. Recent news items, typically including:
  • Date of the news
  • Headline
  • Brief summary

The market analysis reports provide in-depth information about each company's performance, innovations, challenges, and future prospects. They offer insights into the companies' strategies, market positions, and potential for growth.

[28]
[29]
[30]
[31]

Step 3: Embedding Generation with Cohere

[32]
Generating embeddings: 100%|██████████| 63/63 [00:07<00:00,  8.85it/s]
We just computed 63 embeddings.

[33]

Step 4: MongoDB Vector Database and Connection Setup

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
  3. Create the database: asset_management_use_case.
  4. Within the database asset_management_use_case, create the collection market_reports.
  5. Create a vector search index named vector_index for the ‘listings_reviews’ collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the data collection vector search index.

Your vector search index created on MongoDB Atlas should look like below:

	{
  "fields": [
    {
      "numDimensions": 1024,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}


Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

[34]
[35]
Connection to MongoDB successful
[36]
DeleteResult({'n': 63, 'electionId': ObjectId('7fffffff000000000000002b'), 'opTime': {'ts': Timestamp(1721913981, 63), 't': 43}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1721913981, 63), 'signature': {'hash': b'cU;+\xe3\xbdRc\t\x80\xad\x03\x16\x11\x18\xe6s\xebF\x01', 'keyId': 7353740577831124994}}, 'operationTime': Timestamp(1721913981, 63)}, acknowledged=True)

Step 5: Data Ingestion

MongoDB's Document model and its compatibility with Python dictionaries offer several benefits for data ingestion.

  • Document-oriented structure:
    • MongoDB stores data in JSON-like documents: BSON(Binary JSON).
    • This aligns naturally with Python dictionaries, allowing for seamless data representation using key value pair data structures.
  • Schema flexibility:
    • MongoDB is schema-less, meaning each document in a collection can have a different structure.
    • This flexibility matches Python's dynamic nature, allowing you to ingest varied data structures without predefined schemas.
  • Efficient ingestion:
    • The similarity between Python dictionaries and MongoDB documents allows for direct ingestion without complex transformations.
    • This leads to faster data insertion and reduced processing overhead.

Screenshot 2024-07-24 at 12.33.36.png

[37]
Data ingestion into MongoDB completed

Step 6: MongoDB Query language and Vector Search

Query flexibility

MongoDB's query language is designed to work well with document structures, making it easy to query and manipulate ingested data using familiar Python-like syntax.

Aggregation Pipeline

MongoDB's aggregation pipelines is a powerful feature of the MongoDB Database that allows for complex data processing and analysis within the database. Aggregation pipeline can be thought of similarly to pipelines in data engineering or machine learning, where processes operate sequentially, each stage taking an input, performing operations, and providing an output for the next stage.

Stages

Stages are the building blocks of an aggregation pipeline. Each stage represents a specific data transformation or analysis operation. Common stages include:

  • $match: Filters documents (similar to WHERE in SQL)
  • $group: Groups documents by specified fields
  • $sort: Sorts the documents
  • $project: Reshapes documents (select, rename, compute fields)
  • $limit: Limits the number of documents
  • $unwind: Deconstructs array fields
  • $lookup: Performs left outer joins with other collections

Screenshot 2024-07-25 at 12.26.20.png

[44]

Step 7: Add the Cohere Reranker

Cohere rerank functions as a second stage search that can improve the precision of your first stage search results

image.png

[45]
[46]
[47]

Step 8: Handling User Queries

[48]
Final answer:
Here is an overview of the companies with negative market reports or sentiment that might deter long-term investment:

## GreenEnergy Corp (GRNE):
- **Challenges**: Despite solid financial performance and a positive market position, GRNE faces challenges due to the volatile political environment and rising trade tensions, resulting in increased tariffs and supply chain disruptions. 
- **Regulatory Scrutiny**: The company is under scrutiny for its data handling practices, raising concerns about potential privacy breaches and ethical dilemmas.

## BioEngineering Corp (BENC):
- **Regulatory Hurdles**: BENC faces delays in obtaining approvals for certain products due to stringent healthcare regulations, impacting their time-to-market.
- **Reimbursement and Pricing Pressures**: As healthcare costs rise, the company must carefully navigate pricing strategies to balance accessibility and profitability.
- **Research and Development Expenses**: BENC has experienced a significant increase in R&D expenses, which may impact its ability to maintain a competitive pricing strategy.

## QuantumSensor Corp (QSCP):
- **Supply Chain Disruptions**: QSCP has faced supply chain issues due to global logistics problems and geopolitical tensions, impacting production and delivery.
- **Regulatory Scrutiny**: The company is under scrutiny for its data collection and handling practices, with potential privacy and ethical concerns.
- **Technical Workforce Challenges**: Attracting and retaining skilled talent in a competitive market has been challenging for QSCP.
[49]
start=122 end=145 text='GreenEnergy Corp (GRNE)' document_ids=['doc_0']
start=151 end=161 text='Challenges' document_ids=['doc_0']
start=173 end=231 text='solid financial performance and a positive market position' document_ids=['doc_0']
start=266 end=322 text='volatile political environment and rising trade tensions' document_ids=['doc_0']
start=337 end=384 text='increased tariffs and supply chain disruptions.' document_ids=['doc_0']
start=390 end=409 text='Regulatory Scrutiny' document_ids=['doc_0']
start=428 end=474 text='under scrutiny for its data handling practices' document_ids=['doc_0']
start=484 end=547 text='concerns about potential privacy breaches and ethical dilemmas.' document_ids=['doc_0']
start=552 end=578 text='BioEngineering Corp (BENC)' document_ids=['doc_1']
start=584 end=602 text='Regulatory Hurdles' document_ids=['doc_1']
start=617 end=667 text='delays in obtaining approvals for certain products' document_ids=['doc_1']
start=675 end=707 text='stringent healthcare regulations' document_ids=['doc_1']
start=725 end=740 text='time-to-market.' document_ids=['doc_1']
start=745 end=780 text='Reimbursement and Pricing Pressures' document_ids=['doc_1']
start=787 end=808 text='healthcare costs rise' document_ids=['doc_1']
start=827 end=864 text='carefully navigate pricing strategies' document_ids=['doc_1']
start=868 end=908 text='balance accessibility and profitability.' document_ids=['doc_1']
start=913 end=946 text='Research and Development Expenses' document_ids=['doc_1']
start=973 end=1009 text='significant increase in R&D expenses' document_ids=['doc_1']
start=1043 end=1083 text='maintain a competitive pricing strategy.' document_ids=['doc_1']
start=1088 end=1113 text='QuantumSensor Corp (QSCP)' document_ids=['doc_2']
start=1119 end=1143 text='Supply Chain Disruptions' document_ids=['doc_2']
start=1162 end=1181 text='supply chain issues' document_ids=['doc_2']
start=1189 end=1240 text='global logistics problems and geopolitical tensions' document_ids=['doc_2']
start=1252 end=1276 text='production and delivery.' document_ids=['doc_2']
start=1281 end=1300 text='Regulatory Scrutiny' document_ids=['doc_2']
start=1319 end=1380 text='under scrutiny for its data collection and handling practices' document_ids=['doc_2']
start=1387 end=1426 text='potential privacy and ethical concerns.' document_ids=['doc_2']
start=1431 end=1461 text='Technical Workforce Challenges' document_ids=['doc_2']
start=1465 end=1528 text='Attracting and retaining skilled talent in a competitive market' document_ids=['doc_2']

Step 9: Using MongoDB as a Data Store for Conversation History

[50]
[51]

Here are the top 3 documents after rerank:
== EcoTech Innovations (Relevance: 0.0001)
== GreenEnergy Systems (Relevance: 0.0001)
== QuantumComputing Inc (Relevance: 0.0000)
Final answer:
I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided. 

## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector. 

## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector. 

## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry. 

Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.

Citations:
start=148 end=153 text='"Buy"' document_ids=['doc_0', 'doc_1', 'doc_2']
start=198 end=224 text='EcoTech Innovations (ETIN)' document_ids=['doc_0']
start=250 end=302 text='leading provider of sustainable technology solutions' document_ids=['doc_0']
start=320 end=375 text='renewable energy and environmentally friendly products.' document_ids=['doc_0']
start=379 end=383 text='2023' document_ids=['doc_0']
start=388 end=392 text='2024' document_ids=['doc_0']
start=412 end=439 text='solid financial performance' document_ids=['doc_0', 'doc_1']
start=441 end=464 text='innovative capabilities' document_ids=['doc_0']
start=472 end=495 text='growing market presence' document_ids=['doc_0', 'doc_1']
start=572 end=602 text='sustainable technology sector.' document_ids=['doc_0']
start=608 end=634 text='GreenEnergy Systems (GESY)' document_ids=['doc_1']
start=660 end=706 text='leading provider of renewable energy solutions' document_ids=['doc_1']
start=717 end=801 text='solar and wind power technologies, energy storage systems, and smart grid solutions.' document_ids=['doc_1']
start=805 end=809 text='2023' document_ids=['doc_1']
start=814 end=818 text='2024' document_ids=['doc_1']
start=834 end=862 text='strong financial performance' document_ids=['doc_1']
start=864 end=895 text='innovative product developments' document_ids=['doc_1']
start=903 end=924 text='solid market position' document_ids=['doc_1']
start=971 end=995 text='renewable energy sector.' document_ids=['doc_1']
start=1001 end=1029 text='QuantumComputing Inc. (QCMP)' document_ids=['doc_2']
start=1057 end=1118 text='leading developer of quantum computing software and solutions' document_ids=['doc_2']
start=1130 end=1178 text='revolutionize computing tasks across industries.' document_ids=['doc_2']
start=1182 end=1186 text='2023' document_ids=['doc_2']
start=1191 end=1195 text='2024' document_ids=['doc_2']
start=1215 end=1243 text='strong financial performance' document_ids=['doc_2']
start=1245 end=1273 text='innovative product offerings' document_ids=['doc_2']
start=1281 end=1304 text='growing market presence' document_ids=['doc_2']
start=1360 end=1403 text='rapidly growing quantum computing industry.' document_ids=['doc_2']
[52]
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can tell you about some companies that have been recommended as "Buy" investments in the documents provided. 

## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity. 

## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market. 

## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects. 

Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and therefore cannot comment on what the single "best" investment is. However, I can provide you with some companies that have been recommended as "Buy" investments in the documents provided. 

## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to businesses worldwide. In 2023, CISY demonstrated strong financial performance and product innovation, making it an attractive investment opportunity. 

## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported strong financial performance, innovative product developments, and strategic partnerships, positioning it well in a rapidly growing and competitive market. 

## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023, BTCI demonstrated solid financial growth, product innovations, and a strengthened market position, making it an attractive investment option for long-term growth prospects. 

Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I can provide information on companies that have been recommended as "Buy" investments in the documents provided. 

## CloudInfra Systems (CISY)
CloudInfra Systems is a leading provider of cloud computing solutions, offering infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) to a diverse range of businesses. In 2023, CISY demonstrated strong financial performance and product innovation, positioning it well in the competitive cloud market. 

## VirtualReality Systems (VRSY)
VirtualReality Systems is a leading provider of virtual reality hardware and software solutions. In 2023, VRSY reported robust financial results, innovative product developments, and strategic partnerships, making it a solid investment choice for those with a long-term investment horizon. 

## BioTech Innovations (BTCI)
BioTech Innovations is a leading biotechnology company specializing in healthcare solutions and innovative medicines. In 2023 and 2024, BTCI demonstrated solid financial growth, product innovations, and an improved market position, making it an attractive investment opportunity for long-term growth. 

Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------
USER: What is the best investment to make why?
-------------------------
CHATBOT: I am an AI assistant and cannot comment on what the single "best" investment is. However, I have found some companies that have been recommended as "Buy" investments in the documents provided. 

## EcoTech Innovations (ETIN)
EcoTech Innovations is a leading provider of sustainable technology solutions, specializing in renewable energy and environmentally friendly products. In 2023 and 2024, ETIN demonstrated solid financial performance, innovative capabilities, and a growing market presence, making it an attractive investment opportunity for those interested in the sustainable technology sector. 

## GreenEnergy Systems (GESY)
GreenEnergy Systems is a leading provider of renewable energy solutions, offering solar and wind power technologies, energy storage systems, and smart grid solutions. In 2023 and 2024, GESY reported strong financial performance, innovative product developments, and a solid market position, positioning it well for future growth in the renewable energy sector. 

## QuantumComputing Inc. (QCMP)
QuantumComputing Inc. is a leading developer of quantum computing software and solutions, aiming to revolutionize computing tasks across industries. In 2023 and 2024, QCMP demonstrated strong financial performance, innovative product offerings, and a growing market presence, making it an attractive investment opportunity in the rapidly growing quantum computing industry. 

Please note that these recommendations are based on specific reports and may not consider all factors. It is always advisable to conduct thorough research and consult professional advice before making any investment decisions.
-------------------------