
From Zero 🙎🏾 to Hero 🦸🏾: Mastering Generative AI with MongoDB


Open In Colab

AI Learning Hub For Developers

What to Expect

Part 1: Foundations of Generative AI & Search

  • Comprehensive understanding of Generative AI applications
  • In-depth code walkthroughs of various retrieval mechanisms including text search, vector search, and hybrid search
  • Exploration of Voyage AI and embedding generation techniques

Part 2: Building Intelligent Search Systems

  • Hands-on implementation of semantic search mechanisms
  • Practical development of Retrieval Augmented Generation (RAG) systems

Part 3: Advanced AI Agents & Integration

  • Introduction to AI Agents and their capabilities
  • Step-by-step implementation of Agentic RAG with MongoDB
  • Building AI agents with the OpenAI Agents SDK

Part 4: Agentic Chat System

  • Build an agentic chatbot that can answer user queries
  • Implement persistent chat history tracking
  • Preserve conversation context across interactions
  • Implement advanced query-answering mechanisms

How to use this notebook:

  • Execute each cell block sequentially
  • Look out for checkpoints ⛳ for key learning takeaways
  • Look out for key information 🔑 for insights that are useful in LLM application development
  • Use the external links provided to gain access to a free MongoDB Atlas account, a Voyage AI API key, and any other required resources

💼 Use Case: Virtual Primary Care Assistant for a Medical Pharmacy


Overview

The Virtual Primary Care Assistant leverages MongoDB's vector search capabilities to provide CVS Pharmacy customers with reliable medical information and personalized guidance based on medication reviews and health conditions. This intelligent assistant integrates with the pharmacy's existing customer data infrastructure to offer a comprehensive health support experience.

Key Features

  • Medication Information Retrieval: Users can ask questions about medications and receive accurate information about dosage, side effects, and drug interactions.
  • Experience-Based Insights: Leverages real patient reviews and experiences to provide context-rich responses about medication effectiveness for specific conditions.
  • Symptom Assessment: Helps users understand possible conditions based on symptoms and suggests when to seek professional medical care.
  • Personalized Recommendations: Provides tailored guidance by considering the user's prescription history, health profile, and previous interactions.

Technical Implementation

  • MongoDB serves as the knowledge base, storing structured medication data and vector embeddings of patient reviews
  • Vector search enables semantic understanding of user queries about medications and conditions
  • Hybrid search combines keyword and semantic matching for optimal retrieval of relevant information
  • RAG architecture integrates retrieval results with LLM processing to generate accurate, contextual responses
  • Agentic capabilities allow the system to determine when to search for information versus when to recommend professional consultation

Business Value

  • Reduces call center volume by answering common medication questions
  • Improves medication adherence through accessible information and reminders
  • Enhances customer satisfaction by providing 24/7 access to reliable health guidance
  • Generates insights on common customer concerns to inform product offerings and services


Part 1: Foundations of Generative AI & Search


  • Understanding Generative AI Applications
    • Core concepts and architecture
    • LLMs and their capabilities
    • Real-world use cases and limitations
  • Retrieval Mechanisms Deep Dive
    • Traditional text search techniques
    • Vector search fundamentals
    • Hybrid search approaches and when to use each
  • Embedding Generation with Voyage AI
    • Introduction to embeddings and their importance
    • Working with Voyage AI embedding models
    • Optimizing embedding generation for different content types

Step 1: Importing Libraries

Install the necessary libraries for the notebook

  • pymongo: the MongoDB Python driver, used to connect to the MongoDB Atlas cluster.
  • voyageai: the Voyage AI Python client, used to generate embeddings for our datasets.
  • pandas: data manipulation and analysis, used to load the data and prepare it for vector search.
  • datasets: used to load and manage the healthcare and drug review datasets.
  • matplotlib: used to plot and visualize the data.
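Before running the install cell, it can help to check which of the packages above are already available in your environment. This is a small, optional sketch using only the standard library (the package names are taken from the list above):

```python
import importlib.util

# Packages used throughout this notebook.
REQUIRED_PACKAGES = ["pymongo", "voyageai", "pandas", "datasets", "matplotlib"]

def missing_packages(packages):
    """Return the subset of packages that are not importable in this environment."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print(f"Install missing packages with: pip install {' '.join(missing)}")
else:
    print("All required packages are available.")
```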
[ ]

Create the helper function set_env_securely to securely prompt for and set environment variables without exposing their values in the notebook.
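A minimal sketch of such a helper might look like the following, using the standard library's getpass so the secret never appears in cell output (the exact implementation in the notebook may differ):

```python
import os
from getpass import getpass

def set_env_securely(var_name: str, prompt: str) -> None:
    """Prompt for a secret only if the environment variable is not already set.

    getpass hides the typed value, keeping API keys out of the notebook output.
    """
    if not os.environ.get(var_name):
        os.environ[var_name] = getpass(prompt)

# Example usage (prompts only when the variable is unset):
# set_env_securely("VOYAGE_API_KEY", "Enter your Voyage AI API key: ")
```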

[17]

Step 2: Data Loading and Preparation

For this Virtual Primary Care Assistant, we're working with two complementary datasets:

  1. ChatDoctor-HealthCareMagic-100k
  • This dataset contains doctor-patient conversations about medical conditions and treatments
  • It provides authentic patient questions and professional medical responses
  • We use this data to help our system understand medical queries and provide informed responses
  2. Drug Reviews Dataset
  • Contains patient-reported experiences with various medications
  • Includes information about conditions treated, effectiveness ratings, and detailed reviews
  • Provides valuable real-world insights on medication effects and side effects

The structure of these datasets is as follows:

Healthcare Conversation Dataset:

  • input: Patient's medical question or symptom description
  • output: Doctor's medical advice or response

Drug Reviews Dataset:

  • drugName: Name of the medication
  • condition: Medical condition being treated
  • review: Patient's detailed experience with the medication
  • rating: Numerical rating (1-10) of the patient's satisfaction

These datasets provide complementary information that allows our system to understand medical questions, provide contextual information about medications, and offer personalized guidance based on real patient experiences.
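The schemas above can be illustrated with a couple of hand-written rows in pandas; the sample values and the combined "text" field below are illustrative assumptions, not rows from the real datasets:

```python
import pandas as pd

# Illustrative rows mirroring the schemas described above; the real data would
# be loaded from the ChatDoctor-HealthCareMagic-100k and Drug Reviews datasets.
conversations = pd.DataFrame([
    {"input": "I have a persistent dry cough at night. What could cause it?",
     "output": "Nighttime dry cough can be caused by post-nasal drip, asthma, or reflux..."},
])

drug_reviews = pd.DataFrame([
    {"drugName": "Lisinopril", "condition": "High Blood Pressure",
     "review": "Lowered my blood pressure within two weeks with mild dizziness at first.",
     "rating": 8},
])

# Concatenating fields into a single "text" column is a common preparation
# step before generating embeddings.
drug_reviews["text"] = (
    drug_reviews["drugName"] + " (" + drug_reviews["condition"] + "): " + drug_reviews["review"]
)
```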

[18]
[19]
[20]
[21]
[22]
[ ]
[ ]

Step 4: Embedding Generation with Voyage AI

In this step, we will generate embeddings for our datasets using the Voyage AI API.

We will use the voyage-3-large model to generate the embeddings.

One important thing to note: although you are expected to provide a credit card for the Voyage AI API, your first 200 million tokens are free for every account, and subsequent usage is priced on a per-token basis.

Go here for more information on getting your API key and setting it in the environment variables.

[29]
[30]

The get_embedding function is used to generate the embeddings for the text using the voyage-3-large model.

The function takes a text string as input and returns the embedding vector as a list of floats.

It also takes an optional input_type argument, which can be set to "document" or "query" to tell the model whether it is embedding stored content or a search query.
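A sketch of such a function is shown below. The client is passed in as a parameter so the logic is easy to test; it is assumed to expose a Voyage-style embed method returning an object with an embeddings attribute (as voyageai.Client does, though the exact signature may differ slightly):

```python
from typing import List

def get_embedding(client, text: str, input_type: str = "document",
                  model: str = "voyage-3-large") -> List[float]:
    """Generate an embedding for `text` using a Voyage-style client.

    `input_type` should be "document" for stored content and "query" for
    search queries, matching the usage described above.
    """
    if not text or not text.strip():
        raise ValueError("Cannot embed empty text")
    result = client.embed([text], model=model, input_type=input_type)
    return result.embeddings[0]
```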

[31]
[ ]
[ ]
[ ]
[ ]

Step 5: MongoDB (Operational and Vector Database)

MongoDB acts as both an operational and vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.

Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

[36]
[37]
[ ]

Step 6: Index Creation

What is a Vector Search Index and Why Do We Need It?

A vector search index organizes high-dimensional embeddings for efficient similarity searches. Without it, finding similar vectors would require exhaustive comparisons against every vector in your database—becoming impractical at scale. These indexes enable fast semantic searches by organizing vectors based on their geometric relationships, essential for RAG, recommendation systems, and semantic search.

Understanding HNSW (Hierarchical Navigable Small Worlds)

HNSW is MongoDB Atlas Vector Search's algorithm of choice for approximate nearest neighbor searches:

  • Creates a multi-layered graph connecting vectors to their nearest neighbors
  • Enables logarithmic search complexity through a hierarchical approach
  • Balances speed and accuracy via configurable parameters
  • Provides excellent performance characteristics for production applications

What is a Search Index and Why Do We Need It?

Traditional search indexes improve retrieval speed for non-vector operations:

  • Fast filtering on metadata fields (dates, categories, etc.)
  • Supporting hybrid search combining keywords and semantics
  • Optimizing sorting and standard database operations

In this step, we will create two critical indexes for our datasets:

  1. A vector search index (Float32 ANN Index) for the embedding field to enable semantic similarity searches
  2. A traditional search index on text fields to support keyword-based filtering and hybrid search approaches

Together, these indexes will form the foundation of our information retrieval system, allowing for both precise keyword matching and nuanced semantic understanding.

Create vector search indexes
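Such a vector index can be sketched as a plain definition document. The field name "embedding", the 1024 dimensions (the default output size of voyage-3-large), and cosine similarity are assumptions based on the setup described earlier:

```python
def vector_index_definition(field: str = "embedding", dims: int = 1024,
                            similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition document."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": field,
                "numDimensions": dims,
                "similarity": similarity,  # "cosine", "euclidean", or "dotProduct"
            }
        ]
    }
```

With pymongo, a definition like this would typically be wrapped in a SearchIndexModel and passed to collection.create_search_index.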

[39]
[40]
[41]
[ ]

Create Search Index

[43]
[44]
[45]
[ ]
[ ]

Step 7: Data Ingestion

[ ]

Step 8: Implementing Powerful Full-Text Search Capabilities

In this step, we'll develop a robust full-text search function that leverages MongoDB's text search capabilities. This function will enable precise keyword matching across our datasets, allowing users to find exact information quickly and efficiently.
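The core of such a function is an aggregation pipeline built around the Atlas $search stage. In this sketch the index name "text_search_index" and the "review" field are illustrative assumptions; the pipeline would be executed with collection.aggregate(pipeline):

```python
def full_text_search_pipeline(query: str, index_name: str = "text_search_index",
                              field: str = "review", top_n: int = 5) -> list:
    """Build an aggregation pipeline for Atlas full-text search over `field`."""
    return [
        {
            "$search": {
                "index": index_name,
                "text": {"query": query, "path": field},
            }
        },
        {"$limit": top_n},
        # Surface the text search relevance score alongside the matched field.
        {"$project": {"_id": 0, field: 1, "score": {"$meta": "searchScore"}}},
    ]
```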

[49]
[50]
[ ]

Step 9: Define Semantic Search Function (Vector Search)

The semantic_search_with_mongodb function performs a vector search in the MongoDB collection based on the user query.

Semantic search and vector search are intrinsically connected—semantic search is the application of vector search technology to understand the meaning behind queries rather than just matching keywords. Vector search powers semantic search by converting text into numerical vector representations (embeddings) that capture semantic meaning, allowing the system to find content with similar meanings even when the exact words differ.

  • user_query parameter is the user's query string.
  • collection parameter is the MongoDB collection to search.
  • top_n parameter is the number of top results to return.
  • vector_search_index_name parameter is the name of the vector search index to use for the search.

The numCandidates parameter is the number of candidate matches the index considers before returning the top results. Here it is set to 150; higher values improve recall at the cost of query latency.

Another point to note: queries in MongoDB are performed with the aggregate function, enabled by the MongoDB Query Language (MQL).

This allows for more flexible and complex searches, with data processing operations defined as stages in a pipeline. If you are a data engineer, data scientist, or ML engineer, pipeline processing will be a familiar concept.
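Putting this together, a semantic search pipeline built around the $vectorSearch stage might be sketched as follows (the "embedding" path and "review" projection field are assumptions about the collection layout):

```python
def vector_search_pipeline(query_embedding: list, index_name: str,
                           path: str = "embedding", top_n: int = 5,
                           num_candidates: int = 150) -> list:
    """Build a $vectorSearch aggregation pipeline.

    `query_embedding` is the vector produced by embedding the user's query
    (with input_type="query").
    """
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_embedding,
                "numCandidates": num_candidates,
                "limit": top_n,
            }
        },
        # Surface the similarity score alongside the document fields.
        {"$project": {"_id": 0, "review": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```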

[52]
[53]
[ ]

⛳ Knowledge Checkpoint:

You now understand semantic search and vector search, including:

  • How semantic search leverages vector search technology to find content based on meaning rather than exact keyword matches
  • The relationship between text embeddings and vector search functionality
  • How MongoDB implements vector search through the $vectorSearch operator
  • The role of similarity metrics in determining relevance between queries and documents
  • Why vector search enables more natural language understanding in search systems
  • The practical implementation of semantic search in a MongoDB pipeline

This foundation will be essential as we progress toward building more sophisticated retrieval and generation systems.

Step 10: Define Hybrid Search Function

The hybrid_search_with_mongodb function conducts a hybrid search on a MongoDB Atlas collection that combines a vector search and a full-text search using Atlas Search.

In the MongoDB hybrid search function, there are two weights:

  • vector_weight = 0.5: This weight scales the score obtained from the vector search portion.
  • full_text_weight = 0.5: This weight scales the score from the full-text search portion.

These weights control the influence of each search component on the final score.

Here's how they work:

Purpose: The weights allow you to adjust how much the vector (semantic) search and the full-text search contribute to the overall ranking. For example, a higher full_text_weight means that the full-text search results will have a larger impact on the final score, whereas a higher vector_weight would give more importance to the vector similarity score.

Usage in the Pipeline: Within the aggregation pipeline, after retrieving results from each search type, the function computes a reciprocal ranking score for each result (using an expression like 1/(rank + 60)). This score is then multiplied by the corresponding weight:

Vector Search:

    "vs_score": {
        "$multiply": [vector_weight, {"$divide": [1.0, {"$add": ["$rank", 60]}]}]
    }

Full-Text Search:

    "fts_score": {
        "$multiply": [full_text_weight, {"$divide": [1.0, {"$add": ["$rank", 60]}]}]
    }

Finally, these weighted scores are combined (typically by adding them together) to produce a final score that determines the ranking of the documents.

Impact: By adjusting these weights, you can fine-tune the search results to better match your application's needs. For instance, if the full-text component is more reliable for your dataset, you might set full_text_weight higher than vector_weight.

The weights in the MongoDB function allow you to balance the contributions from vector-based and full-text search components, ensuring that the final ranking score reflects the desired importance of each search method.
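The weighted reciprocal-rank combination described above can be reproduced in plain Python, which makes the effect of the weights easy to inspect (1-based ranks are assumed here):

```python
def fuse_scores(vector_ranked: list, text_ranked: list,
                vector_weight: float = 0.5, full_text_weight: float = 0.5,
                k: int = 60) -> dict:
    """Combine two ranked lists of document ids using the weighted
    1/(rank + 60) reciprocal-rank expression from the pipeline above."""
    scores = {}
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (rank + k)
    for rank, doc_id in enumerate(text_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + full_text_weight / (rank + k)
    # Highest combined score first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# A document ranked first by both methods outscores one found by only one method.
fused = fuse_scores(["a", "b"], ["a", "c"])
```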

[55]
[56]
[ ]

⛳ Knowledge Checkpoint:

You now understand how to implement hybrid search by:

  1. Combining vector search for semantic understanding with text search for keyword matching
  2. Weighting these different search strategies based on query characteristics
  3. Using MongoDB's aggregation pipeline to merge and rank results from different search methods
  4. Calculating combined relevance scores that leverage both search technologies

Part 2: Building Intelligent Search Systems (RAG)


  • Practical development of Retrieval Augmented Generation (RAG) systems

Step 1: Importing Libraries

[ ]
[59]

Step 2: Setting up the LLM

[60]
[ ]

Step 3: Setting Up The RAG Pipeline

This step establishes our Retrieval-Augmented Generation (RAG) system, which enhances LLM responses with contextually relevant information:

  1. Define the custom_rag_pipeline function
  • Create a comprehensive function that orchestrates all components of our RAG system
  • Establish parameters for search strategy, result count, and response formatting
  2. Implement the retrieval component
  • Process the user's query to identify key information needs
  • Execute our hybrid search mechanism (combining vector and keyword search)
  • Apply relevance filtering to ensure only high-quality results are used
  3. Process retrieved documents for context
  • Extract and consolidate the most relevant information from search results
  • Format the retrieved content to optimize context window usage
  • Structure the information to provide clear attribution and sources
  4. Augment the LLM prompt with retrieved context
  • Combine the user's original query with the retrieved information
  • Apply prompt engineering techniques to guide the model's use of context
  • Ensure the model distinguishes between provided context and its own knowledge
  5. Generate and refine the final response
  • Process the LLM's output to ensure accuracy and relevance
  • Format the response according to user preferences
  • Include citations and references to source documents when appropriate
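The steps above can be sketched as a provider-agnostic pipeline. The search_fn and llm_fn parameters and the "text" document key are illustrative assumptions; injecting the search and model functions keeps the orchestration logic independent of any particular provider:

```python
def custom_rag_pipeline(user_query: str, search_fn, llm_fn, top_n: int = 5) -> str:
    """Sketch of the RAG flow: retrieve, build context, augment, generate.

    `search_fn(query, top_n)` returns retrieved documents (dicts with a
    "text" key) and `llm_fn(prompt)` returns the model's response.
    """
    # 1-2. Retrieve relevant documents for the query.
    documents = search_fn(user_query, top_n)

    # 3. Consolidate retrieved content into a numbered context block for citation.
    context = "\n".join(
        f"[{i}] {doc['text']}" for i, doc in enumerate(documents, start=1)
    )

    # 4. Augment the prompt so the model grounds its answer in the context.
    prompt = (
        "Answer the question using only the numbered context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {user_query}"
    )

    # 5. Generate the final response.
    return llm_fn(prompt)
```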
[62]
[ ]

⛳ Knowledge Checkpoint: RAG Pipeline Implementation

You now understand how to build a complete Retrieval-Augmented Generation pipeline with MongoDB, including:

  • Retrieving relevant documents using hybrid search that combines semantic and keyword matching
  • Formatting retrieved documents with proper citations and source attribution
  • Creating effective prompts that guide the LLM to use the retrieved context appropriately
  • Configuring the LLM to prioritize factual responses based on provided information
  • Managing the end-to-end flow from user query to contextualized LLM response

This pattern enables applications to leverage both the structured data in your MongoDB collections and the reasoning capabilities of large language models while maintaining accuracy and traceability.

Part 3: Advanced AI Agents & Integration


Step 1: Importing Libraries

[ ]

Step 2: Creating A Minimal Agent

An agent is a computational entity capable of acting autonomously on behalf of another entity to achieve specific objectives. It accomplishes these goals by processing inputs from its environment and leveraging available technical resources such as microservices, REST APIs, and functions.

In the context of generative AI, the definition extends to include large language models (LLMs) that are guided by system instructions, equipped with various tools, and augmented with memory components.

It is important to note that the definition of an agent is not standardized. Nonetheless, there is a growing consensus that various software systems can exhibit agentic characteristics, suggesting that agency exists on a spectrum.

[TODO: Include image of agentic spectrum and you can add levels]

Two main modules from the OpenAI SDK are used:

  1. Agent: The Agent module in the OpenAI Agents SDK provides a robust framework for creating autonomous computational entities. It streamlines the process of building intelligent agents with a well-defined structure that supports customization, scalability, and integration with external tools and services. All agents share common properties such as name, instructions, model, and tools.

  2. Runner: The execution engine that drives agent interactions. It handles the entire lifecycle of an agent's run, from initiating LLM calls to processing outputs and managing transitions.

  • Runner Execution Methods:
    • run(): An asynchronous method that executes the agent’s process and returns a RunResult.
    • run_sync(): A synchronous version that internally calls run().
    • run_streamed(): Executes the agent asynchronously in streaming mode, returning events as they are generated by the LLM, and ultimately a complete RunResultStreaming object.

Note: using run_sync() within a Jupyter Notebook or Google Colab environment will not work, because those environments already run their own event loop.

Below, we will create a Minimal Agent.

A Minimal Agent is a large language model equipped with an instructional or system prompt that continuously operates in a loop until the desired outcome is achieved.

Our minimal agent is a deep research agent named "Virtual Primary Care Assistant", assigned the OpenAI o3-mini model and provided with detailed instructions on how it should behave and format its outputs.
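The "LLM in a loop until the outcome is achieved" idea can be illustrated without any SDK at all. In this toy sketch the model function is injected, and the "FINAL:" prefix is an illustrative stopping convention of our own, not part of the OpenAI Agents SDK:

```python
def minimal_agent(llm_fn, system_prompt: str, task: str, max_turns: int = 5) -> str:
    """A toy agent loop: call the model repeatedly until it signals completion.

    `llm_fn(messages)` returns the assistant's next message as a string.
    """
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": task}]
    reply = ""
    for _ in range(max_turns):
        reply = llm_fn(messages)
        messages.append({"role": "assistant", "content": reply})
        # Our illustrative completion signal: the model prefixes its answer.
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
        # Otherwise, ask the model to keep working toward the goal.
        messages.append({"role": "user", "content": "Continue."})
    return reply
```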

[65]
[66]
[ ]
[ ]

Step 3: Agentic RAG: AI Agents with Retrieval Tools

[ ]
[141]
[142]
[ ]
[ ]

Step 4: Robust Agent (Multiple Tools)

[145]

Let's update our agent instructions to ensure it knows when to use the right tools.

[146]
[147]
[ ]
[149]
[ ]


Step 5: Agents as Tools (Orchestration)

[101]
[102]
[103]
[129]
[130]
[132]
[ ]


Part 4: Agentic Chat System


This section demonstrates an Agentic Chat System that enhances the virtual primary care assistant by maintaining a complete conversation history. The system features:

  • Persistent Chat History: Every interaction, including the user’s input and the agent’s response, is stored along with a timestamp.
  • Contextual Input: On each turn, the complete conversation history is appended to the agent's input, ensuring that the context is preserved throughout the conversation.
  • Session Management with Thread IDs: Each message is tagged with a thread ID to uniquely identify the session, making it easy to track and retrieve conversation history.
  • Ordered Retrieval: The chat history can be retrieved by providing a thread ID, with all records ordered by their timestamps.

Below is the complete code implementation for the Agentic Chat System.
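As a minimal sketch of the four features above, the history store can be modeled as a class keyed by thread ID with timestamped records. This in-memory version is for illustration only; in the notebook the records would live in a MongoDB collection:

```python
from datetime import datetime, timezone

class ChatHistory:
    """In-memory sketch of persistent, thread-scoped chat history."""

    def __init__(self):
        self._records = []

    def add_message(self, thread_id: str, role: str, content: str) -> None:
        """Store one turn of the conversation with a timestamp."""
        self._records.append({
            "thread_id": thread_id,
            "role": role,  # "user" or "assistant"
            "content": content,
            "timestamp": datetime.now(timezone.utc),
        })

    def get_history(self, thread_id: str) -> list:
        """Return all records for a session, ordered by timestamp."""
        return sorted(
            (r for r in self._records if r["thread_id"] == thread_id),
            key=lambda r: r["timestamp"],
        )
```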

[ ]
[153]
[154]
[155]
[ ]