
How To Implement Working Memory in AI Applications With Cohere, Tavily and MongoDB

Open In Colab

Memory is the cornerstone on which all forms of intelligence emerge and evolve. It creates the foundation for human cognition and artificial systems to build complex understanding. For humans, memory is a dynamic biological process of encoding, storing, and retrieving information through neural networks, shaping our ability to learn, adapt, and make decisions.

For computational systems in the modern AI application landscape, such as LLM-powered chatbots, AI Agents, and Agentic systems, memory is the foundation for their reliability, performance, and applicability, determining their capacity to maintain context, learn from interactions, and exhibit consistent, intelligent behavior.

In this tutorial, we will cover:

  • Memory in AI Agents and Agentic Systems
  • How to implement working memory in agentic systems
  • How to use Tavily and MongoDB to implement working memory
  • A practical use case: implementing an AI sales assistant with real-time access to internal product catalogs and online information, showcasing working memory's role in personalized recommendations and user interactions.
  • Benefits of working memory in AI applications in real-time scenarios.

Understanding memory holistically, and being able to implement its various functions within computational systems, positions you at a critical intersection of cognitive architecture design and practical AI development. This expertise becomes increasingly valuable as these paradigms mature and become the dominant form factor of modern AI systems.

Install libraries and set environment variables
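The helper below sketches one way to collect the API keys this notebook needs (Cohere, Tavily, and the MongoDB URI) without hard-coding them; `require_env` is a hypothetical helper name, and the prompt only fires when a variable is not already set:

```python
import os
from getpass import getpass

def require_env(name):
    """Return the named secret, prompting once if it is not already set."""
    if not os.environ.get(name):
        os.environ[name] = getpass(f"Enter your {name}: ")
    return os.environ[name]

# Example: require_env("COHERE_API_KEY"), require_env("TAVILY_API_KEY"),
# require_env("MONGO_URI")
```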

[1]
[2]

Steps 1 - 5: Creating a knowledge base (long-term memory)

In this step, the aim is to create a knowledge base of products that the sales assistant can access via retrieval mechanisms. The retrieval mechanism used in this tutorial is vector search. MongoDB serves as both the operational and vector database for the sales assistant's knowledge base. This means we can conduct a semantic search between the vector embedding of each product, generated from its concatenated attributes, and an embedding of the user's query passed into the assistant.


Step 1: Data Loading

The process begins with data ingestion into MongoDB. The product data, including attributes like product name, category, description, and technical details, is structured into a pandas DataFrame.

The product data used in this example is sourced from the Hugging Face Datasets library using the load_dataset() function. Specifically, it is obtained from the "philschmid/amazon-product-descriptions-vlm" dataset, which contains a vast collection of Amazon product descriptions and related information.
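As a rough sketch of this step, the loading and field selection could look like the following; the column names such as "Product Name" and "Category" are assumptions about the dataset's schema:

```python
def select_fields(rows, fields):
    """Keep only the product attributes used downstream for each row."""
    return [{key: row.get(key, "") for key in fields} for row in rows]

# Usage with Hugging Face datasets (requires network access):
# from datasets import load_dataset
# ds = load_dataset("philschmid/amazon-product-descriptions-vlm", split="train")
# products = select_fields(ds, ["Product Name", "Category", "Description"])
# import pandas as pd
# df = pd.DataFrame(products)
```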

[3]
[4]

Step 2: Data Preparation
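
Data preparation here amounts to combining each product's textual attributes into a single `product_semantics` string that will later be embedded. A minimal sketch, with illustrative field names:

```python
def build_product_semantics(doc):
    """Concatenate a product's textual attributes into one string to embed.
    Field names are illustrative; adjust to the dataset's actual columns."""
    parts = (doc.get(key, "") for key in ("Product Name", "Category", "Description"))
    return " ".join(p for p in parts if p).strip()
```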

[5]
[6]
[7]

Step 3: Embedding Generation With Cohere

To facilitate semantic search capabilities, each product document is enriched with embeddings. The get_embedding() function utilizes the Cohere API to generate a numerical representation of each product's semantic meaning. This function leverages the embed-english-v3.0 model from Cohere to embed the combined textual information stored in each document's product_semantics field.
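A sketch of such a function, with the Cohere client passed in as a parameter so it can be exercised without network access; the `input_type` argument is required by Cohere's v3 embed models:

```python
def get_embedding(text, co, model="embed-english-v3.0"):
    """Embed `text` with a Cohere client `co` (e.g. cohere.Client(api_key)).
    Returns None for empty input so blank fields are skipped."""
    if not text or not text.strip():
        return None
    response = co.embed(texts=[text], model=model, input_type="search_document")
    return response.embeddings[0]
```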

[8]
Enter your Cohere API Key: ··········
[9]
[10]
Embeddings generated successfully

The resulting embeddings are then stored within a dedicated 'embedding' field in each product document. This step enables the system to search for products based on their semantic similarity, allowing for more nuanced and relevant recommendations.

[11]

Step 4: Data Ingestion To MongoDB

MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

  1. First, register for a MongoDB Atlas account. For existing users, sign into MongoDB Atlas.
  2. Follow the instructions and select the Atlas UI as the procedure to deploy your first cluster.

Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.

[12]
Enter your MONGO URI: ··········
[13]
[14]
Connection to MongoDB successful
[15]
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff0000000000000038'), 'opTime': {'ts': Timestamp(1731438198, 1), 't': 56}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1731438198, 1), 'signature': {'hash': b",8\xe2#{UQ\xf3\xc3\xbc\x91Q!\x9a!\xb7 \x04'\xfc", 'keyId': 7390008424139849730}}, 'operationTime': Timestamp(1731438198, 1)}, acknowledged=True)

This DataFrame is then converted into a list of dictionaries, each representing a product. The insert_many() method from the pymongo library efficiently inserts these product documents into the MongoDB collection named products within the amazon_products database. This crucial step establishes the foundation of the AI sales assistant's knowledge base, making the product data accessible for downstream retrieval and analysis.
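A minimal version of the ingestion step might look like this; the collection object is injected so the helper can be exercised without a live cluster:

```python
def ingest_products(collection, documents):
    """Bulk-insert product documents. `collection` is a pymongo Collection,
    e.g. client["amazon_products"]["products"]. Returns the inserted count."""
    if not documents:
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)
```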

[16]
Data ingestion into MongoDB completed

Step 5: Vector Index Creation

Retrieving data from MongoDB involves leveraging both traditional queries and vector search. For traditional queries, the pymongo library provides methods like find_one() and find() to retrieve documents based on specific criteria.

MongoDB Atlas Vector Search is used for semantic-based retrieval. This feature allows for efficient similarity searches using the pre-calculated product embeddings. The system can retrieve products that are semantically similar to the query by querying the 'embedding' field with a target embedding.

This approach significantly enhances the AI sales assistant's ability to understand user intent and offer relevant product suggestions. Variables like embedding_field_name and vector_search_index_name are used to configure and interact with the vector search index within MongoDB, ensuring efficient retrieval of similar products.
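For illustration, a `$vectorSearch` aggregation stage of the shape Atlas expects might be assembled like this; index and field names follow the variables above, and `numCandidates` is a tunable recall/latency trade-off:

```python
def vector_search_pipeline(query_embedding, limit=5):
    """Aggregation pipeline for Atlas Vector Search over product embeddings."""
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 10 * limit,  # candidates scanned before ranking
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "product_semantics": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# Usage (requires a live cluster): collection.aggregate(vector_search_pipeline(qv))
```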

Vector indexes also play a crucial role in enabling efficient semantic search within MongoDB. By creating a vector index on the 'embedding' field of the product documents, MongoDB can leverage the HNSW (Hierarchical Navigable Small World) algorithm to perform fast similarity searches. This means that when the AI sales assistant needs to find products similar to a user's query, MongoDB can quickly identify and retrieve the most relevant products based on their semantic embeddings. This significantly improves the system's ability to understand user intent and deliver accurate recommendations in real time.
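A sketch of the index definition and its creation; `numDimensions` is set to 1024 to match Cohere's embed-english-v3.0 output, and the index name and field path follow this tutorial:

```python
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",       # field holding the product embeddings
            "numDimensions": 1024,     # embed-english-v3.0 output size
            "similarity": "cosine",
        }
    ]
}

# Creating the index with pymongo (requires a live Atlas cluster):
# from pymongo.operations import SearchIndexModel
# collection.create_search_index(
#     SearchIndexModel(definition=vector_index_definition,
#                      name="vector_index", type="vectorSearch"))
```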

[17]
[18]
[19]
[20]
[21]
Creating index 'vector_index'...
Waiting for 30 seconds to allow index 'vector_index' to be created...
30-second wait completed for index 'vector_index'.
'vector_index'

Steps 6 - 8: Setting up Tavily for working memory (short-term memory)

The Tavily Hybrid RAG Client forms the core of the AI sales assistant's working memory, bridging the gap between the internal knowledge base stored in MongoDB and the vast external knowledge available online.

Unlike traditional RAG systems that rely solely on retrieving documents from a static knowledge base, adding Tavily to our system introduces a hybrid approach, which combines information from local and foreign sources to provide comprehensive and context-aware responses. This is a form of HybridRAG, as we use two retrieval techniques to supplement the information provided to an LLM.


Step 6: Tavily Hybrid RAG Client setup (Working Memory)

The code snippet below initializes the Tavily Hybrid RAG Client, which is the core component responsible for implementing working memory in AI sales assistants. It imports necessary libraries (pymongo and tavily) and then creates an instance of the TavilyHybridClient class.

During initialization, it configures the client with the Tavily API key, specifies MongoDB as the database provider, and provides references to the MongoDB collection, vector search index, embedding field, and content field.

This setup establishes the connection between Tavily and the underlying knowledge base, enabling the client to perform a hybrid search and manage working memory effectively.
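The wiring can be sketched as follows; parameter names mirror the description above (treat them as assumptions against your SDK version), and in the actual setup `collection` is the pymongo Collection object created earlier:

```python
import os

# Settings connecting Tavily's hybrid client to the MongoDB knowledge base.
tavily_settings = {
    "db_provider": "mongodb",
    "index": "vector_index",               # Atlas Vector Search index
    "embeddings_field": "embedding",       # field with Cohere embeddings
    "content_field": "product_semantics",  # field with the embedded text
}

# from tavily import TavilyHybridClient
# hybrid_rag = TavilyHybridClient(
#     api_key=os.environ["TAVILY_API_KEY"],
#     collection=db["products"],  # pymongo Collection, not just its name
#     **tavily_settings,
# )
```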

[22]
Enter your Tavily API Key: ··········
[23]

Step 7: Retrieving Data From Working Memory (Real-Time Search)
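
A thin wrapper over the hybrid client's search call might look like this; `max_local` and `max_foreign` cap how many results come from each source (the names follow Tavily's hybrid client, but treat them as assumptions):

```python
def hybrid_search(client, query, max_local=5, max_foreign=2):
    """Query working memory: local hits come from MongoDB vector search,
    foreign hits from Tavily's live web search. `client` is the hybrid
    client configured in Step 6 (injected here for easy testing)."""
    return client.search(query, max_local=max_local, max_foreign=max_foreign)
```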

[24]
[25]

Step 8: Save Short-Term Memory Content to Long-Term Memory

There are scenarios where storing new information from the working memory into a long-term memory component within a system is required.

For example, let's assume the user asks for "a black laptop with a long battery life for office use." Tavily might retrieve information about a specific laptop model with long battery life from an external website. By saving this foreign data, the next time a user asks for a "laptop with long battery life," the AI sales assistant can retrieve the previously saved information directly from its local knowledge base, providing a faster and more efficient response.

Below are a few more benefits and rationale for saving foreign data from working memory:

  • Enriched Knowledge Base: By saving foreign data, the AI sales assistant's knowledge base becomes more comprehensive and up-to-date with information from the web. This can significantly improve the relevance and accuracy of future responses.
  • Reduced Latency: Subsequent searches for similar queries will be faster, as the relevant information is now available locally, eliminating the need to query external sources again. This also reduces the operational cost of the entire system.
  • Offline Access: If external sources become unavailable, the AI sales assistant can still provide answers based on the previously saved foreign data, ensuring continuity of service.
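
Conceptually, saving foreign data is a promotion from working memory to long-term memory. A toy, library-free illustration of this behavior (the "origin" field is made up for this sketch; the real client handles this when save_foreign=True):

```python
def promote_foreign_results(results, local_store):
    """Copy web-sourced ("foreign") results into the local knowledge base
    so later queries can be answered from long-term memory."""
    for result in results:
        if result.get("origin") == "foreign":
            local_store.append({**result, "origin": "local"})
    return local_store
```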
[26]
[27]

Note that the items with the following content:

  • "Black Dell Laptops and 2-in-1 PCs Black Dell L..."
  • "HP Stream 14" HD BrightView Laptop, Intel Cele..."

are both sourced from the internet, i.e., a "foreign" source.

[28]
[29]

Observe that the "local" results now include search results that were once "foreign".

Items used in the working memory have been moved to long-term memory.


Benefits of Working Memory For AI Agents and Agentic Systems


Working memory, enabled by Tavily and MongoDB in your AI application stack, offers several key benefits for LLM-powered chatbots, AI agents, and agentic systems, including AI-powered sales assistants:

  1. Enhanced Context and Personalization: AI agents can remember past interactions and user preferences, allowing them to provide more contextually relevant and personalized responses. This is demonstrated in the code through the use of the Tavily Hybrid RAG Client, which stores and retrieves information from both local and foreign sources, allowing the system to recall past interactions.

  2. Improved Efficiency and Speed: Working memory allows AI agents to access previously retrieved information quickly, reducing the need for repeated external queries. This is evident in the code where the save_foreign=True parameter enables saving foreign data into the local knowledge base, accelerating future searches for similar information.

  3. Increased Knowledge Base and Adaptability: By saving foreign data, AI agents can continuously expand their knowledge base, learning from new interactions and adapting to evolving user needs. This is reflected in the code's use of MongoDB as a long-term memory store, enabling the system to build a more comprehensive knowledge base over time.

  4. Enhanced User Experience: Working memory enables more natural and engaging interactions, as AI agents can understand and respond to user queries with greater context and personalization. This is a crucial benefit highlighted in the AI sales assistant use case, where remembering past interactions leads to more satisfying customer experiences.

Overall, working memory empowers AI agents and agentic systems to become more intelligent, adaptable, reliable, and user-centric, significantly improving their adoption, effectiveness, and overall user experience.