AI Hallucination Detection and Reduction
Use Case: Creating a Coding Agent
Primary Objective: Create an intelligent coding agent that can analyze GitHub repositories and assist with code-related tasks.
This system is built to improve developer productivity, focusing on the following example scenarios:
- Code discovery and navigation
- Repository ingestion
- Automated unit test creation
Components:
- GitIngest: Converts GitHub repositories into structured markdown for processing and analysis
- Voyage AI: Generates high-quality, code-specific embeddings with the voyage-code-3 model for semantic search
- OpenAI: Powers text generation via GPT-4.1
- PyMongo: Manages MongoDB connections and operations for storing codebase metadata, files, and embeddings
- Galileo: Monitors and evaluates AI responses for hallucination detection and quality assessment
- LangGraph: Orchestrates agent workflows and manages state transitions for multi-step coding tasks
This notebook contains three main parts:
- Part 1: Data Ingestion, Preparation and Storage
- Part 2: Information Retrieval and RAG
- Part 3: Coding Agent
What you will learn
- ✅ Building production-ready RAG systems
- ✅ Implementing advanced vector search strategies
- ✅ Creating AI-powered coding assistants
- ✅ Establishing comprehensive AI monitoring
- ✅ Designing agentic workflows with tool integration
- ✅ Evaluating and improving AI system quality
Key topics explored: Multi-modal RAG architecture, MongoDB Atlas vector search, embeddings, hybrid search with RankFusion, AI hallucination detection using Galileo, agentic workflows with LangGraph, GitHub repository processing with GitIngest, quality evaluation frameworks, tool development and integration, and production-ready AI observability patterns.
Setup: Installing Packages and Environment Variables
This code configures Galileo's AI monitoring platform for tracking system performance and detecting hallucinations. You can generate a Galileo API key from your Galileo console.
- GALILEO_API_KEY: Captured and stored securely using masked input (like password entry)
- GALILEO_PROJECT: Organizes AI experiments under a project container
- GALILEO_LOG_STREAM: Categorizes logs within the project for different experiment types
Once configured, Galileo automatically tracks retrieval quality, hallucination rates, and AI response accuracy when functions are decorated with @log(). This provides enterprise-grade monitoring without building custom evaluation infrastructure, essential for reliable production AI applications.
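A minimal sketch of this configuration, assuming the Galileo SDK reads `GALILEO_API_KEY`, `GALILEO_PROJECT`, `GALILEO_LOG_STREAM`, and `GALILEO_CONSOLE_URL` from the environment (the run log later in this section shows a GalileoPythonConfig validation error when the console URL is unset, so the sketch sets it explicitly; the default project and stream names here are illustrative):

```python
import os

def configure_galileo(api_key: str,
                      project: str = "coding-agent",   # illustrative name
                      log_stream: str = "dev",          # illustrative name
                      console_url: str = "https://app.galileo.ai") -> None:
    """Store Galileo settings in the environment so @log() can find them."""
    os.environ["GALILEO_API_KEY"] = api_key          # secret; never hard-code it
    os.environ["GALILEO_PROJECT"] = project          # project container for experiments
    os.environ["GALILEO_LOG_STREAM"] = log_stream    # log category within the project
    os.environ["GALILEO_CONSOLE_URL"] = console_url  # required by GalileoPythonConfig
```

In the notebook the key is captured with `getpass.getpass()` so it is masked like a password, e.g. `configure_galileo(getpass("Galileo API key: "))`. The default `console_url` assumes Galileo's hosted console; point it at your own deployment if you run one.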
Part 1: Data Ingestion, Preparation and Storage
Data Ingestion Process Overview
The data ingestion pipeline transforms raw GitHub repositories into a searchable, AI-ready knowledge base through five key stages:
1. Repository Extraction
- GitIngest processes the GitHub URL and extracts the entire codebase structure
- Analyzes repository tree, file relationships, and content hierarchy
- Converts source code into structured, processable format
2. Intelligent Segmentation
- File-level chunking breaks down the codebase into manageable pieces
- Content parsing identifies code structure, functions, classes, and documentation
- Smart boundaries ensure logical code blocks remain intact
3. AI-Powered Description Generation
- Context-aware summaries created for each file using AI
- Code understanding generates human-readable descriptions of what each file does
- Semantic enrichment adds searchable metadata beyond just raw code
4. Code-Specific Embeddings
- Voyage AI embeddings (2048d): Code-optimized vector representations for technical searches
- Specialized models trained specifically for understanding code semantics and structure
- Enhanced retrieval for programming-related queries and code comprehension
5. MongoDB Atlas Storage
- Vector indexes with scalar quantization for efficient similarity search
- Text indexes for keyword-based searches
- Dual collections: Metadata and file content stored separately for optimal performance
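The storage stage relies on an Atlas Vector Search index. Below is a minimal definition sketch, assuming the embeddings live in a document field named `voyage_embeddings` with voyage-code-3's 2048 dimensions (the field name is illustrative; match it to your collection schema):

```python
# Atlas Vector Search index definition with scalar quantization.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "voyage_embeddings",  # embedding field on each document
            "numDimensions": 2048,        # voyage-code-3 output size
            "similarity": "cosine",
            "quantization": "scalar",     # compresses vectors for cheaper search
        }
    ]
}
```

With a recent PyMongo, something like `collection.create_search_index(SearchIndexModel(definition=vector_index_definition, name="vector_index", type="vectorSearch"))` should create it; check your driver version for `SearchIndexModel` support.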
Result: A GitHub repository becomes a fully searchable, semantically-understood knowledge base ready for sophisticated RAG queries, with both vector similarity and traditional text search capabilities optimized for code understanding.
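Stages 1 through 4 above can be sketched as one self-contained pipeline. Everything below is illustrative: `split_files` assumes files in the digest are introduced by `FILE: <path>` header lines, and `describe`/`embed` are deterministic stand-ins for the real GPT-4.1 and voyage-code-3 calls (which return richer summaries and 2048-d vectors):

```python
import hashlib

def split_files(digest: str) -> dict[str, str]:
    """File-level chunking: split a repo digest into {path: content}.
    Assumes files are introduced by 'FILE: <path>' header lines."""
    files, path, buf = {}, None, []
    for line in digest.splitlines():
        if line.startswith("FILE: "):
            if path is not None:
                files[path] = "\n".join(buf).strip()
            path, buf = line[len("FILE: "):].strip(), []
        elif path is not None and not line.startswith("="):
            buf.append(line)  # '=' rule lines are treated as separators
    if path is not None:
        files[path] = "\n".join(buf).strip()
    return files

def describe(path: str, code: str) -> str:
    # Stand-in for the GPT-4.1 call that writes a human-readable file summary.
    return f"{path}: {len(code.splitlines())} lines of code"

def embed(text: str, dims: int = 8) -> list[float]:
    # Deterministic stand-in for a voyage-code-3 embedding (really 2048-d).
    raw = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in raw[:dims]]

def ingest(digest: str) -> list[dict]:
    """Chunk, describe, and embed every file in the digest."""
    return [
        {"path": p, "content": c, "description": describe(p, c), "embedding": embed(c)}
        for p, c in split_files(digest).items()
    ]
```

Each resulting document is ready for stage 5: inserting into MongoDB alongside its embedding.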
1.1 Repository Extraction
The codebase_metadata dictionary serves as a repository-level data container that captures high-level information about the entire GitHub repository being processed:
An example of what this object can look like:
codebase_metadata = {
    "url": "https://github.com/RichmondAlake/memorizz",
    "repository": "richmondalake/memorizz",
    "analyzed_count": "57",            # Number of files processed
    "estimated_tokens": "101.0k",      # Total token count
    "tree": "Directory structure...",  # Full directory tree
    "description": "AI-generated repo summary...",
    "voyage_embeddings": [0.3, 0.4, ...],
}
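Most of these fields can be lifted straight out of GitIngest's output. A sketch, assuming the summary string contains `Repository:`, `Files analyzed:`, and `Estimated tokens:` lines (as GitIngest's text summary typically does; verify against your version):

```python
import re

def summary_to_metadata(url: str, summary: str, tree: str) -> dict:
    """Build the codebase_metadata skeleton from a GitIngest summary string."""
    def grab(label: str) -> str:
        # Pull the value following a 'Label:' line; empty string if absent.
        m = re.search(rf"{label}:\s*(.+)", summary)
        return m.group(1).strip() if m else ""
    return {
        "url": url,
        "repository": grab("Repository"),
        "analyzed_count": grab("Files analyzed"),
        "estimated_tokens": grab("Estimated tokens"),
        "tree": tree,
        # "description" and "voyage_embeddings" are filled in later by the
        # GPT-4.1 summary and voyage-code-3 embedding steps.
    }
```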
Provide the URL of the codebase you want to process. You can replace the URL below with any repository of your choice.
GitIngest is a specialized tool that converts GitHub repositories into structured markdown format for AI processing. The ingest_async function is its asynchronous interface.
Without GitIngest, developers would need to manually clone repos, parse file structures, and format content; GitIngest automates this entire pipeline in a single async function call.
2025-08-20 08:15:33.955 | INFO | gitingest.entrypoint:ingest_async:89 | Starting ingestion process | {"source":"https://github.com/RichmondAlake/memorizz"}
2025-08-20 08:15:33.957 | INFO | gitingest.entrypoint:ingest_async:98 | Parsing remote repository | {"source":"https://github.com/RichmondAlake/memorizz"}
2025-08-20 08:15:36.134 | INFO | gitingest.clone:clone_repo:97 | Executing git clone command | {"command":"git clone --single-branch --no-checkout --depth=1 https://github.com/richmondalake/memorizz ..."}
2025-08-20 08:15:37.748 | INFO | gitingest.clone:clone_repo:123 | Git clone operation completed successfully
2025-08-20 08:15:37.901 | INFO | gitingest.ingestion:ingest_query:109 | Directory processing completed | {"total_files":79,"total_directories":30,"total_size_bytes":848358}
2025-08-20 08:15:40.482 | INFO | gitingest.entrypoint:ingest_async:147 | Ingestion completed successfully
2025-08-20 08:16:53.125 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
... (repeated POST /v1/responses entries elided)
2025-08-20 08:20:19.443 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
... (repeated POST /v1/embeddings entries elided)
2025-08-20 08:36:39.667 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType]
... (matching _init_log_stream warning and further POST /v1/embeddings entries elided)
2025-08-20 08:45:00.813 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:01.282 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:01.773 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:02.252 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:02.707 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:03.315 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:03.788 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:04.223 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:04.683 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:05.163 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:05.576 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:06.537 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:45:07.042 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:47:02.647 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 08:47:04.708 | INFO | logging:callHandlers:1706 | HTTP Request: POST 
https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" 2025-08-20 08:47:06.311 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 08:54:09.276 | INFO | logging:callHandlers:1706 | Exception in execute request: --------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[112], line 11 1 from langgraph.prebuilt import create_react_agent 3 prompt = """ 4 You are a helpful assistant that can answer questions about the codebase. 5 """ 7 code_agent = create_react_agent( 8 "openai:gpt-4o", 9 prompt=prompt, # Prompt for the agent obtained from the database (procedural memory) 10 tools=code_agent_toolbox, ---> 11 checkpointer=mongodb_checkpointer, # Storing the state of the agent in the database (procedural memory) 12 ) NameError: name 'mongodb_checkpointer' is not defined 2025-08-20 08:55:33.468 | INFO | logging:callHandlers:1706 | Exception in execute request: --------------------------------------------------------------------------- InvalidStateError Traceback (most recent call last) Cell In[116], line 1 ----> 1 response = chat_with_agent( 2 agent=code_agent, 3 query="Can you provide me a detailed description of the codebase?", 4 thread_id="123", 5 user_id="123", 6 return_full_result=False 7 ) Cell In[115], line 20, in chat_with_agent(agent, query, thread_id, user_id, return_full_result) 17 if user_id: 18 config["configurable"]["user_id"] = user_id ---> 20 result_state = agent.invoke( 21 {"messages": [{"role": "user", "content": query}]}, 22 config=config 23 ) 25 if return_full_result: 26 return result_state File ~/miniconda3/envs/galileo_webinars/lib/python3.11/site-packages/langgraph/pregel/main.py:3026, in Pregel.invoke(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, **kwargs) 3023 chunks: list[dict[str, Any] | Any] = [] 3024 interrupts: list[Interrupt] = [] 
-> 3026 for chunk in self.stream( 3027 input, 3028 config, 3029 context=context, 3030 stream_mode=["updates", "values"] 3031 if stream_mode == "values" 3032 else stream_mode, 3033 print_mode=print_mode, 3034 output_keys=output_keys, 3035 interrupt_before=interrupt_before, 3036 interrupt_after=interrupt_after, 3037 durability=durability, 3038 **kwargs, 3039 ): 3040 if stream_mode == "values": 3041 if len(chunk) == 2: File ~/miniconda3/envs/galileo_webinars/lib/python3.11/site-packages/langgraph/pregel/main.py:2582, in Pregel.stream(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, subgraphs, debug, **kwargs) 2579 runtime = parent_runtime.merge(runtime) 2580 config[CONF][CONFIG_KEY_RUNTIME] = runtime -> 2582 with SyncPregelLoop( 2583 input, 2584 stream=StreamProtocol(stream.put, stream_modes), 2585 config=config, 2586 store=store, 2587 cache=cache, 2588 checkpointer=checkpointer, 2589 nodes=self.nodes, 2590 specs=self.channels, 2591 output_keys=output_keys, 2592 input_keys=self.input_channels, 2593 stream_keys=self.stream_channels_asis, 2594 interrupt_before=interrupt_before_, 2595 interrupt_after=interrupt_after_, 2596 manager=run_manager, 2597 durability=durability_, 2598 trigger_to_nodes=self.trigger_to_nodes, 2599 migrate_checkpoint=self._migrate_checkpoint, 2600 retry_policy=self.retry_policy, 2601 cache_policy=self.cache_policy, 2602 ) as loop: 2603 # create runner 2604 runner = PregelRunner( 2605 submit=config[CONF].get( 2606 CONFIG_KEY_RUNNER_SUBMIT, weakref.WeakMethod(loop.submit) (...) 
2609 node_finished=config[CONF].get(CONFIG_KEY_NODE_FINISHED), 2610 ) 2611 # enable subgraph streaming File ~/miniconda3/envs/galileo_webinars/lib/python3.11/site-packages/langgraph/pregel/_loop.py:1007, in SyncPregelLoop.__enter__(self) 1005 def __enter__(self) -> Self: 1006 if self.checkpointer: -> 1007 saved = self.checkpointer.get_tuple(self.checkpoint_config) 1008 else: 1009 saved = None File ~/miniconda3/envs/galileo_webinars/lib/python3.11/site-packages/langgraph/checkpoint/mongodb/aio.py:506, in AsyncMongoDBSaver.get_tuple(self, config) 502 try: 503 # check if we are in the main thread, only bg threads can block 504 # we don't check in other methods to avoid the overhead 505 if asyncio.get_running_loop() is self.loop: --> 506 raise asyncio.InvalidStateError( 507 "Synchronous calls to AsyncMongoDBSaver are only allowed from a " 508 "different thread. From the main thread, use the async interface." 509 "For example, use `await checkpointer.aget_tuple(...)` or `await " 510 "graph.ainvoke(...)`." 511 ) 512 except RuntimeError: 513 pass InvalidStateError: Synchronous calls to AsyncMongoDBSaver are only allowed from a different thread. From the main thread, use the async interface.For example, use `await checkpointer.aget_tuple(...)` or `await graph.ainvoke(...)`. 
2025-08-20 08:58:37.790 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 08:58:39.368 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:00:27.605 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:00:27.650 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:00:27.652 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:00:28.403 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_retriever_span: A trace needs to be created in order to add a span. 2025-08-20 09:00:28.436 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 
2025-08-20 09:01:01.343 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:01:45.445 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:01:45.514 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:01:45.517 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:01:45.522 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 
2025-08-20 09:01:51.389 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:03:41.396 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:03:41.398 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:03:43.481 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:03:58.811 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. 
[type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:03:58.814 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:04:02.070 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:09:33.427 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:09:58.603 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:10:16.694 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:11:39.909 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:11:53.649 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. 
[type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:11:53.655 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-20 09:11:55.275 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:11:55.798 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_retriever_span: A trace needs to be created in order to add a span. 2025-08-20 09:11:55.800 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 2025-08-20 09:12:01.616 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:12:27.876 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:12:54.468 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 09:12:54.556 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 
2025-08-20 09:12:57.812 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:33:43.425 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:33:54.524 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:34:20.093 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:34:21.297 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_retriever_span: A trace needs to be created in order to add a span. 2025-08-20 13:34:21.300 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 2025-08-20 13:34:24.942 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:35:32.485 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-20 13:35:32.690 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 2025-08-20 13:35:35.211 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 17:57:34.843 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 17:58:07.926 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. 
Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-21 17:58:07.928 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-21 17:58:10.541 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 17:58:40.045 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 17:58:40.081 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_project: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. 
[type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-21 17:58:40.082 | WARNING | logging:callHandlers:1706 | Error occurred during execution: _init_log_stream: 2 validation errors for GalileoPythonConfig console_url Field required [type=missing, input_value={}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing api_url Value error, Console URL is required. Please set the environment variable `GALILEO_CONSOLE_URL` to your Galileo console URL. [type=value_error, input_value=None, input_type=NoneType] For further information visit https://errors.pydantic.dev/2.11/v/value_error 2025-08-21 17:58:40.858 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_retriever_span: A trace needs to be created in order to add a span. 2025-08-21 17:58:40.861 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 2025-08-21 17:58:55.248 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 18:00:03.423 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" 2025-08-21 18:00:03.488 | WARNING | logging:callHandlers:1706 | Error occurred during execution: add_tool_span: A trace needs to be created in order to add a span. 2025-08-21 18:00:07.482 | INFO | logging:callHandlers:1706 | HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Repository: richmondalake/memorizz
Commit: 6257e5af49c67e5f2bfabd2b578ea53adf660273
Files analyzed: 79
Estimated tokens: 162.5k
{'analyzed_count': '79',
'estimated_tokens': '162.5k',
'repository': 'richmondalake/memorizz',
'url': 'https://github.com/RichmondAlake/memorizz'}
Directory structure:
└── richmondalake-memorizz/
├── README.md
├── LICENCE.txt
├── pyproject.toml
├── eval/
│ ├── README.md
│ └── longmemeval/
│ ├── README.md
│ ├── download_dataset.py
│ ├── evaluate_delegate_pattern.py
│ ├── evaluate_hierarchical_pattern.py
│ ├── evaluate_memorizz.py
│ └── README_evaluation_architectures.md
├── examples/
│ ├── knowledge_base.ipynb
│ ├── memagent_single_agent.ipynb
│ ├── memagent_summarisation.ipynb
│ ├── memagents_multi_agents.ipynb
│ ├── persona.ipynb
│ ├── toolbox.ipynb
│ └── workflow.ipynb
├── results/
│ └── longmemeval_oracle_general_20250730_021208.json
└── src/
└── memorizz/
├── __init__.py
├── memagent.py
├── MEMORY_ARCHITECTURE.md
├── multi_agent_orchestrator.py
├── task_decomposition.py
├── coordination/
│ ├── README.md
│ ├── __init__.py
│ └── shared_memory/
│ ├── __init__.py
│ └── shared_memory.py
├── database/
│ ├── __init__.py
│ └── mongodb/
│ └── mongodb_tools.py
├── embeddings/
│ ├── README.md
│ ├── __init__.py
│ ├── ollama/
│ │ ├── __init__.py
│ │ └── provider.py
│ ├── openai/
│ │ ├── __init__.py
│ │ └── provider.py
│ └── voyageai/
│ ├── __init__.py
│ └── provider.py
├── enums/
│ ├── __init__.py
│ ├── application_mode.py
│ ├── memory_type.py
│ └── role.py
├── llms/
│ ├── __init__.py
│ └── openai.py
├── long_term_memory/
│ ├── __init__.py
│ ├── episodic/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── conversational_memory_unit.py
│ │ └── summary_component.py
│ ├── procedural/
│ │ ├── README.md
│ │ ├── __init__.py
│ │ ├── persona/
│ │ │ └── README.md
│ │ ├── toolbox/
│ │ │ ├── README.md
│ │ │ ├── __init__.py
│ │ │ ├── tool_schema.py
│ │ │ └── toolbox.py
│ │ └── workflow/
│ │ ├── __init__.py
│ │ └── workflow.py
│ └── semantic/
│ ├── README.md
│ ├── __init__.py
│ ├── knowledge_base.py
│ └── persona/
│ ├── README.md
│ ├── __init__.py
│ ├── persona.py
│ └── role_type.py
├── memory_provider/
│ ├── __init__.py
│ ├── base.py
│ └── mongodb/
│ ├── __init__.py
│ └── provider.py
├── memory_unit/
│ ├── __init__.py
│ ├── conversational_memory_unit.py
│ ├── memory_unit.py
│ └── summary_component.py
├── short_term_memory/
│ ├── __init__.py
│ ├── semantic_cache.py
│ └── working_memory/
│ ├── README.md
│ ├── __init__.py
│ └── cwm.py
└── tests/
├── test_memagent_enhanced_tools.py
└── test_vegetarian_recipe_agent.py
1.2 Intelligent Segmentation
The code below imports Galileo's instrumented version of the OpenAI client along with its context manager:
from galileo.openai import openai, galileo_context
What it imports:
- openai: a wrapped OpenAI client that replaces the standard import openai and automatically logs all API calls to Galileo. Every call (completions, embeddings, etc.) is monitored, capturing tokens used, response times, and model performance metrics.
- galileo_context: a context manager for controlling when logging happens. It manages logging sessions, ensures data is properly uploaded, and flushes logs to Galileo's servers via galileo_context.flush().
Why use this instead of regular OpenAI?
- Automatic Monitoring: No need to manually instrument every API call
- Hallucination Detection: Galileo analyzes responses for accuracy and reliability
- Cost Tracking: Monitors token usage and API costs across all calls
- Performance Analytics: Tracks response times and error rates
This provides zero-code observability - just replace the import and get comprehensive AI monitoring.
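In practice the swap looks like this (a minimal sketch, assuming the Galileo SDK is installed and configured via GALILEO_API_KEY; the wrapper's exact surface may vary by SDK version):

```python
# Sketch of the drop-in replacement; assumes GALILEO_API_KEY is set
# (and GALILEO_CONSOLE_URL for self-hosted deployments).
from galileo.openai import openai, galileo_context

client = openai.OpenAI()  # same interface as the standard OpenAI client
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What does this repository do?"}],
)
print(response.choices[0].message.content)

galileo_context.flush()  # upload any buffered traces to Galileo
```

Everything after the import is unchanged application code; the monitoring happens inside the wrapped client.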
The generate_repo_description function uses AI to automatically write a README-style summary of what a GitHub repository does.
- Takes in: repository information (name, URL, file count, directory structure) plus a snippet of the actual code
- Asks GPT-4.1: "Look at this repo data and write a paragraph explaining what this codebase is for"
- Returns: an AI-generated description that explains the repository's purpose, structure, and goals
Why it's useful:
- Automatic Documentation: No need to manually write repo descriptions
- Consistent Format: Always produces well-structured summaries
- Context Aware: Understands what the code actually does by analyzing the content
- Searchable Metadata: Creates text that can be embedded and searched later
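The assembly step can be sketched with a hypothetical prompt builder (the function and parameter names here are illustrative, not the notebook's actual signature); the resulting string is what gets sent to the chat model:

```python
# Hypothetical sketch; function and field names are illustrative,
# not the notebook's actual implementation.
def build_repo_description_prompt(repo: str, url: str, file_count: int,
                                  tree: str, snippet: str) -> str:
    return (
        f"Repository: {repo} ({url})\n"
        f"Files analyzed: {file_count}\n"
        f"Directory structure:\n{tree}\n\n"
        f"Sample content:\n{snippet}\n\n"
        "Write one README-style paragraph explaining what this codebase "
        "is for, its structure, and its goals."
    )

prompt = build_repo_description_prompt(
    "richmondalake/memorizz",
    "https://github.com/RichmondAlake/memorizz",
    79,
    "└── richmondalake-memorizz/ ...",
    "# Memorizz ...",
)
```

Because the prompt embeds the repo's name, structure, and sample content, the model grounds its summary in the actual codebase rather than guessing from the name alone.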
('[**memorizz**](https://github.com/RichmondAlake/memorizz) is an experimental '
'memory management framework for AI agents, with 79 files and approximately '
'162,500 tokens analyzed in the codebase. The project’s root directory '
'includes standard files such as `README.md` and `LICENCE.txt`, while the '
'main development occupies a rich subdirectory structure inside '
'`richmondalake-memorizz/`. MemoRizz aims to enable memory-augmented agent '
'architectures, allowing AI systems to persist, retrieve, and reason over '
'structured memory efficiently—a crucial capability for advanced agent '
'applications. Core modules inferred from the documentation emphasize agent '
'memory persistence, flexible memory storage, and retrieval mechanisms, often '
'designed with extensibility and experimentation in mind. Notable code '
'patterns likely include modular agent interfaces and pluggable memory '
'backends, catering to educational and research contexts where safe '
'experimentation, rather than production-grade deployment, is the primary '
'goal.')
================================================ FILE: README.md ================================================ <div align="center"> # Memorizz 🧠 📊 **[Agent Memory Presentation](https://docs.google.com/presentation/d/1iSu667m5-pOXMrJq_LjkfnfD4V0rW1kbhGaQ2u3TKXQ/edit?usp=sharing)** | 🎥 **[AIEWF Richmond's Talk](https://youtu.be/W2HVdB4Jbjs?si=faaI3cMLc71Efpeu)** [](https://badge.fury.io/py/memorizz) [](https://pepy.tech/projects/memorizz) </div> > **⚠️ IMPORTANT WARNING ⚠️** > > **MemoRizz is an EXPERIMENTAL library intended for EDUCATIONAL PURPOSES ONLY.** > > **Do NOT use in production environments or with sensitive data.** > > This library is under active development, has not undergone security audits, and may contain bugs or breaking changes in future releases. ## Overview **MemoRizz is a memory management framework for AI agents designed to create memory-augme
1.3 AI-Powered Description Generation
The function split_content_to_files takes a giant string containing multiple files and splits it into individual files.
Process:
- Finds file separators: Looks for the ====\nFILE: markers that divide files
- Extracts each file: For each section, grabs the filename and content
- Cleans up paths: Converts src/main.py to just main.py
Why it's needed: GitIngest gives you one big markdown blob with all files concatenated together. This function unpacks that blob into individual, manageable file objects that can be processed separately (for embeddings, descriptions, etc.).
Bottom line: It's a parser that converts "one giant file containing everything" into "a list of individual files" - like unzipping a compressed archive.
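The steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact implementation: the separator regex and dictionary keys are assumptions based on the `==== FILE: ... ====` markers visible in the output above.

```python
import re

def split_content_to_files(digest: str) -> list[dict]:
    """Split a GitIngest digest into individual file records.

    Assumes files are separated by runs of '=' around a 'FILE: <path>'
    marker, as in the GitIngest output shown above.
    """
    # Match separators like '======== FILE: src/main.py ========'
    pattern = re.compile(r"=+\s*FILE:\s*(?P<path>[^\n=]+?)\s*=+\n")
    matches = list(pattern.finditer(digest))
    files = []
    for i, m in enumerate(matches):
        start = m.end()
        # Content runs until the next separator (or end of the digest)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(digest)
        path = m.group("path").strip()
        files.append({
            "file_name": path.split("/")[-1],  # 'src/main.py' -> 'main.py'
            "content": digest[start:end].strip(),
        })
    return files
```

Like unzipping an archive, each record can then be embedded and described independently.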
{'content': '<div align="center">\n'
'\n'
'# Memorizz 🧠\n'
'\n'
'📊 **[Agent Memory '
'Presentation](https://docs.google.com/presentation/d/1iSu667m5-pOXMrJq_LjkfnfD4V0rW1kbhGaQ2u3TKXQ/edit?usp=sharing)** '
"| 🎥 **[AIEWF Richmond's "
'Talk](https://youtu.be/W2HVdB4Jbjs?si=faaI3cMLc71Efpeu)**\n'
'\n'
'[](https://badge.fury.io/py/memorizz)\n'
'[](https://pepy.tech/projects/memorizz)\n'
'\n'
'</div>\n'
'\n'
'> **⚠️ IMPORTANT WARNING ⚠️**\n'
'> \n'
'> **MemoRizz is an EXPERIMENTAL library intended for EDUCATIONAL '
'PURPOSES ONLY.**\n'
'> \n'
'> **Do NOT use in production environments or with sensitive '
'data.**\n'
'> \n'
'> This library is under active development, has not undergone '
'security audits, and may contain bugs or breaking changes in '
'future releases.\n'
'\n'
'## Overview\n'
'\n'
'**MemoRizz is a memory management framework for AI agents '
'designed to create memory-augmented agents with explicit memory '
'type allocation based on application mode.**\n'
'\n'
'The framework enables developers to build context-aware agents '
'capable of sophisticated information retrieval and storage. \n'
'\n'
'MemoRizz provides flexible single and multi-agent architectures '
'that allow you to instantiate agents with specifically allocated '
'memory types—whether episodic, semantic, procedural, or working '
"memory—tailored to your application's operational requirements.\n"
'\n'
'\n'
'**Why MemoRizz?**\n'
'- 🧠 **Persistent Memory**: Your AI agents remember conversations '
'across sessions\n'
'- 🔍 **Semantic Search**: Find relevant information using natural '
'language\n'
'- 🛠️ **Tool Integration**: Automatically discover and execute '
'functions\n'
'- 👤 **Persona System**: Create consistent, specialized agent '
'personalities\n'
'- 📊 **Vector Search**: MongoDB Atlas Vector Search for efficient '
'retrieval\n'
'\n'
'## Key Features\n'
'\n'
'- **Persistent Memory Management**: Long-term memory storage with '
'semantic retrieval\n'
'- **MemAgent System**: Complete AI agents with memory, personas, '
'and tools\n'
'- **MongoDB Integration**: Built on MongoDB Atlas with vector '
'search capabilities\n'
'- **Tool Registration**: Automatically convert Python functions '
'into LLM-callable tools\n'
'- **Persona Framework**: Create specialized agent personalities '
'and behaviors\n'
'- **Vector Embeddings**: Semantic similarity search across all '
'stored information\n'
'\n'
'## Installation\n'
'\n'
'```bash\n'
'pip install memorizz\n'
'```\n'
'\n'
'### Prerequisites\n'
'- Python 3.7+\n'
'- MongoDB Atlas account (or local MongoDB with vector search)\n'
'- OpenAI API key (for embeddings and LLM functionality)\n'
'\n'
'## Quick Start\n'
'\n'
'### 1. Basic MemAgent Setup\n'
'\n'
'```python\n'
'import os\n'
'from memorizz.memory_provider.mongodb.provider import '
'MongoDBConfig, MongoDBProvider\n'
'from memorizz.memagent import MemAgent\n'
'from memorizz.llms.openai import OpenAI\n'
'\n'
'# Set up your API keys\n'
'os.environ["OPENAI_API_KEY"] = "your-openai-api-key"\n'
'\n'
'# Configure MongoDB memory provider\n'
'mongodb_config = MongoDBConfig(uri="your-mongodb-atlas-uri")\n'
'memory_provider = MongoDBProvider(mongodb_config)\n'
'\n'
'# Create a MemAgent\n'
'agent = MemAgent(\n'
' model=OpenAI(model="gpt-4"),\n'
' instruction="You are a helpful assistant with persistent '
'memory.",\n'
' memory_provider=memory_provider\n'
')\n'
'\n'
'# Start conversing - the agent will remember across sessions\n'
'response = agent.run("Hello! My name is John and I\'m a software '
'engineer.")\n'
'print(response)\n'
'\n'
'# Later in another session...\n'
'response = agent.run("What did I tell you about myself?")\n'
'print(response) # Agent remembers John is a software engineer\n'
'```\n'
'\n'
'# Table of single agent and multi-agent setups, their '
'descriptions, and links to example notebooks\n'
'| Agent Type | '
'Description '
'| Example '
'Notebook '
'|\n'
'|---------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------------------------|\n'
'| Single Agent | A standalone agent with its own '
'memory and persona, suitable for individual tasks | [Single Agent '
'Example](examples/memagent_single_agent.ipynb) '
'|\n'
'| Multi-Agent | A system of multiple agents '
'collaborating, each with specialized roles and shared memory | '
'[Multi-Agent '
'Example](examples/memagents_multi_agents.ipynb) '
'|\n'
'\n'
'\n'
'\n'
'# Memory System Components and Examples\n'
'\n'
'| Memory Component | Memory Category | Use Case / Description | '
'Example Notebook |\n'
'|------------------|-----------------|------------------------|------------------|\n'
'| **Persona** | Semantic Memory | Agent identity, personality, '
'and behavioral consistency | [Persona '
'Example](examples/persona.ipynb) |\n'
'| **Knowledge Base** | Semantic Memory | Persistent facts, '
'concepts, and domain knowledge | [Knowledge Base '
'Example](examples/knowledge_base.ipynb) |\n'
'| **Toolbox** | Procedural Memory | Registered functions with '
'semantic discovery for LLM execution | [Toolbox '
'Example](examples/toolbox.ipynb) |\n'
'| **Workflow** | Procedural Memory | Multi-step process '
'orchestration and execution tracking | [Workflow '
'Example](examples/workflow.ipynb) |\n'
'| **Conversation Memory** | Episodic Memory | Interaction history '
'and conversational context | [Single Agent '
'Example](examples/memagent_single_agent.ipynb) |\n'
'| **Summaries** | Episodic Memory | Compressed episodic '
'experiences and events | [Summarization '
'Example](examples/memagent_summarisation.ipynb) |\n'
'| **Working Memory** | Short-term Memory | Active context '
'management and current session state | [Single Agent '
'Example](examples/memagent_single_agent.ipynb) |\n'
'| **Shared Memory** | Multi-Agent Coordination | Blackboard for '
'inter-agent communication and coordination | [Multi-Agent '
'Example](examples/memagents_multi_agents.ipynb) |\n'
'\n'
'\n'
'### 2. Creating Specialized Agents with Personas\n'
'\n'
'```python\n'
'from memorizz.long_term_memory.semantic.persona import Persona\n'
'from memorizz.long_term_memory.semantic.persona.role_type import '
'RoleType\n'
'\n'
'# Create a technical expert persona using predefined role types\n'
'tech_expert = Persona(\n'
' name="TechExpert",\n'
' role=RoleType.TECHNICAL_EXPERT, # Use predefined role enum\n'
' goals="Help developers solve complex technical problems with '
'detailed explanations.",\n'
' background="10+ years experience in Python, AI/ML, and '
'distributed systems."\n'
')\n'
'\n'
'# Apply persona to agent\n'
'agent.set_persona(tech_expert)\n'
'agent.save()\n'
'\n'
'# Now the agent will respond as a technical expert\n'
'response = agent.run("How should I design a scalable '
'microservices architecture?")\n'
'```\n'
'\n'
'### 3. Tool Registration and Function Calling\n'
'\n'
'```python\n'
'from memorizz.database import MongoDBTools, MongoDBToolsConfig\n'
'from memorizz.embeddings.openai import get_embedding\n'
'\n'
'# Configure tools database\n'
'tools_config = MongoDBToolsConfig(\n'
' mongo_uri="your-mongodb-atlas-uri",\n'
' db_name="my_tools_db",\n'
' get_embedding=get_embedding # Required embedding function\n'
')\n'
'\n'
'# Register tools using decorator\n'
'with MongoDBTools(tools_config) as tools:\n'
' toolbox = tools.mongodb_toolbox()\n'
' \n'
' @toolbox\n'
' def calculate_compound_interest(principal: float, rate: '
'float, time: int) -> float:\n'
' """Calculate compound interest for financial '
'planning."""\n'
' return principal * (1 + rate) ** time\n'
' \n'
' @toolbox\n'
' def get_weather(city: str) -> str:\n'
' """Get current weather for a city."""\n'
' # Your weather API integration here\n'
' return f"Weather in {city}: 72°F, sunny"\n'
' \n'
' # Add tools to your agent\n'
' agent.add_tool(toolbox=toolbox)\n'
' \n'
' # Agent can now discover and use these tools automatically\n'
' response = agent.run("What\'s the weather in San Francisco '
'and calculate interest on $1000 at 5% for 3 years?")\n'
'```\n'
'\n'
'## Core Concepts\n'
'\n'
'### Memory Types\n'
'\n'
'MemoRizz supports different memory categories for organizing '
'information:\n'
'\n'
'- **CONVERSATION_MEMORY**: Chat history and dialogue context\n'
'- **WORKFLOW_MEMORY**: Multi-step process information\n'
'- **LONG_TERM_MEMORY**: Persistent knowledge storage with '
'semantic search\n'
'- **SHORT_TERM_MEMORY**: Temporary processing information\n'
'- **PERSONAS**: Agent personality and behavior definitions\n'
'- **TOOLBOX**: Function definitions and metadata\n'
'- **SHARED_MEMORY**: Multi-agent coordination and communication\n'
'- **MEMAGENT**: Agent configurations and states\n'
'- **SUMMARIES**: Compressed summaries of past interactions for '
'efficient memory management\n'
'\n'
'### Long-Term Knowledge Management\n'
'\n'
'Store and retrieve persistent knowledge with semantic search:\n'
'\n'
'```python\n'
'# Add knowledge to long-term memory\n'
'knowledge_id = agent.add_long_term_memory(\n'
' "I prefer Python for backend development due to its '
'simplicity and extensive libraries.", \n'
' namespace="preferences"\n'
')\n'
'\n'
'# Retrieve related knowledge\n'
'knowledge_entries = '
'agent.retrieve_long_term_memory(knowledge_id)\n'
'\n'
'# Update existing knowledge\n'
'agent.update_long_term_memory(\n'
' knowledge_id, \n'
' "I prefer Python for backend development and FastAPI for '
'building APIs."\n'
')\n'
'\n'
'# Delete knowledge when no longer needed\n'
'agent.delete_long_term_memory(knowledge_id)\n'
'```\n'
'\n'
'### Tool Discovery\n'
'\n'
'Tools are semantically indexed, allowing natural language '
'discovery:\n'
'\n'
'```python\n'
'# Tools are automatically found based on intent\n'
'agent.run("I need to check the weather") # Finds and uses '
'get_weather tool\n'
'agent.run("Help me calculate some financial returns") # Finds '
'compound_interest tool\n'
'```\n'
'\n'
'## Advanced Usage\n'
'\n'
'### Custom Memory Providers\n'
'\n'
'Extend the memory provider interface for custom storage '
'backends:\n'
'\n'
'```python\n'
'from memorizz.memory_provider.base import MemoryProvider\n'
'\n'
'class CustomMemoryProvider(MemoryProvider):\n'
' def store(self, data, memory_store_type):\n'
' # Your custom storage logic\n'
' pass\n'
' \n'
' def retrieve_by_query(self, query, memory_store_type, '
'limit=10):\n'
' # Your custom retrieval logic\n'
' pass\n'
'```\n'
'\n'
'### Multi-Agent Workflows\n'
'\n'
'Create collaborative agent systems:\n'
'\n'
'```python\n'
'# Create specialized delegate agents\n'
'data_analyst = MemAgent(\n'
' model=OpenAI(model="gpt-4"),\n'
' instruction="You are a data analysis expert.",\n'
' memory_provider=memory_provider\n'
')\n'
'\n'
'report_writer = MemAgent(\n'
' model=OpenAI(model="gpt-4"), \n'
' instruction="You are a report writing specialist.",\n'
' memory_provider=memory_provider\n'
')\n'
'\n'
'# Create orchestrator agent with delegates\n'
'orchestrator = MemAgent(\n'
' model=OpenAI(model="gpt-4"),\n'
' instruction="You coordinate between specialists to complete '
'complex tasks.",\n'
' memory_provider=memory_provider,\n'
' delegates=[data_analyst, report_writer]\n'
')\n'
'\n'
'# Execute multi-agent workflow\n'
'response = orchestrator.run("Analyze our sales data and create a '
'quarterly report.")\n'
'```\n'
'\n'
'### Memory Management Operations\n'
'\n'
'Control agent memory persistence:\n'
'\n'
'```python\n'
'# Save agent state to memory provider\n'
'agent.save()\n'
'\n'
'# Load existing agent by ID\n'
'existing_agent = MemAgent.load(\n'
' agent_id="your-agent-id",\n'
' memory_provider=memory_provider\n'
')\n'
'\n'
'# Update agent configuration\n'
'agent.update(\n'
' instruction="Updated instruction for the agent",\n'
' max_steps=30\n'
')\n'
'\n'
'# Delete agent and optionally cascade delete memories\n'
'MemAgent.delete_by_id(\n'
' agent_id="agent-id-to-delete",\n'
' cascade=True, # Deletes associated memories\n'
' memory_provider=memory_provider\n'
')\n'
'```\n'
'\n'
'## Architecture\n'
'\n'
'```\n'
'┌─────────────────┐\n'
'│ MemAgent │ ← High-level agent interface\n'
'├─────────────────┤\n'
'│ Persona │ ← Agent personality & behavior\n'
'├─────────────────┤\n'
'│ Toolbox │ ← Function registration & discovery\n'
'├─────────────────┤\n'
'│ Memory Provider │ ← Storage abstraction layer\n'
'├─────────────────┤\n'
'│ Vector Search │ ← Semantic similarity & retrieval\n'
'├─────────────────┤\n'
'│ MongoDB │ ← Persistent storage backend\n'
'└─────────────────┘\n'
'```\n'
'\n'
'## Examples\n'
'\n'
'Check out the `examples/` directory for complete working '
'examples:\n'
'\n'
'- **memagent_single_agent.ipynb**: Basic conversational agent '
'with memory\n'
'- **memagents_multi_agents.ipynb**: Multi-agent collaboration '
'workflows\n'
'- **persona.ipynb**: Creating and using agent personas\n'
'- **toolbox.ipynb**: Tool registration and function calling\n'
'- **workflow.ipynb**: Workflow memory and process tracking\n'
'- **knowledge_base.ipynb**: Long-term knowledge management\n'
'\n'
'## Configuration\n'
'\n'
'### MongoDB Atlas Setup\n'
'\n'
'1. Create a MongoDB Atlas cluster\n'
'2. Enable Vector Search on your cluster\n'
'3. Create a database and collection for your agent\n'
'4. Get your connection string\n'
'\n'
'### Environment Variables\n'
'\n'
'```bash\n'
'# Required\n'
'export OPENAI_API_KEY="your-openai-api-key"\n'
'export MONGODB_URI="your-mongodb-atlas-uri"\n'
'\n'
'# Optional\n'
'export MONGODB_DB_NAME="memorizz" # Default database name\n'
'```\n'
'\n'
'## Troubleshooting\n'
'\n'
'**Common Issues:**\n'
'\n'
'1. **MongoDB Connection**: Ensure your IP is whitelisted in '
'Atlas\n'
'2. **Vector Search**: Verify vector search is enabled on your '
'cluster\n'
'3. **API Keys**: Check OpenAI API key is valid and has credits\n'
"4. **Import Errors**: Ensure you're using the correct import "
'paths shown in examples\n'
'\n'
'## Contributing\n'
'\n'
'This is an educational project. Contributions for learning '
'purposes are welcome:\n'
'\n'
'1. Fork the repository\n'
'2. Create a feature branch\n'
'3. Add tests for new functionality \n'
'4. Submit a pull request\n'
'\n'
'## License\n'
'\n'
'MIT License - see LICENSE file for details.\n'
'\n'
'## Educational Resources\n'
'\n'
'This library demonstrates key concepts in:\n'
'- **AI Agent Architecture**: Memory, reasoning, and tool use\n'
'- **Vector Databases**: Semantic search and retrieval\n'
'- **LLM Integration**: Function calling and context management\n'
'- **Software Design**: Clean abstractions and extensible '
'architecture',
'file_name': 'README.md'}
The function generate_file_description uses AI to automatically write a one-sentence summary of what each individual file does.
Input:
- File name: "memagent.py"
- File content: The entire Python code
Process:
- Takes a snippet: Grabs first 100 characters of the file content
- Asks GPT-4: "Look at this filename and code snippet, write one sentence explaining what this file does"
- Gets AI response: A concise description
Output:
"The memagent.py file defines a memory-driven agent by integrating persona management, OpenAI language model support, and a customizable toolbox for executing tasks."
Quick test of the description function
'README.md provides an overview and central presentation link for the Memorizz Agent Memory project, including its purpose and key features.'
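A minimal sketch of this pattern, assuming the OpenAI Python client and a `gpt-4.1` model name. The prompt wording and the `build_description_prompt` helper are illustrative, not the notebook's exact code:

```python
def build_description_prompt(file_name: str, content: str, snippet_chars: int = 100) -> str:
    """Build the one-sentence-summary prompt from a filename and a content snippet."""
    snippet = content[:snippet_chars]  # only the first 100 characters are sent
    return (
        f"File name: {file_name}\n"
        f"Content snippet:\n{snippet}\n\n"
        "In one sentence, explain what this file does."
    )

def generate_file_description(file_name: str, content: str) -> str:
    """Ask the model for a concise file description (sketch; model name assumed)."""
    from openai import OpenAI  # deferred import; reads OPENAI_API_KEY from the environment
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": build_description_prompt(file_name, content)}],
    )
    return response.choices[0].message.content.strip()
```

Sending only a short snippet keeps the call cheap, at the cost of occasionally missing context that appears later in the file.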
1.4 Code-Specific Embeddings
Get your Voyage AI API key here:
The get_voyage_embedding function converts text into a list of numbers that represent the meaning of the text - these numbers are called "embeddings".
Input:
- Any text (like "This function handles user authentication")
Process:
- Sends text to Voyage AI: Uses their voyage-code-3 model (specially trained for code)
- Gets back numbers: Receives 2048 numbers that capture the text's meaning
- Returns the list: [0.1, -0.3, 0.7, 0.2, ...] (2048 numbers total)
Output: A list of 2048 floating-point numbers
Why embeddings matter:
- Semantic Understanding: Similar meanings = similar numbers
- Searchable: Can find related code by comparing number patterns
- AI-Friendly: Machine learning models work with numbers, not text
Why Voyage AI specifically:
- Code-Specialized: voyage-code-3 understands programming concepts better than general models
- High Quality: Designed specifically for code similarity and search
- Optimized Dimensions: 2048 numbers provide good balance of detail vs. efficiency
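A sketch of the embedding call using the `voyageai` Python client (the wrapper name matches the text; error handling and batching are omitted), plus a cosine-similarity helper that shows how "similar meanings = similar numbers" is measured in practice:

```python
import math

def get_voyage_embedding(text: str, input_type: str = "document") -> list[float]:
    """Embed text with the voyage-code-3 model (sketch; reads VOYAGE_API_KEY from env)."""
    import voyageai  # deferred import so the helper below works without the package
    vo = voyageai.Client()
    result = vo.embed([text], model="voyage-code-3", input_type=input_type)
    return result.embeddings[0]  # a list of 2048 floats

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embeddings: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Two snippets about the same concept (e.g., two authentication functions) should score noticeably higher against each other than against unrelated code.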
100%|██████████| 1/1 [00:00<00:00, 2.95it/s]
100%|██████████| 79/79 [00:18<00:00, 4.36it/s]
1.5 MongoDB Atlas Storage
Make sure you have your MongoDB URI:
Connection to MongoDB successful
{'nIndexesWas': 1,
 'ns': 'code_repository_data.codebase_metadata',
 'ok': 1.0,
 '$clusterTime': {'clusterTime': Timestamp(1755675315, 2),
                  'signature': {'hash': b'\xa4T\x84FC\xd3\x05r7\x82\xa8;\xae\x12;\xc3\xebt\x1c\xe8',
                                'keyId': 7520068280199938053}},
 'operationTime': Timestamp(1755675315, 2)}
Existing collections: []
Created collection: codebase_metadata
Created collection: codebase_files
1.6 Vector Search Index Creation
The code below creates vector search indexes for the codebase, using both scalar quantization and full-fidelity indexing methods, across different embeddings (e.g., OpenAI and Voyage).
Here's a breakdown of the key components and functionality:
Key Concepts:
- Vector Search Index: A data structure designed to efficiently search through vectors (typically high-dimensional embeddings) representing code, text, or other types of data.
- Quantization: A technique that reduces the memory and computational requirements of storing and searching vectors by approximating each value with fewer bits.
- Scalar Quantization: Approximates each vector component with a lower-precision scalar (e.g., an 8-bit integer instead of a 32-bit float), shrinking the index at the expense of some accuracy.
- Full Fidelity: Stores vectors without any quantization or approximation, keeping their full accuracy and size.
Scalar quantization indexes save memory and computational resources while sacrificing some accuracy; full-fidelity indexes maintain high accuracy but require more storage and computational power.
The indexes are created separately for codebase metadata and codebase files, allowing for efficient retrieval of relevant code snippets or documentation based on search queries.
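Index creation with PyMongo can be sketched as below. The field and index names are assumptions, and `quantization` is the Atlas Vector Search index option that switches between scalar-quantized and full-fidelity storage:

```python
def build_vector_index_definition(path: str, dims: int, quantization: str = "none") -> dict:
    """Build an Atlas Vector Search index definition for one embedding field."""
    field = {
        "type": "vector",
        "path": path,            # e.g. "voyage_embedding" (assumed field name)
        "numDimensions": dims,   # 2048 for voyage-code-3
        "similarity": "cosine",
    }
    if quantization != "none":
        field["quantization"] = quantization  # "scalar" trades accuracy for memory
    return {"fields": [field]}

# Creating the index on a collection (requires a live Atlas connection):
# from pymongo.operations import SearchIndexModel
# model = SearchIndexModel(
#     definition=build_vector_index_definition("voyage_embedding", 2048, "scalar"),
#     name="vector_search_index_scalar_voyage",
#     type="vectorSearch",
# )
# codebase_files.create_search_index(model=model)
```

The same builder can emit both variants, which is why the notebook ends up with scalar and full-fidelity indexes per embedding provider.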
New search index named 'vector_search_index_scalar_openai' is building. Polling to check if the index 'vector_search_index_scalar_openai' is ready. This may take up to a minute. vector_search_index_scalar_openai is ready for querying. New search index named 'vector_search_index_scalar_voyage' is building. Polling to check if the index 'vector_search_index_scalar_voyage' is ready. This may take up to a minute. vector_search_index_scalar_voyage is ready for querying. New search index named 'vector_search_index_full_fidelity_openai' is building. Polling to check if the index 'vector_search_index_full_fidelity_openai' is ready. This may take up to a minute. vector_search_index_full_fidelity_openai is ready for querying. New search index named 'vector_search_index_full_fidelity_voyage' is building. Polling to check if the index 'vector_search_index_full_fidelity_voyage' is ready. This may take up to a minute. vector_search_index_full_fidelity_voyage is ready for querying.
Search index 'codebase_metadata_index' created successfully Search index 'codebase_files_index' created successfully
'codebase_files_index'
DeleteResult({'n': 79, 'electionId': ObjectId('7fffffff000000000000000b'), 'opTime': {'ts': Timestamp(1755675736, 9), 't': 11}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1755675736, 9), 'signature': {'hash': b"\x86\xcf\xa6\x0f\x9e\x8c]\xaa'H\x05T\xe1\x07(\x0b&t&f", 'keyId': 7520068280199938053}}, 'operationTime': Timestamp(1755675736, 9)}, acknowledged=True)
InsertManyResult([ObjectId('68a57c593b3ae10de5e80534'), ObjectId('68a57c593b3ae10de5e80535'), ..., ObjectId('68a57c593b3ae10de5e80582')], acknowledged=True)
Part 2: Information Retrieval and RAG
2.1 Semantic Search powered by Vector Search
Let's compare the results of the two embeddings
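A semantic search against one of the indexes built above can be sketched as a `$vectorSearch` aggregation pipeline. The index and field names here are assumptions, as is the oversampling factor for `numCandidates`:

```python
def build_vector_search_pipeline(query_vector: list[float], index_name: str,
                                 path: str, limit: int = 5) -> list[dict]:
    """Build a $vectorSearch aggregation pipeline for Atlas Vector Search."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # e.g. "vector_search_index_scalar_voyage"
                "path": path,                 # e.g. "voyage_embedding"
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # oversample candidates for better recall
                "limit": limit,
            }
        },
        # Surface the similarity score alongside the filename
        {"$project": {"_id": 0, "file_name": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

# Usage (assumes a connected 'codebase_files' collection and an embedding function):
# results = list(codebase_files.aggregate(
#     build_vector_search_pipeline(get_voyage_embedding("how are agents created?"),
#                                  "vector_search_index_scalar_voyage",
#                                  "voyage_embedding")))
```

Running the same query vector through the OpenAI-backed and Voyage-backed indexes lets you compare the two embeddings side by side.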
Vector Search is not all you need
2.2 Hybrid Search (Semantic + Text)
Found 5 results for query: 'Get me the class responsible for creating agents and running them and has all the code for calling LLMs and tools'
Found 5 results for query: 'Get me the class responsible for creating agents and running them and has all the code for calling LLMs and tools'
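One common way to combine the semantic and text result lists is reciprocal rank fusion (RRF), the technique behind the RankFusion approach mentioned in the introduction. A minimal standalone sketch:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists (e.g. vector search and text search) with RRF.

    Each input list is ordered best-first; a document's fused score is the
    sum of 1/(k + rank) over every list it appears in, so documents ranked
    well by both searches rise to the top. k=60 is the conventional default.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fused = reciprocal_rank_fusion([vector_hits, text_hits])
```

Because RRF only uses ranks, it sidesteps the problem that vector similarity scores and text relevance scores live on incomparable scales.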
2.3 Retrieval Evaluation (Deterministic/Programmatic Approach)
This section describes our systematic approach to evaluating RAG system performance using deterministic metrics.
Unlike subjective human evaluation, this programmatic method provides consistent, reproducible measurements of retrieval quality.
Our evaluation framework measures how well the RAG system retrieves relevant documents by comparing actual search results against a curated ground-truth dataset.
The process is deterministic because it uses objective criteria and programmatic because it's fully automated.
Key Components:
- Ground Truth Dataset: 51 carefully crafted queries with manually annotated expected files
- Automated Evaluation: Python scripts that calculate precision, recall, and F1 scores
- Filename-Based Matching: Compares retrieved filenames against expected filenames
Evaluation Dataset Structure
Our evaluation dataset (rag_evaluation_dataset.csv) contains three columns:
| Column | Description | Example |
|---|---|---|
| Input | User query | "How do I create a basic MemAgent with OpenAI and MongoDB?" |
| Expected Output | Human-written ideal response | "To create a basic MemAgent, you need to: 1) Set up MongoDB configuration..." |
| Expected Files | Comma-separated list of relevant files | "README.md,src/memagent.py,examples/memagent_single_agent.ipynb" |
The evaluation follows these steps:
- Query Execution: Each query is processed through the RAG system's search function
- Filename Extraction: File paths are extracted from search results and converted to basenames
- Set Comparison: Retrieved filenames are compared against expected filenames using set operations
- Metric Calculation: Precision, recall, and F1 scores are computed for each query
- Aggregation: Individual scores are averaged to get overall system performance
Precision
Definition: The fraction of retrieved documents that are relevant.
Formula: Precision = True Positives / (True Positives + False Positives)
- True Positives (TP): Relevant documents retrieved
- False Positives (FP): Irrelevant documents retrieved
Interpretation:
- High precision (close to 1.0) means most retrieved files are actually relevant
- Low precision means the system returns many irrelevant files
Query: "How do I register a function as a tool?"
Retrieved: ['toolbox.py', 'memagent.py', 'unrelated_file.py']
Expected: ['toolbox.py', 'memagent.py', 'tool_schema.py']
True Positives: 2 (toolbox.py, memagent.py)
False Positives: 1 (unrelated_file.py)
Precision = 2/3 = 0.667
Recall
Definition: The fraction of relevant documents that are retrieved.
Formula: Recall = True Positives / (True Positives + False Negatives)
- True Positives (TP): Relevant documents retrieved
- False Negatives (FN): Relevant documents that were not retrieved
Interpretation:
- High recall (close to 1.0) means the system finds most relevant files
- Low recall means the system misses many relevant files
True Positives: 2 (toolbox.py, memagent.py)
False Negatives: 1 (tool_schema.py was expected but not retrieved)
Recall = 2/3 = 0.667
F1 Score
Definition: The harmonic mean of precision and recall, providing a balanced measure.
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
Interpretation:
- F1 balances precision and recall
- Perfect F1 = 1.0 (perfect precision and recall)
- F1 = 0.0 when either precision or recall is 0
F1 = 2 × (0.667 × 0.667) / (0.667 + 0.667) = 0.667
Precision focuses on the accuracy of the retrieved documents and is computed over the set of documents the system actually retrieved.
Recall focuses on the completeness of retrieval and is computed over the set of relevant documents, whether or not they were retrieved.
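All three metrics reduce to simple set operations, matching the worked example above:

```python
def evaluate_retrieval(retrieved: list[str], expected: list[str]) -> dict:
    """Compute precision, recall, and F1 for one query via filename set comparison."""
    retrieved_set, expected_set = set(retrieved), set(expected)
    true_positives = retrieved_set & expected_set
    precision = len(true_positives) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(true_positives) / len(expected_set) if expected_set else 0.0
    # Harmonic mean; defined as 0 when both precision and recall are 0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# The worked example: 2 of 3 retrieved files are relevant, 2 of 3 expected files found
# evaluate_retrieval(["toolbox.py", "memagent.py", "unrelated_file.py"],
#                    ["toolbox.py", "memagent.py", "tool_schema.py"])
# -> precision, recall, and F1 all equal 0.667
```

Averaging these per-query scores over the whole dataset gives the aggregate system performance reported in the next section.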
2.3.1 Running the evaluation
🚀 Starting RAG Evaluation(Deterministic/Programmatic Approach) for OpenAI... 🔍 Evaluating RAG system with 51 queries... ================================================================================ Query 1: P=0.429 | R=0.600 | F1=0.500 Query: How do I create a basic MemAgent with OpenAI and MongoDB?... Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'openai.py', 'provider.py'] Retrieved (7): ['README.md', '__init__.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'summary_component.py', 'test_memagent_enhanced_tools.py'] ✅ Matches: ['README.md', 'memagent.py', 'memagent_single_agent.ipynb'] Query 2: P=0.333 | R=0.400 | F1=0.364 Query: What are the different memory types available in MemoRizz?... Expected (5): ['README.md', '__init__.py', 'base.py', 'memagent.py', 'memory_type.py'] Retrieved (6): ['MEMORY_ARCHITECTURE.md', 'README.md', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'memory_type.py', 'persona.ipynb'] ✅ Matches: ['README.md', 'memory_type.py'] Query 3: P=0.667 | R=0.800 | F1=0.727 Query: How do I register a function as a tool in the toolbox?... Expected (5): ['README.md', 'memagent.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py'] Retrieved (6): ['README.md', '__init__.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py'] ✅ Matches: ['README.md', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py'] Query 4: P=0.833 | R=1.000 | F1=0.909 Query: What persona role types are predefined in the system?... Expected (5): ['README.md', '__init__.py', 'persona.ipynb', 'persona.py', 'role_type.py'] Retrieved (6): ['README.md', '__init__.py', 'persona.ipynb', 'persona.py', 'role.py', 'role_type.py'] ✅ Matches: ['README.md', '__init__.py', 'persona.ipynb', 'persona.py', 'role_type.py'] Query 5: P=0.333 | R=0.500 | F1=0.400 Query: How do I set up MongoDB Atlas with vector search for MemoRizz?... 
Expected (4): ['README.md', 'memagent_single_agent.ipynb', 'openai.py', 'provider.py']
Retrieved (6): ['README.md', '__init__.py', 'knowledge_base.ipynb', 'memagent_single_agent.ipynb', 'memagent_summarisation.ipynb', 'persona.ipynb']
✅ Matches: ['README.md', 'memagent_single_agent.ipynb']
Query 6: P=0.500 | R=0.600 | F1=0.545
Query: How does the multi-agent orchestration work?...
Expected (5): ['README.md', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'shared_memory.py', 'task_decomposition.py']
Retrieved (6): ['README.md', 'evaluate_delegate_pattern.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'workflow.py']
✅ Matches: ['README.md', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py']
Query 7: P=0.400 | R=0.400 | F1=0.400
Query: What application modes are available and how do they work?...
Expected (5): ['README.md', 'application_mode.py', 'memagent.py', 'memory_component.py', 'memory_type.py']
Retrieved (5): ['README.md', '__init__.py', 'application_mode.py', 'test_memagent_enhanced_tools.py', 'workflow.py']
✅ Matches: ['README.md', 'application_mode.py']
Query 8: P=0.400 | R=0.400 | F1=0.400
Query: How do I add and retrieve long-term memory knowledge?...
Expected (5): ['README.md', 'knowledge_base.ipynb', 'knowledge_base.py', 'memagent.py', 'provider.py']
Retrieved (5): ['README.md', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'knowledge_base.py', 'workflow.py']
✅ Matches: ['README.md', 'knowledge_base.py']
Query 9: P=0.000 | R=0.000 | F1=0.000
Query: How does the conversation memory component work?...
Expected (5): ['conversational_memory_component.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_component.py', 'memory_type.py']
Retrieved (5): ['README.md', '__init__.py', 'conversational_memory_unit.py', 'memory_unit.py', 'workflow.py']
❌ No matches found
Query 10: P=0.500 | R=0.600 | F1=0.545
Query: What's the difference between semantic cache and long-term memory?...
Expected (5): ['README.md', 'knowledge_base.py', 'memagent.py', 'memory_type.py', 'semantic_cache.py']
Retrieved (6): ['README.md', '__init__.py', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'knowledge_base.py', 'semantic_cache.py']
✅ Matches: ['README.md', 'knowledge_base.py', 'semantic_cache.py']
Query 11: P=0.500 | R=0.600 | F1=0.545
Query: How do I create a custom persona for my agent?...
Expected (5): ['README.md', 'memagent.py', 'persona.ipynb', 'persona.py', 'role_type.py']
Retrieved (6): ['README.md', '__init__.py', 'memagents_multi_agents.ipynb', 'persona.ipynb', 'persona.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'persona.ipynb', 'persona.py']
Query 12: P=0.667 | R=0.800 | F1=0.727
Query: How does tool discovery work in the toolbox?...
Expected (5): ['README.md', 'openai.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py']
Retrieved (6): ['README.md', '__init__.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py']
✅ Matches: ['README.md', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py']
Query 13: P=0.286 | R=0.500 | F1=0.364
Query: What are the required dependencies for MemoRizz?...
Expected (4): ['README.md', 'ollama.py', 'openai.py', 'pyproject.toml']
Retrieved (7): ['README.md', 'evaluate_hierarchical_pattern.py', 'knowledge_base.ipynb', 'memagent_single_agent.ipynb', 'memagent_summarisation.ipynb', 'persona.ipynb', 'pyproject.toml']
✅ Matches: ['README.md', 'pyproject.toml']
Query 14: P=0.250 | R=0.400 | F1=0.308
Query: How do I implement error handling in my MemAgent?...
Expected (5): ['memagent.py', 'memagent_single_agent.ipynb', 'memory_component.py', 'provider.py', 'toolbox.py']
Retrieved (8): ['README.md', 'evaluate_delegate_pattern.py', 'memagent.py', 'memagent_single_agent.ipynb', 'multi_agent_orchestrator.py', 'summary_component.py', 'test_memagent_enhanced_tools.py', 'workflow.py']
✅ Matches: ['memagent.py', 'memagent_single_agent.ipynb']
Query 15: P=0.333 | R=0.400 | F1=0.364
Query: How does context window management work?...
Expected (5): ['README.md', 'cwm.py', 'memagent.py', 'memory_type.py', 'summary_component.py']
Retrieved (6): ['README.md', 'cwm.py', 'memory_unit.py', 'multi_agent_orchestrator.py', 'test_memagent_enhanced_tools.py', 'workflow.py']
✅ Matches: ['README.md', 'cwm.py']
Query 16: P=0.667 | R=0.400 | F1=0.500
Query: What's the workflow memory system and how do I use it?...
Expected (5): ['README.md', 'memagent.py', 'memory_type.py', 'workflow.ipynb', 'workflow.py']
Retrieved (3): ['MEMORY_ARCHITECTURE.md', 'README.md', 'workflow.py']
✅ Matches: ['README.md', 'workflow.py']
Query 17: P=0.429 | R=0.600 | F1=0.500
Query: How do I save and load existing agents?...
Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py', 'provider.py']
Retrieved (7): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py', 'memagent_single_agent.ipynb']
Query 18: P=0.333 | R=0.200 | F1=0.250
Query: What embedding providers are supported?...
Expected (5): ['README.md', 'ollama.py', 'openai.py', 'test-ollama-embed.ipynb', 'test-openai-embed.ipynb']
Retrieved (3): ['README.md', '__init__.py', 'provider.py']
✅ Matches: ['README.md']
Query 19: P=0.250 | R=0.400 | F1=0.308
Query: How do I configure different memory types for my agent?...
Expected (5): ['README.md', 'application_mode.py', 'memagent.py', 'memory_component.py', 'memory_type.py']
Retrieved (8): ['MEMORY_ARCHITECTURE.md', 'README.md', 'cwm.py', 'evaluate_delegate_pattern.py', 'memagent.py', 'memagents_multi_agents.ipynb', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py']
Query 20: P=0.333 | R=0.400 | F1=0.364
Query: What's the difference between delegates and shared memory?...
Expected (5): ['README.md', 'memagent.py', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'shared_memory.py']
Retrieved (6): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'evaluate_delegate_pattern.py', 'shared_memory.py', 'workflow.py']
✅ Matches: ['README.md', 'shared_memory.py']
Query 21: P=0.333 | R=0.400 | F1=0.364
Query: How do I implement tool access control?...
Expected (5): ['README.md', 'memagent.py', 'provider.py', 'toolbox.ipynb', 'toolbox.py']
Retrieved (6): ['README.md', '__init__.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py', 'toolbox.py', 'workflow.py']
✅ Matches: ['README.md', 'toolbox.py']
Query 22: P=0.400 | R=0.400 | F1=0.400
Query: What's the schema generation system for tools?...
Expected (5): ['README.md', 'memagent.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py']
Retrieved (5): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py']
✅ Matches: ['README.md', 'tool_schema.py']
Query 23: P=0.500 | R=0.400 | F1=0.444
Query: How does the summary component work for memory management?...
Expected (5): ['README.md', 'cwm.py', 'memagent.py', 'memory_type.py', 'summary_component.py']
Retrieved (4): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'summary_component.py']
✅ Matches: ['README.md', 'summary_component.py']
Query 24: P=0.400 | R=0.400 | F1=0.400
Query: What MongoDB collections does MemoRizz create?...
Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py', 'provider.py']
Retrieved (5): ['README.md', '__init__.py', 'knowledge_base.ipynb', 'memagent_single_agent.ipynb', 'persona.ipynb']
✅ Matches: ['README.md', 'memagent_single_agent.ipynb']
Query 25: P=0.429 | R=0.600 | F1=0.500
Query: How do I handle agent updates and versioning?...
Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py', 'provider.py']
Retrieved (7): ['README.md', 'evaluate_delegate_pattern.py', 'memagent.py', 'memagent_single_agent.ipynb', 'multi_agent_orchestrator.py', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py', 'memagent_single_agent.ipynb']
Query 26: P=0.429 | R=0.600 | F1=0.500
Query: What's the task decomposition system?...
Expected (5): ['README.md', 'memagent.py', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'task_decomposition.py']
Retrieved (7): ['MEMORY_ARCHITECTURE.md', 'README.md', 'README_evaluation_architectures.md', 'evaluate_delegate_pattern.py', 'evaluate_hierarchical_pattern.py', 'multi_agent_orchestrator.py', 'task_decomposition.py']
✅ Matches: ['README.md', 'multi_agent_orchestrator.py', 'task_decomposition.py']
Query 27: P=0.429 | R=0.600 | F1=0.500
Query: How do I implement custom memory providers?...
Expected (5): ['README.md', '__init__.py', 'base.py', 'memagent.py', 'provider.py']
Retrieved (7): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'memory_unit.py', 'provider.py', 'shared_memory.py', 'workflow.py']
✅ Matches: ['README.md', '__init__.py', 'provider.py']
Query 28: P=0.125 | R=0.200 | F1=0.154
Query: What's the role system in conversations?...
Expected (5): ['README.md', 'conversational_memory_component.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py']
Retrieved (8): ['README.md', '__init__.py', 'conversational_memory_unit.py', 'evaluate_hierarchical_pattern.py', 'memory_unit.py', 'multi_agent_orchestrator.py', 'role.py', 'role_type.py']
✅ Matches: ['README.md']
Query 29: P=0.375 | R=0.600 | F1=0.462
Query: How does vector search indexing work?...
Expected (5): ['README.md', 'knowledge_base.ipynb', 'knowledge_base.py', 'openai.py', 'provider.py']
Retrieved (8): ['MEMORY_ARCHITECTURE.md', 'README.md', 'README_evaluation_architectures.md', '__init__.py', 'evaluate_hierarchical_pattern.py', 'knowledge_base.py', 'memagent_single_agent.ipynb', 'provider.py']
✅ Matches: ['README.md', 'knowledge_base.py', 'provider.py']
Query 30: P=0.333 | R=0.400 | F1=0.364
Query: What are the configuration constants in MemAgent?...
Expected (5): ['README.md', 'application_mode.py', 'memagent.py', 'memagent_single_agent.ipynb', 'toolbox.py']
Retrieved (6): ['README.md', '__init__.py', 'evaluate_delegate_pattern.py', 'memagent.py', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py']
Query 31: P=0.500 | R=0.600 | F1=0.545
Query: How do I handle nested multi-agent scenarios?...
Expected (5): ['README.md', 'memagent.py', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'shared_memory.py']
Retrieved (6): ['README.md', 'evaluate_delegate_pattern.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py']
Query 32: P=0.286 | R=0.400 | F1=0.333
Query: What's the MemAgentModel for?...
Expected (5): ['README.md', 'memagent.py', 'memory_type.py', 'persona.py', 'provider.py']
Retrieved (7): ['README.md', 'evaluate_delegate_pattern.py', 'evaluate_memorizz.py', 'memagent.py', 'multi_agent_orchestrator.py', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py']
Query 33: P=0.375 | R=0.600 | F1=0.462
Query: How do I delete agents and their associated memories?...
Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py', 'provider.py']
Retrieved (8): ['README.md', 'evaluate_delegate_pattern.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'summary_component.py', 'test_memagent_enhanced_tools.py']
✅ Matches: ['README.md', 'memagent.py', 'memagent_single_agent.ipynb']
Query 34: P=0.500 | R=0.600 | F1=0.545
Query: What's the database tools configuration system?...
Expected (5): ['README.md', 'mongodb_tools.py', 'openai.py', 'toolbox.ipynb', 'toolbox.py']
Retrieved (6): ['README.md', '__init__.py', 'mongodb_tools.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py', 'toolbox.py']
✅ Matches: ['README.md', 'mongodb_tools.py', 'toolbox.py']
Query 35: P=0.200 | R=0.200 | F1=0.200
Query: How does the instruction system work for agents?...
Expected (5): ['README.md', 'memagent.py', 'memagent_single_agent.ipynb', 'persona.ipynb', 'persona.py']
Retrieved (5): ['README.md', 'evaluate_delegate_pattern.py', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'summary_component.py']
✅ Matches: ['README.md']
Query 36: P=0.167 | R=0.200 | F1=0.182
Query: What's the maximum steps configuration?...
Expected (5): ['README.md', 'application_mode.py', 'memagent.py', 'memagent_single_agent.ipynb', 'toolbox.ipynb']
Retrieved (6): ['MEMORY_ARCHITECTURE.md', 'README.md', 'README_evaluation_architectures.md', 'evaluate_hierarchical_pattern.py', 'knowledge_base.ipynb', 'pyproject.toml']
✅ Matches: ['README.md']
Query 37: P=0.000 | R=0.000 | F1=0.000
Query: How do I implement conversation threading?...
Expected (5): ['README.md', 'conversational_memory_component.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py']
Retrieved (8): ['__init__.py', 'conversational_memory_unit.py', 'evaluate_hierarchical_pattern.py', 'memory_unit.py', 'multi_agent_orchestrator.py', 'provider.py', 'role.py', 'workflow.py']
❌ No matches found
Query 38: P=0.333 | R=0.400 | F1=0.364
Query: What's the namespace system in long-term memory?...
Expected (5): ['README.md', 'knowledge_base.ipynb', 'knowledge_base.py', 'memagent.py', 'provider.py']
Retrieved (6): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'evaluate_hierarchical_pattern.py', 'knowledge_base.py', 'workflow.py']
✅ Matches: ['README.md', 'knowledge_base.py']
Query 39: P=0.200 | R=0.200 | F1=0.200
Query: How does the embedding dimension system work?...
Expected (5): ['ollama.py', 'openai.py', 'provider.py', 'test-ollama-embed.ipynb', 'test-openai-embed.ipynb']
Retrieved (5): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'evaluate_hierarchical_pattern.py', 'provider.py']
✅ Matches: ['provider.py']
Query 40: P=0.143 | R=0.200 | F1=0.167
Query: What are the security considerations for MemoRizz?...
Expected (5): ['LICENCE.txt', 'README.md', 'openai.py', 'provider.py', 'pyproject.toml']
Retrieved (7): ['README.md', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'knowledge_base.ipynb', 'memagent_single_agent.ipynb', 'memagent_summarisation.ipynb', 'persona.ipynb']
✅ Matches: ['README.md']
Query 41: P=0.167 | R=0.200 | F1=0.182
Query: How do I configure logging in MemoRizz?...
Expected (5): ['README.md', 'memagent.py', 'multi_agent_orchestrator.py', 'provider.py', 'toolbox.py']
Retrieved (6): ['README.md', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'knowledge_base.ipynb', 'memagent_summarisation.ipynb', 'persona.ipynb']
✅ Matches: ['README.md']
Query 42: P=0.167 | R=0.200 | F1=0.182
Query: What's the ObjectId system in MongoDB integration?...
Expected (5): ['knowledge_base.ipynb', 'knowledge_base.py', 'memagent.py', 'provider.py', 'toolbox.py']
Retrieved (6): ['README.md', '__init__.py', 'mongodb_tools.py', 'persona.py', 'provider.py', 'workflow.py']
✅ Matches: ['provider.py']
Query 43: P=0.375 | R=0.600 | F1=0.462
Query: How do I handle concurrent agent operations?...
Expected (5): ['README.md', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'provider.py', 'shared_memory.py']
Retrieved (8): ['README.md', 'evaluate_delegate_pattern.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'test_memagent_enhanced_tools.py', 'workflow.py']
✅ Matches: ['README.md', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py']
Query 44: P=0.500 | R=0.800 | F1=0.615
Query: What's the tool augmentation feature?...
Expected (5): ['README.md', 'openai.py', 'provider.py', 'toolbox.ipynb', 'toolbox.py']
Retrieved (8): ['README.md', '__init__.py', 'provider.py', 'task_decomposition.py', 'test_memagent_enhanced_tools.py', 'tool_schema.py', 'toolbox.ipynb', 'toolbox.py']
✅ Matches: ['README.md', 'provider.py', 'toolbox.ipynb', 'toolbox.py']
Query 45: P=0.750 | R=0.600 | F1=0.667
Query: How does the persona background system work?...
Expected (5): ['README.md', 'memagent.py', 'persona.ipynb', 'persona.py', 'role_type.py']
Retrieved (4): ['README.md', '__init__.py', 'persona.ipynb', 'persona.py']
✅ Matches: ['README.md', 'persona.ipynb', 'persona.py']
Query 46: P=0.200 | R=0.200 | F1=0.200
Query: What's the conversation memory retrieval system?...
Expected (5): ['README.md', 'conversational_memory_component.py', 'memagent.py', 'memagent_single_agent.ipynb', 'provider.py']
Retrieved (5): ['README.md', '__init__.py', 'conversational_memory_unit.py', 'evaluate_hierarchical_pattern.py', 'memory_unit.py']
✅ Matches: ['README.md']
Query 47: P=0.333 | R=0.400 | F1=0.364
Query: How do I implement custom application modes?...
Expected (5): ['README.md', 'application_mode.py', 'memagent.py', 'memagent_single_agent.ipynb', 'memory_type.py']
Retrieved (6): ['README.md', '__init__.py', 'application_mode.py', 'persona.ipynb', 'test_memagent_enhanced_tools.py', 'workflow.py']
✅ Matches: ['README.md', 'application_mode.py']
Query 48: P=0.667 | R=0.800 | F1=0.727
Query: What's the agent ID generation system?...
Expected (5): ['README.md', 'memagent.py', 'memagents_multi_agents.ipynb', 'multi_agent_orchestrator.py', 'provider.py']
Retrieved (6): ['README.md', 'evaluate_delegate_pattern.py', 'memagent.py', 'multi_agent_orchestrator.py', 'provider.py', 'summary_component.py']
✅ Matches: ['README.md', 'memagent.py', 'multi_agent_orchestrator.py', 'provider.py']
Query 49: P=0.429 | R=0.600 | F1=0.500
Query: How does the search index model work?...
Expected (5): ['README.md', 'knowledge_base.ipynb', 'knowledge_base.py', 'openai.py', 'provider.py']
Retrieved (7): ['README.md', 'evaluate_hierarchical_pattern.py', 'evaluate_memorizz.py', 'knowledge_base.py', 'memagent_single_agent.ipynb', 'provider.py', 'workflow.py']
✅ Matches: ['README.md', 'knowledge_base.py', 'provider.py']
Query 50: P=0.400 | R=0.400 | F1=0.400
Query: What's the shared memory session system?...
Expected (5): ['README.md', 'memagents_multi_agents.ipynb', 'memory_type.py', 'multi_agent_orchestrator.py', 'shared_memory.py']
Retrieved (5): ['MEMORY_ARCHITECTURE.md', 'README.md', '__init__.py', 'shared_memory.py', 'summary_component.py']
✅ Matches: ['README.md', 'shared_memory.py']
Query 51: P=0.375 | R=0.750 | F1=0.500
Query: How do I handle environment variables and API key management?...
Expected (4): ['README.md', 'memagent_single_agent.ipynb', 'openai.py', 'provider.py']
Retrieved (8): ['README.md', '__init__.py', 'knowledge_base.ipynb', 'memagent.py', 'memagent_summarisation.ipynb', 'openai.py', 'provider.py', 'workflow.ipynb']
✅ Matches: ['README.md', 'openai.py', 'provider.py']
================================================================================
📊 FINAL EVALUATION RESULTS:
Average Precision: 0.378
Average Recall: 0.462
Average F1 Score: 0.410
================================================================================
📈 Evaluation complete! Your RAG system achieved:
• Precision: 37.8%
• Recall: 46.2%
• F1 Score: 41.0%

📊 Additional Statistics:
• Best Precision: 0.833
• Best Recall: 1.000
• Best F1 Score: 0.909
• Worst Precision: 0.000
• Worst Recall: 0.000
• Worst F1 Score: 0.000
• Std Dev Precision: 0.172
• Std Dev Recall: 0.209
• Std Dev F1: 0.180
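The per-query scores above are standard set-based retrieval metrics: precision is the fraction of retrieved files that were expected, recall the fraction of expected files that were retrieved, and F1 their harmonic mean. A minimal sketch of the computation (the helper name `retrieval_scores` is illustrative, not the notebook's actual function):

```python
def retrieval_scores(expected: list[str], retrieved: list[str]) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for one query."""
    matches = set(expected) & set(retrieved)
    precision = len(matches) / len(retrieved) if retrieved else 0.0
    recall = len(matches) / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Query 6 from the log above: 3 matches, 6 retrieved, 5 expected
p, r, f1 = retrieval_scores(
    expected=["README.md", "memagents_multi_agents.ipynb", "multi_agent_orchestrator.py",
              "shared_memory.py", "task_decomposition.py"],
    retrieved=["README.md", "evaluate_delegate_pattern.py", "memagent_single_agent.ipynb",
               "memagents_multi_agents.ipynb", "multi_agent_orchestrator.py", "workflow.py"],
)
print(f"P={p:.3f} | R={r:.3f} | F1={f1:.3f}")  # → P=0.500 | R=0.600 | F1=0.545
```

Averaging these triples over all queries yields the final evaluation numbers reported above.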
[Output: figure with three histograms, "Precision Distribution", "Recall Distribution", and "F1 Score Distribution".]

📈 Evaluation complete! Your RAG system achieved:
• Precision: 32.8%
• Recall: 61.0%
• F1 Score: 42.5%

📊 Additional Statistics:
• Best Precision: 0.625
• Best Recall: 1.000
• Best F1 Score: 0.769
• Worst Precision: 0.100
• Worst Recall: 0.200
• Worst F1 Score: 0.133
• Std Dev Precision: 0.118
• Std Dev Recall: 0.201
• Std Dev F1: 0.147
[Output: figure with three histograms, "Precision Distribution", "Recall Distribution", and "F1 Score Distribution".]

2.4 Retrieval Augmented Generation
Based on the provided context, the file responsible for the definition of the MemAgent class is likely **memagent.py**. The comments and import statements in other files indicate that MemAgent is imported from memagent.py. For example, in **cwm.py**, there is a commented line: ```python # from ..memagent import MemAgent ``` This suggests that MemAgent is defined in the memagent.py file.
The file responsible for the definition of the MemAgent class is memagent.py. This file manages memory-related operations in the project.
2.5 ReRanking with VoyageAI (rerank-2)
🔄 Reranking Position Changes
============================================================
📊 Summary:
📈 Moved Up: 2 docs (avg: +6.5)
📉 Moved Down: 2 docs (avg: -2.0)
➡️ No Change: 1 docs

📋 Top 5 After Reranking:
------------------------------------------------------------
1. summary_component.py | ⬆️+6 | 0.605 | was #7
2. README.md | ⬆️+7 | 0.527 | was #9
3. multi_agent_orchestrator.py | ⬇️-2 | 0.465 | was #1
4. memagents_multi_agents.ipynb | ➡️0 | 0.432 | was #4
5. evaluate_delegate_pattern.py | ⬇️-2 | 0.379 | was #3

🔄 Reranking Position Changes
============================================================
📊 Summary:
📈 Moved Up: 3 docs (avg: +2.3)
📉 Moved Down: 1 docs (avg: -3.0)
➡️ No Change: 1 docs

📋 Top 5 After Reranking:
------------------------------------------------------------
1. README.md | ⬆️+5 | 0.527 | was #6
2. multi_agent_orchestrator.py | ➡️0 | 0.465 | was #2
3. cwm.py | ⬆️+1 | 0.461 | was #4
4. memagents_multi_agents.ipynb | ⬇️-3 | 0.432 | was #1
5. README.md | ⬆️+1 | 0.328 | was #6
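The position-change reports above compare each document's rank before and after the reranker runs. The bookkeeping is simple once you have both orderings; the sketch below (helper name `diff_rankings` and the filenames are illustrative, not the notebook's actual data) shows how the up/down deltas can be derived:

```python
def diff_rankings(before: list[str], after: list[str]) -> dict[str, int]:
    """Map each document to its rank change: positive = moved up, negative = moved down."""
    old_pos = {doc: i for i, doc in enumerate(before)}
    return {doc: old_pos[doc] - i for i, doc in enumerate(after) if doc in old_pos}

before = ["a.py", "b.py", "c.py", "d.py"]   # order from vector search
after = ["c.py", "a.py", "b.py", "d.py"]    # order after reranking

deltas = diff_rankings(before, after)
for rank, doc in enumerate(after, start=1):
    d = deltas[doc]
    arrow = "⬆️" if d > 0 else "⬇️" if d < 0 else "➡️"
    print(f"{rank}. {doc} | {arrow}{d:+d} | was #{before.index(doc) + 1}")
```

Small deltas across the board (as seen with voyage-code-3) indicate the embedding ranking already agreed with the reranker's relevance judgments.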
Why Domain‑Specific Embeddings Can Outperform General‑Purpose Ones
- **Syntax & Vocabulary Fit**: Domain models "speak code": they treat identifiers, decorators, and type hints as meaningful signals rather than noise.
- **Stronger Semantic Clustering**: Code-centric embeddings group related functions, imports, and idioms more tightly, so the most relevant files rise to the top.
- **Score Thresholds & Latent-Space Concentration**: In a high-dimensional embedding space (e.g. 1,024–2,048 dims), semantically similar code artifacts form tight clusters, provided the model has been trained on code. voyage-code-3 dedicates its entire capacity to code patterns (imports, signatures, docstrings), so relevant files consistently score above common developer thresholds (0.7–0.8) where general-purpose embeddings fall short. In our 52-query benchmark:
  - At a 0.8 cut-off, the code-specific model surfaces 3 high-confidence files, whereas the general model surfaces none.
  - Lowering the cut-off to 0.7 yields all 10 correct hits for voyage-code-3 versus only 2 for the general model.

  This demonstrates how domain-focused embeddings amplify true positives above practical similarity thresholds.
- **Reranking Stability**: After reranking, voyage-code-3 results shifted minimally, indicating that its initial ranking was already aligned with true relevance. In contrast, the general-purpose embeddings required larger reshuffles.

In practice, setting a similarity threshold filters out noise while retaining high-confidence matches. With voyage-code-3, a 0.8 threshold would mean only truly relevant code files pass the cut, ensuring precision without sacrificing recall.
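Threshold filtering itself is a one-liner over the (document, similarity) pairs returned by vector search. A minimal sketch; the scores below are invented for illustration and are not taken from the benchmark:

```python
# Hypothetical cosine-similarity scores for the same query under two embedding models
code_scores = {"memagent.py": 0.86, "toolbox.py": 0.82, "provider.py": 0.81, "workflow.py": 0.64}
general_scores = {"memagent.py": 0.58, "toolbox.py": 0.55, "provider.py": 0.52, "workflow.py": 0.41}

def above_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep only hits whose similarity clears the confidence threshold."""
    return sorted(doc for doc, s in scores.items() if s >= threshold)

hits_code = above_threshold(code_scores, 0.8)
hits_general = above_threshold(general_scores, 0.8)
print(hits_code)     # → ['memagent.py', 'provider.py', 'toolbox.py']
print(hits_general)  # → []
```

This mirrors the pattern described above: when scores for relevant files concentrate near the top of the range, a fixed cut-off separates signal from noise cleanly; when they are spread lower, the same cut-off discards everything.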
Part 3: Coding Agent
3.1 Create Agent State
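In LangGraph, the agent state is typically a `TypedDict` that every node reads from and writes partial updates to; fields annotated with a reducer (commonly an append operation for message lists) are merged rather than overwritten. The sketch below uses only the standard library so it runs without LangGraph installed; the field names are assumptions for illustration, not the notebook's exact schema:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    # Conversation history; in LangGraph the Annotated reducer (operator.add here)
    # tells the graph to append node outputs to this list instead of replacing it.
    messages: Annotated[list[dict], operator.add]
    # Working fields that tool nodes fill in (illustrative names)
    repo_url: str
    repo_markdown: str

state: AgentState = {
    "messages": [{"role": "user", "content": "Summarise the repo"}],
    "repo_url": "https://github.com/example/repo",
    "repo_markdown": "",
}

# A node returns a partial update; the reducer semantics append the new message:
update = {"messages": [{"role": "assistant", "content": "Fetching the repository..."}]}
state["messages"] = state["messages"] + update["messages"]
print(len(state["messages"]))  # → 2
```

Keeping the state explicit like this is what lets LangGraph checkpoint, replay, and branch multi-step tool workflows.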
3.2 Create Agent Tools
Create a tool that transforms a GitHub URL into markdown content
Create a tool that prepares metadata for a GitHub repository
Create a tool that processes the data for ingestion
Ingest data into MongoDB
Using a very simple LangGraph ReAct Agent
To write unit tests for the `MemAgent` class, we need to focus on its methods and their expected behaviors. A typical unit test structure involves setting up the necessary environment, creating instances, invoking methods, and asserting the expected outcomes. Here's an example using the Python `unittest` framework:

First, ensure you have a testing directory, e.g., `tests/`, and create a new file, e.g., `test_memagent.py`.

Here's how you might structure some basic unit tests:

```python
import unittest
from unittest.mock import MagicMock, patch
from memagent import MemAgent
from memory_provider import MemoryProvider  # Ensure these imports match your actual module structure
from persona import Persona
from llms.openai import OpenAI

class TestMemAgent(unittest.TestCase):

    def setUp(self):
        # Mock the MemoryProvider and other dependencies you need
        self.mock_memory_provider = MagicMock(spec=MemoryProvider)

        # Create a sample persona
        self.sample_persona = Persona()

        # Create a MemAgent instance using mocks
        self.agent = MemAgent(
            model=OpenAI(),
            memory_provider=self.mock_memory_provider,
            persona=self.sample_persona,
            instruction="You are a helpful assistant.",
            memory_types=["conversation_memory", "workflow_memory"],
            agent_id="test-agent-id"
        )

    def test_initialization(self):
        # Test if MemAgent is initialized correctly
        self.assertEqual(self.agent.instruction, "You are a helpful assistant.")
        self.assertEqual(self.agent.agent_id, "test-agent-id")
        self.assertIsInstance(self.agent.memory_provider, MemoryProvider)
        self.assertIsInstance(self.agent.persona, Persona)

    def test_run_method(self):
        # Mock any method dependencies
        self.agent._execute_main_loop = MagicMock(return_value="Test response")

        # Call the run method
        response = self.agent.run(query="What's the weather?")

        # Assert expected behavior
        self.agent._execute_main_loop.assert_called_once()
        self.assertEqual(response, "Test response")

    def test_memory_management(self):
        # Mock the memory related methods
        self.mock_memory_provider.retrieve_memagent.return_value = self.agent

        # Test retrieving and updating memory
        self.agent.memory_ids = ["mem-id-1"]
        self.assertIn("mem-id-1", self.agent.memory_ids)

        new_memory_id = "mem-id-2"
        self.agent.update_memory([new_memory_id])
        self.assertIn(new_memory_id, self.agent.memory_ids)

        # Test delete memory
        self.agent.delete_memory()
        self.assertEqual(self.agent.memory_ids, [])

    def test_tool_management(self):
        # Add a tool and check it's added
        self.agent.tools = []
        self.agent.add_tool(func=lambda x: x, persist=False)
        self.assertEqual(len(self.agent.tools), 1)

if __name__ == '__main__':
    unittest.main()
```

### Important Notes:
- Ensure that you mock external systems, such as actual calls to a database or external APIs.
- This example assumes that various other classes and methods (`MemoryProvider`, `Persona`, `OpenAI`, etc.) are available from your imports and correctly implemented.
- Modify the imports and method calls to suit your specific codebase structure.
- Adjust the test methods based on the particular implementation details and behaviors you wish to verify for `MemAgent`.

This is a foundational setup. To enhance it, you might consider parameterizing tests with different configurations, testing additional methods, handling exceptions, and verifying side effects or state changes.

Using LangGraph Graph and Node
3.3 LLM definition
3.4 Agent Node definition
3.5 Tool Node Definition
3.6 Graph Definition