Video Intelligence Agent
AI-Powered Real-time Video Stream Intelligence & Incident Detection System
An intelligent video monitoring system that provides real-time analysis and incident detection for live video streams, broadcasts, and recordings. The system combines AI embeddings and language models to automatically detect, analyze, and resolve video quality issues, network problems, and streaming incidents as they occur.
This notebook implements an AI-Powered Real-time Video Stream Intelligence & Incident Detection System that automatically monitors and analyzes video content to detect incidents, quality issues, and network problems. The system extracts frames from videos at regular intervals, generates semantic embeddings using Voyage AI's multimodal models, and creates detailed scene descriptions using OpenAI's GPT-4 Vision.
- Users can search through video content using natural language queries like "find frames with a referee" and instantly jump to relevant timestamps through an interactive HTML5 video player that displays similarity scores and scene descriptions.
The system supports multiple video sources including local files, webcams, and YouTube livestreams, storing all analysis data in MongoDB with vector search capabilities for fast retrieval. Built using an agent-based architecture, specialized AI agents handle different aspects like frame retrieval, video display, incident analysis, and stream monitoring.
- Real-time monitoring continuously processes live video feeds, compares frames against a database of known incidents, and provides immediate alerts with visual overlays.
This makes the system valuable for broadcast monitoring, security surveillance, quality assurance, and content discovery applications where organizations need to automatically detect issues, search large video archives, or monitor live streams for technical problems or security incidents.
Pre-requisite:
- Please ensure that you are using a MongoDB 8.1 database to use the new $rankFusion operator
Step 1: Extracting Embeddings and Metadata from Video Data
This step involves extracting video frames and generating corresponding embeddings and metadata descriptions to facilitate intelligent search functionality.
This step covers three key techniques:
- Video-to-frame conversion for image extraction
- Multimodal embedding generation with Voyage AI to encode semantic relationships between text and images
- Automated metadata generation using GPT-4o Vision Pro
1.1 Video to Images Function Explanation
The video_to_images function extracts still images from a video at regular time intervals (default: every 2 seconds).
What it does:
- Opens a video file using OpenCV
- Calculates which frames to extract based on the video's frame rate and desired time interval
- Loops through the video, saving only the frames that match the timing interval
- Saves each extracted frame as a JPEG with a timestamp filename (e.g., "frame_0001_t2.0s.jpg")
- Returns the total number of frames extracted
Key parameters:
- video_path: Input video file
- output_dir: Where to save the images (default: "frames")
- interval_seconds: Time between extractions (default: 2 seconds)
Usage:
The example usage extracts frames every 2 seconds from "videos/video.mp4" and saves them to a "frames" folder. This is useful for creating video thumbnails or analyzing video content frame by frame.
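The frame-selection arithmetic described above can be sketched without OpenCV. This is a minimal illustration, not the notebook's actual video_to_images code: the helper names frames_to_extract and frame_filename are hypothetical, and the assumption that sampling starts at frame 0 is ours.

```python
def frames_to_extract(fps: float, total_frames: int, interval_seconds: float = 2.0) -> list:
    """Return the indices of the source frames that fall on each sampling interval."""
    if fps <= 0 or interval_seconds <= 0:
        raise ValueError("fps and interval_seconds must be positive")
    # Number of source frames between two saved frames (never less than 1).
    step = max(1, round(fps * interval_seconds))
    # Assumption: sampling starts at the first frame of the video.
    return list(range(0, total_frames, step))


def frame_filename(saved_index: int, frame_index: int, fps: float) -> str:
    """Build a timestamped name in the 'frame_0001_t2.0s.jpg' style."""
    timestamp = frame_index / fps  # position of the frame in seconds
    return f"frame_{saved_index:04d}_t{timestamp:.1f}s.jpg"


# A 5-second clip at 30 fps sampled every 2 seconds.
print(frames_to_extract(fps=30.0, total_frames=150, interval_seconds=2.0))
print(frame_filename(saved_index=1, frame_index=60, fps=30.0))
```

In a real extraction loop, each index returned here would be the point at which the frame read from OpenCV is written to disk.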
In the next cells, we download a video to use for the video intelligence use case.
The video for this use case is obtained from YouTube, but you can modify the cells below for your own use case.
[youtube] Extracting URL: https://www.youtube.com/watch?v=20DThpeng84
[youtube] 20DThpeng84: Downloading webpage
[youtube] 20DThpeng84: Downloading tv client config
[youtube] 20DThpeng84: Downloading tv player API JSON
[youtube] 20DThpeng84: Downloading ios player API JSON
[youtube] 20DThpeng84: Downloading m3u8 information
⚠️ Video >1h; this may take a while.
⬇️ Downloading at format `best[height<=1080]`…
[youtube] Extracting URL: https://www.youtube.com/watch?v=20DThpeng84
[youtube] 20DThpeng84: Downloading webpage
[youtube] 20DThpeng84: Downloading tv client config
[youtube] 20DThpeng84: Downloading tv player API JSON
[youtube] 20DThpeng84: Downloading ios player API JSON
[youtube] 20DThpeng84: Downloading m3u8 information
[info] 20DThpeng84: Downloading 1 format(s): 18
[download] Destination: videos/FC Barcelona vs Real Betis (5-1) ｜ CDR 2025 Full Match.mp4
[download] 100% of 390.84MiB in 00:01:47 at 3.63MiB/s
{'success': False,
 'error': 'File not found after download',
 'url': 'https://www.youtube.com/watch?v=20DThpeng84'}
1.2 Setting environment variables
1.3 Generating Embeddings with Voyage AI
voyage-multimodal-3 is Voyage AI's multimodal embedding model for documents that contain interleaved text and images. It vectorizes both modalities through a unified transformer backbone, which removes the need for complex document parsing. Voyage AI reports that it improves retrieval accuracy by an average of 19.63% over competing models.
Unlike traditional CLIP-based models that process text and images separately, voyage-multimodal-3 captures the contextual relationships between visual and textual elements in screenshots, PDFs, slides, tables, and figures, making it ideal for RAG applications and semantic search across content-rich documents where visual layout and textual content are equally important.
Learn more on Voyage AI Multimodal embeddings here: https://docs.voyageai.com/docs/multimodal-embeddings
1.3.1 Generating metadata for each frame with OpenAI Vision
1.4 Generating frame metadata
This code processes video frames to extract both AI embeddings and text descriptions for intelligent search capabilities.
process_single_frame() - Handles individual frame processing:
- Loads an image using PIL
- Generates a vector embedding using Voyage AI (captures visual semantic meaning)
- Creates a text description using OpenAI's vision model
- Returns both as a dictionary or None if processing fails
process_frames_to_embeddings_with_descriptions() - Batch processes all frames:
- Discovers frames - Scans the frames directory for image files (.jpg, .png, etc.)
- Processes sequentially - Calls process_single_frame() for each image
- Adds metadata - Extracts frame number and timestamp from filename
- Rate limiting - Includes delays between API calls to avoid hitting service limits
- Progress tracking - Shows processing status and handles failures gracefully
The result is a dictionary mapping each frame filename to its embedding vector, text description, frame number, and timestamp - enabling both visual and semantic search capabilities.
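The batch flow above can be sketched as follows. This is an illustration under stated assumptions rather than the notebook's implementation: parse_frame_filename and process_frames are hypothetical names, the filename pattern is inferred from the "frame_0001_t2.0s.jpg" example, and process_fn stands in for process_single_frame() so the sketch runs without any API calls.

```python
import re
import time

# Assumed filename pattern, e.g. "frame_0001_t2.0s.jpg".
FRAME_NAME = re.compile(r"frame_(\d+)_t([\d.]+)s\.(?:jpg|jpeg|png)$", re.IGNORECASE)


def parse_frame_filename(name):
    """Return (frame_number, timestamp_seconds), or None if the name does not match."""
    match = FRAME_NAME.search(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def process_frames(filenames, process_fn, delay_seconds=0.0):
    """Process frames in order, attach metadata, and skip frames that fail."""
    results = {}
    for name in sorted(filenames):
        parsed = parse_frame_filename(name)
        if parsed is None:
            continue  # not a frame file
        try:
            data = process_fn(name)  # expected to return a dict or None
        except Exception:
            data = None  # treat an exception like a failed frame
        if data is None:
            continue
        frame_number, timestamp = parsed
        results[name] = {**data, "frame_number": frame_number, "frame_timestamp": timestamp}
        if delay_seconds > 0:
            time.sleep(delay_seconds)  # simple rate limiting between API calls
    return results


# Usage with a stub in place of the real embedding and description calls.
stub = lambda name: {"embedding": [0.0, 0.1], "frame_description": "placeholder"}
print(process_frames(["frame_0001_t2.0s.jpg", "notes.txt"], stub))
```

Passing the per-frame function as an argument keeps the loop testable and makes it easy to swap in a different embedding or vision provider.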
Step 2: Connecting and Saving Data To MongoDB
Connection to MongoDB successful
2.1 Collection creation
Existing collections: ['video_intelligence', 'video_library', 'previous_frame_incidentS']
Collection video_intelligence already exists
Collection video_library already exists
Collection previous_frame_incidentS already exists
2.2 Creating Vector and Search Indexes
The code below defines vector search indexes on several collections to support efficient similarity queries over high-dimensional embeddings.
For each target collection (such as frame metadata, past incident records, and video libraries), it creates named indexes using varying quantization strategies.
Scalar and binary quantization compress embeddings for reduced storage and faster lookups, while full-fidelity indexes preserve maximum precision at the cost of higher resource usage.
By configuring multiple index variants on the same collection, you can benchmark and choose the optimal trade-off between search accuracy, speed, and storage footprint. Once built, these indexes enable rapid nearest-neighbor retrieval of semantically similar items for tasks like incident detection, frame comparison, and content recommendation.
Vector search index 'vector_search_index_scalar' already exists.
Vector search index 'vector_search_index_full_fidelity' already exists.
Vector search index 'vector_search_index_binary' already exists.
Vector search index 'incident_vector_index_scalar' already exists.
Vector search index 'video_vector_index' already exists.
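As a rough sketch of what such index variants could look like, the helper below builds Atlas Vector Search index definitions as plain dictionaries. The function name vector_index_definition, the field path "embedding", and cosine similarity are assumptions made for illustration; 1024 is the output dimension of voyage-multimodal-3.

```python
def vector_index_definition(path="embedding", num_dimensions=1024,
                            similarity="cosine", quantization=None):
    """Build an Atlas Vector Search index definition as a plain dict."""
    field = {
        "type": "vector",
        "path": path,                      # assumed name of the embedding field
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }
    if quantization is not None:
        if quantization not in ("scalar", "binary"):
            raise ValueError("quantization must be 'scalar', 'binary', or None")
        field["quantization"] = quantization
    return {"fields": [field]}


# One definition per index name mentioned in the output above.
INDEX_VARIANTS = {
    "vector_search_index_scalar": vector_index_definition(quantization="scalar"),
    "vector_search_index_full_fidelity": vector_index_definition(),
    "vector_search_index_binary": vector_index_definition(quantization="binary"),
}

for name, definition in INDEX_VARIANTS.items():
    print(name, definition)
```

Each definition could then be wrapped in a PyMongo SearchIndexModel with type "vectorSearch" and passed to create_search_index on the target collection.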
The code below is a helper that wraps MongoDB Atlas Search index creation:
- given a collection, an index-definition dict, and a name,
- it builds a SearchIndexModel, calls create_search_index,
- and returns the result, printing a success message or catching errors and returning None.
Search index 'frame_intelligence_index' created successfully
'frame_intelligence_index'
Step 3: Data Ingestion
The step starts by clearing out any existing documents in the three target collections (FRAME_INTELLIGENCE_METADATA, PREVIOUS_FRAME_INCIDENTS, and VIDEO_LIBRARY) via repeated calls to delete_many({}), ensuring you're working with a clean slate before seeding new data.
Next, it converts your Pandas DataFrame (frame_data_df) into a list of Python dictionaries with to_dict(orient="records"), then uses insert_many on the frame_intelligence_collection (aliased from db[FRAME_INTELLIGENCE_METADATA]) to bulk-load those records.
This pattern guarantees that your frame intelligence collection is freshly populated and ready for downstream tasks like vector indexing or semantic search.
Because there's no additional transformation pipeline (no ETL steps, schema migrations, or data-wrangling utilities), loading new data is straightforward. You simply clear, convert, and insert, which keeps the setup simple and minimizes the chance of errors or mismatches between your source DataFrame and the MongoDB collection.
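The clear, convert, and insert pattern can be sketched as one small function. The name reload_collection is hypothetical, and the empty-list guard is an addition of this sketch (PyMongo's insert_many rejects an empty document list). The function is duck-typed: it accepts any object exposing delete_many and insert_many, and any frame object with a pandas-style to_dict(orient="records").

```python
def reload_collection(collection, frame_data_df):
    """Clear a collection, then bulk-load the DataFrame rows into it.

    Returns a (deleted_count, inserted_count) tuple.
    """
    # 1. Clear: remove every existing document.
    deleted = collection.delete_many({}).deleted_count

    # 2. Convert: one dict per DataFrame row.
    records = frame_data_df.to_dict(orient="records")
    if not records:
        return deleted, 0  # nothing to insert; skip the insert_many call

    # 3. Insert: bulk-load in a single call.
    result = collection.insert_many(records)
    return deleted, len(result.inserted_ids)
```

With the notebook's objects this would be called as reload_collection(frame_intelligence_collection, frame_data_df).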
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff0000000000000002'), 'opTime': {'ts': Timestamp(1751369841, 50), 't': 2}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1751369841, 50), 'signature': {'hash': b'Ln\xd3a\x05\xc94u\xe5\x05\xb3\x1b5\x15\xcd%\xdb\x854b', 'keyId': 7520068280199938053}}, 'operationTime': Timestamp(1751369841, 50)}, acknowledged=True)
InsertManyResult([ObjectId('6863c872a9040ef0b5adcb35'), ObjectId('6863c872a9040ef0b5adcb36'), ..., ObjectId('6863c872a9040ef0b5adcd28')], acknowledged=True)
Step 4: Retrieval Methods
4.1 Semantic Search powered by Vector Search
In the code below, semantic_search_with_mongodb wraps the end-to-end process of running a semantic vector search in MongoDB Atlas. It first obtains a numeric embedding for the user's query via get_voyage_embedding, then constructs a two-stage aggregation pipeline:
- A $vectorSearch stage that leverages your precreated vector index to find semantically similar documents.
- A $project stage that strips out the raw embedding and internal _id, and injects the similarity score (vectorSearchScore) into each result.
Finally, it executes the pipeline and returns the top-N results as a Python list, abstracting away all of the boilerplate needed to perform high-precision, retrieval-grounded queries.
Under the hood, the MongoDB $vectorSearch operator supports several key parameters for tuning accuracy and performance:
- index (string): the name of the vector index to use.
- queryVector (array): the embedding representing the query text.
- path (string): the document field that stores precomputed embeddings.
- numCandidates (int): how many nearest-neighbor candidates to retrieve before final scoring; higher values improve recall at the cost of latency.
- limit (int): the maximum number of top-scoring documents to return.
By tuning numCandidates and limit, you can balance throughput, resource usage, and retrieval fidelity for your specific dataset.
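To make the two-stage pipeline and its parameters concrete, here is a minimal sketch that only builds the aggregation pipeline. It is not the notebook's semantic_search_with_mongodb: the function name build_vector_search_pipeline, the "embedding" field path, and the default values are assumptions, and the query vector is expected to come from an embedding call such as get_voyage_embedding.

```python
def build_vector_search_pipeline(query_vector, index_name, path="embedding",
                                 num_candidates=100, limit=5):
    """Return a $vectorSearch + $project aggregation pipeline as a list of stages."""
    if limit > num_candidates:
        # $vectorSearch requires limit to be no larger than numCandidates.
        raise ValueError("limit must not exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,            # name of the vector index
                "queryVector": list(query_vector),
                "path": path,                   # field holding stored embeddings
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,                       # drop the internal id
                path: 0,                        # drop the raw embedding
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Usage with a toy 3-dimensional vector in place of a real embedding.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], "vector_search_index_scalar")
print(pipeline)
```

The resulting list can be passed to collection.aggregate(pipeline), and wrapping the cursor in list() yields the top results.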
The cells below run the same semantic search query:
"Can you get me the frame with the referee on the screen"
against three different vector search index configurations (scalar, full_fidelity, and binary) on the FRAME_INTELLIGENCE_METADATA collection.
Each call to semantic_search_with_mongodb embeds the user query, invokes the specified index via MongoDB's $vectorSearch, and returns the top 5 most similar frame documents for that quantization strategy.
By assigning the results to scalar_results, full_fidelity_results, and binary_results, you can directly compare how each index type affects retrieval quality and performance. This makes it easy to benchmark and choose the optimal trade-off between precision, speed, and storage footprint for your frame-matching application.
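One simple way to compare the three result lists is to measure how much of the full-fidelity top-k each quantized index recovers. The sketch below is an illustrative metric, not part of the notebook: overlap_at_k is a hypothetical helper, and the frame numbers in the usage lines are made up for demonstration.

```python
def overlap_at_k(baseline, candidate, k=5, key="frame_number"):
    """Fraction of the baseline's top-k documents that also appear in the candidate's top-k."""
    baseline_ids = {doc[key] for doc in baseline[:k]}
    if not baseline_ids:
        return 0.0
    candidate_ids = {doc[key] for doc in candidate[:k]}
    return len(baseline_ids & candidate_ids) / len(baseline_ids)


# Made-up results: treat the full-fidelity index as the reference.
full_fidelity_demo = [{"frame_number": n} for n in (10, 20, 30, 40, 50)]
scalar_demo = [{"frame_number": n} for n in (10, 20, 30, 40, 60)]
print(overlap_at_k(full_fidelity_demo, scalar_demo))
```

With the notebook's variables, this would be called as overlap_at_k(full_fidelity_results, scalar_results) and overlap_at_k(full_fidelity_results, binary_results).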
[{'frame_description': 'The frame shows a soccer match in progress. A player wearing a pink and navy blue jersey with a crest and a sponsor logo is sitting on the ground. Another player, wearing a white jersey, is partially in the foreground, obscuring some of the sitting player. The scoreboard at the top left corner displays "RMA 1 - 0 FCB" with 11:27 as the time. The ESPN and ESPN+ LIVE logos are visible in the top right corner. The background is red, suggesting an advertisement board.',
  'frame_number': 487,
  'frame_timestamp': 956.8,
  'score': 0.6657900810241699},
 {'frame_description': 'The video frame shows a soccer match in progress. A referee wearing a bright yellow shirt and black shorts is walking on the field. The referee\'s shirt has several sponsor logos, including "Würth" and "Hankook." The playing field is green grass. \n\nOn the top left of the frame, the scoreboard indicates "RMA 1-0 FCB," showing that RMA is leading FCB. Next to the scoreboard, the time displayed is "07:07." In the top right corner, the broadcast logos "ESPN" and "ABC Live" are visible. \n\nAnother person, likely a player, is partially visible in the lower part of the frame.',
  'frame_number': 355,
  'frame_timestamp': 696.9,
  'score': 0.6579937934875488},
 {'frame_description': 'In the video frame, two soccer players are on a field. \n\n- The player on the left is wearing a white jersey with black stripes and the number 22, named "Rüdiger."\n- The player on the right is wearing a blue and red striped jersey.\n- Both players appear to be in motion, likely during a match.\n \nThe setting appears to be a professional soccer match broadcast on ESPN and ABC, as indicated by the logos in the top-right corner. \n\nThe scoreboard at the top-left corner shows "RMA 0 - 0 FCB" with a time of "02:25," suggesting the match is in the early stages. The background shows a blurred advertisement board.',
  'frame_number': 212,
  'frame_timestamp': 415.4,
  'score': 0.6577053070068359},
 {'frame_description': 'In the video frame, there is a crowded stadium with spectators filling the stands, indicating a live sports event. In the foreground is a large digital overlay. On the left side of the overlay, there is a vertical bar with text reading "THIBAUT COURTOIS" and the number "1," suggesting it\'s displaying player information. On the right side, there\'s a faint image of a person, and above it is the word "GOAL." The top right corner displays the ESPN logo alongside "LIVE." The setting is vibrant with stadium lights illuminating the scene.',
  'frame_number': 70,
  'frame_timestamp': 135.8,
  'score': 0.6571376323699951},
 {'frame_description': 'The image shows a soccer match scene. In the foreground, there are three players wearing jerseys. The player on the left, with "VALVERDE" on the back of the jersey, has the number 9. He is facing away from the camera. The player in the middle is wearing a blue goalkeeper kit and appears to be looking towards the play. The player on the right is wearing a white jersey with the logo of "Emirates Fly Better" visible. Behind them is a soccer net, and the background is filled with spectators.\n\nIn the top left corner, there is a scoreboard displaying "RMA 0 - 0 FCB" and a time of "03:56". The channel logo "ESPN" is visible in the top right corner.',
  'frame_number': 258,
  'frame_timestamp': 505.9,
  'score': 0.6536464691162109}]
[{'frame_description': 'The frame shows a soccer match in progress. A player wearing a pink and navy blue jersey with a crest and a sponsor logo is sitting on the ground. Another player, wearing a white jersey, is partially in the foreground, obscuring some of the sitting player. The scoreboard at the top left corner displays "RMA 1 - 0 FCB" with 11:27 as the time. The ESPN and ESPN+ LIVE logos are visible in the top right corner. The background is red, suggesting an advertisement board.',
  'frame_number': 487,
  'frame_timestamp': 956.8,
  'score': 0.6655800938606262},
 {'frame_description': 'The video frame shows a soccer match in progress. A referee wearing a bright yellow shirt and black shorts is walking on the field. The referee\'s shirt has several sponsor logos, including "Würth" and "Hankook." The playing field is green grass. \n\nOn the top left of the frame, the scoreboard indicates "RMA 1-0 FCB," showing that RMA is leading FCB. Next to the scoreboard, the time displayed is "07:07." In the top right corner, the broadcast logos "ESPN" and "ABC Live" are visible. \n\nAnother person, likely a player, is partially visible in the lower part of the frame.',
  'frame_number': 355,
  'frame_timestamp': 696.9,
  'score': 0.6579287052154541},
 {'frame_description': 'In the video frame, two soccer players are on a field. \n\n- The player on the left is wearing a white jersey with black stripes and the number 22, named "Rüdiger."\n- The player on the right is wearing a blue and red striped jersey.\n- Both players appear to be in motion, likely during a match.\n \nThe setting appears to be a professional soccer match broadcast on ESPN and ABC, as indicated by the logos in the top-right corner. \n\nThe scoreboard at the top-left corner shows "RMA 0 - 0 FCB" with a time of "02:25," suggesting the match is in the early stages. The background shows a blurred advertisement board.',
  'frame_number': 212,
  'frame_timestamp': 415.4,
  'score': 0.6575652360916138},
 {'frame_description': 'In the video frame, there is a crowded stadium with spectators filling the stands, indicating a live sports event. In the foreground is a large digital overlay. On the left side of the overlay, there is a vertical bar with text reading "THIBAUT COURTOIS" and the number "1," suggesting it\'s displaying player information. On the right side, there\'s a faint image of a person, and above it is the word "GOAL." The top right corner displays the ESPN logo alongside "LIVE." The setting is vibrant with stadium lights illuminating the scene.',
  'frame_number': 70,
  'frame_timestamp': 135.8,
  'score': 0.6569483280181885},
 {'frame_description': 'The image shows a soccer match scene. In the foreground, there are three players wearing jerseys. The player on the left, with "VALVERDE" on the back of the jersey, has the number 9. He is facing away from the camera. The player in the middle is wearing a blue goalkeeper kit and appears to be looking towards the play. The player on the right is wearing a white jersey with the logo of "Emirates Fly Better" visible. Behind them is a soccer net, and the background is filled with spectators.\n\nIn the top left corner, there is a scoreboard displaying "RMA 0 - 0 FCB" and a time of "03:56". The channel logo "ESPN" is visible in the top right corner.',
  'frame_number': 258,
  'frame_timestamp': 505.9,
  'score': 0.6542713642120361}]
[{'frame_description': 'The frame shows a soccer match in progress. A player wearing a pink and navy blue jersey with a crest and a sponsor logo is sitting on the ground. Another player, wearing a white jersey, is partially in the foreground, obscuring some of the sitting player. The scoreboard at the top left corner displays "RMA 1 - 0 FCB" with 11:27 as the time. The ESPN and ESPN+ LIVE logos are visible in the top right corner. The background is red, suggesting an advertisement board.',
  'frame_number': 487,
  'frame_timestamp': 956.8,
  'score': 0.6655800938606262},
 {'frame_description': 'The video frame shows a soccer match in progress. A referee wearing a bright yellow shirt and black shorts is walking on the field. The referee\'s shirt has several sponsor logos, including "Würth" and "Hankook." The playing field is green grass. \n\nOn the top left of the frame, the scoreboard indicates "RMA 1-0 FCB," showing that RMA is leading FCB. Next to the scoreboard, the time displayed is "07:07." In the top right corner, the broadcast logos "ESPN" and "ABC Live" are visible. \n\nAnother person, likely a player, is partially visible in the lower part of the frame.',
  'frame_number': 355,
  'frame_timestamp': 696.9,
  'score': 0.6579287052154541},
 {'frame_description': 'The image shows a soccer match in progress. Two players are in the frame, both wearing a dark blue and red striped kit. One player, with the number 6 on his back, is partially seen. The other player is facing away, with visible bandaging on his wrist.\n\nThe scoreboard graphic at the top shows "RMA 1 - 0 FCB" with a timer reading "09:23." The broadcast logos for "ESPN" and "ABC Live" are also visible at the top right corner. The playing field has green grass, and a white line is visible in the background indicating part of the pitch markings.',
  'frame_number': 424,
  'frame_timestamp': 832.7,
  'score': 0.6527595520019531},
 {'frame_description': 'The image captures a moment from a soccer match shown on ESPN. The top left displays a score box indicating "RMA 1 - 0 FCB" at 08:51. A player wearing a jersey with pink and navy colors is partially visible. There\'s a white circular patch on the jersey\'s sleeve. A graphic partially obscures the player. The setting appears to be a football field with a white boundary line and green grass in the background. The ESPN and ABC Live logos are shown in the top right corner.',
  'frame_number': 408,
  'frame_timestamp': 801.2,
  'score': 0.6490817070007324},
 {'frame_description': 'The image shows a soccer player in motion on a field, wearing a blue and red jersey with a club crest on the chest. He has black hair and is looking to the side. He is wearing a white wristband on his left hand. In the background, there is a blurred advertisement board. \n\nAt the top of the frame, a scoreboard displays "RMA 1 - 0 FCB" and a time of "09:21". There is also a logo that says "ESPN" with the word "LIVE" next to it in the top right corner.',
  'frame_number': 423,
  'frame_timestamp': 830.8,
  'score': 0.6490310430526733}]
4.2 Hybrid Search (Text + Vector Search)
hybrid_search combines semantic vector search and traditional text search in MongoDB using the $rankFusion operator. It first converts the user_query into an embedding via get_voyage_embedding, then defines two sub-pipelines: one using $vectorSearch on the specified vector_search_index_name, the other using Atlas Search's $search on text_search_index_name. Each pipeline retrieves up to 20 candidates, which are then merged and re-ranked according to the specified weights, producing a unified list of the top top_n results enriched with detailed scoring information.
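To make the weighted merge concrete, the sketch below reproduces the scoring arithmetic in plain Python. It uses the reciprocal rank fusion formula that $rankFusion reports in its scoreDetails description, the sum of weight * (1 / (60 + rank)) over the pipelines that returned a document. The helper name weighted_rrf and the sample rankings are illustrative assumptions; this is a model of the arithmetic, not MongoDB's implementation.

```python
from typing import Dict, List

RRF_CONSTANT = 60  # constant that appears in the scoreDetails description


def weighted_rrf(
    rankings: Dict[str, List[int]],
    weights: Dict[str, float],
) -> Dict[int, float]:
    """Fuse per-pipeline rankings (best first) into one score per document id."""
    fused: Dict[int, float] = {}
    for pipeline_name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            contribution = weights[pipeline_name] * (1.0 / (RRF_CONSTANT + rank))
            fused[doc_id] = fused.get(doc_id, 0.0) + contribution
    return fused


# Hypothetical input: the vector pipeline ranks five frames, while the text
# pipeline returns nothing, so it contributes no score.
rankings = {"vectorPipeline": [487, 355, 212, 70, 258], "textPipeline": []}
weights = {"vectorPipeline": 0.5, "textPipeline": 0.5}

fused = weighted_rrf(rankings, weights)
ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Because only ranks enter the formula, the raw similarity scores of the two pipelines never need to be on the same scale. A document found by a single pipeline at rank 1 with weight 0.5 scores 0.5 / 61, roughly 0.0082.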
The $rankFusion stage supports key parameters for fine-tuning relevance blending:
- pipelines: maps names ("vectorPipeline", "textPipeline") to aggregation pipelines that source vector and text matches.
- combination.weights: assigns relative importance to each pipeline (e.g. vector_weight=0.7, text_weight=0.3).
- scoreDetails: when set to true, includes per-pipeline scores in each document's scoreDetails field.
After fusion, a $project stage hides raw embeddings and internal IDs while surfacing score breakdowns, and a final $limit ensures only the top-scoring documents are returned. This abstraction lets you call hybrid_search(query, collection) to leverage both semantic and lexical matching in one call.
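The stages described above could be assembled roughly as follows. This is a sketch under stated assumptions rather than the notebook's hybrid_search code: the helper name build_hybrid_pipeline, the "embedding" and "frame_description" field paths, the numCandidates value, and the default weights are all assumptions made for illustration.

```python
from typing import Any, Dict, List


def build_hybrid_pipeline(
    query_text: str,
    query_embedding: List[float],
    vector_index: str,
    text_index: str,
    vector_weight: float = 0.5,  # assumed default
    text_weight: float = 0.5,    # assumed default
    top_n: int = 5,
    candidates: int = 20,        # per-pipeline candidate cap
) -> List[Dict[str, Any]]:
    """Return a $rankFusion pipeline that blends vector and text matches."""
    vector_pipeline = [
        {
            "$vectorSearch": {
                "index": vector_index,
                "path": "embedding",   # assumed embedding field
                "queryVector": query_embedding,
                "numCandidates": 100,  # assumed value
                "limit": candidates,
            }
        }
    ]
    text_pipeline = [
        {
            "$search": {
                "index": text_index,
                # assumed text field to match against
                "text": {"query": query_text, "path": "frame_description"},
            }
        },
        {"$limit": candidates},
    ]
    rank_fusion_stage = {
        "$rankFusion": {
            "input": {
                "pipelines": {
                    "vectorPipeline": vector_pipeline,
                    "textPipeline": text_pipeline,
                }
            },
            "combination": {
                "weights": {
                    "vectorPipeline": vector_weight,
                    "textPipeline": text_weight,
                }
            },
            "scoreDetails": True,
        }
    }
    project_stage = {
        "$project": {
            "_id": 0,
            "frame_number": 1,
            "frame_timestamp": 1,
            "frame_description": 1,
            "scoreDetails": {"$meta": "scoreDetails"},
        }
    }
    return [rank_fusion_stage, project_stage, {"$limit": top_n}]
```

Keeping the pipeline construction separate from execution makes it easy to inspect or log the exact stages before passing them to collection.aggregate(...).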
Found 5 results for query: 'Can you get me the frame with the refree on the screen'
[{'frame_description': 'The frame shows a soccer match in progress. A player wearing a pink and navy blue jersey with a crest and a sponsor logo is sitting on the ground. Another player, wearing a white jersey, is partially in the foreground, obscuring some of the sitting player. The scoreboard at the top left corner displays "RMA 1 - 0 FCB" with 11:27 as the time. The ESPN and ESPN+ LIVE logos are visible in the top right corner. The background is red, suggesting an advertisement board.',
  'frame_number': 487,
  'frame_timestamp': 956.8,
  'scoreDetails': {'value': 0.00819672131147541,
   'description': 'value output by reciprocal rank fusion algorithm, computed as sum of (weight * (1 / (60 + rank))) across input pipelines from which this document is output, from:',
   'details': [{'inputPipelineName': 'textPipeline', 'rank': 0, 'weight': 0.5},
    {'inputPipelineName': 'vectorPipeline',
     'rank': 1,
     'weight': 0.5,
     'value': 0.6657900810241699,
     'details': []}]}},
 {'frame_description': 'The video frame shows a soccer match in progress. A referee wearing a bright yellow shirt and black shorts is walking on the field. The referee\'s shirt has several sponsor logos, including "Würth" and "Hankook." The playing field is green grass. \n\nOn the top left of the frame, the scoreboard indicates "RMA 1-0 FCB," showing that RMA is leading FCB. Next to the scoreboard, the time displayed is "07:07." In the top right corner, the broadcast logos "ESPN" and "ABC Live" are visible. \n\nAnother person, likely a player, is partially visible in the lower part of the frame.',
  'frame_number': 355,
  'frame_timestamp': 696.9,
  'scoreDetails': {'value': 0.008064516129032258,
   'description': 'value output by reciprocal rank fusion algorithm, computed as sum of (weight * (1 / (60 + rank))) across input pipelines from which this document is output, from:',
   'details': [{'inputPipelineName': 'textPipeline', 'rank': 0, 'weight': 0.5},
    {'inputPipelineName': 'vectorPipeline',
     'rank': 2,
     'weight': 0.5,
     'value': 0.6579937934875488,
     'details': []}]}},
 {'frame_description': 'In the video frame, two soccer players are on a field. \n\n- The player on the left is wearing a white jersey with black stripes and the number 22, named "Rüdiger."\n- The player on the right is wearing a blue and red striped jersey.\n- Both players appear to be in motion, likely during a match.\n \nThe setting appears to be a professional soccer match broadcast on ESPN and ABC, as indicated by the logos in the top-right corner. \n\nThe scoreboard at the top-left corner shows "RMA 0 - 0 FCB" with a time of "02:25," suggesting the match is in the early stages. The background shows a blurred advertisement board.',
  'frame_number': 212,
  'frame_timestamp': 415.4,
  'scoreDetails': {'value': 0.007936507936507936,
   'description': 'value output by reciprocal rank fusion algorithm, computed as sum of (weight * (1 / (60 + rank))) across input pipelines from which this document is output, from:',
   'details': [{'inputPipelineName': 'textPipeline', 'rank': 0, 'weight': 0.5},
    {'inputPipelineName': 'vectorPipeline',
     'rank': 3,
     'weight': 0.5,
     'value': 0.6577053070068359,
     'details': []}]}},
 {'frame_description': 'In the video frame, there is a crowded stadium with spectators filling the stands, indicating a live sports event. In the foreground is a large digital overlay. On the left side of the overlay, there is a vertical bar with text reading "THIBAUT COURTOIS" and the number "1," suggesting it\'s displaying player information. On the right side, there\'s a faint image of a person, and above it is the word "GOAL." The top right corner displays the ESPN logo alongside "LIVE." The setting is vibrant with stadium lights illuminating the scene.',
  'frame_number': 70,
  'frame_timestamp': 135.8,
  'scoreDetails': {'value': 0.0078125,
   'description': 'value output by reciprocal rank fusion algorithm, computed as sum of (weight * (1 / (60 + rank))) across input pipelines from which this document is output, from:',
   'details': [{'inputPipelineName': 'textPipeline', 'rank': 0, 'weight': 0.5},
    {'inputPipelineName': 'vectorPipeline',
     'rank': 4,
     'weight': 0.5,
     'value': 0.6571376323699951,
     'details': []}]}},
 {'frame_description': 'The image shows a soccer match scene. In the foreground, there are three players wearing jerseys. The player on the left, with "VALVERDE" on the back of the jersey, has the number 9. He is facing away from the camera. The player in the middle is wearing a blue goalkeeper kit and appears to be looking towards the play. The player on the right is wearing a white jersey with the logo of "Emirates Fly Better" visible. Behind them is a soccer net, and the background is filled with spectators.\n\nIn the top left corner, there is a scoreboard displaying "RMA 0 - 0 FCB" and a time of "03:56". The channel logo "ESPN" is visible in the top right corner.',
  'frame_number': 258,
  'frame_timestamp': 505.9,
  'scoreDetails': {'value': 0.007692307692307693,
   'description': 'value output by reciprocal rank fusion algorithm, computed as sum of (weight * (1 / (60 + rank))) across input pipelines from which this document is output, from:',
   'details': [{'inputPipelineName': 'textPipeline', 'rank': 0, 'weight': 0.5},
    {'inputPipelineName': 'vectorPipeline',
     'rank': 5,
     'weight': 0.5,
     'value': 0.6536464691162109,
     'details': []}]}}]
4.3 Viewing the video player and returned time stamp
This code creates an interactive video player for Jupyter notebooks that enables intelligent scene navigation based on AI search results. The create_video_player_with_scenes() function takes a video file and search results (containing timestamps, descriptions, and similarity scores), then generates an HTML interface with an embedded video player. It automatically handles video encoding by converting smaller files (under 50MB) to base64 for direct embedding, while serving larger videos from their local path with appropriate MIME type detection.
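The size-based embedding decision can be sketched as below. This is a minimal illustration, not the body of create_video_player_with_scenes(): the helper names video_source and video_tag, the "scene-player" element id, and the fallback MIME type are assumptions, and the 50 MB threshold is taken from the description above.

```python
import base64
import mimetypes
import os
from typing import Tuple

MAX_EMBED_BYTES = 50 * 1024 * 1024  # 50 MB threshold from the description


def video_source(video_path: str, max_embed_bytes: int = MAX_EMBED_BYTES) -> Tuple[str, str]:
    """Return (src, mime_type) for use in an HTML5 <source> element."""
    mime_type, _ = mimetypes.guess_type(video_path)
    mime_type = mime_type or "video/mp4"  # assumed fallback when the type is unknown

    if os.path.getsize(video_path) < max_embed_bytes:
        # Small file: inline the bytes as a base64 data URI.
        with open(video_path, "rb") as handle:
            encoded = base64.b64encode(handle.read()).decode("ascii")
        return f"data:{mime_type};base64,{encoded}", mime_type

    # Large file: let the browser load it from its local path.
    return video_path, mime_type


def video_tag(video_path: str) -> str:
    """Build a minimal <video> element for the given file."""
    src, mime_type = video_source(video_path)
    return (
        '<video id="scene-player" controls width="640">'
        f'<source src="{src}" type="{mime_type}">'
        "</video>"
    )
```

In a notebook, the resulting markup could be rendered with IPython.display.HTML(video_tag("video.mp4")). Inlining keeps small clips self-contained in the notebook output, while the path-based fallback avoids inflating the notebook with a large base64 payload.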
The interface features a standard HTML5 video player with custom scene navigation controls below it. Users can click timestamp buttons to jump to specific scenes; each button shows the frame number, timestamp, similarity score, and a description preview. When a scene is selected, the full scene description appears above the player, the video jumps to that timestamp, and playback begins automatically. The player also supports keyboard shortcuts (spacebar for play/pause, arrow keys for 10-second navigation) and visual feedback effects, which makes it well suited to exploring video content through search results based on AI-generated embeddings.
Creating interactive video player...
Video player created with 5 scenes
Video: video.mp4
Large video file - serving from local path