Agentic Video Search
advanced_techniquesagentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag
Export
Building an Agentic Video Search System using Voyage AI and MongoDB
Step 1: Install required packages
- voyageai: Voyage AI's Python SDK
- pymongo: MongoDB's Python driver
- anthropic: Anthropic's Python SDK
- huggingface_hub: Python library for interacting with the Hugging Face Hub
- ffmpeg-python: Python wrapper for
ffmpeg - tqdm: Python library to display progress bars for loops
[1]
You'll also need to install the ffmpeg binary itself. To do this, run the following commands from the terminal and note the path to the ffmpeg installation:
MacOS
brew install ffmpeg
Linux
sudo apt-get install ffmpeg
Windows
- Download the executable from ffmpeg.org
- Extract the downloaded zip file
- Note the path to the
binfolder
Step 2: Setup prerequisites
Voyage AI
MongoDB
- Register for a free MongoDB Atlas account
- Create a new database cluster
- Obtain the connection string for your database cluster
Anthropic
[2]
[171]
Enter your Voyage API key: ········
[4]
Enter your MongoDB connection string: ········
{'ok': 1.0,
, '$clusterTime': {'clusterTime': Timestamp(1767387291, 1),
, 'signature': {'hash': b'\xf8\xbcI\xcf\x81DR\xc1\xcdO\xcf\xa8\x1d\xc9\x1do\x14dH\xf2',
, 'keyId': 7558184680432861186}},
, 'operationTime': Timestamp(1767387291, 1)} [5]
Enter your Anthropic API key: ········
[17]
Step 3: Download the dataset
[172]
[173]
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 10 files: 0%| | 0/10 [00:00<?, ?it/s]
Step 4: Segment the videos using captions
voyage-multimodal-3.5 has a 32k token limit or a 20 MB file size limit for video inputs. When working with large videos, split them into smaller segments prior to embedding to keep them within the model’s limits. Splitting videos at natural breaks in captions/transcripts ensures that related frames remain together, resulting in more focused embeddings.
[98]
[99]
[100]
4
[101]
[102]
{'segment_id': 'segment_000',
, 'video_id': 'video_000',
, 'caption': 'Chef Marguerite Dubois, wearing her signature striped apron, rolls out the laminated croissant dough using a wooden rolling pin on a granite countertop dusted with flour.',
, 'metadata': {'video_title': 'Classic French Croissants with Chef Marguerite Dubois',
, 'start': 0,
, 'end': 7}} Step 5: Embed the video segments
[103]
[104]
[189]
[107]
0%| | 0/17 [00:00<?, ?it/s] 6%|▌ | 1/17 [00:07<02:05, 7.86s/it] 12%|█▏ | 2/17 [00:15<01:59, 8.00s/it] 18%|█▊ | 3/17 [00:23<01:47, 7.68s/it] 24%|██▎ | 4/17 [00:31<01:40, 7.73s/it] 29%|██▉ | 5/17 [00:37<01:27, 7.26s/it] 35%|███▌ | 6/17 [00:44<01:20, 7.28s/it] 41%|████ | 7/17 [00:52<01:13, 7.39s/it] 47%|████▋ | 8/17 [01:02<01:14, 8.24s/it] 53%|█████▎ | 9/17 [01:06<00:55, 6.95s/it] 59%|█████▉ | 10/17 [01:13<00:49, 7.00s/it] 65%|██████▍ | 11/17 [01:22<00:44, 7.47s/it] 71%|███████ | 12/17 [01:29<00:37, 7.44s/it] 76%|███████▋ | 13/17 [01:36<00:29, 7.39s/it] 82%|████████▏ | 14/17 [01:42<00:20, 6.92s/it] 88%|████████▊ | 15/17 [01:48<00:13, 6.60s/it] 94%|█████████▍| 16/17 [01:55<00:06, 6.72s/it] 100%|██████████| 17/17 [02:02<00:00, 7.18s/it]
[109]
dict_keys(['segment_id', 'video_id', 'caption', 'metadata', 'embedding'])
Step 6: Ingest documents into MongoDB
[110]
[111]
[112]
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff0000000000000048'), 'opTime': {'ts': Timestamp(1767391621, 1), 't': 72}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1767391621, 1), 'signature': {'hash': b'\x01)\xa3v^\x13N\xb8\xc7Ny\x97\xf0\xa5\x885\x92?M\xcd', 'keyId': 7558184680432861186}}, 'operationTime': Timestamp(1767391621, 1)}, acknowledged=True) [113]
InsertManyResult([ObjectId('695841876d5b2abc43875acc'), ObjectId('695841876d5b2abc43875acd'), ObjectId('695841876d5b2abc43875ace'), ObjectId('695841876d5b2abc43875acf'), ObjectId('695841876d5b2abc43875ad0'), ObjectId('695841876d5b2abc43875ad1'), ObjectId('695841876d5b2abc43875ad2'), ObjectId('695841876d5b2abc43875ad3'), ObjectId('695841876d5b2abc43875ad4'), ObjectId('695841876d5b2abc43875ad5'), ObjectId('695841876d5b2abc43875ad6'), ObjectId('695841876d5b2abc43875ad7'), ObjectId('695841876d5b2abc43875ad8'), ObjectId('695841876d5b2abc43875ad9'), ObjectId('695841876d5b2abc43875ada'), ObjectId('695841876d5b2abc43875adb'), ObjectId('695841876d5b2abc43875adc')], acknowledged=True) Step 7: Create search indexes
[114]
[115]
[116]
[117]
['fts-index', 'vector-index']
Step 8: Define search functions
[162]
[194]
[201]
[196]
Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37) Classic French Croissants with Chef Marguerite Dubois (0:59 - 1:01) Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)
[202]
Artisan Sourdough Bread Folding Technique (0:10 - 0:18) Artisan Sourdough Bread Folding Technique (0:19 - 0:20) Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)
Step 9: Building the Agentic Search Pipeline
[125]
[127]
[182]
[183]
[184]
Determining search type... Using search type: vector Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37) Classic French Croissants with Chef Marguerite Dubois (0:59 - 1:01) Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)
[203]
Determining search type... Using search type: hybrid Artisan Sourdough Bread Folding Technique (0:10 - 0:18) Artisan Sourdough Bread Folding Technique (0:19 - 0:20) Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)