Notebooks
M
MongoDB
Agentic Video Search

Agentic Video Search

advanced_techniquesagentsartificial-intelligencellmsmongodb-genai-showcasenotebooksgenerative-airag

Open In Colab

View Article

Building an Agentic Video Search System using Voyage AI and MongoDB

Step 1: Install required packages

  • voyageai: Voyage AI's Python SDK
  • pymongo: MongoDB's Python driver
  • anthropic: Anthropic's Python SDK
  • huggingface_hub: Python library for interacting with the Hugging Face Hub
  • ffmpeg-python: Python wrapper for ffmpeg
  • tqdm: Python library to display progress bars for loops
[1]

You'll also need to install the ffmpeg binary itself. To do this, run the following commands from the terminal and note the path to the ffmpeg installation:

MacOS

	brew install ffmpeg

Linux

	sudo apt-get install ffmpeg

Windows

  • Download the executable from ffmpeg.org
  • Extract the downloaded zip file
  • Note the path to the bin folder

Step 2: Setup prerequisites

Voyage AI

MongoDB

Anthropic

[2]
[171]
Enter your Voyage API key: ········
[4]
Enter your MongoDB connection string: ········
{'ok': 1.0,
, '$clusterTime': {'clusterTime': Timestamp(1767387291, 1),
,  'signature': {'hash': b'\xf8\xbcI\xcf\x81DR\xc1\xcdO\xcf\xa8\x1d\xc9\x1do\x14dH\xf2',
,   'keyId': 7558184680432861186}},
, 'operationTime': Timestamp(1767387291, 1)}
[5]
Enter your Anthropic API key: ········
[17]

Step 3: Download the dataset

[172]
[173]
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 10 files:   0%|          | 0/10 [00:00<?, ?it/s]

Step 4: Segment the videos using captions

voyage-multimodal-3.5 has a 32k token limit or a 20 MB file size limit for video inputs. When working with large videos, split them into smaller segments prior to embedding to keep them within the model’s limits. Splitting videos at natural breaks in captions/transcripts ensures that related frames remain together, resulting in more focused embeddings.

[98]
[99]
[100]
4
[101]
[102]
{'segment_id': 'segment_000',
, 'video_id': 'video_000',
, 'caption': 'Chef Marguerite Dubois, wearing her signature striped apron, rolls out the laminated croissant dough using a wooden rolling pin on a granite countertop dusted with flour.',
, 'metadata': {'video_title': 'Classic French Croissants with Chef Marguerite Dubois',
,  'start': 0,
,  'end': 7}}

Step 5: Embed the video segments

[103]
[104]
[189]
[107]

  0%|          | 0/17 [00:00<?, ?it/s]
  6%|▌         | 1/17 [00:07<02:05,  7.86s/it]
 12%|█▏        | 2/17 [00:15<01:59,  8.00s/it]
 18%|█▊        | 3/17 [00:23<01:47,  7.68s/it]
 24%|██▎       | 4/17 [00:31<01:40,  7.73s/it]
 29%|██▉       | 5/17 [00:37<01:27,  7.26s/it]
 35%|███▌      | 6/17 [00:44<01:20,  7.28s/it]
 41%|████      | 7/17 [00:52<01:13,  7.39s/it]
 47%|████▋     | 8/17 [01:02<01:14,  8.24s/it]
 53%|█████▎    | 9/17 [01:06<00:55,  6.95s/it]
 59%|█████▉    | 10/17 [01:13<00:49,  7.00s/it]
 65%|██████▍   | 11/17 [01:22<00:44,  7.47s/it]
 71%|███████   | 12/17 [01:29<00:37,  7.44s/it]
 76%|███████▋  | 13/17 [01:36<00:29,  7.39s/it]
 82%|████████▏ | 14/17 [01:42<00:20,  6.92s/it]
 88%|████████▊ | 15/17 [01:48<00:13,  6.60s/it]
 94%|█████████▍| 16/17 [01:55<00:06,  6.72s/it]
100%|██████████| 17/17 [02:02<00:00,  7.18s/it]
[109]
dict_keys(['segment_id', 'video_id', 'caption', 'metadata', 'embedding'])

Step 6: Ingest documents into MongoDB

[110]
[111]
[112]
DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff0000000000000048'), 'opTime': {'ts': Timestamp(1767391621, 1), 't': 72}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1767391621, 1), 'signature': {'hash': b'\x01)\xa3v^\x13N\xb8\xc7Ny\x97\xf0\xa5\x885\x92?M\xcd', 'keyId': 7558184680432861186}}, 'operationTime': Timestamp(1767391621, 1)}, acknowledged=True)
[113]
InsertManyResult([ObjectId('695841876d5b2abc43875acc'), ObjectId('695841876d5b2abc43875acd'), ObjectId('695841876d5b2abc43875ace'), ObjectId('695841876d5b2abc43875acf'), ObjectId('695841876d5b2abc43875ad0'), ObjectId('695841876d5b2abc43875ad1'), ObjectId('695841876d5b2abc43875ad2'), ObjectId('695841876d5b2abc43875ad3'), ObjectId('695841876d5b2abc43875ad4'), ObjectId('695841876d5b2abc43875ad5'), ObjectId('695841876d5b2abc43875ad6'), ObjectId('695841876d5b2abc43875ad7'), ObjectId('695841876d5b2abc43875ad8'), ObjectId('695841876d5b2abc43875ad9'), ObjectId('695841876d5b2abc43875ada'), ObjectId('695841876d5b2abc43875adb'), ObjectId('695841876d5b2abc43875adc')], acknowledged=True)

Step 7: Create search indexes

[114]
[115]
[116]
[117]
['fts-index', 'vector-index']

Step 8: Define search functions

[162]
[194]
[201]
[196]
Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)
Classic French Croissants with Chef Marguerite Dubois (0:59 - 1:01)
Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)
[202]
Artisan Sourdough Bread Folding Technique (0:10 - 0:18)
Artisan Sourdough Bread Folding Technique (0:19 - 0:20)
Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)

Step 9: Building the Agentic Search Pipeline

[125]
[127]
[182]
[183]
[184]
Determining search type...
Using search type: vector
Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)
Classic French Croissants with Chef Marguerite Dubois (0:59 - 1:01)
Classic French Croissants with Chef Marguerite Dubois (0:00 - 0:07)
[203]
Determining search type...
Using search type: hybrid
Artisan Sourdough Bread Folding Technique (0:10 - 0:18)
Artisan Sourdough Bread Folding Technique (0:19 - 0:20)
Classic French Croissants with Chef Marguerite Dubois (0:24 - 0:37)