
Audio Similarity Search using Vector Embeddings

This notebook demonstrates how to create vector embeddings of audio files, store them in the LanceDB vector store, and then find similar audio files. We'll use the panns_inference package to tag the audio and create the embeddings, and this HuggingFace dataset for the audio files. The dataset contains 2,000 labeled sounds.

Installing dependencies

[1]
Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (2.17.1)
...
Requirement already satisfied: lancedb in /usr/local/lib/python3.10/dist-packages (0.6.1)
Requirement already satisfied: pylance==0.10.1 in /usr/local/lib/python3.10/dist-packages (from lancedb) (0.10.1)
...

Importing all the libraries

[2]

NOTE: if you get an error while importing lancedb, just restart the runtime.

[3]

On devices with CUDA installed, you can install the CUDA-enabled build of torch:

	pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

If you don't have CUDA or a GPU (or a different OS), you can find the right install command at https://pytorch.org/get-started/locally/

Load data

[4]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
Checkpoint path: /root/panns_data/Cnn14_mAP=0.431.pth
GPU number: 1
[5]
Dataset({
    features: ['filename', 'fold', 'target', 'category', 'esc10', 'src_file', 'take', 'audio'],
    num_rows: 2000
})

Create Embeddings

Now we can create the embeddings. We start by splitting the data into batches of 70 rows, keeping track of the two most important columns: category and audio.

[7]
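The batching cell was stripped from this export; it can be sketched with plain Python slicing, assuming the dataset behaves like a sequence of rows:

```python
def make_batches(data, batch_size):
    # Yield successive slices of `data`, each with at most `batch_size` rows
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Stand-in for the dataset's 2,000 rows
rows = list(range(2000))
batches = list(make_batches(rows, 70))
```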

We now iterate through these batches and, for each audio file, use the AudioTagging embedder to extract an embedding. We then store the embeddings, audio arrays, and category names in a list of dictionaries. If no embedding function is provided, each dictionary must contain a vector column in order to be added to the LanceDB table.

[8]
100%|██████████| 40/40 [00:13<00:00,  2.99it/s]
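The embedding cell itself was stripped from this export. Below is a sketch of the record-assembly logic just described; a stub embedder stands in for panns_inference's AudioTagging model (whose inference() call returns a (clipwise_output, embedding) pair), so the snippet is self-contained:

```python
import numpy as np

def embed_stub(audio):
    # Stand-in for: _, embedding = at.inference(audio[None, :])  (panns_inference)
    return np.zeros(2048, dtype=np.float32)

def build_records(audios, categories, embed_fn, sampling_rate=44100):
    # Each dict must carry a 'vector' column, since no embedding
    # function is registered on the LanceDB table
    return [
        {
            "audio": audio,
            "vector": embed_fn(audio),
            "sampling_rate": sampling_rate,
            "category": category,
        }
        for audio, category in zip(audios, categories)
    ]

records = build_records([np.zeros(44100, dtype=np.float32)], ["water_drops"], embed_stub)
```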

Once we have this data list, we can create a LanceDB table by first connecting to a directory and then calling db.create_table(). If the table already exists, we open it and add the data.

Add the VectorStore

[14]
Created Table

We can now combine all of this into a single function:

Composite function

[11]

NOTE: if you run out of memory, restart the runtime, run all cells again, and uncomment the #insert_audio() line.

[ ]

Great! We now have a fully populated table with all the necessary information. The next step is to query the table and find similar audio files. We do this by first opening the table, and then picking the specific audio file we want to search with.

Query the database

[15]
Category: water_drops

Next, we call the embedding function again to create the query embedding, which lets us search the table.

[16]
                                               audio  \
0  [0.00506591796875, 0.00653076171875, 0.0051574...   
1  [-0.157318115234375, -0.122344970703125, -0.17...   
2  [-0.0162353515625, -0.015716552734375, -0.0150...   
3  [-0.0008544921875, -0.000762939453125, -0.0005...   
4  [-0.003753662109375, -0.004119873046875, -0.00...   

                                              vector  sampling_rate  \
0  [0.0, 0.70255554, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   
1  [0.0, 0.68818694, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   
2  [0.0, 0.58163136, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   
3  [0.0, 1.0475253, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...          44100   
4  [0.0, 0.45124823, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   

           category  _distance  
0       water_drops  52.260319  
1       water_drops  57.536579  
2       water_drops  75.637405  
3  drinking_sipping  76.979073  
4       water_drops  77.981728  
<ipython-input-16-422451c4025f>:2: UnsupportedWarning: to_df is unsupported as of 0.4.0. Use to_pandas() instead
  result = tbl.search(embedding[0]).limit(5).to_df()
[17]
0. Category: water_drops
1. Category: water_drops
2. Category: water_drops
3. Category: drinking_sipping
4. Category: water_drops

Nice! It seems to be working! We can wrap this into another function that takes the id of an audio clip, from 0 to 1,999.

Search Audio using IDs

[18]
[19]
Category: car_horn
                                               audio  \
0  [-0.022979736328125, -0.021820068359375, -0.02...   
1  [0.313934326171875, 0.312774658203125, 0.31698...   
2  [0.0655517578125, 0.011505126953125, -0.024536...   
3  [0.063690185546875, 0.065216064453125, 0.07296...   
4  [-0.006866455078125, -0.007476806640625, -0.00...   

                                              vector  sampling_rate  \
0  [0.0, 0.12407931, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   
1  [0.0, 0.5878662, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...          44100   
2  [0.0, 0.7369921, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,...          44100   
3  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...          44100   
4  [0.0, 0.42053863, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...          44100   

          category   _distance  
0         airplane   85.660736  
1  washing_machine   91.059029  
2   vacuum_cleaner  110.453621  
3         clapping  111.933441  
4        footsteps  115.770401  
0. Category: airplane
<ipython-input-18-a781248f1cc6>:9: UnsupportedWarning: to_df is unsupported as of 0.4.0. Use to_pandas() instead
  result = tbl.search(embedding[0]).limit(5).to_df()
1. Category: washing_machine
2. Category: vacuum_cleaner
3. Category: clapping
4. Category: footsteps