Audio Similarity Search using Vector Embeddings
This notebook demonstrates how to create vector embeddings of audio files, store them in the LanceDB vector store, and then search for similar audio files. We will use the panns_inference package to tag the audio and create the embeddings, and this HuggingFace dataset for the audio files. The dataset contains 2,000 labelled sounds.
Installing dependencies
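For a fresh environment, the dependencies can be installed with the following commands (a sketch; the notebook does not pin exact versions):

```shell
# datasets: loads the HuggingFace audio dataset
# lancedb: the vector store
# panns-inference: audio tagging and embedding (pulls in torch)
pip install datasets lancedb panns-inference
```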
Importing all the libraries
NOTE: if you get an error while importing lancedb, just restart the runtime.
On devices with CUDA installed, you may be able to install the CUDA-enabled build of torch:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
If you don't have CUDA or a GPU (or are on a different OS), you can find the right torch install command here: https://pytorch.org/get-started/locally/
Load data
Checkpoint path: /root/panns_data/Cnn14_mAP=0.431.pth GPU number: 1
Dataset({
    features: ['filename', 'fold', 'target', 'category', 'esc10', 'src_file', 'take', 'audio'],
    num_rows: 2000
})

Create Embeddings
Now we can create the embeddings. We start by splitting the data into batches of 70, keeping track of the two columns we care about: category and audio.
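The batching step can be sketched in plain Python (the batch size follows the text; `data` here stands in for the loaded dataset):

```python
def make_batches(data, batch_size=70):
    """Split a sequence into consecutive batches of at most batch_size items."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

# With 2,000 rows and a batch size of 70, we get 29 batches;
# the last one holds the remaining 40 rows.
batches = make_batches(list(range(2000)), batch_size=70)
```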
We now iterate through these batches and, for each audio file, use the AudioTagging embedder to extract an embedding. We then store the embedding, the audio, and the category name in a list of dictionaries. When no embedding function is provided to LanceDB, each dictionary must contain a vector column in order to be added to the table.
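A sketch of that loop, with a random-vector stand-in for the AudioTagging embedder (the real model returns an embedding, here assumed to be 2048-dimensional, via `at.inference`):

```python
import numpy as np

def mock_embed(audio):
    # Hypothetical stand-in for AudioTagging.inference, which returns
    # (clipwise_output, embedding); the embedding is what we store.
    rng = np.random.default_rng(len(audio))
    return rng.random(2048).astype(np.float32)

# Two fake rows standing in for one batch of the dataset.
batch = [
    {"audio": np.zeros(44100, dtype=np.float32), "category": "water_drops"},
    {"audio": np.ones(22050, dtype=np.float32), "category": "car_horn"},
]

data = []
for row in batch:
    data.append({
        "audio": row["audio"].tolist(),
        "vector": mock_embed(row["audio"]).tolist(),  # required "vector" column
        "category": row["category"],
    })
```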
100%|██████████| 40/40 [00:13<00:00, 2.99it/s]
Once we have this data list, we can create a LanceDB table by first connecting to a local directory and then calling db.create_table(). If the table already exists, we instead open it and add the data.
Add the VectorStore
Created Table
We can now combine all of this into a single function:
Composite function
NOTE: if you run out of memory, restart the runtime, run all cells again, and uncomment the line #insert_audio()
Great! We now have a fully populated table with all the necessary information. The next step is to query the table and find similar audio files. We do this by first opening the table and then picking the specific audio file we want to search with.
Query the database
Category: water_drops
Next, we call the embedding function again to create the query embedding, which we can then use to search the table.
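Under the hood, tbl.search(embedding).limit(5) performs a nearest-neighbor lookup over the stored vectors, and the _distance column in the results reflects the (by default L2-based) distance to the query. A self-contained numpy sketch of the same ranking:

```python
import numpy as np

def top_k(query, vectors, k=5):
    # Rank stored vectors by squared L2 distance to the query,
    # mirroring the vector store's default ranking.
    dists = np.sum((vectors - query) ** 2, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
query = np.array([0.9, 0.1])
idx, dists = top_k(query, vectors, k=2)  # nearest first
```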
audio \
0 [0.00506591796875, 0.00653076171875, 0.0051574...
1 [-0.157318115234375, -0.122344970703125, -0.17...
2 [-0.0162353515625, -0.015716552734375, -0.0150...
3 [-0.0008544921875, -0.000762939453125, -0.0005...
4 [-0.003753662109375, -0.004119873046875, -0.00...
vector sampling_rate \
0 [0.0, 0.70255554, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
1 [0.0, 0.68818694, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
2 [0.0, 0.58163136, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
3 [0.0, 1.0475253, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... 44100
4 [0.0, 0.45124823, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
category _distance
0 water_drops 52.260319
1 water_drops 57.536579
2 water_drops 75.637405
3 drinking_sipping 76.979073
4 water_drops 77.981728
<ipython-input-16-422451c4025f>:2: UnsupportedWarning: to_df is unsupported as of 0.4.0. Use to_pandas() instead
  result = tbl.search(embedding[0]).limit(5).to_df()
0. Category: water_drops
1. Category: water_drops
2. Category: water_drops
3. Category: drinking_sipping
4. Category: water_drops
Nice! It seems to be working. We can wrap all of this into another function that takes the id of an audio clip, from 0 to 1,999.
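The lookup function can be sketched as follows. The embedder and table are stand-ins here: with the real objects, `embed` would call `at.inference` and the distances would come from the LanceDB search instead of numpy.

```python
import numpy as np

def search_audio(audio_id, audios, categories, vectors, embed, k=5):
    """Embed the clip at audio_id and print the categories of its k nearest neighbours."""
    print("Category:", categories[audio_id])
    query = embed(audios[audio_id])
    dists = np.sum((vectors - query) ** 2, axis=1)  # squared L2, as in the search above
    for rank, idx in enumerate(np.argsort(dists)[:k]):
        print(f"{rank}. Category: {categories[idx]}")

# Tiny fake "table": the embedding of each clip is just the clip itself here.
audios = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([5.0, 5.0])]
categories = ["water_drops", "water_drops", "car_horn"]
vectors = np.stack(audios)

search_audio(0, audios, categories, vectors, embed=lambda a: a, k=2)
```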
Search Audio using IDs
Category: car_horn
audio \
0 [-0.022979736328125, -0.021820068359375, -0.02...
1 [0.313934326171875, 0.312774658203125, 0.31698...
2 [0.0655517578125, 0.011505126953125, -0.024536...
3 [0.063690185546875, 0.065216064453125, 0.07296...
4 [-0.006866455078125, -0.007476806640625, -0.00...
vector sampling_rate \
0 [0.0, 0.12407931, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
1 [0.0, 0.5878662, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... 44100
2 [0.0, 0.7369921, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,... 44100
3 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... 44100
4 [0.0, 0.42053863, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0... 44100
category _distance
0 airplane 85.660736
1 washing_machine 91.059029
2 vacuum_cleaner 110.453621
3 clapping 111.933441
4 footsteps 115.770401
0. Category: airplane
<ipython-input-18-a781248f1cc6>:9: UnsupportedWarning: to_df is unsupported as of 0.4.0. Use to_pandas() instead
  result = tbl.search(embedding[0]).limit(5).to_df()
1. Category: washing_machine
2. Category: vacuum_cleaner
3. Category: clapping
4. Category: footsteps