LanceDB Speaker Mapping

Speaker Mapping




WHAT ARE WE BUILDING TODAY?

In this notebook, we'll build an application that combines Whisper, NeMo MSDD (Multi-Scale Diarization Decoder), and LanceDB to produce an end-to-end speaker-mapped transcription of an audio file.

We'll transcribe the audio with Whisper, run diarization to find out how many speakers there are and when each one is talking, and then use LanceDB to match each diarized speaker to the correct name from a database of known speakers.
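The last two stages (attaching diarized speakers to transcribed words, then replacing anonymous speaker labels with real names) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's own code: it assumes word-level timestamps from Whisper, diarization output as `(label, start, end)` turns, and one embedding vector per diarized speaker. `match_speakers` is a brute-force stand-in for the nearest-neighbour search that LanceDB performs over the known-speaker table; the `0.7` cosine threshold and every class and function name here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


@dataclass
class Turn:
    speaker: str  # anonymous diarizer label, e.g. "speaker_0"
    start: float
    end: float


def assign_speakers(words, turns):
    """Label each word with the diarized speaker whose turn overlaps it the most."""
    labeled = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            overlap = min(w.end, t.end) - max(w.start, t.start)
            if overlap > best_overlap:
                best, best_overlap = t.speaker, overlap
        labeled.append((best, w.text))  # None if no turn overlaps the word
    return labeled


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speakers(speaker_embeddings, known_speakers, threshold=0.7):
    """Map each diarizer label to the most similar known name at or above `threshold`.

    Brute-force stand-in for a LanceDB vector search over known speakers.
    """
    names = {}
    for label, emb in speaker_embeddings.items():
        best_name, best_sim = None, threshold
        for name, ref in known_speakers.items():
            sim = cosine(emb, ref)
            if sim >= best_sim:
                best_name, best_sim = name, sim
        if best_name is not None:
            names[label] = best_name
    return names


def to_transcript(labeled, names):
    """Merge consecutive words by the same speaker into 'Name: text' lines."""
    lines = []
    for speaker, text in labeled:
        # Unmatched labels keep their raw diarizer label; None becomes "unknown"
        name = names.get(speaker, speaker or "unknown")
        if lines and lines[-1][0] == name:
            lines[-1][1].append(text)
        else:
            lines.append((name, [text]))
    return [f"{name}: {' '.join(ws)}" for name, ws in lines]


if __name__ == "__main__":
    words = [Word("Hello", 0.0, 0.4), Word("there", 0.4, 0.8),
             Word("Hi", 1.2, 1.5), Word("Alice", 1.5, 1.9)]
    turns = [Turn("speaker_0", 0.0, 1.0), Turn("speaker_1", 1.0, 2.0)]
    known = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}  # toy 2-D "voiceprints"
    names = match_speakers({"speaker_0": [0.9, 0.1], "speaker_1": [0.1, 0.95]}, known)
    print("\n".join(to_transcript(assign_speakers(words, turns), names)))
    # Alice: Hello there
    # Bob: Hi Alice
```

Keeping unmatched speakers under their raw diarizer label, rather than forcing a nearest name, avoids attributing speech to the wrong person when someone is not in the known-speaker database.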

I believe this notebook will give you a kickstart in developing an end-to-end product and exploring how these technologies can be used to create innovative solutions. If you build something using this, share it on social media and tag me and LanceDB in your post.

How to use this notebook

I found a few resources online that cover the different stages of this project end to end. I'll use parts of them in this notebook, with full credit to the original creators.

Notebook - https://shorturl.at/37hfR

Blog - https://ufarooqi.com/blog/speaker-diarization-for-whisper-transcripts/?utm_source=chatgpt.com

I'm sharing them because they are worth a read. Once you've gone through the concepts, you'll get more out of this notebook when building speaker-mapped transcription with LanceDB.

First, we'll run a naive transcription with Whisper and identify its shortcomings.
Then we'll connect LanceDB to Azure Blob Storage so that the current application can use it.
Once both steps are done, we'll build the project itself and create a speaker-mapped transcription for an audio file.
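As a sketch of the Azure step, assuming LanceDB's `az://container/path` URI scheme and its `account_name` / `account_key` storage options (verify both against the LanceDB version you install), the helper below builds the connection settings from environment variables so that no secrets are hard-coded in the notebook. The helper name, container name, and `prefix` default are hypothetical.

```python
import os


def azure_lancedb_config(container, prefix="speakers", env=None):
    """Return (uri, storage_options) for a LanceDB database in Azure Blob Storage.

    Credentials come from AZURE_STORAGE_ACCOUNT_NAME / AZURE_STORAGE_ACCOUNT_KEY
    in `env` (defaults to os.environ). `container` and `prefix` are placeholders.
    """
    env = os.environ if env is None else env
    account = env.get("AZURE_STORAGE_ACCOUNT_NAME")
    key = env.get("AZURE_STORAGE_ACCOUNT_KEY")
    if not account or not key:
        raise ValueError(
            "Set AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY first"
        )
    # Strip stray slashes so "speakers", "/speakers/" etc. give the same path
    uri = f"az://{container}/{prefix.strip('/')}"
    storage_options = {"account_name": account, "account_key": key}
    return uri, storage_options
```

With `lancedb` installed, the two values would then be passed as `lancedb.connect(uri, storage_options=storage_options)`. Reading credentials from the environment keeps the notebook shareable without leaking the storage key.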

Install Necessary Libraries


Collecting nemo-toolkit>=2.dev (from nemo-toolkit[asr]>=2.dev)
  Downloading nemo_toolkit-2.2.0rc2-py3-none-any.whl.metadata (76 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.4/76.4 kB 2.1 MB/s eta 0:00:00
Requirement already satisfied: huggingface_hub>=0.24 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (0.28.1)
Requirement already satisfied: numba in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (0.61.0)
Requirement already satisfied: numpy>=1.22 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.26.4)
Collecting onnx>=1.7.0 (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading onnx-1.17.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting protobuf==3.20.3 (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading protobuf-3.20.3-py2.py3-none-any.whl.metadata (720 bytes)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.8.2)
Collecting ruamel.yaml (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading ruamel.yaml-0.18.10-py3-none-any.whl.metadata (23 kB)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.6.1)
Requirement already satisfied: setuptools>=70.0.0 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (75.1.0)
Requirement already satisfied: tensorboard in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.18.0)
Requirement already satisfied: text-unidecode in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.3)
Requirement already satisfied: torch in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.5.1+cu124)
Requirement already satisfied: tqdm>=4.41.0 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (4.67.1)
Collecting wget (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: wrapt in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.17.2)
Collecting braceexpand (from nemo-toolkit[asr]>=2.dev)
  Downloading braceexpand-0.1.7-py2.py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: editdistance in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.8.1)
Requirement already satisfied: einops in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.8.1)
Collecting g2p_en (from nemo-toolkit[asr]>=2.dev)
  Downloading g2p_en-2.1.0-py3-none-any.whl.metadata (4.5 kB)
Collecting jiwer (from nemo-toolkit[asr]>=2.dev)
  Downloading jiwer-3.1.0-py3-none-any.whl.metadata (2.6 kB)
Collecting kaldi-python-io (from nemo-toolkit[asr]>=2.dev)
  Downloading kaldi-python-io-1.2.2.tar.gz (8.8 kB)
  Preparing metadata (setup.py) ... done
Collecting kaldiio (from nemo-toolkit[asr]>=2.dev)
  Downloading kaldiio-2.18.0-py3-none-any.whl.metadata (13 kB)
Collecting lhotse>=1.26.0 (from nemo-toolkit[asr]>=2.dev)
  Downloading lhotse-1.29.0-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: librosa>=0.10.2 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.10.2.post1)
Collecting marshmallow (from nemo-toolkit[asr]>=2.dev)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting optuna (from nemo-toolkit[asr]>=2.dev)
  Downloading optuna-4.2.1-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (24.2)
Collecting pyannote.core (from nemo-toolkit[asr]>=2.dev)
  Downloading pyannote.core-5.0.0-py3-none-any.whl.metadata (1.4 kB)
Collecting pyannote.metrics (from nemo-toolkit[asr]>=2.dev)
  Downloading pyannote.metrics-3.2.1-py3-none-any.whl.metadata (1.3 kB)
Collecting pydub (from nemo-toolkit[asr]>=2.dev)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting pyloudnorm (from nemo-toolkit[asr]>=2.dev)
  Downloading pyloudnorm-0.1.1-py3-none-any.whl.metadata (5.6 kB)
Collecting resampy (from nemo-toolkit[asr]>=2.dev)
  Downloading resampy-0.4.3-py3-none-any.whl.metadata (3.0 kB)
Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (1.13.1)
Requirement already satisfied: soundfile in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.13.1)
Collecting sox (from nemo-toolkit[asr]>=2.dev)
  Downloading sox-1.5.0.tar.gz (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.9/63.9 kB 5.3 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting texterrors (from nemo-toolkit[asr]>=2.dev)
  Downloading texterrors-0.5.1.tar.gz (23 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (3.1.1)
Collecting fiddle (from nemo-toolkit[asr]>=2.dev)
  Downloading fiddle-0.3.0-py3-none-any.whl.metadata (2.3 kB)
Collecting hydra-core<=1.3.2,>1.3 (from nemo-toolkit[asr]>=2.dev)
  Downloading hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting lightning<=2.4.0,>2.2.1 (from nemo-toolkit[asr]>=2.dev)
  Downloading lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting omegaconf<=2.3 (from nemo-toolkit[asr]>=2.dev)
  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Requirement already satisfied: peft in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.14.0)
Collecting torchmetrics>=0.11.0 (from nemo-toolkit[asr]>=2.dev)
  Downloading torchmetrics-1.6.1-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: transformers>=4.45.0 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (4.48.3)
Requirement already satisfied: wandb in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.19.6)
Collecting webdataset>=0.2.86 (from nemo-toolkit[asr]>=2.dev)
  Downloading webdataset-0.2.111-py3-none-any.whl.metadata (15 kB)
Collecting datasets (from nemo-toolkit[asr]>=2.dev)
  Downloading datasets-3.3.2-py3-none-any.whl.metadata (19 kB)
Requirement already satisfied: inflect in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (7.5.0)
Collecting mediapy==1.1.6 (from nemo-toolkit[asr]>=2.dev)
  Downloading mediapy-1.1.6-py3-none-any.whl.metadata (4.8 kB)
Requirement already satisfied: pandas in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (2.2.2)
Collecting sacremoses>=0.0.43 (from nemo-toolkit[asr]>=2.dev)
  Downloading sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Requirement already satisfied: sentencepiece<1.0.0 in /usr/local/lib/python3.11/dist-packages (from nemo-toolkit[asr]>=2.dev) (0.2.0)
Requirement already satisfied: ipython in /usr/local/lib/python3.11/dist-packages (from mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (7.34.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (from mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (3.10.0)
Requirement already satisfied: Pillow in /usr/local/lib/python3.11/dist-packages (from mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (11.1.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.11/dist-packages (from huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.17.0)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.11/dist-packages (from huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2024.10.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.11/dist-packages (from huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (6.0.2)
Requirement already satisfied: requests in /usr/local/lib/python3.11/dist-packages (from huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.32.3)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.11/dist-packages (from huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (4.12.2)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core<=1.3.2,>1.3->nemo-toolkit[asr]>=2.dev)
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 8.9 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.11/dist-packages (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev) (3.0.1)
Requirement already satisfied: click>=7.1.1 in /usr/local/lib/python3.11/dist-packages (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev) (8.1.8)
Collecting cytoolz>=0.10.1 (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev)
  Downloading cytoolz-1.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.6 kB)
Collecting intervaltree>=3.1.0 (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev)
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: tabulate>=0.8.1 in /usr/local/lib/python3.11/dist-packages (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev) (0.9.0)
Collecting lilcom>=1.1.0 (from lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev)
  Downloading lilcom-1.8.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (1.4.2)
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (4.4.2)
Requirement already satisfied: pooch>=1.1 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (1.8.2)
Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (0.5.0.post1)
Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (0.4)
Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.11/dist-packages (from librosa>=0.10.2->nemo-toolkit[asr]>=2.dev) (1.1.0)
Collecting lightning-utilities<2.0,>=0.10.0 (from lightning<=2.4.0,>2.2.1->nemo-toolkit[asr]>=2.dev)
  Downloading lightning_utilities-0.12.0-py3-none-any.whl.metadata (5.6 kB)
Collecting pytorch-lightning (from lightning<=2.4.0,>2.2.1->nemo-toolkit[asr]>=2.dev)
  Downloading pytorch_lightning-2.5.0.post0-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in /usr/local/lib/python3.11/dist-packages (from numba->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (0.44.0)
Requirement already satisfied: regex in /usr/local/lib/python3.11/dist-packages (from sacremoses>=0.0.43->nemo-toolkit[asr]>=2.dev) (2024.11.6)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.5.0)
Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.11/dist-packages (from soundfile->nemo-toolkit[asr]>=2.dev) (1.17.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.4.2)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.1.5)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_curand_cu12-10.3.5.147-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cusolver-cu12==11.6.1.9 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cusolver_cu12-11.6.1.9-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cusparse-cu12==12.3.1.170 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_cusparse_cu12-12.3.1.170-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.21.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (12.4.127)
Collecting nvidia-nvjitlink-cu12==12.4.127 (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Requirement already satisfied: triton==3.1.0 in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.1.0)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.11/dist-packages (from torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from sympy==1.13.1->torch->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.3.0)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.45.0->nemo-toolkit[asr]>=2.dev) (0.21.0)
Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.11/dist-packages (from transformers>=4.45.0->nemo-toolkit[asr]>=2.dev) (0.5.2)
Requirement already satisfied: pyarrow>=15.0.0 in /usr/local/lib/python3.11/dist-packages (from datasets->nemo-toolkit[asr]>=2.dev) (17.0.0)
Collecting dill<0.3.9,>=0.3.0 (from datasets->nemo-toolkit[asr]>=2.dev)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets->nemo-toolkit[asr]>=2.dev)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets->nemo-toolkit[asr]>=2.dev)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.11/dist-packages (from datasets->nemo-toolkit[asr]>=2.dev) (3.11.12)
Requirement already satisfied: absl-py in /usr/local/lib/python3.11/dist-packages (from fiddle->nemo-toolkit[asr]>=2.dev) (1.4.0)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (from fiddle->nemo-toolkit[asr]>=2.dev) (0.20.3)
Collecting libcst (from fiddle->nemo-toolkit[asr]>=2.dev)
  Downloading libcst-1.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (17 kB)
Requirement already satisfied: nltk>=3.2.4 in /usr/local/lib/python3.11/dist-packages (from g2p_en->nemo-toolkit[asr]>=2.dev) (3.9.1)
Collecting distance>=0.1.3 (from g2p_en->nemo-toolkit[asr]>=2.dev)
  Downloading Distance-0.1.3.tar.gz (180 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 180.3/180.3 kB 9.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: more_itertools>=8.5.0 in /usr/local/lib/python3.11/dist-packages (from inflect->nemo-toolkit[asr]>=2.dev) (10.6.0)
Requirement already satisfied: typeguard>=4.0.1 in /usr/local/lib/python3.11/dist-packages (from inflect->nemo-toolkit[asr]>=2.dev) (4.4.2)
Collecting rapidfuzz>=3.9.7 (from jiwer->nemo-toolkit[asr]>=2.dev)
  Downloading rapidfuzz-3.12.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting alembic>=1.5.0 (from optuna->nemo-toolkit[asr]>=2.dev)
  Downloading alembic-1.14.1-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna->nemo-toolkit[asr]>=2.dev)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: sqlalchemy>=1.4.2 in /usr/local/lib/python3.11/dist-packages (from optuna->nemo-toolkit[asr]>=2.dev) (2.0.38)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas->nemo-toolkit[asr]>=2.dev) (2025.1)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas->nemo-toolkit[asr]>=2.dev) (2025.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.11/dist-packages (from python-dateutil->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.17.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.11/dist-packages (from peft->nemo-toolkit[asr]>=2.dev) (5.9.5)
Requirement already satisfied: accelerate>=0.21.0 in /usr/local/lib/python3.11/dist-packages (from peft->nemo-toolkit[asr]>=2.dev) (1.3.0)
Collecting sortedcontainers>=2.0.4 (from pyannote.core->nemo-toolkit[asr]>=2.dev)
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting pyannote.database>=4.0.1 (from pyannote.metrics->nemo-toolkit[asr]>=2.dev)
  Downloading pyannote.database-5.1.3-py3-none-any.whl.metadata (1.1 kB)
Collecting docopt>=0.6.2 (from pyannote.metrics->nemo-toolkit[asr]>=2.dev)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: future>=0.16.0 in /usr/local/lib/python3.11/dist-packages (from pyloudnorm->nemo-toolkit[asr]>=2.dev) (1.0.0)
Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev)
  Downloading ruamel.yaml.clib-0.2.12-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.7 kB)
Requirement already satisfied: grpcio>=1.48.2 in /usr/local/lib/python3.11/dist-packages (from tensorboard->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (1.70.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.11/dist-packages (from tensorboard->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.7)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from tensorboard->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (0.7.2)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from tensorboard->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.1.3)
Collecting pybind11 (from texterrors->nemo-toolkit[asr]>=2.dev)
  Using cached pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Collecting plac (from texterrors->nemo-toolkit[asr]>=2.dev)
  Downloading plac-1.4.3-py2.py3-none-any.whl.metadata (5.9 kB)
Collecting loguru (from texterrors->nemo-toolkit[asr]>=2.dev)
  Downloading loguru-0.7.3-py3-none-any.whl.metadata (22 kB)
Requirement already satisfied: termcolor in /usr/local/lib/python3.11/dist-packages (from texterrors->nemo-toolkit[asr]>=2.dev) (2.5.0)
Collecting Levenshtein (from texterrors->nemo-toolkit[asr]>=2.dev)
  Downloading levenshtein-0.26.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
Requirement already satisfied: docker-pycreds>=0.4.0 in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (0.4.0)
Requirement already satisfied: gitpython!=3.1.29,>=1.0.0 in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (3.1.44)
Requirement already satisfied: platformdirs in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (4.3.6)
Requirement already satisfied: pydantic<3,>=2.6 in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (2.10.6)
Requirement already satisfied: sentry-sdk>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (2.22.0)
Requirement already satisfied: setproctitle in /usr/local/lib/python3.11/dist-packages (from wandb->nemo-toolkit[asr]>=2.dev) (1.3.4)
Collecting Mako (from alembic>=1.5.0->optuna->nemo-toolkit[asr]>=2.dev)
  Downloading Mako-1.3.9-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: pycparser in /usr/local/lib/python3.11/dist-packages (from cffi>=1.0->soundfile->nemo-toolkit[asr]>=2.dev) (2.22)
Requirement already satisfied: toolz>=0.8.0 in /usr/local/lib/python3.11/dist-packages (from cytoolz>=0.10.1->lhotse>=1.26.0->nemo-toolkit[asr]>=2.dev) (0.12.1)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (2.4.6)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (1.3.2)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (25.1.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (1.5.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (0.2.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.11/dist-packages (from aiohttp->datasets->nemo-toolkit[asr]>=2.dev) (1.18.3)
Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.11/dist-packages (from gitpython!=3.1.29,>=1.0.0->wandb->nemo-toolkit[asr]>=2.dev) (4.0.12)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (4.56.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (1.4.8)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (3.2.1)
Requirement already satisfied: typer>=0.12.1 in /usr/local/lib/python3.11/dist-packages (from pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit[asr]>=2.dev) (0.15.1)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=2.6->wandb->nemo-toolkit[asr]>=2.dev) (0.7.0)
Requirement already satisfied: pydantic-core==2.27.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3,>=2.6->wandb->nemo-toolkit[asr]>=2.dev) (2.27.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests->huggingface_hub>=0.24->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (2025.1.31)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.11/dist-packages (from sqlalchemy>=1.4.2->optuna->nemo-toolkit[asr]>=2.dev) (3.1.1)
Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.11/dist-packages (from werkzeug>=1.0.1->tensorboard->nemo-toolkit>=2.dev->nemo-toolkit[asr]>=2.dev) (3.0.2)
Collecting jedi>=0.16 (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.7.5)
Requirement already satisfied: traitlets>=4.2 in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (5.7.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (3.0.50)
Requirement already satisfied: pygments in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (2.18.0)
Requirement already satisfied: backcall in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.2.0)
Requirement already satisfied: matplotlib-inline in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.1.7)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.11/dist-packages (from ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (4.9.0)
Requirement already satisfied: smmap<6,>=3.0.1 in /usr/local/lib/python3.11/dist-packages (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb->nemo-toolkit[asr]>=2.dev) (5.0.2)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /usr/local/lib/python3.11/dist-packages (from jedi>=0.16->ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.8.4)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.11/dist-packages (from pexpect>4.3->ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.11/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->mediapy==1.1.6->nemo-toolkit[asr]>=2.dev) (0.2.13)
Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from typer>=0.12.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit[asr]>=2.dev) (1.5.4)
Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.11/dist-packages (from typer>=0.12.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit[asr]>=2.dev) (13.9.4)
Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.11/dist-packages (from rich>=10.11.0->typer>=0.12.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit[asr]>=2.dev) (3.0.0)
Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.11/dist-packages (from markdown-it-py>=2.2.0->rich>=10.11.0->typer>=0.12.1->pyannote.database>=4.0.1->pyannote.metrics->nemo-toolkit[asr]>=2.dev) (0.1.2)
Downloading nemo_toolkit-2.2.0rc2-py3-none-any.whl (5.4 MB)

Download Data

You can choose to either download these sample audio files or upload your own audio samples for testing. I'll be using some of my own audio samples for this project.
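If you prefer to fetch the samples from a script, here is a minimal standard-library sketch. The `download_samples` helper and any URLs you pass to it are hypothetical. The notebook's actual download cell and file locations are not shown here.

```python
import os
import urllib.request


def download_samples(urls, dest_dir="."):
    """Download each URL into dest_dir and return the local file paths.

    `urls` maps a local filename to its (hypothetical) source URL.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for filename, url in urls.items():
        path = os.path.join(dest_dir, filename)
        # urlretrieve raises on HTTP/URL errors, so a failed download stops the loop
        urllib.request.urlretrieve(url, path)
        print(f"Downloaded: {filename}")
        paths.append(path)
    print("All files downloaded successfully!")
    return paths
```

Call it with a dict such as `{"input_audio_arjun.mp3": "<your URL>"}`. Because errors propagate, a broken link is reported immediately instead of being silently skipped.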

Downloaded: input_audio_arjun.mp3
Downloaded: input_audio_hamdeep.m4a
Downloaded: input_audio_shresth.m4a
All files downloaded successfully!
Downloaded: languages_pair.json

Naive Transcription without Speaker Information

Timestamped Transcript:
[{'start': 1.08, 'end': 1.86, 'text': "I'm"}, {'start': 1.86, 'end': 2.36, 'text': 'recording'}, {'start': 2.36, 'end': 2.6, 'text': 'this'}, {'start': 2.6, 'end': 3.0, 'text': 'audio'}, {'start': 3.0, 'end': 3.52, 'text': 'to'}, {'start': 3.52, 'end': 3.96, 'text': 'compare'}, {'start': 3.96, 'end': 4.22, 'text': 'this'}, {'start': 4.22, 'end': 4.36, 'text': 'to'}, {'start': 4.36, 'end': 4.52, 'text': 'my'}, {'start': 4.52, 'end': 5.16, 'text': 'initial'}, {'start': 5.16, 'end': 5.6, 'text': 'audio'}, {'start': 5.6, 'end': 5.86, 'text': 'that'}, {'start': 5.86, 'end': 6.04, 'text': 'I'}, {'start': 6.04, 'end': 6.42, 'text': 'passed.'}, {'start': 7.12, 'end': 7.32, 'text': 'So'}, {'start': 7.32, 'end': 8.02, 'text': 'in'}, {'start': 8.02, 'end': 8.22, 'text': 'that'}, {'start': 8.22, 'end': 8.52, 'text': 'audio,'}, {'start': 8.68, 'end': 8.76, 'text': 'I'}, {'start': 8.76, 'end': 9.2, 'text': 'mentioned'}, {'start': 9.2, 'end': 9.62, 'text': 'how'}, {'start': 9.62, 'end': 10.58, 'text': "I'm"}, {'start': 10.58, 'end': 10.84, 'text': 'building'}, {'start': 10.84, 'end': 11.34, 'text': 'an'}, {'start': 11.34, 'end': 11.78, 'text': 'application'}, {'start': 11.78, 'end': 12.14, 'text': 'which'}, {'start': 12.14, 'end': 13.12, 'text': 'will'}, {'start': 13.12, 'end': 13.5, 'text': 'map'}, {'start': 13.5, 'end': 14.06, 'text': 'speakers'}, {'start': 14.06, 'end': 14.42, 'text': 'based'}, {'start': 14.42, 'end': 14.64, 'text': 'on'}, {'start': 14.64, 'end': 14.78, 'text': 'their'}, {'start': 14.78, 'end': 15.08, 'text': 'voices'}, {'start': 15.08, 'end': 15.36, 'text': 'in'}, {'start': 15.36, 'end': 15.5, 'text': 'the'}, {'start': 15.5, 'end': 15.98, 'text': 'transcription.'}, {'start': 16.78, 'end': 16.96, 'text': 'And'}, {'start': 16.96, 'end': 17.36, 'text': 'also'}, {'start': 17.36, 'end': 17.56, 'text': 'I'}, {'start': 17.56, 'end': 17.98, 'text': 'discussed'}, {'start': 17.98, 'end': 18.46, 'text': 'about'}, {'start': 18.46, 'end': 19.38, 'text': 'how'}, {'start': 19.38, 'end': 19.66, 'text': "I'll"}, {'start': 19.66, 'end': 19.88, 'text': 'be'}, {'start': 19.88, 'end': 20.4, 'text': 'visiting'}, {'start': 20.4, 'end': 21.62, 'text': 'an'}, {'start': 21.62, 'end': 22.14, 'text': 'event'}, {'start': 22.14, 'end': 22.6, 'text': 'tomorrow.'}, {'start': 23.25, 'end': 23.64, 'text': 'This'}, {'start': 23.64, 'end': 23.88, 'text': 'event'}, {'start': 23.88, 'end': 24.08, 'text': 'is'}, {'start': 24.08, 'end': 24.38, 'text': 'hosted'}, {'start': 24.38, 'end': 24.68, 'text': 'by'}, {'start': 24.68, 'end': 25.04, 'text': 'TensorFlow'}, {'start': 25.04, 'end': 25.62, 'text': 'Group'}, {'start': 25.62, 'end': 26.18, 'text': 'Ghaziabad'}, {'start': 26.18, 'end': 26.42, 'text': 'and'}, {'start': 26.42, 'end': 26.64, 'text': "it's"}, {'start': 26.64, 'end': 26.84, 'text': 'called'}, {'start': 26.84, 'end': 27.26, 'text': 'ML'}, {'start': 27.26, 'end': 27.8, 'text': 'Saturday.'}, {'start': 28.6, 'end': 28.9, 'text': 'So'}, {'start': 28.9, 'end': 29.26, 'text': 'it'}, {'start': 29.26, 'end': 29.44, 'text': 'is'}, {'start': 29.44, 'end': 29.56, 'text': 'on'}, {'start': 29.56, 'end': 29.96, 'text': 'Saturday,'}, {'start': 30.16, 'end': 30.32, 'text': "that's"}, {'start': 30.32, 'end': 30.42, 'text': 'why'}, {'start': 30.42, 'end': 30.52, 'text': 'it'}, {'start': 30.52, 'end': 30.74, 'text': 'is'}, {'start': 30.74, 'end': 31.18, 'text': 'named'}, {'start': 31.18, 'end': 31.46, 'text': 'as'}, {'start': 31.46, 'end': 31.72, 'text': 'ML'}, {'start': 31.72, 'end': 32.14, 'text': 'Saturday'}, {'start': 32.14, 'end': 32.46, 'text': 'where'}, {'start': 32.46, 'end': 32.62, 'text': 'we'}, {'start': 32.62, 'end': 33.22, 'text': 'have'}, {'start': 33.22, 'end': 33.6, 'text': 'some'}, {'start': 33.6, 'end': 34.18, 'text': 'professionals'}, {'start': 34.18, 'end': 34.68, 'text': 'coming'}, {'start': 34.68, 'end': 34.94, 'text': 'in'}, {'start': 34.94, 'end': 35.14, 'text': 'from'}, {'start': 35.14, 'end': 35.4, 'text': 'machine'}, {'start': 35.4, 'end': 35.7, 'text': 'learning'}, {'start': 35.7, 'end': 36.1, 'text': 'domain.'}, {'start': 36.88, 'end': 37.48, 'text': 'See'}, {'start': 37.48, 'end': 37.64, 'text': 'you'}, {'start': 37.64, 'end': 37.88, 'text': 'at'}, {'start': 37.88, 'end': 38.3, 'text': '12'}, {'start': 38.3, 'end': 38.76, 'text': 'tomorrow,'}, {'start': 39.22, 'end': 39.46, 'text': 'thank'}, {'start': 39.46, 'end': 39.62, 'text': 'you.'}]

Plain Text Transcript:
I'm recording this audio to compare this to my initial audio that I passed. So in that audio, I mentioned how I'm building an application which will map speakers based on their voices in the transcription. And also I discussed about how I'll be visiting an event tomorrow. This event is hosted by TensorFlow Group Ghaziabad and it's called ML Saturday. So it is on Saturday, that's why it is named as ML Saturday where we have some professionals coming in from machine learning domain. See you at 12 tomorrow, thank you.

While this gives a good transcription, it is a bit boring: we don't know who is speaking these words. Don't worry, we'll fix that. In the second step, we'll see how to connect LanceDB with Azure Blob Storage so that we can use it during development, and then we'll jump into building our solution.
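For reference, a few lines of code can produce the word-level list and plain-text transcript shown above. This sketch assumes the faster-whisper backend (`WhisperModel.transcribe(..., word_timestamps=True)`). The model size `large-v2` and the helper names are assumptions, not the notebook's exact cell. The flattening helpers are plain Python, so they work on any segment objects that expose `.words` with `start`, `end` and `word` attributes.

```python
def words_from_segments(segments):
    """Flatten faster-whisper segments into [{'start', 'end', 'text'}] dicts."""
    words = []
    for segment in segments:
        # segment.words is None unless transcribe() was called with word_timestamps=True
        for word in segment.words or []:
            words.append({
                "start": round(word.start, 2),
                "end": round(word.end, 2),
                "text": word.word.strip(),  # faster-whisper words carry a leading space
            })
    return words


def plain_text(words):
    """Join word dicts back into a single plain-text transcript."""
    return " ".join(w["text"] for w in words)


def transcribe(audio_path, model_size="large-v2", device="cuda"):
    """Transcribe with faster-whisper and return word-level timestamps."""
    # Imported lazily so the helpers above can be used without faster-whisper installed
    from faster_whisper import WhisperModel

    compute_type = "float16" if device == "cuda" else "int8"
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return words_from_segments(segments)
```

For example, `words = transcribe("input_audio_arjun.mp3")` followed by `print(plain_text(words))`. Each word carries its own timestamps, and that per-word timing is what later lets us attach a speaker to every word.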

How to use LanceDB with Azure Blob?

Connected to LanceDB on Azure Blob Storage!
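Here is a minimal sketch of such a connection. It assumes a recent LanceDB version whose `lancedb.connect` accepts an `az://` URI and a `storage_options` dict. The container name, the `lancedb` prefix and the helper names are hypothetical. Credentials come from arguments or from the `AZURE_STORAGE_ACCOUNT_NAME` / `AZURE_STORAGE_ACCOUNT_KEY` environment variables, so they never need to be hard-coded in the notebook.

```python
import os


def azure_storage_options(account_name=None, account_key=None):
    """Build LanceDB storage_options for Azure Blob Storage.

    Falls back to the AZURE_STORAGE_ACCOUNT_NAME / AZURE_STORAGE_ACCOUNT_KEY
    environment variables when arguments are not given.
    """
    name = account_name or os.environ.get("AZURE_STORAGE_ACCOUNT_NAME")
    key = account_key or os.environ.get("AZURE_STORAGE_ACCOUNT_KEY")
    if not name or not key:
        raise ValueError("Azure storage account name and key are required")
    return {"account_name": name, "account_key": key}


def connect_azure(container, prefix="lancedb", **creds):
    """Open a LanceDB database stored under az://<container>/<prefix>."""
    import lancedb  # imported lazily; requires `pip install lancedb`

    uri = f"az://{container}/{prefix}"
    return lancedb.connect(uri, storage_options=azure_storage_options(**creds))
```

With this in place, `db = connect_azure("speaker-db")` should behave like a local LanceDB connection (`db.table_names()`, `db.create_table(...)`), with the data stored in Blob Storage.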

Now we are ready to build our application.

Speaker Mapping using Whisper, NeMo MSDD and LanceDB

Create a database of known speakers. At this step, you need multiple audio files, each labeled with the correct speaker's name.
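First, turn each reference clip into a speaker embedding, for example with a SpeechBrain ECAPA model. That embedding step is assumed here and not shown. After that, the known-speaker database can be as simple as one unit-length vector per person. The sketch below averages several clip embeddings per speaker and compares a query embedding by cosine similarity. It is a plain-Python stand-in for the LanceDB lookup. The `0.5` threshold is a hypothetical value you would tune on your own recordings.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def speaker_record(name, clip_embeddings):
    """Average one person's clip embeddings into a unit-length reference vector."""
    dim = len(clip_embeddings[0])
    count = len(clip_embeddings)
    centroid = [sum(e[i] for e in clip_embeddings) / count for i in range(dim)]
    return {"name": name, "vector": _normalize(centroid)}


def match_speaker(query, records, threshold=0.5):
    """Return (name, cosine similarity) of the closest known speaker.

    The name is "Unknown" when the best similarity is below `threshold`.
    """
    q = _normalize(query)
    best_name, best_score = "Unknown", -1.0
    for rec in records:
        score = sum(a * b for a, b in zip(q, rec["vector"]))  # both vectors are unit length
        if score > best_score:
            best_name, best_score = rec["name"], score
    return (best_name if best_score >= threshold else "Unknown"), best_score
```

The records have the `{"name", "vector"}` shape LanceDB expects, so they can be written with `db.create_table("known_speakers", data=records)`. The same lookup then becomes a vector search with cosine distance, e.g. `tbl.search(query).metric("cosine").limit(1)`, where `_distance` is 1 minus the cosine similarity.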

Step 2 - Set up base parameters.
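The notebook's actual parameter cell is not reproduced here. As a hypothetical sketch, the base settings could live in a small dataclass so that invalid values fail early. The field names and defaults below are assumptions. The `float16`/`int8` compute types are the usual CTranslate2 choices for GPU and CPU.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineParams:
    """Hypothetical base parameters for the speaker-mapping pipeline."""

    audio_path: str = "input_audio_arjun.mp3"
    whisper_model_name: str = "large-v2"  # assumed faster-whisper model size
    language: Optional[str] = None        # None lets Whisper detect the language
    batch_size: int = 8
    device: str = "cuda"
    match_threshold: float = 0.5          # minimum cosine similarity to accept a known speaker

    def __post_init__(self):
        if self.device not in ("cuda", "cpu"):
            raise ValueError(f"unsupported device: {self.device!r}")
        if self.batch_size < 1:
            raise ValueError("batch_size must be positive")
        if not 0.0 <= self.match_threshold <= 1.0:
            raise ValueError("match_threshold must be between 0 and 1")

    @property
    def compute_type(self) -> str:
        # float16 needs a GPU; int8 is a common CPU choice for CTranslate2 models
        return "float16" if self.device == "cuda" else "int8"
```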

Helper functions from the referenced notebook.

You don't need to go through all of them: this notebook uses only the few it requires. To create TXT or SRT files after mapping, refer to the other notebook and add the rest of its code at the end, after the speaker mapping has been created.
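As an example of the kind of helper involved, here is a hedged sketch of SRT output for speaker-labeled sentences. The function names and the sentence dict shape (`start`, `end`, `speaker`, `text`, with times in seconds) are assumptions, not the referenced notebook's helpers.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(sentences):
    """Render speaker-labeled sentences as SRT text (numbered cues separated by blank lines)."""
    blocks = []
    for i, s in enumerate(sentences, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(s['start'])} --> {srt_timestamp(s['end'])}\n"
            f"{s['speaker']}: {s['text']}\n"
        )
    return "\n".join(blocks)
```

Rounding to whole milliseconds before splitting into hours, minutes and seconds avoids float artifacts such as `6.419999` turning into `,419`.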

Transcription using Whisper

We are using the audio directly
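Passing the audio directly works for Whisper. The diarization stage, however, expects single-channel audio, and NeMo's pretrained diarization models work at a 16 kHz sample rate, so a conversion step is handy. Below is a minimal sketch using the ffmpeg CLI. The output name `mono_file.wav` and the helper names are hypothetical.

```python
import shutil
import subprocess


def mono_16k_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command converting `src` to a mono 16-bit PCM WAV file."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # 16-bit PCM, the usual WAV encoding
        dst,
    ]


def convert_to_mono_16k(src, dst="mono_file.wav", sample_rate=16000):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(mono_16k_command(src, dst, sample_rate), check=True, capture_output=True)
    return dst
```

Keeping the command builder separate from the subprocess call makes the exact ffmpeg arguments easy to inspect or log.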

 name and address of importer z lifestyle private private limited USB Wire

Performing Diarization using NeMo MSDD
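NeMo's diarizer typically writes its result as RTTM files, with one `SPEAKER` line per segment containing the file id, channel, onset, duration, some `<NA>` fields and the speaker label. The sketch below turns those lines into segments and attaches a speaker label to each Whisper word. Assigning a word by its midpoint, with the nearest segment as a fallback, is a simple choice made for this sketch. It is not necessarily what the referenced notebook does.

```python
def parse_rttm(lines):
    """Parse RTTM 'SPEAKER' lines into sorted (start, end, speaker) tuples."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # fields: SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
        onset, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        segments.append((onset, onset + duration, speaker))
    return sorted(segments)


def assign_speakers(words, segments):
    """Copy each word dict with a 'speaker' key taken from the segment covering its midpoint.

    Words whose midpoint falls outside every segment get the nearest segment's speaker;
    with no segments at all, the speaker is None.
    """
    labeled = []
    for w in words:
        mid = (w["start"] + w["end"]) / 2
        best, best_gap = None, float("inf")
        for start, end, speaker in segments:
            if start <= mid <= end:
                best = speaker
                break
            gap = min(abs(mid - start), abs(mid - end))
            if gap < best_gap:
                best, best_gap = speaker, gap
        labeled.append({**w, "speaker": best})
    return labeled
```

The resulting labels (`speaker_0`, `speaker_1`, ...) are anonymous. Replacing them with real names is where the LanceDB lookup of known speakers comes in.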

        norm: true
      noise:
        prob: 0.5
        manifest_path: /manifests/noise_0_1_musan_fs.json
        min_snr_db: 0
        max_snr_db: 30
        max_gain_db: 300.0
        norm: true
      gain:
        prob: 0.5
        min_gain_dbfs: -10.0
        max_gain_dbfs: 10.0
        norm: true
    num_workers: 16
    pin_memory: true
    
[NeMo W 2025-02-23 11:30:39 nemo_logging:405] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath: /manifests/ami_dev_0.63.json,/manifests/freesound_background_dev.json,/manifests/freesound_laughter_dev.json,/manifests/ch120_moved_0.63.json,/manifests/fisher_2005_500_speech_sampled.json,/manifests/google_dev_manifest.json,/manifests/musan_music_dev.json,/manifests/mandarin_dev.json,/manifests/german_dev.json,/manifests/spanish_dev.json,/manifests/french_dev.json,/manifests/russian_dev.json
    sample_rate: 16000
    labels:
    - background
    - speech
    batch_size: 256
    shuffle: false
    val_loss_idx: 0
    num_workers: 16
    pin_memory: true
    
[NeMo W 2025-02-23 11:30:39 nemo_logging:405] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config : 
    manifest_filepath: null
    sample_rate: 16000
    labels:
    - background
    - speech
    batch_size: 128
    shuffle: false
    test_loss_idx: 0

[NeMo I 2025-02-23 11:30:39 nemo_logging:393] PADDING: 16
[NeMo I 2025-02-23 11:30:39 nemo_logging:393] Model EncDecClassificationModel was successfully restored from /root/.cache/torch/NeMo/NeMo_2.2.0rc2/vad_multilingual_marblenet/670f425c7f186060b7a7268ba6dfacb2/vad_multilingual_marblenet.nemo.
[NeMo I 2025-02-23 11:30:39 nemo_logging:393] Multiscale Weights: [1, 1, 1, 1, 1]
[NeMo I 2025-02-23 11:30:39 nemo_logging:393] Clustering Parameters: {
        "oracle_num_speakers": false,
        "max_num_speakers": 8,
        "enhanced_count_thres": 80,
        "max_rp_threshold": 0.25,
        "sparse_search_volume": 30,
        "maj_vote_spk_count": false,
        "chunk_cluster_count": 50,
        "embeddings_per_chunk": 10000
    }
[NeMo I 2025-02-23 11:30:39 nemo_logging:393] Number of files to diarize: 1
[NeMo I 2025-02-23 11:30:39 nemo_logging:393] Split long audio file to avoid CUDA memory issue

splitting manifest: 100%|██████████| 1/1 [00:21<00:00, 21.26s/it]

[NeMo I 2025-02-23 11:31:00 nemo_logging:393] Perform streaming frame-level VAD
[NeMo I 2025-02-23 11:31:00 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:00 nemo_logging:393] Dataset successfully loaded with 1 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:00 nemo_logging:393] # 1 files loaded accounting to # 1 labels


vad: 100%|██████████| 1/1 [00:02<00:00,  2.22s/it]

[NeMo I 2025-02-23 11:31:03 nemo_logging:393] Generating predictions with overlapping input segments

[NeMo I 2025-02-23 11:31:03 nemo_logging:393] Converting frame level prediction to speech/no-speech segment in start and end times format.

creating speech segments: 100%|██████████| 1/1 [00:00<00:00,  7.23it/s]

[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Subsegmentation for embedding extraction: scale0, temp_outputs/speaker_outputs/subsegments_scale0.json
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Extracting embeddings for Diarization
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Dataset successfully loaded with 26 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] # 26 files loaded accounting to # 1 labels


[1/5] extract embeddings: 100%|██████████| 1/1 [00:00<00:00,  2.24it/s]

[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Saved embedding files to temp_outputs/speaker_outputs/embeddings
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Subsegmentation for embedding extraction: scale1, temp_outputs/speaker_outputs/subsegments_scale1.json
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Extracting embeddings for Diarization
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Dataset successfully loaded with 33 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] # 33 files loaded accounting to # 1 labels


[2/5] extract embeddings: 100%|██████████| 1/1 [00:00<00:00,  4.65it/s]

[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Saved embedding files to temp_outputs/speaker_outputs/embeddings
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Subsegmentation for embedding extraction: scale2, temp_outputs/speaker_outputs/subsegments_scale2.json
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Extracting embeddings for Diarization
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] Dataset successfully loaded with 38 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:04 nemo_logging:393] # 38 files loaded accounting to # 1 labels


[3/5] extract embeddings: 100%|██████████| 1/1 [00:00<00:00,  5.05it/s]

[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Saved embedding files to temp_outputs/speaker_outputs/embeddings
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Subsegmentation for embedding extraction: scale3, temp_outputs/speaker_outputs/subsegments_scale3.json
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Extracting embeddings for Diarization
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Dataset successfully loaded with 55 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] # 55 files loaded accounting to # 1 labels


[4/5] extract embeddings: 100%|██████████| 1/1 [00:00<00:00,  6.62it/s]

[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Saved embedding files to temp_outputs/speaker_outputs/embeddings
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Subsegmentation for embedding extraction: scale4, temp_outputs/speaker_outputs/subsegments_scale4.json
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Extracting embeddings for Diarization
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Filtered duration for loading collection is  0.00 hours.
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Dataset successfully loaded with 83 items and total duration provided from manifest is  0.01 hours.
[NeMo I 2025-02-23 11:31:05 nemo_logging:393] # 83 files loaded accounting to # 1 labels


[5/5] extract embeddings: 100%|██████████| 2/2 [00:00<00:00,  7.86it/s]

[NeMo I 2025-02-23 11:31:05 nemo_logging:393] Saved embedding files to temp_outputs/speaker_outputs/embeddings


clustering: 100%|██████████| 1/1 [00:00<00:00,  1.32it/s]

[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Outputs are saved in /content/temp_outputs directory


[NeMo W 2025-02-23 11:31:06 nemo_logging:405] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate

[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading embedding pickle file of scale:0 at temp_outputs/speaker_outputs/embeddings/subsegments_scale0_embeddings.pkl
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading embedding pickle file of scale:1 at temp_outputs/speaker_outputs/embeddings/subsegments_scale1_embeddings.pkl
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading embedding pickle file of scale:2 at temp_outputs/speaker_outputs/embeddings/subsegments_scale2_embeddings.pkl
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading embedding pickle file of scale:3 at temp_outputs/speaker_outputs/embeddings/subsegments_scale3_embeddings.pkl
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading embedding pickle file of scale:4 at temp_outputs/speaker_outputs/embeddings/subsegments_scale4_embeddings.pkl
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Loading cluster label file from temp_outputs/speaker_outputs/subsegments_scale4_cluster.label
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Filtered duration for loading collection is 0.000000.
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Total 1 session files loaded accounting to # 1 audio clips

100%|██████████| 1/1 [00:00<00:00,  6.43it/s]

[NeMo I 2025-02-23 11:31:06 nemo_logging:393]      [Threshold: 0.7000] [use_clus_as_main=False] [diar_window=50]
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Number of files to diarize: 1
[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Number of files to diarize: 1

[NeMo W 2025-02-23 11:31:06 nemo_logging:405] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate

[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Number of files to diarize: 1

[NeMo W 2025-02-23 11:31:06 nemo_logging:405] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate

[NeMo I 2025-02-23 11:31:06 nemo_logging:393] Number of files to diarize: 1

[NeMo W 2025-02-23 11:31:06 nemo_logging:405] Check if each ground truth RTTMs were present in the provided manifest file. Skipping calculation of Diariazation Error Rate

[NeMo I 2025-02-23 11:31:06 nemo_logging:393]

Extract the first 10 or more seconds of audio for each speaker using the RTTM file created above.


Saved extracted_speakers_audio\speaker_0_first_10s.wav
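The extraction code is also missing from this export. Here is a minimal sketch of the idea. It assumes the predicted RTTM uses the standard space-separated layout printed later in this notebook (onset in field 4, duration in field 5, speaker label in field 8) and that the audio is a PCM WAV file readable by Python's `wave` module. For each speaker, it collects segments in time order until they add up to at least 10 seconds, then writes those frames to `speaker_<id>_first_10s.wav`. The helper names are hypothetical, and the original notebook may slice the audio differently.

```python
import os
import wave
from collections import defaultdict


def parse_rttm(lines):
    """Return (speaker, start_sec, duration_sec) tuples from RTTM text lines."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # RTTM fields: type, file, channel, onset, duration, <NA>, <NA>, speaker, ...
        segments.append((fields[7], float(fields[3]), float(fields[4])))
    return segments


def first_n_seconds_per_speaker(segments, min_total=10.0):
    """Pick each speaker's earliest segments until their total reaches min_total seconds."""
    chosen = defaultdict(list)
    totals = defaultdict(float)
    for speaker, start, dur in sorted(segments, key=lambda s: s[1]):
        if totals[speaker] < min_total:
            chosen[speaker].append((start, dur))
            totals[speaker] += dur
    return dict(chosen)


def write_speaker_clips(wav_path, chosen, out_dir="extracted_speakers_audio"):
    """Concatenate each speaker's chosen segments into one WAV file per speaker."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for speaker, spans in chosen.items():
            out_path = os.path.join(out_dir, f"{speaker}_first_10s.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width and rate as the source
                for start, dur in spans:
                    src.setpos(int(start * rate))
                    dst.writeframes(src.readframes(int(dur * rate)))
            written.append(out_path)
    return written
```

Because the last selected segment is kept whole, a speaker's clip can run past 10 seconds. With the RTTM printed below, `speaker_0` gets its first seven segments, about 11.3 seconds in total.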

You can decide how to store these audio files for querying the vector database. A good practice is to create a folder and save each audio file with its speaker ID in the filename, so you can easily reuse that information during mapping. For the next part of the code, you'll also need a mapping from each speaker to their correct name (a dictionary is fine for now).
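Here is a small sketch of that naming convention. It assumes the clips are written as `speaker_<n>_first_10s.wav` (the pattern in the output above) into a single folder, and it parses the speaker ID back out of each filename so every clip can be looked up by its diarization label. The function name is hypothetical. Once the vector search tells you who each clip belongs to, the name mapping can be as simple as a dictionary keyed by those IDs, for example `{"speaker_0": "<name returned by the lookup>"}`.

```python
import re
from pathlib import Path

# Matches clip names such as "speaker_0_first_10s.wav" and captures "speaker_0".
CLIP_PATTERN = re.compile(r"^(speaker_\d+)_first_10s\.wav$")


def clips_by_speaker(folder):
    """Return {speaker_id: clip_path} for every extracted clip in `folder`."""
    mapping = {}
    for path in sorted(Path(folder).glob("*.wav")):
        match = CLIP_PATTERN.match(path.name)
        if match:  # skip WAV files that don't follow the naming convention
            mapping[match.group(1)] = str(path)
    return mapping
```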

Querying audio from the LanceDB vector database


Identified Speaker: Arjun, Similarity Score: 1.0
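The query code is not included in this export. Conceptually, the lookup embeds the extracted clip with a speaker-embedding model, searches the table of known-speaker embeddings, and keeps the closest match. The sketch below reproduces only the ranking step, in plain Python with cosine similarity, so it runs without LanceDB. The embedding vectors, the names, and the 0.7 acceptance threshold are illustrative assumptions. If you query LanceDB with the cosine metric, the `_distance` column it returns should be the cosine distance, which is 1 minus the cosine similarity. A similarity of 1.0 therefore means a distance of 0, which usually indicates the query clip is identical, or nearly so, to a stored reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_speaker(query, references, threshold=0.7):
    """Return (name, score) for the best-matching reference, or ("unknown", score).

    `references` maps a known speaker's name to a reference embedding.
    `threshold` is an assumed cut-off; tune it on your own recordings.
    """
    best_name, best_score = "unknown", -1.0
    for name, ref in references.items():
        score = cosine_similarity(query, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "unknown", best_score
    return best_name, best_score


def similarity_from_cosine_distance(distance):
    """Convert a cosine distance (e.g. a `_distance` value) back to cosine similarity."""
    return 1.0 - distance
```

Returning `"unknown"` below the threshold keeps a voice that is not in the database from being forced onto the nearest known speaker.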

Replace Speakers with their Correct Names


Modified RTTM file saved successfully!
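The renaming code is not shown either. One way to do it, sketched here under the assumption that the RTTM layout matches the lines printed below and that `name_map` is the speaker-ID-to-name dictionary described earlier, is to split each line on whitespace, swap the speaker label in field 8, and write the file back. With `{"speaker_0": "Shresth"}`, the first original line below becomes the first updated line. The function names are hypothetical.

```python
def rename_rttm_speakers(lines, name_map):
    """Replace diarization labels (field 8 of each SPEAKER line) with real names.

    Labels missing from name_map are kept as-is. Fields are re-joined with
    single spaces, so uneven spacing in the input is normalized.
    """
    updated = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 8 and fields[0] == "SPEAKER":
            fields[7] = name_map.get(fields[7], fields[7])
            updated.append(" ".join(fields))
        else:
            updated.append(line.rstrip("\n"))  # pass non-SPEAKER lines through
    return updated


def rename_rttm_file(src_path, dst_path, name_map):
    """Read an RTTM file, rename its speakers, and write the result to dst_path."""
    with open(src_path) as f:
        lines = f.read().splitlines()
    with open(dst_path, "w") as f:
        f.write("\n".join(rename_rttm_speakers(lines, name_map)) + "\n")
```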


Original RTTM File
SPEAKER mono_file 1   0.060   1.900 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   2.380   0.300 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   2.940   1.580 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   5.020   1.020 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   6.300   0.780 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   7.500   2.940 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   10.940   2.780 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   14.060   0.940 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   16.140   1.900 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER mono_file 1   18.300   0.620 <NA> <NA> speaker_0 <NA> <NA>



Updated RTTM File
SPEAKER mono_file 1 0.060 1.900 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 2.380 0.300 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 2.940 1.580 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 5.020 1.020 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 6.300 0.780 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 7.500 2.940 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 10.940 2.780 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 14.060 0.940 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 16.140 1.900 <NA> <NA> Shresth <NA> <NA>
SPEAKER mono_file 1 18.300 0.620 <NA> <NA> Shresth <NA> <NA>

Next Steps

You should now have a clear idea of how to proceed. Once you have the updated RTTM file, map speakers to sentences based on their timestamps. You'll also need a word-level speaker mapping. For this step, refer to the reference notebook shared earlier, which contains the code for forced alignment to map timestamps to words and finally write the results to an SRT/TXT file. All the best!
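Here is a rough sketch of the mapping step. It assumes you already have word-level timestamps as `(word, start_sec, end_sec)` tuples (for example from Whisper word timestamps or forced alignment) and the renamed RTTM segments as `(speaker, start_sec, duration_sec)` tuples. Each word goes to the speaker whose segment contains the word's midpoint, with a fallback to the nearest segment for words that land in a gap. Consecutive words are then merged into speaker turns, and the turns are written as SRT cues. This is a simplified midpoint heuristic, not the reference notebook's implementation, and all function names are illustrative.

```python
def speaker_at(time_sec, segments):
    """Return the speaker whose segment contains time_sec, else the nearest one.

    `segments` is a list of (speaker, start_sec, duration_sec) tuples.
    """
    best, best_gap = None, float("inf")
    for speaker, start, dur in segments:
        end = start + dur
        if start <= time_sec <= end:
            return speaker
        gap = min(abs(time_sec - start), abs(time_sec - end))
        if gap < best_gap:
            best, best_gap = speaker, gap
    return best


def assign_speakers(words, segments):
    """Attach a speaker to each (word, start, end) using the word's midpoint."""
    return [(w, s, e, speaker_at((s + e) / 2, segments)) for w, s, e in words]


def group_by_speaker(tagged_words):
    """Merge consecutive words from one speaker into (speaker, start, end, text) turns."""
    turns = []
    for word, start, end, speaker in tagged_words:
        if turns and turns[-1][0] == speaker:
            spk, t0, _, text = turns[-1]
            turns[-1] = (spk, t0, end, text + " " + word)
        else:
            turns.append((speaker, start, end, word))
    return turns


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(turns):
    """Render speaker turns as SRT cues, prefixing each cue with the speaker name."""
    blocks = []
    for i, (speaker, start, end, text) in enumerate(turns, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{speaker}: {text}\n"
        )
    return "\n".join(blocks)
```

The midpoint rule is cheap but can misassign words that straddle a speaker change. Forced alignment, as in the reference notebook, gives tighter word boundaries to work with.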