Main

agentsllmsvector-databaselancedbgptopenaiAImultimodal-aitutorialsmachine-learningembeddingschatbot_using_Llama2_lanceDBfine-tuningdeep-learninggpt-4-visionllama-indexragmultimodallangchainlancedb-recipes

Install the packages

[1]
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.4/27.4 MB 13.4 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 30.5/30.5 MB 12.0 MB/s eta 0:00:00
Collecting pypdf
  Downloading pypdf-5.1.0-py3-none-any.whl.metadata (7.2 kB)
Requirement already satisfied: typing_extensions>=4.0 in /usr/local/lib/python3.10/dist-packages (from pypdf) (4.12.2)
Downloading pypdf-5.1.0-py3-none-any.whl (297 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 298.0/298.0 kB 5.2 MB/s eta 0:00:00
Installing collected packages: pypdf
Successfully installed pypdf-5.1.0
Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.10/dist-packages (3.2.1)
Requirement already satisfied: transformers<5.0.0,>=4.41.0 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (4.46.2)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (4.66.6)
Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (2.5.1+cu121)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (1.5.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (1.13.1)
Requirement already satisfied: huggingface-hub>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (0.26.2)
Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from sentence-transformers) (11.0.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (3.16.1)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2024.10.0)
Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (6.0.2)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2.32.3)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (4.12.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers) (3.4.2)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers) (3.1.4)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch>=1.11.0->sentence-transformers) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch>=1.11.0->sentence-transformers) (1.3.0)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (1.26.4)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (2024.9.11)
Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (0.4.5)
Requirement already satisfied: tokenizers<0.21,>=0.20 in /usr/local/lib/python3.10/dist-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (0.20.3)
Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence-transformers) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->sentence-transformers) (3.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers) (3.0.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (2024.8.30)
Collecting unstructured
  Downloading unstructured-0.16.8-py3-none-any.whl.metadata (24 kB)
Requirement already satisfied: chardet in /usr/local/lib/python3.10/dist-packages (from unstructured) (5.2.0)
Collecting filetype (from unstructured)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting python-magic (from unstructured)
  Downloading python_magic-0.4.27-py2.py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: lxml in /usr/local/lib/python3.10/dist-packages (from unstructured) (5.3.0)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from unstructured) (3.9.1)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from unstructured) (2.32.3)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from unstructured) (4.12.3)
Collecting emoji (from unstructured)
  Downloading emoji-2.14.0-py3-none-any.whl.metadata (5.7 kB)
Collecting dataclasses-json (from unstructured)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting python-iso639 (from unstructured)
  Downloading python_iso639-2024.10.22-py3-none-any.whl.metadata (13 kB)
Collecting langdetect (from unstructured)
  Downloading langdetect-1.0.9.tar.gz (981 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 981.5/981.5 kB 17.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy<2 in /usr/local/lib/python3.10/dist-packages (from unstructured) (1.26.4)
Collecting rapidfuzz (from unstructured)
  Downloading rapidfuzz-3.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Collecting backoff (from unstructured)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from unstructured) (4.12.2)
Collecting unstructured-client (from unstructured)
  Downloading unstructured_client-0.28.1-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from unstructured) (1.16.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from unstructured) (4.66.6)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from unstructured) (5.9.5)
Collecting python-oxmsg (from unstructured)
  Downloading python_oxmsg-0.0.1-py3-none-any.whl.metadata (5.0 kB)
Requirement already satisfied: html5lib in /usr/local/lib/python3.10/dist-packages (from unstructured) (1.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->unstructured) (2.6)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json->unstructured)
  Downloading marshmallow-3.23.1-py3-none-any.whl.metadata (7.5 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json->unstructured)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Requirement already satisfied: six>=1.9 in /usr/local/lib/python3.10/dist-packages (from html5lib->unstructured) (1.16.0)
Requirement already satisfied: webencodings in /usr/local/lib/python3.10/dist-packages (from html5lib->unstructured) (0.5.1)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->unstructured) (8.1.7)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->unstructured) (1.4.2)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->unstructured) (2024.9.11)
Collecting olefile (from python-oxmsg->unstructured)
  Downloading olefile-0.47-py2.py3-none-any.whl.metadata (9.7 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->unstructured) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->unstructured) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->unstructured) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->unstructured) (2024.8.30)
Collecting aiofiles>=24.1.0 (from unstructured-client->unstructured)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: cryptography>=3.1 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (43.0.3)
Requirement already satisfied: eval-type-backport<0.3.0,>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (0.2.0)
Requirement already satisfied: httpx>=0.27.0 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (0.27.2)
Collecting jsonpath-python<2.0.0,>=1.0.6 (from unstructured-client->unstructured)
  Downloading jsonpath_python-1.0.6-py3-none-any.whl.metadata (12 kB)
Requirement already satisfied: nest-asyncio>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (1.6.0)
Requirement already satisfied: pydantic<2.10.0,>=2.9.2 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (2.9.2)
Requirement already satisfied: pypdf>=4.0 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (5.1.0)
Requirement already satisfied: python-dateutil<3.0.0,>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (2.8.2)
Requirement already satisfied: requests-toolbelt>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from unstructured-client->unstructured) (1.0.0)
Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.10/dist-packages (from cryptography>=3.1->unstructured-client->unstructured) (1.17.1)
Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.27.0->unstructured-client->unstructured) (3.7.1)
Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.10/dist-packages (from httpx>=0.27.0->unstructured-client->unstructured) (1.0.7)
Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.27.0->unstructured-client->unstructured) (1.3.1)
Requirement already satisfied: h11<0.15,>=0.13 in /usr/local/lib/python3.10/dist-packages (from httpcore==1.*->httpx>=0.27.0->unstructured-client->unstructured) (0.14.0)
Requirement already satisfied: packaging>=17.0 in /usr/local/lib/python3.10/dist-packages (from marshmallow<4.0.0,>=3.18.0->dataclasses-json->unstructured) (24.2)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2.10.0,>=2.9.2->unstructured-client->unstructured) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<2.10.0,>=2.9.2->unstructured-client->unstructured) (2.23.4)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json->unstructured)
  Downloading mypy_extensions-1.0.0-py3-none-any.whl.metadata (1.1 kB)
Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.12->cryptography>=3.1->unstructured-client->unstructured) (2.22)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx>=0.27.0->unstructured-client->unstructured) (1.2.2)
Downloading unstructured-0.16.8-py3-none-any.whl (1.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 63.3 MB/s eta 0:00:00
Downloading backoff-2.2.1-py3-none-any.whl (15 kB)
Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
Downloading emoji-2.14.0-py3-none-any.whl (586 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 586.9/586.9 kB 33.7 MB/s eta 0:00:00
Downloading filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Downloading python_iso639-2024.10.22-py3-none-any.whl (274 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 274.9/274.9 kB 20.1 MB/s eta 0:00:00
Downloading python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Downloading python_oxmsg-0.0.1-py3-none-any.whl (31 kB)
Downloading rapidfuzz-3.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 58.2 MB/s eta 0:00:00
Downloading unstructured_client-0.28.1-py3-none-any.whl (62 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.9/62.9 kB 4.8 MB/s eta 0:00:00
Downloading aiofiles-24.1.0-py3-none-any.whl (15 kB)
Downloading jsonpath_python-1.0.6-py3-none-any.whl (7.6 kB)
Downloading marshmallow-3.23.1-py3-none-any.whl (49 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.5/49.5 kB 3.5 MB/s eta 0:00:00
Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Downloading olefile-0.47-py2.py3-none-any.whl (114 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 114.6/114.6 kB 8.9 MB/s eta 0:00:00
Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Building wheels for collected packages: langdetect
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.9-py3-none-any.whl size=993222 sha256=0de5007c21affc6fe5be588a8f8884cba29a9828c1e3065b5b73c8b2d1c3dd6a
  Stored in directory: /root/.cache/pip/wheels/95/03/7d/59ea870c70ce4e5a370638b5462a7711ab78fba2f655d05106
Successfully built langdetect
Installing collected packages: filetype, rapidfuzz, python-magic, python-iso639, olefile, mypy-extensions, marshmallow, langdetect, jsonpath-python, emoji, backoff, aiofiles, typing-inspect, python-oxmsg, unstructured-client, dataclasses-json, unstructured
Successfully installed aiofiles-24.1.0 backoff-2.2.1 dataclasses-json-0.6.7 emoji-2.14.0 filetype-1.2.0 jsonpath-python-1.0.6 langdetect-1.0.9 marshmallow-3.23.1 mypy-extensions-1.0.0 olefile-0.47 python-iso639-2024.10.22 python-magic-0.4.27 python-oxmsg-0.0.1 rapidfuzz-3.10.1 typing-inspect-0.9.0 unstructured-0.16.8 unstructured-client-0.28.1
Collecting ctransformers
  Downloading ctransformers-0.2.27-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from ctransformers) (0.26.2)
Requirement already satisfied: py-cpuinfo<10.0.0,>=9.0.0 in /usr/local/lib/python3.10/dist-packages (from ctransformers) (9.0.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (3.16.1)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (2024.10.0)
Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (24.2)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (6.0.2)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (4.66.6)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub->ctransformers) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->ctransformers) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->ctransformers) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->ctransformers) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub->ctransformers) (2024.8.30)
Downloading ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.9/9.9 MB 56.3 MB/s eta 0:00:00
Installing collected packages: ctransformers
Successfully installed ctransformers-0.2.27

Data Loading

Download data and upload it in your google drive to run this example. While this step is in process, Take a break ☕

Once this step is done, Now you are ready to move with next steps

Mount your drive for uploading data & model

[ ]

create folder inside drive & navigate to it while creating this example folder with name llm inside google drive is used.

[ ]

Import Packages

[ ]

Let's load the embedding model The all-mpnet-base-v2 model provides the best quality, while all-MiniLM-L6-v2 is 5 times faster and still offers good quality. you can try different models we are using all -MiniLM-L6-V2 models for this demo, & we are using cpu for inference

[ ]

select anyone below method

[ ]
[ ]

Now that we have our raw documents loaded, we need to pre-process them to generate embeddings:

Initialize a RecursiveCharacterTextSplitter to split text documents into smaller chunks. This helps in processing large text data efficiently.

[ ]

Lets use lanceDB as vectore store database & crete temp table

[ ]

Now, as we are using the Llama 2 quantized model, you need to download and upload it to your Google Drive. You can choose different versions based on your system specifications. For demonstration purposes, we are using a small model, but feel free to explore other options download the models from Thebloke repo.

https://huggingface.co/TheBloke/Llama-2-7B-GGML/tree/main

download the .bin file & kept in your current directory

[ ]

You can ask the quetions to your data & i will give you response

[ ]
[ ]
[ ]