Installing dependencies

[53]
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-openai 0.1.23 requires langchain-core<0.3.0,>=0.2.35, but you have langchain-core 0.0.13 which is incompatible.
langchain-openai 0.1.23 requires openai<2.0.0,>=1.40.0, but you have openai 0.28.0 which is incompatible.
langchain-text-splitters 0.2.4 requires langchain-core<0.3.0,>=0.2.38, but you have langchain-core 0.0.13 which is incompatible.

Importing libraries

[54]
[42]

Embeddings

[4]

Chunk the data

[5]

LanceDB connection

[6]

Load Eminem Lyrics Dataset

Convert it to LangChain Documents

[7]

Method 1: Retrieve Parent Document

When you run a query, it is matched against the smaller chunks, and the parent document of each matching chunk is retrieved and passed to the LLM as context.
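The idea can be sketched without any external libraries. The block below is a minimal illustration only, not the notebook's LangChain and LanceDB code: the `ParentDocumentIndex` class, the `split_text`, `embed`, and `cosine` helpers, and the two sample documents are all hypothetical, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ParentDocumentIndex:
    """Hypothetical index: search over small chunks, return whole documents."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size
        self.parents = {}   # parent_id -> full document text
        self.children = []  # (embedding, child_text, parent_id)

    def add_documents(self, docs):
        for text in docs:
            parent_id = len(self.parents)
            self.parents[parent_id] = text
            # Only the small chunks are embedded and indexed.
            for child in split_text(text, self.chunk_size):
                self.children.append((embed(child), child, parent_id))

    def search_children(self, query, k=1):
        """Return the k best-matching small chunks with their parent ids."""
        q = embed(query)
        ranked = sorted(self.children, key=lambda c: cosine(q, c[0]), reverse=True)
        return [(child, pid) for _, child, pid in ranked[:k]]

    def retrieve(self, query, k=1):
        """Match against small chunks, then return each distinct parent document."""
        seen, results = set(), []
        for _, pid in self.search_children(query, k):
            if pid not in seen:
                seen.add(pid)
                results.append(self.parents[pid])
        return results


docs = [
    "The quick brown fox jumps over the lazy dog near the quiet river bank at dawn.",
    "Vector databases store embeddings so that similar text can be found quickly.",
]
index = ParentDocumentIndex(chunk_size=6)
index.add_documents(docs)

child, parent_id = index.search_children("lazy dog")[0]
full_document = index.retrieve("lazy dog")[0]
print(child)          # the small chunk that matched the query
print(full_document)  # the full parent document handed to the LLM
```

The small chunk gives a precise match, while the parent lookup restores the surrounding text the LLM needs to answer.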

[8]
[9]
[10]
You are a true Queen, and I mean that in every sense of the word. I will never forget the opportunities you have given me. You will always be in my heart, my thoughts, and my prayers. As I have said before, you have no idea how much your son and his music has inspired, not only the Hip Hop world, but, speaking for myself, has inspired my whole career. He was, and still is, the true definition of a Soldier. When I was feeling at my worst; I knew I could put that 2Pac tape in, and suddenly, things werent so
[11]
Letter to Tupac’s Mother LyricsDear Afeni
Sorry if it looks a little sloppy, I couldve done a little better if I had the right pencils. Instead, I had to draw it in pen. Plus, I just kind of thought of the idea a little too late. But Ive been drawing since I was 10, and I thought you might like it. Anyways, thank you for always being so kind to me. You are a true Queen, and I mean that in every sense of the word. I will never forget the opportunities you have given me. You will always be in my heart, my thoughts, and my prayers. As I have said before, you have no idea how much your son and his music has inspired, not only the Hip Hop world, but, speaking for myself, has inspired my whole career. He was, and still is, the true definition of a Soldier. When I was feeling at my worst; I knew I could put that 2Pac tape in, and suddenly, things werent so bad. He gave me the courage to stand up and say Fk the world! This is who I am! And if you dont like it, go fk yourself! Thank you for giving us his spirit, and yours! God Bless you!
Love
Marshall17Embed

Method 2: Retrieving Larger chunks

Sometimes small chunks lack the full context, but the full documents are too large to fit into the LLM's context window. In that case, split the raw documents into larger chunks, then split those into smaller chunks. Index the smaller chunks, but at retrieval time return the larger chunks in place of the full documents.
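A rough, library-free sketch of this two-level split follows. It is an illustration under stated assumptions rather than the notebook's implementation: the function names, the chunk sizes, and the synthetic document are hypothetical, and a word-overlap score stands in for vector similarity.

```python
import re


def split_words(text, size):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokens(text):
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap_score(query, chunk):
    """Toy relevance: Jaccard overlap of word sets (assumption: stands in for embeddings)."""
    q, c = tokens(query), tokens(chunk)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def build_index(docs, parent_size=12, child_size=4):
    """Split each document into larger chunks, then each larger chunk into smaller ones."""
    parents = []   # larger chunks, returned at retrieval time
    children = []  # (small chunk text, index of its larger chunk)
    for doc in docs:
        for parent in split_words(doc, parent_size):
            parents.append(parent)
            for child in split_words(parent, child_size):
                children.append((child, len(parents) - 1))
    return parents, children


def retrieve_larger_chunks(query, parents, children, k=1):
    """Rank the small chunks, but return the larger chunks that contain them."""
    ranked = sorted(children, key=lambda c: overlap_score(query, c[0]), reverse=True)
    seen, results = set(), []
    for _, parent_idx in ranked[:k]:
        if parent_idx not in seen:
            seen.add(parent_idx)
            results.append(parents[parent_idx])
    return results


# Synthetic 30-word document: w0 w1 ... w29
document = " ".join(f"w{i}" for i in range(30))
parents, children = build_index([document], parent_size=12, child_size=4)
results = retrieve_larger_chunks("w13 w14", parents, children, k=1)
print(results[0])  # the 12-word larger chunk, not the whole 30-word document
```

The larger-chunk size is the tuning knob here: it should be big enough to carry context but small enough that the retrieved chunks fit in the prompt.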

[12]
[13]
Letter to Tupac’s Mother LyricsDear Afeni
Sorry if it looks a little sloppy, I couldve done a little better if I had the right pencils. Instead, I had to draw it in pen. Plus, I just kind of thought of the idea a little too late. But Ive been drawing since I was 10, and I thought you might like it. Anyways, thank you for always being so kind to me. You are a true Queen, and I mean that in every sense of the word. I will never forget the opportunities you have given me. You will always be in my heart, my thoughts, and my prayers. As I have said before, you have no idea how much your son and his music has inspired, not only the Hip Hop world, but, speaking for myself, has inspired my whole career. He was, and still is, the true definition of a Soldier. When I was feeling at my worst; I knew I could put that 2Pac tape in, and suddenly, things werent so bad. He gave me the courage to stand up and say Fk the world! This is who I am! And if you dont like it, go fk yourself! Thank you for giving us his spirit, and yours! God Bless you!
Love
Marshall17Embed

Example LLM usage

[57]
/usr/local/lib/python3.10/dist-packages/langchain/llms/openai.py:244: UserWarning: You are trying to use a chat model. This way of initializing it is no longer supported. Instead, please use: `from langchain.chat_models import ChatOpenAI`
  warnings.warn(
[58]
[61]
'Em refers to Eminem, the rapper. His real name is Marshall Mathers, as indicated in the "Letter to Tupac’s Mother Lyrics". Eminem often refers to himself as "Em" in his music.'