Hybrid Search

BM25 is a ranking function used in information retrieval to score how well each document in a collection matches a keyword query. Its effectiveness rests on:

Term frequency: how often the search terms appear in each document.

Document length normalization: scores are adjusted for document length, so short and long documents compete fairly.

Bias-free information retrieval: well suited to large data sets where unbiased results are critical.

About LanceDB (VectorDB)

LanceDB extends search beyond keyword matching. It adds a layer of contextual understanding, interpreting the semantics of a query to return results that align with its intended meaning.
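To make the term-frequency and length-normalization ideas concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring. It is an illustration only, not the code the retriever library runs: the whitespace tokenizer, the smoothed IDF variant, the defaults `k1=1.5` and `b=0.75`, and the three sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents need more hits to score well.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical sample documents, not taken from the handbook.
docs = [
    "pregnant women need foods rich in iron",
    "iron iron iron is mentioned often in this much longer document about many other unrelated things",
    "breastfeeding mothers should drink plenty of fluids",
]
scores = bm25_scores("iron foods", docs)
```

The short first document matches both query terms and outranks the long second document, even though the latter repeats "iron" three times. The third document shares no term with the query and scores zero.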

Hybrid Search Approach - Our hybrid search system combines BM25's keyword precision with LanceDB's semantic understanding. Together they return more complete results than either retriever alone, which suits complex and varied datasets.

Installing all the dependencies

[5]
[2]

Open-Source Models

https://github.com/lancedb/vectordb-recipes/blob/main/tutorials/chatbot_using_Llama2_&_lanceDB

You can also compare the results of a single retriever with those of the ensemble retriever.

Hybrid Search

BM25 retriever - sparse retriever

Embeddings - dense retriever (LanceDB)

Hybrid search = sparse + dense retriever
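One common way to combine a sparse and a dense ranking is weighted Reciprocal Rank Fusion, which is also how LangChain's `EnsembleRetriever` describes its merging step. The sketch below is an independent illustration, not the library's code: the function name `weighted_rrf`, the constant `c=60`, the equal weights, and the page identifiers are assumptions made for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns weight / (rank + c) from every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a sparse (BM25) and a dense (vector) retriever.
sparse = ["page_46", "page_45", "page_31"]
dense = ["page_45", "page_44", "page_46"]
fused = weighted_rrf([sparse, dense], weights=[0.5, 0.5])
```

Here `page_45` comes first in the fused list because both retrievers rank it near the top, while pages returned by only one retriever fall to the end.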

Load the data

[3]
--2024-11-24 07:35:55--  https://pdf.usaid.gov/pdf_docs/PA00TBCT.pdf
Resolving pdf.usaid.gov (pdf.usaid.gov)... 96.17.46.187, 2600:1408:7:1b8::1923, 2600:1408:7:1b4::1923
Connecting to pdf.usaid.gov (pdf.usaid.gov)|96.17.46.187|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6419525 (6.1M) [application/pdf]
Saving to: ‘PA00TBCT.pdf’

PA00TBCT.pdf        100%[===================>]   6.12M  --.-KB/s    in 0.1s    

2024-11-24 07:35:55 (52.6 MB/s) - ‘PA00TBCT.pdf’ saved [6419525/6419525]

[6]

Importing all the libraries

[7]

Initialize Embeddings

[8]
<ipython-input-8-4cb8ecc446d1>:2: LangChainDeprecationWarning: The class `OpenAIEmbeddings` was deprecated in LangChain 0.0.9 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import OpenAIEmbeddings``.
  embedding = OpenAIEmbeddings()

Initialize the BM25

[17]
type of bm25 <class 'langchain_community.retrievers.bm25.BM25Retriever'>

Initialize the database

[ ]

Instantiate the retriever

[24]
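The code cells did not survive in this copy of the notebook, so the following is only a sketch of how the steps so far (load the PDF, initialize embeddings, BM25 and LanceDB, then combine them) might be wired together with LangChain. The function name, the value of `k`, and the weights are assumptions, and the arguments accepted by the `LanceDB` vector store have changed between releases, so check them against the version you have installed.

```python
def build_hybrid_retriever(pdf_path, k=2, weights=(0.3, 0.7)):
    """Build a BM25 + LanceDB ensemble retriever over the pages of a PDF.

    Assumes langchain, langchain-community, langchain-openai, lancedb,
    rank_bm25 and pypdf are installed and OPENAI_API_KEY is set.
    """
    # Imported lazily so this module loads without the packages installed.
    from langchain.retrievers import EnsembleRetriever
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_community.retrievers import BM25Retriever
    from langchain_community.vectorstores import LanceDB
    from langchain_openai import OpenAIEmbeddings

    # One Document per PDF page, with 'source' and 'page' metadata.
    pages = PyPDFLoader(pdf_path).load()

    # Sparse retriever: keyword matching with BM25.
    bm25 = BM25Retriever.from_documents(pages)
    bm25.k = k

    # Dense retriever: OpenAI embeddings stored in a LanceDB table.
    store = LanceDB.from_documents(pages, OpenAIEmbeddings())
    dense = store.as_retriever(search_kwargs={"k": k})

    # Hybrid retriever: weighted fusion of both result lists.
    return EnsembleRetriever(retrievers=[bm25, dense], weights=list(weights))
```

A call such as `build_hybrid_retriever("PA00TBCT.pdf").invoke("your question")` would then return a list of `Document` objects. Importing `OpenAIEmbeddings` from `langchain_openai` follows the advice in the deprecation warning shown earlier.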

Query

[25]
[Document(metadata={'page': 46, 'source': '/content/PA00TBCT.pdf'}, page_content='Food and Nutrition Handbook for Extension Workers\n35\nNutrition\tfor\tbreastfeeding\tmothers\nNutritional requirements during breastfeeding are higher than during \npregnancy because the mother has to produce enough milk to sustain a \nbaby (bigger than the foetus) for the first six months and beyond. Breast-\nfeeding women need to eat a wide variety of foods.\nNutrition guidelines for pregnant women as well apply here but a \nlactating mother needs to eat much more; that is to say one extra meal \n(five meals in total).\nBreastfeeding mothers should also take a lot of fluids to cater for the \nhigh amounts of water used to make breast milk. They should avoid \nself-medication, smoking and alcohol to prevent intoxicating the baby.\nBreastfeeding mothers should avoid stress and have enough rest.\nKEY MESSAGES \n• Ensure that a pregnant mother has a balanced diet, with a vari-\nety of foods from the food groups, and has one additional meal \nin addition to the 3 meals she receives daily. The fourth meal \ncaters to her physiological needs.\n• Pregnant women should take iron and folate tablets daily in \naddition to foods rich in iron, calcium and vitamin A.\nKEY MESSAGES \n• Ensure that a breastfeeding mother take a balanced diet and in \naddition to 3 meals daily receives 2 extra meals a day to main-\ntain her health and that of her baby.\n• A pregnant woman and breastfeeding mother should eat a \nvariety of foods from the main food groups daily.'),
, Document(metadata={'page': 45, 'source': '/content/PA00TBCT.pdf'}, page_content='Food and Nutrition Handbook for Extension Workers\n34\nguidelines for selecting energy-giving foods, body-building foods \nand protective foods. Pregnant women especially need foods rich in \niron and vitamin A in addition to the balanced diet. Iron needs are \nhighly increased partly due to the need to build reserves for child \nup to six months after birth before initiating complementary food \nintake.\n• Pregnant women need to take foods rich in calcium, e.g., milk and \nmukene (silver fish) partly to take care of the increased requirement \nfor building the foetus skeletal structure.\n• Pregnant women have higher needs for nutrients generally and \nshould take snacks in between meals.\nIn addition, pregnant women should be educated to strictly observe the \nfollowing:\n1. Take the required amounts of iron and folic acid supplements to \nprevent anaemia.\n2. Sleep under an insecticide-treated mosquito net.\n3. Visit the nearest health facility at least four (4) times for antenatal \ncare. This will enable them access a number of services that prepare \nthem to deliver a healthy baby.\n4. Deliver in a healthy facility with the help of a skilled health worker.\n5. Get deworming pills, IPT and tetanus vaccine from a health facility.\n6. Avoid excessive workloads therefore community and family support \nmechanisms should be encouraged.\n7. Pregnant women should limit intake of alcohol, cigarettes. These \ncause negative effects on the foetus.\n8. Should strictly take drugs on advice of the health personnel as some \nof them are potentially harmful to the unborn child.\n9. Avoid negative cultural practices that reduce the intake of nutritious \nfoods or impact negatively on their health such as:\n• Not consuming chicken and eggs.\n• Pregnant women not defecating in toilets/pit latrines.'),
, Document(metadata={'source': '/content/PA00TBCT.pdf', 'page': 44}, page_content='Food and Nutrition Handbook for Extension Workers\n33\nCHAPTER FOUR\nESSENTIAL NUTRITION ACTIONS IN \nAGRICULTURE\nT\nhe Ministry of Agriculture, Animal Industry and Fisheries shares a \nrole in executing essential nutrition actions. Those areas where the \nministry of agriculture can contribute towards nutrition improvement \nare:\n• Promoting control of anaemia.\n• Promoting production and consumption of iron-rich foods.\n• Promoting production and consumption of vitamin A-rich foods.\n• Promoting consumption of iodized salt.\n• Promoting vitamin A supplementation.\n• Ensuring adequate intake of quality food for the household mem-\nbers.\n• Reduction of women workload in agriculture.\nTherefore, consistent with these actions, the Ministry is concerned with \nnutrition for pregnant mothers, breastfeeding mothers and children \nbelow five years.\nNutrition\tfor\tpregnant\twomen\nIt is necessary that a woman is well nourished before pregnancy so that \nby the time she conceives, the body has sufficient capacity to meet both \nher and the baby’s needs. A malnourished woman may fail to deliver \nbaby alive or if she does, the baby is likely to be underweight (the normal \nrange is 2.5–4.5 kg at birth). One of the leading causes of maternal death \nat childbirth is insufficient blood.\nDuring pregnancy women have high nutrient needs because they have \nto build foetus tissue, build reserves for breast milk and also cater for \ntheir own nutritional needs. On average women should gain 8 –12 kg in \nthe course of pregnancy. Pregnant women need to eat more food rather \nthan decrease the intake.\n• Pregnant women need to consume balanced diet following the'),
, Document(metadata={'source': '/content/PA00TBCT.pdf', 'page': 31}, page_content='Food and Nutrition Handbook for Extension Workers\n20\nPrevalence\tof\tmalnutrition\tin\tUganda\nMalnutrition is one of the main public health and economic and devel -\nopment problems facing Uganda. Children below the age of five years \nand women in reproductive age including pregnant women and lactating \nmothers are mostly affected (UDHS 2011). Children below the age of 5 \nyears suffer mostly from under nutrition with:\n• 33% of these children suffer from chronic undernutrition (they are \nstunted)\n• 14% are underweight (body weight too light for their age)\n• 49% suffer from iron deficiency anaemia (lack of iron/blood)\n• 60% suffer from different forms of iodine deficiency disorders (IDD) \nLikewise women in reproductive age (15–49 years) also suffer from \nmalnutrition:\n• 52% of pregnant women and lactating mothers have vitamin A defi-\nciency\n• 23% suffered from iron deficiency anaemia\n \nFigure\t1.\tSummary\tof\ttypes\tand\tcategories\tof\tmalnutrition')]

Ask questions over the retrieved documents

[26]
<ipython-input-26-ad04468c4fb1>:9: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
  qa.run(query)
'Pregnant women need to consume a balanced diet with a variety of foods from the main food groups daily. They should include foods rich in iron, calcium, and vitamin A. Additionally, pregnant women should take iron and folate tablets daily, get adequate rest, avoid stress, and have regular antenatal care visits.'
[27]
'Foods that are needed for building strong bones and teeth include sources of calcium, magnesium, vitamin D, and fluoride. Calcium and vitamin D are essential for bone health, while magnesium plays a role in bone structure. Fluoride is important for tooth formation and preventing tooth decay. Sources of these nutrients include:\n\n- Calcium: milk and dairy products, fish eaten with bones, dark green vegetables.\n- Magnesium: legumes, whole-grain cereals, nuts, and dark-green vegetables.\n- Vitamin D: sun exposure, Vitamin D-fortified milk, eggs, fatty fish.\n- Fluoride: seafood, tea, coffee, soybeans, iodized salt.\n\nThese nutrients play crucial roles in building and maintaining strong bones and teeth.'
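The deprecation warning above recommends `invoke` over `Chain.run`. A minimal sketch of the question-answering step in that style could look like the following. It assumes a pre-1.0 LangChain release in which `RetrievalQA` is still available; the function name `answer` and the model name are placeholders, not values taken from the notebook.

```python
def answer(query, retriever, model="gpt-4o-mini"):
    """Answer `query` with an LLM grounded in documents from `retriever`.

    Assumes langchain and langchain-openai are installed and
    OPENAI_API_KEY is set; the model name is a placeholder.
    """
    # Imported lazily so this module loads without the packages installed.
    from langchain.chains import RetrievalQA
    from langchain_openai import ChatOpenAI

    qa = RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model=model, temperature=0),
        chain_type="stuff",  # put all retrieved pages into a single prompt
        retriever=retriever,
    )
    # invoke() takes a dict and returns a dict; the answer is under "result".
    return qa.invoke({"query": query})["result"]
```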

Bonus

Full-text search (FTS) is another useful feature: it returns every record in which at least one of the query words matches.

Use case: E-Commerce Product Search

Context: Customers searching for products on an e-commerce website.

Application: When a customer types a query (like "fitness t-shirt"), the system uses the ensemble retriever to find the most relevant products from the product descriptions. The BM25 component helps capture keyword-based matches, while the dense vector retriever (LanceDB) understands the semantic context of the query.

[28]
Collecting tantivy==0.20.1
  Downloading tantivy-0.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Downloading tantivy-0.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 32.2 MB/s eta 0:00:00
Installing collected packages: tantivy
Successfully installed tantivy-0.20.1
[29]
['Frodo was a happy puppy']
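The cell that produced this output is missing from this copy. A sketch of a LanceDB full-text search that could return such a row is shown below. The table name, the database path, the dummy vectors, and the second row are assumptions made for illustration, and the FTS backend used by `create_fts_index` depends on the installed LanceDB version.

```python
def fts_demo(uri="/tmp/lancedb_fts_demo"):
    """Create a tiny LanceDB table, index its text column, and search it.

    Assumes lancedb (and tantivy, where the FTS backend needs it) is
    installed. All data below is illustrative.
    """
    # Imported lazily so this module loads without the package installed.
    import lancedb

    db = lancedb.connect(uri)
    table = db.create_table(
        "fts_demo",
        data=[
            {"vector": [3.1, 4.1], "text": "Frodo was a happy puppy"},
            {"vector": [5.9, 26.5], "text": "There are several kittens playing"},
        ],
        mode="overwrite",  # start from a fresh table on every run
    )

    # Build the full-text index on the "text" column.
    table.create_fts_index("text")

    # Keyword query against the index; rows containing a query word match.
    hits = table.search("puppy", query_type="fts").limit(10).select(["text"]).to_list()
    return [hit["text"] for hit in hits]
```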