Food Recommendation System

Overview

This project is a vector-based food recommendation system that uses LanceDB for full-text search (FTS), vector search, and hybrid search. It adds a reranker model that reorders the retrieved results to make the food recommendations more accurate.

Features

  • Vector-Based Recommendations: Uses vector search to find similar food items.
  • Full-Text Search (FTS): Enables efficient searching of food items based on text descriptions.
  • Hybrid Search: Combines both vector search and full-text search for comprehensive results.
  • Reranker Model: Improves search result accuracy by reranking the retrieved results. The run recorded in this notebook loads the ColBERT reranker (colbert-ir/colbertv2.0).
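
To make the idea of combining vector search and full-text search concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked result lists. It is an illustration only and not necessarily how this notebook or LanceDB fuses results. The two hit lists are made-up Food_ID values, and k=60 is a conventional default rather than a value taken from the notebook.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute the most.
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); ties fall back to the id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists (Food_ID values), best match first.
vector_hits = [292, 100, 93, 87]
fts_hits = [100, 298, 292, 302]

fused = reciprocal_rank_fusion([vector_hits, fts_hits])
print(fused[:4])
```

Items that appear high in both lists rise to the top, which is why hybrid search can recover matches that either search alone ranks poorly.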

Install required dependencies

[1]
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.2.2)
Collecting lancedb
  Downloading lancedb-0.18.0-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (4.0 kB)
Installing collected packages: overrides, deprecation, pylance, lancedb
Successfully installed deprecation-2.1.0 lancedb-0.18.0 overrides-7.7.0 pylance-0.22.0
[21]
Requirement already satisfied: tantivy in /usr/local/lib/python3.10/dist-packages (0.22.0)
Collecting rerankers
Installing collected packages: rerankers
Successfully installed rerankers-0.6.1
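
Before moving on, it can help to confirm that the installed packages are importable. The sketch below is an optional check, not part of the original notebook. The package list is read off the install output above, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Packages named in the install output above.
REQUIRED = ["pandas", "lancedb", "tantivy", "rerankers"]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```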

Download Data

This walkthrough uses the food recommendation dataset from Kaggle. Download it from the following link:

https://www.kaggle.com/datasets/schemersays/food-recommendation-system


Data Preprocessing

[7]
'peri peri chicken satay Snack non-veg: boneless skinless chicken thigh (trimmed), salt and pepper, yogurt, chilli powder, ginger garlic paste, coriander leaves, oil to fry, peri peri sauce, potato fries'
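
The string above looks like the food name, category, and veg/non-veg label joined together, followed by a colon and the description. A minimal sketch of how such a text field could be built is shown below in plain Python. The column name `Describe` and the helper `build_text` are assumptions made for illustration, and the notebook's actual preprocessing code may differ.

```python
def build_text(row):
    """Join name, category, and veg/non-veg label, then append the description."""
    return f"{row['Name']} {row['C_Type']} {row['Veg_Non']}: {row['Describe']}"


# One record shaped like the dataset row behind the output above.
row = {
    "Name": "peri peri chicken satay",
    "C_Type": "Snack",
    "Veg_Non": "non-veg",
    "Describe": (
        "boneless skinless chicken thigh (trimmed), salt and pepper, yogurt, "
        "chilli powder, ginger garlic paste, coriander leaves, oil to fry, "
        "peri peri sauce, potato fries"
    ),
}

text = build_text(row)
print(text)
```

With a pandas DataFrame, the same function can be applied row by row with `df.apply(build_text, axis=1)`.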

To improve accuracy, include both numerical and string representations of the ratings. First, add a new column, rating_str, containing the string value of each rating. Then append both the numerical and the string rating to the text column. Small text-preparation experiments like this are often needed to improve retrieval accuracy.
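
The two steps described above can be sketched as follows. This is an illustration under stated assumptions: spelling the rating out as a word (the `RATING_WORDS` mapping) is a guess at what "string representation" means here, the notebook may simply cast the number to a string, and `add_rating_text` and the `rating:` label are hypothetical.

```python
# Assumed mapping from numeric rating to a word; ratings in the outputs run from 1 to 10.
RATING_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def add_rating_text(row):
    """Return a copy of the row with rating_str added and both ratings appended to text."""
    rating = int(row["Rating"])
    out = dict(row)
    # Fall back to the plain digits if the rating is outside the mapping.
    out["rating_str"] = RATING_WORDS.get(rating, str(rating))
    out["text"] = f"{row['text']} rating: {rating} {out['rating_str']}"
    return out


example = {"text": "chicken tikka Indian non-veg", "Rating": 8}
enriched = add_rating_text(example)
print(enriched["text"])
```

Because the rating now appears in the text both as a digit and as a word, full-text queries that mention either form have something to match.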

[22]
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device cpu
No dtype set
Using dtype torch.float32
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting
[23]
  Food_ID                      Name        C_Type Veg_Non Rating
0     303                  red rice  Healthy Food     veg      6
1      10  broccoli and almond soup  Healthy Food     veg      6
2      10  broccoli and almond soup  Healthy Food     veg      6
3      36     spicy watermelon soup  Healthy Food     veg      6
[24]
  Food_ID                                      Name        C_Type  Veg_Non Rating
0      87  roasted spring chicken with root veggies  Healthy Food  non-veg      8
1     247                   microwave chicken steak  Healthy Food  non-veg      5
2      86         roast turkey with cranberry sauce  Healthy Food  non-veg      4
3      86         roast turkey with cranberry sauce  Healthy Food  non-veg      4
[25]
  Food_ID                          Name    C_Type  Veg_Non Rating
0     292                 chicken tikka    Indian  non-veg      8
1      69  banana and maple ice lollies   Dessert      veg      8
2     232         apple and walnut cake   Dessert      veg      8
3      81             fruit infused tea  Beverage      veg      8
[26]
  Food_ID                                               Name   C_Type  Veg_Non Rating
0     142  fish skewers with coriander and red wine vineg...     Thai  non-veg      6
1     185                red wine braised mushroom flatbread  Italian      veg      7
2      85  garlic and pinenut soup with burnt butter essence   French      veg      3
3      85  garlic and pinenut soup with burnt butter essence   French      veg     10
[27]
  Food_ID                                               Name        C_Type Veg_Non Rating
0     303                                           red rice  Healthy Food     veg      6
1      10                           broccoli and almond soup  Healthy Food     veg      6
2      36                              spicy watermelon soup  Healthy Food     veg      6
3     221  amaranthus granola with lemon yogurt, berries ...  Healthy Food     veg      6
[28]
  Food_ID                     Name        C_Type Veg_Non Rating
0     270  jalapeno cheese fingers       Mexican     veg      3
1     270  jalapeno cheese fingers       Mexican     veg      5
2     301               brown rice  Healthy Food     veg      1
3     300               black rice  Healthy Food     veg      9
[29]
  Food_ID                            Name    C_Type  Veg_Non Rating
0      93  buldak (hot and spicy chicken)  Japanese  non-veg      7
1     100             spicy chicken curry    Indian  non-veg      3
2     100             spicy chicken curry    Indian  non-veg      4
3     100             spicy chicken curry    Indian  non-veg      1
[30]
  Food_ID           Name    C_Type Veg_Non Rating
0      83  spiced coffee  Beverage     veg      9
1      84  filter coffee  Beverage     veg     10
2      84  filter coffee  Beverage     veg     10
3      84  filter coffee  Beverage     veg      2
[31]
  Food_ID                           Name        C_Type  Veg_Non Rating
0     162              prawn potato soup          Thai      veg      9
1      79  beetroot and green apple soup  Healthy Food      veg      1
2     302                 koldil chicken       Chinese  non-veg      5
3     298                     chicken 65       Chinese  non-veg      4

Because the dataset is small, some queries return mixed results, especially with the recommendation limit set to 4. The key to better results lies in how you prepare the text data and how you tune hyperparameters such as the query type (hybrid, FTS, or vector search). It is also worth experimenting with different reranker methods. For further improvements, see the vectordb-recipes repository for ways to enhance RAG methods, and consult the LanceDB documentation for more details.

Docs: https://lancedb.github.io/lancedb/search/

More GenAI projects: https://github.com/lancedb/vectordb-recipes
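
When comparing query types or rerankers, a simple score can replace judging the result tables by eye. The sketch below computes the share of returned rows that match the attributes a query asked for. The rows are copied from the last result table above. The target (Chinese, non-veg) is a hypothetical example, since the query behind that table is not shown, and `match_rate` is an illustrative helper rather than part of the notebook.

```python
def match_rate(results, **wanted):
    """Fraction of result rows whose fields equal every wanted value."""
    if not results:
        return 0.0
    hits = sum(
        all(row.get(field) == value for field, value in wanted.items())
        for row in results
    )
    return hits / len(results)


# Rows from the final result table in this notebook.
results = [
    {"Name": "prawn potato soup", "C_Type": "Thai", "Veg_Non": "veg"},
    {"Name": "beetroot and green apple soup", "C_Type": "Healthy Food", "Veg_Non": "veg"},
    {"Name": "koldil chicken", "C_Type": "Chinese", "Veg_Non": "non-veg"},
    {"Name": "chicken 65", "C_Type": "Chinese", "Veg_Non": "non-veg"},
]

# Hypothetical target for the query that produced these rows.
score = match_rate(results, C_Type="Chinese", Veg_Non="non-veg")
print(score)
```

Running the same set of queries once per query type or reranker and averaging this score gives a rough basis for choosing between them.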