Food Recommendation System
Overview
This project is a vector-based food recommendation system that uses LanceDB for full-text search (FTS), vector search, and hybrid search. It integrates a reranker model to refine the search results and improve the accuracy of the food recommendations.
Features
- Vector-Based Recommendations: Uses vector similarity search to find similar food items.
- Full-Text Search (FTS): Enables efficient searching of food items based on text descriptions.
- Hybrid Search: Combines both vector search and full-text search for comprehensive results.
- Jina Reranker Model: Improves search result accuracy by reranking the retrieved results.
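To make the idea of hybrid search concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a vector-search ranking with a full-text ranking. It is an illustration only and not necessarily the fusion method this notebook uses; the function name, the constant `k=60`, and the two example result orderings are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item ids into one ranking.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical result lists for one query. The food names appear in the
# dataset, but the orderings are made up for illustration.
vector_hits = ["spicy chicken curry", "chicken tikka", "chicken 65"]
fts_hits = ["chicken tikka", "koldil chicken", "spicy chicken curry"]

fused = reciprocal_rank_fusion([vector_hits, fts_hits])
print(fused)
```

Items that rank well in both lists rise to the top, which is why hybrid search tends to be more robust than either search alone.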
Install required dependencies
The install step pulls in pandas, lancedb, tantivy, and rerankers. In this run, pip installed lancedb 0.18.0 (with pylance 0.22.0, overrides 7.7.0, and deprecation 2.1.0) and rerankers 0.6.1; pandas 2.2.2 and tantivy 0.22.0 were already present.
Download Data
For this notebook walkthrough, we will use the food recommendation dataset from Kaggle. Download it from the following link:
https://www.kaggle.com/datasets/schemersays/food-recommendation-system
Data Preprocessing
After preprocessing, each food item is represented by a single text string, for example:
'peri peri chicken satay Snack non-veg: boneless skinless chicken thigh (trimmed), salt and pepper, yogurt, chilli powder, ginger garlic paste, coriander leaves, oil to fry, peri peri sauce, potato fries'
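The sample string above looks like the item's name, category, and veg/non-veg flag followed by a colon and its ingredients. A minimal sketch of building such a string is shown here. The helper `build_text` is hypothetical, and the column name `Describe` is an assumption about the dataset (only `Name`, `C_Type`, and `Veg_Non` appear in this notebook's outputs).

```python
def build_text(row):
    """Join the descriptive fields of one food item into a single string.

    `row` is any mapping with the keys used below; "Describe" is an
    assumed column name for the ingredient list.
    """
    return f"{row['Name']} {row['C_Type']} {row['Veg_Non']}: {row['Describe']}"


# One row, using the values visible in the sample string.
row = {
    "Name": "peri peri chicken satay",
    "C_Type": "Snack",
    "Veg_Non": "non-veg",
    "Describe": (
        "boneless skinless chicken thigh (trimmed), salt and pepper, yogurt, "
        "chilli powder, ginger garlic paste, coriander leaves, oil to fry, "
        "peri peri sauce, potato fries"
    ),
}

text = build_text(row)
print(text)
```

With a pandas DataFrame, the same helper could be applied row by row, for example `df["text"] = df.apply(build_text, axis=1)`.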
To improve accuracy, include both the numerical and the string representation of each rating. First, add a new column, rating_str, containing the string value for each rating. Then append both the numerical and the string rating to the text column. This gives the search more ways to match a query that mentions a rating. Small experiments of this kind are often what it takes to improve accuracy.
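The rating step can be sketched as follows. The source does not show how rating_str is built, so this example assumes it spells the rating out in words; the mapping `RATING_WORDS`, the helper `add_rating`, and the "rating:" label in the output are all hypothetical.

```python
# Assumed mapping from a 1-10 rating to its word form.
RATING_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def add_rating(text, rating):
    """Append the numeric rating and its word form to an item's text."""
    rating = int(rating)  # ratings may arrive as floats such as 6.0
    rating_str = RATING_WORDS[rating]
    return f"{text} rating: {rating} {rating_str}"


example = add_rating("red rice Healthy Food veg", 6)
print(example)
```

A query such as "rating six" or "rating 6" can then match on either form.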
Output from loading the reranker (download progress bars omitted):
Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set. Using device cpu.
No dtype set. Using dtype torch.float32.
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting
   Food_ID  Name                      C_Type        Veg_Non  Rating
0  303      red rice                  Healthy Food  veg      6
1  10       broccoli and almond soup  Healthy Food  veg      6
2  10       broccoli and almond soup  Healthy Food  veg      6
3  36       spicy watermelon soup     Healthy Food  veg      6

   Food_ID  Name                                      C_Type        Veg_Non  Rating
0  87       roasted spring chicken with root veggies  Healthy Food  non-veg  8
1  247      microwave chicken steak                   Healthy Food  non-veg  5
2  86       roast turkey with cranberry sauce         Healthy Food  non-veg  4
3  86       roast turkey with cranberry sauce         Healthy Food  non-veg  4

   Food_ID  Name                          C_Type    Veg_Non  Rating
0  292      chicken tikka                 Indian    non-veg  8
1  69       banana and maple ice lollies  Dessert   veg      8
2  232      apple and walnut cake         Dessert   veg      8
3  81       fruit infused tea             Beverage  veg      8

   Food_ID  Name                                               C_Type   Veg_Non  Rating
0  142      fish skewers with coriander and red wine vineg...  Thai     non-veg  6
1  185      red wine braised mushroom flatbread                Italian  veg      7
2  85       garlic and pinenut soup with burnt butter essence  French   veg      3
3  85       garlic and pinenut soup with burnt butter essence  French   veg      10

   Food_ID  Name                                               C_Type        Veg_Non  Rating
0  303      red rice                                           Healthy Food  veg      6
1  10       broccoli and almond soup                           Healthy Food  veg      6
2  36       spicy watermelon soup                              Healthy Food  veg      6
3  221      amaranthus granola with lemon yogurt, berries ...  Healthy Food  veg      6

   Food_ID  Name                     C_Type        Veg_Non  Rating
0  270      jalapeno cheese fingers  Mexican       veg      3
1  270      jalapeno cheese fingers  Mexican       veg      5
2  301      brown rice               Healthy Food  veg      1
3  300      black rice               Healthy Food  veg      9

   Food_ID  Name                            C_Type    Veg_Non  Rating
0  93       buldak (hot and spicy chicken)  Japanese  non-veg  7
1  100      spicy chicken curry             Indian    non-veg  3
2  100      spicy chicken curry             Indian    non-veg  4
3  100      spicy chicken curry             Indian    non-veg  1

   Food_ID  Name           C_Type    Veg_Non  Rating
0  83       spiced coffee  Beverage  veg      9
1  84       filter coffee  Beverage  veg      10
2  84       filter coffee  Beverage  veg      10
3  84       filter coffee  Beverage  veg      2

   Food_ID  Name                           C_Type        Veg_Non  Rating
0  162      prawn potato soup              Thai          veg      9
1  79       beetroot and green apple soup  Healthy Food  veg      1
2  302      koldil chicken                 Chinese       non-veg  5
3  298      chicken 65                     Chinese       non-veg  4
Because the dataset is small, some queries return mixed results, especially with the recommendation limit set to 4. Better results depend mainly on how you prepare the text data and on tuning choices such as the query type (hybrid, FTS, or vector search). It is also worth experimenting with different rerankers. For further improvements, see the vector recipes repository for ways to enhance RAG methods, and consult the LanceDB documentation for more details.
LanceDB search docs: https://lancedb.github.io/lancedb/search/
More GenAI projects: https://github.com/lancedb/vectordb-recipes
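One way to compare query types is to run the same query through each of them and inspect the results side by side. The sketch below assumes a LanceDB table that already has an embedding function attached and an FTS index on its text column, and a reranker object such as one from `lancedb.rerankers`. The helpers `recommend` and `compare_query_types` are hypothetical names, and depending on the LanceDB version a vector-only query may also need `query_string` passed to `rerank`.

```python
QUERY_TYPES = ("vector", "fts", "hybrid")


def recommend(table, query, query_type="hybrid", reranker=None, limit=4):
    """Run one search against a LanceDB table and return the results.

    `table` is expected to behave like a LanceDB table: search() returns
    a query builder with rerank(), limit(), and to_pandas().
    """
    search = table.search(query, query_type=query_type)
    if reranker is not None:
        # Rerank the retrieved candidates before truncating to `limit`.
        search = search.rerank(reranker=reranker)
    return search.limit(limit).to_pandas()


def compare_query_types(table, query, reranker=None, limit=4):
    """Return a dict mapping each query type to its results for `query`."""
    return {
        query_type: recommend(table, query, query_type, reranker, limit)
        for query_type in QUERY_TYPES
    }
```

Calling `compare_query_types(table, "spicy chicken", reranker=reranker)` returns one result set per query type, which makes it easier to see which combination suits a given kind of query.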