Product Recommender using Collaborative Filtering and LanceDB
Collaborative filtering is a method to recommend movies by analyzing user preferences. It works by finding patterns in what users like. For example:
- User-based filtering: If two users have similar tastes, movies liked by one can be suggested to the other.
- Item-based filtering: If two movies are often liked together, recommending one suggests the other.
This approach uses past data, like movie ratings, to predict what someone might enjoy.
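As a rough illustration of item-based filtering, the sketch below ranks movies by the cosine similarity of their rating vectors. The ratings, the user names, and the helper names (`item_vector`, `cosine`, `similar_items`) are made up for this example and are not part of the notebook.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4},
    "bob": {"Movie A": 4, "Movie B": 5, "Movie C": 1},
    "carol": {"Movie C": 5},
}


def item_vector(item):
    """Collect every user's rating for one item."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_items(item):
    """Rank all other items by similarity to the given item."""
    items = {i for r in ratings.values() for i in r}
    target = item_vector(item)
    scores = {other: cosine(target, item_vector(other)) for other in items if other != item}
    return sorted(scores, key=scores.get, reverse=True)


print(similar_items("Movie A"))
```

Here "Movie B" ranks ahead of "Movie C" for "Movie A", because the same users rated A and B highly.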

In this example, we’ll use LanceDB and Collaborative Filtering to recommend products based on a user's purchase history. The data comes from the Instacart dataset.
Download dataset from Kaggle
To download the dataset used in this example, you need a Kaggle account. To get your Kaggle API credentials:
Go to Your Profile -> Settings -> Create Token.
This downloads kaggle.json, a file containing your API credentials.
In Google Colab, upload kaggle.json to the session and install the kaggle package.
Install dependencies
This step installs numpy, pandas, scipy, implicit, torch, and lancedb with pip. In the original run, pip reused the preinstalled numpy 1.25.2, pandas 1.5.3, scipy 1.11.4, and torch 2.1.0+cu121, and installed implicit 0.7.2 and lancedb 0.6.1 (with pylance 0.10.1).
Importing libraries
Load the dataset
To download the dataset, you first need to join the instacart-market-basket-analysis competition on Kaggle.
We must now extract the zip files.
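One way to do the extraction with only the standard library is sketched below. The helper name `extract_all` is hypothetical, and the handling of nested archives is an assumption, included in case the downloaded zip contains further zip files.

```python
import zipfile
from pathlib import Path


def extract_all(archive, dest):
    """Extract `archive` into `dest`, then unpack any nested .zip files found there."""
    archive, dest = Path(archive), Path(dest)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    for nested in list(dest.rglob("*.zip")):
        if nested.resolve() == archive.resolve():
            continue  # do not re-extract the outer archive if it lives in dest
        with zipfile.ZipFile(nested) as zf:
            zf.extractall(nested.parent)
    # Return the CSV files that are now available.
    return sorted(p.name for p in dest.rglob("*.csv"))
```

Calling `extract_all("instacart-market-basket-analysis.zip", "data")` would then return the names of the extracted CSV files; the `data` directory name is only an example.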
Now we can move on to loading the dataset. We'll first read the csv files and create dataframes.
Since there isn't a user rating attribute, we'll gather "confidence" data by looking at the frequency of each item purchased by a user, and store this in the data dataframe.
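A minimal sketch of this counting step, using tiny stand-in dataframes, might look like the following. The column names `order_id`, `user_id`, and `product_id` are assumed to match the Instacart CSV files, and the `quantity` column name is a hypothetical choice for the purchase count.

```python
import pandas as pd

# Toy stand-ins for the orders and order-products CSV files.
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [10, 10, 20]})
order_products = pd.DataFrame(
    {"order_id": [1, 1, 2, 3], "product_id": [100, 200, 100, 200]}
)

# Attach the user to every purchased product, then count purchases per pair.
merged = order_products.merge(orders, on="order_id")
data = (
    merged.groupby(["user_id", "product_id"])
    .size()
    .reset_index(name="quantity")  # "quantity" acts as the confidence signal
)
print(data)
```

User 10 bought product 100 in two different orders, so that pair gets a quantity of 2, while every other pair gets 1.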
Data Manipulation
Let's create a couple of test users to examine the recommendations later:
- 1st test user: buys 50 sodas: Zero Calorie Cola
- 2nd test user: buys organic produce: Organic Whole Milk and Organic Blackberries
In the next step, we will extract the unique user and product IDs in order to create a CSR (Compressed Sparse Row) matrix. This will allow us to perform collaborative filtering.
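The sketch below shows one common way to build such a matrix with pandas and SciPy: map each ID to a contiguous index through a categorical type, then pass the (value, (row, column)) triplets to `csr_matrix`. The toy dataframe and the `quantity` column name are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy user-product purchase counts.
data = pd.DataFrame(
    {"user_id": [10, 10, 20], "product_id": [100, 200, 200], "quantity": [2, 1, 1]}
)

# Categorical codes give each unique ID a contiguous 0-based index.
user_cat = data["user_id"].astype("category")
item_cat = data["product_id"].astype("category")

user_items = csr_matrix(
    (
        data["quantity"].to_numpy(dtype=np.float32),
        (user_cat.cat.codes.to_numpy(), item_cat.cat.codes.to_numpy()),
    ),
    shape=(len(user_cat.cat.categories), len(item_cat.cat.categories)),
)

# Keep the category lists to translate matrix indices back to real IDs later.
user_ids = list(user_cat.cat.categories)
product_ids = list(item_cat.cat.categories)
print(user_items.toarray())
```

Rows correspond to users and columns to products, so `user_items[0, 1]` is the number of times user 10 bought product 200.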
Let's now create a recommender model using the implicit library. The recommendation model is based on the algorithms described in the paper Collaborative Filtering for Implicit Feedback Datasets, with performance optimizations described in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering.
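To give a feel for what alternating least squares does with implicit feedback, here is a small dense NumPy sketch of the idea: purchase counts become a binary preference plus a confidence weight, and user and item factors are solved for in turn. This is not the implicit library's implementation (which works on sparse matrices and uses a conjugate gradient solver); the function name `implicit_als` and the default values for `alpha`, `reg`, and `iterations` are assumptions chosen for the toy example.

```python
import numpy as np


def implicit_als(counts, factors=2, alpha=40.0, reg=0.1, iterations=15, seed=0):
    """Toy dense ALS for implicit feedback; returns (user_factors, item_factors)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    pref = (counts > 0).astype(float)  # 1 if the user ever bought the item
    conf = 1.0 + alpha * counts        # confidence grows with purchase count
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)

    def solve(fixed, conf_rows, pref_rows):
        # For each row, solve the weighted, regularised least-squares problem.
        out = np.empty((conf_rows.shape[0], factors))
        for i in range(conf_rows.shape[0]):
            C = np.diag(conf_rows[i])
            A = fixed.T @ C @ fixed + ridge
            b = fixed.T @ C @ pref_rows[i]
            out[i] = np.linalg.solve(A, b)
        return out

    for _ in range(iterations):
        X = solve(Y, conf, pref)      # update users with items fixed
        Y = solve(X, conf.T, pref.T)  # update items with users fixed
    return X, Y


# Two groups of users with disjoint purchase histories.
counts = np.array(
    [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 1, 4], [0, 0, 3, 1]], dtype=float
)
user_factors, item_factors = implicit_als(counts)
scores = user_factors @ item_factors.T
print(np.round(scores, 2))
```

The predicted score for a user-item pair is the dot product of the two factor vectors, so items a user bought should score higher than items from the other group.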
Difference between Collaborative and Content-based filtering

Let's now evaluate the model.
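As a rough guide to what a ranking metric measures, the sketch below computes a textbook precision at k: the share of the top k recommendations that appear in a user's held-out purchases. The function name and the toy inputs are hypothetical, and the implicit library's own metrics may be normalised differently.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items found in the held-out set."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)


# Toy example: 2 of the 4 recommended product IDs were actually purchased.
score = precision_at_k([1, 2, 3, 4], {2, 4, 9}, k=4)
print(score)
```

The evaluation run in this example reported the following metrics: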
{'precision': 0.27477883977578244,
 'map': 0.04505803167409894,
 'ndcg': 0.14491547666623716,
 'auc': 0.6550619166364096}
From the model, we'll be able to retrieve item and user factors, which we can use later on to store in LanceDB as vector embeddings.
array([[-0.01073535,  0.01225309, ...,  0.00185958, -0.00262321],
       [ 0.00545662,  0.007053  , ...,  0.00155425,  0.00754672]], dtype=float32)
array([[ 2.35114765e+00, -9.82077837e-01, ..., -7.60752439e-01, -2.56484568e-01],
       [-2.70342708e-01,  8.88925731e-01, ..., -9.03123736e-01, -3.25637698e-01]], dtype=float32)
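To see why these factors are useful as embeddings, the sketch below scores every item against one user factor with a dot product and keeps the best matches. It is a brute-force NumPy stand-in for what a vector search does, using made-up two-dimensional factors; `top_k_items` is a hypothetical helper.

```python
import numpy as np

# Made-up factors: three items and one user in a 2-dimensional latent space.
item_factors = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]], dtype=np.float32)
user_factor = np.array([0.9, 0.1], dtype=np.float32)


def top_k_items(user_vec, items, k=2):
    """Return the indices and scores of the k items with the highest dot product."""
    scores = items @ user_vec
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()


ids, scores = top_k_items(user_factor, item_factors)
print(ids, scores)
```

Items whose factor vector points in a similar direction to the user's vector score highest, which is the property a vector database query relies on.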
Let's save the data and create an empty LanceDB table using a Pydantic model.
A table is designed to store large numbers of columns and huge quantities of data. For those interested, LanceDB is columnar and uses Lance, an open data format, to store data.
Let's now store our item factors into the table via the vector column of product_entries.
Let's create an ANN index in order to speed up retrieval. This might take a while.
This is a helper method for analysing recommendations later. This method returns top N products that someone bought in the past (based on product quantity).
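A helper along these lines could be sketched with pandas as shown below. The function name `top_products`, the toy dataframes, and the `quantity` and `product_name` column names are assumptions rather than the notebook's exact code.

```python
import pandas as pd


def top_products(data, products, user_id, n=10):
    """Return the n products a user bought most often, with product names attached."""
    bought = data[data["user_id"] == user_id]
    top = bought.nlargest(n, "quantity")
    # An inner merge keeps the order of the left frame, so the ranking survives.
    merged = top.merge(products, on="product_id")
    return merged[["product_id", "product_name", "quantity"]]


# Toy purchase counts and product catalogue.
data = pd.DataFrame(
    {"user_id": [1, 1, 1, 2], "product_id": [100, 200, 300, 100], "quantity": [50, 2, 7, 1]}
)
products = pd.DataFrame(
    {
        "product_id": [100, 200, 300],
        "product_name": ["Zero Calorie Cola", "Organic Whole Milk", "Organic Blackberries"],
    }
)
history = top_products(data, products, user_id=1, n=2)
print(history)
```

Comparing this purchase history with the recommended products gives a quick sanity check on whether the recommendations look plausible for a user.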
Let's retrieve our test users so we can query for recommendations.