

Movie Recommender with Collaborative Filtering

Collaborative filtering is a method to recommend movies by analyzing user preferences. It works by finding patterns in what users like. For example:

User-based filtering: If two users have similar tastes, movies liked by one can be suggested to the other.
Item-based filtering: If two movies are often liked together, a user who likes one can be recommended the other.

This approach uses past data, like movie ratings, to predict what someone might enjoy.
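Here is a minimal sketch of both ideas. It measures user-to-user and movie-to-movie similarity with cosine similarity on a tiny, made-up rating matrix, where rows are users, columns are movies, and 0 means "not rated". Cosine similarity and the toy numbers are illustrative assumptions. The recommender in this notebook compares SVD-based movie embeddings instead.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5.0, 4.5, 0.0, 1.0],   # user 0
    [4.5, 5.0, 0.0, 0.5],   # user 1: taste similar to user 0
    [1.0, 0.0, 5.0, 4.0],   # user 2: different taste
])

def cosine_similarity_rows(m: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    normalized = m / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return normalized @ normalized.T

# User-based: compare rows (users).
user_sim = cosine_similarity_rows(ratings)
# Item-based: compare columns (movies) by transposing the matrix.
item_sim = cosine_similarity_rows(ratings.T)

print(np.round(user_sim, 2))
print(np.round(item_sim, 2))
```

User 0 comes out far more similar to user 1 than to user 2. Movies 0 and 1 are liked by the same users, and they score higher than movies 0 and 2.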

Collaborative Filtering

Collaborative filtering is a key method in recommendation systems for predicting a user's preferences from the preferences of others. It assumes that users who agreed in the past are likely to agree in the future. To uncover these shared patterns, we use Singular Value Decomposition (SVD), a matrix factorization technique that breaks the user-movie rating matrix into latent factors. For our movie recommender, we'll use NumPy's implementation of SVD.
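The sketch below shows what the factorization produces on a made-up matrix of 4 users by 5 movies. It calls NumPy's np.linalg.svd with full_matrices=False. That setting is our choice and may differ from the notebook's. The sketch checks that u, s and vh multiply back to the original matrix. It then keeps only the two largest singular values to form a low-rank approximation.

```python
import numpy as np

# Hypothetical 4-user x 5-movie rating matrix (0 = not rated).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 1],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=np.float64)

# Thin SVD: ratings == u @ diag(s) @ vh, with s sorted in descending order.
u, s, vh = np.linalg.svd(ratings, full_matrices=False)

# The three factors reproduce the matrix (up to floating-point error).
reconstructed = u @ np.diag(s) @ vh

# Keeping only the k largest singular values gives a low-rank approximation
# that captures the dominant taste patterns shared across users.
k = 2
approx = u[:, :k] @ np.diag(s[:k]) @ vh[:k, :]
error = np.linalg.norm(ratings - approx)  # Frobenius norm

print("singular values:", np.round(s, 3))
print("rank-2 approximation error:", round(float(error), 3))
```

By the Eckart-Young theorem, the Frobenius error of this rank-k approximation equals the square root of the sum of the discarded squared singular values. The remaining error therefore shows how much rating structure the dropped components carried.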

This example demonstrates how to build a movie recommender system with SVD, a collaborative filtering technique. It uses the MovieLens Latest Small dataset.

Let's start by installing and importing the required libraries.


Load and unpack the dataset

After unzipping the MovieLens dataset, we load the user ratings into a dataframe named ratings. We don't need the timestamp column, so we drop it. We also convert the remaining columns to the appropriate data types.
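The notebook loads ratings.csv into a dataframe. To keep this sketch self-contained, we use Python's csv module on a few made-up rows that follow the ratings.csv column layout (userId, movieId, rating, timestamp). The sketch drops timestamp, converts the IDs to int, and converts the ratings to float. That type conversion is our assumption about what "appropriate data types" means here.

```python
import csv
import io

# Made-up rows in the MovieLens ratings.csv column layout. In the notebook
# this would be the unzipped file on disk rather than an in-memory string.
sample_csv = """userId,movieId,rating,timestamp
1,10,4.0,1000000000
1,20,3.5,1000000100
2,10,5.0,1000000200
"""

def load_ratings(text: str) -> list[dict]:
    """Parse rating rows, drop the timestamp column, and convert types."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record.pop("timestamp")  # not needed for collaborative filtering
        rows.append({
            "userId": int(record["userId"]),
            "movieId": int(record["movieId"]),
            "rating": float(record["rating"]),
        })
    return rows

ratings = load_ratings(sample_csv)
print(ratings[0])  # {'userId': 1, 'movieId': 10, 'rating': 4.0}
```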


We can now create a matrix of user ratings, where each row represents a user and each column represents a movie.
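The notebook's exact code for this step isn't shown, so here is one way to build the matrix with plain NumPy. MovieLens IDs are not contiguous, so each userId and movieId is first mapped to a row or column index. Leaving unrated cells at 0 is an assumption of this sketch. The triples below are hypothetical.

```python
import numpy as np

# Hypothetical (userId, movieId, rating) triples, e.g. as loaded from ratings.csv.
triples = [(1, 10, 4.0), (1, 20, 3.5), (2, 10, 5.0), (3, 30, 2.0)]

# Map the non-contiguous IDs to dense row/column indices.
user_ids = sorted({u for u, _, _ in triples})
movie_ids = sorted({m for _, m, _ in triples})
user_index = {u: i for i, u in enumerate(user_ids)}
movie_index = {m: j for j, m in enumerate(movie_ids)}

# Rows = users, columns = movies; unrated cells stay at 0.
matrix = np.zeros((len(user_ids), len(movie_ids)), dtype=np.float32)
for u, m, r in triples:
    matrix[user_index[u], movie_index[m]] = r

print(matrix)
```

Keep movie_ids (or movie_index) around: column j of the matrix, and later row j of the movie embeddings, belongs to movie_ids[j].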


Now we apply SVD. NumPy's np.linalg.svd returns three arrays, u, s and vh. Here vh (the transpose of V) has orthonormal rows, the right singular vectors, and describes the columns of the original matrix. Because the columns of vh correspond to the movies, we transpose it so that each row is a movie embedding.
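A small sketch of the transpose step on a hypothetical 3-user by 4-movie matrix. It passes full_matrices=False, which keeps min(users, movies) components. This is our choice, since the notebook's exact call isn't shown. With NumPy's default full_matrices=True, vh would instead be square, with one row and one column per movie.

```python
import numpy as np

# Hypothetical 3-user x 4-movie rating matrix (rows = users, columns = movies).
matrix = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
], dtype=np.float32)

u, s, vh = np.linalg.svd(matrix, full_matrices=False)

# vh has one column per movie; transposing gives one row (embedding) per movie.
movie_embeddings = vh.T

print(vh.shape, "->", movie_embeddings.shape)
```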


array([[-7.0449896e-02, -3.8539346e-02, -1.5912922e-02, ...,
        -6.4683605e-05, -6.4683605e-05, -2.7172931e-04],
       [ 2.7591195e-02,  2.0666271e-03,  2.4714615e-02, ...,
        -5.9758622e-04, -5.9758622e-04, -1.2723620e-03],
       [-7.8443885e-02, -5.6844711e-02, -1.8005114e-02, ...,
         8.7109387e-05,  8.7109387e-05, -1.2283334e-04],
       ...,
       [ 2.0168955e-02,  1.3377998e-02, -1.9083519e-02, ...,
        -1.9121233e-03, -1.9121233e-03, -2.1089672e-03],
       [ 3.5461895e-02, -1.0232525e-01, -1.1956284e-02, ...,
         9.4549108e-04,  9.4549108e-04,  2.5773358e-03],
       [ 6.7939058e-02,  1.2740049e-02, -2.3002084e-02, ...,
        -9.3789837e-05, -9.3789837e-05, -1.9718462e-03]], dtype=float32)


Next, we use the movies dataset to gather the metadata for each movie. We'll create an array, data, that combines this metadata with each movie's embedding.
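A sketch of how such records could be assembled, under several assumptions. The embeddings, titles and IMDb IDs are hypothetical, and the field names title and imdb_url are our own. The vector field name matches LanceDB's default vector column name. The IMDb URL pattern comes from the recommendation output later in this notebook. If imdbId from MovieLens links.csv is parsed as an integer, it loses its leading zeros, so we re-pad it to 7 digits.

```python
import numpy as np

# Hypothetical inputs: row i of movie_embeddings belongs to movie_ids[i]
# (the same order as the rating-matrix columns), plus metadata lookups shaped
# like MovieLens movies.csv (movieId -> title) and links.csv (movieId -> imdbId).
movie_ids = [10, 20]
movie_embeddings = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
titles = {10: "Movie A (2001)", 20: "Movie B (2002)"}
imdb_ids = {10: 12345, 20: 7654321}  # parsed as integers, leading zeros lost

data = []
for row, movie_id in enumerate(movie_ids):
    # Re-pad to 7 digits to rebuild IMDb's "tt" + digits identifier.
    imdb_tt = "tt" + str(imdb_ids[movie_id]).zfill(7)
    data.append({
        "title": titles[movie_id],
        "imdb_url": "https://www.imdb.com/title/" + imdb_tt,
        "vector": movie_embeddings[row].tolist(),  # plain Python floats
    })

print(data[0]["imdb_url"])  # https://www.imdb.com/title/tt0012345
```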


Let's look at some of the data.


Connect to LanceDB

We can now connect to LanceDB by pointing it at a local path where our vector database will be stored. We'll also create a new table, movie_set, from the data we just created.


Get the Recommendations

Finally, we create a function that takes a movie title and returns the 5 most similar movies. It searches our vector store for the embeddings closest to that movie's embedding, which gives us a dataframe of the most similar movies. For a little extra flair, we also read out and display the link for each movie.
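In the notebook, the nearest-neighbour search is done by LanceDB, through a query of the form table.search(vector).limit(5). The sketch below reproduces the same idea with a brute-force NumPy search so that it runs without a database. It assumes L2 distance, which is LanceDB's default metric, and hypothetical records with title, imdb_url and vector fields. Excluding the query movie from its own results is our choice.

```python
import numpy as np

# Hypothetical records with title, imdb_url and vector fields.
data = [
    {"title": "Movie A (2001)", "imdb_url": "https://www.imdb.com/title/tt0000001", "vector": [1.0, 0.0]},
    {"title": "Movie B (2002)", "imdb_url": "https://www.imdb.com/title/tt0000002", "vector": [0.9, 0.1]},
    {"title": "Movie C (2003)", "imdb_url": "https://www.imdb.com/title/tt0000003", "vector": [0.0, 1.0]},
    {"title": "Movie D (2004)", "imdb_url": "https://www.imdb.com/title/tt0000004", "vector": [0.1, 0.9]},
]

def recommend(title: str, records: list[dict], k: int = 5) -> list[list[str]]:
    """Return [title, link] pairs for the k movies nearest to the given title."""
    matches = [i for i, r in enumerate(records) if r["title"] == title]
    if not matches:
        raise ValueError(f"unknown title: {title!r}")
    query_idx = matches[0]

    vectors = np.array([r["vector"] for r in records], dtype=np.float32)
    # Squared L2 distance from the query embedding to every movie
    # (squaring does not change the ranking).
    distances = np.sum((vectors - vectors[query_idx]) ** 2, axis=1)
    order = np.argsort(distances)
    # Skip the query movie itself, then keep the k closest.
    neighbours = [int(i) for i in order if i != query_idx][:k]
    return [[records[i]["title"], records[i]["imdb_url"]] for i in neighbours]

print(recommend("Movie A (2001)", data, k=2))
```

The return value has the same shape as the [title, link] lists printed in the notebook's test runs.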


Let's test it out!


[['Moana (2016)', 'https://www.imdb.com/title/tt3521164'],
 ['The Boss Baby (2017)', 'https://www.imdb.com/title/tt3874544'],
 ['The Book of Life (2014)', 'https://www.imdb.com/title/tt2262227'],
 ['Kubo and the Two Strings (2016)', 'https://www.imdb.com/title/tt4302938'],
 ['Bad Moms (2016)', 'https://www.imdb.com/title/tt4651520']]


[['Rogue One: A Star Wars Story (2016)',
  'https://www.imdb.com/title/tt3748528'],
 ['Wonder Woman (2017)', 'https://www.imdb.com/title/tt0451279'],
 ['Miss Sloane (2016)', 'https://www.imdb.com/title/tt4540710'],
 ['Passengers (2016)', 'https://www.imdb.com/title/tt1355644'],
 ['Man of Steel (2013)', 'https://www.imdb.com/title/tt0770828']]

Get Recommendations for a Movie of Your Choice


Tada! Your first movie recommendation system is live.

Of course, the recommendations won't be perfectly accurate. There are other ways to improve them. For example, you can reduce the dimensionality of the embeddings by keeping only the largest singular values and their vectors, or filter out users and movies with few ratings. Still, this is a good start to building a movie recommender system.
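Both improvements can be sketched in a few lines. This is one possible approach, not the notebook's code. It assumes 0 marks a missing rating, drops movies with fewer than min_ratings ratings, and keeps only the top k right singular vectors as embeddings. The matrix and the values of min_ratings and k are hypothetical. Users can be filtered the same way by counting along axis=1.

```python
import numpy as np

# Hypothetical rating matrix: rows = users, columns = movies, 0 = not rated.
matrix = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [1, 0, 0, 4, 5],
    [0, 1, 3, 5, 4],
], dtype=np.float32)

min_ratings = 2  # hypothetical threshold: drop movies rated fewer times
counts = np.count_nonzero(matrix, axis=0)  # number of ratings per movie
keep = counts >= min_ratings
filtered = matrix[:, keep]

u, s, vh = np.linalg.svd(filtered, full_matrices=False)

k = 2  # hypothetical number of latent dimensions to keep
movie_embeddings = vh[:k].T  # one k-dimensional row per remaining movie

print(keep, movie_embeddings.shape)
```

A smaller k discards the weakest components, which tend to carry noise rather than shared taste patterns. The right value is something to tune on held-out ratings.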