
Building a Simple Movie Recommender with LanceDB Cloud

Credentials

Copy the project name and the API key from your project page. Both are used later to connect to LanceDB Cloud.


You can also set LANCEDB_API_KEY as an environment variable, using one of the options below.
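A minimal sketch of the environment-variable route, for illustration only. The helper name `get_api_key` is hypothetical (it is not part of LanceDB), and `<your-api-key>` is a placeholder for the key copied from the project page.

```python
import os


def get_api_key(var_name: str = "LANCEDB_API_KEY") -> str:
    """Return the API key stored in the environment, or fail with a clear message."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or assign it in os.environ first")
    return key


# Option 1: set the variable in the shell before starting Python or Jupyter:
#   export LANCEDB_API_KEY="<your-api-key>"
# Option 2: set it from inside the running process (placeholder value):
#   os.environ["LANCEDB_API_KEY"] = "<your-api-key>"
```

Reading the key through a small helper keeps the secret out of the notebook itself and fails early if the variable is missing.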


Download the raw data

Download and unzip the dataset from MovieLens. The example uses the 100k small dataset.
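The download step can be sketched in pure Python as follows. The URL is the location where GroupLens is assumed to host the small dataset, and `download_and_extract` is a hypothetical helper name, not something taken from the original notebook.

```python
import urllib.request
import zipfile
from pathlib import Path

# Assumed download location for the small MovieLens dataset.
MOVIELENS_URL = "https://files.grouplens.org/datasets/movielens/ml-latest-small.zip"


def download_and_extract(url: str, dest_dir: str = ".") -> Path:
    """Download a zip archive into dest_dir and extract it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive_path = dest / Path(url).name
    urllib.request.urlretrieve(url, archive_path)
    # extractall overwrites existing files without prompting.
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest)
    return archive_path


# Usage (performs a network request):
#   download_and_extract(MOVIELENS_URL)
```

Because extraction never prompts, the cell can be re-run safely when the files already exist.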


Let's start by installing and importing the libraries.


Load and unpack the dataset

After unzipping the MovieLens dataset, we load the user ratings into a dataframe named ratings. We don't need the timestamp, so we drop that column, and we convert the remaining columns to suitable numeric types.
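A sketch of the loading step, assuming the standard MovieLens column names (`userId`, `movieId`, `rating`, `timestamp`). The chosen dtypes and the helper name `load_ratings` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def load_ratings(path: str = "ml-latest-small/ratings.csv") -> pd.DataFrame:
    """Read the ratings file, drop the timestamp, and cast columns to compact numeric types."""
    ratings = pd.read_csv(path)
    ratings = ratings.drop(columns=["timestamp"])
    return ratings.astype(
        {"userId": np.int32, "movieId": np.int32, "rating": np.float32}
    )
```

Using 32-bit types is a choice made here to keep the later rating matrix small; 64-bit defaults would work as well.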


We can now create a matrix of user ratings, where each row represents a user and each column represents a movie.
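One way to build such a matrix is a pandas pivot, sketched below on a toy dataframe. Filling unrated cells with 0 is an assumption made here so that the matrix is dense; `build_rating_matrix` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def build_rating_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format ratings into a users x movies matrix; unrated cells become 0."""
    matrix = ratings.pivot(index="userId", columns="movieId", values="rating")
    return matrix.fillna(0).astype(np.float32)


# Toy data: user 1 rated movies 10 and 20, user 2 rated only movie 20.
toy = pd.DataFrame(
    {"userId": [1, 1, 2], "movieId": [10, 20, 20], "rating": [4.0, 3.0, 5.0]}
)
print(build_rating_matrix(toy))
```

The column labels of the result are the movie IDs, which is what later ties each embedding back to its movie.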


Now we apply the singular value decomposition (SVD). vh is the matrix of right singular vectors, which describes the relationships between the columns of the original matrix. Because the columns of vh correspond to the movies, we can transpose it to get one movie embedding per row.

[10]
array([[-7.0449896e-02, -3.8539346e-02, -1.5912922e-02, ...,
        -6.4683605e-05, -6.4683605e-05, -2.7172931e-04],
       [ 2.7591195e-02,  2.0666271e-03,  2.4714615e-02, ...,
        -5.9758622e-04, -5.9758622e-04, -1.2723620e-03],
       [-7.8443885e-02, -5.6844711e-02, -1.8005114e-02, ...,
         8.7109387e-05,  8.7109387e-05, -1.2283334e-04],
       ...,
       [ 2.0168955e-02,  1.3377998e-02, -1.9083519e-02, ...,
        -1.9121233e-03, -1.9121233e-03, -2.1089672e-03],
       [ 3.5461895e-02, -1.0232525e-01, -1.1956284e-02, ...,
         9.4549108e-04,  9.4549108e-04,  2.5773358e-03],
       [ 6.7939058e-02,  1.2740049e-02, -2.3002084e-02, ...,
        -9.3789837e-05, -9.3789837e-05, -1.9718462e-03]], dtype=float32)
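The SVD step described above can be sketched with NumPy on a tiny toy matrix (not the MovieLens data). Using the reduced SVD and a plain transpose is an assumption about how the embeddings are obtained; `movie_embeddings` is a hypothetical helper name.

```python
import numpy as np


def movie_embeddings(rating_matrix: np.ndarray) -> np.ndarray:
    """Return one embedding row per movie from the SVD of a users x movies matrix."""
    # full_matrices=False gives the reduced SVD: vh has shape (k, n_movies),
    # where k = min(n_users, n_movies).
    u, s, vh = np.linalg.svd(rating_matrix, full_matrices=False)
    # Columns of vh correspond to movies, so transpose to put movies on rows.
    return vh.T


# Toy matrix: 2 users (rows) x 3 movies (columns).
toy = np.array([[4, 0, 5], [0, 3, 5]], dtype=np.float32)
vectors = movie_embeddings(toy)
print(vectors.shape)  # (3, 2): 3 movies, 2-dimensional embeddings
```

Movies that are rated similarly by the same users end up with nearby rows, which is what the vector search relies on later.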

Now, let's take the movies dataset to gather the metadata for each movie. We'll create an array data that contains all this information.
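A sketch of how the records might be assembled, assuming the MovieLens `movies.csv` columns (`movieId`, `title`, `genres`, with genres separated by `|`). The record field names (`id`, `title`, `genres`, `vector`) and the helper `build_records` are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def build_records(movie_ids, vectors: np.ndarray, movies: pd.DataFrame) -> list:
    """Pair each embedding row with the title and genres of the matching movie."""
    meta = movies.set_index("movieId")
    records = []
    # movie_ids must be in the same order as the rows of vectors,
    # i.e. the column order of the rating matrix.
    for movie_id, vector in zip(movie_ids, vectors):
        if movie_id not in meta.index:
            continue  # skip rated movies that have no metadata row
        row = meta.loc[movie_id]
        records.append(
            {
                "id": int(movie_id),
                "title": row["title"],
                "genres": row["genres"].split("|"),
                "vector": vector.tolist(),
            }
        )
    return records
```

A list of plain dictionaries with a `vector` field is a convenient shape for loading into a vector table.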


Let's see some data


Connect to LanceDB Cloud

We can now connect to LanceDB Cloud with the API key acquired from the dashboard. We'll also create a new table, movie_set, from the data we just created.
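A sketch of the connection step using the `lancedb` Python client. The wrapper function `create_movie_table` is hypothetical, and the default region is an assumption; use the region shown for your own project.

```python
def create_movie_table(project_name: str, api_key: str, data: list, region: str = "us-east-1"):
    """Connect to LanceDB Cloud and create the movie_set table from a list of records."""
    import lancedb  # third-party client: pip install lancedb

    # Cloud databases are addressed with a db:// URI built from the project name.
    db = lancedb.connect(uri=f"db://{project_name}", api_key=api_key, region=region)
    # Each record is expected to carry a "vector" field next to its metadata.
    return db.create_table("movie_set", data=data)
```

The returned table object is what the recommendation function queries later. If a table named movie_set already exists, the call is expected to fail, so the table may need to be dropped before re-running.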


Get the Recommendations

Finally, we can create a function that takes a movie title and returns the top 5 similar movies. Searching the vector store with that movie's embedding returns a dataframe of the most similar movies. For some extra flair, we also look up and display the IMDb link of each movie.
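A sketch of such a function. It assumes `table` is the LanceDB table created earlier, `data` is the list of records with `id`, `title`, and `vector` fields, and `links` is the MovieLens `links.csv` loaded with its `movieId` and `imdbId` columns. Dropping the query movie from its own results is a choice made here, not necessarily what the original notebook does.

```python
import pandas as pd

IMDB_URL = "https://www.imdb.com/title/tt"


def get_recommendations(title: str, table, data: list, links: pd.DataFrame, limit: int = 5) -> list:
    """Return [title, IMDb URL] pairs for the movies nearest to the given title."""
    # Look up the embedding of the query movie among the uploaded records.
    query = next((rec["vector"] for rec in data if rec["title"] == title), None)
    if query is None:
        raise ValueError(f"unknown title: {title!r}")
    # The query movie is normally its own nearest neighbour, so fetch one extra hit.
    hits = table.search(query).limit(limit + 1).to_pandas()
    hits = hits[hits["title"] != title].head(limit)
    imdb_ids = links.set_index("movieId")["imdbId"]
    # IMDb IDs lose their leading zeros when read as integers, so pad to 7 digits.
    return [
        [row["title"], f"{IMDB_URL}{int(imdb_ids.loc[row['id']]):07d}"]
        for _, row in hits.iterrows()
    ]
```

A call such as `get_recommendations(some_title, table, data, links)` then yields a list of title and link pairs in order of similarity.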


Let's test it out!

[ ]
[['Moana (2016)', 'https://www.imdb.com/title/tt3521164'],
 ['The Boss Baby (2017)', 'https://www.imdb.com/title/tt3874544'],
 ['The Book of Life (2014)', 'https://www.imdb.com/title/tt2262227'],
 ['Kubo and the Two Strings (2016)', 'https://www.imdb.com/title/tt4302938'],
 ['Bad Moms (2016)', 'https://www.imdb.com/title/tt4651520']]
[ ]
[['Rogue One: A Star Wars Story (2016)',
  'https://www.imdb.com/title/tt3748528'],
 ['Wonder Woman (2017)', 'https://www.imdb.com/title/tt0451279'],
 ['Miss Sloane (2016)', 'https://www.imdb.com/title/tt4540710'],
 ['Passengers (2016)', 'https://www.imdb.com/title/tt1355644'],
 ['Man of Steel (2013)', 'https://www.imdb.com/title/tt0770828']]

Tada! Your first movie recommendation system is live.

Of course, the recommendations won't be completely accurate. There are ways to improve them, such as reducing the dimensionality of the original data or filtering out users and movies with few ratings. Still, this is a good start toward building a movie recommender system.
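As an illustration of the filtering idea, the sketch below drops ratings that involve users or movies with too few ratings before the matrix is built. The thresholds and the helper name `filter_sparse` are arbitrary assumptions, and the filter is applied in a single pass rather than repeated until the counts stabilise.

```python
import pandas as pd


def filter_sparse(
    ratings: pd.DataFrame, min_user_ratings: int = 20, min_movie_ratings: int = 10
) -> pd.DataFrame:
    """Keep only ratings from users and for movies that meet a minimum rating count."""
    # transform("count") attaches each group's size to every row of that group.
    user_counts = ratings.groupby("userId")["rating"].transform("count")
    movie_counts = ratings.groupby("movieId")["rating"].transform("count")
    keep = (user_counts >= min_user_ratings) & (movie_counts >= min_movie_ratings)
    return ratings[keep]
```

The filtered dataframe can then go through the same pivot and SVD steps as before.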