
Movie Recommendation System using Doc2vec Embeddings and Vector DB

This Colab notebook illustrates the process of building a recommendation system using embeddings and a vector DB.

This approach combines a movie's genres and other characteristics into a single document and learns a Doc2Vec embedding from it, giving a comprehensive representation of the movie's content.

These embeddings serve dual purposes: they can either be fed directly into a classification model for genre classification or stored in a vector DB. Storing the embeddings in a vector DB enables efficient retrieval and similarity search for recommendations at a later stage.

Installing the relevant dependencies

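The original install cell's contents are not included in this export; based on the steps that follow, a plausible set of dependencies would be (package list assumed, not from the original):

```shell
pip install gensim lancedb kaggle pandas numpy scikit-learn tqdm
```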

Kaggle Configuration and Data Needs

We use a movies metadata dataset hosted on Kaggle. To download it for our recommendation system, we need a kaggle.json file containing our API credentials.

You can download the kaggle.json file from your Kaggle account settings by following these steps:

  1. Go to Kaggle and log in to your account.
  2. Click on your profile picture in the top right corner of the page and select Account from the dropdown menu.
  3. Scroll down to the API section and click Create New API Token. This downloads a file named kaggle.json to your computer.

Once you have the kaggle.json file, upload it to the Colab file space. Then run the following code to set up the credentials and download the dataset into the data directory.
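The setup cell itself is not included in this export; a minimal sketch of the credential step, assuming the standard ~/.kaggle/kaggle.json location the Kaggle CLI expects (the dataset slug below is a placeholder, not the real one):

```python
import json
from pathlib import Path

def install_kaggle_credentials(creds: dict, kaggle_dir: Path) -> Path:
    """Write kaggle.json where the Kaggle CLI looks for it and restrict permissions."""
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    token_path = kaggle_dir / "kaggle.json"
    token_path.write_text(json.dumps(creds))
    token_path.chmod(0o600)  # the Kaggle CLI warns on world-readable tokens
    return token_path

# In Colab, after uploading kaggle.json:
#   creds = json.loads(Path("kaggle.json").read_text())
#   install_kaggle_credentials(creds, Path.home() / ".kaggle")
# then download the dataset into ./data (slug is a placeholder):
#   !kaggle datasets download -d <owner>/<dataset-slug> -p data --unzip
```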

(Output truncated: tqdm progress bars for embedding preparation, followed by Doc2Vec training logs — vocabulary built over 44,506 documents, then 20 training epochs at roughly 20,000 documents per second.)

Training a Neural Network for the Genre Classification Task

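The notebook's model code is not included in this export; as a stand-in, here is a small multi-label genre classifier over the embedding vectors using scikit-learn's MLPClassifier (the architecture, label count, and data below are illustrative assumptions, not the original's):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                 # stand-in for Doc2Vec embeddings
Y = (rng.random((200, 3)) < 0.4).astype(int)   # multi-hot labels for 3 genres

# Passing a multi-hot target matrix makes MLPClassifier train a multi-label network.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, Y)
pred = clf.predict(X[:5])                      # one multi-hot row per movie
```

A multi-hot target (rather than a single class index) is the natural encoding here, since a movie can belong to several genres at once.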

Testing the model to check whether it can predict the genres of movies from the test dataset

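The evaluation cell is not included in this export; conceptually it runs the trained classifier on held-out embeddings and maps the multi-hot predictions back to genre names. A hedged sketch of that decoding step (the genre list and threshold are illustrative):

```python
import numpy as np

GENRES = ["Action", "Comedy", "Drama"]  # illustrative label order

def decode_genres(scores, threshold=0.5):
    """Turn a per-genre probability/score vector into genre names."""
    return [g for g, s in zip(GENRES, scores) if s >= threshold]

predicted = decode_genres(np.array([0.91, 0.12, 0.64]))  # ["Action", "Drama"]
```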

Storing the Doc2Vec Embeddings in a LanceDB Vector Database


D-Day: Let's generate some recommendations
