Movie Recommendation with Doc2Vec and LanceDB
Movie Recommendation System using Doc2Vec Embeddings and Vector DB
This Colab notebook shows how to build a movie recommendation system using embeddings and a vector database.
The approach combines each movie's genres and other characteristics into a single text document and trains Doc2Vec on those documents, producing one embedding per movie that summarizes its content.
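A minimal sketch of the combining step. It assumes the genres arrive as a stringified list of dicts such as "[{'id': 16, 'name': 'Animation'}]" and that a free-text field (for example an overview) is also available. The function name `movie_to_tokens` and both field layouts are illustrative assumptions, not taken from the notebook.

```python
import ast
import re


def movie_to_tokens(genres_field: str, text_field: str = "") -> list[str]:
    """Merge genre names and free text into one lowercase token list."""
    try:
        parsed = ast.literal_eval(genres_field) if genres_field else []
    except (ValueError, SyntaxError):
        parsed = []  # malformed cell: fall back to no genres
    if not isinstance(parsed, list):
        parsed = []  # assumed layout is a list of dicts; ignore anything else

    # Keep multi-word genres as a single token, e.g. "science_fiction".
    genres = [
        str(g["name"]).lower().replace(" ", "_")
        for g in parsed
        if isinstance(g, dict) and "name" in g
    ]
    # Split the free text into lowercase alphanumeric words.
    words = re.findall(r"[a-z0-9]+", text_field.lower())
    return genres + words
```

Each resulting token list is one "document" for Doc2Vec, so movies that share genres and vocabulary tend to end up with nearby vectors.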
The embeddings serve two purposes: they can be fed directly into a classification model for genre classification, or stored in a vector database. Storing them in a vector database makes it possible to retrieve similar movies efficiently at query time, which is what drives the recommendations.
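The storage and retrieval path could look roughly like the sketch below, which uses the LanceDB Python client. The helper names, the table name `movies`, the `title` column, and the `data/lancedb` location are assumptions made for illustration; `vector` is the column name LanceDB searches by default.

```python
from typing import Sequence


def build_records(titles: Sequence[str], vectors: Sequence[Sequence[float]]) -> list[dict]:
    """Pair each title with its embedding in the row format LanceDB accepts."""
    if len(titles) != len(vectors):
        raise ValueError("titles and vectors must have the same length")
    return [
        {"title": title, "vector": [float(x) for x in vector]}
        for title, vector in zip(titles, vectors)
    ]


def store_and_search(records, query_vector, k=5, uri="data/lancedb", table_name="movies"):
    """Write the records to a LanceDB table and return the k nearest rows."""
    import lancedb  # third-party: pip install lancedb

    db = lancedb.connect(uri)
    # mode="overwrite" replaces any table left over from a previous run.
    table = db.create_table(table_name, data=records, mode="overwrite")
    # Each returned dict carries the stored columns plus a "_distance" field.
    return table.search([float(x) for x in query_vector]).limit(k).to_list()
```

To recommend movies similar to one the user liked, pass that movie's embedding as `query_vector`; the closest rows (excluding the movie itself) are the candidates.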
Installing the relevant dependencies
Kaggle Configuration and Data Needs
We use a movie metadata dataset hosted on Kaggle. To download it for our recommendation system, we need a kaggle.json file containing our API credentials.
You can download the kaggle.json file from your Kaggle account settings by following these steps:
- Go to Kaggle and log in to your account.
- Open your account settings: click your profile picture in the top right corner of the page, then select Account from the dropdown menu.
- Scroll down to the API section and click Create New API Token. This downloads a file named kaggle.json to your computer.
Once you have the kaggle.json file, upload it to the Colab file storage. Then run the following code to set up the credentials and download the dataset into the data directory.
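A minimal sketch of this setup step, assuming the official `kaggle` command-line tool is installed and that kaggle.json sits in the current working directory. The function names are hypothetical, and the dataset slug is deliberately left as a parameter because the text does not name it.

```python
import os
import shutil
import subprocess
from pathlib import Path


def install_kaggle_credentials(src="kaggle.json", config_dir=None) -> Path:
    """Copy kaggle.json to the directory the Kaggle CLI reads and restrict access."""
    target_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copy(src, target)
    os.chmod(target, 0o600)  # the CLI warns if the token is readable by other users
    return target


def download_dataset(dataset_slug: str, out_dir="data") -> None:
    """Download and unzip a Kaggle dataset with the `kaggle` CLI."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", dataset_slug, "-p", str(out_dir), "--unzip"],
        check=True,  # raise if the download fails, e.g. because of bad credentials
    )
```

Call `install_kaggle_credentials()` once after uploading the file, then `download_dataset("<owner>/<dataset-name>")` with the slug shown on the dataset's Kaggle page.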
Output (progress bars condensed): a series of batches of 1,000 items followed by a final batch of 466, then "Building Vocabulary" over 44,506 items, then training from Epoch 1 through Epoch 20 over 44,506 items each.
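The training log above shows a vocabulary pass followed by 20 epochs. A training step with that shape could be sketched with gensim (version 4 or later) as below. The function name and the `vector_size` and `min_count` values are illustrative assumptions; only the epoch count of 20 comes from the log.

```python
def train_doc2vec(token_lists, vector_size=100, min_count=2, epochs=20):
    """Train Doc2Vec on tokenized movies and return the fitted model."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument  # pip install gensim

    # Tag every document with its row index so its vector can be looked up later.
    corpus = [
        TaggedDocument(words=tokens, tags=[str(i)])
        for i, tokens in enumerate(token_lists)
    ]

    model = Doc2Vec(vector_size=vector_size, min_count=min_count, epochs=epochs)
    model.build_vocab(corpus)  # vocabulary pass over the corpus
    # One train() call runs all epochs and lets gensim decay the learning rate across them.
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model
```

After training, `model.dv["0"]` returns the learned vector for the first movie, and `model.infer_vector(tokens)` embeds a token list the model has not seen.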