Building Your Own Spotify Wrapped

In this notebook, we will build a custom version of Spotify Wrapped (top artists, songs, and trends over the year) from our downloadable personal Spotify history.

You can request your data from Spotify via this link. Make sure to select the extended data option. This process can take up to a month, so expect to wait a few weeks before your JSON files are generated and sent to you. You can then add these files to the data folder to run the indexing process and build your own dashboard.

Alternatively, you can test the notebook with the mini sample data provided.

Exploring Spotify Streaming Data

Once the data has been exported, we can take a look at the stats. Spotify provides some metadata that helps explain the format.

Let's do a quick test to view our data, selecting only certain columns to protect personal data:
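
As a rough illustration of such a quick look, here is a minimal sketch that loads one export file and keeps only a handful of non-sensitive columns. The field names (`ts`, `master_metadata_track_name`, and so on) are assumptions based on the usual shape of Spotify's extended history export, and `preview_streams` is a hypothetical helper, not the notebook's original cell.

```python
import json
from pathlib import Path

# Assumed field names from the extended streaming history export.
# Fields such as IP address or user name are deliberately left out.
PUBLIC_COLUMNS = [
    "ts",
    "master_metadata_track_name",
    "master_metadata_album_artist_name",
    "master_metadata_album_album_name",
    "ms_played",
]


def preview_streams(path, columns=PUBLIC_COLUMNS, limit=5):
    """Return the first `limit` records, keeping only the listed columns."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [{col: rec.get(col) for col in columns} for rec in records[:limit]]
```

Calling `preview_streams` on one of the JSON files in the data folder returns a small list of dictionaries that is safe to print or share.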

Connecting to your Elastic cluster
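
A connection could be set up roughly like this. This is a sketch under assumptions: the environment variable names (`ELASTIC_API_KEY`, `ELASTIC_CLOUD_ID`, `ELASTIC_URL`) and both helper functions are hypothetical, and your cluster may need different settings.

```python
import os


def connection_settings(env=os.environ):
    """Build keyword arguments for elasticsearch.Elasticsearch.

    The environment variable names are assumptions made for this sketch.
    """
    settings = {"api_key": env["ELASTIC_API_KEY"]}
    if env.get("ELASTIC_CLOUD_ID"):
        settings["cloud_id"] = env["ELASTIC_CLOUD_ID"]
    else:
        settings["hosts"] = [env.get("ELASTIC_URL", "http://localhost:9200")]
    return settings


def connect(env=os.environ):
    """Create a client and check that the cluster answers."""
    # Imported lazily so connection_settings works without the client installed.
    from elasticsearch import Elasticsearch

    client = Elasticsearch(**connection_settings(env))
    client.info()  # raises if the cluster is unreachable or rejects the key
    return client
```

Keeping the API key in the environment rather than in the notebook avoids committing credentials by accident.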

Adding the documents into an Elasticsearch index

Once your data is available, you can add your documents to a local folder. In my example, I put the JSON files for my 5 years of data history into the data folder. For this demo notebook, I have also added a simplified sample of my streaming data, with some fields hidden for data privacy, that can be used as an example to run the following cells.

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'spotify-history-eli'})
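
A response like the one above is what an index creation call returns. A minimal sketch that can safely be re-run, assuming `client` is an already connected Elasticsearch Python client (`ensure_index` is a hypothetical helper):

```python
INDEX_NAME = "spotify-history-eli"  # index name taken from the response above


def ensure_index(client, index=INDEX_NAME):
    """Create the index if it is missing; return True when it was created."""
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index)
    return True
```

Checking for the index first means re-running the cell does not fail with an "index already exists" error.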

We can now open these files with a JSON reader and generate documents for our Elasticsearch index directly from them.
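
One way to do this is a generator that feeds the bulk helper of the Python client. The sketch below assumes each export file holds a JSON array of records and that the files sit in a folder named `data`; `generate_docs` and `index_folder` are hypothetical names.

```python
import json
from pathlib import Path


def generate_docs(folder, index):
    """Yield one bulk action per streaming record found in folder/*.json."""
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for record in json.load(handle):
                yield {"_index": index, "_source": record}


def index_folder(client, folder="data", index="spotify-history-eli"):
    """Send every record to Elasticsearch; returns (success_count, errors)."""
    # Imported lazily so generate_docs can be tried without the client installed.
    from elasticsearch import helpers

    return helpers.bulk(client, generate_docs(folder, index))
```

Because `generate_docs` is lazy, only one file is held in memory at a time, which helps when several years of history are indexed in one go.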

Once the data is added to Elasticsearch, a mapping is automatically generated. The data from Spotify is already high quality, so the default mapping is accurate enough that we don't need to define it manually. The main detail to pay attention to is that text fields like artist name are also indexed as a keyword subfield, which enables us to run more complex aggregations in the following steps.
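
With default dynamic mapping, a string field is typically mapped as `text` with a `keyword` subfield. To check which fields can be aggregated on, a small helper like this hypothetical one can list them from the mapping response:

```python
def keyword_fields(mapping_response, index):
    """List the '<field>.keyword' subfields found in an index mapping."""
    props = mapping_response[index]["mappings"]["properties"]
    return sorted(
        f"{name}.keyword"
        for name, spec in props.items()
        if spec.get("fields", {}).get("keyword", {}).get("type") == "keyword"
    )
```

Assuming a connected `client`, it could be called as `keyword_fields(client.indices.get_mapping(index="spotify-history-eli"), "spotify-history-eli")`.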

Here's what it will look like on the Elastic side:

We can now run queries on our data

We get back 5653 results, here are the first ones:
My Love Will Never Die
Angel Of Small Death & The Codeine Scene
Someone New - Live
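
A result like this could come from a simple match query on the artist name. The sketch below is an assumption about the query shape, not the original cell: the field names are the usual ones from the Spotify export, and `search_artist` is a hypothetical helper that expects a connected client.

```python
ARTIST_FIELD = "master_metadata_album_artist_name"  # assumed field name
TRACK_FIELD = "master_metadata_track_name"  # assumed field name


def search_artist(client, index, artist, size=3):
    """Return (total hit count, first track names) for one artist."""
    resp = client.search(
        index=index,
        query={"match": {ARTIST_FIELD: artist}},
        size=size,
        track_total_hits=True,  # exact count even beyond 10,000 hits
    )
    total = resp["hits"]["total"]["value"]
    tracks = [hit["_source"].get(TRACK_FIELD) for hit in resp["hits"]["hits"]]
    return total, tracks
```

The returned total and track list can then be printed in the same style as the output above.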

My top artists of all time

{'key': 'Hozier', 'doc_count': 5653}
{'key': 'Ariana Grande', 'doc_count': 1543}
{'key': 'Billie Eilish', 'doc_count': 1226}
{'key': 'Halsey', 'doc_count': 1076}
{'key': 'Taylor Swift', 'doc_count': 650}
{'key': 'Cardi B', 'doc_count': 547}
{'key': 'Beyoncé', 'doc_count': 525}
{'key': 'Avril Lavigne', 'doc_count': 469}
{'key': 'BLACKPINK', 'doc_count': 413}
{'key': 'Paramore', 'doc_count': 397}
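
Buckets like these are what a terms aggregation on the artist keyword subfield returns. A minimal sketch, assuming a connected `client` and the field name below (`top_artists` is a hypothetical helper):

```python
ARTIST_KEYWORD = "master_metadata_album_artist_name.keyword"  # assumed field name


def top_artists(client, index, size=10):
    """Return the most played artists as aggregation buckets."""
    resp = client.search(
        index=index,
        size=0,  # only the aggregation is needed, not the matching documents
        aggs={"top_artists": {"terms": {"field": ARTIST_KEYWORD, "size": size}}},
    )
    return resp["aggregations"]["top_artists"]["buckets"]
```

Each bucket is a dictionary with a `key` (the artist) and a `doc_count` (the number of plays), ordered by `doc_count` by default.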

Artists of 2024 by number of times played

Linkin Park played 271 times.
Hozier played 268 times.
Dua Lipa played 112 times.
Taylor Swift played 106 times.
Måneskin played 61 times.
Avril Lavigne played 55 times.
Evanescence played 40 times.
Paramore played 35 times.
The Pretty Reckless played 34 times.
Green Day played 33 times.
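
To restrict the ranking to a single year, the same aggregation can be combined with a date range filter. This sketch assumes the timestamp field is called `ts` and was mapped as a date; the helper names are hypothetical.

```python
ARTIST_KEYWORD = "master_metadata_album_artist_name.keyword"  # assumed field name


def year_range(year):
    """Range query covering one calendar year of the assumed `ts` field."""
    return {
        "range": {
            "ts": {
                "gte": f"{year}-01-01T00:00:00Z",
                "lt": f"{year + 1}-01-01T00:00:00Z",
            }
        }
    }


def top_artists_of_year(client, index, year, size=10):
    """Return one summary line per artist, most played first."""
    resp = client.search(
        index=index,
        size=0,
        query=year_range(year),
        aggs={"artists": {"terms": {"field": ARTIST_KEYWORD, "size": size}}},
    )
    buckets = resp["aggregations"]["artists"]["buckets"]
    return [f"{b['key']} played {b['doc_count']} times." for b in buckets]
```

Using an exclusive upper bound (`lt` the next January 1st) avoids having to spell out the last millisecond of the year.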

Top artists by amount of time played

Hozier played 268 times; for a total of 13.69 hours
Linkin Park played 271 times; for a total of 12.21 hours
Dua Lipa played 112 times; for a total of 4.43 hours
Taylor Swift played 106 times; for a total of 4.39 hours
Måneskin played 61 times; for a total of 2.48 hours
Avril Lavigne played 55 times; for a total of 2.05 hours
Evanescence played 40 times; for a total of 1.96 hours
Adele played 27 times; for a total of 1.6 hours
Billie Eilish played 32 times; for a total of 1.57 hours
Green Day played 33 times; for a total of 1.5 hours
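
Ranking by listening time instead of play count can be done by ordering the terms aggregation on a sum sub-aggregation. The sketch below assumes the play duration is stored in milliseconds in a field named `ms_played`, and keeps a 2024 filter because the play counts above match the 2024 listing; the function names are hypothetical.

```python
ARTIST_KEYWORD = "master_metadata_album_artist_name.keyword"  # assumed field name
MS_PER_HOUR = 3_600_000


def time_played_aggs(size=10):
    """Terms aggregation ordered by the summed `ms_played` of each artist."""
    return {
        "artists": {
            "terms": {
                "field": ARTIST_KEYWORD,
                "size": size,
                "order": {"total_ms": "desc"},
            },
            "aggs": {"total_ms": {"sum": {"field": "ms_played"}}},
        }
    }


def top_artists_by_time(client, index, year=2024, size=10):
    """Return one summary line per artist, longest listening time first."""
    resp = client.search(
        index=index,
        size=0,
        query={"range": {"ts": {"gte": f"{year}-01-01T00:00:00Z",
                                "lt": f"{year + 1}-01-01T00:00:00Z"}}},
        aggs=time_played_aggs(size),
    )
    lines = []
    for bucket in resp["aggregations"]["artists"]["buckets"]:
        hours = round(bucket["total_ms"]["value"] / MS_PER_HOUR, 2)
        lines.append(
            f"{bucket['key']} played {bucket['doc_count']} times; "
            f"for a total of {hours} hours"
        )
    return lines
```

Ordering by total time rather than play count changes the ranking whenever an artist's tracks are longer or are skipped less often.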

From here, you can read the blog on how to build the visualizations.