Get Embeddings From Dataset
Get embeddings
This notebook contains helpful snippets for embedding text with the 'text-embedding-ada-002' model via the Azure OpenAI API.
Installation
Install the Azure OpenAI SDK using the command below.
Run this cell; it will prompt you for the apiKey, endPoint, and embedding deployment.
Import namespaces and create an instance of OpenAIClient using the azureOpenAIEndpoint and the azureOpenAIKey.
1. Load the dataset
The dataset used in this example is fine-food reviews from Amazon. It contains a total of 568,454 food reviews that Amazon users left up to October 2012. For illustration purposes, we will use a subset consisting of the 1,000 most recent reviews. The reviews are in English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review title (Summary) and review body (Text).
We will combine the review summary and review text into a single combined text. The model will encode this combined text and output a single vector embedding for it.
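The combining step can be sketched as follows. This is a minimal Python illustration rather than the notebook's own code; the `combine_review` helper and the "Title: ...; Content: ..." layout are assumptions chosen for the example, and only the `Summary` and `Text` column names come from the dataset description.

```python
def combine_review(summary: str, text: str) -> str:
    """Join a review's title and body into the single string that gets embedded.

    The "Title: ...; Content: ..." layout is an assumed convention for this sketch.
    """
    return f"Title: {summary.strip()}; Content: {text.strip()}"


# Hypothetical sample row using the dataset's Summary and Text columns.
review = {"Summary": "Great taffy ", "Text": " Soft, chewy, and fresh."}
combined = combine_review(review["Summary"], review["Text"])
print(combined)
```

Stripping whitespace before joining keeps stray padding in the CSV from leaking into the text sent to the model.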
Let's load the fine_food_reviews_1k.csv dataset using the value kernel.
Load the latest Microsoft.Data.Analysis package.
Use a tokenizer to calculate the token count.
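One common use of the token count is to drop inputs that are too long for the embedding model. The Python sketch below shows that filtering step under stated assumptions: the 8,000-token cutoff is an assumed value, `keep_within_limit` is a hypothetical helper, and `whitespace_token_count` is only a stand-in so the example runs without a real tokenizer. In practice you would pass in a counter backed by the model's actual tokenizer.

```python
from typing import Callable, Iterable, List

MAX_TOKENS = 8000  # assumed cutoff for this sketch, not taken from the notebook


def whitespace_token_count(text: str) -> int:
    """Stand-in counter: splits on whitespace. NOT a real model tokenizer."""
    return len(text.split())


def keep_within_limit(
    texts: Iterable[str],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> List[str]:
    """Return only the texts whose token count is at most max_tokens."""
    return [text for text in texts if count_tokens(text) <= max_tokens]


# Hypothetical inputs: one short review and one far over the cutoff.
samples = ["short review", "word " * 9000]
kept = keep_within_limit(samples)
print(len(kept))
```

Passing the counter as a function keeps the filtering logic independent of whichever tokenizer library is available.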
2. Get embeddings and save them for future reuse
Use the batch approach when calculating a lot of embeddings.
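The batch approach can be sketched as follows: split the inputs into fixed-size groups and make one embedding call per group instead of one per text. This is a minimal Python illustration, not the notebook's code. `batched`, `embed_all`, and the batch size of 16 are assumptions, and `fake_embed_batch` is a hypothetical stand-in for a real embeddings request so the example runs offline.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,  # assumed value for this sketch
) -> List[List[float]]:
    """Embed texts batch by batch, preserving the input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real embeddings call: one 1-D vector per text."""
    return [[float(len(text))] for text in batch]


texts = [f"review {i}" for i in range(5)]
vectors = embed_all(texts, fake_embed_batch, batch_size=2)
print(len(vectors))
```

Because results are appended in batch order, `vectors[i]` still corresponds to `texts[i]`, which makes it straightforward to store the embeddings alongside the original rows for later reuse.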