Local Embedding Example
Getting Started
For embedding, we'll use a model from SentenceTransformers, which lets us run a small, fast BERT-based model locally without heavy compute requirements.
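A minimal sketch of loading a SentenceTransformers model and embedding one string is shown below. The tutorial does not name its model here, so "all-MiniLM-L6-v2" is an assumption chosen for illustration. The helper names load_model and embed_text are hypothetical. The import is deferred inside load_model so that embed_text can be tried with any encode function.

```python
from typing import Callable, List, Sequence

# Assumption: a commonly used small SentenceTransformers model, not necessarily
# the one this tutorial uses.
DEFAULT_MODEL_NAME = "all-MiniLM-L6-v2"


def load_model(model_name: str = DEFAULT_MODEL_NAME):
    """Download (on first use) and load a SentenceTransformer model on the CPU."""
    # Imported here so the rest of this sketch works without the package installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, device="cpu")


def embed_text(encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
               text: str) -> List[float]:
    """Embed a single string and return it as a plain, JSON-friendly list of floats."""
    vectors = encode([text])  # encode a batch of one
    return [float(x) for x in vectors[0]]
```

Usage might look like `model = load_model()` followed by `embed_text(model.encode, "A novel about a whale hunt.")`. For all-MiniLM-L6-v2 the result has 384 values, which you can also read from `model.get_sentence_embedding_dimension()`.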
Make sure you have access to your Elastic Cloud ID and Elastic API Key. The next code snippet will prompt you for both and connect to Elasticsearch.
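The prompt-and-connect step can be sketched as follows, assuming the 8.x elasticsearch Python client. The function names read_credentials and connect are hypothetical. getpass is used so the API key is not echoed, and the prompt function is a parameter so it can be swapped out when testing.

```python
from getpass import getpass
from typing import Callable, Tuple


def read_credentials(prompt: Callable[[str], str] = getpass) -> Tuple[str, str]:
    """Prompt for the Elastic Cloud ID and API key without echoing them."""
    cloud_id = prompt("Elastic Cloud ID: ").strip()
    api_key = prompt("Elastic API Key: ").strip()
    if not cloud_id or not api_key:
        raise ValueError("Both the Cloud ID and the API key are required")
    return cloud_id, api_key


def connect(cloud_id: str, api_key: str):
    """Create an Elasticsearch client for an Elastic Cloud deployment."""
    # Requires the elasticsearch package (8.x); imported here to keep the sketch light.
    from elasticsearch import Elasticsearch

    client = Elasticsearch(cloud_id=cloud_id, api_key=api_key)
    client.info()  # makes a real request, so bad credentials fail immediately
    return client
```

In the notebook this would be called as `client = connect(*read_credentials())`.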
1. Embedding text into vectors
The full set of book documents is in ../data/books.json, and none of them have vectors yet. The next section parses a small batch of 25 book objects from ../data/small_books.json and creates a vector embedding for each book_description. To run it on all 10,909 book objects, change file_path from ../data/small_books.json to ../data/books.json.
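A minimal sketch of this step, assuming each JSON file holds an array of book objects with a book_description field. The output field name description_embedding is a hypothetical choice, as are the helper names embed_books and embed_file. All descriptions are encoded in one call, which lets the model batch them.

```python
import json
from typing import Any, Callable, Dict, List, Sequence

Encode = Callable[[Sequence[str]], Sequence[Sequence[float]]]


def embed_books(books: List[Dict[str, Any]], encode: Encode,
                text_field: str = "book_description",
                vector_field: str = "description_embedding") -> List[Dict[str, Any]]:
    """Add an embedding of each book's description (hypothetical field name) in place."""
    texts = [book[text_field] for book in books]
    vectors = encode(texts)  # one batched call for all descriptions
    for book, vector in zip(books, vectors):
        # Convert numpy floats to plain floats so json.dump can serialize them.
        book[vector_field] = [float(x) for x in vector]
    return books


def embed_file(encode: Encode,
               in_path: str = "../data/small_books.json",
               out_path: str = "../data/small_books_embedded.json") -> int:
    """Read a JSON array of books, embed them, write them to out_path, return the count."""
    with open(in_path, encoding="utf-8") as f:
        books = json.load(f)
    embed_books(books, encode)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(books, f, indent=2)
    return len(books)
```

With a loaded SentenceTransformer, `embed_file(model.encode)` would process the 25-book sample; pointing in_path at ../data/books.json covers the full set.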
We can now inspect the output file and confirm that each document has a vector embedding. Let's look at the first book in the small_books_embedded.json file.
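Printing a raw embedding floods the output with hundreds of numbers, so this sketch replaces the vector with a short preview before printing. It assumes the hypothetical description_embedding field name; load_books and summarize_book are illustrative helpers, not part of the original notebook.

```python
import json
from typing import Any, Dict, List


def load_books(path: str = "../data/small_books_embedded.json") -> List[Dict[str, Any]]:
    """Load the embedded books written by the previous step."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_book(book: Dict[str, Any],
                   vector_field: str = "description_embedding") -> Dict[str, Any]:
    """Return a copy of the book with its long vector replaced by a short preview."""
    preview = dict(book)
    vector = preview.pop(vector_field)
    preview[vector_field] = f"<{len(vector)} floats, starting {vector[:3]}>"
    return preview
```

For example: `print(json.dumps(summarize_book(load_books()[0]), indent=2))`.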
Now let's create an index for our books using the Elasticsearch client.
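One way to create the index is sketched below, assuming the 8.x Python client. The index name "books", the description_embedding field name, and cosine similarity are assumptions; dims must match the model (384 for all-MiniLM-L6-v2). Only book_description and the vector are mapped explicitly, and Elasticsearch maps any other fields dynamically. Note that this sketch deletes an existing index of the same name so the cell can be rerun.

```python
from typing import Any, Dict


def books_mapping(dims: int, vector_field: str = "description_embedding") -> Dict[str, Any]:
    """Mapping with a text description and a dense_vector for kNN search."""
    return {
        "properties": {
            "book_description": {"type": "text"},
            vector_field: {
                "type": "dense_vector",
                "dims": dims,          # must equal the embedding length
                "index": True,         # build the vector index for kNN search
                "similarity": "cosine",
            },
        }
    }


def create_books_index(client, index_name: str = "books", dims: int = 384) -> None:
    """(Re)create the index; any existing index with this name is deleted first."""
    if client.indices.exists(index=index_name):
        client.indices.delete(index=index_name)
    client.indices.create(index=index_name, mappings=books_mapping(dims))
```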
Now that we have created an index in Elasticsearch, we can index our local book objects. The bulk_ingest_books method sends the books in batched bulk requests, which is much faster than issuing a separate index call for each book.
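The original bulk_ingest_books is not shown, so the following is only one possible shape for it, built on elasticsearch.helpers.bulk. The book_actions generator and the "books" index name are hypothetical. helpers.bulk raises BulkIndexError by default if any document fails, and the final refresh makes the new documents searchable right away.

```python
from typing import Any, Dict, Iterable, Iterator


def book_actions(books: Iterable[Dict[str, Any]], index_name: str) -> Iterator[Dict[str, Any]]:
    """Turn each book into a bulk 'index' action (the default op type)."""
    for book in books:
        yield {"_index": index_name, "_source": book}


def bulk_ingest_books(client, books: Iterable[Dict[str, Any]], index_name: str = "books") -> int:
    """Index all books via the bulk helper and return the number indexed."""
    from elasticsearch import helpers  # requires the elasticsearch package

    success_count, _errors = helpers.bulk(client, book_actions(books, index_name))
    client.indices.refresh(index=index_name)  # make the documents searchable now
    return success_count
```

Because book_actions is a generator, the helper streams documents in chunks instead of building one giant request.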
Let's also add a single book, since adding books one at a time is the standard operation as new books arrive in your vector database.
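Adding one book means embedding its description and then making a single index call. The sketch below assumes the 8.x client, the hypothetical description_embedding field and "books" index name, and hypothetical helper names prepare_book and add_book.

```python
from typing import Any, Callable, Dict, Sequence


def prepare_book(book: Dict[str, Any],
                 encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
                 text_field: str = "book_description",
                 vector_field: str = "description_embedding") -> Dict[str, Any]:
    """Return a copy of the book with an embedding of its description added."""
    doc = dict(book)
    doc[vector_field] = [float(x) for x in encode([book[text_field]])[0]]
    return doc


def add_book(client, book: Dict[str, Any], encode, index_name: str = "books") -> str:
    """Embed and index a single book; return the ID Elasticsearch assigned."""
    response = client.index(index=index_name, document=prepare_book(book, encode))
    return response["_id"]
```

For example: `add_book(client, {"book_description": "..."}, model.encode)`.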