Elastic PDF Chunking Ingest

alph-notebooks/elasticsearch-labs / pdf-chunking-ingest.ipynb

PDF Extraction and Ingest with ELSER Example

This notebook demonstrates how to extract the contents of a single PDF, split the text into passages, and ingest those passages into Elasticsearch.

In this example we will:

load the PDF using pypdf
chunk the text with a LangChain document splitter
ingest the chunks into Elasticsearch with the LangChain Elasticsearch vector store.

We will also set up your Elasticsearch cluster with the ELSER model, so that we can use it to embed the passages.

Connecting to Elasticsearch
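
A minimal sketch of the connection step, assuming an Elastic Cloud deployment that is authenticated with a Cloud ID and an API key, and the official `elasticsearch` Python client (8.x). The helper names `build_client_kwargs` and `connect`, as well as the environment variable names, are illustrative and not taken from the notebook.

```python
import os
from getpass import getpass


def build_client_kwargs(cloud_id: str, api_key: str) -> dict:
    """Validate the credentials and return keyword arguments for Elasticsearch()."""
    if not cloud_id or not api_key:
        raise ValueError("Both a Cloud ID and an API key are required")
    return {"cloud_id": cloud_id, "api_key": api_key}


def connect():
    """Create a client and confirm that the cluster is reachable."""
    # Imported here so the helper above works without the client installed.
    from elasticsearch import Elasticsearch

    # Assumed variable names; fall back to an interactive prompt when unset.
    cloud_id = os.environ.get("ELASTIC_CLOUD_ID") or getpass("Elastic Cloud ID: ")
    api_key = os.environ.get("ELASTIC_API_KEY") or getpass("Elastic API key: ")

    client = Elasticsearch(**build_client_kwargs(cloud_id, api_key))
    client.info()  # raises if the endpoint or credentials are wrong
    return client
```

Calling `client = connect()` once returns a client object that the remaining steps can reuse.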

Deploying ELSER
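
One way to deploy the model is sketched below, assuming an already connected `elasticsearch` 8.x client and a cluster with machine learning capacity. The model ID `.elser_model_2`, the polling interval, and the function name `deploy_elser` are assumptions made for illustration; adjust the model ID to the ELSER version available on your cluster.

```python
import time


def deploy_elser(client, model_id=".elser_model_2", poll_seconds=5, max_polls=120):
    """Download an ELSER model and start a single-allocation deployment."""
    # Register the model; Elasticsearch then downloads its definition.
    client.ml.put_trained_model(
        model_id=model_id, input={"field_names": ["text_field"]}
    )

    # Wait until the model definition has been fully downloaded.
    for _ in range(max_polls):
        status = client.ml.get_trained_models(
            model_id=model_id, include="definition_status"
        )
        if status["trained_model_configs"][0]["fully_defined"]:
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{model_id} was not fully downloaded in time")

    # Start the deployment so the model can serve inference requests.
    client.ml.start_trained_model_deployment(
        model_id=model_id, number_of_allocations=1, wait_for="starting"
    )
    return model_id
```

This sketch does not handle a model that is already registered; in that case `put_trained_model` is expected to fail, and the download step can be skipped.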

Importing PDF chunks into Index

This step downloads the PDF from the provided URL and then splits its text into passage documents.
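
The loading and chunking step could look roughly like the sketch below, assuming `pypdf` and the `langchain-text-splitters` package are installed. `PDF_URL` is a hypothetical placeholder, the chunk size and overlap (measured in characters) are arbitrary starting values, and the function names are illustrative.

```python
from io import BytesIO
from urllib.request import urlopen

# Hypothetical placeholder; substitute the URL of the PDF you want to ingest.
PDF_URL = "https://example.com/sample.pdf"


def non_empty_pages(page_texts):
    """Pair each page's text with its 1-based page number, skipping blank pages."""
    return [(i, t) for i, t in enumerate(page_texts, start=1) if t and t.strip()]


def load_pdf_pages(url):
    """Download a PDF and return the extracted text of each page."""
    from pypdf import PdfReader

    with urlopen(url) as response:
        reader = PdfReader(BytesIO(response.read()))
    # extract_text() can yield an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def chunk_pages(page_texts, chunk_size=512, chunk_overlap=64):
    """Split page texts into passage Documents tagged with their page number."""
    # In older LangChain releases this class lives in langchain.text_splitter.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    pages = non_empty_pages(page_texts)
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.create_documents(
        [text for _, text in pages],
        metadatas=[{"page": number} for number, _ in pages],
    )
```

With a real URL in place, `docs = chunk_pages(load_pdf_pages(PDF_URL))` yields a list of LangChain `Document` objects, each carrying a `page` entry in its metadata.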

Ingesting the passages into Elasticsearch

This step ingests the passage documents into the Elasticsearch index named by INDEX_NAME.
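
A hedged sketch of the ingest call follows. It takes an existing Elasticsearch client and a list of passage documents, and assumes the `langchain-elasticsearch` package plus a deployed ELSER model. The `INDEX_NAME` value, the model ID, the 60-second bulk timeout, and the `store_cls` parameter (included only so the function can be exercised without a cluster) are assumptions for illustration.

```python
INDEX_NAME = "pdf_chunks_example"  # hypothetical index name


def ingest_passages(docs, client, index_name=INDEX_NAME,
                    model_id=".elser_model_2", store_cls=None):
    """Index passage Documents, letting ELSER embed them inside Elasticsearch."""
    if store_cls is None:
        # Older LangChain releases expose this class from langchain.vectorstores.
        from langchain_elasticsearch import ElasticsearchStore as store_cls

    return store_cls.from_documents(
        docs,
        es_connection=client,
        index_name=index_name,
        # Sparse-vector strategy: the named model produces the embeddings.
        strategy=store_cls.SparseVectorRetrievalStrategy(model_id=model_id),
        # A longer timeout gives inference time to finish for each bulk request.
        bulk_kwargs={"request_timeout": 60},
    )
```

The returned vector store can then be queried, for example with its `similarity_search` method, to check that the passages were indexed.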
