Pdf Chunking Ingest


PDF Extraction and Ingest with ELSER Example

This notebook demonstrates how to extract the text of a single PDF, split it into passages, and ingest those passages into Elasticsearch.

In this example we will:

  • load the PDF using pypdf
  • chunk the text with a LangChain text splitter
  • ingest the chunks into Elasticsearch with the LangChain Elasticsearch vector store.
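The three steps compose into a simple load, chunk, ingest flow. The sketch below is a minimal, hypothetical outline: `run_pipeline`, `load_pages`, `split_text`, and `ingest` are illustrative names, not part of pypdf or LangChain. The stubs in the usage section stand in for the real PDF loader, text splitter, and vector store so the structure can be read and run on its own.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical passage record: the chunk text plus metadata such as its source page.
Passage = Dict[str, object]


def run_pipeline(
    load_pages: Callable[[], Iterable[str]],
    split_text: Callable[[str], List[str]],
    ingest: Callable[[List[Passage]], int],
) -> int:
    """Load pages, split each page into passages, and hand them all to `ingest`.

    Returns whatever count `ingest` reports (e.g. the number of indexed passages).
    """
    passages: List[Passage] = []
    for page_number, page_text in enumerate(load_pages(), start=1):
        for chunk in split_text(page_text):
            passages.append({"text": chunk, "metadata": {"page": page_number}})
    return ingest(passages)


if __name__ == "__main__":
    # Stubs standing in for pypdf, a LangChain splitter, and the vector store.
    pages = ["First page text.", "Second page. More text."]
    indexed = run_pipeline(
        load_pages=lambda: pages,
        split_text=lambda text: text.split(". "),
        ingest=len,
    )
    print(indexed)  # 3
```

Keeping each step behind a plain function makes it easy to swap the stubs for the real loader, splitter, and Elasticsearch store without changing the flow.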

We will also set up your Elasticsearch cluster with the ELSER model so that we can use it to embed the passages.
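ELSER is typically applied at ingest time through an ingest pipeline whose `inference` processor writes the model's weighted tokens into a `sparse_vector` field. The sketch below only builds the request bodies as Python dicts. The model ID `.elser_model_2`, the field names `text` and `vector`, and the helper names `elser_pipeline_body` and `elser_index_mappings` are assumptions, and the `input_output` processor option applies to recent 8.x releases, so check the options against your cluster version. LangChain's Elasticsearch vector store may create an equivalent pipeline for you, in which case this step is unnecessary.

```python
from typing import Any, Dict


def elser_pipeline_body(
    model_id: str = ".elser_model_2",  # assumed ELSER model ID
    input_field: str = "text",         # assumed field holding the passage text
    output_field: str = "vector",      # assumed field receiving ELSER tokens
) -> Dict[str, Any]:
    """Build an ingest-pipeline body that runs ELSER on `input_field`.

    The weighted tokens produced by the model are written to `output_field`.
    """
    return {
        "description": "Embed passages with ELSER",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "input_output": [
                        {"input_field": input_field, "output_field": output_field}
                    ],
                }
            }
        ],
    }


def elser_index_mappings(input_field: str = "text", output_field: str = "vector") -> Dict[str, Any]:
    """Matching mappings: full text for the passage, sparse_vector for the tokens."""
    return {
        "properties": {
            input_field: {"type": "text"},
            output_field: {"type": "sparse_vector"},
        }
    }


if __name__ == "__main__":
    print(elser_pipeline_body()["processors"][0]["inference"]["model_id"])  # .elser_model_2
```

With a connected client `es`, the bodies could be applied with `es.ingest.put_pipeline(id="pdf-elser", **elser_pipeline_body())` and `es.indices.create(index=INDEX_NAME, mappings=elser_index_mappings(), settings={"index": {"default_pipeline": "pdf-elser"}})`, where `pdf-elser` is a hypothetical pipeline ID.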

Connecting to Elasticsearch

Deploying ELSER

Loading and chunking the PDF

This step loads the PDF from the provided URL and then chunks the text into passage documents.
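In the notebook, the page text comes from pypdf (for example `page.extract_text()` on each page of a `PdfReader`), and a LangChain text splitter cuts it into chunks of a target size with some overlap between neighbouring chunks so context is not lost at the boundaries. As a dependency-free illustration of that idea, and not LangChain's actual algorithm, the sketch below splits on whitespace into word-based chunks. `chunk_words`, `chunk_size`, and `overlap` are hypothetical names, and sizes are counted in words rather than characters or tokens.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so text at a boundary
    appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far each new chunk advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven"
    print(chunk_words(sample, chunk_size=3, overlap=1))
    # ['one two three', 'three four five', 'five six seven']
```

A larger overlap preserves more context across chunk boundaries at the cost of storing (and embedding) more duplicated text.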

Ingesting the passages into Elasticsearch

This step ingests the passage documents into the Elasticsearch index named by INDEX_NAME.
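Ingesting many passages is usually done in batches through the Elasticsearch bulk API. The sketch below builds action dictionaries in the format accepted by `elasticsearch.helpers.bulk`. `passages_to_actions`, the `text` and `metadata` field names, and the `pdf-chunks` index name are assumptions for illustration; the LangChain vector store performs its own equivalent bulk request.

```python
from typing import Any, Dict, Iterator, List, Optional


def passages_to_actions(
    passages: List[str],
    index_name: str,
    source_url: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "index" action per passage.

    Each action targets `index_name` and stores the passage text plus simple
    metadata: its position in the document and, optionally, the PDF's URL.
    """
    for position, text in enumerate(passages):
        metadata: Dict[str, Any] = {"chunk": position}
        if source_url is not None:
            metadata["source"] = source_url
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_source": {"text": text, "metadata": metadata},
        }


if __name__ == "__main__":
    actions = list(passages_to_actions(["first passage", "second passage"], "pdf-chunks"))
    print(len(actions), actions[1]["_source"]["metadata"])  # 2 {'chunk': 1}
```

With a connected client `es`, the actions could be sent with `helpers.bulk(es, passages_to_actions(chunks, INDEX_NAME))`. Yielding actions lazily keeps memory use flat even for large PDFs.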
