Pdf Chunking Ingest
PDF Extraction and Ingest with ELSER Example
This notebook demonstrates how to extract the contents of a single PDF, split the text into passages, and ingest those passages into Elasticsearch.
In this example we will:
- load the PDF using pypdf
- chunk the text with a LangChain text splitter
- ingest the passages into Elasticsearch with the LangChain Elasticsearch vector store
We will also set up your Elasticsearch cluster with the ELSER model, so that we can use it to embed the passages.
Connecting to Elasticsearch
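A minimal sketch of the connection step, assuming an Elastic Cloud deployment that is reached with a Cloud ID and an API key. The `connect` helper is a hypothetical name used only for illustration, and the credentials are whatever your own deployment provides.

```python
def connect(cloud_id, api_key):
    """Return an Elasticsearch client for an Elastic Cloud deployment.

    Assumption: the cluster is hosted on Elastic Cloud. For a self-managed
    cluster you would pass hosts=[...] to Elasticsearch() instead.
    """
    # Imported here so the dependency is only needed when connecting:
    #   pip install elasticsearch
    from elasticsearch import Elasticsearch

    return Elasticsearch(cloud_id=cloud_id, api_key=api_key)


# Example usage (prompts for secrets instead of hard-coding them):
#   from getpass import getpass
#   client = connect(getpass("Elastic Cloud ID: "), getpass("Elastic API key: "))
#   print(client.info())
```

Calling `client.info()` right after connecting is a quick way to confirm that the credentials work before any model deployment or ingestion is attempted.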
Deploying ELSER
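One possible way to deploy ELSER from Python is sketched below. It assumes the ELSER v2 model ID `.elser_model_2` (the text above does not state a version), a cluster with machine learning nodes available, and that the model has not been created yet. `deploy_elser`, `poll_seconds`, and `max_polls` are hypothetical names chosen for this sketch.

```python
import time


def deploy_elser(client, model_id=".elser_model_2", poll_seconds=5.0, max_polls=120):
    """Create the ELSER model, wait for its download, and start a deployment."""
    # Register the model; Elasticsearch then downloads it in the background.
    # Assumption: the model does not exist yet (the call fails if it does).
    client.ml.put_trained_model(
        model_id=model_id, input={"field_names": ["text_field"]}
    )

    # Poll until the model definition has been fully downloaded.
    for _ in range(max_polls):
        status = client.ml.get_trained_models(
            model_id=model_id, include="definition_status"
        )
        if status["trained_model_configs"][0].get("fully_defined"):
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{model_id} was not fully downloaded in time")

    # Start one allocation of the model.
    client.ml.start_trained_model_deployment(
        model_id=model_id, number_of_allocations=1, wait_for="starting"
    )

    # Poll until the deployment reports that it has started.
    for _ in range(max_polls):
        stats = client.ml.get_trained_models_stats(model_id=model_id)
        deployment = stats["trained_model_stats"][0].get("deployment_stats", {})
        if deployment.get("state") == "started":
            return
        time.sleep(poll_seconds)
    raise TimeoutError(f"{model_id} deployment did not start in time")
```

The function takes the client as an argument, so it can be called as `deploy_elser(client)` once the connection step has succeeded. Both loops are bounded so that a stalled download raises an error instead of hanging the notebook.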
Importing PDF chunks into Index
This loads the PDF from the provided URL and then chunks the text into passage documents.
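The loading and chunking step could look roughly like the sketch below. It assumes pypdf for text extraction and LangChain's `RecursiveCharacterTextSplitter` as the splitter. The helper names, the `document.pdf` file name, and the chunk sizes (measured in characters) are illustrative assumptions, not values taken from this notebook.

```python
import tempfile
import urllib.request
from pathlib import Path


def download_pdf(url, dest_dir=None):
    """Fetch the PDF at `url` into a local file and return its path."""
    target_dir = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp())
    target = target_dir / "document.pdf"
    urllib.request.urlretrieve(url, str(target))
    return target


def pdf_to_passages(pdf_path, source, chunk_size=512, chunk_overlap=64):
    """Extract text page by page, then split it into overlapping passages."""
    # Third-party imports are kept local to this function:
    #   pip install pypdf langchain-core langchain-text-splitters
    from pypdf import PdfReader
    from langchain_core.documents import Document
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    reader = PdfReader(str(pdf_path))
    # One Document per page, so each passage keeps its page number as metadata.
    pages = [
        Document(
            page_content=page.extract_text() or "",
            metadata={"source": source, "page": number},
        )
        for number, page in enumerate(reader.pages, start=1)
    ]
    # Assumed sizes; tune chunk_size and chunk_overlap for your own documents.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(pages)
```

With a URL of your own, `docs = pdf_to_passages(download_pdf(url), source=url)` yields a list of LangChain `Document` objects ready for ingestion. Splitting per page rather than over the whole text keeps page-level metadata attached to every passage, at the cost of never producing a passage that spans a page break.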
Ingesting the passages into Elasticsearch
This ingests the passage documents into the Elasticsearch index named by INDEX_NAME.
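A hedged sketch of the ingestion step follows, assuming the `langchain-elasticsearch` package and its `ElasticsearchStore` class with the ELSER sparse vector strategy. `ingest_passages` is a hypothetical helper, the `.elser_model_2` model ID and the 60-second bulk timeout are assumptions, and the strategy helper used here may be named differently in other versions of the package.

```python
def ingest_passages(docs, client, index_name, model_id=".elser_model_2"):
    """Index the passage documents so that ELSER embeds them at ingest time."""
    # Imported here so the dependency is only needed when ingesting:
    #   pip install langchain-elasticsearch
    from langchain_elasticsearch import ElasticsearchStore

    return ElasticsearchStore.from_documents(
        docs,
        es_connection=client,
        index_name=index_name,
        # Sparse vector strategy: the ELSER model deployed in the cluster
        # produces the embeddings, so no client-side embedding model is passed.
        strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(model_id=model_id),
        # Assumed timeout; inference during bulk indexing can be slow.
        bulk_kwargs={"request_timeout": 60},
    )
```

Called as `ingest_passages(docs, client, INDEX_NAME)`, this returns the vector store object, which can then be queried with `similarity_search` to check that the passages were indexed.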