Scrapegraph Burr Lancedb
🕷️ Chat with your webpage with scrapegraph, burr and lancedb
🔧 Install dependencies
🔑 Import ScrapeGraph and OpenAI API keys
You can find the Scrapegraph API key here
Scrapegraph API key: ·········· OpenAI API key: ··········
🚀 Define the extraction and query flow using burr and run it
burr is an open-source orchestration framework that makes it easy to develop applications that make decisions (chatbots, agents, simulations, etc.). It also includes a self-hosted UI for tracing what is happening in the application.
Check the GitHub repo
Our goal is to define a flow (DAG) that:
- Fetches markdown from webpages (scrapegraph)
- Chunks the content and stores it in a vector store (lancedb)
- Lets you query the database and generate an answer using an LLM
We can see all of this as Nodes connected together in a Graph, where the Nodes are the actions we want to perform.
In burr we define actions by adding the action decorator to each function and specifying which parts of the graph's state that function reads from and writes to.
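The reads/writes contract can be illustrated without burr itself. The sketch below is plain Python and does not reproduce burr's API: the `step` decorator, the `run_step` helper and the `word_count` key are hypothetical names invented for this illustration, and the state is an ordinary dict.

```python
from typing import Callable, Dict, List

StateDict = Dict[str, object]


def step(reads: List[str], writes: List[str]) -> Callable:
    """Hypothetical stand-in for an action decorator: records the declared keys."""
    def wrap(fn: Callable[[StateDict], StateDict]) -> Callable[[StateDict], StateDict]:
        fn.reads = list(reads)
        fn.writes = list(writes)
        return fn
    return wrap


def run_step(fn: Callable[[StateDict], StateDict], state: StateDict) -> StateDict:
    """Run one step, enforcing that it only sees and changes what it declared."""
    missing = [key for key in fn.reads if key not in state]
    if missing:
        raise KeyError(f"{fn.__name__} needs missing state keys: {missing}")
    visible = {key: state[key] for key in fn.reads}  # the step sees only its reads
    update = fn(visible)
    undeclared = [key for key in update if key not in fn.writes]
    if undeclared:
        raise ValueError(f"{fn.__name__} wrote undeclared keys: {undeclared}")
    return {**state, **update}  # a new state; the old one is left untouched


@step(reads=["markdown_content"], writes=["word_count"])
def count_words(state: StateDict) -> StateDict:
    return {"word_count": len(str(state["markdown_content"]).split())}


state = {"url": "https://scrapegraphai.com/", "markdown_content": "## Transform Websites"}
new_state = run_step(count_words, state)
```

Declaring the keys up front is what lets an orchestrator check each node in isolation and trace which action produced which piece of state.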
All imports
🔍 Define fetch_webpage action to fetch and convert a webpage into markdown
Here we use markdownify to fetch a webpage and convert it into markdown format, which is well suited to LLM ingestion.
You can find more info in the official scrapegraph documentation
💬 2024-12-29 13:56:08,576 🔑 Initializing Client
INFO:scrapegraph:🔑 Initializing Client
💬 2024-12-29 13:56:08,582 ✅ Client initialized successfully
INFO:scrapegraph:✅ Client initialized successfully
📁 Define embed_and_store action to chunk the markdown and store it in a local vector store
Define the data structure to hold the chunks in the lancedb vector store
Utilities to create chunks based on the number of tokens
Let's define the action. It creates the local webpages vector store if it is not already present and adds the chunks to the chunks table.
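A minimal sketch of token-based chunking, under a simplifying assumption: tokens are approximated by whitespace-separated words so that the example runs with the standard library only. The tokenizer used by the notebook is not shown in this text, and a real model tokenizer would give different counts. The names `chunk_by_tokens`, `max_tokens` and `overlap` are illustrative.

```python
from typing import List


def chunk_by_tokens(text: str, max_tokens: int = 100, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most max_tokens tokens.

    ASSUMPTION: a "token" is a whitespace-separated word, a rough
    stand-in for a real model tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    tokens = text.split()
    stride = max_tokens - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), stride):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = "one two three four five six seven"
chunks = chunk_by_tokens(sample, max_tokens=3, overlap=1)
```

A small overlap repeats the last tokens of one chunk at the start of the next, which can help when a relevant sentence straddles a chunk boundary.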
💬 Define ask_question action to retrieve the most relevant chunks from the vector store and query them with an LLM
Fetches the 3 chunks most relevant to the user query and generates an answer
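The retrieve-then-answer step can be sketched in plain Python. This is an illustration under stated assumptions, not the notebook's code: the embedding is a toy word-count vector rather than a real embedding model, the similarity search is done in memory rather than by lancedb, and the LLM call is left out so that only the prompt is built. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (ASSUMPTION, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context_chunks: List[str]) -> str:
    """Assemble the text that would be sent to the LLM (the call is omitted)."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"


chunks = [
    "the founders lead the company",
    "pricing starts with a free tier",
    "who are the founders of the project",
    "the api returns markdown",
    "contact the team on discord",
]
query = "who are the founders"
best = top_k_chunks(query, chunks, k=3)
prompt = build_prompt(query, best)
```

Limiting the context to a few top-ranked chunks keeps the prompt short and focused on the passages most likely to contain the answer.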
🤖 Define the burr application graph and run it
💬 2024-12-29 13:51:54,568 🔍 Starting markdownify request for https://scrapegraphai.com/
INFO:scrapegraph:🔍 Starting markdownify request for https://scrapegraphai.com/
💬 2024-12-29 13:51:54,577 🚀 Making POST request to https://api.scrapegraphai.com/v1/markdownify
INFO:scrapegraph:🚀 Making POST request to https://api.scrapegraphai.com/v1/markdownify
💬 2024-12-29 13:51:57,646 ✅ Request completed successfully: POST https://api.scrapegraphai.com/v1/markdownify
INFO:scrapegraph:✅ Request completed successfully: POST https://api.scrapegraphai.com/v1/markdownify
💬 2024-12-29 13:51:57,649 ✨ Markdownify request completed successfully
INFO:scrapegraph:✨ Markdownify request completed successfully
Request ID: d646737f-2dbd-4c6d-aecb-fc5f2c3e132d
Markdown Content: [Star us on GitHub0](https://github.com/ScrapeGraphAI/Scrapegraph-ai) ## Transform Websites into Structured Data ### Just One Prompt Away Transform any website into clean, organized data for AI a... (truncated)
The founders of ScrapeGraphAI are:
1. **** - Founder & Technical Lead - [LinkedIn profile of ](https://www.linkedin.com/in/perinim/)
2. **Marco Vinciguerra** - Founder & Software Engineer - [LinkedIn profile of Marco Vinciguerra](https://www.linkedin.com/in/marco-vinciguerra-7ba365242/)
3. **Lorenzo Padoan** - Founder & Product Engineer - [LinkedIn profile of Lorenzo Padoan](https://www.linkedin.com/in/lorenzo-padoan-4521a2154)
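The wiring of the three actions into a graph can be sketched in plain Python. The action names come from the sections above, but everything else is an assumption made for illustration: the action bodies are placeholders that make no network or database calls, the state keys are invented, and `run_graph` is a hypothetical helper. burr provides its own builder for this, which the sketch does not reproduce.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


def fetch_webpage(state: State) -> State:
    # Placeholder: the real action calls the scrapegraph markdownify service.
    return {**state, "markdown_content": f"# Page at {state['url']}"}


def embed_and_store(state: State) -> State:
    # Placeholder: the real action chunks the markdown and writes to lancedb.
    return {**state, "chunks": [str(state["markdown_content"])]}


def ask_question(state: State) -> State:
    # Placeholder: the real action retrieves chunks and queries an LLM.
    return {**state, "answer": f"placeholder answer to: {state['user_query']}"}


def run_graph(
    actions: Dict[str, Callable[[State], State]],
    transitions: List[Tuple[str, str]],
    entrypoint: str,
    halt_after: str,
    state: State,
    max_steps: int = 100,
) -> Tuple[List[str], State]:
    """Walk the graph from entrypoint along transitions until halt_after has run."""
    next_of = dict(transitions)
    current = entrypoint
    visited: List[str] = []
    for _ in range(max_steps):  # guard against graphs that never halt
        state = actions[current](state)
        visited.append(current)
        if current == halt_after:
            return visited, state
        if current not in next_of:
            raise RuntimeError(f"no transition out of {current!r}")
        current = next_of[current]
    raise RuntimeError("max_steps reached before halt_after ran")


actions = {
    "fetch_webpage": fetch_webpage,
    "embed_and_store": embed_and_store,
    "ask_question": ask_question,
}
transitions = [("fetch_webpage", "embed_and_store"), ("embed_and_store", "ask_question")]
visited, final_state = run_graph(
    actions,
    transitions,
    entrypoint="fetch_webpage",
    halt_after="ask_question",
    state={"url": "https://scrapegraphai.com/", "user_query": "Who are the founders?"},
)
```

Keeping the transitions as data, separate from the action functions, is what makes it possible to reorder or extend the flow without touching the actions themselves.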
🖼️ Visualize the traces with Burr UI
🔗 Resources
- 🚀 Get your API Key: ScrapeGraphAI Dashboard
- 🐙 GitHub: ScrapeGraphAI GitHub
- 💼 LinkedIn: ScrapeGraphAI LinkedIn
- 🐦 Twitter: ScrapeGraphAI Twitter
- 💬 Discord: Join our Discord Community
- ⏩ Burr: GitHub
- 🛢️ LanceDB: GitHub
Made with ❤️ by the ScrapeGraphAI Team