Scrapegraph Langgraph Tavily
Research Agent with ScrapeGraph, LangGraph, and Tavily
Install dependencies
Import ScrapeGraph, Tavily, and OpenAI API keys
You can find your ScrapeGraph API key in the ScrapeGraphAI Dashboard.
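This is a minimal sketch of the key-loading step, assuming the keys are passed to the integrations through environment variables. The variable names SGAI_API_KEY, TAVILY_API_KEY, and OPENAI_API_KEY are assumptions; check each integration's documentation. load_api_keys is a hypothetical helper. Using getpass keeps the keys out of the notebook output.

```python
import os
from getpass import getpass

# Assumed environment variable names; verify them against each integration's docs.
REQUIRED_KEYS = {
    "SGAI_API_KEY": "Scrapegraph API key: ",
    "TAVILY_API_KEY": "Tavily API key: ",
    "OPENAI_API_KEY": "OpenAI API key: ",
}


def load_api_keys(prompt=getpass):
    """Hypothetical helper: prompt only for keys not already set in the environment."""
    for env_var, label in REQUIRED_KEYS.items():
        if not os.environ.get(env_var):
            os.environ[env_var] = prompt(label)
```

Call load_api_keys() once in a notebook cell. Keys that are already present in the environment are left untouched.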
Scrapegraph API key: ··········
Tavily API key: ··········
OpenAI API key: ··········
Defining an Output Schema for Webpage Content Extraction
If you already know what you want to extract from a webpage, you can define an output schema using Pydantic. This schema acts as a "blueprint" that tells the AI how to structure the response.
Pydantic Schema Quick Guide
Types of Schemas
- Simple Schema
Use this when you want to extract straightforward information, such as a single piece of content.
from pydantic import BaseModel, Field

# Simple schema for a single webpage
class PageInfoSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")

# Example Output JSON after AI extraction
{
    "title": "ScrapeGraphAI: The Best Content Extraction Tool",
    "description": "ScrapeGraphAI provides powerful tools for structured content extraction from websites."
}
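To check that a response actually fits the schema, you can validate it with Pydantic. This is a minimal sketch that assumes Pydantic v2 (model_validate). The sample dict reuses the example output above.

```python
from pydantic import BaseModel, Field


class PageInfoSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")


# The example output from above, as a Python dict
raw = {
    "title": "ScrapeGraphAI: The Best Content Extraction Tool",
    "description": "ScrapeGraphAI provides powerful tools for structured content extraction from websites.",
}

# model_validate raises pydantic.ValidationError if a field is missing or has the wrong type
page = PageInfoSchema.model_validate(raw)
print(page.title)
```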
- Complex Schema (Nested)
If you need to extract structured information with multiple related items (like a list of repositories), you can nest schemas.
from pydantic import BaseModel, Field
from typing import List

# Define a schema for a single repository
class RepositorySchema(BaseModel):
    name: str = Field(description="Name of the repository (e.g., 'owner/repo')")
    description: str = Field(description="Description of the repository")
    stars: int = Field(description="Star count of the repository")
    forks: int = Field(description="Fork count of the repository")
    today_stars: int = Field(description="Stars gained today")
    language: str = Field(description="Programming language used")

# Define a schema for a list of repositories
class ListRepositoriesSchema(BaseModel):
    repositories: List[RepositorySchema] = Field(description="List of GitHub trending repositories")

# Example Output JSON after AI extraction
{
    "repositories": [
        {
            "name": "google-gemini/cookbook",
            "description": "Examples and guides for using the Gemini API",
            "stars": 8036,
            "forks": 1001,
            "today_stars": 649,
            "language": "Jupyter Notebook"
        },
        {
            "name": "TEN-framework/TEN-Agent",
            "description": "TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.",
            "stars": 3224,
            "forks": 311,
            "today_stars": 361,
            "language": "Python"
        }
    ]
}
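The nested schema validates the same way. Each list item becomes a RepositorySchema instance. This is a sketch that assumes Pydantic v2. It also shows the JSON Schema generated from the models, where the nested model is emitted under "$defs". Field descriptions in that schema are one way the intended structure can be communicated to an LLM.

```python
from typing import List

from pydantic import BaseModel, Field


class RepositorySchema(BaseModel):
    name: str = Field(description="Name of the repository (e.g., 'owner/repo')")
    description: str = Field(description="Description of the repository")
    stars: int = Field(description="Star count of the repository")
    forks: int = Field(description="Fork count of the repository")
    today_stars: int = Field(description="Stars gained today")
    language: str = Field(description="Programming language used")


class ListRepositoriesSchema(BaseModel):
    repositories: List[RepositorySchema] = Field(description="List of GitHub trending repositories")


raw = {
    "repositories": [
        {
            "name": "google-gemini/cookbook",
            "description": "Examples and guides for using the Gemini API",
            "stars": 8036,
            "forks": 1001,
            "today_stars": 649,
            "language": "Jupyter Notebook",
        }
    ]
}

# Validation recurses into the list: each dict is checked against RepositorySchema
result = ListRepositoriesSchema.model_validate(raw)
top = result.repositories[0]
print(top.name, top.stars)

# JSON Schema for the nested model; the nested item definition lives under "$defs"
schema = ListRepositoriesSchema.model_json_schema()
print(sorted(schema["$defs"]))
```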
Key Takeaways
- Simple Schema: Perfect for small, straightforward extractions.
- Complex Schema: Use nesting to extract lists or structured data, like "a list of repositories."
Both approaches give the AI a clear structure to follow, so the extracted content comes back with the fields and types you defined.
Initialize the ScrapeGraph and Tavily tools and the LangGraph prebuilt agent, then run the extraction
Here we use SmartScraperTool to extract structured data from a webpage using AI.
If you already have an HTML file, you can upload it and use LocalScraperTool instead.
You can find more information in the official LangChain documentation.
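This is a minimal sketch of the tool setup. It assumes the langchain-scrapegraph and langchain-community packages are installed and the API keys are already in the environment. The import paths, max_results=1, and renaming the Tavily tool to urls_finder (the name that appears in the agent trace later in this notebook) are all assumptions, not the notebook's original cell. build_tools is a hypothetical helper. Its imports sit inside the function, so defining it does not require the packages.

```python
def build_tools():
    """Hypothetical helper returning the two tools the agent can call."""
    # Assumed import paths for the LangChain integrations.
    from langchain_community.tools.tavily_search import TavilySearchResults
    from langchain_scrapegraph.tools import SmartScraperTool

    # Tavily web search, renamed so the agent sees it as "urls_finder" (assumption).
    urls_finder = TavilySearchResults(max_results=1, name="urls_finder")
    # SmartScraper is assumed to read its API key from the environment.
    smartscraper = SmartScraperTool()
    return [urls_finder, smartscraper]
```

Usage: tools = build_tools().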
We then initialize the LLM we want the agent to use.
Here we use create_react_agent to quickly build one of the prebuilt agents from the langgraph.prebuilt module.
You can find more information in the official LangGraph documentation.
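This sketch wires an OpenAI chat model and the tools into the prebuilt ReAct agent. It assumes the langchain-openai and langgraph packages are installed. The model name and temperature=0 are assumptions, and build_agent is a hypothetical helper.

```python
def build_agent(tools, model="gpt-4o-mini"):
    """Hypothetical helper: wrap an OpenAI chat model and the tools in a prebuilt ReAct agent."""
    from langchain_openai import ChatOpenAI
    from langgraph.prebuilt import create_react_agent

    # Model name and temperature are assumptions; any tool-calling chat model should work.
    llm = ChatOpenAI(model=model, temperature=0)
    # create_react_agent returns a compiled graph that alternates between the model
    # and the tools until the model stops requesting tool calls.
    return create_react_agent(llm, tools)
```

Usage: agent = build_agent(tools), where tools is the list of tools created in the previous step.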
Let's visualize the graph
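One way to render the compiled graph inline in a notebook is sketched below. It assumes IPython is available, and show_graph is a hypothetical helper. By default, draw_mermaid_png renders through the mermaid.ink web service, so this step needs network access.

```python
def show_graph(agent):
    """Hypothetical helper: display the agent's graph as a PNG in a notebook."""
    from IPython.display import Image, display

    # get_graph() returns a drawable view of the compiled graph;
    # draw_mermaid_png() renders its Mermaid diagram to PNG bytes.
    display(Image(agent.get_graph().draw_mermaid_png()))
```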
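Run the graph and stream the agent's reasoning.
We ask the agent to find the latest robotics news. It first calls the Tavily search tool (named urls_finder) to find a relevant page, then calls SmartScraper to extract the articles from that page.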
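This is a minimal sketch of the streaming loop, assuming agent is the compiled graph from the previous step. With stream_mode="values", each yielded item is the full graph state after a step, so printing the newest message produces a trace like the one below. run_agent is a hypothetical helper.

```python
def run_agent(agent, question):
    """Stream the agent's state step by step and pretty-print the newest message."""
    inputs = {"messages": [("user", question)]}
    final_state = None
    # stream_mode="values" yields the whole state (including all messages) after each step
    for state in agent.stream(inputs, stream_mode="values"):
        state["messages"][-1].pretty_print()
        final_state = state
    return final_state
```

Usage: final_state = run_agent(agent, "Find latest news related to robotics December 2024"). Returning the last state keeps every message available for the printing and CSV steps.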
================================ Human Message =================================

Find latest news related to robotics December 2024
================================== Ai Message ==================================
Tool Calls:
  urls_finder (call_MpBC8kxJoRPFaBEoXLmSZ4RX)
 Call ID: call_MpBC8kxJoRPFaBEoXLmSZ4RX
  Args:
    query: latest robotics news December 2024
================================= Tool Message =================================
Name: urls_finder

[{"url": "https://www.therobotreport.com/category/news/", "content": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024. By The Robot Report Staff | December 19, ... Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks. ... December 17, 2024. Slip Robotics picks up"}]
================================== Ai Message ==================================
Tool Calls:
  SmartScraper (call_enar8djgNJfbcZAsp4nY1leM)
 Call ID: call_enar8djgNJfbcZAsp4nY1leM
  Args:
    user_prompt: Extract the latest news articles related to robotics from December 2024, including the title, date, and a brief summary of each article.
    website_url: https://www.therobotreport.com/category/news/
================================= Tool Message =================================
Name: SmartScraper

{"news": [{"title": "Matternet adds ANRA's UTM tech to expand drone delivery", "link": "https://www.therobotreport.com/matternet-adds-anras-utm-tech-to-expand-drone-delivery/", "description": "This latest partnership follows Matternet’s recent launch of a drone delivery operation in Silicon Valley."}, {"title": "Helm.ai upgrades generative AI model to enrich autonomous driving data", "link": "https://www.therobotreport.com/helm-ai-upgrades-generative-ai-model-to-enrich-autonomous-driving-data/", "description": "Helm.ai said the new model enables automakers to generate diverse, realistic video data tailored to specific requirements."}, {"title": "New research analyzes safety of Waymo robotaxis", "link": "https://www.therobotreport.com/new-research-analyzes-safety-of-waymo-robotaxis/", "description": "Waymo shared research with Swiss Re, one of the world’s largest insurance providers, analyzing liability claims related to collisions from 25.3 million fully autonomous miles driven."}, {"title": "From AI to humanoids: top robotics trends of 2024", "link": "https://www.therobotreport.com/from-ai-to-humanoids-top-robotics-trends-of-2024/", "description": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024."}, {"title": "Symbotic acquires OhmniLabs, maker of disinfection & telepresence robots", "link": "https://www.therobotreport.com/symbotic-buys-healthcare-robot-maker-ohmnilabs/", "description": "With the acquisition of OhmniLabs, Symbotic said it will be better positioned to expand its capabilities for supply chain customers."}, {"title": "Sanctuary AI shows new dexterity with in-hand manipulation skills", "link": "https://www.therobotreport.com/sanctuary-ai-showing-new-dexterity-with-in-hand-manipulation-skills/", "description": "Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks."}, {"title": "Apptronik partners with Google DeepMind to advance humanoid robots with AI", "link": "https://www.therobotreport.com/apptronik-partners-google-deepmind-advance-humanoid-robots-ai/", "description": "Apptronik will combine its iterative design experience and Apollo humanoid in testing with Google DeepMind’s AI platforms."}, {"title": "Alimak Group, Skyline Robotics create autonomous building maintenance unit", "link": "https://www.therobotreport.com/alimak-group-skyline-robotics-create-autonomous-building-maintenance-unit/", "description": "Skyline Robotics said the joint system can help the industry handle increasingly complex design challenges and labor shortages."}, {"title": "DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall", "link": "https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/", "description": "Beginning today, when certain DoorDash customers in Texas select drone delivery, their order will be delivered via Wing."}, {"title": "Mcity says open-source digital twin enables cheaper autonomous vehicle testing", "link": "https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/", "description": "The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere."}, {"title": "2024: The year humanoids woke up", "link": "https://www.therobotreport.com/2024-the-year-humanoids-woke-up/", "description": "Humanoids empowered by AI are coming, and the long-term market could be huge, Persona AI’s Nic Radford tells columnist Oliver Mitchell."}, {"title": "Waymo robotaxis head to Tokyo with the help of Nihan Kotsu and GO", "link": "https://www.therobotreport.com/waymo-is-heading-to-tokyo-with-the-help-of-nihan-kotsu-and-go/", "description": "The first all-electric Jaguar I-PACEs for Waymo will arrive in Tokyo in early 2025 and will initially be driven by safety drivers."}, {"title": "Realbotix earns Amazon development subsidy; partners with UOL", "link": "https://www.therobotreport.com/realbotix-earns-amazon-development-subsidy-partners-with-uol/", "description": "Realbotix plans to use the funding to directly support the completion of initiatives including the development of Robot Controller 3.0."}, {"title": "Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception", "link": "https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/", "description": "Prototype of the Eyeonic Trace Laser Line Scanner, designed to provide sub–millimeter depth precision for next generation warehouse automation, robotics, farming, construction and manufacturing applications."}, {"title": "Slip Robotics picks up $28M for trailer loading/unloading robots", "link": "https://www.therobotreport.com/slip-robotics-picks-up-28m-for-trailer-loading-unloading-robots/", "description": "Slip Robotics plans to use its latest funding to continue RɦD on its trailer loading/unloading robots as it serves commercial customers."}, {"title": "Jetson Orin Nano Super developer kit available from NVIDIA", "link": "https://www.therobotreport.com/jetson-orin-nano-super-developer-kit-available/", "description": "NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users."}, {"title": "Mbodi and T-Robotics are ABB Robotics' AI Startup Challenge winners", "link": "https://www.therobotreport.com/mbodi-and-t-robotics-are-abb-robotics-ai-startup-challenge-winners/", "description": "ABB Robotics is working with Mbodi and T-Robotics to make industrial robots easier to program and enable them to learn on their own."}, {"title": "IEEE Awards announce Daniela Rus as 2025 Edison Medal recipient", "link": "https://www.therobotreport.com/ieee-awards-announce-daniela-rus-2025-edison-medal-recipient/", "description": "Currently the director of MIT CSAIL, Daniela Rus’ research interests include robotics, mobile computing, and data science."}, {"title": "Eureka Robotics raises $10.5M to scale its vision systems in the U.S.", "link": "https://www.therobotreport.com/eureka-robotics-raises-10-5m-scale-its-vision-systems-in-u-s/", "description": "Eureka Robotics provides software and system to automate tasks that require high accuracy and high agility."}, {"title": "Vision-guided cobot automates paint process for DENSO", "link": "https://www.therobotreport.com/denso-automates-paint-process-vision-guided-cobot/", "description": "DENSO deployed a 3D-vision-guided cobot with AI-based motion planning and control software to relieve employees of strenuous, tedious tasks."}, {"title": "Brushed DC motors find use in robot applications, humanoid development", "link": "https://www.therobotreport.com/brushed-dc-motors-find-use-in-robot-applications-humanoid-development/", "description": "Recent research from Portescap found that brushed DC motors best fulfill the high requirements of humanoid robots."}, {"title": "Diversity and inclusion can accelerate robotics innovation, finds Max Planck study", "link": "https://www.therobotreport.com/diversity-and-inclusion-can-accelerate-robotic-innovation-finds-max-planck-study/", "description": "The study outlined seven distinct benefits that diversity and inclusion bring to robotics research and innovation."}, {"title": "Advanced Precision Strain Wave Gear Offers Torque Sensing to Robots", "link": "https://www.therobotreport.com/advanced-precision-strain-wave-gear-offers-torque-sensing-to-robots/", "description": "NA"}, {"title": "Innovative motion solutions are supporting the latest trends in robotics", "link": "https://www.therobotreport.com/innovative-motion-solutions-are-supporting-the-latest-trends-in-robotics/", "description": "NA"}, {"title": "Renishaw and RLS help to drive a robot revolution", "link": "https://www.therobotreport.com/renishaw-and-rls-help-to-drive-a-robot-revolution/", "description": "NA"}, {"title": "Ask an Expert Podcast: flexible conveyance for materials handling", "link": "https://www.therobotreport.com/ask-an-expert-flexible-conveyors-for-materials-handling/", "description": "NA"}, {"title": "Hop Onboard the AMR Revolution: Vision & Localization Unleashed", "link": "https://www.therobotreport.com/hop-onboard-the-amr-revolution-vision-localization-unleashed/", "description": "NA"}]}
Print the response
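This is a minimal sketch of the printing step. It assumes the SmartScraper result is stored as a JSON string in the content of a tool message named "SmartScraper", as in the trace above. last_tool_output and pretty_json are hypothetical helpers. json.dumps escapes non-ASCII characters by default, which is why the curly apostrophes appear as \u2019 in the printed output.

```python
import json


def last_tool_output(messages, tool_name="SmartScraper"):
    """Hypothetical helper: content of the most recent message produced by tool_name."""
    for message in reversed(messages):
        if getattr(message, "name", None) == tool_name:
            return message.content
    return None


def pretty_json(content):
    """Re-indent a JSON string; non-ASCII characters are escaped as \\uXXXX by default."""
    return json.dumps(json.loads(content), indent=2)
```

Usage: print(pretty_json(last_tool_output(final_state["messages"]))), where final_state is the last state streamed from the agent.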
{
  "news": [
    {
      "title": "Matternet adds ANRA's UTM tech to expand drone delivery",
      "link": "https://www.therobotreport.com/matternet-adds-anras-utm-tech-to-expand-drone-delivery/",
      "description": "This latest partnership follows Matternet\u2019s recent launch of a drone delivery operation in Silicon Valley."
    },
    {
      "title": "Helm.ai upgrades generative AI model to enrich autonomous driving data",
      "link": "https://www.therobotreport.com/helm-ai-upgrades-generative-ai-model-to-enrich-autonomous-driving-data/",
      "description": "Helm.ai said the new model enables automakers to generate diverse, realistic video data tailored to specific requirements."
    },
    {
      "title": "New research analyzes safety of Waymo robotaxis",
      "link": "https://www.therobotreport.com/new-research-analyzes-safety-of-waymo-robotaxis/",
      "description": "Waymo shared research with Swiss Re, one of the world\u2019s largest insurance providers, analyzing liability claims related to collisions from 25.3 million fully autonomous miles driven."
    },
    {
      "title": "From AI to humanoids: top robotics trends of 2024",
      "link": "https://www.therobotreport.com/from-ai-to-humanoids-top-robotics-trends-of-2024/",
      "description": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024."
    },
    {
      "title": "Symbotic acquires OhmniLabs, maker of disinfection & telepresence robots",
      "link": "https://www.therobotreport.com/symbotic-buys-healthcare-robot-maker-ohmnilabs/",
      "description": "With the acquisition of OhmniLabs, Symbotic said it will be better positioned to expand its capabilities for supply chain customers."
    },
    {
      "title": "Sanctuary AI shows new dexterity with in-hand manipulation skills",
      "link": "https://www.therobotreport.com/sanctuary-ai-showing-new-dexterity-with-in-hand-manipulation-skills/",
      "description": "Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks."
    },
    {
      "title": "Apptronik partners with Google DeepMind to advance humanoid robots with AI",
      "link": "https://www.therobotreport.com/apptronik-partners-google-deepmind-advance-humanoid-robots-ai/",
      "description": "Apptronik will combine its iterative design experience and Apollo humanoid in testing with Google DeepMind\u2019s AI platforms."
    },
    {
      "title": "Alimak Group, Skyline Robotics create autonomous building maintenance unit",
      "link": "https://www.therobotreport.com/alimak-group-skyline-robotics-create-autonomous-building-maintenance-unit/",
      "description": "Skyline Robotics said the joint system can help the industry handle increasingly complex design challenges and labor shortages."
    },
    {
      "title": "DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall",
      "link": "https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/",
      "description": "Beginning today, when certain DoorDash customers in Texas select drone delivery, their order will be delivered via Wing."
    },
    {
      "title": "Mcity says open-source digital twin enables cheaper autonomous vehicle testing",
      "link": "https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/",
      "description": "The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere."
    },
    {
      "title": "2024: The year humanoids woke up",
      "link": "https://www.therobotreport.com/2024-the-year-humanoids-woke-up/",
      "description": "Humanoids empowered by AI are coming, and the long-term market could be huge, Persona AI\u2019s Nic Radford tells columnist Oliver Mitchell."
    },
    {
      "title": "Waymo robotaxis head to Tokyo with the help of Nihan Kotsu and GO",
      "link": "https://www.therobotreport.com/waymo-is-heading-to-tokyo-with-the-help-of-nihan-kotsu-and-go/",
      "description": "The first all-electric Jaguar I-PACEs for Waymo will arrive in Tokyo in early 2025 and will initially be driven by safety drivers."
    },
    {
      "title": "Realbotix earns Amazon development subsidy; partners with UOL",
      "link": "https://www.therobotreport.com/realbotix-earns-amazon-development-subsidy-partners-with-uol/",
      "description": "Realbotix plans to use the funding to directly support the completion of initiatives including the development of Robot Controller 3.0."
    },
    {
      "title": "Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception",
      "link": "https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/",
      "description": "Prototype of the Eyeonic Trace Laser Line Scanner, designed to provide sub\u2013millimeter depth precision for next generation warehouse automation, robotics, farming, construction and manufacturing applications."
    },
    {
      "title": "Slip Robotics picks up $28M for trailer loading/unloading robots",
      "link": "https://www.therobotreport.com/slip-robotics-picks-up-28m-for-trailer-loading-unloading-robots/",
      "description": "Slip Robotics plans to use its latest funding to continue R\u0266D on its trailer loading/unloading robots as it serves commercial customers."
    },
    {
      "title": "Jetson Orin Nano Super developer kit available from NVIDIA",
      "link": "https://www.therobotreport.com/jetson-orin-nano-super-developer-kit-available/",
      "description": "NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users."
    },
    {
      "title": "Mbodi and T-Robotics are ABB Robotics' AI Startup Challenge winners",
      "link": "https://www.therobotreport.com/mbodi-and-t-robotics-are-abb-robotics-ai-startup-challenge-winners/",
      "description": "ABB Robotics is working with Mbodi and T-Robotics to make industrial robots easier to program and enable them to learn on their own."
    },
    {
      "title": "IEEE Awards announce Daniela Rus as 2025 Edison Medal recipient",
      "link": "https://www.therobotreport.com/ieee-awards-announce-daniela-rus-2025-edison-medal-recipient/",
      "description": "Currently the director of MIT CSAIL, Daniela Rus\u2019 research interests include robotics, mobile computing, and data science."
    },
    {
      "title": "Eureka Robotics raises $10.5M to scale its vision systems in the U.S.",
      "link": "https://www.therobotreport.com/eureka-robotics-raises-10-5m-scale-its-vision-systems-in-u-s/",
      "description": "Eureka Robotics provides software and system to automate tasks that require high accuracy and high agility."
    },
    {
      "title": "Vision-guided cobot automates paint process for DENSO",
      "link": "https://www.therobotreport.com/denso-automates-paint-process-vision-guided-cobot/",
      "description": "DENSO deployed a 3D-vision-guided cobot with AI-based motion planning and control software to relieve employees of strenuous, tedious tasks."
    },
    {
      "title": "Brushed DC motors find use in robot applications, humanoid development",
      "link": "https://www.therobotreport.com/brushed-dc-motors-find-use-in-robot-applications-humanoid-development/",
      "description": "Recent research from Portescap found that brushed DC motors best fulfill the high requirements of humanoid robots."
    },
    {
      "title": "Diversity and inclusion can accelerate robotics innovation, finds Max Planck study",
      "link": "https://www.therobotreport.com/diversity-and-inclusion-can-accelerate-robotic-innovation-finds-max-planck-study/",
      "description": "The study outlined seven distinct benefits that diversity and inclusion bring to robotics research and innovation."
    },
    {
      "title": "Advanced Precision Strain Wave Gear Offers Torque Sensing to Robots",
      "link": "https://www.therobotreport.com/advanced-precision-strain-wave-gear-offers-torque-sensing-to-robots/",
      "description": "NA"
    },
    {
      "title": "Innovative motion solutions are supporting the latest trends in robotics",
      "link": "https://www.therobotreport.com/innovative-motion-solutions-are-supporting-the-latest-trends-in-robotics/",
      "description": "NA"
    },
    {
      "title": "Renishaw and RLS help to drive a robot revolution",
      "link": "https://www.therobotreport.com/renishaw-and-rls-help-to-drive-a-robot-revolution/",
      "description": "NA"
    },
    {
      "title": "Ask an Expert Podcast: flexible conveyance for materials handling",
      "link": "https://www.therobotreport.com/ask-an-expert-flexible-conveyors-for-materials-handling/",
      "description": "NA"
    },
    {
      "title": "Hop Onboard the AMR Revolution: Vision & Localization Unleashed",
      "link": "https://www.therobotreport.com/hop-onboard-the-amr-revolution-vision-localization-unleashed/",
      "description": "NA"
    }
  ]
}
Save the output to a CSV file
Let's create a pandas DataFrame and display a table of the extracted content.
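This is a minimal sketch, assuming the SmartScraper output is the JSON string printed above, with a top-level "news" list. Each article dict becomes one row, with title, link, and description columns. news_to_dataframe is a hypothetical helper.

```python
import json

import pandas as pd


def news_to_dataframe(content):
    """Hypothetical helper: one DataFrame row per article in the SmartScraper JSON string."""
    data = json.loads(content)
    # A list of dicts maps each key (title, link, description) to a column
    return pd.DataFrame(data["news"])
```

Usage: df = news_to_dataframe(content), then df.head() shows the first rows as a table.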
Save it to CSV
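Saving is a single to_csv call. This sketch assumes df is the DataFrame built in the previous step. index=False keeps pandas' integer row index out of the file, and save_news_csv is a hypothetical helper.

```python
import pandas as pd


def save_news_csv(df: pd.DataFrame, path: str = "news.csv") -> None:
    """Hypothetical helper: write the articles to CSV and confirm the path."""
    # index=False omits the DataFrame's row index column from the file
    df.to_csv(path, index=False)
    print(f"Data saved to {path}")
```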
Data saved to news.csv
Resources
- Get your API Key: ScrapeGraphAI Dashboard
- GitHub: ScrapeGraphAI GitHub
- LinkedIn: ScrapeGraphAI LinkedIn
- Twitter: ScrapeGraphAI Twitter
- Discord: Join our Discord Community
- LangChain: ScrapeGraph docs
Made with ❤️ by the ScrapeGraphAI Team