🕷️ Research Agent with scrapegraph, langgraph, and tavily

🔧 Install dependencies

[1]
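
As a minimal sketch of an environment check, the snippet below lists which packages still need installing and prints a matching pip command. The package names are assumptions inferred from the tools used later in this notebook (ScrapeGraph, Tavily, OpenAI, LangGraph), not a list taken from the notebook itself.

```python
import importlib.util

# Assumed mapping of import name -> pip package name (not from the notebook).
REQUIRED = {
    "langgraph": "langgraph",
    "langchain_openai": "langchain-openai",
    "langchain_community": "langchain-community",
    "langchain_scrapegraph": "langchain-scrapegraph",
    "tavily": "tavily-python",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    # Copy this line into a terminal (or prefix it with % in a notebook cell).
    print("pip install " + " ".join(missing))
else:
    print("All assumed dependencies are already installed.")
```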

🔑 Import ScrapeGraph, Tavily and OpenAI API keys

You can find the Scrapegraph API key here

[2]
Scrapegraph API key:
··········
Tavily API key:
··········
OpenAI API key:
··········
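
One way to produce masked prompts like the ones above is `getpass` from the standard library. The sketch below stores each key in an environment variable; the variable names `SGAI_API_KEY`, `TAVILY_API_KEY` and `OPENAI_API_KEY` are assumptions about what the respective integrations read, and `ensure_api_keys` is a hypothetical helper.

```python
import os
from getpass import getpass

# Assumed environment variable names, paired with the prompts shown above.
KEY_PROMPTS = {
    "SGAI_API_KEY": "Scrapegraph API key:",
    "TAVILY_API_KEY": "Tavily API key:",
    "OPENAI_API_KEY": "OpenAI API key:",
}


def ensure_api_keys(prompts=KEY_PROMPTS, ask=getpass, env=os.environ):
    """Prompt (with hidden input) for every key that is not already set."""
    for name, prompt in prompts.items():
        if not env.get(name):
            env[name] = ask(prompt + "\n")
    return list(prompts)


# In a notebook cell, call it once before creating any tools:
# ensure_api_keys()
```

Keys that are already present in the environment are left untouched, so the cell can be re-run without retyping them.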

📝 Defining an Output Schema for Webpage Content Extraction

If you already know what you want to extract from a webpage, you can define an output schema using Pydantic. This schema acts as a "blueprint" that tells the AI how to structure the response.

Pydantic Schema Quick Guide

Types of Schemas

  1. Simple Schema
    Use this when you want to extract straightforward information, such as a single piece of content.
from pydantic import BaseModel, Field

# Simple schema for a single webpage
class PageInfoSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")

# Example Output JSON after AI extraction
{
    "title": "ScrapeGraphAI: The Best Content Extraction Tool",
    "description": "ScrapeGraphAI provides powerful tools for structured content extraction from websites."
}

  2. Complex Schema (Nested)
    If you need to extract structured information with multiple related items (like a list of repositories), you can nest schemas.
from pydantic import BaseModel, Field
from typing import List

# Define a schema for a single repository
class RepositorySchema(BaseModel):
    name: str = Field(description="Name of the repository (e.g., 'owner/repo')")
    description: str = Field(description="Description of the repository")
    stars: int = Field(description="Star count of the repository")
    forks: int = Field(description="Fork count of the repository")
    today_stars: int = Field(description="Stars gained today")
    language: str = Field(description="Programming language used")

# Define a schema for a list of repositories
class ListRepositoriesSchema(BaseModel):
    repositories: List[RepositorySchema] = Field(description="List of GitHub trending repositories")

# Example Output JSON after AI extraction
{
    "repositories": [
        {
            "name": "google-gemini/cookbook",
            "description": "Examples and guides for using the Gemini API",
            "stars": 8036,
            "forks": 1001,
            "today_stars": 649,
            "language": "Jupyter Notebook"
        },
        {
            "name": "TEN-framework/TEN-Agent",
            "description": "TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.",
            "stars": 3224,
            "forks": 311,
            "today_stars": 361,
            "language": "Python"
        }
    ]
}

Key Takeaways

  • Simple Schema: Perfect for small, straightforward extractions.
  • Complex Schema: Use nesting to extract lists or structured data, like "a list of repositories."

Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.
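
To make the "blueprint" idea concrete, here is a dependency-free sketch of the kind of shape check a schema enables. It is a stand-in written with plain Python types, not Pydantic's validation, and `matches_schema`, `PAGE_INFO` and `REPO_LIST` are hypothetical names that mirror the two schemas above.

```python
def matches_schema(data, schema):
    """Check data against a tiny schema language.

    A schema is a type (str, int), a dict of field name -> schema,
    or a one-element list [item_schema] describing a homogeneous list.
    """
    if isinstance(schema, dict):
        return (
            isinstance(data, dict)
            and set(data) == set(schema)
            and all(matches_schema(data[key], sub) for key, sub in schema.items())
        )
    if isinstance(schema, list):
        return isinstance(data, list) and all(
            matches_schema(item, schema[0]) for item in data
        )
    if schema is int and isinstance(data, bool):
        return False  # bool is a subclass of int; reject it for int fields
    return isinstance(data, schema)


# Plain-type mirrors of PageInfoSchema and ListRepositoriesSchema.
PAGE_INFO = {"title": str, "description": str}
REPO_LIST = {
    "repositories": [
        {
            "name": str,
            "description": str,
            "stars": int,
            "forks": int,
            "today_stars": int,
            "language": str,
        }
    ]
}
```

With Pydantic itself, the same check and type coercion come from the model classes defined above, so a helper like this is only useful for seeing what the schema constrains.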

[17]

🚀 Initialize the scrapegraph and tavily tools and the langgraph prebuilt agent, then run the extraction

Here we use SmartScraperTool to extract structured data using AI from a webpage.

If you already have an HTML file, you can upload it and use LocalScraperTool instead.

You can find more info in the official LangChain documentation
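
A minimal sketch of the tool setup, assuming the `langchain-scrapegraph` and `langchain-community` packages are installed and the API keys are available in the environment. The tool name `urls_finder` is taken from the agent trace further down; `max_results=1` and the helper name `build_tools` are assumptions made for illustration.

```python
def build_tools(max_results=1):
    """Create the search tool and the scraping tool used by the agent.

    Imports are done lazily so this function can be defined even when the
    third-party packages are not installed yet.
    """
    from langchain_community.tools import TavilySearchResults
    from langchain_scrapegraph.tools import SmartScraperTool

    # Tavily returns candidate URLs; the name matches the trace below.
    url_finder = TavilySearchResults(max_results=max_results, name="urls_finder")

    # SmartScraperTool extracts structured data from a URL given a prompt.
    # It is assumed here to pick up the ScrapeGraph API key from the environment.
    smartscraper = SmartScraperTool()

    return [url_finder, smartscraper]


# tools = build_tools()
```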

[18]

We then initialize the LLM that the agent will use

[5]

Here we use create_react_agent from the langgraph.prebuilt module to set up one of the prebuilt agents quickly

You can find more info in the official LangGraph documentation
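
As a sketch of these two steps together, the helper below creates a chat model and hands it to `create_react_agent` along with a list of tools. The model name `gpt-4o-mini`, the `temperature=0` setting and the helper name `build_agent` are assumptions; the notebook does not state which model it uses.

```python
def build_agent(tools, model="gpt-4o-mini"):
    """Return a prebuilt ReAct-style agent graph wired to the given tools.

    Imports are lazy so the function can be defined before the
    third-party packages are installed.
    """
    from langchain_openai import ChatOpenAI
    from langgraph.prebuilt import create_react_agent

    # The model name is an assumption; any tool-calling chat model should work.
    llm = ChatOpenAI(model=model, temperature=0)

    # The prebuilt agent loops between the LLM and the tools until the
    # LLM stops requesting tool calls.
    return create_react_agent(llm, tools)


# graph = build_agent(tools)
```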

[19]

Let's visualize the graph

[8]
Output

Run the graph and stream the agent reasoning.

We ask the agent to find the latest robotics news: it first searches for a relevant page, then extracts the articles from that page.
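
A minimal sketch of the streaming loop, assuming `graph` is the agent created with `create_react_agent`. With `stream_mode="values"` each step yields the full state, so printing the last message of every state produces a trace like the one below. `stream_agent` is a hypothetical helper name.

```python
def stream_agent(graph, question):
    """Stream the agent's steps, print each new message, return the last one."""
    inputs = {"messages": [("user", question)]}
    final_message = None
    for state in graph.stream(inputs, stream_mode="values"):
        final_message = state["messages"][-1]
        # LangChain messages provide pretty_print(); fall back to print()
        # so the helper also works with plain objects.
        printer = getattr(final_message, "pretty_print", None)
        if callable(printer):
            printer()
        else:
            print(final_message)
    return final_message


# stream_agent(graph, "Find latest news related to robotics December 2024")
```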

[20]
================================ Human Message =================================

Find latest news related to robotics December 2024
================================== Ai Message ==================================
Tool Calls:
  urls_finder (call_MpBC8kxJoRPFaBEoXLmSZ4RX)
 Call ID: call_MpBC8kxJoRPFaBEoXLmSZ4RX
  Args:
    query: latest robotics news December 2024
================================= Tool Message =================================
Name: urls_finder

[{"url": "https://www.therobotreport.com/category/news/", "content": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024. By The Robot Report Staff | December 19, ... Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks. ... December 17, 2024. Slip Robotics picks up"}]
================================== Ai Message ==================================
Tool Calls:
  SmartScraper (call_enar8djgNJfbcZAsp4nY1leM)
 Call ID: call_enar8djgNJfbcZAsp4nY1leM
  Args:
    user_prompt: Extract the latest news articles related to robotics from December 2024, including the title, date, and a brief summary of each article.
    website_url: https://www.therobotreport.com/category/news/
================================= Tool Message =================================
Name: SmartScraper

{"news": [{"title": "Matternet adds ANRA's UTM tech to expand drone delivery", "link": "https://www.therobotreport.com/matternet-adds-anras-utm-tech-to-expand-drone-delivery/", "description": "This latest partnership follows Matternet’s recent launch of a drone delivery operation in Silicon Valley."}, {"title": "Helm.ai upgrades generative AI model to enrich autonomous driving data", "link": "https://www.therobotreport.com/helm-ai-upgrades-generative-ai-model-to-enrich-autonomous-driving-data/", "description": "Helm.ai said the new model enables automakers to generate diverse, realistic video data tailored to specific requirements."}, {"title": "New research analyzes safety of Waymo robotaxis", "link": "https://www.therobotreport.com/new-research-analyzes-safety-of-waymo-robotaxis/", "description": "Waymo shared research with Swiss Re, one of the world’s largest insurance providers, analyzing liability claims related to collisions from 25.3 million fully autonomous miles driven."}, {"title": "From AI to humanoids: top robotics trends of 2024", "link": "https://www.therobotreport.com/from-ai-to-humanoids-top-robotics-trends-of-2024/", "description": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024."}, {"title": "Symbotic acquires OhmniLabs, maker of disinfection & telepresence robots", "link": "https://www.therobotreport.com/symbotic-buys-healthcare-robot-maker-ohmnilabs/", "description": "With the acquisition of OhmniLabs, Symbotic said it will be better positioned to expand its capabilities for supply chain customers."}, {"title": "Sanctuary AI shows new dexterity with in-hand manipulation skills", "link": "https://www.therobotreport.com/sanctuary-ai-showing-new-dexterity-with-in-hand-manipulation-skills/", "description": "Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks."}, {"title": "Apptronik partners with Google DeepMind to advance humanoid robots with AI", "link": "https://www.therobotreport.com/apptronik-partners-google-deepmind-advance-humanoid-robots-ai/", "description": "Apptronik will combine its iterative design experience and Apollo humanoid in testing with Google DeepMind’s AI platforms."}, {"title": "Alimak Group, Skyline Robotics create autonomous building maintenance unit", "link": "https://www.therobotreport.com/alimak-group-skyline-robotics-create-autonomous-building-maintenance-unit/", "description": "Skyline Robotics said the joint system can help the industry handle increasingly complex design challenges and labor shortages."}, {"title": "DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall", "link": "https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/", "description": "Beginning today, when certain DoorDash customers in Texas select drone delivery, their order will be delivered via Wing."}, {"title": "Mcity says open-source digital twin enables cheaper autonomous vehicle testing", "link": "https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/", "description": "The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere."}, {"title": "2024: The year humanoids woke up", "link": "https://www.therobotreport.com/2024-the-year-humanoids-woke-up/", "description": "Humanoids empowered by AI are coming, and the long-term market could be huge, Persona AI’s Nic Radford tells columnist Oliver Mitchell."}, {"title": "Waymo robotaxis head to Tokyo with the help of Nihan Kotsu and GO", "link": "https://www.therobotreport.com/waymo-is-heading-to-tokyo-with-the-help-of-nihan-kotsu-and-go/", "description": "The first all-electric Jaguar I-PACEs for Waymo will arrive in Tokyo in early 2025 and will initially be driven by safety drivers."}, {"title": "Realbotix earns Amazon development subsidy; partners with UOL", "link": "https://www.therobotreport.com/realbotix-earns-amazon-development-subsidy-partners-with-uol/", "description": "Realbotix plans to use the funding to directly support the completion of initiatives including the development of Robot Controller 3.0."}, {"title": "Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception", "link": "https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/", "description": "Prototype of the Eyeonic Trace Laser Line Scanner, designed to provide sub–millimeter depth precision for next generation warehouse automation, robotics, farming, construction and manufacturing applications."}, {"title": "Slip Robotics picks up $28M for trailer loading/unloading robots", "link": "https://www.therobotreport.com/slip-robotics-picks-up-28m-for-trailer-loading-unloading-robots/", "description": "Slip Robotics plans to use its latest funding to continue RɦD on its trailer loading/unloading robots as it serves commercial customers."}, {"title": "Jetson Orin Nano Super developer kit available from NVIDIA", "link": "https://www.therobotreport.com/jetson-orin-nano-super-developer-kit-available/", "description": "NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users."}, {"title": "Mbodi and T-Robotics are ABB Robotics' AI Startup Challenge winners", "link": "https://www.therobotreport.com/mbodi-and-t-robotics-are-abb-robotics-ai-startup-challenge-winners/", "description": "ABB Robotics is working with Mbodi and T-Robotics to make industrial robots easier to program and enable them to learn on their own."}, {"title": "IEEE Awards announce Daniela Rus as 2025 Edison Medal recipient", "link": "https://www.therobotreport.com/ieee-awards-announce-daniela-rus-2025-edison-medal-recipient/", "description": "Currently the director of MIT CSAIL, Daniela Rus’ research interests include robotics, mobile computing, and data science."}, {"title": "Eureka Robotics raises $10.5M to scale its vision systems in the U.S.", "link": "https://www.therobotreport.com/eureka-robotics-raises-10-5m-scale-its-vision-systems-in-u-s/", "description": "Eureka Robotics provides software and system to automate tasks that require high accuracy and high agility."}, {"title": "Vision-guided cobot automates paint process for DENSO", "link": "https://www.therobotreport.com/denso-automates-paint-process-vision-guided-cobot/", "description": "DENSO deployed a 3D-vision-guided cobot with AI-based motion planning and control software to relieve employees of strenuous, tedious tasks."}, {"title": "Brushed DC motors find use in robot applications, humanoid development", "link": "https://www.therobotreport.com/brushed-dc-motors-find-use-in-robot-applications-humanoid-development/", "description": "Recent research from Portescap found that brushed DC motors best fulfill the high requirements of humanoid robots."}, {"title": "Diversity and inclusion can accelerate robotics innovation, finds Max Planck study", "link": "https://www.therobotreport.com/diversity-and-inclusion-can-accelerate-robotic-innovation-finds-max-planck-study/", "description": "The study outlined seven distinct benefits that diversity and inclusion bring to robotics research and innovation."}, {"title": "Advanced Precision Strain Wave Gear Offers Torque Sensing to Robots", "link": "https://www.therobotreport.com/advanced-precision-strain-wave-gear-offers-torque-sensing-to-robots/", "description": "NA"}, {"title": "Innovative motion solutions are supporting the latest trends in robotics", "link": "https://www.therobotreport.com/innovative-motion-solutions-are-supporting-the-latest-trends-in-robotics/", "description": "NA"}, {"title": "Renishaw and RLS help to drive a robot revolution", "link": "https://www.therobotreport.com/renishaw-and-rls-help-to-drive-a-robot-revolution/", "description": "NA"}, {"title": "Ask an Expert Podcast: flexible conveyance for materials handling", "link": "https://www.therobotreport.com/ask-an-expert-flexible-conveyors-for-materials-handling/", "description": "NA"}, {"title": "Hop Onboard the AMR Revolution: Vision & Localization Unleashed", "link": "https://www.therobotreport.com/hop-onboard-the-amr-revolution-vision-localization-unleashed/", "description": "NA"}]}
[21]

Print the response
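
A minimal sketch of the printing step, assuming the scraper's result is available as a raw JSON string. `json.dumps` with `indent=2` and its default `ensure_ascii=True` gives indented output in which non-ASCII characters appear as `\uXXXX` escapes, as in the listing below. `pretty_json` is a hypothetical helper name.

```python
import json


def pretty_json(raw):
    """Parse a JSON string and re-serialize it with two-space indentation."""
    return json.dumps(json.loads(raw), indent=2)


# print(pretty_json(tool_message_content))  # tool_message_content: assumed variable
```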

[23]
{
  "news": [
    {
      "title": "Matternet adds ANRA's UTM tech to expand drone delivery",
      "link": "https://www.therobotreport.com/matternet-adds-anras-utm-tech-to-expand-drone-delivery/",
      "description": "This latest partnership follows Matternet\u2019s recent launch of a drone delivery operation in Silicon Valley."
    },
    {
      "title": "Helm.ai upgrades generative AI model to enrich autonomous driving data",
      "link": "https://www.therobotreport.com/helm-ai-upgrades-generative-ai-model-to-enrich-autonomous-driving-data/",
      "description": "Helm.ai said the new model enables automakers to generate diverse, realistic video data tailored to specific requirements."
    },
    {
      "title": "New research analyzes safety of Waymo robotaxis",
      "link": "https://www.therobotreport.com/new-research-analyzes-safety-of-waymo-robotaxis/",
      "description": "Waymo shared research with Swiss Re, one of the world\u2019s largest insurance providers, analyzing liability claims related to collisions from 25.3 million fully autonomous miles driven."
    },
    {
      "title": "From AI to humanoids: top robotics trends of 2024",
      "link": "https://www.therobotreport.com/from-ai-to-humanoids-top-robotics-trends-of-2024/",
      "description": "The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024."
    },
    {
      "title": "Symbotic acquires OhmniLabs, maker of disinfection & telepresence robots",
      "link": "https://www.therobotreport.com/symbotic-buys-healthcare-robot-maker-ohmnilabs/",
      "description": "With the acquisition of OhmniLabs, Symbotic said it will be better positioned to expand its capabilities for supply chain customers."
    },
    {
      "title": "Sanctuary AI shows new dexterity with in-hand manipulation skills",
      "link": "https://www.therobotreport.com/sanctuary-ai-showing-new-dexterity-with-in-hand-manipulation-skills/",
      "description": "Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open up a range of high-value tasks."
    },
    {
      "title": "Apptronik partners with Google DeepMind to advance humanoid robots with AI",
      "link": "https://www.therobotreport.com/apptronik-partners-google-deepmind-advance-humanoid-robots-ai/",
      "description": "Apptronik will combine its iterative design experience and Apollo humanoid in testing with Google DeepMind\u2019s AI platforms."
    },
    {
      "title": "Alimak Group, Skyline Robotics create autonomous building maintenance unit",
      "link": "https://www.therobotreport.com/alimak-group-skyline-robotics-create-autonomous-building-maintenance-unit/",
      "description": "Skyline Robotics said the joint system can help the industry handle increasingly complex design challenges and labor shortages."
    },
    {
      "title": "DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall",
      "link": "https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/",
      "description": "Beginning today, when certain DoorDash customers in Texas select drone delivery, their order will be delivered via Wing."
    },
    {
      "title": "Mcity says open-source digital twin enables cheaper autonomous vehicle testing",
      "link": "https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/",
      "description": "The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere."
    },
    {
      "title": "2024: The year humanoids woke up",
      "link": "https://www.therobotreport.com/2024-the-year-humanoids-woke-up/",
      "description": "Humanoids empowered by AI are coming, and the long-term market could be huge, Persona AI\u2019s Nic Radford tells columnist Oliver Mitchell."
    },
    {
      "title": "Waymo robotaxis head to Tokyo with the help of Nihan Kotsu and GO",
      "link": "https://www.therobotreport.com/waymo-is-heading-to-tokyo-with-the-help-of-nihan-kotsu-and-go/",
      "description": "The first all-electric Jaguar I-PACEs for Waymo will arrive in Tokyo in early 2025 and will initially be driven by safety drivers."
    },
    {
      "title": "Realbotix earns Amazon development subsidy; partners with UOL",
      "link": "https://www.therobotreport.com/realbotix-earns-amazon-development-subsidy-partners-with-uol/",
      "description": "Realbotix plans to use the funding to directly support the completion of initiatives including the development of Robot Controller 3.0."
    },
    {
      "title": "Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception",
      "link": "https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/",
      "description": "Prototype of the Eyeonic Trace Laser Line Scanner, designed to provide sub\u2013millimeter depth precision for next generation warehouse automation, robotics, farming, construction and manufacturing applications."
    },
    {
      "title": "Slip Robotics picks up $28M for trailer loading/unloading robots",
      "link": "https://www.therobotreport.com/slip-robotics-picks-up-28m-for-trailer-loading-unloading-robots/",
      "description": "Slip Robotics plans to use its latest funding to continue R\u0266D on its trailer loading/unloading robots as it serves commercial customers."
    },
    {
      "title": "Jetson Orin Nano Super developer kit available from NVIDIA",
      "link": "https://www.therobotreport.com/jetson-orin-nano-super-developer-kit-available/",
      "description": "NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users."
    },
    {
      "title": "Mbodi and T-Robotics are ABB Robotics' AI Startup Challenge winners",
      "link": "https://www.therobotreport.com/mbodi-and-t-robotics-are-abb-robotics-ai-startup-challenge-winners/",
      "description": "ABB Robotics is working with Mbodi and T-Robotics to make industrial robots easier to program and enable them to learn on their own."
    },
    {
      "title": "IEEE Awards announce Daniela Rus as 2025 Edison Medal recipient",
      "link": "https://www.therobotreport.com/ieee-awards-announce-daniela-rus-2025-edison-medal-recipient/",
      "description": "Currently the director of MIT CSAIL, Daniela Rus\u2019 research interests include robotics, mobile computing, and data science."
    },
    {
      "title": "Eureka Robotics raises $10.5M to scale its vision systems in the U.S.",
      "link": "https://www.therobotreport.com/eureka-robotics-raises-10-5m-scale-its-vision-systems-in-u-s/",
      "description": "Eureka Robotics provides software and system to automate tasks that require high accuracy and high agility."
    },
    {
      "title": "Vision-guided cobot automates paint process for DENSO",
      "link": "https://www.therobotreport.com/denso-automates-paint-process-vision-guided-cobot/",
      "description": "DENSO deployed a 3D-vision-guided cobot with AI-based motion planning and control software to relieve employees of strenuous, tedious tasks."
    },
    {
      "title": "Brushed DC motors find use in robot applications, humanoid development",
      "link": "https://www.therobotreport.com/brushed-dc-motors-find-use-in-robot-applications-humanoid-development/",
      "description": "Recent research from Portescap found that brushed DC motors best fulfill the high requirements of humanoid robots."
    },
    {
      "title": "Diversity and inclusion can accelerate robotics innovation, finds Max Planck study",
      "link": "https://www.therobotreport.com/diversity-and-inclusion-can-accelerate-robotic-innovation-finds-max-planck-study/",
      "description": "The study outlined seven distinct benefits that diversity and inclusion bring to robotics research and innovation."
    },
    {
      "title": "Advanced Precision Strain Wave Gear Offers Torque Sensing to Robots",
      "link": "https://www.therobotreport.com/advanced-precision-strain-wave-gear-offers-torque-sensing-to-robots/",
      "description": "NA"
    },
    {
      "title": "Innovative motion solutions are supporting the latest trends in robotics",
      "link": "https://www.therobotreport.com/innovative-motion-solutions-are-supporting-the-latest-trends-in-robotics/",
      "description": "NA"
    },
    {
      "title": "Renishaw and RLS help to drive a robot revolution",
      "link": "https://www.therobotreport.com/renishaw-and-rls-help-to-drive-a-robot-revolution/",
      "description": "NA"
    },
    {
      "title": "Ask an Expert Podcast: flexible conveyance for materials handling",
      "link": "https://www.therobotreport.com/ask-an-expert-flexible-conveyors-for-materials-handling/",
      "description": "NA"
    },
    {
      "title": "Hop Onboard the AMR Revolution: Vision & Localization Unleashed",
      "link": "https://www.therobotreport.com/hop-onboard-the-amr-revolution-vision-localization-unleashed/",
      "description": "NA"
    }
  ]
}

💾 Save the output to a CSV file

Let's create a pandas dataframe and show the table with the extracted content

[24]

Save it to CSV

[26]
Data saved to news.csv
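
For environments without pandas, the same file can be written with the standard-library `csv` module. This is an alternative sketch, not the notebook's own pandas code; the column names come from the extracted `news` items above, and `news_to_csv` and `save_news_csv` are hypothetical helpers.

```python
import csv
import io

FIELDS = ("title", "link", "description")


def news_to_csv(items, fields=FIELDS):
    """Serialize a list of news dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


def save_news_csv(items, path="news.csv"):
    """Write the CSV text to disk and report where it went."""
    # newline="" keeps the csv module's own line endings unchanged.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(news_to_csv(items))
    print(f"Data saved to {path}")
    return path


# save_news_csv(result["news"])  # result: the parsed JSON shown earlier (assumed)
```

Titles containing commas are quoted automatically by the `csv` writer, so the rows stay aligned with the three columns.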

🔗 Resources

Made with ❤️ by the ScrapeGraphAI Team