ScrapeGraph SDK


🕷️ Extract House Listings with the Official ScrapeGraph SDK


🔧 Install dependencies


🔑 Import ScrapeGraph API key

You can find your ScrapeGraph API key here.
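A minimal sketch of this step, assuming the key is stored in an environment variable named `SGAI_API_KEY` (the name printed in the cell output below). The helper `load_api_key` is hypothetical; it falls back to a hidden `getpass` prompt when the variable is missing.

```python
import os
from getpass import getpass


def load_api_key(env=os.environ):
    """Return the ScrapeGraph API key from `env`, prompting only if it is missing.

    Hypothetical helper: the variable name SGAI_API_KEY matches the message
    printed by the notebook cell.
    """
    key = env.get("SGAI_API_KEY")
    if key:
        print("SGAI_API_KEY found in environment.")
        return key
    # Hidden prompt so the key is not echoed into the notebook output
    key = getpass("Enter your SGAI_API_KEY: ")
    env["SGAI_API_KEY"] = key  # cache it so later cells can read it
    return key
```

In the notebook you would call `api_key = load_api_key()`.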

SGAI_API_KEY found in environment.

📝 Defining an Output Schema for Webpage Content Extraction

If you already know what you want to extract from a webpage, you can define an output schema using Pydantic. This schema acts as a "blueprint" that tells the AI how to structure the response.

Pydantic Schema Quick Guide

Types of Schemas

  1. Simple Schema
    Use this when you want to extract straightforward information, such as a single piece of content.
from pydantic import BaseModel, Field

# Simple schema for a single webpage
class PageInfoSchema(BaseModel):
    title: str = Field(description="The title of the webpage")
    description: str = Field(description="The description of the webpage")

# Example Output JSON after AI extraction
{
    "title": "ScrapeGraphAI: The Best Content Extraction Tool",
    "description": "ScrapeGraphAI provides powerful tools for structured content extraction from websites."
}

  2. Complex Schema (Nested)
    If you need to extract structured information with multiple related items (like a list of repositories), you can nest schemas.
from pydantic import BaseModel, Field
from typing import List

# Define a schema for a single repository
class RepositorySchema(BaseModel):
    name: str = Field(description="Name of the repository (e.g., 'owner/repo')")
    description: str = Field(description="Description of the repository")
    stars: int = Field(description="Star count of the repository")
    forks: int = Field(description="Fork count of the repository")
    today_stars: int = Field(description="Stars gained today")
    language: str = Field(description="Programming language used")

# Define a schema for a list of repositories
class ListRepositoriesSchema(BaseModel):
    repositories: List[RepositorySchema] = Field(description="List of GitHub trending repositories")

# Example Output JSON after AI extraction
{
    "repositories": [
        {
            "name": "google-gemini/cookbook",
            "description": "Examples and guides for using the Gemini API",
            "stars": 8036,
            "forks": 1001,
            "today_stars": 649,
            "language": "Jupyter Notebook"
        },
        {
            "name": "TEN-framework/TEN-Agent",
            "description": "TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.",
            "stars": 3224,
            "forks": 311,
            "today_stars": 361,
            "language": "Python"
        }
    ]
}

Key Takeaways

  • Simple Schema: Perfect for small, straightforward extractions.
  • Complex Schema: Use nesting to extract lists or structured data, like "a list of repositories."

Both approaches give the AI a clear structure to follow, so the extracted content comes back with the fields and types you defined.
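For this notebook, the fields in the extracted output further down (price, bedrooms, agent, tags, and so on) suggest a nested schema along the following lines. This is a reconstruction, not the notebook's original cell: the class names `HouseSchema` and `HousesListingSchema` are hypothetical, and `zip_code` is typed as a string because the model returned `"NA"` for one listing.

```python
from typing import List

from pydantic import BaseModel, Field


class HouseSchema(BaseModel):
    """One for-sale listing; field names mirror the extracted JSON."""
    price: int = Field(description="Listing price in US dollars")
    bedrooms: int = Field(description="Number of bedrooms")
    bathrooms: int = Field(description="Number of bathrooms")
    square_feet: int = Field(description="Living area in square feet")
    address: str = Field(description="Street address of the property")
    city: str = Field(description="City")
    state: str = Field(description="Two-letter state code")
    zip_code: str = Field(description="ZIP code, kept as text")
    tags: List[str] = Field(description="Short descriptive tags for the listing")
    agent_name: str = Field(description="Listing agent's name")
    agency: str = Field(description="Listing brokerage")


class HousesListingSchema(BaseModel):
    """Top-level object: all listings under the key 'houses'."""
    houses: List[HouseSchema] = Field(description="All for-sale listings on the page")
```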


🚀 Initialize SGAI Client and start extraction

Initialize the client for scraping (an async version of the client is also available).
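A minimal sketch, assuming the `scrapegraph-py` package exposes a synchronous `Client(api_key=...)` (with `AsyncClient` as the async counterpart); check the SDK README for the exact names in your installed version. `make_client` is a hypothetical wrapper that reads `SGAI_API_KEY` when no key is passed.

```python
import os


def make_client(api_key=None):
    """Build a synchronous ScrapeGraph client (hypothetical helper).

    Assumes `scrapegraph_py.Client(api_key=...)`; use
    `scrapegraph_py.AsyncClient` instead for the async variant.
    """
    key = api_key or os.environ.get("SGAI_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one or set SGAI_API_KEY")
    # Imported inside the function, so defining it does not require the SDK
    from scrapegraph_py import Client  # assumed import path
    return Client(api_key=key)
```

In the notebook this would be `client = make_client()`.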


Here we use the SmartScraper service to extract structured data from a webpage with AI.

If you already have an HTML file, you can upload it and use LocalScraper instead.
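The two services can be sketched as below, assuming the SDK methods `client.smartscraper(website_url=..., user_prompt=..., output_schema=...)` and `client.localscraper(website_html=..., user_prompt=..., output_schema=...)` return a dict with `request_id` and `result` keys, as the printed output suggests. The wrapper names `scrape_page` and `scrape_html` are hypothetical, and the URL and prompt you pass are up to you.

```python
def scrape_page(client, website_url, user_prompt, output_schema):
    """SmartScraper: fetch a live page and extract data shaped by output_schema."""
    return client.smartscraper(
        website_url=website_url,
        user_prompt=user_prompt,
        output_schema=output_schema,  # the Pydantic class itself, not an instance
    )


def scrape_html(client, website_html, user_prompt, output_schema):
    """LocalScraper: extract from HTML you already have instead of a URL."""
    return client.localscraper(
        website_html=website_html,
        user_prompt=user_prompt,
        output_schema=output_schema,
    )
```

Passing the schema class lets the service return JSON with the same keys as your Pydantic model, which is what the `houses` list in the output below reflects.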


Print the response
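A minimal sketch, assuming `response` is the dict returned by the SmartScraper call with `request_id` and `result` keys. `format_response` is a hypothetical helper that reproduces the layout of the output below: a request ID line, then the result as JSON indented by two spaces.

```python
import json


def format_response(response):
    """Render the request ID plus the extracted result as indented JSON."""
    header = f"Request ID: {response['request_id']}"
    return header + "\n" + json.dumps(response["result"], indent=2)
```

In the notebook: `print(format_response(response))`.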

Request ID: 4e023916-2a41-40ea-bea5-efc422daf33e
{
  "houses": [
    {
      "price": 549000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 477,
      "address": "380 14th St Unit 405",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94103",
      "tags": [
        "New construction"
      ],
      "agent_name": "Eddie O'Sullivan",
      "agency": "Elevation Real Estate"
    },
    {
      "price": 1799000,
      "bedrooms": 4,
      "bathrooms": 2,
      "square_feet": 2735,
      "address": "123 Grattan St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94117",
      "tags": [],
      "agent_name": "Sean Engmann",
      "agency": "eXp Realty of Northern CA Inc."
    },
    {
      "price": 1995000,
      "bedrooms": 7,
      "bathrooms": 3,
      "square_feet": 3330,
      "address": "1590 Washington St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94109",
      "tags": [],
      "agent_name": "Eddie O'Sullivan",
      "agency": "Elevation Real Estate"
    },
    {
      "price": 549000,
      "bedrooms": 0,
      "bathrooms": 1,
      "square_feet": 477,
      "address": "240 Lombard St Unit 835",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94111",
      "tags": [],
      "agent_name": "Tim Gullicksen",
      "agency": "Corcoran Icon Properties"
    },
    {
      "price": 5495000,
      "bedrooms": 10,
      "bathrooms": 7,
      "square_feet": 6505,
      "address": "1057 Steiner St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94115",
      "tags": [],
      "agent_name": "Bonnie Spindler",
      "agency": "Corcoran Icon Properties"
    },
    {
      "price": 925000,
      "bedrooms": 2,
      "bathrooms": 1,
      "square_feet": 779,
      "address": "2 Fallon Place Unit 57",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94133",
      "tags": [],
      "agent_name": "Eddie O'Sullivan",
      "agency": "Elevation Real Estate"
    },
    {
      "price": 898000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1175,
      "address": "5160 Diamond Heights Blvd Unit 208C",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94131",
      "tags": [],
      "agent_name": "Joe Polyak",
      "agency": "Rise Homes"
    },
    {
      "price": 1700000,
      "bedrooms": 4,
      "bathrooms": 2,
      "square_feet": 1950,
      "address": "1351 26th Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94122",
      "tags": [],
      "agent_name": "Glenda Queensbury",
      "agency": "Referral Realty-BV"
    },
    {
      "price": 1899000,
      "bedrooms": 3,
      "bathrooms": 2,
      "square_feet": 1560,
      "address": "340 Yerba Buena Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94127",
      "tags": [],
      "agent_name": "Jeannie Anderson",
      "agency": "Coldwell Banker Realty"
    },
    {
      "price": 850000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1055,
      "address": "588 Minna Unit 604",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94103",
      "tags": [],
      "agent_name": "Mohamed Lakdawala",
      "agency": "Remax Prestigious Properties"
    },
    {
      "price": 1990000,
      "bedrooms": 3,
      "bathrooms": 1,
      "square_feet": 1280,
      "address": "1450 Diamond St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94131",
      "tags": [],
      "agent_name": "Mary Anne Villamil",
      "agency": "Kinetic Real Estate"
    },
    {
      "price": 849000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 855,
      "address": "81 Lansing St Unit 401",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94105",
      "tags": [],
      "agent_name": "Kristen Haenggi",
      "agency": "Compass"
    },
    {
      "price": 1080000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 936,
      "address": "451 Kansas St Unit 466",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94107",
      "tags": [],
      "agent_name": "Maureen DeBoer",
      "agency": "LKJ Realty"
    },
    {
      "price": 1499000,
      "bedrooms": 4,
      "bathrooms": 2,
      "square_feet": 2145,
      "address": "486 Yale St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94134",
      "tags": [],
      "agent_name": "Alicia Atienza",
      "agency": "Statewide Realty"
    },
    {
      "price": 1140000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 998,
      "address": "588 Minna Unit 801",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94103",
      "tags": [],
      "agent_name": "Milan Jezdimirovic",
      "agency": "Compass"
    },
    {
      "price": 1988000,
      "bedrooms": 2,
      "bathrooms": 1,
      "square_feet": 3800,
      "address": "183 19th Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94121",
      "tags": [
        "Amazing Property",
        "Marina Style",
        "Needs TLC"
      ],
      "agent_name": "Leo Cheung",
      "agency": "eXp Realty of California, Inc"
    },
    {
      "price": 1218000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1275,
      "address": "1998 Pacific Ave Unit 202",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94109",
      "tags": [
        "Light-filled",
        "Freshly painted",
        "Walker's paradise"
      ],
      "agent_name": "Grace Sun",
      "agency": "Compass"
    },
    {
      "price": 895000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 837,
      "address": "425 1st St Unit 2501",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94105",
      "tags": [
        "Unobstructed bay bridge views",
        "Open layout"
      ],
      "agent_name": "Matt Fuller",
      "agency": "Jackson Fuller Real Estate"
    },
    {
      "price": 1499000,
      "bedrooms": 3,
      "bathrooms": 1,
      "square_feet": 1500,
      "address": "Unlisted Address",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "NA",
      "tags": [
        "Contractor's Special",
        "Fixer-upper"
      ],
      "agent_name": "Jaymee Faith Sagisi",
      "agency": "IMPACT"
    },
    {
      "price": 900000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 930,
      "address": "1101 Green St Unit 302",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94109",
      "tags": [
        "Historic Art Deco",
        "Iconic views"
      ],
      "agent_name": "NA",
      "agency": "NA"
    },
    {
      "price": 858000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 1104,
      "address": "260 King St Unit 557",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94107",
      "tags": [],
      "agent_name": "Miyuki Takami",
      "agency": "eXp Realty of California, Inc"
    },
    {
      "price": 945000,
      "bedrooms": 2,
      "bathrooms": 1,
      "square_feet": 767,
      "address": "307 Page St Unit 1",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94102",
      "tags": [],
      "agent_name": "NA",
      "agency": "NA"
    },
    {
      "price": 1099000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1330,
      "address": "1080 Sutter St Unit 202",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94109",
      "tags": [],
      "agent_name": "Annette Liberty",
      "agency": "Coldwell Banker Realty"
    },
    {
      "price": 950000,
      "bedrooms": 4,
      "bathrooms": 3,
      "square_feet": 2090,
      "address": "3328 26th St Unit 3330",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94110",
      "tags": [],
      "agent_name": "Isaac Munene",
      "agency": "Coldwell Banker Realty"
    },
    {
      "price": 1088000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1065,
      "address": "1776 Sacramento St Unit 503",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94109",
      "tags": [],
      "agent_name": "Marilyn Becklehimer",
      "agency": "Dio Real Estate"
    },
    {
      "price": 1788888,
      "bedrooms": 4,
      "bathrooms": 3,
      "square_feet": 1856,
      "address": "2317 15th St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94114",
      "tags": [],
      "agent_name": "Joel Gile",
      "agency": "Sequoia Real Estate"
    },
    {
      "price": 1650000,
      "bedrooms": 3,
      "bathrooms": 2,
      "square_feet": 1547,
      "address": "2475 47th Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94116",
      "tags": [],
      "agent_name": "Lucy Goldenshteyn",
      "agency": "Redfin"
    },
    {
      "price": 998000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1202,
      "address": "50 Lansing St Unit 201",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94105",
      "tags": [],
      "agent_name": "Tracey Broadman",
      "agency": "Vanguard Properties"
    },
    {
      "price": 1595000,
      "bedrooms": 3,
      "bathrooms": 5,
      "square_feet": 1995,
      "address": "15 Joy St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94110",
      "tags": [],
      "agent_name": "Mike Stack",
      "agency": "Vanguard Properties"
    },
    {
      "price": 1028000,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1065,
      "address": "50 Lansing St Unit 403",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94105",
      "tags": [],
      "agent_name": "Robyn Kaufman",
      "agency": "Vivre Real Estate"
    },
    {
      "price": 999000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 1021,
      "address": "338 Spear St Unit 6J",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94105",
      "tags": [
        "Spacious",
        "Balcony",
        "Bright courtyard views"
      ],
      "agent_name": "Paul Hwang",
      "agency": "Skybox Realty"
    },
    {
      "price": 799800,
      "bedrooms": 2,
      "bathrooms": 2,
      "square_feet": 1109,
      "address": "10 Innes Ct",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94124",
      "tags": [
        "New Construction"
      ],
      "agent_name": "Lennar",
      "agency": "Lennar"
    },
    {
      "price": 529880,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 740,
      "address": "10 Innes Ct",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94124",
      "tags": [
        "New Construction"
      ],
      "agent_name": "Lennar",
      "agency": "Lennar"
    },
    {
      "price": 489000,
      "bedrooms": 1,
      "bathrooms": 1,
      "square_feet": 741,
      "address": "10 Innes Ct",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94124",
      "tags": [
        "New Construction"
      ],
      "agent_name": "Lennar",
      "agency": "Lennar"
    },
    {
      "price": 1359000,
      "bedrooms": 4,
      "bathrooms": 2,
      "square_feet": 1845,
      "address": "170 Thrift St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94112",
      "tags": [
        "Updated",
        "Single-family home"
      ],
      "agent_name": "Cristal Wright",
      "agency": "Vanguard Properties"
    },
    {
      "price": 1295000,
      "bedrooms": 3,
      "bathrooms": 1,
      "square_feet": 1214,
      "address": "1922 43rd Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94116",
      "tags": [],
      "agent_name": "Mila Romprey",
      "agency": "Premier Realty Associates"
    },
    {
      "price": 1098000,
      "bedrooms": 3,
      "bathrooms": 1,
      "square_feet": 1006,
      "address": "150 Putnam St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94110",
      "tags": [],
      "agent_name": "Genie Mantzoros",
      "agency": "Epic Real Estate & Asso. Inc."
    },
    {
      "price": 1189870,
      "bedrooms": 3,
      "bathrooms": 2,
      "square_feet": 1436,
      "address": "327 Ordway St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94134",
      "tags": [],
      "agent_name": "Shawn Zahraie",
      "agency": "Affinity Enterprises, Inc"
    },
    {
      "price": 899000,
      "bedrooms": 2,
      "bathrooms": 1,
      "square_feet": 1118,
      "address": "272 Farallones St",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94112",
      "tags": [],
      "agent_name": "Janice Lee",
      "agency": "Coldwell Banker Realty"
    },
    {
      "price": 30000,
      "bedrooms": 0,
      "bathrooms": 0,
      "square_feet": 0,
      "address": "0 Evans Ave",
      "city": "San Francisco",
      "state": "CA",
      "zip_code": "94124",
      "tags": [
        "Land",
        "0.12 Acre",
        "$251,467 per Acre"
      ],
      "agent_name": "Heidy Carrera",
      "agency": "Berkshire Hathaway HomeService"
    }
  ]
}

💾 Save the output to a CSV file

Let's create a pandas DataFrame and display the extracted listings as a table.
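A minimal sketch, assuming `response["result"]` holds the `{"houses": [...]}` object printed above; `houses_to_dataframe` is a hypothetical helper. Joining `tags` with `"; "` is a choice made here so list cells export cleanly to CSV, and the semicolon avoids clashing with commas inside tags such as `"$251,467 per Acre"`.

```python
import pandas as pd


def houses_to_dataframe(result):
    """Flatten result['houses'] into a table with one row per listing."""
    df = pd.DataFrame(result["houses"])
    # Each listing's tags arrive as a list; join them so each cell is plain text
    if "tags" in df.columns:
        df["tags"] = df["tags"].apply(lambda tags: "; ".join(tags))
    return df
```

In the notebook: `df = houses_to_dataframe(response["result"])`, then `df.head()` to preview. The model wrote `"NA"` where a value was missing (for example `agent_name`), so you may want `df.replace("NA", pd.NA)` before any analysis.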


Save it to CSV
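A minimal sketch, assuming `df` is the listings DataFrame from the previous step. `save_to_csv` is hypothetical, while its default file name matches the one printed below; `index=False` keeps the pandas row index out of the file.

```python
def save_to_csv(df, path="zillow_forsale.csv"):
    """Write the listings table to CSV without the numeric index column."""
    df.to_csv(path, index=False)
    print(f"Data saved to {path}")
    return path
```

In the notebook: `save_to_csv(df)`.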

Data saved to zillow_forsale.csv

🔗 Resources


Made with ❤️ by the ScrapeGraphAI Team