---
title: Installation
description: Install and get started with ScrapeGraphAI v2 SDKs
---

## Prerequisites

Both SDKs require a ScrapeGraphAI API key, passed to the client directly or set in the `SGAI_API_KEY` environment variable.

## Python SDK

Requires Python ≥ 3.12.

```bash
pip install "scrapegraph-py>=2.1.0"
```

Usage:

```python
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI(api_key="your-api-key-here")

# Extract data from a website
res = sgai.extract(
    "Extract information about the company",
    url="https://scrapegraphai.com",
)
print(res.data.json_data if res.status == "success" else res.error)
```
You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `sgai = ScrapeGraphAI()`.
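The fallback order (explicit argument first, then the environment) can be sketched as follows. `resolve_api_key` is a hypothetical helper, not part of the SDK; it only illustrates the lookup the client presumably performs internally:

```python
import os


def resolve_api_key(explicit_key=None, env_var="SGAI_API_KEY"):
    """Return an API key: the explicit argument wins, else the environment.

    Hypothetical helper illustrating the fallback order described above;
    the real ScrapeGraphAI client does this lookup for you.
    """
    key = explicit_key or os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key given and {env_var} is not set")
    return key
```

With this order, `ScrapeGraphAI(api_key=...)` always takes precedence over the environment variable.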

For more advanced usage, see the Python SDK documentation.


## JavaScript SDK

Requires Node.js ≥ 22.

Install using npm, pnpm, yarn, or bun:

```bash
# Using npm
npm i scrapegraph-js

# Using pnpm
pnpm i scrapegraph-js

# Using yarn
yarn add scrapegraph-js

# Using bun
bun add scrapegraph-js
```

Usage:

```javascript
import scrapegraphai from "scrapegraph-js";

const sgai = scrapegraphai({ apiKey: "your-api-key-here" });

const { data } = await sgai.extract(
  "https://scrapegraphai.com",
  { prompt: "What does the company do?" }
);

console.log(data);
```
Store your API keys securely in environment variables. Use `.env` files and libraries like `dotenv` to load them into your app.
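As an illustration of what a dotenv-style loader does under the hood, here is a minimal stdlib-only sketch (shown in Python for brevity; `dotenv` for Node behaves similarly). Real libraries such as `python-dotenv` additionally handle quoting, variable interpolation, and other edge cases this omits:

```python
import os


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a file into os.environ.

    Minimal sketch only: skips blank lines and comments, strips
    surrounding quotes, and does not override variables already set.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip("'\"")
            os.environ.setdefault(key, value)
```

In practice, prefer the real library: it is better tested and keeps secrets out of your source tree in the same way.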

For more advanced usage, see the JavaScript SDK documentation.


## Key Concepts

### Scrape (formerly Markdownify)

Convert any webpage into markdown, HTML, screenshot, or branding format. Learn more

### Extract (formerly SmartScraper)

Extract specific information from any webpage using AI. Provide a URL and a prompt describing what you want to extract. Learn more

### Search (formerly SearchScraper)

Search and extract information from multiple web sources using AI. Start with just a query: Search will find relevant websites and extract the information you need. Learn more

### Crawl (formerly SmartCrawler)

Multi-page website crawling with flexible output formats. Traverse multiple pages, follow links, and return content in your preferred format. Learn more

### Monitor

Scheduled web monitoring with AI-powered extraction. Set up recurring scraping jobs that automatically extract data on a cron schedule. Learn more

## Structured Output with Schemas

Both SDKs support structured output using schemas:

- Python: Use Pydantic models
- JavaScript: Use Zod schemas

### Example: Extract Structured Data with Schema

#### Python Example

```python
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI(api_key="your-api-key")

res = sgai.extract(
    "Extract company information",
    url="https://scrapegraphai.com",
    schema={
        "type": "object",
        "properties": {
            "company_name": {"type": "string", "description": "The company name"},
            "description": {"type": "string", "description": "Company description"},
            "website": {"type": "string", "description": "Company website URL"},
            "industry": {"type": "string", "description": "Industry sector"},
        },
        "required": ["company_name"],
    },
)
print(res.data.json_data if res.status == "success" else res.error)
```
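The API is expected to honor the schema's `required` list, but a defensive check on the client side costs little. This stdlib-only sketch is not part of the SDK; it simply verifies that a result dict contains every required property:

```python
def check_required(data, schema):
    """Return the properties named in schema['required'] that are
    missing from data (a plain dict of extracted fields)."""
    return [key for key in schema.get("required", []) if key not in data]
```

An empty return value means all required fields are present; otherwise you get the names of the missing ones, which is handy for logging or retries.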

#### JavaScript Example

```javascript
import scrapegraphai from "scrapegraph-js";
import { z } from "zod";

const sgai = scrapegraphai({ apiKey: "your-api-key" });

const CompanySchema = z.object({
  companyName: z.string().describe("The company name"),
  description: z.string().describe("Company description"),
  website: z.string().url().describe("Company website URL"),
  industry: z.string().describe("Industry sector"),
});

const { data } = await sgai.extract(
  "https://scrapegraphai.com",
  {
    prompt: "Extract company information",
    schema: CompanySchema,
  }
);
console.log(data);
```

## Next Steps