feat: update scrapegraph tool for scrapegraph-py v2.0.0 SDK #7584

Open
VinciGit00 wants to merge 1 commit into agno-agi:main from VinciGit00:feat-scrapegraph-sdk-v2

VinciGit00 (Contributor) commented Apr 19, 2026

Closes #7603

Summary

The upstream scrapegraph-py SDK has been rewritten in v2.0.0 with a new API surface (ScrapeGraphAI client, typed pydantic request models, unified ApiResult[T] returns, endpoint additions/removals). This PR updates ScrapeGraphTools so it works against the new SDK and continues to expose a stable toolkit API to agents.

Mapping of the toolkit methods onto the new SDK:

| Toolkit method | SDK v1 | SDK v2 |
| --- | --- | --- |
| `smartscraper(url, prompt)` | `client.smartscraper(...)` | `client.extract(ExtractRequest(...))` |
| `markdownify(url)` | `client.markdownify(...)` | `client.scrape(ScrapeRequest(formats=[MarkdownFormatConfig()]))` |
| `searchscraper(prompt)` | `client.searchscraper(...)` | `client.search(SearchRequest(query=...))` |
| `crawl(url, prompt, schema, ...)` | `client.crawl(...)` | `client.crawl.start(CrawlRequest(formats=[JsonFormatConfig(prompt=..., schema=...)]))` |
| `scrape(url, headers)` | `client.scrape(...)` | `client.scrape(ScrapeRequest(formats=[HtmlFormatConfig()], fetch_config=...))` |
| `agentic_crawler(...)` | `client.agenticscraper(...)` | removed (endpoint no longer exists in v2) |
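
To make the mapping concrete, here is a minimal, self-contained sketch of how two toolkit methods could delegate to a single v2-style `scrape` call. It does not import the real SDK: the dataclasses below are stand-ins that only borrow the names used in this PR, and their field names (`url`, `formats`, `fetch_config`, `mode`), the `RecordingClient` fake, and the `ToolkitSketch` class are all assumptions made for illustration. The `scrape` branch also shows the `render_heavy_js=True` to `mode="js"` translation that the PR describes.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


# Stand-ins named after the v2 request models mentioned in this PR.
# Their fields are assumptions, not the real SDK definitions.
@dataclass
class MarkdownFormatConfig:
    pass


@dataclass
class HtmlFormatConfig:
    pass


@dataclass
class FetchConfig:
    mode: str


@dataclass
class ScrapeRequest:
    url: str
    formats: List[Any] = field(default_factory=list)
    fetch_config: Optional[FetchConfig] = None


class RecordingClient:
    """Hypothetical fake client that records each request it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def scrape(self, request: ScrapeRequest) -> dict:
        self.calls.append(("scrape", request))
        return {"url": request.url}


class ToolkitSketch:
    """Keeps the v1-era method names while building v2-style requests."""

    def __init__(self, client: RecordingClient) -> None:
        self.client = client

    def markdownify(self, url: str) -> dict:
        # v1 markdownify becomes scrape with a markdown format config.
        request = ScrapeRequest(url=url, formats=[MarkdownFormatConfig()])
        return self.client.scrape(request)

    def scrape(self, url: str, render_heavy_js: bool = False) -> dict:
        # render_heavy_js=True is translated into a fetch config with mode "js".
        fetch_config = FetchConfig(mode="js") if render_heavy_js else None
        request = ScrapeRequest(
            url=url, formats=[HtmlFormatConfig()], fetch_config=fetch_config
        )
        return self.client.scrape(request)
```

Both public methods end up on the same client call and differ only in the format config they pass, which is why the toolkit's method names can stay stable while the SDK surface changes underneath.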

Other changes:

  • sgai_logger import/usage removed — the SDK no longer ships a logger
  • render_heavy_js=True now maps to fetchConfig.mode="js" on the relevant requests
  • SDK returns ApiResult[T] instead of raising — a small _unwrap() helper translates status="error" into an exception so existing try/except paths in each tool continue to return the "Error: ..." string users expect
  • Extras pin updated: scrapegraph = ["scrapegraph-py>=2.0.0"]
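
The `_unwrap()` behaviour described in the list above can be sketched roughly as follows. This is not the code from the PR: `ApiResult` here is a local stand-in, only the `status="error"` value comes from the description, and the `data` and `error` field names, the `RuntimeError` type, and the `run_tool` wrapper are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class ApiResult(Generic[T]):
    # Stand-in for the SDK's ApiResult[T]; `data` and `error` are assumed names.
    status: str
    data: Optional[T] = None
    error: Optional[str] = None


def _unwrap(result: ApiResult[T]) -> Optional[T]:
    """Turn an error result into an exception; otherwise return the payload."""
    if result.status == "error":
        raise RuntimeError(result.error or "unknown error")
    return result.data


def run_tool(call: Callable[[], ApiResult[T]]) -> str:
    """Mimic a tool method: existing try/except still yields an 'Error: ...' string."""
    try:
        return str(_unwrap(call()))
    except Exception as exc:
        return f"Error: {exc}"
```

Raising inside the helper means each tool method can keep the same try/except shape it had when the v1 client raised on failure, so the string returned to the agent does not change.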

Type of change

  • Improvement
  • Breaking change — agentic_crawler / enable_agentic_crawler are removed, and the crawl() signature no longer accepts the v1-only params (cache_website, same_domain_only, batch_size)

Testing

  • Unit tests rewritten against the new SDK mocks — all 8 pass locally (pytest libs/agno/tests/unit/tools/test_scrapegraph.py)
  • Live-tested against the v2 API with a real API key: markdownify, scrape, smartscraper (extract) and searchscraper all return successful responses on https://example.com
  • ruff check / ruff format clean on the changed files; no new mypy errors introduced (the one remaining error in libs/agno/agno/models/base.py:2180 is pre-existing on main)

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts on changed files
  • Self-review completed
  • Documentation updated (docstrings)
  • Examples and guides: cookbook/91_tools/scrapegraph_tools.py still works without changes (the toolkit's public method signatures are preserved)
  • Tested in clean environment (fresh venv)
  • Tests added/updated

Duplicate and AI-Generated PR Check

  • I have searched existing open pull requests and confirmed that no other PR already addresses this
  • This PR was drafted with AI assistance (Claude Code) and human-reviewed

Additional Notes

Users upgrading will need scrapegraph-py>=2.0.0, which itself requires Python 3.12+. Since scrapegraph is an optional extra, this does not change the minimum Python version for agno itself.
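
For callers who want to fail early on older interpreters, a guard along these lines is one option. It is a sketch only: the 3.12 floor is taken from the note above, and the helper name `can_use_scrapegraph_v2` is hypothetical and not part of agno or the SDK.

```python
import sys
from typing import Optional, Sequence

# Minimum interpreter version stated in the note above for scrapegraph-py>=2.0.0.
MIN_PYTHON_FOR_SCRAPEGRAPH_V2 = (3, 12)


def can_use_scrapegraph_v2(version: Optional[Sequence[int]] = None) -> bool:
    """Return True when the given (or current) Python version meets the floor."""
    major_minor = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    return major_minor >= MIN_PYTHON_FOR_SCRAPEGRAPH_V2
```

A caller could check this before importing the optional extra and report a clearer message than an install-time resolver failure.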

The scrapegraph-py SDK underwent a major rewrite in 2.0.0:
- Client class renamed from `Client` to `ScrapeGraphAI`
- Methods now take typed pydantic request models (ScrapeRequest,
  ExtractRequest, SearchRequest, CrawlRequest) and return a unified
  `ApiResult[T]` rather than raising on error
- `smartscraper` is now `extract`
- `searchscraper` is now `search`
- `markdownify` is gone — use `scrape` with a MarkdownFormatConfig
- `crawl` is under the `crawl.start(...)` resource
- `agenticscraper` has been removed upstream
- `sgai_logger` has been removed upstream

This refactors the ScrapeGraphTools toolkit to use the new SDK surface
while keeping the tool's public method names stable:
- smartscraper → SDK extract with JSON extraction
- markdownify → SDK scrape with markdown format
- searchscraper → SDK search
- crawl → SDK crawl.start with JSON format + schema
- scrape → SDK scrape with html format (render_heavy_js maps to
  fetchConfig.mode="js")

The `agentic_crawler` method has been removed since the underlying
endpoint no longer exists. Unit tests are updated to match the new
SDK surface and response shapes. Live-tested against the v2 API for
smartscraper, markdownify, scrape and searchscraper.

VinciGit00 requested a review from a team as a code owner April 19, 2026 08:15

github-actions (Contributor) commented

PR Triage

Missing issue link: Please link the issue this PR addresses using fixes #<issue_number>, closes #<issue_number>, or resolves #<issue_number> in the PR description. If there is no existing issue, please create one first.
