feat: update scrapegraph tool for scrapegraph-py v2.0.0 SDK #7584
Open
VinciGit00 wants to merge 1 commit into agno-agi:main
The scrapegraph-py SDK underwent a major rewrite in 2.0.0:
- Client class renamed from `Client` to `ScrapeGraphAI`
- Methods now take typed pydantic request models (ScrapeRequest, ExtractRequest, SearchRequest, CrawlRequest) and return a unified `ApiResult[T]` rather than raising on error
- `smartscraper` is now `extract`
- `searchscraper` is now `search`
- `markdownify` is gone; use `scrape` with a MarkdownFormatConfig
- `crawl` is under the `crawl.start(...)` resource
- `agenticscraper` has been removed upstream
- `sgai_logger` has been removed upstream

This refactors the ScrapeGraphTools toolkit to use the new SDK surface while keeping the tool's public method names stable:
- smartscraper → SDK extract with JSON extraction
- markdownify → SDK scrape with markdown format
- searchscraper → SDK search
- crawl → SDK crawl.start with JSON format + schema
- scrape → SDK scrape with html format (render_heavy_js maps to fetchConfig.mode="js")

The `agentic_crawler` method has been removed since the underlying endpoint no longer exists.

Unit tests are updated to match the new SDK surface and response shapes. Live-tested against the v2 API for smartscraper, markdownify, scrape and searchscraper.
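The `render_heavy_js` mapping mentioned above could look roughly like the sketch below. It is not the PR's code: `build_fetch_config` is a hypothetical helper, a plain dict stands in for whatever fetch-config model the SDK actually uses, and the `headers` key and the "return `None` when nothing is set" behavior are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def build_fetch_config(
    render_heavy_js: bool = False,
    headers: Optional[Dict[str, str]] = None,
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: translate toolkit flags into a fetch-config dict.

    Only the mode="js" mapping comes from the commit message; everything
    else here (dict shape, "headers" key, None default) is assumed.
    """
    config: Dict[str, Any] = {}
    if render_heavy_js:
        # Per the commit message, render_heavy_js maps to mode="js".
        config["mode"] = "js"
    if headers:
        # Assumed key name; copy so the caller's dict is not shared.
        config["headers"] = dict(headers)
    # Assumption: pass nothing at all when no option was requested.
    return config or None
```

Keeping the translation in one small function would let the `scrape` and `markdownify` paths share it, so the flag keeps a single meaning across requests.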
PR Triage
Missing issue link: Please link the issue this PR addresses.
Closes #7603
Summary
The upstream `scrapegraph-py` SDK has been rewritten in v2.0.0 with a new API surface (`ScrapeGraphAI` client, typed pydantic request models, unified `ApiResult[T]` returns, endpoint additions/removals). This PR updates `ScrapeGraphTools` so it works against the new SDK and continues to expose a stable toolkit API to agents.

Mapping of the toolkit methods onto the new SDK:
| Toolkit method | v1 SDK call | v2 SDK call |
| --- | --- | --- |
| `smartscraper(url, prompt)` | `client.smartscraper(...)` | `client.extract(ExtractRequest(...))` |
| `markdownify(url)` | `client.markdownify(...)` | `client.scrape(ScrapeRequest(formats=[MarkdownFormatConfig()]))` |
| `searchscraper(prompt)` | `client.searchscraper(...)` | `client.search(SearchRequest(query=...))` |
| `crawl(url, prompt, schema, ...)` | `client.crawl(...)` | `client.crawl.start(CrawlRequest(formats=[JsonFormatConfig(prompt=..., schema=...)]))` |
| `scrape(url, headers)` | `client.scrape(...)` | `client.scrape(ScrapeRequest(formats=[HtmlFormatConfig()], fetch_config=...))` |
| `agentic_crawler(...)` | `client.agenticscraper(...)` | (none; endpoint removed upstream) |

Other changes:
- `sgai_logger` import/usage removed: the SDK no longer ships a logger
- `render_heavy_js=True` now maps to `fetchConfig.mode="js"` on the relevant requests
- SDK calls now return `ApiResult[T]` instead of raising: a small `_unwrap()` helper translates `status="error"` into an exception so existing `try/except` paths in each tool continue to return the `"Error: ..."` string users expect
- The `scrapegraph` optional extra is now `scrapegraph = ["scrapegraph-py>=2.0.0"]`

Type of change
- `agentic_crawler` / `enable_agentic_crawler` are removed, and the `crawl()` signature no longer accepts the v1-only params (`cache_website`, `same_domain_only`, `batch_size`)

Testing
- Unit tests (`pytest libs/agno/tests/unit/tools/test_scrapegraph.py`)
- Live-tested against the v2 API: `markdownify`, `scrape`, `smartscraper` (extract) and `searchscraper` all return successful responses on `https://example.com`
- `ruff check` / `ruff format` clean on the changed files; no new `mypy` errors introduced (the one remaining error in `libs/agno/agno/models/base.py:2180` is pre-existing on `main`)

Checklist
- `cookbook/91_tools/scrapegraph_tools.py` still works without changes (the toolkit's public method signatures are preserved)
venv)

Duplicate and AI-Generated PR Check
Additional Notes
Users upgrading will need `scrapegraph-py>=2.0.0`, which itself requires Python 3.12+. Since `scrapegraph` is an optional extra, this does not change the minimum Python version for `agno` itself.