Close response body streams after processing #503

Merged
freekmurze merged 1 commit into main from close-response-body-streams on Mar 20, 2026

@freekmurze (Member) commented Mar 20, 2026

Summary

  • Explicitly close each response body stream in CrawlRequestFulfilled once its URL has been processed.
  • When StreamHandler is used in stream mode, unclosed response bodies keep their file descriptors open. Over a large crawl (1000+ URLs), this can exhaust the OS file descriptor limit, after which subsequent requests fail with connection errors.
  • The close happens in a finally block, so streams are closed even if an exception occurs during processing.
  • Observers can still read the body via CrawlResponse::body(), which returns the cached string that was read before the stream was closed.
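
The pattern in the summary can be sketched roughly as follows. This is not the code from the commit: CachedBodyResponse and processResponse are hypothetical names invented for illustration, and a plain PHP stream resource stands in for the HTTP response body so the sketch runs without any dependencies. It only shows the idea of caching the body as a string, then closing the stream in a finally block.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a crawl response (not the library's class).
// It reads the body once, caches it as a string, and can then release
// the underlying stream without losing the content.
final class CachedBodyResponse
{
    private ?string $cachedBody = null;

    /** @param resource $stream an open, readable stream */
    public function __construct(private $stream)
    {
    }

    public function body(): string
    {
        if ($this->cachedBody === null) {
            // Read only while the stream is still open; a stream that was
            // closed before the first read yields an empty string here.
            $this->cachedBody = is_resource($this->stream)
                ? (string) stream_get_contents($this->stream)
                : '';
        }

        return $this->cachedBody;
    }

    public function closeStream(): void
    {
        // is_resource() returns false for an already closed resource,
        // so calling this twice is harmless.
        if (is_resource($this->stream)) {
            fclose($this->stream);
        }
    }

    public function isStreamOpen(): bool
    {
        return is_resource($this->stream);
    }
}

// Hypothetical processing step: cache the body, notify the observer, and
// always release the file descriptor, even if the observer throws.
function processResponse(CachedBodyResponse $response, callable $observer): void
{
    try {
        $response->body();     // read and cache while the stream is open
        $observer($response);  // observers work with the cached string
    } finally {
        $response->closeStream();
    }
}
```

In this sketch an exception thrown by the observer still propagates to the caller; the finally block only guarantees that the descriptor is released first. Reading the body before calling the observer is what keeps body() usable after the stream is gone.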

…or leaks

When using StreamHandler with stream mode enabled, response body streams
that are not explicitly closed keep their underlying file descriptors open.
Over a large crawl (1000+ URLs), this can exhaust the OS file descriptor
limit and cause subsequent requests to fail with connection errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@freekmurze freekmurze merged commit 4e95a01 into main Mar 20, 2026
10 checks passed
@freekmurze freekmurze deleted the close-response-body-streams branch March 20, 2026 08:54