Skip to content

Optimize chunked write path and add tryWrite#1922

Open
GetThatCookie wants to merge 10 commits intouNetworking:masterfrom
GetThatCookie:master
Open

Optimize chunked write path and add tryWrite#1922
GetThatCookie wants to merge 10 commits intouNetworking:masterfrom
GetThatCookie:master

Conversation

@GetThatCookie
Copy link
Copy Markdown
Contributor

Attempt to improve chunked streaming by integrating tryWrite and tightening write

@uNetworkingAB I wanted to try a small uWS-side improvement for the chunked
/ no-content-length HTTP response path discussed around:

My intent here was not to present this as a perfect or final answer, but to
test whether integrating tryWrite into this area could also let me tighten
the existing write path with a small and contained change surface.

What made this seem worth trying:

  • I noticed the code still explicitly called out tryWrite as not yet
    integrated in this path
  • the behavior being discussed in the linked issues pointed more toward the
    underlying no-content-length streaming path than toward any higher-level
    framework overhead
  • that made this part of uWS look like a good candidate for a focused
    implementation attempt

What I implemented:

  • add HttpResponse::tryWrite(std::string_view) for chunked responses
  • move chunked write() onto a shared internal write core
  • route chunked end() through that same core so framing, timeout handling,
    completion, and close behavior stay aligned
  • keep continuation state so partial optional writes can resume from the
    remaining body suffix via getWriteOffset()

Why I thought this was the right experiment:

  • it gives unknown-size chunked streaming a primitive analogous in spirit to
    tryEnd
  • it lets write, tryWrite, and terminating chunked end share one
    implementation instead of evolving independently
  • it keeps the attempt focused on uWS itself and avoids a broader
    architectural change

Local benchmark notes:

  • on one local async no-content-length streaming benchmark, the write path
    moved from about 1073 req/s before this change to about 3186 req/s
    after it
  • on the resulting implementation, separate local wrk checks for the new
    chunked streaming behavior showed:
    • GET /trywrite with 128 KiB: 30882.97 req/s, 3.77 GB/s, p99 3.74 ms
    • GET /trywrite-end with 256 KiB: 30500.77 req/s, 7.45 GB/s, p99 3.92 ms

These are only local workload-specific numbers, but they were encouraging
enough for me to keep the change.

Technical notes:

  • the core change is that chunked body emission now goes through one internal
    helper instead of keeping separate behavior between write() and chunked
    end()
  • that helper is responsible for:
    • emitting the chunk header once per chunk
    • writing the payload with either optional or non-optional semantics
    • updating the HTTP write offset
    • deciding whether the chunk is complete or still pending
    • emitting the terminating 0\r\n\r\n when chunked end() completes
  • for optional writes, I keep a small continuation bit in response state so
    the code can distinguish between:
    • starting a fresh chunk, where a new chunk header must be emitted
    • continuing a previously partial chunk, where only the remaining payload
      suffix should be written
  • the continuation logic is intentionally tied to getWriteOffset(), so
    onWritable can resume from the remaining body suffix without needing a
    second chunk-framing path
  • this also lets tryWrite(...) -> false -> onWritable(...) -> tryWrite(rest)
    and tryWrite(...) -> false -> end(rest) follow one consistent model

Validation:

  • I extended the existing Crc32 / tests/smoke.mjs flow to cover
    write, tryWrite, and tryWrite + end
  • the affected local unit tests, smoke tests, and HTTP compliance checks
    passed in the validated environment

@GetThatCookie
Copy link
Copy Markdown
Contributor Author

Please feel free to take, change, simplify, or remove any part of this as you
see fit. I am not attached to this going in exactly as-is; the main goal here
was to provide a concrete implementation attempt around this path.

If only parts of it are worth keeping, that is perfectly fine too.

@uNetworkingAB
Copy link
Copy Markdown
Contributor

Quick scroll through, my gut feeling says it's high quality high probability of being merged but I still need to review this for a long time

@GetThatCookie
Copy link
Copy Markdown
Contributor Author

Thanks for the quick check. Please take your time.

Comment thread examples/Crc32.cpp Outdated
Comment thread examples/Crc32.cpp Outdated
Comment thread examples/Crc32.cpp Outdated
Comment thread examples/Crc32.cpp Outdated
@GetThatCookie
Copy link
Copy Markdown
Contributor Author

Addressed the review points.

Changes:

  • moved the smoke-only routes out of Crc32 and into the dedicated SmokeTest binary
  • removed the defer usage and resumed streaming strictly via onWritable
  • removed the socket send-buffer tweaking and instead use larger smoke payloads to exercise the tryWrite path
  • updated smoke/CI wiring to run against SmokeTest

@GetThatCookie GetThatCookie marked this pull request as draft April 18, 2026 08:11
@GetThatCookie GetThatCookie marked this pull request as ready for review April 18, 2026 19:30
@GetThatCookie
Copy link
Copy Markdown
Contributor Author

I reduced the follow-up scope and tried to align it with the existing repo structure.

The net diff now only contains the tryWrite core in HttpResponse/HttpResponseData plus a separate manual example (examples/ChunkedResponse.cpp).

This seemed closer to the existing example-based usage pattern already present in the repo than keeping the previous smoke/CI wiring.

Comment thread src/HttpResponse.h Outdated
Comment thread src/HttpResponse.h Outdated
Comment thread src/HttpResponse.h Outdated
Comment thread examples/ChunkedResponse.cpp Outdated
Comment thread src/HttpResponse.h Outdated
Comment thread src/HttpResponse.h Outdated
Comment thread src/AsyncSocketData.h
Comment thread src/AsyncSocketData.h Outdated
Comment thread src/AsyncSocketData.h Outdated
Comment thread examples/ChunkedResponse.cpp Outdated
Comment thread src/HttpResponse.h Outdated
@GetThatCookie
Copy link
Copy Markdown
Contributor Author

GetThatCookie commented Apr 19, 2026

I implemented the finalChunk / write2 idea.

The only subtle part was keeping getWriteOffset() body-only in the combined path. This was not an issue in the previous PR shape, since the terminating chunk was still sent separately by end().

With tryWrite(..., true), if the body is fully written but only part of the terminating \r\n0\r\n\r\n gets out, that trailer should not turn into another public continuation state.

So the current version does the smallest thing:

  • tryWrite(std::string_view, bool finalChunk = false)
  • non-SSL final chunk uses AsyncSocket::write2
  • offsets stay body-only
  • any partial terminating trailer is completed internally
  • ChunkedResponse now uses tryWrite(remaining, true)

Your suggestion pointed to a cleaner path, and my inner monk would not let me leave it half-done once that became obvious. If you want to keep the previous PR shape and leave this optimization for later, I can revert this part.

Comment thread examples/ChunkedResponse.cpp Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants