Skip to content

perf(storage): zero-copy upload path from mmap to GCS#2394

Closed
ValentaTomas wants to merge 10 commits intomainfrom
perf/zero-copy-upload
Closed

perf(storage): zero-copy upload path from mmap to GCS#2394
ValentaTomas wants to merge 10 commits intomainfrom
perf/zero-copy-upload

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented Apr 14, 2026

Upload memfile/rootfs directly from mmap to GCS, skipping CachePath→StoreFile→ReadAt→alloc-per-chunk. Stacked on #2393.

@cursor
Copy link
Copy Markdown

cursor Bot commented Apr 14, 2026

PR Summary

Medium Risk
Changes the storage write API and snapshot upload path to pass raw byte slices (often mmap-backed) through multiple layers, which could introduce subtle lifetime/locking issues or higher memory pressure from copying large buffers for async caching.

Overview
This PR reworks template memfile/rootfs uploads to take diff data directly from the underlying mmap-backed caches (via new Data() methods on Cache/Chunker/Diff) and write it to storage using an in-memory Store(ctx, []byte) API instead of CachePath + StoreFile. Storage backends (GCS multipart uploader, GCS/AWS/FS objects, peer seekable, and the NFS write-through cache) are updated accordingly, including copying buffers for async cache writes and refactoring GCS multipart uploads to slice parts from the provided byte buffer; tests and mocks are adjusted to match the new API.

Reviewed by Cursor Bugbot for commit ccbd740. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d84ad827b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/shared/pkg/storage/storage_aws.go Outdated
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

placeholder

Comment thread packages/orchestrator/pkg/sandbox/build/local_diff.go Outdated
Comment thread packages/shared/pkg/storage/gcp_multipart.go
Comment thread packages/shared/pkg/storage/storage_aws.go Outdated
@ValentaTomas ValentaTomas changed the base branch from perf/minimal-multipart-fix to main April 14, 2026 22:26
@linear
Copy link
Copy Markdown

linear Bot commented Apr 14, 2026

@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 6 times, most recently from 9a3ab85 to f451748 Compare April 14, 2026 23:18
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go
Add StoreData(ctx, []byte) to the storage interface and wire up the
template build upload path to use it. Memfile/rootfs uploads now go
directly from mmap-backed diff data to HTTP, skipping the previous
CachePath→StoreFile→ReadAt→alloc-per-chunk pipeline.
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch from c158470 to 00efb6a Compare April 14, 2026 23:51
Comment thread packages/orchestrator/pkg/sandbox/template_build.go
@ValentaTomas ValentaTomas assigned jakubno and unassigned dobrac Apr 15, 2026
The dataStorer type assertion in uploadDiff always failed because
OpenSeekable returns a cachedSeekable or peerSeekable wrapper, not
the inner gcpObject that implements StoreData. This made the zero-copy
upload path unreachable, always falling back to StoreFile.

Add StoreData forwarding to cachedSeekable and peerSeekable so the
type assertion succeeds through the wrapper chain.

Also release the mmap lock (from diff.Data) before falling back to
StoreFile, since that path reads from disk and doesn't use the mmap.
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/shared/pkg/storage/storage_google.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go Outdated
- Extract tryZeroCopyUpload helper so defer release() works safely
  without risking deadlock on the StoreFile fallback path
- Fix zero-size diff: when Data() returns nil, fall through to
  StoreFile instead of silently skipping the upload
- Unify dataStorer interface as storage.DataStorer in one place
- Fix gcsOperationAttr labels in StoreData (WriteFromData instead
  of WriteFromFileSystem)
- Include concrete type in assertion error messages for debugging
- Nit: remove redundant []byte() cast on mmap in cache.Data()
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
- Skip upload entirely when diff is *NoDiff (zero-size file), fixing
  the regression where NoDiff.CachePath() returned NoDiffError on the
  StoreFile fallback path
- Add StoreDataViaFile fallback in cachedSeekable and peerSeekable so
  StoreData gracefully degrades via a temp file + StoreFile when the
  inner storage doesn't implement DataStorer, instead of returning
  an error
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go
Comment thread packages/shared/pkg/storage/storage_cache_seekable_test.go
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 2 times, most recently from 24d42d2 to f7a9540 Compare April 16, 2026 07:25
@ValentaTomas ValentaTomas marked this pull request as draft April 16, 2026 07:30
Comment thread packages/orchestrator/pkg/sandbox/block/chunk.go
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 4 times, most recently from 4eeaa48 to 7d1cdf9 Compare April 16, 2026 07:39
…ap upload

Replace StoreFile(ctx, path) with Store(ctx, []byte) across all storage
backends. The mmap-backed diff data now flows directly to GCS multipart
upload as sub-slices, avoiding the previous CachePath→file open→ReadAt→
alloc-per-chunk pipeline.

This merges the three-function multipart upload path
(UploadFileInParallel + uploadParts + UploadData) into a single
Upload(ctx, []byte, concurrency) method, and eliminates the separate
DataStorer interface and StoreData methods on all wrappers.
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch from 7d1cdf9 to a43759c Compare April 16, 2026 07:39
The orchestrator smoke test runs Firecracker VMs that need network
access. Without an explicit timeout, Go's default 10m applies, which
is too short when the VM has network delays.
@ValentaTomas
Copy link
Copy Markdown
Member Author

ValentaTomas commented Apr 18, 2026

#2034 might change the interface, so we are temporarily closing this for now.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ccbd740. Configure here.

}

func TestMultipartUploader_UploadFileInParallel_Success(t *testing.T) {
func TestMultipartUploader_Upload_Success(t *testing.T) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests write hundreds of MB to disk unnecessarily

Low Severity

After migrating UploadFileInParallel(path) to Upload(data []byte), at least six tests still create temp files via os.WriteFile (many at 100MB+) that are never read. The testFile, tempDir, and file writes are now dead code — only the []byte(testContent) passed to Upload is used. This wastes ~500MB of disk I/O per test run for no reason.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ccbd740. Configure here.

@ValentaTomas ValentaTomas deleted the perf/zero-copy-upload branch April 19, 2026 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants