[dual-timeline 1/N] Add secondary timeline metadata, root offset ring…#2212
[dual-timeline 1/N] Add secondary timeline metadata, root offset ring…#2212maxkozlovsky wants to merge 1 commit intomainfrom
Conversation
9145049 to
db11be5
Compare
There was a problem hiding this comment.
Code Review: [dual-timeline 1/N] Secondary timeline metadata, root offset ring, and lifecycle
Overall Assessment
Well-structured foundational PR. The metadata layout change (carving secondary_timeline_header_t from future_variables_unused) is cleanly done — the static_assert on total size and the size-arithmetic in the padding field guarantee no layout drift. The MONAD007→MONAD008 migration handles crash-recovery correctly (per-copy loop converges regardless of which copy was stranded). The mechanical refactor of compact_offsets / compact_offset_range_* / curr_upsert_auto_expire_version into the per-timeline timeline_compaction_state array is consistent and complete across trie.cpp and update_aux.cpp. Tests are thorough — especially the partial-migration crash-resume test and the lifecycle round-trip tests.
Issues
1. reinterpret_cast for atomic stores vs start_lifetime_as (correctness)
In activate_secondary_timeline and deactivate_secondary_timeline (update_aux.cpp), the code uses raw reinterpret_cast<std::atomic_uint64_t *>(&field)->store(...) to perform atomic writes. However, the reader side (timeline_active() in trie.hpp:371, db_history_min_valid_version() in trie.hpp:416) correctly uses start_lifetime_as<std::atomic<...>>() to implicitly create the atomic object.
Under the C++ object model, reinterpret_cast to std::atomic<T>* does not begin the lifetime of an atomic object at that address — only start_lifetime_as (or std::start_lifetime_as in C++23) does. The writers should use start_lifetime_as consistently with the readers. The codebase already has the polyfill (category/core/detail/start_lifetime_as_polyfill.hpp), so this is a one-line fix per store site.
2. root_offsets_delegator — empty-span UB not guarded internally
The new constructor allows root_offsets_chunks_.size() == 0, and empty() was added as a caller-side guard. However, push() and operator[] compute wp & (root_offsets_chunks_.size() - 1) which becomes wp & SIZE_MAX when size is 0, producing an out-of-bounds access. All current callers do check empty() before access (in get_root_offset_at_version, db_history_max_version, etc.), but the delegator is a public API and nothing prevents a future caller from skipping the check.
Consider adding a MONAD_ASSERT(!empty()) at the top of push() and operator[] (the existing primary-ring constructor already guarantees non-empty, so the assertion would be free for that path).
3. promote_secondary_to_primary — non-atomic and not crash-safe
The TODOs in the code clearly acknowledge this, so this is just reinforcing for review visibility:
- The per-copy header swap (
std::swapofversion_lower_bound_,next_version_) and the global span swap (swap_root_offsets_spans) are not atomic with respect to concurrent readers. A reader racing with promotion can observe one copy's header pointing to the old ring while the span already points to the new ring. - The swap is in-memory only —
root_offsets.storage_.cnv_chunks[](the on-disk list thatmap_root_offsetsuses to reconstruct the primary ring on reopen) is not swapped, so promotion does not survive restart. - The
active_flag for what was previously the primary timeline (now secondary after swap) is never set, so post-promotiontimeline_active(secondary)reads the old secondary header'sactive_flag — which is now logically the old-primary and was never explicitly marked active in the header (primary is always implicitly active). The testpromote_secondary_to_primaryassertstimeline_active(secondary)is true post-promotion, which passes because the old secondary'sactive_=1byte is now in the primary header position (swapped), and the old primary'sactive_=0is now in the secondary position. This means the old primary data is not accessible as a secondary timeline after promotion — which may be intentional, but the semantics are subtle and worth a comment.
These are fine for a 1/N scaffolding PR where no production caller exists yet. The TODO asking to "wire up before any caller ships" is the right gate.
4. with_rw_aux — explicit destructor + placement-new pattern (test-only, minor)
aux.~UpdateAux();
new (&aux) UpdateAux{};This ensures metadata_ctx_ is torn down before testio leaves scope on the background thread. It's correct for the happy path, but if fn(aux) throws, the jthread destructor requests a stop and joins, then the background thread runs the explicit destructor, then aux's automatic destructor runs on the re-constructed (empty) UpdateAux — double-destroy of the original is avoided only because the explicit destructor + placement new runs first. If the thread is slow to react, there's a window where the automatic destructor of the outer-scope aux could race. Not a real concern for current tests (callbacks don't throw), but RAII encapsulation would be more robust.
5. SECONDARY_RING_CAPACITY defined in two places
map_secondary_root_offsets() defines SECONDARY_RING_CAPACITY = 65536 as a function-local static constexpr. The secondary constructor of root_offsets_delegator validates the power-of-two invariant but has no link to this constant. If the capacity ever changes, it would be easy to miss updating both sites. Consider exposing it as a class-level constant on DbMetadataContext.
Nits
trie.hpp:233:timeline_is a public member array — consistent with the old publiccompact_offsets/curr_upsert_auto_expire_version, but for the dual-timeline feature it would be nice to restrict mutation to controlled methods. Low priority for a 1/N.update_aux.cpp:1224: TheLOG_INFO_CFORMATuses%luforfork_version(which isuint64_t). On some platformsuint64_tisunsigned long long, making%lutechnically a format mismatch.PRIu64would be portable, or theLOG_INFOvariant withstd::formatstyle ({}) that's used elsewhere.
There was a problem hiding this comment.
Pull request overview
Adds foundational dual-timeline infrastructure to the MPT on-disk metadata and UpdateAux, including a secondary root-offset ring carved from unused cnv chunk space plus lifecycle operations to activate/deactivate/promote the secondary timeline.
Changes:
- Introduces
timeline_id/timeline_compaction_stateand threads per-timeline compaction state throughUpdateAux/trie update paths. - Extends db metadata layout (size preserved) with a
secondary_timelineheader and adds MONAD007→MONAD008 migration logic. - Maps a secondary root-offset ring from cnv chunk 0 unused space; adds lifecycle APIs and comprehensive unit tests.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| category/mpt/update_aux.cpp | Switches compaction state to per-timeline storage and adds secondary timeline lifecycle (activate/deactivate/promote). |
| category/mpt/trie.hpp | Adds timeline helpers/queries to UpdateAux and stores per-timeline compaction state. |
| category/mpt/trie.cpp | Routes compaction/auto-expire checks through the primary timeline’s compaction state. |
| category/mpt/test/update_aux_test.cpp | Adds tests for secondary ring mapping, lifecycle, promote behavior, and per-timeline queries. |
| category/mpt/test/db_metadata_test.cpp | Adds layout/offset/semantics tests for the new secondary timeline header and db_copy behavior. |
| category/mpt/detail/timeline.hpp | New header defining timeline identifiers and per-timeline compaction state container. |
| category/mpt/detail/db_metadata.hpp | Bumps magic to MONAD008, adds secondary timeline header, and updates db_copy to preserve secondary cursor. |
| category/mpt/db_metadata_context.hpp | Adds storage for secondary ring spans and exposes root_offsets(timeline_id) plus swap helper. |
| category/mpt/db_metadata_context.cpp | Implements MONAD007→MONAD008 migration and mmaps/initializes the secondary root-offset ring. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db11be5 to
95309db
Compare
|
Thanks for the reviews — addressed in push 95309db:
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
95309db to
cd556a3
Compare
95df8df to
e3289b9
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6845636 to
6d67113
Compare
…hrink/grow Introduce a second physical root-offsets ring alongside the existing one so an orchestrator can drive a chain fork across both timelines from the same DB. Both rings share the same header type inside db_metadata; an on-disk primary_ring_idx byte selects which is the logical primary, making promote a single atomic byte flip rather than a RAM-only span swap. At pool init, all ring chunks go to ring_a. activate_secondary_header shrinks the primary ring by half and hands the freed cnv chunks to the non-primary ring; deactivate_secondary_header reverses it. Ring data lives in cnv chunks 1..N while the metadata (capacity, version bounds, is_dirty, seq, etc.) lives on cnv chunk 0, so kernel pagecache writeback of the two is unordered. A pending_shrink_grow intent log on the metadata page — stamped + msync'd before any ring write, cleared + msync'd after — combined with an idempotent body lets the constructor's replay_pending_shrink_grow_ path complete any shrink/ grow that a crash interrupted. Concurrent readers of the secondary timeline retry under a shrink_grow_seq_ seqlock; the seq bumps now live under the same hold_dirty as the intent-log stamp, and the clear path normalises both copies' seq to even regardless of starting parity so a partial-stamp crash can't leave a durable odd seq stranded on one copy. MONAD007 -> MONAD008 migration now zeros the full new region (through pending_shrink_grow and future_variables_unused), preventing the 0xff-padding bytes from being interpreted as an active secondary, an odd seq that hangs readers, or an unknown op_kind that aborts replay. CLI restore propagates primary_ring_idx and the secondary_timeline header so restoring a promoted-secondary or active-secondary archive routes queries correctly; shrink_grow_seq_ and pending_shrink_grow are deliberately left at init-zero so the restored DB starts quiescent. Design doc: docs/dual_timeline_metadata.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6d67113 to
22ac6fb
Compare
…, and lifecycle management
Introduce the foundational infrastructure for dual-timeline support:
Types (category/mpt/detail/timeline.hpp):
Metadata (category/mpt/detail/db_metadata.hpp):
UpdateAux (trie.hpp, trie.cpp, update_aux.cpp):
Tests: