Skip to content

[dual-timeline 1/N] Add secondary timeline metadata, root offset ring…#2212

Draft
maxkozlovsky wants to merge 1 commit intomainfrom
max/dual-timeline-1-metadata
Draft

[dual-timeline 1/N] Add secondary timeline metadata, root offset ring…#2212
maxkozlovsky wants to merge 1 commit intomainfrom
max/dual-timeline-1-metadata

Conversation

@maxkozlovsky
Copy link
Copy Markdown
Contributor

…, and lifecycle management

Introduce the foundational infrastructure for dual-timeline support:

Types (category/mpt/detail/timeline.hpp):

  • timeline_id enum (primary/secondary) and NUM_TIMELINES constant
  • timeline_compaction_state struct with per-timeline compaction boundary

Metadata (category/mpt/detail/db_metadata.hpp):

  • secondary_timeline_header_t carved from future_variables_unused: version_lower_bound_, next_version_, active_ flag
  • Total db_metadata size unchanged (528512 bytes) for backward compat

UpdateAux (trie.hpp, trie.cpp, update_aux.cpp):

  • Per-timeline compaction state array (timeline_[NUM_TIMELINES]) with tl(timeline_id) accessor, replacing bare member fields
  • Secondary root offset ring mapped from cnv chunk 0's unused space (65536 entries, 512KB per copy)
  • root_offsets_delegator parameterized on timeline_id
  • timeline_active(), get_root_offset_at_version(v, tid), db_history_{min,max}_version(tid), version_is_valid_ondisk(v, tid)
  • Lifecycle: activate_secondary_timeline (initializes compaction from primary), deactivate_secondary_timeline, promote_secondary_to_primary

Tests:

  • db_metadata_test: layout, field offsets, zero-init semantics, read/write round-trip, memcpy survival, no overlap with consensus
  • timeline_test: enum values, default construction, trivial copyability
  • update_aux_test: secondary ring mapping, activate/deactivate lifecycle, ring push/read, promote, per-timeline version queries, version_is_valid_ondisk per-timeline

Copilot AI review requested due to automatic review settings April 16, 2026 23:20
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch from 9145049 to db11be5 Compare April 16, 2026 23:22
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: [dual-timeline 1/N] Secondary timeline metadata, root offset ring, and lifecycle

Overall Assessment

Well-structured foundational PR. The metadata layout change (carving secondary_timeline_header_t from future_variables_unused) is cleanly done — the static_assert on total size and the size-arithmetic in the padding field guarantee no layout drift. The MONAD007→MONAD008 migration handles crash-recovery correctly (per-copy loop converges regardless of which copy was stranded). The mechanical refactor of compact_offsets / compact_offset_range_* / curr_upsert_auto_expire_version into the per-timeline timeline_compaction_state array is consistent and complete across trie.cpp and update_aux.cpp. Tests are thorough — especially the partial-migration crash-resume test and the lifecycle round-trip tests.

Issues

1. reinterpret_cast for atomic stores vs start_lifetime_as (correctness)

In activate_secondary_timeline and deactivate_secondary_timeline (update_aux.cpp), the code uses raw reinterpret_cast<std::atomic_uint64_t *>(&field)->store(...) to perform atomic writes. However, the reader side (timeline_active() in trie.hpp:371, db_history_min_valid_version() in trie.hpp:416) correctly uses start_lifetime_as<std::atomic<...>>() to implicitly create the atomic object.

Under the C++ object model, reinterpret_cast to std::atomic<T>* does not begin the lifetime of an atomic object at that address — only start_lifetime_as (or std::start_lifetime_as in C++23) does. The writers should use start_lifetime_as consistently with the readers. The codebase already has the polyfill (category/core/detail/start_lifetime_as_polyfill.hpp), so this is a one-line fix per store site.

2. root_offsets_delegator — empty-span UB not guarded internally

The new constructor allows root_offsets_chunks_.size() == 0, and empty() was added as a caller-side guard. However, push() and operator[] compute wp & (root_offsets_chunks_.size() - 1) which becomes wp & SIZE_MAX when size is 0, producing an out-of-bounds access. All current callers do check empty() before access (in get_root_offset_at_version, db_history_max_version, etc.), but the delegator is a public API and nothing prevents a future caller from skipping the check.

Consider adding a MONAD_ASSERT(!empty()) at the top of push() and operator[] (the existing primary-ring constructor already guarantees non-empty, so the assertion would be free for that path).

3. promote_secondary_to_primary — non-atomic and not crash-safe

The TODOs in the code clearly acknowledge this, so this is just reinforcing for review visibility:

  • The per-copy header swap (std::swap of version_lower_bound_, next_version_) and the global span swap (swap_root_offsets_spans) are not atomic with respect to concurrent readers. A reader racing with promotion can observe one copy's header pointing to the old ring while the span already points to the new ring.
  • The swap is in-memory only — root_offsets.storage_.cnv_chunks[] (the on-disk list that map_root_offsets uses to reconstruct the primary ring on reopen) is not swapped, so promotion does not survive restart.
  • The active_ flag for what was previously the primary timeline (now secondary after swap) is never set, so post-promotion timeline_active(secondary) reads the old secondary header's active_ flag — which is now logically the old-primary and was never explicitly marked active in the header (primary is always implicitly active). The test promote_secondary_to_primary asserts timeline_active(secondary) is true post-promotion, which passes because the old secondary's active_=1 byte is now in the primary header position (swapped), and the old primary's active_=0 is now in the secondary position. This means the old primary data is not accessible as a secondary timeline after promotion — which may be intentional, but the semantics are subtle and worth a comment.

These are fine for a 1/N scaffolding PR where no production caller exists yet. The TODO asking to "wire up before any caller ships" is the right gate.

4. with_rw_aux — explicit destructor + placement-new pattern (test-only, minor)

aux.~UpdateAux();
new (&aux) UpdateAux{};

This ensures metadata_ctx_ is torn down before testio leaves scope on the background thread. It's correct for the happy path, but if fn(aux) throws, the jthread destructor requests a stop and joins, then the background thread runs the explicit destructor, then aux's automatic destructor runs on the re-constructed (empty) UpdateAux — double-destroy of the original is avoided only because the explicit destructor + placement new runs first. If the thread is slow to react, there's a window where the automatic destructor of the outer-scope aux could race. Not a real concern for current tests (callbacks don't throw), but RAII encapsulation would be more robust.

5. SECONDARY_RING_CAPACITY defined in two places

map_secondary_root_offsets() defines SECONDARY_RING_CAPACITY = 65536 as a function-local static constexpr. The secondary constructor of root_offsets_delegator validates the power-of-two invariant but has no link to this constant. If the capacity ever changes, it would be easy to miss updating both sites. Consider exposing it as a class-level constant on DbMetadataContext.

Nits

  • trie.hpp:233: timeline_ is a public member array — consistent with the old public compact_offsets / curr_upsert_auto_expire_version, but for the dual-timeline feature it would be nice to restrict mutation to controlled methods. Low priority for a 1/N.
  • update_aux.cpp:1224: The LOG_INFO_CFORMAT uses %lu for fork_version (which is uint64_t). On some platforms uint64_t is unsigned long long, making %lu technically a format mismatch. PRIu64 would be portable, or the LOG_INFO variant with std::format style ({}) that's used elsewhere.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds foundational dual-timeline infrastructure to the MPT on-disk metadata and UpdateAux, including a secondary root-offset ring carved from unused cnv chunk space plus lifecycle operations to activate/deactivate/promote the secondary timeline.

Changes:

  • Introduces timeline_id/timeline_compaction_state and threads per-timeline compaction state through UpdateAux/trie update paths.
  • Extends db metadata layout (size preserved) with a secondary_timeline header and adds MONAD007→MONAD008 migration logic.
  • Maps a secondary root-offset ring from cnv chunk 0 unused space; adds lifecycle APIs and comprehensive unit tests.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
category/mpt/update_aux.cpp Switches compaction state to per-timeline storage and adds secondary timeline lifecycle (activate/deactivate/promote).
category/mpt/trie.hpp Adds timeline helpers/queries to UpdateAux and stores per-timeline compaction state.
category/mpt/trie.cpp Routes compaction/auto-expire checks through the primary timeline’s compaction state.
category/mpt/test/update_aux_test.cpp Adds tests for secondary ring mapping, lifecycle, promote behavior, and per-timeline queries.
category/mpt/test/db_metadata_test.cpp Adds layout/offset/semantics tests for the new secondary timeline header and db_copy behavior.
category/mpt/detail/timeline.hpp New header defining timeline identifiers and per-timeline compaction state container.
category/mpt/detail/db_metadata.hpp Bumps magic to MONAD008, adds secondary timeline header, and updates db_copy to preserve secondary cursor.
category/mpt/db_metadata_context.hpp Adds storage for secondary ring spans and exposes root_offsets(timeline_id) plus swap helper.
category/mpt/db_metadata_context.cpp Implements MONAD007→MONAD008 migration and mmaps/initializes the secondary root-offset ring.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread category/mpt/update_aux.cpp Outdated
Comment thread category/mpt/update_aux.cpp Outdated
Comment thread category/mpt/update_aux.cpp Outdated
Comment thread category/mpt/update_aux.cpp Outdated
Comment thread category/mpt/trie.hpp Outdated
Comment thread category/mpt/update_aux.cpp
Comment thread category/mpt/db_metadata_context.hpp Outdated
Comment thread category/mpt/update_aux.cpp
Comment thread category/mpt/update_aux.cpp
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch from db11be5 to 95309db Compare April 16, 2026 23:54
@maxkozlovsky
Copy link
Copy Markdown
Contributor Author

Thanks for the reviews — addressed in push 95309db:

  • Clang-tidy errors: added missing #include <cstddef>, <cstring>, <utility>, and <category/mpt/detail/timeline.hpp>; made cleanup scope_exits and read-only delegator locals const; used auto const & in the migration / ring-init range loops.
  • reinterpret_cast<std::atomic_*> writes → start_lifetime_as<std::atomic_*> in activate_secondary_timeline and deactivate_secondary_timeline, matching the reader pattern and avoiding the object-lifetime UB.
  • promote_secondary_to_primary std::swap of atomic fields: replaced with an inline atomic_swap lambda that does acquire-load + release-store through start_lifetime_as<std::atomic_uint64_t>. Kept the existing TODOs; added a clarifying comment on the active_ asymmetry (primary has no separate active_ byte, so post-swap the new-secondary's active_=1 intentionally leaves the old-primary accessible as the new secondary — that's the promote-as-fork semantics).
  • db_history_min_valid_version(secondary): now short-circuits on !timeline_active(tid) before reading version_lower_bound_, closing the activate/deactivate race window.
  • Empty-span OOB in delegator: added MONAD_ASSERT(!root_offsets_chunks_.empty()) in push, assign, and operator[].
  • %lu on uint64_t: switched to LOG_INFO + {} style.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread category/mpt/trie.hpp Outdated
Comment thread category/mpt/update_aux.cpp Outdated
Comment thread category/mpt/update_aux.cpp Outdated
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch from 95309db to cd556a3 Compare April 17, 2026 00:15
@maxkozlovsky maxkozlovsky marked this pull request as draft April 17, 2026 00:46
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch 2 times, most recently from 95df8df to e3289b9 Compare April 20, 2026 23:19
@maxkozlovsky maxkozlovsky requested a review from Copilot April 20, 2026 23:25
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread category/mpt/detail/db_metadata.hpp
Comment thread category/mpt/db_metadata_context.cpp
Comment thread category/mpt/db_metadata_context.cpp
Comment thread category/mpt/db_metadata_context.hpp Outdated
Comment thread category/mpt/db_metadata_context.cpp
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch 4 times, most recently from 6845636 to 6d67113 Compare April 21, 2026 00:52
…hrink/grow

Introduce a second physical root-offsets ring alongside the existing
one so an orchestrator can drive a chain fork across both timelines
from the same DB. Both rings share the same header type inside
db_metadata; an on-disk primary_ring_idx byte selects which is the
logical primary, making promote a single atomic byte flip rather
than a RAM-only span swap.

At pool init, all ring chunks go to ring_a. activate_secondary_header
shrinks the primary ring by half and hands the freed cnv chunks to
the non-primary ring; deactivate_secondary_header reverses it. Ring
data lives in cnv chunks 1..N while the metadata (capacity, version
bounds, is_dirty, seq, etc.) lives on cnv chunk 0, so kernel pagecache
writeback of the two is unordered. A pending_shrink_grow intent log
on the metadata page — stamped + msync'd before any ring write,
cleared + msync'd after — combined with an idempotent body lets the
constructor's replay_pending_shrink_grow_ path complete any shrink/
grow that a crash interrupted. Concurrent readers of the secondary
timeline retry under a shrink_grow_seq_ seqlock; the seq bumps now
live under the same hold_dirty as the intent-log stamp, and the
clear path normalises both copies' seq to even regardless of
starting parity so a partial-stamp crash can't leave a durable odd
seq stranded on one copy.

MONAD007 -> MONAD008 migration now zeros the full new region (through
pending_shrink_grow and future_variables_unused), preventing the
0xff-padding bytes from being interpreted as an active secondary, an
odd seq that hangs readers, or an unknown op_kind that aborts replay.
CLI restore propagates primary_ring_idx and the secondary_timeline
header so restoring a promoted-secondary or active-secondary archive
routes queries correctly; shrink_grow_seq_ and pending_shrink_grow
are deliberately left at init-zero so the restored DB starts
quiescent.

Design doc: docs/dual_timeline_metadata.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@maxkozlovsky maxkozlovsky force-pushed the max/dual-timeline-1-metadata branch from 6d67113 to 22ac6fb Compare April 21, 2026 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants