import_batch panics in ensure_vv_for when DAG has shared dependency #929
kimjoar wants to merge 1 commit into loro-dev:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb2c729a52
```rust
if let Some(err) = import_batch_error(&[snapshot, update]) {
    panic!("{err}");
}
```
Avoid merging this repro test before the panic fix lands
This commit adds a regression test for the known `import_batch`/`ensure_vv_for` panic, but it does not modify the production code that panics. On the current implementation, `import_batch_error(&[snapshot, update])` will return `Some(...)`, so this new test immediately fails and turns `cargo test -p loro-internal` red if the commit is merged on its own. If the fix is intended to arrive in a later commit, the reproducer needs to stay with that fix rather than landing independently.
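The `import_batch_error` helper referenced in the test is not shown in this thread; a common shape for such a harness is a panic-capturing wrapper. Below is a self-contained sketch of that pattern, with the function name and behavior as assumptions rather than Loro's actual test code:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical panic-capturing harness in the spirit of the test's
// `import_batch_error`: run the import closure and, if it panics, return
// the panic payload as Some(message) instead of aborting the test binary.
fn panic_message<F: FnOnce()>(import: F) -> Option<String> {
    catch_unwind(AssertUnwindSafe(import)).err().map(|payload| {
        payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string())
    })
}
```

A test built on such a helper asserts `None` once the fix lands, which is why the reproducer and the fix need to ship together.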
This DAG looks unreasonable. According to the normal formation conditions of a DAG, since C depends on A, B should only depend on C, rather than depending on both C and A. Are there any normal ways of using LoroDoc that would construct this?

@zxch3n Yeah, I wasn't able to reproduce this using LoroDoc directly, but we have been able to get into this state and this was the minimal case of what we saw in those documents. I can try to find the exact documents and share them directly on Discord.
Three production blobs in loro-debug reliably panic inside `doc.import` on `loro-crdt@1.10.6`:

- DagCausalIter assertion at `dag/iter.rs`: when a peer has multiple node segments in the target span and the later segment's deps fall outside the span, both segments reach the initial stack with zero in-degree and LIFO pops the higher counter first. Fix: in `DagCausalIter::new`, after `out_degrees` and `succ` are built, synthesize per-peer ordering edges so the lower counter must drain before the higher one is released.
- `OnceCell::set(..).unwrap()` double-set at `oplog/loro_dag.rs:917` during `ensure_vv_for` on a diamond dep (loro-dev#929): a shared ancestor gets pushed onto the iterative-DFS stack by multiple paths and the second pop tries to initialize an already-filled cell. Fix: skip nodes whose vv is already `Some` at the top of the loop and swallow the `Err` from the final set as a defensive measure.
- `list_state` index-out-of-range panic for the mads-bootstrap fixture: the `ListDiffCalculator` cold-starts its `RichtextTracker` via `new_with_unknown()` and never learns about the snapshot's real list content. During replay the tracker's per-change checkouts temporarily retreat some snapshot ops, so a new op's fugue anchor lands inside the unknown prefix. `CrdtRope::get_diff()` then emits `Retain(N)` where N is larger than `ListState.len()`. Fix: change `ContainerState::apply_diff` (and `DocState::apply_diff`, `init_with_states_and_version`) to return `LoroResult<()>`; `ListState::apply_diff` pre-validates the delta against the current length and returns `LoroError::internal(..)` on mismatch so `doc.import` surfaces the error instead of panicking. Root cause in the tracker cold-start path still needs follow-up.

Regression tests for all three live in `crates/loro/tests/march_2026_panics.rs` with the captured production blobs in `fixtures_march_2026/`. Additional targeted unit tests in `dag/iter.rs` and `oplog/loro_dag.rs` cover the smallest synthetic DAG shapes that triggered each bug.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
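The `list_state` fix described in the commit message (pre-validating the delta against the current length before applying it) can be sketched in isolation. This is a simplified stand-in with a toy delta type, not Loro's real `DeltaItem`/`LoroResult` machinery:

```rust
// Toy delta items standing in for Loro's internal delta representation.
enum Item {
    Retain(usize),
    Insert(usize), // number of inserted elements
    Delete(usize),
}

// Sketch of the described ListState::apply_diff hardening: walk the delta
// against the running length and return Err on any out-of-range step, so
// the caller surfaces an error instead of indexing past the end.
fn apply_diff(len: &mut usize, delta: &[Item]) -> Result<(), String> {
    let mut pos = 0usize; // cursor into the (virtual) list
    let mut cur = *len; // running length while the delta is walked
    for item in delta {
        match item {
            Item::Retain(n) => {
                pos += n;
                if pos > cur {
                    return Err(format!("retain to {pos} exceeds length {cur}"));
                }
            }
            Item::Insert(n) => {
                pos += n;
                cur += n;
            }
            Item::Delete(n) => {
                if pos + n > cur {
                    return Err(format!("delete of {n} at {pos} exceeds length {cur}"));
                }
                cur -= n;
            }
        }
    }
    *len = cur; // commit the new length only after the whole delta validated
    Ok(())
}
```

An oversized `Retain(N)` like the one `CrdtRope::get_diff()` emits in the bug would hit the first error branch instead of panicking downstream.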
* Fix stale DAG causal iteration after node splits
* Fix richtext cursor updates at fragment tails
* Add list import regression for split tracking
* Recover document locks after panic poisoning
* Add unpoisoned lock helpers across core internals
* Fail fast on poisoned core locks
* Return errors for snapshot encoding edge cases
* Fix loom lock helper usage
* Fix three March-2026 import panics (kim, mads, pr929 fixtures)
* Make internal locks fail fast
* chore: rm fixtures
* chore: changeset
* Fix awareness doctest lock usage
* Fix loom RwLock wrapper
* Update repository agent guidelines

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
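The DagCausalIter fix named in the commit messages (synthetic per-peer ordering edges) can be illustrated with simplified types; the `(peer, counter)` tuples and the helper name below are stand-ins for illustration, not Loro's actual structures:

```rust
use std::collections::HashMap;

// Sketch: for each peer, synthesize an edge from the lower-counter segment
// to the next higher one, so a plain in-degree/LIFO pop can never release
// the higher segment before the lower one has drained, even when both
// start with zero in-degree (the situation that tripped the assertion).
fn peer_ordering_edges(segments: &[(u64, u32)]) -> Vec<((u64, u32), (u64, u32))> {
    let mut by_peer: HashMap<u64, Vec<(u64, u32)>> = HashMap::new();
    for seg in segments {
        by_peer.entry(seg.0).or_default().push(*seg);
    }
    let mut edges = Vec::new();
    for segs in by_peer.values_mut() {
        segs.sort_by_key(|s| s.1); // ascending counter order within a peer
        for pair in segs.windows(2) {
            edges.push((pair[0], pair[1])); // lower counter -> higher counter
        }
    }
    edges
}
```

Feeding these extra edges into the iterator's in-degree bookkeeping forces segments of one peer to be visited in counter order.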
Problem

`import_batch` panics at `loro_dag.rs:916` with an `OnceCell` double-set when the DAG contains two sibling nodes that share a common dependency (a diamond: B depends on both A and C, while C also depends on A).

Sequential `import()` on the same data works fine.

Root cause

The iterative DFS in `ensure_vv_for` pushes the same `AppDagNode` onto the stack twice. When B is popped and both deps A and C are unresolved, it pushes A then C. When C is then popped, it finds A still unresolved (A is below C on the stack) and pushes A again. The first A-pop sets the VV; the second A-pop calls `OnceCell::set()` on an already-initialized cell and panics.

This only triggers through `import_batch` because it operates in detached mode: no diff/VV computation happens per blob, so every `OnceCell` is empty when the final `_checkout_to_latest` traverses the full DAG. Sequential `import()` caches VVs incrementally after each blob, so the DFS short-circuits before reaching the problematic depth.
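The double-push and the guard that prevents the double-set can be shown in a self-contained sketch. The graph plumbing here (string node names, a stand-in `u32` instead of a real version vector) is simplified for illustration and is not Loro's actual code:

```rust
use std::cell::OnceCell;
use std::collections::HashMap;

// Iterative DFS in the spirit of ensure_vv_for. The fix is the
// `get().is_some()` check at the top of the loop: a shared ancestor
// reached via two paths (A, via B directly and via C) is popped twice,
// and without the check the second pop would call OnceCell::set on an
// already-initialized cell.
fn ensure_vv_for(
    deps: &HashMap<&'static str, Vec<&'static str>>,
    vv: &HashMap<&'static str, OnceCell<u32>>,
    target: &'static str,
) {
    let mut stack = vec![target];
    while let Some(node) = stack.pop() {
        // Fix: skip nodes whose cell is already initialized.
        if vv[node].get().is_some() {
            continue;
        }
        let unresolved: Vec<_> = deps[node]
            .iter()
            .copied()
            .filter(|d| vv[d].get().is_none())
            .collect();
        if unresolved.is_empty() {
            // All deps resolved: initialize this node's stand-in VV.
            // Defensive: ignore Err from set(..) instead of unwrapping.
            let _ = vv[node].set(1);
        } else {
            stack.push(node); // revisit after deps resolve
            stack.extend(unresolved);
        }
    }
}
```

With the diamond B → {A, C}, C → A from the problem description, A is pushed once when B is popped and again when C is popped; the guard makes the second pop a no-op instead of a panic.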