import_batch panics in ensure_vv_for when DAG has shared dependency #929

Open
kimjoar wants to merge 1 commit into loro-dev:main from kimjoar:kim/import-batch-panic-min-repro

Conversation

@kimjoar (Contributor) commented Mar 19, 2026

Problem

import_batch panics at loro_dag.rs:916 with an OnceCell double-set when the
DAG contains two sibling nodes that share a common dependency:

  A       ← shared dep
 / \
C   B     ← B.deps = [A, C], C.deps = [A]

Sequential import() on the same data works fine.

Root cause

The iterative DFS in ensure_vv_for pushes the same AppDagNode onto the stack
twice. When B is popped and both deps A and C are unresolved, it pushes A then C.
When C is then popped, it finds A still unresolved (A is below C on the stack) and
pushes A again. The first A-pop sets the VV; the second A-pop calls
OnceCell::set() on an already-initialized cell and panics.

This only triggers through import_batch because it operates in detached mode —
no diff/VV computation happens per blob — so every OnceCell is empty when the
final _checkout_to_latest traverses the full DAG. Sequential import() caches
VVs incrementally after each blob, so the DFS short-circuits before reaching the
problematic depth.

@kimjoar changed the title from "bug: import_batch panics in ensure_vv_for when DAG has shared dependency" to "import_batch panics in ensure_vv_for when DAG has shared dependency" on Mar 19, 2026

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb2c729a52


Comment on lines +175 to +176
if let Some(err) = import_batch_error(&[snapshot, update]) {
    panic!("{err}");

P1: Avoid merging this repro test before the panic fix lands

This commit adds a regression test for the known import_batch/ensure_vv_for panic, but it does not modify the production code that panics. On the current implementation, import_batch_error(&[snapshot, update]) will return Some(...), so this new test immediately fails and turns cargo test -p loro-internal red if the commit is merged on its own. If the fix is intended to arrive in a later commit, the reproducer needs to stay with that fix rather than landing independently.


@zxch3n (Member) commented Mar 31, 2026

This DAG looks unreasonable.

Under the normal formation rules of the DAG, since C already depends on A, B should depend only on C rather than on both C and A. Is there any normal way of using LoroDoc that would construct this?

@kimjoar (Contributor, Author) commented Apr 9, 2026

@zxch3n Yeah, I wasn't able to reproduce this using LoroDoc directly, but we have run into this state in production, and this was the minimal case of what we saw in those documents. I can try to find the exact documents and share them directly on Discord.

zxch3n added a commit to zxch3n/loro that referenced this pull request Apr 21, 2026
Three production blobs in loro-debug reliably panic inside doc.import on
loro-crdt@1.10.6:

- DagCausalIter assertion at dag/iter.rs: when a peer has multiple node
  segments in the target span and the later segment's deps fall outside
  the span, both segments reach the initial stack with zero in-degree and
  LIFO pops the higher counter first. Fix: in DagCausalIter::new, after
  out_degrees and succ are built, synthesize per-peer ordering edges so
  the lower counter must drain before the higher one is released.

- OnceCell::set(..).unwrap() double-set at oplog/loro_dag.rs:917 during
  ensure_vv_for on a diamond dep (loro-dev#929): a shared ancestor
  gets pushed onto the iterative-DFS stack by multiple paths and the
  second pop tries to initialize an already-filled cell. Fix: skip nodes
  whose vv is already Some at the top of the loop and swallow the Err
  from the final set as a defensive measure.

- list_state Index-out-of-range panic for the mads-bootstrap fixture:
  the ListDiffCalculator cold-starts its RichtextTracker via
  new_with_unknown() and never learns about the snapshot's real list
  content. During replay the tracker's per-change checkouts temporarily
  retreat some snapshot ops, so a new op's fugue anchor lands inside the
  unknown prefix. CrdtRope::get_diff() then emits Retain(N) where N is
  larger than ListState.len(). Fix: change ContainerState::apply_diff
  (and DocState::apply_diff, init_with_states_and_version) to return
  LoroResult<()>; ListState::apply_diff pre-validates the delta against
  current length and returns LoroError::internal(..) on mismatch so
  doc.import surfaces the error instead of panicking. Root cause in the
  tracker cold-start path still needs follow-up.

Regression tests for all three live in crates/loro/tests/march_2026_panics.rs
with the captured production blobs in fixtures_march_2026/. Additional
targeted unit tests in dag/iter.rs and oplog/loro_dag.rs cover the
smallest synthetic DAG shapes that triggered each bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zxch3n added a commit that referenced this pull request Apr 21, 2026
* Fix stale DAG causal iteration after node splits

* Fix richtext cursor updates at fragment tails

* Add list import regression for split tracking

* Recover document locks after panic poisoning

* Add unpoisoned lock helpers across core internals

* Fail fast on poisoned core locks

* Return errors for snapshot encoding edge cases

* Fix loom lock helper usage

* Fix three March-2026 import panics (kim, mads, pr929 fixtures)

* Make internal locks fail fast

* chore: rm fixtures

* chore: changeset

* Fix awareness doctest lock usage

* Fix loom RwLock wrapper

* Update repository agent guidelines

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>