Skip to content

fix(tree): track chunkIndex on field entry so getField finds parent#27084

Open
brrichards wants to merge 1 commit intomicrosoft:mainfrom
brrichards:basicChunkCursor-fix
Open

fix(tree): track chunkIndex on field entry so getField finds parent#27084
brrichards wants to merge 1 commit intomicrosoft:mainfrom
brrichards:basicChunkCursor-fix

Conversation

@brrichards
Copy link
Copy Markdown
Contributor

@brrichards brrichards commented Apr 17, 2026

Summary

basicChunkCursor currently doesn't handle tracking the chunk index around multi-node chunks correctly. This problem shows up in chunkedForest, which extends this cursor.

Files changed

File Changes
basicChunk.ts Change location of when chunkIndex is pushed onto the stack and getField now uses chunkIndex to find the parent in siblingStack
basicChunk.spec.ts Added two unit tests that manually construct a field with a UniformChunk followed by a BasicChunk with a subfield

What surfaces the bug

                                                                                                  
  Root field                                                                     
  ├─ chunks[0]  UniformChunk  (nodes 0, 1)
  │    ├─ 10                                                                                             
  │    └─ 20
  └─ chunks[1]  BasicChunk    (node 2)                                                                   
       └─ subField                                                               
            └─ 42     
   
siblingStack when getField is called after navigating to subField: [ UniformChunk(10,20) , BasicChunk "Trailing" ]              
 

When a field contains multi-node chunks followed by a BasicChunk containing a subfield, entering that subfield with enterField will call getField which searches for the parent of this subfield in the siblingStack. Currently this uses the flat node index, which equals the count of all nodes in the field. The problem is that siblingStack contains the chunks and not the flat nodes. So when getField is called using an index based off the flat nodes it will incorrectly index into the siblings array on that stack. In the case above we would attempt to get the parent BasicChunk at index 2, but that would be oob.

Changes

chunkIndex

The proposed changes are to swap the use of a flat node index for getField and use chunkIndex. This was being tracked before, but was a stale reference for what this fix requires. The flow for the current chunk index is:
start at 0 -> increment as we change chunks -> enterField(subField) -> getField -> enterNode/firstNode -> push chunkIndex onto the stack
This causes chunkIndex to not be on the stack when we need it. We would be looking at a stale reference for 1 level up. The change needed was pushing chunkIndex onto the stack during enterField/firstField so we have the needed chunkIndex on the stack when getField needs it.

halfHeight

halfHeight balances indexOfChunkStack.length against siblingStack.length. The current code pushed once per node level. The new code pushes once per Field level, so the rounding isn't needed like before.

New Test Coverage

Two unit tests in basicChunk.spec.ts that manually create a multi-node uniform chunk(not currently possible by the chunker) and then navigating into the subfield throws under the old indexing logic.

@brrichards brrichards marked this pull request as ready for review April 17, 2026 20:03
@brrichards brrichards requested a review from a team as a code owner April 17, 2026 20:03
Copilot AI review requested due to automatic review settings April 17, 2026 20:04
@brrichards brrichards changed the title fix(basicChunk): Correct basicChunkCursor indexing of sibling stack during getField fix(basicChunk): track chunk-array index on sibling stack for cursor getField Apr 17, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes BasicChunkCursor parent lookup during getField() when a field contains multi-node chunks (e.g., UniformChunk) followed by a BasicChunk, by tracking/restoring the correct chunk-array index instead of relying on the flat logical node index. This addresses failures observed in chunkedForest, which extends this cursor.

Changes:

  • Adjust BasicChunkCursor stack bookkeeping so the chunk-array position (indexOfChunk) is pushed/popped at the right times and used to resolve the correct parent chunk for getField().
  • Update node resolution (getNode()) to use indexOfChunk (chunk array index) rather than the flat node index.
  • Add unit tests that construct root fields with multi-node UniformChunks followed by a trailing BasicChunk containing a subfield, validating navigation doesn’t throw.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
packages/dds/tree/src/feature-libraries/chunked-forest/basicChunk.ts Fixes stack/index invariants and uses chunk-array indices when resolving the parent chunk for getField() (supports fields containing multi-node chunks).
packages/dds/tree/src/test/feature-libraries/chunked-forest/basicChunk.spec.ts Adds regression tests covering getField() parent resolution when preceded by one or more multi-node chunks.

Comment on lines +16 to +20
import {
BasicChunk,
BasicChunkCursor,
// eslint-disable-next-line import-x/no-internal-modules
} from "../../../feature-libraries/chunked-forest/basicChunk.js";
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// eslint-disable-next-line import-x/no-internal-modules is currently placed inside the named-import list, which typically won’t suppress a lint finding reported on the ImportDeclaration line. Move the disable comment to immediately precede the import { ... } from ".../basicChunk.js"; statement (or use an eslint-disable block) so CI linting is reliably suppressed for this internal-module import.

Copilot uses AI. Check for mistakes.
Comment on lines +32 to +35
import {
TreeShape,
UniformChunk,
// eslint-disable-next-line import-x/no-internal-modules
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: the // eslint-disable-next-line import-x/no-internal-modules comment is inside the import specifier list, so it may not suppress the rule on the ImportDeclaration. Place the disable comment immediately above the import { TreeShape, UniformChunk } ... statement to avoid lint failures.

Suggested change
import {
TreeShape,
UniformChunk,
// eslint-disable-next-line import-x/no-internal-modules
// eslint-disable-next-line import-x/no-internal-modules
import {
TreeShape,
UniformChunk,

Copilot uses AI. Check for mistakes.
Comment on lines +502 to +504
this.indexOfChunkStack.pop() ?? fail("Unexpected indexOfChunkStack.length");
this.indexWithinChunk =
this.indexWithinChunkStack.pop() ?? fail("Unexpected indexWithinChunkStack.length");
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail() messages in this package are consistently passed as numeric assert codes (e.g. fail(0xaf0 /* ... */) a couple of lines above). These two new fail("Unexpected …") calls break that convention and make failures harder to triage consistently. Please use new numeric codes here (with brief comments) to match the surrounding error-tagging pattern.

Suggested change
this.indexOfChunkStack.pop() ?? fail("Unexpected indexOfChunkStack.length");
this.indexWithinChunk =
this.indexWithinChunkStack.pop() ?? fail("Unexpected indexWithinChunkStack.length");
this.indexOfChunkStack.pop() ?? fail(0xaf4 /* Unexpected indexOfChunkStack.length */);
this.indexWithinChunk =
this.indexWithinChunkStack.pop() ??
fail(0xaf5 /* Unexpected indexWithinChunkStack.length */);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I am just moving this in the file I don't know if I should be re-using the fail codes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok to reuse codes when moving the assert/fail around (so long as you don't introduce another copy of it), but it's also ok to remove them.

@brrichards brrichards changed the title fix(basicChunk): track chunk-array index on sibling stack for cursor getField fix(tree): track chunkIndex on field entry so getField finds parent Apr 17, 2026
private getStackedChunkIndex(height: number): number {
assert(height % 2 === 1, "must be node height");
assert(height >= 0, "must not be above root");
// eslint-disable-next-line no-bitwise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When suppressing the linter, its a good idea to document why its ok to suppress it in this case, like we do in https://github.com/microsoft/FluidFramework/pull/27084/changes#diff-e637d8b2f919c742dad1d0d98d18b101019ade1f3b748623aed966ba4b1119d7R169-R170

That said, we should probably deduplicate that code with this.

Introducing a getNodeOnlyHeightFromHeight method, which takes an optional height defaulting to this.siblingStack.length seems like it would be a good way to deduplicate this logic, and make it a bit more clear what is going on in it.

Comment on lines 169 to 180
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this seems like its just checking some invariants about the state of this cursor.

Randomly throwing invariant validation code in the middle of some method like this isn't a good pattern (I assume it was me who did this btw).

Factoring this out into something like an "assertValid" method, then calling it from a few key places might be a better approach.

You could probably then add a couple more checks which detect the kinds of bugs you found and are fixing here to help reduce the risk of regressions even more.

// Logical node index 2 maps to chunk array index 1 (the trailing BasicChunk).
cursor.enterNode(2);

// enterField drives getField, which on main looks up the parent at the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is going to be on main soon. If you want to comment on some specific comparison between this branch and main which will no longer apply after the PR merges, you can do so in a PR comment, or if putting it in the code, you need to phrase it differently.

Anyway, thanks for the explanation.

Comment on lines +196 to +199
// enterField drives getField, which on main looks up the parent at the
// logical node index (2) instead of the chunk array index (1). siblings[2]
// is out of bounds in the 2-chunk array, so the cast returns undefined and
// the subsequent field read throws.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "looks up the parent", that's a bit confusing.

I think this is coming from the use of the variable name "parent" in get field, which is also rather poor.

Our trees and cursors do provide the ability to look up the parents of nodes, which sounds like what is being done here, but isn't.

Its instead simply looking up the current node, which it happens to call parent, since it's the parent of the field that method is operating on.

Also, you should make it clear why it was wrong to use a node index.

Suggested change
// enterField drives getField, which on main looks up the parent at the
// logical node index (2) instead of the chunk array index (1). siblings[2]
// is out of bounds in the 2-chunk array, so the cast returns undefined and
// the subsequent field read throws.
// enterField uses getField, which must looks up the current node (who's field should be gotten).
// This must handle when the index of the node does not match the index of the chunk containing that node.
// This test is a node index 2 and chunk index 1.

// is out of bounds in the 2-chunk array, so the cast returns undefined and
// the subsequent field read throws.
cursor.enterField(subField);
assert.doesNotThrow(() => cursor.getFieldLength());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than simply asserting it doesn't throw, it would be better to assert that enter field actually worked correctly (its easy to have bugs that don't throw).

Asserting the field length is 1, and that if you enter that node you get a value of 42 should do it I think.

@@ -355,6 +367,8 @@ export class BasicChunkCursor extends SynchronousCursor implements ChunkedCursor

this.siblingStack.push(this.siblings);
this.indexStack.push(this.index);
this.indexOfChunkStack.push(this.indexOfChunk);
this.indexWithinChunkStack.push(this.indexWithinChunk);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The places in the code where we edit these parallel arrays could call our validation method afterward ( see https://github.com/microsoft/FluidFramework/pull/27084/changes#r3103181969 ) which could assert they are properly in sync to catch future similar bugs.

@@ -169,7 +169,7 @@ export class BasicChunkCursor extends SynchronousCursor implements ChunkedCursor
// Compute the number of nodes deep the current depth is.
// We want the floor of the result, which can computed using a bitwise shift assuming the depth is less than 2^31, which seems safe.
// eslint-disable-next-line no-bitwise
const halfHeight = (this.siblingStack.length + 1) >> 1;
const halfHeight = this.siblingStack.length >> 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the old code wrong, but just never getting called in cases where that caused an assert before?

If so, https://github.com/microsoft/FluidFramework/pull/27084/changes#r3103181969 to make it easy to run this in more places seems extra valuable. I think that might have caught this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants