fix(tree): track chunkIndex on field entry so getField finds parent#27084
fix(tree): track chunkIndex on field entry so getField finds parent#27084brrichards wants to merge 1 commit intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes BasicChunkCursor parent lookup during getField() when a field contains multi-node chunks (e.g., UniformChunk) followed by a BasicChunk, by tracking/restoring the correct chunk-array index instead of relying on the flat logical node index. This addresses failures observed in chunkedForest, which extends this cursor.
Changes:
- Adjust
BasicChunkCursorstack bookkeeping so the chunk-array position (indexOfChunk) is pushed/popped at the right times and used to resolve the correct parent chunk forgetField(). - Update node resolution (
getNode()) to useindexOfChunk(chunk array index) rather than the flat node index. - Add unit tests that construct root fields with multi-node
UniformChunks followed by a trailingBasicChunkcontaining a subfield, validating navigation doesn’t throw.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/dds/tree/src/feature-libraries/chunked-forest/basicChunk.ts | Fixes stack/index invariants and uses chunk-array indices when resolving the parent chunk for getField() (supports fields containing multi-node chunks). |
| packages/dds/tree/src/test/feature-libraries/chunked-forest/basicChunk.spec.ts | Adds regression tests covering getField() parent resolution when preceded by one or more multi-node chunks. |
| import { | ||
| BasicChunk, | ||
| BasicChunkCursor, | ||
| // eslint-disable-next-line import-x/no-internal-modules | ||
| } from "../../../feature-libraries/chunked-forest/basicChunk.js"; |
There was a problem hiding this comment.
// eslint-disable-next-line import-x/no-internal-modules is currently placed inside the named-import list, which typically won’t suppress a lint finding reported on the ImportDeclaration line. Move the disable comment to immediately precede the import { ... } from ".../basicChunk.js"; statement (or use an eslint-disable block) so CI linting is reliably suppressed for this internal-module import.
| import { | ||
| TreeShape, | ||
| UniformChunk, | ||
| // eslint-disable-next-line import-x/no-internal-modules |
There was a problem hiding this comment.
Same issue as above: the // eslint-disable-next-line import-x/no-internal-modules comment is inside the import specifier list, so it may not suppress the rule on the ImportDeclaration. Place the disable comment immediately above the import { TreeShape, UniformChunk } ... statement to avoid lint failures.
| import { | |
| TreeShape, | |
| UniformChunk, | |
| // eslint-disable-next-line import-x/no-internal-modules | |
| // eslint-disable-next-line import-x/no-internal-modules | |
| import { | |
| TreeShape, | |
| UniformChunk, |
| this.indexOfChunkStack.pop() ?? fail("Unexpected indexOfChunkStack.length"); | ||
| this.indexWithinChunk = | ||
| this.indexWithinChunkStack.pop() ?? fail("Unexpected indexWithinChunkStack.length"); |
There was a problem hiding this comment.
fail() messages in this package are consistently passed as numeric assert codes (e.g. fail(0xaf0 /* ... */) a couple of lines above). These two new fail("Unexpected …") calls break that convention and make failures harder to triage consistently. Please use new numeric codes here (with brief comments) to match the surrounding error-tagging pattern.
| this.indexOfChunkStack.pop() ?? fail("Unexpected indexOfChunkStack.length"); | |
| this.indexWithinChunk = | |
| this.indexWithinChunkStack.pop() ?? fail("Unexpected indexWithinChunkStack.length"); | |
| this.indexOfChunkStack.pop() ?? fail(0xaf4 /* Unexpected indexOfChunkStack.length */); | |
| this.indexWithinChunk = | |
| this.indexWithinChunkStack.pop() ?? | |
| fail(0xaf5 /* Unexpected indexWithinChunkStack.length */); |
There was a problem hiding this comment.
Since I am just moving this in the file I don't know if I should be re-using the fail codes.
There was a problem hiding this comment.
It is ok to reuse codes when moving the assert/fail around (so long as you don't introduce another copy of it), but it's also ok to remove them.
| private getStackedChunkIndex(height: number): number { | ||
| assert(height % 2 === 1, "must be node height"); | ||
| assert(height >= 0, "must not be above root"); | ||
| // eslint-disable-next-line no-bitwise |
There was a problem hiding this comment.
When suppressing the linter, its a good idea to document why its ok to suppress it in this case, like we do in https://github.com/microsoft/FluidFramework/pull/27084/changes#diff-e637d8b2f919c742dad1d0d98d18b101019ade1f3b748623aed966ba4b1119d7R169-R170
That said, we should probably deduplicate that code with this.
Introducing a getNodeOnlyHeightFromHeight method, which takes an optional height defaulting to this.siblingStack.length seems like it would be a good way to deduplicate this logic, and make it a bit more clear what is going on in it.
There was a problem hiding this comment.
All of this seems like its just checking some invariants about the state of this cursor.
Randomly throwing invariant validation code in the middle of some method like this isn't a good pattern (I assume it was me who did this btw).
Factoring this out into something like an "assertValid" method, then calling it from a few key places might be a better approach.
You could probably then add a couple more checks which detect the kinds of bugs you found and are fixing here to help reduce the risk of regressions even more.
| // Logical node index 2 maps to chunk array index 1 (the trailing BasicChunk). | ||
| cursor.enterNode(2); | ||
|
|
||
| // enterField drives getField, which on main looks up the parent at the |
There was a problem hiding this comment.
this code is going to be on main soon. If you want to comment on some specific comparison between this branch and main which will no longer apply after the PR merges, you can do so in a PR comment, or if putting it in the code, you need to phrase it differently.
Anyway, thanks for the explanation.
| // enterField drives getField, which on main looks up the parent at the | ||
| // logical node index (2) instead of the chunk array index (1). siblings[2] | ||
| // is out of bounds in the 2-chunk array, so the cast returns undefined and | ||
| // the subsequent field read throws. |
There was a problem hiding this comment.
When you say "looks up the parent", that's a bit confusing.
I think this is coming from the use of the variable name "parent" in get field, which is also rather poor.
Our trees and cursors do provide the ability to look up the parents of nodes, which sounds like what is being done here, but isn't.
Its instead simply looking up the current node, which it happens to call parent, since it's the parent of the field that method is operating on.
Also, you should make it clear why it was wrong to use a node index.
| // enterField drives getField, which on main looks up the parent at the | |
| // logical node index (2) instead of the chunk array index (1). siblings[2] | |
| // is out of bounds in the 2-chunk array, so the cast returns undefined and | |
| // the subsequent field read throws. | |
| // enterField uses getField, which must looks up the current node (who's field should be gotten). | |
| // This must handle when the index of the node does not match the index of the chunk containing that node. | |
| // This test is a node index 2 and chunk index 1. |
| // is out of bounds in the 2-chunk array, so the cast returns undefined and | ||
| // the subsequent field read throws. | ||
| cursor.enterField(subField); | ||
| assert.doesNotThrow(() => cursor.getFieldLength()); |
There was a problem hiding this comment.
Rather than simply asserting it doesn't throw, it would be better to assert that enter field actually worked correctly (its easy to have bugs that don't throw).
Asserting the field length is 1, and that if you enter that node you get a value of 42 should do it I think.
| @@ -355,6 +367,8 @@ export class BasicChunkCursor extends SynchronousCursor implements ChunkedCursor | |||
|
|
|||
| this.siblingStack.push(this.siblings); | |||
| this.indexStack.push(this.index); | |||
| this.indexOfChunkStack.push(this.indexOfChunk); | |||
| this.indexWithinChunkStack.push(this.indexWithinChunk); | |||
There was a problem hiding this comment.
The places in the code where we edit these parallel arrays could call our validation method afterward ( see https://github.com/microsoft/FluidFramework/pull/27084/changes#r3103181969 ) which could assert they are properly in sync to catch future similar bugs.
| @@ -169,7 +169,7 @@ export class BasicChunkCursor extends SynchronousCursor implements ChunkedCursor | |||
| // Compute the number of nodes deep the current depth is. | |||
| // We want the floor of the result, which can computed using a bitwise shift assuming the depth is less than 2^31, which seems safe. | |||
| // eslint-disable-next-line no-bitwise | |||
| const halfHeight = (this.siblingStack.length + 1) >> 1; | |||
| const halfHeight = this.siblingStack.length >> 1; | |||
There was a problem hiding this comment.
Was the old code wrong, but just never getting called in cases where that caused an assert before?
If so, https://github.com/microsoft/FluidFramework/pull/27084/changes#r3103181969 to make it easy to run this in more places seems extra valuable. I think that might have caught this bug.
Summary
basicChunkCursor currently doesn't handle tracking the chunk index around multi-node chunks correctly. This problem shows up in chunkedForest, which extends this cursor.
Files changed
basicChunk.tschunkIndexis pushed onto the stack andgetFieldnow useschunkIndexto find the parent insiblingStackbasicChunk.spec.tsUniformChunkfollowed by aBasicChunkwith a subfieldWhat surfaces the bug
Root field ├─ chunks[0] UniformChunk (nodes 0, 1) │ ├─ 10 │ └─ 20 └─ chunks[1] BasicChunk (node 2) └─ subField └─ 42 siblingStack when getField is called after navigating to subField: [ UniformChunk(10,20) , BasicChunk "Trailing" ]When a field contains multi-node chunks followed by a
BasicChunkcontaining a subfield, entering that subfield withenterFieldwill callgetFieldwhich searches for the parent of this subfield in thesiblingStack. Currently this uses the flat node index, which equals the count of all nodes in the field. The problem is thatsiblingStackcontains the chunks and not the flat nodes. So whengetFieldis called using an index based off the flat nodes it will incorrectly index into the siblings array on that stack. In the case above we would attempt to get the parent BasicChunk at index 2, but that would beoob.Changes
chunkIndex
The proposed changes are to swap the use of a flat node index for
getFieldand usechunkIndex. This was being tracked before, but was a stale reference for what this fix requires. The flow for the current chunk index is:start at 0 -> increment as we change chunks -> enterField(subField) -> getField -> enterNode/firstNode -> push chunkIndex onto the stackThis causes
chunkIndexto not be on the stack when we need it. We would be looking at a stale reference for 1 level up. The change needed was pushingchunkIndexonto the stack during enterField/firstField so we have the neededchunkIndexon the stack whengetFieldneeds it.halfHeight
halfHeight balances
indexOfChunkStack.lengthagainstsiblingStack.length. The current code pushed once per node level. The new code pushes once per Field level, so the rounding isn't needed like before.New Test Coverage
Two unit tests in
basicChunk.spec.tsthat manually create a multi-node uniform chunk(not currently possible by the chunker) and then navigating into the subfield throws under the old indexing logic.