Text node position offsets include consumed blockquote continuation prefixes #48

@adri1wald

Affected packages and versions

mdast-util-from-markdown@2.0.2 (via remark-parse@11.0.0)

Steps to reproduce

Parse a blockquote with a continuation line:

import { unified } from 'unified'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'

const content =
  "Hello:\n\n" +
  "> *\"Quote line one.\"*\n" +
  "> Continuation line.\n\n" +
  "After."

const processor = unified().use(remarkParse).use(remarkRehype)
const tree = processor.runSync(processor.parse(content))

// Find the text node containing "Continuation line."
function findText(node) {
  if (node.type === 'text' && node.value.includes('Continuation'))
    return node
  for (const child of node.children || []) {
    const r = findText(child)
    if (r) return r
  }
}

const textNode = findText(tree)
const start = textNode.position.start.offset
const end = textNode.position.end.offset

console.log('value:', JSON.stringify(textNode.value))
console.log('value.length:', textNode.value.length)
console.log('offset span:', end - start)
console.log('source.slice(start, end):', JSON.stringify(content.slice(start, end)))

Output:

value: "\nContinuation line."
value.length: 19
offset span: 21
source.slice(start, end): "\n> Continuation line."

Expected behavior

source.slice(position.start.offset, position.end.offset) should equal node.value, so the offset span should match the value length. This contract holds for other constructs such as list items.
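The expected contract can be written as a small check. This is a minimal sketch, not part of any library: checkOffsetContract is a hypothetical helper, and the node is hand-built to match the shape reported above, so no parser is involved.

```javascript
// Hypothetical helper: checks whether a node's offsets slice the source
// to exactly its value, and reports the drift when they do not.
function checkOffsetContract(source, node) {
  const start = node.position.start.offset
  const end = node.position.end.offset
  const raw = source.slice(start, end)
  return {
    ok: raw === node.value,
    drift: end - start - node.value.length,
    raw
  }
}

// Same input as in the reproduction.
const source =
  'Hello:\n\n> *"Quote line one."*\n> Continuation line.\n\nAfter.'

// Hand-built text node shaped like the one in this report (assumption:
// it starts at the line ending after the emphasis and ends after "line.").
const continuation = {
  type: 'text',
  value: '\nContinuation line.',
  position: {
    start: {offset: source.indexOf('\n> Continuation')},
    end: {offset: source.indexOf('\n\nAfter.')}
  }
}

const result = checkOffsetContract(source, continuation)
console.log('ok:', result.ok)       // false: the raw slice still has "> "
console.log('drift:', result.drift) // 2: one stripped "> " prefix
```

Running such a check over every text node would be one way to find which constructs break the contract in a given document.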

Actual behavior

The text node spans a blockquote continuation line boundary. The > prefix (2 chars) is correctly stripped from node.value, but position.start.offset / position.end.offset still reference the raw source range that includes the prefix.

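Each continuation line within a text node's span adds 2 characters of drift (the length of the "> " prefix) between the offset span and the value length. A text node that spans three blockquote lines crosses two continuation boundaries, so the drift becomes +4, and so on.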

This makes it impossible to reliably map a known source offset to an index within node.value using offset - position.start.offset, which breaks downstream consumers that use positions for source mapping (e.g. rehype plugins that need to locate ranges within text nodes).
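A consumer-side workaround is to walk the raw source and skip the prefixes manually. The sketch below is only an illustration: sourceOffsetToValueIndex is a hypothetical helper. It assumes the only stripped content is a single-level blockquote marker (up to three spaces, ">", and one optional space) after each "\n", and it ignores nested blockquotes, "\r\n" line endings, escapes, and character references.

```javascript
// Hypothetical helper: maps a source offset inside a text node to an index
// in node.value, skipping blockquote continuation prefixes on the way.
// Returns -1 when the offset lies outside the node's span.
function sourceOffsetToValueIndex(source, node, offset) {
  const start = node.position.start.offset
  const end = node.position.end.offset
  if (offset < start || offset > end) return -1

  let index = 0 // position within node.value
  let i = start // position within source
  while (i < offset) {
    const ch = source[i]
    i++
    index++
    if (ch === '\n') {
      // Assumption: the marker right after a line ending was stripped
      // from node.value, so it advances the source but not the index.
      const prefix = /^ {0,3}> ?/.exec(source.slice(i, end))
      if (prefix) i += prefix[0].length
    }
  }
  // An offset that falls inside a prefix maps to the first character after it.
  return index
}

const source =
  'Hello:\n\n> *"Quote line one."*\n> Continuation line.\n\nAfter.'
const node = {
  type: 'text',
  value: '\nContinuation line.',
  position: {
    start: {offset: source.indexOf('\n> Continuation')},
    end: {offset: source.indexOf('\n\nAfter.')}
  }
}

const target = source.indexOf('line.')
const naive = target - node.position.start.offset
const mapped = sourceOffsetToValueIndex(source, node, target)
console.log(node.value.slice(naive, naive + 5))   // "e." (off by the prefix)
console.log(node.value.slice(mapped, mapped + 5)) // "line."
```

This duplicates knowledge of the blockquote syntax in every consumer, which is the cost the issue is pointing at.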

A possible fix: emit separate text nodes per continuation line so each node's position only spans content after the > prefix.
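To make the proposed output concrete, here is a rough post-processing sketch of what per-line text nodes could look like. It is not how the parser works: splitTextAtContinuations and makePiece are hypothetical helpers. They track offsets only (no line or column) and rebuild each value from the source, so they assume the text contains no escapes or character references and only single-level "> " prefixes.

```javascript
// Hypothetical helper: builds a text node whose value is exactly its source range.
function makePiece(source, start, end) {
  return {
    type: 'text',
    value: source.slice(start, end),
    position: {start: {offset: start}, end: {offset: end}}
  }
}

// Hypothetical helper: splits one text node into pieces that end at each
// line ending, so no piece spans a blockquote continuation prefix.
function splitTextAtContinuations(source, node) {
  const end = node.position.end.offset
  const pieces = []
  let pieceStart = node.position.start.offset
  let i = pieceStart
  while (i < end) {
    const ch = source[i]
    i++
    if (ch === '\n' || i === end) {
      // The line ending stays attached to the piece it terminates.
      pieces.push(makePiece(source, pieceStart, i))
      // Assumption: skip one single-level blockquote marker, if present.
      const prefix = /^ {0,3}> ?/.exec(source.slice(i, end))
      if (prefix) i += prefix[0].length
      pieceStart = i
    }
  }
  return pieces
}

const source =
  'Hello:\n\n> *"Quote line one."*\n> Continuation line.\n\nAfter.'
const node = {
  type: 'text',
  value: '\nContinuation line.',
  position: {
    start: {offset: source.indexOf('\n> Continuation')},
    end: {offset: source.indexOf('\n\nAfter.')}
  }
}

const pieces = splitTextAtContinuations(source, node)
for (const piece of pieces) {
  const {start, end} = piece.position
  console.log(JSON.stringify(piece.value), end.offset - start.offset)
}
```

With this shape, the pieces concatenate back to the original node.value, and each piece satisfies the slice contract on its own.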

Labels: 👎 phase/no (Post cannot or will not be acted on), 🙅 no/wontfix (This is not (enough of) an issue for this project)