Parsing sections incorrectly in the presence of carriage return (\r) #145

@diazale

The parser doesn't recognize sections on pages where every line break is stored as a carriage return plus newline (\r\n) rather than a bare newline (\n). Removing the \r characters seems to fix the parsing.

I ran into this while collecting revision data with the API for the page "Basques". Revision 3769768 uses \r\n line endings, and the parser does not return any sections. The following revision (3799279) uses \n for every line break, and the parser correctly identifies every section.
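
For anyone hitting the same thing, here is a minimal sketch of the workaround, which is to normalize line endings before handing the text to the parser. The heading regex below is a hypothetical stand-in, not the parser's actual code; it only illustrates one plausible way a stray \r can break end-of-line matching. `normalize_newlines` and `find_headings` are names made up for this example.

```python
import re

# Hypothetical heading matcher, NOT the parser's real implementation.
# In MULTILINE mode "$" matches only just before "\n" (or at end of text),
# so a "\r" sitting between the closing "==" and "\n" prevents a match.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def normalize_newlines(text: str) -> str:
    """Convert \\r\\n and lone \\r line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def find_headings(text: str) -> list[str]:
    """Return the titles of wikitext-style headings found in text."""
    return [m.group(2) for m in HEADING_RE.finditer(text)]


crlf_text = "== History ==\r\nSome text.\r\n== Culture ==\r\nMore text.\r\n"

print(find_headings(crlf_text))                      # no headings found
print(find_headings(normalize_newlines(crlf_text)))  # both headings found
```

Normalizing once at the point where the revision text comes back from the API keeps the rest of the pipeline unchanged, but whether the parser itself should handle \r\n is a separate question.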
