Fix the Ord instance for Ident + some other small fixes #42

Open
yav wants to merge 55 commits into harpocrates:master from GaloisInc:master

yav commented May 9, 2023

Previously the `Ord` instance for `Ident` was incorrect because it would cause an infinite loop.

This version rearranges the fields of the records to ensure that the
hash field is first, which makes it possible to derive Eq and Ord.

We also do a bunch of refactoring to use record notation instead of
constructor pattern matching, to make it easier to do similar refactoring
in the future.
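
For readers less familiar with derived instances: the fix relies on derived comparisons walking the fields in declaration order, so putting the hash first makes it the primary sort key. The library itself is Haskell; the block below is only a rough analogue in Rust (whose derives also compare struct fields top to bottom). The `Ident` struct, its field types, and the `ident` helper are hypothetical and are not the library's definitions.

```rust
// Hypothetical stand-in for the Haskell `Ident` record; names and types are illustrative.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Ident {
    hash: u64,    // declared first, so derived comparisons consult it first
    name: String, // only compared when the hashes are equal
    raw: bool,    // still part of equality and ordering
}

/// Small constructor helper (an assumption of this sketch).
fn ident(hash: u64, name: &str, raw: bool) -> Ident {
    Ident { hash, name: name.to_string(), raw }
}
```

With this layout, `ident(1, "zebra", false) < ident(2, "apple", false)` holds because the hashes decide the comparison before the names are looked at.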

Other fixes: updates to make the tests work, and updates to make things work with more recent versions of Aeson and Prettyprinter.

harpocrates and others added 30 commits September 4, 2019 20:41
In Rust 1.37, `catch`, `yield`, and `dyn` ceased to be "weak" keywords.
This simplifies the grammar a bit \o/.

`do catch { ... }` has turned into `try { ... }` and `async`-prefixed
blocks are now a thing.
Also re-jiggered the arguments of the `Closure` constructor to match
that of the Rust AST.
This uses the new `await` keyword.
`TupleStructP`, `TupleP`, `SliceP`, all used to accept `..` in them. The
parsing rules were awful, and the fields were confusing. Now, `..` is
its own pattern (although semantically it doesn't make sense outside of
the cases I just listed).

Also, a `ParenP` was added. I have not fixed `Resolve` yet to take
advantage of this.
Rust got full-blown support for or-patterns (see [RFC 2535][0]). This
means a couple changes:

  * `OrP` is a new variant of `Pat`
  * `WhileLet`, `IfLet`, `Arm` now just take a `Pat` (instead of a list)
  * in the parser, or-patterns are not allowed everywhere that regular
    patterns are!

Test cases were heavily inspired by [the PR that implemented the RFC][1].

[0]: https://github.com/rust-lang/rfcs/blob/master/text/2535-or-patterns.md#grammar
[1]: rust-lang/rust#63693
Whenever a context doesn't support or-patterns out of the box, the trick
is to add an extra set of parentheses around the pattern.
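
A minimal Rust sketch of the two situations described above: an or-pattern nested inside another pattern, and a context (`let`) that does not take a top-level `|`, where the extra parentheses are needed. The function names are made up for illustration.

```rust
/// Nested or-pattern: `0 | 1` sits inside `Some(..)`.
fn classify(n: Option<u8>) -> &'static str {
    match n {
        Some(0 | 1) => "tiny",
        Some(2..=9) => "small",
        Some(_) => "large",
        None => "missing",
    }
}

/// `let` does not accept a bare top-level `|`, so the or-pattern is parenthesized.
/// Both alternatives bind `v`, which makes the pattern irrefutable.
fn unwrap_either(r: Result<i32, i32>) -> i32 {
    let (Ok(v) | Err(v)) = r;
    v
}
```
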
  * Updated incorrect or misformatted Haddock docstrings
  * Removed trailing spaces from files
  * Updated the copyright year
  * Fixed up the `.cabal` file (more warnings enabled, tested-with)
`Fn` and `MethodSig` moved constness, safety, abi, and now asyncness
into a new type called `FnHeader`. We do the same. I've also started
fixing the `rustc-tests`, but they still don't pass.
Variety of fixes to the JSON expected, but also a good set of
improvements to the error messages for failed `rustc-tests` cases

  * stack traces
  * suggestions for keys
  * array bounds (on out of bound)
  * We are gradually moving over to the new AST of `Generics`. In this
    commit, we cleaned up the naming of generic bounds

  * Fix a nasty ambiguity around `async {}` statements. The issue is the
    same as `unsafe {}` statements - the parser can't know soon
    enough whether it is dealing with the beginning of a function
    definition or the beginning of an expression.

  * Fix final outstanding `rustc-tests` TODO's introduced during the
    1.37 bump
  * Finally introduce `GenericParam`:
      - `PathParameters` renamed to `GenericArgs`
      - `LifetimeDef` -> `LifetimeParam` variant of `GenericParam`
      - `TyParam` -> `TypeParam` variant of `GenericParam`

  * `MethodCall` takes a path segment (although the parsing is still
    very incomplete in this area)

  * `rustc-tests` on the committed `sample-sources` works!!!
The existing parsing, printing, and test code has been adjusted, but no
work was done to support the new constructor for bound constraints.
The restriction that statement items should only have inherited or
public visibility has been lifted (although I'm not sure what the
visibility means at all...).
  * no longer require an ordering on entries of generics
  * parse/print/resolve const arguments and parameters
  * parse/print/resolve bound constraints
  * amended `rustc-tests` to work with the new constructs
These now build and run properly
  * Proper handling of single semicolon statements (and diffing)
  * `rustc-tests` now also check the exit code of `rustc` (instead of
    just checking whether its stdout is valid JSON)
  * Fix some incorrect handling of types with pluses
  * Support parsing macro definitions (that use the `macro` keyword)
  * Add attributes to fields and field patterns
  * Rework pretty-printing of path types to be.. prettier... in
    multi-line mode.

With these fixes almost all scraped files pass the `rustc-tests` tests
  * general (possibly unnamed) function arguments are now only allowed
    in bare function types
  * macro items in braces can have a trailing `;`
  * fix some pretty-printing issues
I scraped 4400 more test cases from Rust's testsuite. With the
following fixes, only about 44 of the tests still fail.

  * some keywords were unreserved
  * trailing plus on bare trait objects
  * new ABIs
  * exclusive range patterns
  * bug around parsing of `if break { }`/`if yield { }`/`if return { }`
  * underscore crate import `extern crate foo as _`
  * initializer expressions are allowed on any enum variant
  * lexer is more permissive around accepted whitespace
  * lexer allows underscores in character literals
  * properly lex `/**/` as a comment
  * normalize windows newlines in inline-style comments
  * test for `OpaqueTy`, `OrP` in difference tests
Previously, only expressions could (and had to) put a `::`
discriminator between identifiers and generics (so as to disambiguate
with the less than operator). Now, type paths can do this too (although
they do not _have_ to).
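
To make the change concrete, here is a small Rust sketch where the `::` discriminator shows up in type position as well as in expression position. The `squares` function is a made-up example.

```rust
/// `Vec::<u32>` appears as a return type and a binding's type annotation (type paths,
/// where `::` is optional) and in `collect::<..>` (an expression path, where it is required).
fn squares(n: u32) -> Vec::<u32> {
    let out: Vec::<u32> = (1..=n).map(|k| k * k).collect::<Vec<u32>>();
    out
}
```

Writing the return type as plain `Vec<u32>` is equivalent; the point is only that both spellings should parse.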

The parsing paths for type and expression paths are now much more similar.

Also fixed a bounds issue on trait aliases (`trait Foo = ?Send` is now
allowed).
This is motivated by two useful features I've been manually patching
into the testsuite for some time:

  * pointing the testsuite at a _different_ folder of sources
  * automatically deleting a source test case if `rustc` can't initially
    parse it
  * `static || { 1 };`, `async || { 1 };`, `async { 1 }`, `unsafe { 1 }`
    and company finally parse as statements! Along the way, I refactored
    and commented heavily the statement/expression-conflict-motivated
    rules.

  * `union::a + 1;`, `auto { x: 1 }`, and company also parse as statements!

  * `ItemMac` no longer takes an optional identifier - the _only_ valid
    form is `macro_rules! foo { ... }`. The grammar also reflects this.

  * abstract out some duplicate parsing code for lambda expressions,
    accept lambda expressions in more positions (esp. those with an
    explicit result type).

  * fix associativity of comparison operators

  * where bound predicates parse empty bound lists
  * invalid suffixes lead to parse errors, not crashes
  * replace `sep_by1T` with `sep_byT` where possible
  * allow `const _: <ty> = ...`
  * add support for foreign macros
  * parse where clauses on trait aliases
  * support attributes on expressions inside of a `let`
  * support self crate renamings (`extern crate self as foo`)
  * take into account the crate root in the `QSelf` index
 * Edge case for parsing: `macro_rules` can be the name of a user defined
   macro, and can be called manually. Example: `macro_rules!("my call!")`.
   Parsing this is a bit more tricky though, due to the old style of macro
   definitions: `macro_rules! my_macro { ... }`.

 * Also added a top-level entry point into the path parser. Type paths are
   now strictly more general than all other paths, so it makes sense to
   use them as "general" paths.

 * Allow bare trait objects to start with lifetimes
Block expressions can be broken out of using `break 'lbl <expr?>`.
However, this requires blocks to be labelled. This commit adds

  * required AST changes for labelled block expressions
  * parsing of labelled block expressions
  * printing/resolving of labelled block expressions
  * adjusting all of the test cases and adding a couple new ones
  * parsing
  * printing/resolving
  * `rustc-tests`
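
For reference, the labelled block expressions mentioned above look like this in Rust source. This is a minimal sketch; the function name and the `'search` label are arbitrary.

```rust
/// Returns the square of the first even element, if any.
fn first_even_square(xs: &[i32]) -> Option<i32> {
    // The block is labelled so that `break 'search <expr>` can leave it early with a value.
    let found = 'search: {
        for &x in xs {
            if x % 2 == 0 {
                break 'search Some(x * x);
            }
        }
        // Reached only when no element triggered the `break`.
        None
    };
    found
}
```
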
Allow failures on nightly.
Tests can now build and pass on GHC 8.8
yav and others added 25 commits May 8, 2023 16:56
Previously the instance was incorrect because it'd cause an infinite loop.

This version rearranges the fields of the records to ensure that the
hash field is first, which makes it possible to derive Eq and Ord.

We also do a bunch of refactoring to use record notation instead of
constructor pattern matching, to make it easier to do similar refactoring
in the future.
This reverts commit 9ff9176.

Per the discussion in #6, having the `Eq` and `Ord` instances ignore the `raw`
field of `Ident` causes more trouble than it's worth, as it causes the parser
to incorrectly deem raw identifiers like `r#return` to be keywords. While we
could fix this issue by changing the parser, this would take quite a bit of
code changes to accomplish. As such, we revert the change here, and we make a
note in the Haddocks for the `Eq` and `Ord` instances to beware of the fact
that `raw` is taken into account.
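
As an illustration of why `raw` has to matter, here is a small Rust sketch in which `r#return` and `r#match` are ordinary identifiers even though `return` and `match` are keywords. The function and variable names are made up for this example.

```rust
/// `r#return` is a plain function name; without the `r#` prefix this would not parse.
fn r#return(x: i32) -> i32 {
    x + 1
}

/// Raw identifiers also work for local bindings.
fn double_match(n: i32) -> i32 {
    let r#match = n * 2;
    r#match
}
```

A parser that treated `r#return` and `return` as equal would see a keyword where the source has an identifier, which is the failure described above.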

After this change, the `rustc-tests` test suite passes once more. As such, this
change fixes #6.
Make tests pass, migrate to GitHub Actions
The previous lexer implementation in `Language.Rust.Parser.Lexer` was broken
for Unicode characters with sufficiently large codepoints, as the previous
implementation incorrectly attempted to port UTF-16–encoded codepoints over to
`alex`, which is UTF-8–encoded. Rather than try to fix the previous
implementation (which was based on old `rustc` code that is no longer used),
this ports the lexer to a new implementation that is based on the Rust
`unicode-xid` crate (which is how modern versions of `rustc` lex Unicode
characters). Specifically:

* This adapts `unicode-xid`'s lexer generation script to generate an
  `alex`-based lexer instead of a Rust-based one.

* The new lexer is generated to support codepoints from Unicode 15.1.0.
  (It is unclear which exact Unicode version the previous lexer targeted, but
  given that it was last updated in 2016, it was likely quite an old version.)

* I have verified that the new lexer can lex exotic Unicode characters such as
  `𝑂` and `𐌝` by adding them as regression tests.

Fixes #3.
Lexer: Properly support Unicode 15.1.0
…-2.1

Restrict `happy` version to less than 2.1
`happy-2.1.1` includes a fix for haskell/happy#320,
which was preventing `language-rust` from building. Now that this version of
`happy` is on Hackage, we no longer need to include such a restrictive upper
version bound on `happy`.
Allow building with `happy-2.1.1` or later
Remove one expression that is syntactically invalid, and uncomment another
expression that _is_ valid (with some minor tweaks).
Some documentation is better than no documentation.
Address leftover review comments from #12