Skip to content

feat(lsp): UX additions — completions, debouncing, outlines, tests#1371

Draft
ciaran-matthew-dunne wants to merge 15 commits intoDeducteam:masterfrom
ciaran-matthew-dunne:pr/lsp-ux
Draft

feat(lsp): UX additions — completions, debouncing, outlines, tests#1371
ciaran-matthew-dunne wants to merge 15 commits intoDeducteam:masterfrom
ciaran-matthew-dunne:pr/lsp-ux

Conversation

@ciaran-matthew-dunne
Copy link
Copy Markdown
Contributor

Summary

Builds on #1366 (LSP correctness fixes). Adds user-facing features + a Python integration test suite. Seven logical commits; each stands alone, builds cleanly, and has tests alongside it.

  1. Python integration test harness — stdlib-only, no pip deps. Runnable via make test_lsp or python3 -m tests.lsp. Grows to 67 tests over the rest of the series.
  2. Unbuffered stdin + didChange debouncing — rewrites lsp_io to own its own stdin buffer (OCaml's Stdlib reads ahead into an opaque buffer that Unix.select can't see, which defeats peek-based debouncing). do_change now skips the re-check when a later didChange for the same URI is already queued; the trailing change always triggers its own re-check.
  3. Drop workspace/didChangeWatchedModule NOOP stub — no LSP method by that name.
  4. initialize honours rootUri / workspaceFolders and reads client capabilities — applies package config at workspace open (previously only per-file on didOpen). Reads snippetSupport and hierarchicalDocumentSymbolSupport into module-level refs used by (5) and (7).
  5. Hierarchical DocumentSymbol[] — nested outlines (Enum + EnumMember children for inductives) when the client advertises the capability. Flat SymbolInformation[] preserved as fallback. Exposes Pure.Command.get_elt so other consumers (MCP outlines, …) can pattern-match on declaration forms.
  6. Go-to-definition on module paths in require / require … as / open — click the module name, jump to its source file. New Syntax.fold_paths + Pure.path_rangemap + Lp_doc.path_map; do_definition falls back to this map when the click isn't on a known symbol.
  7. textDocument/completion with resolve + snippets — in-scope symbols with lazy type detail via completionItem/resolve (keeps the initial list small on large signatures). Inside a proof: tactic keywords with TextMate-style insertText snippets (capability-gated), plus the focused goal's hypotheses as Variable items. Pure.set_print_state ensures qualified names render correctly in the resolve response.

Dependencies

Stacked on #1366. This PR's commit list below includes #1366's 6 commits (already reviewed there). New review starts at commit 7, LSP: Python integration test harness.

Test plan

  • dune build --only-packages lambdapi @install clean.
  • make test_lsp — 67 tests pass.
  • Manual: open a file with require open Stdlib.Nat; in Zed/VS Code; verify go-to-def on Nat jumps to Nat.lp.
  • Manual: trigger completion inside a proof — tactic keywords appear, snippet insertion works (with a client that supports it), hypotheses of the focused goal are listed.
  • Manual: hold down a key / paste a large block; check server logs show debounce skips and only the trailing change triggers a re-check.
  • Manual: document outline in Zed shows an inductive type with its constructors as nested children.

Draft until #1366 merges.

path_of_file iterates every library mapping and keeps the one whose
filesystem prefix is the longest match. Inside the update branch, the
longer module-path prefix (mp) was being paired with the SHORTER
filesystem prefix (p) from the previous candidate instead of the new,
longer one (fp):

    | Some(_, p) ->
        if String.(length p < length fp) then mapping := Some(mp, p)
                                                              ^
                                                              should be fp

Consequence: for <root>/Stdlib/Nat.lp under [Stdlib] -> <root>/Stdlib,
the chosen mp was ["Stdlib"] but the stored fs prefix stayed <root>.
The remainder computation saw "Stdlib/Nat" and the final module path
came out as ["Stdlib"; "Stdlib"; "Nat"].

Via sym_path, this doubled every qualified reference the signature
emitted; in the LSP it appeared as go-to-def URIs with a duplicated
directory component.
`Pure.handle_tactic` prepended `Pos.popt_to_string p ^ " "` to the
error message before returning `Tac_Error(Some p, ...)`. The LSP
diagnostic already carries the position as its range, so clients
rendered the file:line:column twice — once as the underline and again
inside the message body.

Mirror `handle_command` and `end_proof`, which return the raw message
unchanged.
Four correctness bugs in the same two handlers. Three of them share a
root cause: the code carried its own hand-rolled conversions (URI vs
path, short-name vs qualified lookup, 1-based vs 0-based columns)
instead of reusing the helpers that the rest of the codebase already
uses.

1. External symbols landed at line 1 of the target file. do_definition
   compared sym_pos.fname (raw path) against doc.uri (URI). The check
   always failed for external symbols and the code fell back to a
   synthesized line-1 position. Use sym_pos directly when present, and
   normalize mk_definfo's file argument into a URI.

2. Qualified references M.foo resolved wrong. A local foo shadowed the
   qualified target, and aliased paths (require M as A; A.foo) never
   resolved at all. The hand-rolled lookup tried short-name first and
   didn't know about aliases. Route both handlers through a new
   Pure.find_sym that wraps Sig_state.find_sym, which already has the
   correct precedence and handles aliases.

3. Hover returned a range shifted one column to the left. hover_symInfo
   was the only handler still building its range JSON by hand, with
   its own column math that had drifted out of sync with the rest of
   the server (a stale `column - 1` from when columns were 1-based).
   Replace the hand-rolled block with a new helper
   LSP.mk_range_of_interval — the same Range.t -> LSP-JSON conversion
   that diagnostics and go-to-def already go through.

4. Error-position fallback sent -1 on the wire. When initial_state
   raised, new_doc synthesized a fallback Pos.pos with start_line = 0.
   The 1-based -> 0-based conversion turned that into -1, which strict
   clients reject (LSP positions are uinteger). Introduce
   Pos.file_start for the fallback, and clamp negative values to 0 in
   mk_range / mk_range_of_interval as a defense-in-depth measure.

Drive-by cleanups in the same handlers:
- Generalize get_symbol from qident-specific to 'a RangeMap.t.
- Wrap the hover/definition dispatch in try _ with _ -> null so a
  bogus request can't hang the handler.
- Replace process_input's fake-goal_info fallback with plain null.
- Drop the msg_fail helper (unused after the rewrite).
- Use Lplib.String.is_prefix for the file:// URI guard, matching
  Lp_doc and Common.Library.

UTF-16 note: LSP columns are UTF-16 code units; Lambdapi columns are
codepoints. For every character Lambdapi syntax actually uses (ASCII
plus BMP glyphs like ℕ → ≔ ↪) they coincide, so no conversion is
needed. Supplementary-plane characters would require negotiation, left
as a follow-up.
`Pure.initial_state` does `Time.restore t0` and then re-applies the
package config for the opened file. That installs the file's library
mappings but wipes out any mappings belonging to other open documents.

As a result, opening doc A, then opening doc B (from a different
package), caused subsequent go-to-def / hover / document-symbol
requests on A to call `Library.file_of_path` against B's mappings —
returning paths that collapse A's module path back to the lib root.
Opening Nat.lp from Binom.lp and then querying inside Binom.lp, for
example, produced URIs like `<lib_root>/Fermat/Binom.lp` instead of
the real `<repo>/benchmarks/fermat/Binom.lp`.

Expose `Pure.restore_time` and call it at the start of
`do_definition`, `hover_symInfo`, and `do_symbols` so each request
reads the timed globals from the target document's saved state rather
than whatever was most recently opened.
do_close removed the URI from doc_table but not from completed_table,
so every file the client had ever opened kept a fully-elaborated
Lp_doc.t — including the pinned signature state and every term the
file produced — alive for the lifetime of the LSP process. In a long
editing session this grows without bound: a modest file can pin tens
of MB of signature data, and nothing ever releases it.

Remove from both tables.
…ommand

Every successful top-level command and tactic emits a severity-4 "hint"
diagnostic so the client can draw a success indicator. The server was
attaching that diagnostic to the range of the WHOLE command or tactic:

  symbol foo : Nat → Nat ≔
    λ x, add x x;         <-- entire span underlined as "OK"

  opaque symbol thm : … ≔ begin
    assume h; apply h      <-- whole proof underlined per tactic
  end;

That range is wrong: the success applies to the *thing being declared*
or *the action that succeeded*, not to the body that follows. Wide
underlines over bodies and proofs also compete visually with the
tactic-error ranges a client cares about.

Attach each hint to the token that actually names the action:
- symbol / inductive / opaque: the declared name.
- rule / unif_rule / coerce_rule / notation / builtin / require /
  require … as / open: the command keyword.
- query commands (assert, compute, type, print, …): the query keyword.
- tactics (refine, assume, rewrite, reflexivity, …): the tactic
  keyword.

Mechanism: new Pure.keyword_pos (takes the start of a Pos.pos and a
length), Pure.query_keyword_len, and Command.get_focus_pos /
Tactic.get_focus_pos pure-syntax helpers. Lp_doc.process_cmd /
process_pstep feed the hint diagnostic from get_focus_pos instead of
get_pos; every other diagnostic path (errors, goal info) is unchanged.
Adds a stdlib-only Python harness for exercising the LSP server over
JSON-RPC/stdio. No pytest / pip dependencies; runnable via
`make test_lsp` or `python3 -m tests.lsp`. The harness prefers a
locally-built binary over PATH so the suite always exercises the
current tree.

  tests/lsp/client.py      LSP client over JSON-RPC/stdio
  tests/lsp/source.py      pattern-based position finder
  tests/lsp/base.py        LSPTestCase with per-test server
  tests/lsp/fixtures/      small focused .lp documents

Initial test modules cover features already landed:

  test_lifecycle    initialize / didOpen / didClose
  test_diagnostics  error diagnostics + focused success hints
  test_hover        plain-string contents, stdlib hovers
  test_definition   local / stdlib / qualified-name go-to-def
  test_goals        proof/goals
  test_invariants   range validity, sequential stability
  test_bugs_observed  regression guards for cross-file go-to-def

Subsequent commits add test modules alongside the features they
exercise.

- .gitignore: ignore Python bytecode.
- Makefile: new `test_lsp` target, invoked from `test`.

Stdlib-dependent tests are guarded by @requires_stdlib and skip when
it is absent.
Two connected changes that make the server responsive under rapid
typing bursts.

== lsp_io: own the stdin byte buffer ==

The old reader used [Stdlib.input_line] / [really_input] on an
[in_channel] backed by stdin. [Stdlib] reads ahead into an opaque
buffer, so after a message is parsed there's often more data sitting
in the channel that [Unix.select] on the fd can't see — which makes
any peek-based debounce unable to fire on tight bursts.

Replace with a small hand-rolled reader over [Unix.stdin]: one
[Buffer.t] we own, [read_line_ub] and [read_exact] to carve messages
out of it, and [bytes_pending] that checks the buffer first and then
falls back to [Unix.select]. Signature of [read_request] changes from
taking an [in_channel] to unit. [src/lsp/dune] gains the [unix]
dependency.

== lp_lsp: debounce didChange ==

Under sustained typing the main loop would queue one full re-check
per keystroke and fall further behind. New behaviour:

- a [pending] queue of messages already read but not yet dispatched
- [drain_stdin_nonblocking ()] empties the IO buffer + fd into
  [pending] — called after every dispatch and at the start of
  [do_change]
- [read_next_or_block ()] pulls from [pending] first, then blocks
- [do_change] applies every content change to the in-memory text,
  checks [pending_change_for_uri], and skips [do_check_text] if a
  later didChange for the same URI is already in flight. Trailing
  change always triggers its own re-check so the final state is
  still published.

== Tests ==

[tests/lsp/test_incremental.py] covers didChange semantics — position
shifting, multi-change requests — and adds a TestDebounce case that
fires a 12-change burst and asserts the publishDiagnostics count is
strictly less than 12 while the final text still reflects the last
change. Stable across runs.
No LSP method by that name exists — typo for
[workspace/didChangeWatchedFiles], which we don't support either.
The case was a NOOP, so removing it is behaviour-preserving.
Two changes to [initialize] handling that were previously dead code.

== Apply package config on workspace open ==

Before: [Parsing.Package.apply_config] ran per-file on [didOpen],
meaning the server had no module mappings until the first file
arrived. Tools that issue workspace-level requests before any open
(documentSymbol on a freshly restored session, some clients' initial
hover probes) hit an empty mapping table.

Now: if [initialize] carries a [rootUri] (pre-3.6) or
[workspaceFolders] (3.6+), [apply_config] runs on each folder path
before the reply is sent. Per-file [apply_config] still runs on
[didOpen], so this is strictly a bootstrap.

== Capability flags read from clientCapabilities ==

[LSP.snippet_support] and [LSP.hierarchical_symbols] in [lsp_base]
are refs set from the client's advertised capabilities
([textDocument.completion.completionItem.snippetSupport] and
[textDocument.documentSymbol.hierarchicalDocumentSymbolSupport]).
Both default to [false] if the client is silent. Consumers land in
subsequent commits; introducing the refs here keeps the capability
plumbing as one reviewable step.

[initializationOptions] and the non-feature parts of
[clientCapabilities] are still ignored.

== Helpers ==

[json_path] walks a nested object safely; [path_of_file_uri] strips
[file://] and percent-decodes, matching [Common.Library] / [Lp_doc]
conventions.
Return a tree outline with nested children when the client advertises
[hierarchicalDocumentSymbolSupport], matching what modern editors
(Zed, VS Code) actually render. Flat [SymbolInformation[]] response
is preserved as a fallback for clients that don't advertise it.

Mapping from AST to outline:

  symbol name : …               → Function
  opaque symbol name : …        → Function
  inductive Foo … ≔
    | Ctor1 : … | Ctor2 : …     → Enum { EnumMember, EnumMember }

Require/open, rules, builtins, queries, and notations are deliberately
skipped — they don't introduce named, jump-to-able items.

== Plumbing ==

[Pure.Command.get_elt] exposes the parser-level [p_command_aux] so
the LSP can pattern-match on declaration forms without leaking the
full AST into the interface. Other clients (MCP outlines, future
editors) can reuse it.

== Tests ==

[tests/lsp/fixtures/inductive.lp] carries an inductive with two
constructors. [tests/lsp/test_symbols.py] asserts the hierarchical
shape (Enum + EnumMember children; leaf has no children) when the
client advertises the capability, and keeps flat-mode coverage for
the fallback path.
Feature: clicking a module name inside [require M], [require M as N],
or [open M] now navigates to that module's source file. Those
positions were previously invisible to the LSP — [Syntax.fold_idents]
only visits symbol identifiers, not module paths.

== Parser traversal ==

New [Syntax.fold_paths] walks every [p_path] appearing in require /
require-as / open commands.

== Pure / Lp_doc ==

[Pure.path_rangemap] builds a [Common.Path.t RangeMap.t] from a
command list — source range → module path — analogous to the existing
symbol [rangemap].

[Lp_doc.t] gains a [path_map] field alongside [map]; [check_text]
populates it after parsing. Other consumers (MCP, future outlines)
can pick it up.

== LSP handler ==

[do_definition] now falls back to the path_map when the click isn't
on a known symbol. The target file is resolved via
[Library.file_of_path], and [Pos.file_start] synthesises a (0, 0)
position at the head of that file. Returning to the declaration
itself would need line/col info we don't have without loading the
target, which this commit deliberately avoids.

== Tests ==

New [TestRequireOpenDefinition] case in [test_bugs_observed.py]:
load [imports.lp], click on [Nat] in [require open Stdlib.Nat], and
assert the resolved URI ends in [Nat.lp].
New feature: in-scope completion. Returns the set of visible symbols
and, when the cursor is inside a proof block, the tactic keyword set
and the focused goal's hypotheses. Advertised via the
[completionProvider] server capability with [resolveProvider: true]
and trigger character ['.'].

== Shape of the response ==

  Symbols       → CompletionItemKind [21 Constant] or [3 Function]
                  depending on [sym_prop] / whether rules are defined.
                  [detail] (pretty-printed type) is filled in lazily
                  on [completionItem/resolve] so the initial list
                  stays small on large signatures.
  Tactic kwds   → kind 14 Keyword, sortText prefix ["0"] so they
                  come before symbol completions in proof context.
                  When the client advertises snippetSupport,
                  [insertText] is a TextMate template (e.g.
                  [apply ${1:term}]); otherwise the raw label.
  Hypotheses    → kind 6 Variable, detail = the type string,
                  sortText prefix ["1"] so they follow tactics.

Inside a proof, a tactic keyword shadows any symbol with the same
name — the user almost certainly means the tactic. Defensive; the
parser already reserves them so this branch is currently
unreachable.

== Resolve ==

[completionItem/resolve] round-trips a [data] field carrying the
symbol's URI / path / name so the server can re-find the symbol in
the stored signature and pretty-print its type. Uses a new
[Pure.set_print_state] helper, which restores the timed state *and*
installs the printer's [Print.sig_state] so qualified names render
correctly.

== Drive-by ==

[hover_symInfo] grows a small [send] helper to mirror the style of
the rest of the handler.

== Tests ==

[tests/lsp/test_completion.py] covers:
- in-scope symbols are returned, with type in detail after resolve;
- tactic keywords appear only in proof context, with [insertText]
  emitted / suppressed per snippetSupport;
- hypotheses of the focused goal are included with the expected
  detail string.
Unknown requests previously got logged and silently dropped, leaving
strict JSON-RPC clients blocked waiting for a reply. Notifications
(no id) keep the "log and drop" behaviour per the LSP spec.
- client: guard [_pending]/[_next_id] with a lock and join reader
  threads on close so the integration suite no longer needs
  [-W ignore::ResourceWarning] to stay quiet.
- __main__: discover tests by glob instead of hardcoding the list,
  so new test modules are picked up automatically.
- New coverage: shutdown/exit flow, MethodNotFound reply (server
  and client side), workspaceFolders init, standard-mode goal_info
  leak check, UTF-16 column handling for hover/definition after
  multibyte identifiers, tighter debounce assertion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant