Skip to content

fix: redact file keys and node IDs from telemetry error messages#356

Open
shaun0927 wants to merge 4 commits intoGLips:mainfrom
shaun0927:fix/redact-telemetry-identifiers
Open

fix: redact file keys and node IDs from telemetry error messages#356
shaun0927 wants to merge 4 commits intoGLips:mainfrom
shaun0927:fix/redact-telemetry-identifiers

Conversation

@shaun0927
Copy link
Copy Markdown

@shaun0927 shaun0927 commented Apr 16, 2026

Summary

Closes #354

  • Redacts Figma file keys from /files/<fileKey> and /images/<fileKey> fragments before telemetry leaves the process
  • Redacts node identifiers from node-id=..., ids=..., and Node <nodeId> was not found message shapes
  • Adds focused regression tests for the validated leak paths, including image render endpoint errors that carry both a file key and ids=...

This preserves the current telemetry design (including actionable error analytics) while bringing the implementation back in line with the privacy claim from #342.

Why this scope

I kept the patch intentionally small:

  • no event schema changes
  • no product-level behavior changes
  • no exception-pipeline refactor

That should make it easier to merge even if #346 continues evolving the broader PostHog integration.

Test plan

  • ./node_modules/.bin/vitest run src/tests/telemetry-redaction.test.ts
  • ./node_modules/.bin/tsc --noEmit
  • ./node_modules/.bin/eslint src/tests/telemetry-redaction.test.ts
  • Full pnpm test
  • Validate sanitized events in PostHog after deployment

The telemetry rollout explicitly promised that file IDs would not be sent, but the current implementation forwards raw error messages after only credential redaction. That leaks file keys and node IDs through actionable endpoint and missing-node messages. This change redacts those identifier patterns before events leave the process and adds focused regression tests.

Constraint: Error messages still need enough structure for grouping and debugging after identifier redaction
Rejected: Drop error_message entirely | maintainers are actively investing in error analytics and still need sanitized context
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any new telemetry pathway that sends error text must pass through the same identifier redaction rules
Tested: ./node_modules/.bin/vitest run src/tests/telemetry-redaction.test.ts; ./node_modules/.bin/tsc --noEmit; ./node_modules/.bin/eslint src/telemetry/client.ts src/tests/telemetry-redaction.test.ts
Not-tested: Full pnpm test suite; downstream PostHog dashboards after redaction
The initial regression tests covered file fetch and missing-node messages, but the same identifier leakage can also happen through image render endpoints that use `/images/<fileKey>?ids=...`. This adds that shape explicitly so the sanitizer contract matches the real error surfaces.

Constraint: The tests should validate sanitized output without depending on live PostHog traffic
Rejected: Fold this into a single broad snapshot test | targeted cases are clearer and easier to maintain as new error formats appear
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: New telemetry error-message tests should cover each distinct identifier-bearing message shape that can reach captureEvent
Tested: ./node_modules/.bin/vitest run src/tests/telemetry-redaction.test.ts; ./node_modules/.bin/tsc --noEmit; ./node_modules/.bin/eslint src/tests/telemetry-redaction.test.ts
Not-tested: Full pnpm test suite; live PostHog ingestion after deployment
The sanitizer claims to handle both node-id and ids query parameters, so add explicit coverage for the ids shape too. That makes the branch easier to review and reduces the risk of a later regex tweak only preserving the currently asserted case.

Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep telemetry redaction tests aligned with every identifier shape the sanitizer advertises
Tested: /Users/jh0927/Workspace/figma-context-mcp-diff-audit/node_modules/.bin/vitest run src/tests/telemetry-redaction.test.ts; /Users/jh0927/Workspace/figma-context-mcp-diff-audit/node_modules/.bin/eslint src/tests/telemetry-redaction.test.ts
Not-tested: Full pnpm test suite
@GLips
Copy link
Copy Markdown
Owner

GLips commented Apr 21, 2026

Thanks for catching this — the privacy gap is real and worth closing. I'd like to take it in a slightly different shape before merging though.

Redact at the source, not at the telemetry boundary. The codebase already has structured error construction (services/errors/, HttpError, tagError({ category })). The clean fix is to have errors carry a category plus a safe shape, and have telemetry send something like error_category: "forbidden" + status: 403 + a sanitized template — rather than forwarding error.message and trying to scrub it on the way out. The regex approach perpetually chases new error string shapes, and telemetry breakage is silent by design (events still flow, just with leaked IDs again), so a future copy edit to a message would silently re-open the leak with no test failure.

Tests should exercise the real leak paths. The current tests construct synthetic strings shaped like the real errors. That couples them to the regex implementation rather than the actual contract. Better: feed an actual HttpError from services/figma.ts and the actual missing-node error from extractors/design-extractor.ts through telemetry capture and assert on what gets captured. Then if those upstream messages change shape, the test catches the regression.

Concretely, the \bNode\s+I?\d+(?::|-)\d+... regex matches the current phrasing in design-extractor.ts:60. If anyone rewords it to "Could not find node X", redaction silently stops working.

If you're up for restructuring along those lines I'd be glad to merge it. Happy to keep the regex pass as a belt-and-braces fallback, but I'd want the primary fix to be structural so the privacy claim isn't one copy-edit away from being false again.

Maintainer feedback on GLips#356 asked for a structural fix: errors should carry
a safe shape, and telemetry should emit that — rather than forwarding
`error.message` and scrubbing it on the way out with a regex that chases
current phrasings.

This extends the existing error-meta contract with a `safe_message`
template. Producers that know their error strings can contain Figma file
keys or node IDs attach an identifier-free template at throw time:

- `fetch-json.ts` tags HTTP errors with `Fetch failed with status {code}`
- `fetch-json.ts` tags proxy/network wraps with `Network connection failed ({code})`
- `figma.ts` tags the 429, 403, and generic request wraps with endpoint-free templates
- `design-extractor.ts` tags the missing-node path with an ID-free template

Telemetry's `captureEvent` now reads `meta.safe_message` in preference to
`error.message`. The regex pass in `client.ts` stays in place as a
belt-and-braces fallback for any error path that forgets to tag a template
(or that we don't own, e.g. third-party libraries).

Crucially, the test suite no longer constructs synthetic strings shaped
like the real errors. It drives the actual producers — a mocked 403
through `FigmaService.getRawFile` / `getNodeRenderUrls`, and the real
null-node path through `simplifyRawFigmaObject` — and asserts on what
telemetry would capture. That way a future reword of any upstream error
message (e.g. "Could not find node X") cannot silently reopen the leak:
the contract the tests enforce is the template + category, not the regex.

The original untagged-error regression test stays as a fourth case,
explicitly covering the fallback path.

Constraint: Keep user-facing `error.message` content actionable (endpoint context for 403, proxy hint for network errors) — only redact in the telemetry view
Rejected: Drop `error_message` entirely from telemetry | maintainers want actionable error analytics; templated strings preserve that without identifiers
Rejected: Scrub the message after the fact only | silent breakage on any message reword is the specific failure mode we're closing
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any new thrown error whose message can contain Figma file keys, node IDs, or endpoint paths must attach a `safe_message` template via `tagError`
Tested: ./node_modules/.bin/vitest run src/tests/telemetry-redaction.test.ts src/tests/error-meta.test.ts; ./node_modules/.bin/tsc --noEmit; ./node_modules/.bin/eslint src/telemetry/client.ts src/telemetry/capture.ts src/utils/error-meta.ts src/utils/fetch-json.ts src/services/figma.ts src/extractors/design-extractor.ts src/tests/telemetry-redaction.test.ts
Not-tested: Full pnpm test suite (unrelated pre-existing stdio/server `spawn tsx ENOENT` failures present on the branch before this change); live PostHog ingestion after deployment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telemetry error_message can include file keys and node IDs

2 participants