fix(drilldown): remove unknown_service from non-service metrics #232

Merged: szibis merged 3 commits into main from codex/volume-no-unknown-service on Apr 21, 2026


szibis (Collaborator) commented Apr 21, 2026

What

  • avoid injecting synthetic service_name=unknown_service on query/query_range metric responses when there is no real service signal in the metric labelset
  • suppress high-cardinality timestamp terminal fields (timestamp_end, observed_timestamp_end) from detected_fields output to stabilize Drilldown field discovery
  • add unit and e2e regressions for both behaviors and document the contracts
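The first bullet can be pictured with a minimal Go sketch. This is not the proxy's actual code: the function name `metricLabels` and the list of labels treated as a service signal are assumptions made for illustration, as is the choice to copy the signal value into `service_name`. The point it shows is the contract: a labelset with no service signal is returned unchanged rather than gaining `service_name=unknown_service`.

```go
package main

import "fmt"

// serviceSignalLabels is a hypothetical list of labels treated as a "real
// service signal"; the proxy's actual list is not shown in this PR.
var serviceSignalLabels = []string{"service_name", "service", "app", "job"}

// metricLabels returns a copy of the metric labelset. It sets service_name
// only when one of the signal labels is present and non-empty; otherwise the
// labelset is returned unchanged, with no unknown_service placeholder.
func metricLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	for _, name := range serviceSignalLabels {
		if v := labels[name]; v != "" {
			out["service_name"] = v
			break
		}
	}
	return out
}

func main() {
	// Labelset shaped like the result of sum by(cluster)(rate(...)).
	fmt.Println(metricLabels(map[string]string{"cluster": "us-east-1"}))
	// Labelset that carries a service signal ("checkout" is a sample value).
	fmt.Println(metricLabels(map[string]string{"app": "checkout"}))
}
```

Returning a copy keeps the caller's map untouched, which makes the helper safe to use on cached responses.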

Why

  • Grafana Explore metric queries like sum by(cluster)(rate(...)) should not gain synthetic service labels that Loki would not return
  • Drilldown fields should not surface timestamp terminal keys that trigger expensive backend paths and intermittent no-data behavior
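The detected_fields suppression could look roughly like the sketch below. The `DetectedField` struct and `filterDetectedFields` function are hypothetical names, and the struct is a simplified stand-in for a detected_fields entry. Only the two field names come from this PR.

```go
package main

import "fmt"

// DetectedField is a simplified, hypothetical shape for one detected_fields entry.
type DetectedField struct {
	Label       string
	Type        string
	Cardinality int
}

// suppressedFields holds the two timestamp terminal fields named in this PR.
var suppressedFields = map[string]struct{}{
	"timestamp_end":          {},
	"observed_timestamp_end": {},
}

// filterDetectedFields drops suppressed fields and preserves the order of the rest.
func filterDetectedFields(fields []DetectedField) []DetectedField {
	kept := make([]DetectedField, 0, len(fields))
	for _, f := range fields {
		if _, drop := suppressedFields[f.Label]; drop {
			continue
		}
		kept = append(kept, f)
	}
	return kept
}

func main() {
	// Cardinality values are made-up sample numbers, not measurements.
	fields := []DetectedField{
		{Label: "level", Type: "string", Cardinality: 4},
		{Label: "timestamp_end", Type: "string", Cardinality: 90000},
		{Label: "observed_timestamp_end", Type: "string", Cardinality: 90000},
	}
	for _, f := range filterDetectedFields(fields) {
		fmt.Println(f.Label)
	}
}
```

An exact-match set keeps the filter cheap and avoids accidentally hiding user fields that merely contain the word "timestamp".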

Proof

Before (local compose):

  • query_range metric response included {"cluster":"us-east-1","service_name":"unknown_service"}

After:

  • same query returns metric labels {"cluster":"us-east-1"}
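The before/after difference can be checked mechanically. The sketch below is an illustration, not one of the tests added in this PR: `hasUnknownService` is a hypothetical helper, and the JSON bodies are hand-written in the Loki matrix response shape using the labelsets quoted above, with empty `values` arrays.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryRangeResponse models only the parts of a Loki-style matrix response
// that this check needs.
type queryRangeResponse struct {
	Data struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
		} `json:"result"`
	} `json:"data"`
}

// hasUnknownService reports whether any series in the response body carries
// the synthetic label service_name=unknown_service.
func hasUnknownService(body []byte) (bool, error) {
	var resp queryRangeResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	for _, series := range resp.Data.Result {
		if series.Metric["service_name"] == "unknown_service" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	before := []byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"cluster":"us-east-1","service_name":"unknown_service"},"values":[]}]}}`)
	after := []byte(`{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"cluster":"us-east-1"},"values":[]}]}}`)

	for _, body := range [][]byte{before, after} {
		found, err := hasUnknownService(body)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Println("unknown_service present:", found)
	}
}
```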

Validation

  • go test ./...
  • go test -tags=e2e ./test/e2e-compat -run "TestDrilldown_DetectedFieldsMatchStructuredLogs|TestDrilldown_DetectedFieldsSuppressTimestampTerminalFields|TestDrilldown_StatsQueryDoesNotInjectUnknownServiceName" -count=1 -v

szibis force-pushed the codex/volume-no-unknown-service branch from 94d9740 to 2cd04a5 on April 21, 2026 17:07
github-actions Bot added labels size/M (Medium change), scope/proxy (Proxy core), scope/docs (Documentation), scope/tests (Tests), and bugfix (Bug fix) on Apr 21, 2026
github-actions Bot added label size/L (Large change) and removed label size/M (Medium change) on Apr 21, 2026
github-actions Bot (Contributor) commented Apr 21, 2026

PR Quality Report

Compared against base branch main.

Coverage and tests

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| Test count | 1981 | 1982 | 1 |
| Coverage | 88.4% | 88.4% | 0.0% (stable) |

Compatibility

| Track | Base | PR | Delta |
| --- | --- | --- | --- |
| Loki API | 100.0% | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | 100.0% | 17/17 (100.0%) | 0.0% (stable) |
| VictoriaLogs | 100.0% | 11/11 (100.0%) | 0.0% (stable) |

Compatibility components

| Track | Component | Base | PR | Delta |
| --- | --- | --- | --- | --- |
| Loki API | label_values | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Loki API | labels | 4/4 (100.0%) | 4/4 (100.0%) | 0.0% (stable) |
| Loki API | metrics | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Loki API | otel | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Loki API | query_range | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Loki API | series | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | detected_fields | 11/11 (100.0%) | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | label_values | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | level_volume | 2/2 (100.0%) | 2/2 (100.0%) | 0.0% (stable) |
| Logs Drilldown | patterns | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | service_logs | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| Logs Drilldown | service_selection | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | detected_fields | 4/4 (100.0%) | 4/4 (100.0%) | 0.0% (stable) |
| VictoriaLogs | field_values | 3/3 (100.0%) | 3/3 (100.0%) | 0.0% (stable) |
| VictoriaLogs | index_stats | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | stream_translation | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | synthetic_labels | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |
| VictoriaLogs | volume_range | 1/1 (100.0%) | 1/1 (100.0%) | 0.0% (stable) |

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| QueryRange cache-hit CPU cost | 1306.0 ns/op | 1341.0 ns/op | +2.7% (stable) |
| QueryRange cache-hit memory | 200.0 B/op | 200.0 B/op | 0.0% (stable) |
| QueryRange cache-hit allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| QueryRange cache-bypass CPU cost | 1654.0 ns/op | 1671.0 ns/op | +1.0% (stable) |
| QueryRange cache-bypass memory | 272.0 B/op | 274.0 B/op | +0.7% (stable) |
| QueryRange cache-bypass allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| Labels cache-hit CPU cost | 700.9 ns/op | 698.1 ns/op | -0.4% (stable) |
| Labels cache-hit memory | 48.0 B/op | 48.0 B/op | 0.0% (stable) |
| Labels cache-hit allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |
| Labels cache-bypass CPU cost | 868.6 ns/op | 854.3 ns/op | -1.6% (stable) |
| Labels cache-bypass memory | 53.0 B/op | 53.0 B/op | 0.0% (stable) |
| Labels cache-bypass allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |
| High-concurrency throughput | 116721.0 req/s | 132722.0 req/s | +13.7% (stable) |
| High-concurrency memory growth | 0.4 MB | 0.4 MB | 0.0% (stable) |

State

  • Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
  • This is a delta report, not a release gate by itself. Required checks still decide merge safety.
  • Performance is a smoke comparison, not a full benchmark lab run.
  • Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.

szibis merged commit ac7f76c into main on Apr 21, 2026 (46 checks passed)
szibis deleted the codex/volume-no-unknown-service branch on April 21, 2026 17:19

Labels

bugfix (Bug fix), scope/docs (Documentation), scope/proxy (Proxy core), scope/tests (Tests), size/L (Large change)
