@hono/otel: histogram buckets default to ms-scale, quantile queries don't work #1861

@superxiao

Which middleware has the bug?

@hono/otel

What version of the middleware?

1.1.1

What version of Hono are you using?

4.12.14

What runtime/platform is your app running on? (with version if possible)

Node.js 24.15.0 (Railway)

What steps can reproduce the bug?

@hono/otel creates its http.server.request.duration histogram without
passing advice.explicitBucketBoundaries:

// packages/otel/src/index.ts
const histogram = getMeter(config).createHistogram(
  METRIC_HTTP_SERVER_REQUEST_DURATION,
  {
    unit: 's',
    description: '...',
  },
);

With no advice, the OTel SDK falls back to its static default bucket
boundaries, which are legacy ms-scale values:
[0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000]
(see sdk-metrics/src/view/Aggregation.ts).

Since PR #1784 correctly converts the recorded value to seconds, every
request faster than 5 seconds (the common case) lands in a single bucket,
(0, 5].
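
To make the bucketing concrete, here is a minimal sketch of how a value is assigned to an explicit-bucket histogram. It assumes upper-inclusive buckets; the helper `bucketIndex` and the sample durations are hypothetical and are not taken from the SDK source.

```typescript
// Default explicit-bucket boundaries quoted in this issue (ms-scale values).
const DEFAULT_BOUNDARIES: number[] = [
  0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000,
];

// Hypothetical helper: index of the bucket a value falls into, assuming
// upper-inclusive buckets (-Inf, b0], (b0, b1], ..., (bN-1, +Inf).
function bucketIndex(value: number, boundaries: number[]): number {
  const i = boundaries.findIndex((upper) => value <= upper);
  return i === -1 ? boundaries.length : i;
}

// Hypothetical request durations, already converted to seconds.
const durationsSeconds: number[] = [0.0007, 0.012, 0.25, 1.8, 4.9];

const indices = durationsSeconds.map((d) => bucketIndex(d, DEFAULT_BOUNDARIES));
console.log(indices); // [1, 1, 1, 1, 1]: every value lands in the bucket with upper bound 5
```

Durations from 0.7 ms up to 4.9 s all get the same index, so the histogram cannot tell them apart.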

Minimal repro:

  1. Use @hono/otel >= 1.1.1 with any OTLP metric exporter.

  2. Serve any route that returns in <5 seconds (i.e. ~all routes).

  3. Query Prometheus / VictoriaMetrics / any backend:

    histogram_quantile(0.99,
      sum by(le) (rate(http_server_request_duration_seconds_bucket[5m]))
    )
    
  4. Observe the result is always ~4.95 regardless of actual latency.

What is the expected behavior?

histogram_quantile(0.99, ...) should approximate the real p99 latency.

For a service where requests typically take a few ms, p99 should be on the
order of tens-to-hundreds of milliseconds — not ~4.95 seconds.

What do you see instead?

histogram_quantile(0.99) reports ~4.95s for every route, because
histogram_quantile interpolates linearly within the (0, 5] bucket: 99% of
the way through that bucket is 0 + 5 * 0.99 = 4.95s.

Direct query on a sample histogram confirms the bucket layout:

le=0       count=0
le=5       count=29
le=10      count=29
le=25      count=29
... (all identical through le=10000)
le=+Inf    count=29

All 29 requests (mean latency 0.68ms per _sum / _count) sit in the (0, 5]
bucket. histogram_quantile has no signal to differentiate fast from slow
within that bucket.
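
The 4.95 figure can be reproduced with a simplified model of the linear interpolation that histogram_quantile applies to classic (cumulative) buckets. This is a sketch, not the Prometheus implementation: `histogramQuantile` and the `Bucket` type are hypothetical names, and edge cases such as unsorted input, fewer than two buckets, or q outside [0, 1] are not handled.

```typescript
// One cumulative bucket: upper bound `le` and the number of observations <= le.
interface Bucket {
  le: number;
  count: number;
}

// Simplified quantile estimate over cumulative buckets sorted by `le`,
// ending with the +Inf bucket. Assumes 0 <= q <= 1 and at least two buckets.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const last = buckets.length - 1;
  const total = buckets[last].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const b = buckets.findIndex((bucket) => bucket.count >= rank);
  if (b === last) return buckets[last - 1].le; // rank falls in the +Inf bucket
  if (b === 0 && buckets[0].le <= 0) return buckets[0].le;
  const start = b === 0 ? 0 : buckets[b - 1].le;
  const below = b === 0 ? 0 : buckets[b - 1].count;
  const inBucket = buckets[b].count - below;
  // Linear interpolation between the bucket's lower and upper bound.
  return start + (buckets[b].le - start) * ((rank - below) / inBucket);
}

// The bucket layout from the sample above: le=0 is empty, everything else holds 29.
const boundaries = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000];
const sample: Bucket[] = [
  ...boundaries.map((le) => ({ le, count: le === 0 ? 0 : 29 })),
  { le: Infinity, count: 29 },
];

console.log(histogramQuantile(0.99, sample).toFixed(2)); // "4.95"
console.log(histogramQuantile(0.5, sample).toFixed(2)); // "2.50"
```

Any quantile q comes out as roughly 5 * q seconds, whatever the real latencies were.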

Additional information

The OTel JS SDK's static histogram default was inherited from the pre-stable-semconv era, when HTTP durations were recorded in milliseconds. Stable HTTP semconv later switched units to seconds, but the SDK default boundaries were never updated — instead, well-behaved instrumentations pass their own advice.explicitBucketBoundaries to override.

@opentelemetry/instrumentation-http does this correctly:

this._stableHttpServerDurationHistogram = this.meter.createHistogram(
  METRIC_HTTP_SERVER_REQUEST_DURATION,
  {
    description: '...',
    unit: 's',
    advice: {
      explicitBucketBoundaries: [
        0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1,
        2.5, 5, 7.5, 10,
      ],
    },
  },
);

@hono/otel needs the same advice block. PR #1784 fixed the value unit (dividing by 1000 before recording) but did not add bucket advice — a one-line oversight that leaves the histogram effectively unreadable via histogram_quantile for the common sub-second request case.

Proposed fix: add advice.explicitBucketBoundaries with the stable HTTP semconv defaults (same values instrumentation-http uses), so Hono apps and apps instrumented via instrumentation-http land on identical bucket grids and share PromQL queries / dashboards.
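
To illustrate the effect of the proposed boundaries, the sketch below buckets the same hypothetical latency samples on both grids and estimates p99 with a simplified linear interpolation. It only simulates bucketing and does not call the OTel SDK or Prometheus; `toCumulative`, `histogramQuantile`, and the sample data are assumptions made for illustration.

```typescript
interface Bucket {
  le: number; // upper bound of the cumulative bucket
  count: number; // observations <= le
}

// Build cumulative buckets (upper-inclusive) plus a final +Inf bucket.
function toCumulative(values: number[], boundaries: number[]): Bucket[] {
  const finite = boundaries.map((le) => ({
    le,
    count: values.filter((v) => v <= le).length,
  }));
  return [...finite, { le: Infinity, count: values.length }];
}

// Simplified quantile estimate with linear interpolation inside the bucket.
// Assumes sorted buckets ending with +Inf, and 0 <= q <= 1.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const last = buckets.length - 1;
  const total = buckets[last].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const b = buckets.findIndex((bucket) => bucket.count >= rank);
  if (b === last) return buckets[last - 1].le;
  if (b === 0 && buckets[0].le <= 0) return buckets[0].le;
  const start = b === 0 ? 0 : buckets[b - 1].le;
  const below = b === 0 ? 0 : buckets[b - 1].count;
  return start + (buckets[b].le - start) * ((rank - below) / (buckets[b].count - below));
}

// SDK default (ms-scale) boundaries vs. the seconds-scale boundaries proposed here.
const defaultBoundaries = [0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000];
const proposedBoundaries = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10];

// Hypothetical latencies in seconds: 90 x 2 ms, 8 x 30 ms, 2 x 200 ms.
const samples: number[] = [
  ...new Array<number>(90).fill(0.002),
  ...new Array<number>(8).fill(0.03),
  ...new Array<number>(2).fill(0.2),
];

const p99Default = histogramQuantile(0.99, toCumulative(samples, defaultBoundaries));
const p99Proposed = histogramQuantile(0.99, toCumulative(samples, proposedBoundaries));

console.log(p99Default.toFixed(3)); // "4.950": determined by the bucket width alone
console.log(p99Proposed.toFixed(3)); // "0.175": inside the (0.1, 0.25] bucket, where the slowest samples are
```

With the seconds-scale grid the estimate falls in the bucket that actually contains the slowest requests, which is what dashboards and alerts need.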

Backward compatibility: this is a behavior change of the same kind as #1784, which itself shipped as a patch release. Any user relying on the current (broken) bucket layout for alerts would see le labels change. Users who already override via an SDK-level View are unaffected (View precedence wins over advice).

Workaround for users hitting this today — override at the SDK level via a View in your telemetry bootstrap:

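import { NodeSDK } from '@opentelemetry/sdk-node';
import { AggregationType } from '@opentelemetry/sdk-metrics';

new NodeSDK({
  views: [{
    // Match the histogram created by @hono/otel.
    instrumentName: 'http.server.request.duration',
    aggregation: {
      type: AggregationType.EXPLICIT_BUCKET_HISTOGRAM,
      options: {
        // Seconds-scale boundaries, same values as instrumentation-http.
        boundaries: [
          0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1,
          2.5, 5, 7.5, 10,
        ],
      },
    },
  }],
  // ...
});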
