Add BZM2 API parity and runtime retune safety#53

Draft
recklessnode wants to merge 25 commits into 256foundation:main from recklessnode:codex/bzm2-upstream-parity-followup

Summary

This follow-up restores the remaining BZM2 parity work that was intentionally kept out of the core upstream integration PR.

It adds:

  • BZM2 chain summary API
  • BZM2 clock-report API
  • full runtime tuning / retune state exposure
  • saved operating-point validation / status reporting
  • missing BZM2 protocol regression tests
  • a runtime retune safety fix so the saved operating point is retained until a replacement retune is actually applied

Why this is separate

The core BZM2 integration PR was kept focused on:

  • board/ASIC integration
  • mining UART/TDM path
  • telemetry
  • startup calibration flow
  • generic documentation
  • upstream-scope cleanup

This branch carries the later parity work that exists in the internal repository but was intentionally not folded into the core PR.

What this adds

  • GET /api/v0/boards/{name}/bzm2/chain-summary
  • POST /api/v0/boards/{name}/bzm2/clock-report
  • associated board commands and API DTOs
  • runtime tuning state fields for:
    • saved operating-point reuse
    • retune requirement / pending state
    • desired voltage / clock / accept ratio targets
    • saved operating-point validation status and reasons
    • planner notes
  • protocol coverage for legacy wire-format invariants and parser resynchronization
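
As a rough illustration of the two routes listed above, the sketch below builds the method and path for each endpoint. The path templates and HTTP methods are taken from this description; the `Bzm2Endpoint` type and its methods are hypothetical names used only for illustration and are not part of the mujina-miner API.

```rust
/// Hypothetical enum naming the two BZM2 routes listed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Bzm2Endpoint {
    ChainSummary,
    ClockReport,
}

impl Bzm2Endpoint {
    /// HTTP method as given in the PR description.
    pub fn method(self) -> &'static str {
        match self {
            Bzm2Endpoint::ChainSummary => "GET",
            Bzm2Endpoint::ClockReport => "POST",
        }
    }

    /// Fill the `{name}` placeholder with a board name.
    /// Assumption: the board name needs no percent-encoding.
    pub fn path(self, board: &str) -> String {
        let leaf = match self {
            Bzm2Endpoint::ChainSummary => "chain-summary",
            Bzm2Endpoint::ClockReport => "clock-report",
        };
        format!("/api/v0/boards/{}/bzm2/{}", board, leaf)
    }
}
```

For example, `Bzm2Endpoint::ClockReport.path("board0")` yields `/api/v0/boards/board0/bzm2/clock-report`, where `board0` is a made-up board name.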

Retune safety fix

This branch also fixes an operational safety issue in the richer runtime tuning layer:

  • runtime retune no longer invalidates and deletes the saved operating point before any replacement plan is actually applied
  • instead, the saved operating point is marked Pending with reasons and retained as the last known-good profile
  • the live tuning state still reports that the saved operating point should not be reused for the current run

That keeps the persisted calibration artifact available until a real retune executor exists.
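
The behaviour described above can be sketched as a small state model. This is a minimal sketch, not the code in this branch: the type names, field names, and units (`voltage_mv`, `clock_mhz`) are assumptions made for illustration. Only the rule itself comes from the description: a retune request marks the saved point Pending with reasons and keeps it, and the point is replaced only when a new plan is actually applied.

```rust
/// Hypothetical validation status for a saved operating point.
#[derive(Debug, Clone, PartialEq)]
pub enum SavedPointStatus {
    Valid,
    Pending { reasons: Vec<String> },
}

/// Hypothetical saved operating point (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct SavedOperatingPoint {
    pub voltage_mv: u32,
    pub clock_mhz: u32,
    pub status: SavedPointStatus,
}

/// Hypothetical live tuning state.
#[derive(Debug, Default)]
pub struct TuningState {
    pub saved: Option<SavedOperatingPoint>,
    pub reuse_saved_for_current_run: bool,
    pub retune_pending: bool,
}

impl TuningState {
    /// Record a retune request without deleting the saved point.
    pub fn request_retune(&mut self, reason: &str) {
        if let Some(point) = self.saved.as_mut() {
            if let SavedPointStatus::Pending { reasons } = &mut point.status {
                // Already pending: accumulate the additional reason.
                reasons.push(reason.to_string());
            } else {
                point.status = SavedPointStatus::Pending {
                    reasons: vec![reason.to_string()],
                };
            }
        }
        // The live state still says "do not reuse for this run".
        self.reuse_saved_for_current_run = false;
        self.retune_pending = true;
    }

    /// Replace the saved point only once a new plan has been applied.
    pub fn apply_replacement(&mut self, voltage_mv: u32, clock_mhz: u32) {
        self.saved = Some(SavedOperatingPoint {
            voltage_mv,
            clock_mhz,
            status: SavedPointStatus::Valid,
        });
        self.retune_pending = false;
    }
}
```

The design point is that `request_retune` never sets `saved` to `None`, so the last known-good profile survives even if no replacement is ever applied.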

Branch base

This branch is built on top of the already-rebased core BZM2 PR branch and therefore also sits on current upstream main (ece3334).

Validation

  • cargo test -p mujina-miner --message-format=human

Result:

  • 384 passed, 0 failed, 5 ignored
  • doctests: 3 passed, 0 failed, 2 ignored

Scope note

This branch already includes the same upstream-scope cleanup as the core PR:

  • no private-source doc references
  • no bzm2-debug binary
  • no synthetic virtual_device transport layer
  • tuning and board power logic moved out of asic/bzm2

Add the first upstream BZM2 integration slice:
- introduce the new ASIC module with protocol and mining thread support
- add a virtual BZM2 board and transport wiring for serial-backed devices
- register the board with the backplane and daemon startup path

This commit intentionally lands the core integration surface first. Telemetry, tuning, diagnostics, and broader API support follow in later commits.

Extend the initial BZM2 integration with:
- board telemetry and safety-state handling
- reusable control-plane abstractions for reset and power sequencing
- expanded UART opcode coverage and parser behavior
- end-to-end actor tests for dispatch and share reconstruction

This keeps the transport and board core from the first commit, then layers in the reusable hardware-control and validation surface.

Add the reusable UART-side PLL diagnostic path for BZM2.

This introduces the first clock-control surface needed for bring-up and debugging:
- PLL divider calculation and programming
- enable/disable control
- lock-state polling
- structured clock status reporting
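
To make "PLL divider calculation" concrete, here is a generic sketch of choosing dividers for a textbook integer PLL, where the output is the reference frequency times a feedback divider, divided by a post divider. This is an assumption for illustration only: it does not describe the BZM2 register layout, divider ranges, or the formula used in this commit, and every name in it is hypothetical.

```rust
/// Hypothetical divider pair for a generic integer PLL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PllDividers {
    pub feedback: u32,
    pub post: u32,
}

/// Output frequency of the generic model: ref * feedback / post.
/// `post` must be non-zero.
pub fn output_khz(ref_khz: u32, d: PllDividers) -> u32 {
    (ref_khz as u64 * d.feedback as u64 / d.post as u64) as u32
}

/// Exhaustively search for the divider pair whose output is closest
/// to the target. Returns `None` if either range is empty.
pub fn choose_dividers(
    ref_khz: u32,
    target_khz: u32,
    max_feedback: u32,
    max_post: u32,
) -> Option<PllDividers> {
    let mut best: Option<(u32, PllDividers)> = None;
    for post in 1..=max_post {
        for feedback in 1..=max_feedback {
            let d = PllDividers { feedback, post };
            let err = output_khz(ref_khz, d).abs_diff(target_khz);
            // Strict comparison keeps the first pair found on ties.
            if best.map_or(true, |(e, _)| err < e) {
                best = Some((err, d));
            }
        }
    }
    best.map(|(_, d)| d)
}
```

With a made-up 25 MHz reference and an 800 MHz target, `choose_dividers(25_000, 800_000, 64, 4)` selects a feedback divider of 32 and a post divider of 1.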

The port note is added here so the remaining BZM2 docs can evolve in place with later functionality.

Add the standalone BZM2 debug binary and extend the clock-control path beyond PLL status.

This commit adds:
- the UART-focused debug CLI for live serial interaction
- DLL configuration and diagnostics alongside the existing PLL flow
- protocol tests covering the extra wire-format and parser behavior needed by the tooling

The port note and UART guide are updated in the same slice because they document the new bring-up and debug surface.

Build out the BZM2 board runtime beyond basic mining support.

This commit adds:
- the tuning planner and saved operating-point model
- startup calibration and replay of saved operating points
- broadcast-oriented bring-up helpers in the debug CLI
- the naming cleanup for BZM2 tuning concepts so the Rust surface is less coupled to legacy internal terminology

It also keeps the supporting operator docs in sync with the new tuning and bring-up behavior.

Surface BZM2 sensor telemetry through the existing API and board runtime.

This commit adds:
- passive DTS/VS telemetry publication into board state
- explicit on-demand DTS/VS query support
- the supporting API and debug-tooling integration for ASIC voltage and temperature reads

The accompanying docs stay with this slice because they explain the new telemetry endpoints and sensor naming.

Add the first generic chain-enumeration tooling for BZM2 and document the remaining reference-implementation gaps.

This commit adds:
- default-ID chain walk helpers
- the debug CLI command for serial chain enumeration
- the initial Blockscale/BZM2 roadmap document
- README links for the growing BZM2 documentation set

The conversation log from the local integration repo is intentionally excluded from the upstream branch.

Teach the BZM2 board runtime to enumerate chains at startup instead of relying entirely on static ASIC-count configuration.

This commit adds:
- startup bus enumeration from the default ASIC ID
- fallback handling when the runtime must use configured counts instead
- the related operator documentation updates
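
The enumerate-then-fall-back decision in this commit can be sketched as follows. This is a simplified model under stated assumptions: the real chain walk starts from the default ASIC ID and talks to hardware, whereas here a caller-supplied closure stands in for "does the ASIC at this position respond". The function, enum, and parameter names are hypothetical.

```rust
/// Where the final ASIC count came from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AsicCountSource {
    Enumerated,
    Configured,
}

/// Walk the chain position by position until a probe fails or the
/// limit is reached; fall back to the configured count if nothing
/// responded at all.
pub fn resolve_asic_count<F>(
    mut responds: F,
    configured: usize,
    max_chain_len: usize,
) -> (usize, AsicCountSource)
where
    F: FnMut(usize) -> bool,
{
    let mut found = 0;
    while found < max_chain_len && responds(found) {
        found += 1;
    }
    if found > 0 {
        (found, AsicCountSource::Enumerated)
    } else {
        (configured, AsicCountSource::Configured)
    }
}
```

Returning the source alongside the count lets the caller log or report when the runtime had to rely on configuration rather than discovery.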

The local conversation log is carried temporarily so later local commits apply cleanly; it will be removed from the final PR branch before completion.

Apply the generic BZM2 rail and reset sequencing plan as part of board startup and shutdown.

This moves the runtime closer to a reusable hardware reference implementation by:
- running the configured bring-up plan before discovery and calibration
- reversing the same plan during shutdown
- documenting the boundary between generic sequencing and board-specific glue
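
One way to picture "reversing the same plan during shutdown" is to derive the shutdown sequence from the bring-up sequence by walking it backwards and inverting each step. The step vocabulary below (`RailOn`, `ResetRelease`, and so on) and the rail names are assumptions for illustration; the actual plan type in this commit may look different.

```rust
/// Hypothetical sequencing step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    RailOn(&'static str),
    RailOff(&'static str),
    ResetAssert,
    ResetRelease,
    DelayMs(u32),
}

impl Step {
    /// The step that undoes this one; delays are their own inverse.
    pub fn inverse(self) -> Step {
        match self {
            Step::RailOn(rail) => Step::RailOff(rail),
            Step::RailOff(rail) => Step::RailOn(rail),
            Step::ResetAssert => Step::ResetRelease,
            Step::ResetRelease => Step::ResetAssert,
            Step::DelayMs(ms) => Step::DelayMs(ms),
        }
    }
}

/// Shutdown is the bring-up plan in reverse order with each step inverted.
pub fn shutdown_plan(bring_up: &[Step]) -> Vec<Step> {
    bring_up.iter().rev().map(|s| s.inverse()).collect()
}
```

For a bring-up of `RailOn("io")`, `RailOn("core")`, `DelayMs(10)`, `ResetRelease`, the derived shutdown is `ResetAssert`, `DelayMs(10)`, `RailOff("core")`, `RailOff("io")`, so rails drop in the opposite order to which they were raised.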

Extend the board runtime from sequencing alone to active rail management.

This commit adds:
- board-state rail telemetry publication
- application of planner-generated per-domain voltages onto the configured rail control path
- persistence and replay of the applied domain-voltage state alongside the saved operating point

Add the physical engine-discovery path and use the discovered topology in the live runtime.

This commit adds:
- per-ASIC engine probing helpers and CLI support
- board/API publication of discovered engine maps
- live dispatch and share reconstruction against the discovered layout instead of a fixed default hole map

Adjust the BZM2 tuning path to account for discovered missing engines so throughput and operating-point decisions scale to imperfect silicon and mixed topologies.
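
A small worked example of what scaling for missing engines could mean: if a throughput target assumes every engine slot is populated, multiply it by the fraction of engines actually discovered. The `EngineMap` type, the linear scaling rule, and the numbers are assumptions for illustration, not the planner logic in this commit.

```rust
/// Hypothetical discovered engine map: `true` marks a present engine.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EngineMap {
    present: Vec<bool>,
}

impl EngineMap {
    pub fn new(present: Vec<bool>) -> Self {
        EngineMap { present }
    }

    /// Indices of engines that responded during discovery.
    pub fn present_ids(&self) -> Vec<usize> {
        self.present
            .iter()
            .enumerate()
            .filter_map(|(i, &p)| if p { Some(i) } else { None })
            .collect()
    }

    /// Fraction of engine slots that are populated (0.0 for an empty map).
    pub fn present_fraction(&self) -> f64 {
        if self.present.is_empty() {
            return 0.0;
        }
        self.present_ids().len() as f64 / self.present.len() as f64
    }
}

/// Scale a full-chip throughput target linearly by the present fraction.
pub fn scaled_target(full_target: f64, map: &EngineMap) -> f64 {
    full_target * map.present_fraction()
}
```

With four engine slots and one missing, a full-chip target of 1000.0 (in whatever unit the planner uses) scales to 750.0.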

Add the live runtime feedback path for BZM2 tuning.

This commit adds:
- per-thread, per-ASIC, and per-PLL runtime measurements
- feeding live throughput data back into the tuning planner
- persistent retune triggers and saved-operating-point validation state

The board runtime can now reason about tuning quality from live operation instead of startup-only assumptions.

Expose the highest-value BZM2 diagnostics and runtime status through the board-owned API surface.

This commit adds:
- live UART diagnostics routed through the active BZM2 thread
- chain summary and clock-report endpoints
- the supporting API contract types and regression coverage

The diagnostics follow the existing UART ownership rules by requiring the thread to be idle when they run.
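
The idle requirement can be sketched as a guard around the diagnostic call. This is a minimal sketch assuming a two-state thread model; the names `ThreadState`, `DiagError`, and `run_if_idle` are hypothetical and do not come from this branch.

```rust
/// Hypothetical view of the mining thread's state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadState {
    Idle,
    Mining,
}

/// Hypothetical error returned when a diagnostic is refused.
#[derive(Debug, PartialEq, Eq)]
pub enum DiagError {
    ThreadBusy,
}

/// Run a UART diagnostic only while the thread is idle, so it never
/// competes with mining traffic for the serial link.
pub fn run_if_idle<T>(
    state: ThreadState,
    diagnostic: impl FnOnce() -> T,
) -> Result<T, DiagError> {
    match state {
        ThreadState::Idle => Ok(diagnostic()),
        ThreadState::Mining => Err(DiagError::ThreadBusy),
    }
}
```

Because the closure is only invoked in the `Idle` arm, a refused request has no side effects on the UART.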

Move the BZM2 and Blockscale-specific documentation into a dedicated docs/bzm2 subtree, add the missing integration/reference guides, and drop the local porting conversation log from the upstream branch.

This keeps the upstream-facing PR focused on reusable implementation and operator documentation rather than local development history.