This document captures the eight standardized go/no-go gates, how to run them quickly (local / CI), the expected evidence, the residual risk watchlist, and the sign-off template.
```powershell
# Ensure virtualenv active if used
$env:AETHERRA_PROFILE='test'
$env:AETHERRA_QUIET='1'
python tools\run_go_no_go_gates.py --all
```
Artifacts produced:
- `gate_results.json` (structured machine JSON)
- `gate_sign_off.md` (markdown sign-off table + summary lines)
Exit code is non‑zero if any mandatory gate fails. HMR gate is marked manual-followup (🔧) unless strict manual is requested.
Strict manual enforcement (fail if any manual gate is not fully validated):
```powershell
python tools\run_go_no_go_gates.py --strict-manual
```
Run a subset:
```powershell
python tools\run_go_no_go_gates.py --gates launcher_smoke chat_sse_resume
```
Validates phased boot and core service registration.
Command (direct):
```powershell
$env:AETHERRA_PROFILE='test'; $env:AETHERRA_QUIET='1'; python tools\os_smoke.py
```
Pass: core services (`memory_system`, `plugin_manager`, `aetherra_engine`) present.
Checks envelope ordering and Last-Event-ID monotonic resume.
Minimal manual check:
```powershell
curl "http://localhost:3012/api/ai/stream?message=ping&scratchpad_policy=redacted"
```
Pass: status → policy (first 2–3 events), usage before final; a resumed stream starts at prior_last_id+1.
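The ordering rule above can be checked mechanically from a captured transcript. A minimal sketch, assuming standard SSE framing and the event names from the pass criteria (status, policy, usage, final):

```python
def envelope_order_ok(event_names):
    """True if 'status' precedes 'policy' and 'usage' precedes 'final'."""
    def first(name):
        return event_names.index(name) if name in event_names else -1
    return (
        0 <= first("status") < first("policy")
        and first("usage") != -1
        and first("usage") < first("final")
    )

def event_names_from_sse(raw: str):
    """Pull event names out of raw SSE text (lines like 'event: status')."""
    return [line.split(":", 1)[1].strip()
            for line in raw.splitlines() if line.startswith("event:")]

transcript = ("event: status\ndata: {}\n\n"
              "event: policy\ndata: {}\n\n"
              "event: usage\ndata: {}\n\n"
              "event: final\ndata: {}\n\n")
print(envelope_order_ok(event_names_from_sse(transcript)))  # True
```

The resume check (prior_last_id+1) still needs a second request with a Last-Event-ID header; this sketch covers only the envelope ordering.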
Verifies strict signing of scripts and .aether files, plus network policy.
```powershell
$env:AETHERRA_SCRIPT_VERIFY_STRICT='1'; python tools\verify_aether_scripts.py --strict --output aether_static_report.md
```
Optionally set:
```powershell
$env:AETHERRA_NET_STRICT='1'
```
Pass: unsigned scripts flagged (fail) in strict mode; disallowed outbound calls denied.
Set `$env:AETHERRA_QFAC_MODE='hybrid'`, then run a quick round-trip (e.g., via `python -` or a scratch file):
```python
import asyncio

from Aetherra.aetherra_core.memory.qfac_integration import QFACMemorySystem

async def main():
    q = QFACMemorySystem('_quick_qfac')
    nid = await q.store_memory({'messages': ['Hi'], 'kind': 'conversation'})
    data = await q.retrieve_memory(nid)
    status = await q.get_system_status()
    print('node', nid, 'retrieved', type(data).__name__,
          'nodes=', status['node_statistics']['total_nodes'])

asyncio.run(main())
```
Pass: store/retrieve succeed; status `node_statistics` populated; hybrid gracefully degrades if the quantum backend is absent.
The automation script only verifies controller presence & config metrics. Full validation:
- Enable HMR env: `AETHERRA_HMR_STRICT=1`, optional `AETHERRA_HMR_ALLOWED_SOURCES`
- Enqueue kernel task `{type: 'hmr_reload', data: {target: 'engine', source: 'path_or_module'}}`
- Inspect `.aetherra/hmr_audit.jsonl` for events: HMR_PREPARE → HMR_SWAP or HMR_ROLLBACK

Pass: swap event or clean rollback with inflight drained.
Disabled by default; returns 501/403/disabled. Enable with token:
```powershell
$env:AETHERRA_AGENTS_API_ENABLED='1'
$env:AETHERRA_AGENTS_API_REQUIRE_TOKEN='1'
$env:AETHERRA_AGENTS_API_TOKEN='dev'
```
Pass: disabled path blocked; enabled path returns orchestrator summary with token.
```powershell
pytest -q tests/capabilities
python tools\spec_tests_gate.py
python tools\quality_gates.py
```
Pass: tests succeed; spec gate passes (exit 0 or 2); coverage ≥ baseline and not dropped.
SSE and ask endpoints emit an X-Aetherra-Policy header plus a policy event.
Optional DP flags:
```powershell
$env:AETHERRA_DP_ENABLED='1'
$env:AETHERRA_DP_EPSILON='8.0'
```
Pass: header is parseable JSON and the policy event is present; DP keys appear when enabled.
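A client-side spot check of the header can be sketched as follows; the DP key names (`dp_enabled`, `epsilon`) are illustrative assumptions, not the documented schema:

```python
import json

def check_policy_header(header_value: str, dp_expected: bool) -> bool:
    """Header must parse as JSON; DP-ish keys should appear iff DP is enabled."""
    try:
        policy = json.loads(header_value)
    except json.JSONDecodeError:
        return False
    has_dp = any(k.startswith("dp") or k == "epsilon" for k in policy)
    return has_dp == dp_expected

print(check_policy_header(
    '{"scratchpad_policy": "redacted", "dp_enabled": true, "epsilon": 8.0}',
    dp_expected=True))  # True
```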
| Area | Impact | Suggested Mitigation | Priority |
|---|---|---|---|
| Plugin manifest signing partial | Unsigned third-party risk | Enforce Ed25519 & revocation list | High |
| Sandbox best-effort | Escape risk | Process/container isolation | High |
| HMR phase-1 limits | Mid-flight swap risk | Keep strict + night-cycle windows | Med-High |
| Quantum bridge experimental | Flakiness | Default classical in prod | Med |
| Agents quotas absent | Resource pressure | Add per-agent budgets & metrics | Med |
| Chat no replay | Client mismatch | Document semantics | Low-Med |
| Network policy tuning | Surprising denies | Versioned allowlist tests | Med |
| Simple vector recall | Scale ceiling | Plan adapter abstraction (pgvector/FAISS) | Low |
Full rationale: see PR description or security notes.
| Area | Command / Endpoint | Expected |
|---|---|---|
| Kernel status | GET /api/kernel/status | running=true, sane queues |
| Kernel metrics | GET /metrics | inflight & HMR counters |
| Chat stream | /api/ai/stream | status→policy→usage→final |
| Lyrixa bridge | POST /api/lyrixa/chat | persona default, edit_plan synthesized |
| Security scripts | verify_aether_scripts.py --strict | OK or explicit FAIL lines |
| Agents API | /api/agents (off/on) | disabled → summary w/token |
| Memory health | /api/memory/status | coherence/branches metrics or fallback |
- Security alerts: `.aetherra/security/alerts.jsonl` quiet
- Kernel DLQ: minimal expired/dropped tasks
- HMR audit rotation within configured caps
- Memory coherence/drift stable
Copy into PR / Release notes (auto-populated in gate_sign_off.md):
Launcher smoke: ✅/❌ (log path/link)
Chat SSE v2 + resume: ✅/❌ (last_event_id tested)
Security strict (scripts/plugins/net): ✅/❌ (report link)
Memory (core + QFAC fallback): ✅/❌ (status snapshot)
HMR swap + audit: ✅/❌ (audit excerpt)
Agents API posture: ✅/❌ (off by default; enabled w/ token OK)
Spec→Tests & coverage no‑drop: ✅/❌ (coverage % vs baseline)
Policy/DP surfaced to clients: ✅/❌ (captured policy event)
gate_results.json example shape:
```json
{
  "_meta": {"profile": "test", "ts": 1730000000.123, "all_passed": true},
  "launcher_smoke": {"ok": true, "manual": false, "duration_sec": 2.51, "details": {"services": ["aetherra_engine"], "missing": []}},
  "hmr_quiesce": {"ok": true, "manual": true, "details": {"manual_followup": true, "reason": "hmr_controller not registered"}}
}
```
gate_sign_off.md contains a table plus summary lines ready for PR insertion.
Why separate automation vs manual? HMR correctness depends on live inflight draining and audit semantics; it is safer to observe manually until strict gating matures.
Can the build fail on a manual gate? Use the --strict-manual flag in automation.
How to add a new gate? Extend the GATES list in tools/run_go_no_go_gates.py with a (name, async_func) pair whose coroutine returns (ok, details).
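A hypothetical new gate following that (name, async_func) contract might look like:

```python
import asyncio

async def gate_example_health() -> tuple[bool, dict]:
    """Illustrative gate body: replace the placeholder with a real probe
    (service lookup, endpoint ping, artifact presence, ...)."""
    services_present = True  # placeholder check
    return services_present, {"services_checked": ["example_service"]}

# Registered roughly as: GATES.append(("example_health", gate_example_health))
ok, details = asyncio.run(gate_example_health())
print(ok, details)
```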
Maintainers: Keep this file aligned with any gate evolution. Update residual risk table as mitigations land.
These helper tools accelerate remediation of large numbers of failing .aether workflows without blocking main go/no‑go execution.
Added to aether.py to validate syntax/structure without executing side‑effects:
```powershell
python aether.py --check path\to\workflow.aether
```
Exit codes:
- 0: Parse OK
- 1: Structural / basic parse issue (line diagnostics printed)

Use in bulk (PowerShell example):
```powershell
Get-ChildItem -Recurse -Filter *.aether | ForEach-Object { python aether.py --check $_.FullName | Out-Null }
```
Generates machine + human summaries of failing workflows, preferring --check first, then executing for runtime issues.
Artifacts:
- `workflow_failures.json` (includes per-file category & signature/risk data)
- `workflow_failures.md` (category table)
Key categories (heuristic): ParseError, SignatureMissing, RuntimeError, Timeout, NotImplemented, Other.
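A keyword heuristic of the kind the classifier can fall back on might be sketched as follows; the actual patterns in tools/classify_aether_workflow_failures.py may differ:

```python
def categorize(output: str) -> str:
    """Map raw interpreter output to a coarse failure category."""
    rules = [
        ("ParseError", ("parse error", "unbalanced")),
        ("SignatureMissing", ("unsigned", "signature")),
        ("Timeout", ("timed out", "timeout")),
        ("NotImplemented", ("not implemented", "unsupported")),
        ("RuntimeError", ("traceback", "runtime")),
    ]
    low = output.lower()
    for category, needles in rules:
        if any(n in low for n in needles):
            return category
    return "Other"

print(categorize("Parse error (assignment) line 12: goal"))  # ParseError
```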
Sample run (limited to the first 300 files for speed):
```powershell
python tools\classify_aether_workflow_failures.py --limit 300 --output workflow_failures.json --markdown workflow_failures.md
```
CI: a lightweight GitHub Actions workflow (workflow-classifier.yml) runs a capped sample and uploads artifacts (no hard fail yet).
Normalizes older forms (e.g., intent: → goal:) and cleans whitespace.
Dry-run with unified diffs:
```powershell
python tools\migrate_legacy_aether.py path\to\workflows --dry-run
```
Apply in-place:
```powershell
python tools\migrate_legacy_aether.py path\to\workflows --apply
```
Produces migration_report.md when multiple files are processed.
- Run classifier → capture top categories & counts.
- Run migration (dry-run) → apply if high % convertible.
- Re-run classifier → note delta in failure rate.
- Bulk-sign remaining unsigned scripts (`tools/sign_aether.py file1.aether ...`).
- Address remaining ParseError patterns (extend parser or add transforms).
- Introduce suppression list only for intentional experimental scripts.
- Interpreter emits structured error codes (E_PARSE_UNBALANCED, E_RUNTIME_SERVICE_MISSING) for deterministic bucketing.
- Parallel classifier execution with concurrency limits.
- Historical trend snapshots under `.aetherra/workflow_classify_history/`.
The interpreter now supports structured machine-friendly reporting for both parse-only and full execution paths.
Flags:
```powershell
python aether.py --check --emit-error-code --json-status path\to\workflow.aether
```
Output additions:
- Stderr line: `AETHER_ERROR_CODE:<int>` (when `--emit-error-code` is supplied)
- JSON line (stdout), when `--json-status` is supplied:
  `{ "ok": bool, "code": int, "code_name": "PARSE_ERROR", "file": "...", "phase": "parse|execute", "message": "...", "line": n }`
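Consumers can pick the status line out of stdout; a sketch, assuming the JSON status is the last JSON-object line emitted:

```python
import json

def parse_json_status(stdout: str) -> dict:
    """Return the last JSON-object line from interpreter stdout, else a fallback."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            return json.loads(line)
    return {"ok": False, "code_name": "INTERNAL_ERROR",
            "message": "no JSON status emitted"}

sample = 'log noise\n{"ok": false, "code": 20, "code_name": "PARSE_ERROR", "phase": "parse", "line": 12}'
print(parse_json_status(sample)["code_name"])  # PARSE_ERROR
```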
Current code table:
| Code | Name | Meaning |
|---|---|---|
| 0 | SUCCESS | Parsed / executed successfully |
| 1 | GENERIC_FAILURE | Legacy generic failure |
| 20 | PARSE_ERROR | Structural / syntax issue |
| 21 | RUNTIME_ERROR | Execution failed (uncaught) |
| 22 | SIGNATURE_ERROR | (Reserved) signature validation |
| 23 | TIMEOUT_ERROR | (Reserved) internal timeout |
| 24 | UNSUPPORTED_FEATURE | Feature not yet implemented |
| 25 | VALIDATION_ERROR | Semantic / pre-exec validation |
| 26 | IO_ERROR | File read / access issue |
| 27 | INTERNAL_ERROR | Unexpected interpreter crash |
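For tooling, the table maps naturally onto an IntEnum; the enum below mirrors the documented codes but is itself an illustrative helper, not part of the interpreter:

```python
from enum import IntEnum

class AetherExit(IntEnum):
    SUCCESS = 0
    GENERIC_FAILURE = 1
    PARSE_ERROR = 20
    RUNTIME_ERROR = 21
    SIGNATURE_ERROR = 22
    TIMEOUT_ERROR = 23
    UNSUPPORTED_FEATURE = 24
    VALIDATION_ERROR = 25
    IO_ERROR = 26
    INTERNAL_ERROR = 27

# Bucket a raw exit code deterministically:
print(AetherExit(20).name)  # PARSE_ERROR
```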
tools/classify_aether_workflow_failures.py now:
- Requests structured JSON / error codes automatically.
- Falls back to heuristics if the interpreter lacks the flags.
- Supports concurrency: `--jobs N` (default = CPU count).
- Persists historical snapshots: `--history-dir .aetherra/workflow_history` (default).
- Generates a rolling `trends.json` (last 20 snapshots) for regression tracking.

Example (parallel run with history):
```powershell
python tools\classify_aether_workflow_failures.py --jobs 8 --output workflow_failures.json --markdown workflow_failures.md
```
Historical artifacts:
```
.aetherra/workflow_history/
  20250912T101500_classification.json
  20250912T111500_classification.json
  trends.json
```
trends.json excerpt:
```json
[
  {"file": "20250912T101500_classification.json", "timestamp": "2025-09-12T10:15:00Z", "failed": 1820, "total": 2700},
  {"file": "20250912T111500_classification.json", "timestamp": "2025-09-12T11:15:10Z", "failed": 1755, "total": 2700}
]
```
tools/generate_parse_baseline.py provides a fast snapshot of parse health only:
```powershell
python tools\generate_parse_baseline.py --output parse_baseline.json
```
Produces:
```json
{
  "timestamp": "...Z",
  "total": 2700,
  "by_code": {"SUCCESS": 900, "PARSE_ERROR": 1800},
  "failure_rate": 0.6667,
  "files": [ {"path": "...", "code_name": "PARSE_ERROR", "line": 12, "message": "Parse error (assignment)..." } ]
}
```
Two workflows:
- `workflow-classifier.yml` (sample execution classification artifacts)
- `workflow-parse-baseline.yml` (full parse baseline artifact)
Use artifact diffing in future to hard-fail on regression deltas (e.g., PARSE_ERROR count increase > threshold).
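One such delta check, sketched over the rolling trends.json layout shown earlier (a hypothetical helper, not an existing tool):

```python
import json

def failure_rate_delta(trends: list) -> float:
    """Failure-rate change between the two most recent snapshots (negative = improvement)."""
    if len(trends) < 2:
        return 0.0
    prev, last = trends[-2], trends[-1]
    return last["failed"] / last["total"] - prev["failed"] / prev["total"]

trends = json.loads(
    '[{"failed": 1820, "total": 2700}, {"failed": 1755, "total": 2700}]')
print(round(failure_rate_delta(trends), 4))  # -0.0241
```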
- Add suppression list for intentional experimental scripts.
- Introduce semantic validation (e.g., unknown function names → VALIDATION_ERROR).
- Gate PRs on non-increasing PARSE_ERROR count after stabilization.
Recent hardening adds pre-execution semantic checks, deterministic failure fingerprints, opt-in suppression, and a parse regression gate.
During --check the interpreter now detects unknown function calls in:
- Standalone calls: `foo(bar)`
- Assignment expressions: `x: foo(bar)`
- Memory-prefixed calls: `memory: foo(bar)`
If the function name is not a recognized built-in, parse still structurally succeeds but the exit status becomes VALIDATION_ERROR with line + message (first unknown only). This separates structural syntax correctness from semantic readiness.
Why: Prevents false "SUCCESS" baselines where workflows would later fail at runtime due to missing functions.
The classifier computes a short stable hash (first 16 hex chars of SHA-256) over:
```
code_name | line | first_error_message_line
```
Example (conceptual):
```
PARSE_ERROR|12|Parse error (assignment) line 12: goal
-> fingerprint: a1b2c3d4e5f67890
```
Fingerprints enable de-duplication and persistent suppression without path coupling.
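The scheme above can be sketched directly with hashlib; the exact field normalization in the real classifier may differ:

```python
import hashlib

def fingerprint(code_name: str, line: int, first_msg_line: str) -> str:
    """First 16 hex chars of SHA-256 over 'code_name|line|first_message_line'."""
    basis = f"{code_name}|{line}|{first_msg_line}"
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]

fp = fingerprint("PARSE_ERROR", 12, "Parse error (assignment) line 12: goal")
print(len(fp))  # 16, and stable across runs for identical inputs
```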
Optional file: `.aetherra/workflow_suppressions.txt`
Format: one fingerprint per line. Lines starting with `#` are ignored; inline comments are allowed after whitespace.
Example:
```
# Experimental quantum workflows under redesign
a1b2c3d4e5f67890  # deprecated goal syntax
deadbeefcafe1234  # awaiting plugin migration
```
Suppressed failures are recategorized as Suppressed-<OriginalCategory> and counted separately (suppressed_failures field in JSON summary). They still surface in artifacts but can be excluded from gating decisions.
Guidelines:
- Only suppress when a remediation plan + owner exist.
- Remove entry immediately after fix merges.
- Avoid mass suppression (anti-signal). Prefer targeted migrations.
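Reading the suppression file per the stated format is straightforward; a sketch:

```python
def load_suppressions(text: str) -> set:
    """One fingerprint per line; '#' lines skipped, inline comments stripped."""
    fingerprints = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fingerprints.add(line.split()[0])  # drop inline comment after whitespace
    return fingerprints

sample = "# redesign in progress\na1b2c3d4e5f67890  # deprecated goal syntax\n\ndeadbeefcafe1234\n"
print(sorted(load_suppressions(sample)))  # ['a1b2c3d4e5f67890', 'deadbeefcafe1234']
```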
`tools/parse_baseline_regression_gate.py` compares a new parse baseline against a reference and fails if both thresholds are exceeded for a targeted code (PARSE_ERROR, VALIDATION_ERROR):
- Absolute increase > `--abs-threshold` (default 5), AND
- Relative increase > `--rel-threshold` (default 0.10 = 10%)
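The dual-threshold rule reduces to a small predicate; a sketch consistent with the defaults above:

```python
def regressed(old: int, new: int,
              abs_threshold: int = 5, rel_threshold: float = 0.10) -> bool:
    """A code regresses only when BOTH absolute and relative increases exceed thresholds."""
    delta = new - old
    relative = delta / old if old else float("inf")
    return delta > abs_threshold and relative > rel_threshold

print(regressed(30, 42))      # True  (+12 absolute, +40% relative)
print(regressed(1800, 1810))  # False (+10 absolute, only ~0.6% relative)
```

This matches the sample output below: VALIDATION_ERROR 30 → 42 is flagged, while PARSE_ERROR 1800 → 1810 is not.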
Usage:
```powershell
python tools\parse_baseline_regression_gate.py --new parse_baseline.json --ref main_parse_baseline.json
```
Sample output:
```json
{
  "ok": false,
  "timestamp": "2025-09-13T10:15:22.123456+00:00",
  "new_counts": {"PARSE_ERROR": 1810, "VALIDATION_ERROR": 42},
  "ref_counts": {"PARSE_ERROR": 1800, "VALIDATION_ERROR": 30},
  "abs_threshold": 5,
  "rel_threshold": 0.1,
  "regressions": [
    {"code": "VALIDATION_ERROR", "old": 30, "new": 42, "delta": 12, "relative_increase": 0.4}
  ]
}
```
Exit codes:
| Code | Meaning |
|---|---|
| 0 | No regression detected |
| 1 | Regression (threshold exceeded) |
| 2 | Usage / input error |
Recommended CI Flow:
- Generate a new baseline (parse-only) → `parse_baseline.json`.
- Download/reference the main-branch baseline (artifact cache) → `ref.json`.
- Run the regression gate script.
- Fail the PR if exit code = 1 (after a stabilization period).
All new structured outputs use `datetime.now(UTC).isoformat()`, eliminating naive `utcnow()` usage for clarity and DST-safety in future analytics.
- Auto-generate suppression template for top N fingerprints lacking coverage.
- Fingerprint aging (auto-expire entries > X days old).
- Enrich semantic validation (unknown variables, reserved keyword misuse).
- Hard-fail CI immediately on introduction of any new `INTERNAL_ERROR` code.