This repository is a C/C++ library + CLI that runs several neural audio codecs (currently WavTokenizer-Large, DAC, Mimi) using ggml graphs so execution can be offloaded via ggml backends (CPU/CUDA/Vulkan/Metal/etc.).
The intended architecture is llama.cpp-style:
- Build model forward passes as ggml graphs (ops).
- Execute via ggml_backend + ggml_backend_sched so backends can offload.
- Avoid bespoke CPU-side tensor math buffers when possible.
- `include/codec.h` — public C API (model load/init, encode/decode, batch decode)
- `src/codec.cpp` — top-level dispatch + model loading + backend selection
- `src/models/` — per-architecture graph builders and glue: `wavtokenizer.cpp/.h`, `dac.cpp/.h`, `mimi.cpp/.h`
- `src/runtime/` — graph cache + execution runtime
  - `graph.cpp/.h` — graph cache keyed by (kind, n_frames, n_q, hop, etc.)
  - `graph_exec.cpp` — ggml_backend scheduler init + graph compute
  - `tensor_utils.*`, `gguf_kv.*` — tensor helpers / metadata
- `src/ops/` — small wrappers around ggml ops + a few custom compositions
  - `ggml_ops.*` — layernorm/groupnorm/linear/unary/snake/pad/crop helpers
  - `conv1d.*`, `convtr1d.*` — conv wrappers (keep only if ggml lacks the needed variant)
- `src/batch/` — sequence-level batch container + decode loop (MVP)
- `examples/` — demo/inspection binaries (e.g. batch decode)
- `ggml/` — ggml submodule/subproject
CMake project with ggml as a subdirectory.
Typical CPU build:

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```

Enable a GPU backend (example: CUDA):

```sh
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j
```

Backends are intended to be selected via ggml's backend selection logic.
In `src/codec.cpp` the backend is selected roughly as:
- if `codec_model_params.use_gpu == true`: call `ggml_backend_load_all()`, then `ggml_backend_init_best()`
- else: fall back to the CPU backend
`src/runtime/graph_exec.cpp` uses:
- `ggml_backend_sched_new(...)`
- `ggml_backend_sched_graph_compute(...)`
This is the core mechanism enabling CPU/GPU split + offload.
Key rule: graphs should be constructed in a way that ggml can place tensors on supported backends; avoid pulling intermediate tensors out to CPU buffers.
Graphs are cached by a small key (see `codec_graph_cache_key` in internal headers), typically including:
- graph kind (encode/decode per model)
- `n_frames`, `n_q`, `hop`, input sizes, `latent_dim`, etc.
Flow:
1. `codec_graph_cache_get_or_build(...)` builds the graph in an eval arena (`ggml_init` with `no_alloc = true`).
2. `codec_graph_prepare_io(...)` allocates tensors for the graph in a backend buffer.
3. `codec_graph_compute(...)` runs scheduler compute.
Important constraints:
- When switching to a different graph allocation, a scheduler reset may be required to avoid dangling allocations (see comments in `codec_graph_prepare_io`).
Prefer using ggml ops directly when available.
`src/ops/ggml_ops.cpp` provides small helpers that are either:
- thin wrappers over ggml primitives (`ggml_norm`, `ggml_group_norm`, `ggml_mul_mat`, activations), or
- composed ops built from primitives (e.g. DAC `snake`, implemented as `x + sin(a*x)^2 / a`)
If a needed op is missing in ggml:
- First try composing from existing ops.
- If impossible/perf critical, add a custom op (CPU SIMD first) and keep a path to backend support later.
Models are loaded from .gguf.
Some tensors that could be generated at runtime should instead be baked into GGUF during conversion (for reproducibility + avoiding runtime FP32→FP16 conversions).
If conversion scripts are involved, regenerate gguf after changes (stale gguf is a common source of “missing tensor” errors).
- Keep encode/decode numerics stable (unit/regression tests where possible).
- Avoid introducing new CPU-only intermediate buffers; build everything as ggml tensors.
- When touching graph execution / the backend scheduler, be careful with allocation lifetimes (`eval_ctx`, scheduler reset semantics).
- Prefer small, reviewable commits.
If you need to understand execution:
- `src/runtime/graph_exec.cpp` (scheduler + compute)
- `src/runtime/graph.cpp` (cache + arena)
If you need to understand a model forward:
`src/models/mimi.cpp` / `dac.cpp` / `wavtokenizer.cpp`
If you need to add/replace an op:
`src/ops/ggml_ops.cpp` (+ possibly ggml upstream)
- The Mimi encode path is consolidated into one canonical graph kind: `CODEC_GRAPH_MIMI_ENCODE`.
- The unified graph builder is the only Mimi encode graph path (frontend -> transformer -> downsample -> unrolled RVQ).
- Split/legacy graph kinds for Mimi encode stages are removed from the runtime graph enums.
- Mimi encode weight writing now targets only the canonical encode graph path.