codec.cpp

This repository is a C/C++ library + CLI that runs several neural audio codecs (currently WavTokenizer-Large, DAC, Mimi) using ggml graphs so execution can be offloaded via ggml backends (CPU/CUDA/Vulkan/Metal/etc.).

The intended architecture is llama.cpp-style:

Build model forward passes as ggml graphs (ops).
Execute via ggml_backend + ggml_backend_sched so backends can offload.
Avoid bespoke CPU-side tensor math buffers when possible.

High-level layout

include/codec.h — public C API (model load/init, encode/decode, batch decode)
src/codec.cpp — top-level dispatch + model loading + backend selection
src/models/ — per-architecture graph builders and glue
- wavtokenizer.cpp/.h
- dac.cpp/.h
- mimi.cpp/.h
src/runtime/ — graph cache + execution runtime
- graph.cpp/.h — graph cache keyed by (kind, n_frames, n_q, hop, etc.)
- graph_exec.cpp — ggml_backend scheduler init + graph compute
- tensor_utils.*, gguf_kv.* — tensor helpers / metadata
src/ops/ — small wrappers around ggml ops + a few custom compositions
- ggml_ops.* — layernorm/groupnorm/linear/unary/snake/pad/crop helpers
- conv1d.*, convtr1d.* — conv wrappers (keep only if ggml lacks needed variant)
src/batch/ — sequence-level batch container + decode loop (MVP)
examples/ — demo/inspection binaries (e.g. batch decode)
ggml/ — ggml submodule/subproject

Build

CMake project with ggml as a subdirectory.

Typical CPU build:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

Enable GPU backend (example: CUDA):

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build -j

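Backends are selected at runtime through ggml's own backend selection logic (see "Backend selection" below); the build flags only determine which backends are compiled in.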

Runtime / backend philosophy

Backend selection

In src/codec.cpp the backend is selected roughly as:

if codec_model_params.use_gpu == true: call ggml_backend_load_all(), then ggml_backend_init_best()
else: initialize the CPU backend

Scheduler-based execution (important)

src/runtime/graph_exec.cpp uses:

ggml_backend_sched_new(...)
ggml_backend_sched_graph_compute(...)

This is the core mechanism enabling CPU/GPU split + offload.

Key rule: graphs should be constructed in a way that ggml can place tensors on supported backends; avoid pulling intermediate tensors out to CPU buffers.

Graph caching

Graphs are cached by a small key (see codec_graph_cache_key in internal headers), typically including:

graph kind (encode/decode per model)
n_frames, n_q, hop, input sizes, latent_dim, etc.
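
To make the keying idea concrete, here is a minimal, self-contained sketch of a get-or-build cache. GraphKey, GraphKeyHash, GraphEntry and GraphCache are hypothetical stand-ins invented for illustration; the real codec_graph_cache_key lives in the internal headers and may carry different fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>

// Hypothetical stand-in for codec_graph_cache_key (field set is an assumption).
struct GraphKey {
    int32_t kind;        // graph kind, e.g. encode/decode per model
    int32_t n_frames;
    int32_t n_q;
    int32_t hop;
    int32_t latent_dim;

    bool operator==(const GraphKey & o) const {
        return kind == o.kind && n_frames == o.n_frames && n_q == o.n_q &&
               hop == o.hop && latent_dim == o.latent_dim;
    }
};

struct GraphKeyHash {
    size_t operator()(const GraphKey & k) const {
        size_t h = 0;
        for (int32_t v : { k.kind, k.n_frames, k.n_q, k.hop, k.latent_dim }) {
            // boost-style hash combine over every key field
            h ^= std::hash<int32_t>{}(v) + 0x9e3779b97f4a7c15ull + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Placeholder for a built graph; build_id makes cache hits observable.
struct GraphEntry {
    int build_id;
};

class GraphCache {
public:
    // Returns the cached entry for `key`, building it only on first use.
    GraphEntry & get_or_build(const GraphKey & key) {
        assert(key.n_frames >= 0 && key.n_q >= 0);
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            it = entries_.emplace(key, GraphEntry{ n_builds_++ }).first;
        }
        return it->second;
    }

    int n_builds() const { return n_builds_; }

private:
    std::unordered_map<GraphKey, GraphEntry, GraphKeyHash> entries_;
    int n_builds_ = 0;
};
```

Every field that changes tensor shapes belongs in the key; if one is left out, two differently shaped requests would share a graph built for only one of them.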

Flow:

codec_graph_cache_get_or_build(...) builds the graph in an eval arena (a ggml context created by ggml_init with no_alloc = true, so it holds tensor metadata only).
codec_graph_prepare_io(...) allocates tensors for the graph in a backend buffer.
codec_graph_compute(...) runs scheduler compute.

Important constraints:

When switching to a different graph allocation, a scheduler reset may be required to avoid dangling allocations (see the comments in codec_graph_prepare_io).
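
The constraint above can be illustrated with a small guard that remembers which allocation is currently bound and issues a reset only when it changes. This is a sketch under stated assumptions: AllocGuard, bind() and the alloc_id scheme are hypothetical, and the reset callback stands in for whatever the runtime actually does in codec_graph_prepare_io.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper: tracks the graph allocation the scheduler was last
// prepared for and calls `reset_fn` when a different one is requested.
class AllocGuard {
public:
    explicit AllocGuard(std::function<void()> reset_fn)
        : reset_fn_(std::move(reset_fn)) {
        assert(reset_fn_);
    }

    // Call before preparing I/O for the allocation identified by `alloc_id`.
    // Returns true if a reset was issued.
    bool bind(uint64_t alloc_id) {
        const bool switching = has_current_ && current_ != alloc_id;
        if (switching) {
            reset_fn_(); // stale scheduler state must not outlive the old allocation
        }
        current_     = alloc_id;
        has_current_ = true;
        return switching;
    }

private:
    std::function<void()> reset_fn_;
    uint64_t current_     = 0;
    bool     has_current_ = false;
};
```

Re-binding the same allocation is a no-op, so repeated computes on a cached graph do not pay for a reset.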

ggml op usage

Prefer directly using ggml ops when available.

src/ops/ggml_ops.cpp provides small helpers that are either:

thin wrappers over ggml primitives (ggml_norm, ggml_group_norm, ggml_mul_mat, activations)
composed ops built from primitives (e.g. DAC snake implemented as x + sin^2(a*x) / a)
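
As a cross-check for the composed snake op, a scalar reference is handy in tests. The sketch below is CPU-only on purpose and is meant as a test oracle, not a runtime path; snake_ref and snake_ref_vec are hypothetical names, and alpha is treated as a single scalar for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for snake: y = x + sin^2(alpha * x) / alpha.
inline float snake_ref(float x, float alpha) {
    assert(alpha != 0.0f); // division by alpha below
    const float s = std::sin(alpha * x);
    return x + s * s / alpha;
}

// Element-wise application over a buffer, e.g. one channel of activations.
inline std::vector<float> snake_ref_vec(const std::vector<float> & x, float alpha) {
    std::vector<float> y(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        y[i] = snake_ref(x[i], alpha);
    }
    return y;
}
```

On the graph side the same expression breaks down into an elementwise multiply, sine, square, divide, and add, which is why it can be composed from primitives instead of needing a custom op.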

If a needed op is missing in ggml:

First try composing from existing ops.
If that is impossible or performance-critical, add a custom op (CPU SIMD first) and keep a path to backend support later.

Model files / GGUF

Models are loaded from .gguf.

Some tensors that could be generated at runtime should instead be baked into GGUF during conversion (for reproducibility + avoiding runtime FP32→FP16 conversions).

If conversion scripts are involved, regenerate the .gguf files after changes (a stale .gguf is a common source of “missing tensor” errors).

Conventions / guardrails for changes

Keep encode/decode numerics stable (unit/regression tests where possible).
Avoid introducing new CPU-only intermediate buffers; build everything as ggml tensors.
When touching graph execution / backend scheduler: be careful with allocation lifetimes (eval_ctx, scheduler reset semantics).
Prefer small, reviewable commits.
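
For the numerics guardrail above, one possible shape for a regression check is to compare a decoded buffer against a stored reference with an absolute tolerance. This is only a sketch: DiffStats, diff_stats and within_tolerance are hypothetical names, and a suitable tolerance depends on the model and backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DiffStats {
    float  max_abs; // largest absolute element-wise difference
    double rmse;    // root-mean-square error over the whole buffer
};

// Compares `out` against `ref`; both must have the same length.
inline DiffStats diff_stats(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    DiffStats s{ 0.0f, 0.0 };
    double sum_sq = 0.0;
    for (size_t i = 0; i < ref.size(); ++i) {
        const float d = std::fabs(ref[i] - out[i]);
        s.max_abs = std::max(s.max_abs, d);
        sum_sq += static_cast<double>(d) * d;
    }
    s.rmse = ref.empty() ? 0.0 : std::sqrt(sum_sq / static_cast<double>(ref.size()));
    return s;
}

// True when no element differs from the reference by more than `atol`.
inline bool within_tolerance(const std::vector<float> & ref,
                             const std::vector<float> & out,
                             float atol) {
    return diff_stats(ref, out).max_abs <= atol;
}
```

Reporting both max_abs and rmse helps tell a single outlier sample apart from a broad drift after a graph or backend change.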

Useful entry points for Codex

If you need to understand execution:

src/runtime/graph_exec.cpp (scheduler + compute)
src/runtime/graph.cpp (cache + arena)

If you need to understand a model forward:

src/models/mimi.cpp / dac.cpp / wavtokenizer.cpp

If you need to add/replace an op:

src/ops/ggml_ops.cpp (+ possibly ggml upstream)

Mimi Encoder Migration Status

Mimi encode path is consolidated into one canonical graph kind: CODEC_GRAPH_MIMI_ENCODE.
The unified graph builder is the only Mimi encode graph path (frontend -> transformer -> downsample -> unrolled RVQ).
Split/legacy graph kinds for Mimi encode stages are removed from runtime graph enums.
Mimi encode weight writing now targets only the canonical encode graph path.
