Commit db6e43b
Add Gemma 4 model family support
Implement the text-only portion of the Google DeepMind Gemma 4 architecture:
- Hybrid attention: alternating sliding window and full attention layers
- Dual RoPE: proportional RoPE for full attention, default for sliding
- Per-Layer Embeddings (PLE): per-layer token-dependent gating
- KV sharing: later layers reuse KV from earlier layers of the same type
- Q/K/V normalization: RMS normalization on query, key, and value
- Per-layer scalar: learned scaling factor per transformer block
- Optional MoE: mixture-of-experts FFN blocks (26B-A4B variant)
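The hybrid attention scheme from the list above can be sketched independently of the Elixir implementation. A minimal NumPy illustration, where sliding layers see only the last `window` positions; the window size and the sliding-to-full alternation ratio here are placeholders, not Gemma 4's actual values:

```python
import numpy as np

def layer_mask(seq_len, layer_type, window=4):
    """Causal attention mask for one decoder layer.

    "full" layers attend to every earlier position; "sliding"
    layers attend only to the last `window` positions.
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i
    if layer_type == "sliding":
        return causal & (i - j < window)
    return causal

# Hypothetical alternation; the real sliding:full ratio may differ.
layer_types = ["sliding" if k % 2 == 0 else "full" for k in range(6)]
masks = [layer_mask(8, t) for t in layer_types]
```

Under this mask, query position 7 in a sliding layer sees only keys 4..7, while a full layer sees keys 0..7.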
Architectures: :base (Gemma4TextModel), :for_causal_language_modeling
(Gemma4ForCausalLM). Multimodal Gemma4ForConditionalGeneration is not
yet supported.
Uses a custom decoder loop rather than Layers.Transformer.blocks/2
because the model requires features not available in the shared
infrastructure: per-layer embeddings threaded through the block loop,
cross-block KV sharing state, per-layer head dimension variation,
and value normalization.
Includes an integration test verified against Python transformers reference
values (atol < 5e-5).

1 parent 0b397f6
File tree (3 files changed, +1413 −0 lines):
- lib
- bumblebee/text
- test/bumblebee/text