Checklist
Describe the bug
I trained a Dflash draft model using the latest training code for Qwen3.5-4B. The draft configuration for 4B comes from the official documentation, while the Qwen3.5-4B target is a version I fine-tuned on internal data. The task is OCR. Training used 70,000 data points and reached a 98% acceptance rate during training.
For inference I used a vLLM build that supports Dflash, but the average acceptance rate was only 10%. I have already aligned the chat template, and I did not use <think> during either training or inference.
Are there any gaps I might be overlooking?
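One gap worth ruling out first: even with the same chat-template name on both sides, the *rendered* prompt can differ between the training pipeline and the serving engine (an injected empty <think> block, a stray newline, a different generation prompt), and any prefix divergence will crater the draft's acceptance rate. Below is a minimal, self-contained sketch of that check; the two rendered strings are hypothetical stand-ins for what each side actually produces (in practice you would render both sides with the real tokenizer, e.g. `tokenizer.apply_chat_template(..., add_generation_prompt=True)` from transformers, and diff the resulting token IDs):

```python
# Hypothetical stand-ins for the prompt as rendered by the training pipeline
# and by the serving engine -- replace with the real renderings on both sides.
train_render = "<|im_start|>user\nRead this receipt.<|im_end|>\n<|im_start|>assistant\n"
serve_render = "<|im_start|>user\nRead this receipt.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n"

def first_divergence(a: str, b: str) -> int:
    """Return the first index where the two renderings differ, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Equal over the common prefix: diverge where the shorter one ends, if at all.
    return -1 if len(a) == len(b) else min(len(a), len(b))

idx = first_divergence(train_render, serve_render)
if idx >= 0:
    print(f"renderings diverge at char {idx}: "
          f"train={train_render[idx:idx + 20]!r} serve={serve_render[idx:idx + 20]!r}")
else:
    print("renderings are identical")
```

If the renderings diverge (as in this synthetic example, where serving appends an empty <think> block), the draft model is being scored on a token stream it never saw during training.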
Reproduction
BUILD_DATASET_NUM_PROC=64
ATTENTION_BACKEND=${2:-flex_attention}
NUM_GPUS=4
# Use patched specforge (fixes circular import in original)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun \
--standalone \
--nproc_per_node $NUM_GPUS \
$ROOT_DIR/scripts/train_dflash.py \
--target-model-path qwen3.5_4B \
--draft-config-path Qwen3.5-4B-Dflash/config.json \
--train-data-path data_filtered.jsonl \
--output-dir deflash_outputs/qwen3.5-4b-dflash-opc \
--num-epochs 10 \
--batch-size 2 \
--learning-rate 6e-4 \
--warmup-ratio 0.04 \
--max-grad-norm 1.0 \
--max-length 4096 \
--chat-template qwen3.5-nothink \
--attention-backend $ATTENTION_BACKEND \
--num-anchors 512 \
--loss-decay-gamma 7.0 \
--log-interval 50 \
--save-interval 10000 \
--target-model-backend hf \
--block-size 16
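If the template checks out, it is also worth ruling out a target/draft config mismatch: the official 4B draft config was written against the base checkpoint, and a fine-tuned target with, say, a resized vocabulary will quietly destroy acceptance. A hedged sketch of that comparison on synthetic configs (the field list is illustrative, not exhaustive, and the real inputs would be the config.json files of the two models in the command above):

```python
import json
import os
import tempfile

# Fields that typically must agree between target and draft for speculative
# decoding to produce tokens the target will accept. Illustrative, not exhaustive.
SHARED_FIELDS = ["vocab_size", "hidden_size", "bos_token_id", "eos_token_id"]

def compare_configs(target_path: str, draft_path: str) -> list:
    """Return human-readable mismatches between two HF-style config.json files."""
    with open(target_path) as f:
        target = json.load(f)
    with open(draft_path) as f:
        draft = json.load(f)
    return [
        f"{key}: target={target[key]} draft={draft[key]}"
        for key in SHARED_FIELDS
        if key in target and key in draft and target[key] != draft[key]
    ]

# Demo on synthetic configs; in practice point compare_configs at the real
# target and draft config.json paths from the training command.
with tempfile.TemporaryDirectory() as d:
    t, dr = os.path.join(d, "target.json"), os.path.join(d, "draft.json")
    with open(t, "w") as f:
        json.dump({"vocab_size": 151936, "hidden_size": 2560}, f)
    with open(dr, "w") as f:
        json.dump({"vocab_size": 152064, "hidden_size": 2560}, f)
    print(compare_configs(t, dr))  # reports the vocab_size mismatch
```

Any mismatch here means the draft was configured against a different checkpoint than the one actually being served.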
Environment
specforge [latest]
vllm [0.19.1.rc.0] nightly