
Improve TorchAO quantization test coverage and XPU support #13530

Open
jiqing-feng wants to merge 10 commits into huggingface:main from jiqing-feng:torchao

Conversation


@jiqing-feng (Contributor) commented Apr 21, 2026

What does this PR do?

This PR improves the TorchAO quantization testing infrastructure with several fixes: enabling int4wo tests on Intel XPU, implementing _dequantize for TorchAO, fixing input dtype mismatches, and fixing training gradient underflow.

Changes

  1. Enable int4wo tests on XPU: Removed the _int4wo_skip marker that restricted int4wo tests to CUDA only, allowing them to run on all accelerator backends.

  2. XPU-specific int4 packing format: Added XPU-specific handling in _get_quant_config() — Intel XPU requires int4_packing_format="plain_int32" for Int4WeightOnlyConfig.

  3. Fix input dtype casting: Introduced _get_dummy_inputs_for_model(model) helper in QuantizationTesterMixin to automatically cast floating-point input tensors to the model's parameter dtype, preventing dtype mismatches during quantized model inference.

  4. Implement _dequantize for TorchAO: Added _dequantize() method in TorchAoHfQuantizer that iterates all nn.Linear modules, calls weight.dequantize() on TorchAOBaseTensor weights, and replaces them with standard nn.Parameter. Also fixed _verify_if_layer_quantized to check isinstance(module.weight, TorchAOBaseTensor) so dequantized layers are correctly detected as non-quantized.

  5. Fix training gradient underflow: Changed autocast dtype from float16 to bfloat16 in _test_quantization_training. Float16's limited dynamic range (max ~65504, min subnormal ~5.96e-8) causes gradients to underflow to zero when passing through quantized tensor subclass operations; bfloat16 shares float32's exponent range and avoids this issue.

  6. Reduce WanAnimate TorchAO test input sizes: Shrunk dummy inputs in TestWanAnimateTransformer3DTorchAo to avoid OOM on devices without FlashAttention (e.g. XPU, which falls back to math SDPA and materializes the full O(S²) attention matrix). Reduced hidden_states from (1,36,21,64,64) to (1,36,5,16,16) and face_pixel_values from (1,3,77,512,512) to (1,3,13,512,512), bringing self-attention sequence length from 21,504 to 320 and peak attention memory from ~74 GiB to ~16 MB. Face frame count (13) is chosen so the face encoder's two stride-2 convolutions produce temporal output 4, plus 1 padding = 5, matching hidden_states temporal dim.
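For illustration, the XPU branch described in change 2 could be sketched roughly as below. `Int4WeightOnlyConfig` is torchao's int4 weight-only config class, but the function shape and `group_size` value here are illustrative, and the `int4_packing_format` kwarg is taken from this PR's description rather than verified against a specific torchao release:

```python
# Hypothetical sketch of the XPU-specific branch in _get_quant_config().
# The int4_packing_format kwarg and its "plain_int32" value come from
# the change list above; group_size=128 is an illustrative default.
from torchao.quantization import Int4WeightOnlyConfig


def get_int4_config(device_type: str) -> Int4WeightOnlyConfig:
    kwargs = {"group_size": 128}
    if device_type == "xpu":
        # Intel XPU requires the plain_int32 packing layout
        kwargs["int4_packing_format"] = "plain_int32"
    return Int4WeightOnlyConfig(**kwargs)
```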
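The dtype-casting helper from change 3 can be sketched as follows. The function name and signature here are assumptions for illustration (the PR's helper is `_get_dummy_inputs_for_model(model)` on the mixin); the key behavior is that only floating-point tensors are cast, so integer inputs such as timesteps or ids keep their dtype:

```python
import torch


def cast_dummy_inputs_to_model_dtype(inputs: dict, model: torch.nn.Module) -> dict:
    # Illustrative sketch of the idea behind _get_dummy_inputs_for_model:
    # cast floating-point input tensors to the model's parameter dtype so
    # quantized-model inference does not hit dtype mismatches.
    param_dtype = next(model.parameters()).dtype
    return {
        k: v.to(param_dtype) if torch.is_tensor(v) and v.is_floating_point() else v
        for k, v in inputs.items()
    }
```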
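The `_dequantize()` logic from change 4 amounts to walking the module tree and materializing quantized weights. A minimal sketch, assuming torchao's `TorchAOBaseTensor` base class (guarded so the snippet also imports without torchao):

```python
import torch
from torch import nn

try:
    from torchao.utils import TorchAOBaseTensor  # torchao's tensor-subclass base
except ImportError:  # keep the sketch importable without torchao installed
    TorchAOBaseTensor = ()


def dequantize_linear_weights(model: nn.Module) -> nn.Module:
    # Sketch of the _dequantize() idea: for every nn.Linear whose weight is
    # a torchao tensor subclass, call .dequantize() and replace the weight
    # with a plain nn.Parameter, so _verify_if_layer_quantized's
    # isinstance(module.weight, TorchAOBaseTensor) check reports it as
    # non-quantized afterwards.
    for module in model.modules():
        if isinstance(module, nn.Linear) and isinstance(module.weight, TorchAOBaseTensor):
            module.weight = nn.Parameter(module.weight.dequantize(), requires_grad=False)
    return model
```

On a non-quantized model this is a no-op, which is the property the fixed `_verify_if_layer_quantized` check relies on.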
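The underflow in change 5 is easy to reproduce with plain IEEE arithmetic, no GPU needed. The snippet below round-trips a gradient-sized value through float16 and through bfloat16 (emulated by truncating the float32 bit pattern): float16's smallest subnormal is about 5.96e-8, so a gradient of 1e-8 flushes to zero, while bfloat16 keeps float32's 8-bit exponent and represents it fine:

```python
import struct


def to_fp16(x: float) -> float:
    # round-trip through IEEE half precision (struct's "e" format)
    return struct.unpack("e", struct.pack("e", x))[0]


def to_bf16(x: float) -> float:
    # bfloat16 is float32 with the mantissa truncated to 7 bits:
    # keep only the top 16 bits of the float32 bit pattern
    bits = struct.unpack("I", struct.pack("f", x))[0]
    return struct.unpack("f", struct.pack("I", bits & 0xFFFF0000))[0]


grad = 1e-8
to_fp16(grad)  # 0.0 — underflows below fp16's smallest subnormal (~5.96e-8)
to_bf16(grad)  # nonzero — bf16 shares float32's exponent range
```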
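The arithmetic behind change 6 can be checked directly. The `(1, 2, 2)` patch size and the `(n - 1) // 2 + 1` stride-2 output-size rule below are assumptions inferred from the numbers in the change list, not read out of the WanAnimate code:

```python
def seq_len(frames: int, height: int, width: int, patch=(1, 2, 2)) -> int:
    # Token count after Wan-style patchification; the (1, 2, 2) patch
    # size is an assumption that reproduces the figures quoted above.
    pf, ph, pw = patch
    return (frames // pf) * (height // ph) * (width // pw)


def conv_t(n: int, stride: int = 2) -> int:
    # temporal length after one stride-2 conv, assuming the usual
    # (n - 1) // stride + 1 output-size rule
    return (n - 1) // stride + 1


old_seq = seq_len(21, 64, 64)       # 21504 tokens -> O(S^2) attention blows up
new_seq = seq_len(5, 16, 16)        # 320 tokens
face_t = conv_t(conv_t(13)) + 1     # 13 -> 7 -> 4 through two stride-2 convs, +1 padding = 5
```

`face_t` matching `hidden_states`' temporal dim (5) is why 13 face frames were chosen rather than some other small count.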

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@github-actions github-actions bot added tests size/M PR with diff < 200 LOC labels Apr 21, 2026
@jiqing-feng
Contributor Author

Hi @sayakpaul, could you please review this PR? Thanks!

@jiqing-feng jiqing-feng changed the title Enable TorchAO int4 weight-only quantization tests on Intel XPU Improve TorchAO quantization test coverage and XPU support Apr 21, 2026
