Hello, I tried to reproduce the Vanilla baseline. On MMVP with eager + float32, Vanilla scored 77.3, but after adding DMLR the score dropped to 74.3. In my setup, DMLR appears to reduce the performance of Qwen3-VL-4B-Instruct rather than improve it. Could the authors share the Vanilla evaluation code so that I can verify that DMLR gives a gain?
