Hi authors, thanks for your awesome work!
I'm attempting to train `Qwen/Qwen2.5-VL-3B-Instruct` using the provided training script, but I've encountered several issues that I'd like to clarify:
Training Script
```bash
#!/bin/bash
setting='dozen_vsr_qwen_add_grounded_reasoning_single_turn_think_rethink'
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export WANDB_PROJECT=$setting

# Load config variables
source scripts/train_base_config.sh

# Run the training script with DeepSpeed
python -m accelerate.commands.launch \
    --config_file ./accelerate_configs/deepspeed_zero2.yaml \
    --main_process_port 20092 \
    grpo-gr/GRPO_GR.py \
    --train_data_path ./GRIT_data/tallyqa_train_10.jsonl,./GRIT_data/vsr_cot_train_10.jsonl \
    --train_image_folder_path ./GRIT_data/tallyqa,./GRIT_data/vsr \
    --eval_data_path ./GRIT_data/vsr_val.jsonl,./GRIT_data/mme_val.jsonl,./GRIT_data/tallyqa_val.jsonl,./GRIT_data/gqa_val.jsonl,./GRIT_data/mathvista_mini_val.jsonl,./GRIT_data/ovd_position_val.jsonl,./GRIT_data/ovd_relationship_val.jsonl,./GRIT_data/ovd_negation_val.jsonl \
    --eval_image_folder_path ./GRIT_data/vsr,./GRIT_data/mme,./GRIT_data/tallyqa,./GRIT_data/gqa,./GRIT_data/mathvista_mini,./GRIT_data/ovd_position,./GRIT_data/ovd_relationship,./GRIT_data/ovd_negation \
    --setting $setting \
    --max_turns 1 \
    --output_dir output/$setting \
    --hub_model_id $setting \
    $COMMON_ARGS \
    --eval_steps 50 \
    --save_steps 50 \
    --num_train_epochs 500 \
    --lr_scheduler_type cosine \
    --per_device_eval_batch_size 8
```
1. Dataset Issues
MME Dataset
Most datasets can be downloaded normally, but for the MME dataset, when I download it from the repository path specified in the paper (link), the image names in the downloaded files don't match the names listed in `mme_val.jsonl`.
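For reference, here's the quick check I used to spot the mismatch (a hypothetical, standalone sketch; the local paths and the `"image"` key are my guesses at the data layout, not something from the repo):

```python
import json
import os

# Hypothetical local paths -- adjust to wherever the MME data was downloaded.
jsonl_path = "./GRIT_data/mme_val.jsonl"
image_dir = "./GRIT_data/mme"

expected = set()
with open(jsonl_path) as f:
    for line in f:
        record = json.loads(line)
        # Assuming the image filename is stored under an "image" key;
        # the actual key in the released jsonl may differ.
        expected.add(os.path.basename(record["image"]))

actual = set(os.listdir(image_dir))
print("in jsonl but missing on disk:", sorted(expected - actual)[:10])
print("on disk but not in jsonl:   ", sorted(actual - expected)[:10])
```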
Missing Label Files
The following label files are missing:
- `./GRIT_data/ovd_relationship_val.jsonl`
- `./GRIT_data/ovd_negation_val.jsonl`
Could you please provide these files or clarify how to obtain them?
2. Flash Attention Issues
During initial training, I encounter the following warning/error:
```
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
...
```
The specific error indicates that `float16` is not supported. I resolved it by manually specifying `torch_dtype=torch.bfloat16` during model initialization. Did you encounter this issue during your training, and what's the recommended way to handle it?
The relevant lines are in `GRIT/grpo-gr/GRPO_GRTrainer.py` (lines 233 to 234 at fd08d57):

```python
if "qwen" in model_id.lower():
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model, **model_init_kwargs)
```
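For reference, this is roughly how I patched it (a minimal, self-contained sketch of my workaround; `model_init_kwargs` here just stands in for whatever keyword arguments the trainer actually builds):

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
model_init_kwargs = {"attn_implementation": "flash_attention_2"}

# Explicitly pin bfloat16 so Flash Attention 2 doesn't run with an
# unspecified dtype (and doesn't hit the unsupported-float16 error).
model_init_kwargs["torch_dtype"] = torch.bfloat16

if "qwen" in model_id.lower():
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, **model_init_kwargs
    )
```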
3. Training Hyperparameters
I'd like to confirm a few things about the training parameters:
- **Epochs:** Is `--num_train_epochs 500` an experimental setting? It seems quite high; is it intentional?
- **Batch Size & Memory:** When training on 48GB VRAM, I can only set `per_device_train_batch_size` to 1, otherwise I get OOM errors. Is this normal? If the batch size can only be 1, should the learning rate be scaled accordingly, and what values would you recommend? (I've put the scaling I'm currently trying in the sketch after this list.)
- **Other Parameters:** Are the other hyperparameters in the script reasonable for this model size and task?
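For context, here's the back-of-the-envelope scaling I'm currently trying for the batch-size-1 case (my own guess based on the usual linear scaling rule, not anything from the paper; all numbers are made up):

```python
# Keep the effective batch size constant with gradient accumulation, and
# scale the learning rate linearly with the effective batch size.
base_lr = 1e-6             # hypothetical reference LR
reference_batch_size = 64  # hypothetical batch size the reference LR assumes

per_device_train_batch_size = 1
num_gpus = 8
gradient_accumulation_steps = 8

effective_batch_size = (
    per_device_train_batch_size * num_gpus * gradient_accumulation_steps
)
scaled_lr = base_lr * effective_batch_size / reference_batch_size
print(f"effective batch size: {effective_batch_size}, scaled lr: {scaled_lr:.2e}")
```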
4. Demo Environment
Regarding `gradio_qwen.py`, which is mentioned on the GitHub page: where can I find this file? It doesn't seem to be included in the current repository.
Environment:
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- GPU: 8x GPUs with 48GB VRAM each
- Framework: DeepSpeed ZeRO-2
5. Logs
Also, it's very strange that the reward scores are always zero.
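To rule out a formatting issue on my side, I ran a quick standalone generation outside the trainer to eyeball whether the raw completions contain anything a format/accuracy reward could plausibly match (a hypothetical sketch; the image path and question are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder sample -- any local image/question pair will do.
image = Image.open("./GRIT_data/vsr/example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Is the cat on the mat?"},
]}]
prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
completion = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(completion)  # check by eye whether the expected answer/format shows up
```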

Any guidance on these issues would be greatly appreciated. Thank you again for your work on this project!