This repository contains the official implementation of the Iterative Compositional Data Generation (ICDG) pipeline, together with the compositional transformer architectures and multi-task RL training scripts introduced in Iterative Compositional Data Generation for Robot Control (Pham et al., 2026), accepted at Transactions on Machine Learning Research (TMLR) in April 2026. ICDG is a self-improving generative pipeline for robotic manipulation that uses a semantic compositional diffusion transformer to synthesize high-quality expert data for unseen tasks.
Robotic manipulation domains often contain a combinatorial number of possible tasks, arising from combinations of different components, such as robots, objects, obstacles, and objectives. Collecting demonstrations for all combinations is prohibitively expensive. ICDG leverages the underlying compositional structure of these domains to generalize far beyond the tasks it has been trained on, enabling large-scale capability growth from limited data.
- Semantic Compositional Diffusion Transformer: Factorizes each transition into specific components and learns their interactions through attention, enabling strong compositional generalization.
- Zero-Shot Generation: Generates full state–action–next-state transitions for new task combinations that were never observed in real data.
- Iterative Self-Improvement: Synthetic data is evaluated using offline RL; only high-quality, policy-validated transitions are added back into the training pool, allowing the model to continuously refine itself without additional real data collection.
- Data Efficiency and Generalization: Trained on real data from approximately 20 percent of possible task combinations, ICDG generates useful data for the remaining tasks and ultimately solves nearly all held-out tasks.
- Utility of Rare Successful Synthetic Data: Even synthetic datasets with very low evaluated success rates (single-digit to low double-digit percentages) still bootstrap strong online RL: initializing online RL with such data yields high task success with far fewer environment steps than training from scratch.
- Emergent Compositional Structure: Attention patterns and intervention tests reveal that the model recovers meaningful task-factor dependencies, despite no hand-crafted structure being imposed.
- Python 3.9.6
- CUDA-capable GPU (for training diffusion models and policies)
- SLURM cluster access (for running experiments)
- Create a Python 3.9.6 virtual environment:

  python3.9 -m venv first_3.9.6
  source first_3.9.6/bin/activate  # On Linux/Mac

- Install dependencies from requirements.txt:

  pip install --upgrade pip
  pip install -r requirements.txt

  Note: requirements.txt includes an editable install of CompoSuite pinned to a specific git commit for reproducibility:

  -e git+https://github.com/Lifelong-ML/CompoSuite.git@1fa36f67f31aeccc9ef75748bfc797960e044a86#egg=composuite
- Set up the data directory:
  - Download expert datasets from Dryad
  - Organize the data according to the structure described in data/README.md
  - Only expert datasets are needed for this project
The main pipeline implements the iterative self-improvement procedure from the paper (see Figure 1). The process consists of:
- Compositional Diffusion Training: Train the semantic compositional diffusion transformer on N expert datasets + M high-quality synthetic datasets from previous iterations
- Zero-shot Data Generation: Generate synthetic transitions for all remaining task combinations (All combinations - N - M)
- Offline RL Validation: Train policies on synthetic data and evaluate performance via offline RL
- Quality-based Filtering:
- Good datasets: Added to training set for next iteration (M synthetic datasets)
- Bad datasets: Removed from future generation cycles
- Iteration: Repeat until convergence or max iterations reached
Run the pipeline:
python -u -m scripts.automated_iterative_diffusion_dits_iiwa \
--max_iterations 5 \
--num_train 14 \
--diffusion_seed 0 \
--curriculum_seed 0 \
2>&1 | tee iterative_diffusion_0_dits_iiwa.out

Key arguments:
- --max_iterations: Maximum number of iterations to run
- --num_train: Number of training tasks (14 for the IIWA subset)
- --diffusion_seed: Random seed for diffusion model training
- --curriculum_seed: Random seed for curriculum schedule generation
- --success_threshold: Success rate threshold for good tasks (default: 0.8)
- --threshold_reduction_amount: Amount to reduce the threshold by when no good tasks are found (default: 0.1)
- --threshold_reduction_cycle: Number of consecutive iterations with no good tasks before reducing the threshold (default: 1)
- --min_threshold: Minimum threshold value (default: 0.5)
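The threshold-related arguments interact roughly as follows. This is a minimal sketch of the assumed semantics (reduce after `threshold_reduction_cycle` consecutive iterations with no good tasks, never dropping below `min_threshold`), not the script's actual code:

```python
def adapt_threshold(threshold, consecutive_empty,
                    reduction_amount=0.1, reduction_cycle=1, min_threshold=0.5):
    """Assumed adaptive-threshold rule: lower the success threshold by
    `reduction_amount` once `consecutive_empty` iterations in a row have
    produced no good tasks, clamped at `min_threshold`."""
    if consecutive_empty >= reduction_cycle:
        threshold = max(min_threshold, threshold - reduction_amount)
    return threshold
```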
Output:
- Diffusion models: results/augmented_{iteration}/diffusion/
- Synthetic data: results/augmented_{iteration}/diffusion/{model_name}/{task}/samples_0.npz
- Policy checkpoints: results/augmented_{iteration}/policies/
- Analysis logs: scripts/policies_slurm_logs/
- Best test task dataset: results/best_testtask_dataset/
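The samples_0.npz files are standard NumPy archives, so they can be inspected directly. The helper below is a convenience sketch; the array names stored inside depend on the pipeline and are not documented here.

```python
import numpy as np

def inspect_npz(path):
    """Return the array names and shapes stored in a samples_*.npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```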
Run the transformer TD3+BC multitask baseline for comparison:
python -u -m scripts.run_transformer_baseline_pipeline \
--num_train 14 \
--seeds 10 11 12 13 14 \
--memory 50 \
--time 24 \
2>&1 | tee multitask_Trans_OfflineRL_iwa_seed2.out

Key arguments:
- --num_train: Number of training tasks (14 for the IIWA subset)
- --seeds: List of random seeds to run (e.g., 10 11 12 13 14)
- --memory: Memory per job in GB (default: 50)
- --time: Time limit per job in hours (default: 24)
- --max_timesteps: Maximum training timesteps (default: 50000)
- --batch_size: Batch size (default: 1792)
Output:
- Model checkpoints: results/transformer_baseline/seed_{seed}/
- Results CSV: results/transformer_baseline/transformer_baseline_results.csv
- Training logs: scripts/transformer_baseline_logs/
These scripts train RLPD (Ball et al., 2023) on CompoSuite data: each update interleaves online rollouts with an offline replay buffer. For the synthetic-data variant, that buffer is loaded from per-task synthetic trajectories from the compositional diffusion pipeline (base_synthetic_data_path in the script).
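The interleaving RLPD performs can be illustrated with its symmetric-sampling rule (Ball et al., 2023): each update batch is drawn half from the online replay buffer and half from the offline buffer. This is a simplified sketch, not the training scripts' code; the buffers here are plain lists standing in for transition storage.

```python
import numpy as np

def sample_mixed_batch(online_buffer, offline_buffer, batch_size, rng):
    """RLPD-style symmetric sampling: half of each update batch comes from
    online experience, half from the offline (here: synthetic) data."""
    half = batch_size // 2
    online_idx = rng.integers(0, len(online_buffer), size=half)
    offline_idx = rng.integers(0, len(offline_buffer), size=batch_size - half)
    batch = [online_buffer[i] for i in online_idx]
    batch += [offline_buffer[i] for i in offline_idx]
    return batch
```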
Train a single task (example):
python scripts/cleanrl_sac_rlpd_synthdata.py \
--use_composuite \
--robot IIWA \
--obj Dumbbell \
--obst ObjectWall \
--subtask Trashcan \
--seed 0 \
--total_timesteps 1000000 \
--base_synthetic_data_path /path/to/results/sac_exp/dit_tasklist0_train14_diffusionseed0_iters4 \
--wandb_project_name cleanrl_sac_rlpd_synthdata \
--track

Key arguments:
- --base_synthetic_data_path: Root directory of single-task synthetic datasets (one dataset per test task) from your diffusion run.
- --robot, --obj, --obst, --subtask: CompoSuite task components (must match available synthetic data).
- --total_timesteps, --seed: Training length and RNG seed.
- --track / --wandb_project_name: Optional Weights & Biases logging.
Checkpoints and eval_history.csv are written under runs/ per the run-name logic in the script.
Large-scale SLURM sweep (optional):
scripts/automated_cleanrl_sac_rlpd_synthdata.py submits jobs for test tasks × seeds, retries failures, and aggregates CSVs under results/sac_exp/sac_rlpd_synthdata_results/. Edit BASE_PATH, synthetic paths, and log dirs inside the script for your machine, then:
python scripts/automated_cleanrl_sac_rlpd_synthdata.py 2>&1 | tee sac_rlpd_synthdata_automated.out

Related scripts:
| Script | Role |
|---|---|
| scripts/cleanrl_sac_rlpd_synthdata.py | SAC + RLPD with per-task synthetic offline data. |
| scripts/cleanrl_sac_rlpd_14realdata.py | SAC + RLPD with a shared offline buffer from real expert data (14 training tasks). |
| scripts/cleanrl_sac_original_taskIDenabled.py | Online SAC baseline (task-ID observations), no RLPD offline buffer. |
Automated SLURM helpers (edit paths before use): automated_cleanrl_sac_rlpd_14realdata.py, automated_cleanrl_sac_rlpd_synthdata.py, and automated_cleanrl_sac_original_target.py run full-test-split or smaller targeted comparisons and aggregate the results.
.
├── data/ # Dataset directory (see data/README.md)
├── results/ # Experiment results
│ ├── augmented_{iteration}/ # Iterative diffusion results
│ └── transformer_baseline/ # Transformer baseline results
├── scripts/ # Main entrypoints
│ ├── automated_iterative_diffusion_dits_iiwa.py # Main pipeline
│ ├── run_transformer_baseline_pipeline.py # Transformer TD3+BC baseline
│ ├── run_compositional_baseline_pipeline.py # Compositional TD3+BC baseline
│ ├── train_augmented_diffusion.py # Diffusion training
│ ├── train_augmented_policy.py # Policy training
│ ├── generate_augmented_data_dits.py # Data generation
│ └── cleanrl_sac_*.py, automated_cleanrl_sac_*.py # SAC + RLPD & automated sweeps
├── diffusion/ # Diffusion model code
├── corl/ # Offline RL algorithms (TD3-BC, IQL)
├── offline_compositional_rl_datasets/ # Offline train/eval, task splits (see folder README)
├── config/ # Configuration files
└── requirements.txt # Python dependencies
- Semantic Compositional Architecture: Diffusion transformer with factorized components (robot, object, obstacle, objective)
- Iterative Self-Improvement: Each iteration uses validated high-quality synthetic tasks to improve the diffusion model
- Zero-shot Generation: Generates data for unseen task combinations without additional training
- Automatic Retry: Failed jobs are automatically retried with increased resources
- Curriculum Filtering: Component-specific curriculum filtering for iterations 5+ (optional)
- Adaptive Threshold: Success threshold automatically reduces if no good tasks are found
- Comprehensive Logging: Detailed logs and CSV analysis files for each iteration
Default paths are set in the script configuration classes. Modify these in the scripts if needed:
- base_path: Project root directory
- data_path: Path to expert datasets
- results_path: Path to save results
- tasks_path: Path to task list JSON files
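For illustration, the four settings can be grouped into a single config object. The defaults below, including the `config/tasks` location for task lists, are placeholders, not the scripts' actual configuration classes or values:

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Illustrative grouping of the path settings described above (placeholder defaults)."""
    base_path: str = "."              # project root directory
    data_path: str = "data"           # expert datasets
    results_path: str = "results"     # experiment outputs
    tasks_path: str = "config/tasks"  # task list JSON files (hypothetical location)

# Override any path for your machine:
cfg = PipelineConfig(base_path="/scratch/icdg")
```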
If you use this code, please cite the TMLR paper (accepted April 2026):
@article{pham2026iterative,
title={Iterative Compositional Data Generation for Robot Control},
author={Pham, Anh-Quan and Hussing, Marcel and Patankar, Shubhankar P. and Bassett, Dani S. and Mendez-Mendez, Jorge and Eaton, Eric},
journal={Transactions on Machine Learning Research},
year={2026},
}

The preprint remains available on arXiv:2512.10891.
Related Resources:
For inquiries, please contact Anh-Quan Pham.
