This repository contains the official implementation of the Iterative Compositional Data Generation (ICDG) pipeline, together with the compositional transformer architectures and multi-task RL training scripts introduced in Iterative Compositional Data Generation for Robot Control (Pham et al., 2026), accepted at Transactions on Machine Learning Research (TMLR) in April 2026. ICDG is a self-improving generative pipeline for robotic manipulation that uses a semantic compositional diffusion transformer to synthesize high-quality expert data for unseen tasks.
Robotic manipulation domains often contain a combinatorial number of possible tasks, arising from combinations of different components, such as robots, objects, obstacles, and objectives. Collecting demonstrations for all combinations is prohibitively expensive. ICDG leverages the underlying compositional structure of these domains to generalize far beyond the tasks it has been trained on, enabling large-scale capability growth from limited data.
- Semantic Compositional Diffusion Transformer: Factorizes each transition into specific components and learns their interactions through attention, enabling strong compositional generalization.
- Zero-Shot Generation: Generates full state–action–next-state transitions for new task combinations that were never observed in real data.
- Iterative Self-Improvement: Synthetic data is evaluated using offline RL; only high-quality, policy-validated transitions are added back into the training pool, allowing the model to continuously refine itself without additional real data collection.
- Data Efficiency and Generalization: Trained on real data from approximately 20 percent of possible task combinations, ICDG generates useful data for the remaining tasks and ultimately solves nearly all held-out tasks.
- Utility of Rare Successful Synthetic Data: Even synthetic datasets with very low evaluated success rates (single-digit to low double-digit percentages) still bootstrap strong online RL: initializing online RL with such data yields high task success with far fewer environment steps than training from scratch.
- Emergent Compositional Structure: Attention patterns and intervention tests reveal that the model recovers meaningful task-factor dependencies, despite no hand-crafted structure being imposed.
- Python 3.9.6
- CUDA-capable GPU (for training diffusion models and policies)
- SLURM cluster access (for running experiments)
- Create a Python 3.9.6 virtual environment:

  python3.9 -m venv first_3.9.6
  source first_3.9.6/bin/activate  # On Linux/Mac

- Install dependencies from requirements.txt:

  pip install --upgrade pip
  pip install -r requirements.txt

  Note: requirements.txt includes an editable install of CompoSuite pinned to a specific git commit for reproducibility:

  -e git+https://github.com/Lifelong-ML/CompoSuite.git@1fa36f67f31aeccc9ef75748bfc797960e044a86#egg=composuite
- Set up the data directory:
  - Download expert datasets from Dryad
  - Organize the data according to the structure described in data/README.md
  - Only expert datasets are needed for this project
The main pipeline implements the iterative self-improvement procedure from the paper (see Figure 1). The process consists of:
- Compositional Diffusion Training: Train the semantic compositional diffusion transformer on N expert datasets + M high-quality synthetic datasets from previous iterations
- Zero-shot Data Generation: Generate synthetic transitions for all remaining task combinations (All combinations - N - M)
- Offline RL Validation: Train policies on synthetic data and evaluate performance via offline RL
- Quality-based Filtering:
- Good datasets: Added to training set for next iteration (M synthetic datasets)
- Bad datasets: Removed from future generation cycles
- Iteration: Repeat until convergence or max iterations reached
Run the pipeline:
python -u -m scripts.automated_iterative_diffusion_dits_iiwa \
--max_iterations 5 \
--num_train 14 \
--diffusion_seed 0 \
--curriculum_seed 0 \
2>&1 | tee iterative_diffusion_0_dits_iiwa.out

Key arguments:
- --max_iterations: Maximum number of iterations to run
- --num_train: Number of training tasks (14 for the IIWA subset)
- --diffusion_seed: Random seed for diffusion model training
- --curriculum_seed: Random seed for curriculum schedule generation
- --success_threshold: Success rate threshold for good tasks (default: 0.8)
- --threshold_reduction_amount: Amount to reduce the threshold by when no good tasks are found (default: 0.1)
- --threshold_reduction_cycle: Number of consecutive iterations with no good tasks before reducing the threshold (default: 1)
- --min_threshold: Minimum threshold value (default: 0.5)
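The threshold-related arguments interact roughly as follows. This is a minimal sketch of the assumed semantics (reduce after `threshold_reduction_cycle` consecutive iterations with no good tasks, never dropping below `min_threshold`), not the script's actual code:

```python
def adapt_threshold(threshold, consecutive_empty,
                    reduction_amount=0.1, reduction_cycle=1, min_threshold=0.5):
    """Assumed adaptive-threshold rule: lower the success threshold by
    `reduction_amount` once `consecutive_empty` iterations in a row have
    produced no good tasks, clamped at `min_threshold`."""
    if consecutive_empty >= reduction_cycle:
        threshold = max(min_threshold, threshold - reduction_amount)
    return threshold
```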
Output:
- Diffusion models: results/augmented_{iteration}/diffusion/
- Synthetic data: results/augmented_{iteration}/diffusion/{model_name}/{task}/samples_0.npz
- Policy checkpoints: results/augmented_{iteration}/policies/
- Analysis logs: scripts/policies_slurm_logs/
- Best test task dataset: results/best_testtask_dataset/
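The samples_0.npz files are standard NumPy archives, so they can be inspected directly. The helper below is a convenience sketch; the array names stored inside depend on the pipeline and are not documented here.

```python
import numpy as np

def inspect_npz(path):
    """Return the array names and shapes stored in a samples_*.npz archive."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}
```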
Run the transformer TD3+BC multitask baseline for comparison:
python -u -m scripts.run_transformer_baseline_pipeline \
--num_train 14 \
--seeds 10 11 12 13 14 \
--memory 50 \
--time 24 \
2>&1 | tee multitask_Trans_OfflineRL_iwa_seed2.out

Key arguments:
- --num_train: Number of training tasks (14 for the IIWA subset)
- --seeds: List of random seeds to run (e.g., 10 11 12 13 14)
- --memory: Memory per job in GB (default: 50)
- --time: Time limit per job in hours (default: 24)
- --max_timesteps: Maximum training timesteps (default: 50000)
- --batch_size: Batch size (default: 1792)
Output:
- Model checkpoints: results/transformer_baseline/seed_{seed}/
- Results CSV: results/transformer_baseline/transformer_baseline_results.csv
- Training logs: scripts/transformer_baseline_logs/
These scripts train RLPD (Ball et al., 2023) on CompoSuite data: each update interleaves online rollouts with an offline replay buffer. For the synthetic-data variant, that buffer is loaded from per-task synthetic trajectories from the compositional diffusion pipeline (base_synthetic_data_path in the script).
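The interleaving RLPD performs can be illustrated with its symmetric-sampling rule (Ball et al., 2023): each update batch is drawn half from the online replay buffer and half from the offline buffer. This is a simplified sketch, not the training scripts' code; the buffers here are plain lists standing in for transition storage.

```python
import numpy as np

def sample_mixed_batch(online_buffer, offline_buffer, batch_size, rng):
    """RLPD-style symmetric sampling: half of each update batch comes from
    online experience, half from the offline (here: synthetic) data."""
    half = batch_size // 2
    online_idx = rng.integers(0, len(online_buffer), size=half)
    offline_idx = rng.integers(0, len(offline_buffer), size=batch_size - half)
    batch = [online_buffer[i] for i in online_idx]
    batch += [offline_buffer[i] for i in offline_idx]
    return batch
```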
Train a single task (example):
python scripts/cleanrl_sac_rlpd_synthdata.py \
--use_composuite \
--robot IIWA \
--obj Dumbbell \
--obst ObjectWall \
--subtask Trashcan \
--seed 0 \
--total_timesteps 1000000 \
--base_synthetic_data_path /path/to/results/sac_exp/dit_tasklist0_train14_diffusionseed0_iters4 \
--wandb_project_name cleanrl_sac_rlpd_synthdata \
--track

Key arguments:
- --base_synthetic_data_path: Root directory of single-task synthetic datasets (one dataset per test task) from your diffusion run.
- --robot, --obj, --obst, --subtask: CompoSuite task components (must match available synthetic data).
- --total_timesteps, --seed: Training length and RNG seed.
- --track / --wandb_project_name: Optional Weights & Biases logging.
Checkpoints and eval_history.csv are written under runs/ per the run-name logic in the script.
Large-scale SLURM sweep (optional):
scripts/automated_cleanrl_sac_rlpd_synthdata.py submits jobs for test tasks × seeds, retries failures, and aggregates CSVs under results/sac_exp/sac_rlpd_synthdata_results/. Edit BASE_PATH, synthetic paths, and log dirs inside the script for your machine, then:
python scripts/automated_cleanrl_sac_rlpd_synthdata.py 2>&1 | tee sac_rlpd_synthdata_automated.out

Related scripts:
| Script | Role |
|---|---|
| scripts/cleanrl_sac_rlpd_synthdata.py | SAC + RLPD with per-task synthetic offline data. |
| scripts/cleanrl_sac_rlpd_14realdata.py | SAC + RLPD with a shared offline buffer from real expert data (14 training tasks). |
| scripts/cleanrl_sac_original_taskIDenabled.py | Online SAC baseline (task-ID observations), no RLPD offline buffer. |
Automated SLURM helpers (edit paths before use): automated_cleanrl_sac_rlpd_14realdata.py, automated_cleanrl_sac_rlpd_synthdata.py, and automated_cleanrl_sac_original_target.py run full-test-split or smaller targeted comparisons and aggregate the results.
.
├── data/ # Dataset directory (see data/README.md)
├── results/ # Experiment results
│ ├── augmented_{iteration}/ # Iterative diffusion results
│ └── transformer_baseline/ # Transformer baseline results
├── scripts/ # Main entrypoints
│ ├── automated_iterative_diffusion_dits_iiwa.py # Main pipeline
│ ├── run_transformer_baseline_pipeline.py # Transformer TD3+BC baseline
│ ├── run_compositional_baseline_pipeline.py # Compositional TD3+BC baseline
│ ├── train_augmented_diffusion.py # Diffusion training
│ ├── train_augmented_policy.py # Policy training
│ ├── generate_augmented_data_dits.py # Data generation
│ └── cleanrl_sac_*.py, automated_cleanrl_sac_*.py # SAC + RLPD & automated sweeps
├── diffusion/ # Diffusion model code
├── corl/ # Offline RL algorithms (TD3-BC, IQL)
├── offline_compositional_rl_datasets/ # Offline train/eval, task splits (see folder README)
├── config/ # Configuration files
└── requirements.txt # Python dependencies
- Semantic Compositional Architecture: Diffusion transformer with factorized components (robot, object, obstacle, objective)
- Iterative Self-Improvement: Each iteration uses validated high-quality synthetic tasks to improve the diffusion model
- Zero-shot Generation: Generates data for unseen task combinations without additional training
- Automatic Retry: Failed jobs are automatically retried with increased resources
- Curriculum Filtering: Component-specific curriculum filtering for iterations 5+ (optional)
- Adaptive Threshold: Success threshold automatically reduces if no good tasks are found
- Comprehensive Logging: Detailed logs and CSV analysis files for each iteration
Default paths are set in the script configuration classes. Modify these in the scripts if needed:
- base_path: Project root directory
- data_path: Path to expert datasets
- results_path: Path to save results
- tasks_path: Path to task list JSON files
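For illustration, the four settings can be grouped into a single config object. The defaults below, including the `config/tasks` location for task lists, are placeholders, not the scripts' actual configuration classes or values:

```python
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    """Illustrative grouping of the path settings described above (placeholder defaults)."""
    base_path: str = "."              # project root directory
    data_path: str = "data"           # expert datasets
    results_path: str = "results"     # experiment outputs
    tasks_path: str = "config/tasks"  # task list JSON files (hypothetical location)

# Override any path for your machine:
cfg = PipelineConfig(base_path="/scratch/icdg")
```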
If you use this code, please cite the TMLR paper (accepted April 2026):
@article{pham2026iterative,
title={Iterative Compositional Data Generation for Robot Control},
author={Pham, Anh-Quan and Hussing, Marcel and Patankar, Shubhankar P. and Bassett, Dani S. and Mendez-Mendez, Jorge and Eaton, Eric},
journal={Transactions on Machine Learning Research},
year={2026},
}

The preprint remains available on arXiv:2512.10891.
Related Resources:
For inquiries, please contact Anh-Quan Pham.
