- [2026-04-01] 🎉 Newly released VenusFactory2 website at venusfactory.cn/playground
- [2026-03-27] 🚀 VenusFactory2 technical report released at arXiv:2603.27303
- [2026-01-23] 🚀 Added VenusX (ICLR 2026) to VenusFactory2
- [2025-04-19] 🎉 VenusREM (ISMB/ECCB 2025) ranks #1 on ProteinGym & VenusMutHub!
VenusFactory2 is an Agent-driven protein engineering platform combining 40+ AI models with 11 biological databases. Designed for everyone — from biologists to AI researchers.
| 🤖 Agent-First | 🎯 Three Interfaces | ⚡ Zero to Results |
|---|---|---|
| Natural language → Multi-step automation | Web UI / REST API / CLI | Upload → Predict in seconds |
| 40+ models + 11 databases | Same power, different styles | Or train custom models in minutes |
📖 Easy to Learn: Designed for life science professionals; no programming background is required. An intuitive Web UI, comprehensive bilingual documentation, rich examples, and video tutorials help you grow quickly from beginner to protein AI expert.
| Task | Solution | Time |
|---|---|---|
| 🧬 Mutation effects | ESM-2, ProSST, ProtSSN (zero-shot) | <1 min |
| 🎯 Protein function | 30+ fine-tuned models | <30 sec |
| 🔬 Custom training | 7 PEFT methods (LoRA, QLoRA, etc.) | 10-60 min |
| 💾 Data download | AlphaFold, UniProt, RCSB, KEGG, etc. | Real-time |
| 📚 Literature | AI-powered search & analysis | <2 min |
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install -r requirements.txt # Detailed guide below ↓
cd frontend
npm install
npm run build
cd ..
# Web UI v1 (legacy Gradio, local mode)
python src/webui.py --mode all # → http://localhost:7860
# Web UI v2 (FastAPI + React, local mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 # → http://localhost:7861
# Web UI v2 (FastAPI + React, online mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online
# REST API only
python src/api_server.py # → http://localhost:5000/docs
# CLI
bash script/train/train_plm_lora.sh
🤖 Try Agent-0.1 | ⚡ Quick Tools | 🔬 Train Models
Agent-0.1 (Natural Language)
Q: "Predict stability for sequence MKTAYIAKQRQISFV..."
→ Agent auto-selects model → Runs prediction → Returns results + explanations
Quick Mutation Scoring
Upload: PDB/FASTA → Mutations: A23V, K45R → Get: Stability scores
Train Your Model
Model: ESM2-650M → Dataset: DeepSol → Method: LoRA → 15 min → Trained model ✓
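Mutation strings such as A23V and K45R in the Quick Mutation Scoring example follow the usual wild-type/position/mutant notation. The sketch below shows one way to parse such a string and check it against a sequence before submitting it. The function names are illustrative and not part of VenusFactory2, and 1-based positions are an assumption.

```python
import re

# Standard one-letter amino-acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(mutation: str) -> tuple[str, int, str]:
    """Split a string such as 'A23V' into (wild type, position, mutant)."""
    match = _MUTATION_RE.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"Not a point mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS or mutant not in AMINO_ACIDS:
        raise ValueError(f"Unknown residue in {mutation!r}")
    return wild_type, position, mutant


def matches_sequence(mutation: str, sequence: str) -> bool:
    """Return True if the wild-type residue sits at the stated position."""
    wild_type, position, _ = parse_mutation(mutation)
    # Assumption: positions are 1-based, as is usual in protein numbering.
    return 1 <= position <= len(sequence) and sequence[position - 1] == wild_type


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    for m in ["A4V", "K8R", "A23V"]:
        print(m, parse_mutation(m), matches_sequence(m, seq))
```

Running a check like this first catches off-by-one position errors before any model is invoked.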
Agent-0.1 orchestrates all tools via natural language. Powered by LangGraph + LangChain.
You: "Design thermostable mutations for PDB:1ABC"
↓
🤖 Agent Planning
↓
📥 Download (RCSB PDB) → 🧬 Predict (ESM-2 scan) → 🎯 Score (Stability) → 📊 Report (Ranked list)
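The Download → Predict → Score → Report flow can be pictured as an ordered list of steps that pass a shared state along. The following is a minimal sketch of that idea in plain Python, not the actual LangGraph implementation. Every function here is a hypothetical placeholder that only records what a real step would produce.

```python
from typing import Callable

State = dict[str, object]
Step = Callable[[State], State]


def download(state: State) -> State:
    # Placeholder: a real step would fetch the structure from RCSB PDB.
    return {**state, "structure": f"{state['pdb_id']}.pdb"}


def predict(state: State) -> State:
    # Placeholder: a real step would run a mutation scan on the structure.
    return {**state, "candidates": ["<mutation-1>", "<mutation-2>"]}


def score(state: State) -> State:
    # Placeholder: a real step would attach a stability score to each candidate.
    return {**state, "scores": {m: None for m in state["candidates"]}}


def report(state: State) -> State:
    # Placeholder: a real step would rank the candidates by score.
    return {**state, "report": list(state["scores"])}


def run_plan(plan: list[Step], state: State) -> State:
    """Run each step in order and record which steps completed."""
    history: list[str] = []
    for step in plan:
        state = step(state)
        history.append(step.__name__)
    return {**state, "history": history}


if __name__ == "__main__":
    result = run_plan([download, predict, score, report], {"pdb_id": "1ABC"})
    print(result["history"])
```

Keeping each step a function from state to state makes it easy to reorder, skip, or retry a step, which is roughly what a planner needs.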
✨ Agent Capabilities
| Category | Features |
|---|---|
| 🔬 Analysis | Mutation prediction • Function/stability scoring • Structure analysis |
| 💾 Data | Multi-database search • Format conversion • Batch processing |
| 🧠 Planning | Multi-step automation • Tool orchestration • Error handling |
| 📚 Research | Literature mining • Family analysis • Report generation |
💬 Example Conversations
Mutation Design:
You: "Improve thermostability of MKTAYIAKQR..."
Agent: ✓ ESM-2 scanning... ✓ Stability scoring...
→ Top 3: A4V (+2.8 kcal/mol), K8R (+1.9), T3S (+1.5)
Database Search:
You: "Find lysozyme structures <2.0Å resolution"
Agent: ✓ Searching RCSB... → Found 47 structures
→ Downloaded to: temp_outputs/lysozyme_structures/
💡 Note: Requires API key (OpenAI/Anthropic). Currently in Beta.
🌐 Interfaces: Web UI | REST API | CLI
↓
🤖 Agent Layer (LangGraph + LangChain)
↓
🔧 Application: Train | Eval | Predict | Tools
↓
🛠️ Core Tools: 9 categories (mutation, database, search, etc.)
↓
📊 Resources: 40+ Models | 30+ Datasets | 11+ Databases
📚 Integrated Resources
Models (40+): ESM, ProtBert, ProtT5, Venus/PETA/ProSST series
Databases (11+): AlphaFold • RCSB PDB • UniProt • NCBI • KEGG • STRING • BRENDA • ChEMBL • HPA • FDA • Foldseek
Datasets (30+): Function • Localization • Stability • Solubility • Mutation fitness
🔧 Tool Categories
| Tool | Description | Agent | CLI |
|---|---|---|---|
| 🧬 Mutation | ESM-1v, ESM-2, ProSST, ProtSSN, MIF-ST | ✅ | ✅ |
| 🎯 Prediction | 30+ fine-tuned models | ✅ | ✅ |
| 💾 Database | 11 integrations | ✅ | ✅ |
| 🔍 Search | PubMed, FDA, patents | ✅ | ✅ |
| 🏋️ Training | LoRA, QLoRA, DoRA, etc. | ✅ | ✅ |
| 📁 File | Format conversion | ✅ | ✅ |
| 🔬 Denovo | Protein design | ✅ | ✅ |
| 🧪 Discovery | Novel discovery | ✅ | ✅ |
| 📊 Visualize | 3D viewer | ✅ | ✅ |
40+ Protein Language Models
Venus Series (Liang's Lab): ProSST-20/128/512/1024/2048/4096 (110M) • ProPrime-690M • VenusPLM-300M • PETA-base/bpe/unigram (80M)
ESM Series (Meta AI): ESM2: 8M, 35M, 150M, 650M, 3B, 15B • ESM-1v: 5 models (650M each)
ProtBert & ProtT5: ProtBert-Uniref100/BFD (420M) • IgBert (420M) • ProtT5-XL/XXL (3B-11B) • Ankh-base/large (450M-1.2B)
Selection Guide:
- GPU <8GB: ESM2-8M/35M, ProSST
- GPU 8-16GB: ESM2-150M/650M, ProtBert
- GPU 24GB+: ESM2-3B, ProtT5-XL
- Multi-GPU: ESM2-15B, ProtT5-XXL
By Task:
- Classification: ESM2, ProtBert
- Structure: Ankh
- Generation: ProtT5
- Antibody: IgBert/IgT5
- Lightweight: ProSST, PETA
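The GPU selection guide above can be written as a small lookup. This is only a sketch: the function name is illustrative, the guide does not say what to use between 16 GB and 24 GB (this sketch stays on the 8-16 GB tier there), and treating any multi-GPU machine as able to run the largest models is a simplification.

```python
def suggest_models(gpu_memory_gb: float, num_gpus: int = 1) -> list[str]:
    """Map available GPU memory to the model tiers listed in the guide."""
    if num_gpus > 1:
        # Simplification: assumes the GPUs together can hold the largest models.
        return ["ESM2-15B", "ProtT5-XXL"]
    if gpu_memory_gb >= 24:
        return ["ESM2-3B", "ProtT5-XL"]
    if gpu_memory_gb >= 8:
        # Assumption: 16-24 GB is not covered by the guide, so reuse this tier.
        return ["ESM2-150M", "ESM2-650M", "ProtBert"]
    return ["ESM2-8M", "ESM2-35M", "ProSST"]


if __name__ == "__main__":
    for memory in (6, 12, 24):
        print(memory, "GB:", suggest_models(memory))
    print("multi-GPU:", suggest_models(40, num_gpus=4))
```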
30+ Supervised + Zero-Shot Datasets
Zero-Shot: VenusMutHub • ProteinGym (217 DMS)
Function: EC • GO_BP • GO_CC • GO_MF
Localization: DeepLocBinary • DeepLocMulti • DeepLoc2Multi
Stability: Thermostability • TAPE_Stability
Solubility: DeepSol • DeepSoluE • eSOL • ProtSolM • PETA_CHS/LGK/TEM_Sol
Mutation: FLIP_AAV (7 splits) • FLIP_GB1 (5 splits) • TAPE_Fluorescence
Others: DeepET_Topt • MetalIonBinding • SortingSignal • PaCRISPR
All datasets available on HuggingFace
🍎 macOS (M1/M2/M3)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install torch_scatter torch-sparse torch-geometric -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements_for_macOS.txt
🪟 Windows / 🐧 Linux (CUDA 12.8)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
pip install -r requirements.txt
🪟 Windows / 🐧 Linux (CUDA 11.8)
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.7.0+cu118.html
pip install -r requirements.txt
💻 CPU Only
git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements.txt
Verify: `python -c "import torch; print(torch.__version__)"`
WebUI v2 serves static files from `frontend/dist` in production mode, so run `npm run build` in `frontend/` before starting `src/webui_v2.py`.
# Build WebUI v2 frontend assets first
cd frontend && npm run build && cd ..
# v1 (legacy Gradio) - local mode
python src/webui.py --mode all # → http://localhost:7860
# v1 (legacy Gradio) - online-compatible mode (feature-limited)
WEBUI_V2_MODE=online python src/webui.py --mode all # → http://localhost:7860
# v2 (FastAPI + React) - local mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 # → http://localhost:7861
# v2 (FastAPI + React) - online mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online # → http://localhost:7861
- Main runtime configuration template: `.env.example`
- Typical flow: `cp .env.example .env`, then adjust required keys for your mode.
- Minimal local setup: keep defaults, set `OPENAI_API_KEY` only if you use LLM-backed features.
- Minimal online setup: set `WEBUI_V2_MODE=online`, `WEBUI_V2_SESSION_TOKEN_SECRET`, and review all `WEBUI_V2_*_LIMIT` values.
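Before starting the server in online mode, it can help to confirm that the settings named above are actually present. The sketch below is a hypothetical pre-flight check, not something shipped with VenusFactory2. It uses only the variable names mentioned in the list and makes no claim about which `WEBUI_V2_*_LIMIT` keys exist.

```python
import os
from collections.abc import Mapping


def check_online_config(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with the minimal online setup."""
    problems = []
    if env.get("WEBUI_V2_MODE") != "online":
        problems.append("WEBUI_V2_MODE is not set to 'online'")
    if not env.get("WEBUI_V2_SESSION_TOKEN_SECRET"):
        problems.append("WEBUI_V2_SESSION_TOKEN_SECRET is empty or missing")
    return problems


def limit_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Collect every WEBUI_V2_*_LIMIT value so it can be reviewed by hand."""
    return {
        key: value
        for key, value in env.items()
        if key.startswith("WEBUI_V2_") and key.endswith("_LIMIT")
    }


if __name__ == "__main__":
    for problem in check_online_config(os.environ):
        print("WARNING:", problem)
    for key, value in sorted(limit_settings(os.environ).items()):
        print(f"{key}={value}")
```

The check reads from the process environment, so it assumes the values from `.env` have already been exported or loaded.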
| Tab | Purpose | Features |
|---|---|---|
| Training | Train custom models | Model selection • PEFT methods • Real-time monitoring • Wandb |
| Evaluation | Benchmark testing | Load model • Select metrics • CSV export |
| Prediction | Inference | Single/batch prediction • Result visualization |
| Agent | Natural language | Multi-step automation • Tool orchestration |
| Quick Tools | Rapid prediction | Mutation scoring • Function prediction |
| Advanced | Deep analysis | Sequence/structure-based models |
| Download | Data retrieval | AlphaFold • UniProt • RCSB • InterPro |
| Manual | Documentation | Guides & tutorials |
Command Line Examples
# Train model
bash script/train/train_plm_lora.sh \
--model facebook/esm2_t33_650M_UR50D \
--dataset DeepSol --batch_size 32
# Evaluate
bash script/eval/eval.sh \
--model_path ckpt/DeepSol/best_model \
--test_dataset DeepSol
# Download data
bash script/tools/database/alphafold/download_alphafold_structure.sh
bash script/tools/database/uniprot/download_uniprot_seq.sh
# Generate structure sequences
bash script/get_structure_seq/get_esm3_structure_seq.sh
REST API Examples
# Start server
python src/api_server.py # → http://localhost:5000/docs
# Mutation prediction
curl -X POST http://localhost:5000/api/mutation/predict \
-H "Content-Type: application/json" \
-d '{"sequence": "MKTAYIA...", "mutations": ["A23V", "K45R"]}'
# Function prediction
curl -X POST http://localhost:5000/api/predict/function \
-H "Content-Type: application/json" \
-d '{"sequence": "MKTAYIA...", "tasks": ["solubility", "stability"]}'
# Database search
curl "http://localhost:5000/api/database/uniprot/search?query=lysozyme&limit=10"
Python API
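Besides the in-process functions in this section, the REST endpoints from the curl examples can also be called from Python. The sketch below builds the same three requests with the standard library only. The helper names are illustrative, the paths and payload fields are copied from the curl examples, and the response format is not documented here, so the requests are built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # default address printed by src/api_server.py


def post_json(path: str, payload: dict) -> Request:
    """Build a JSON POST request for the given API path."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mutation_request(sequence: str, mutations: list[str]) -> Request:
    return post_json("/api/mutation/predict", {"sequence": sequence, "mutations": mutations})


def function_request(sequence: str, tasks: list[str]) -> Request:
    return post_json("/api/predict/function", {"sequence": sequence, "tasks": tasks})


def uniprot_search_request(query: str, limit: int = 10) -> Request:
    # urlencode escapes the query, so '&' never reaches a shell unquoted.
    params = urlencode({"query": query, "limit": limit})
    return Request(f"{BASE_URL}/api/database/uniprot/search?{params}")


if __name__ == "__main__":
    for req in (
        mutation_request("MKTAYIA...", ["A23V", "K45R"]),
        function_request("MKTAYIA...", ["solubility", "stability"]),
        uniprot_search_request("lysozyme"),
    ):
        print(req.get_method(), req.full_url)
```

With the server running, each request object can be passed to `urllib.request.urlopen` to send it.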
from src.tools.mutation import predict_mutation_effects
from src.tools.predict import predict_protein_function
from src.tools.database import download_alphafold_structure
# Mutations
results = predict_mutation_effects(
    sequence="MKTAYIAKQR...",
    mutations=["A4V", "K8R"],  # wild-type residues must match the sequence (1-based positions)
    model="esm2"
)
# Function
predictions = predict_protein_function(
    sequence="MKTAYIA...",
    tasks=["solubility", "stability"]
)
# Data
pdb_file = download_alphafold_structure("P12345")
| Method | Memory | Speed | Performance | Best For |
|---|---|---|---|---|
| LoRA | Low | Fast | Good | General tasks |
| QLoRA | Very Low | Slow | Good | Limited GPU |
| DoRA | Low | Medium | Better | Improved LoRA |
| AdaLoRA | Low | Medium | Better | Adaptive rank |
| SES-Adapter | Medium | Medium | Better | Selective tuning |
| IA3 | Very Low | Fast | Good | Lightweight |
| Freeze | Low | Fast | Good | Simple tuning |
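One reason LoRA sits in the low-memory rows is that it trains two small low-rank factors per adapted weight matrix instead of the matrix itself. The arithmetic below illustrates this for ESM2-650M (hidden size 1280, 33 layers). Rank 8 and adapting two square projections per layer are assumptions chosen for illustration, not VenusFactory2 defaults.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def lora_total(hidden: int, layers: int, rank: int, matrices_per_layer: int = 2) -> int:
    """Total added parameters when square hidden x hidden projections are adapted."""
    return layers * matrices_per_layer * lora_params(hidden, hidden, rank)


if __name__ == "__main__":
    # ESM2-650M dimensions; rank 8 is an assumed setting.
    hidden, layers, rank = 1280, 33, 8
    per_matrix = lora_params(hidden, hidden, rank)
    full_matrix = hidden * hidden
    print(f"per matrix: {per_matrix} vs {full_matrix} ({per_matrix / full_matrix:.2%})")
    print(f"total added: {lora_total(hidden, layers, rank)}")
```

Raising the rank increases the trainable parameter count linearly.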
@article{tan2026venusfactory2,
title={Self-evolving AI agents for protein discovery and directed evolution},
author={Tan, Yang and Zhang, Lingrong and Li, Mingchen and Yu, Yuanxi and Zhong, Bozitao and Zhou, Bingxin and Dong, Nanqing and Hong, Liang},
journal={arXiv preprint arXiv:2603.27303},
year={2026}
}
@inproceedings{tan2025venusfactory,
title={VenusFactory: An Integrated System for Protein Engineering with Data Retrieval and Language Model Fine-Tuning},
author={Tan, Yang and Liu, Chen and Gao, Jingyuan and Wu, Banghao and Li, Mingchen and Wang, Ruilin and Zhang, Lingrong and Yu, Huiqun and Fan, Guisheng and Hong, Liang and others},
booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
pages={230--241},
year={2025}
}
Developed by Liang's Lab at Shanghai Jiao Tong University.
Resources: Docs • YouTube • Playground • Issues
Made with ❤️ for the protein engineering community






