GitHub - ai4protein/VenusFactory2: 🏭 AI agent platform with skills for protein engineering, the noob-friendly AI tutorial tool for life science professionals.

🤖 Agent-Driven Protein Engineering Platform

One platform, three interfaces, infinite possibilities

🌟 Recent News

[2026-04-01] 🎉 Newly released VenusFactory2 website at venusfactory.cn/playground
[2026-03-27] 🚀 VenusFactory2 technical report released at arXiv:2603.27303
[2026-01-23] 🚀 Added VenusX (ICLR 2026) to VenusFactory2
[2025-04-19] 🎉 VenusREM (ISMB/ECCB 2025) ranks #1 on ProteinGym & VenusMutHub!

📨 Join our WeChat Group / 📝 Share Your Feedback

🎯 What is VenusFactory2?

VenusFactory2 is an agent-driven protein engineering platform that combines 40+ AI models with 11 biological databases. It is designed for everyone, from biologists to AI researchers.

🚀 Why VenusFactory2?

🤖 Agent-First	🎯 Three Interfaces	⚡ Zero to Results
Natural language → Multi-step automation	Web UI / REST API / CLI	Upload → Predict in seconds
40+ models + 11 databases	Same power, different styles	Or train custom models in minutes

📖 Easy to Learn: Designed for life science professionals; no programming background is required. An intuitive Web UI, comprehensive bilingual documentation, rich examples, and video tutorials help you grow quickly from beginner to protein AI expert.

💡 Capabilities at a Glance

Task	Solution	Time
🧬 Mutation effects	ESM-2, ProSST, ProtSSN (zero-shot)	<1 min
🎯 Protein function	30+ fine-tuned models	<30 sec
🔬 Custom training	7 PEFT methods (LoRA, QLoRA, etc.)	10-60 min
💾 Data download	AlphaFold, UniProt, RCSB, KEGG, etc.	Real-time
📚 Literature	AI-powered search & analysis	<2 min

⚡ Quick Start

1. Install

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install -r requirements.txt  # Detailed guide below ↓

2. Build Frontend (required for WebUI v2)

cd frontend
npm install
npm run build
cd ..

3. Launch

# Web UI v1 (legacy Gradio, local mode)
python src/webui.py --mode all  # → http://localhost:7860

# Web UI v2 (FastAPI + React, local mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861  # → http://localhost:7861

# Web UI v2 (FastAPI + React, online mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online

# REST API only
python src/api_server.py  # → http://localhost:5000/docs

# CLI
bash script/train/train_plm_lora.sh

4. Get Results

🤖 Try Agent-0.1 | ⚡ Quick Tools | 🔬 Train Models

Agent-0.1 (Natural Language)

Q: "Predict stability for sequence MKTAYIAKQRQISFV..."
→ Agent auto-selects model → Runs prediction → Returns results + explanations

Quick Mutation Scoring

Upload: PDB/FASTA → Mutations: A23V, K45R → Get: Stability scores
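
Mutation strings such as A23V follow the common wild-type / position / mutant notation. Before submitting a list, it can help to check that each mutation actually matches the sequence. The sketch below assumes 1-based positions (the indexing convention used by the platform is not stated here); `parse_mutation` and `check_mutations` are hypothetical helper names, not part of VenusFactory2.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Mutation(NamedTuple):
    wild_type: str
    position: int  # 1-based position (assumption)
    mutant: str


def parse_mutation(text: str) -> Mutation:
    """Parse a point mutation such as 'A23V' into its three parts."""
    match = _MUTATION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid point mutation: {text!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def check_mutations(sequence: str, mutations: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every mutation fits."""
    problems = []
    for text in mutations:
        try:
            m = parse_mutation(text)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        if not 1 <= m.position <= len(sequence):
            problems.append(f"{text}: position {m.position} is outside the sequence")
        elif sequence[m.position - 1] != m.wild_type:
            found = sequence[m.position - 1]
            problems.append(f"{text}: sequence has {found} at position {m.position}")
    return problems


print(check_mutations("MKTAYIAKQR", ["A4V", "K8R"]))  # no problems expected
print(check_mutations("MKTAYIAKQR", ["A5V"]))         # position 5 is Y, so this is flagged
```

Collecting problems into a list rather than raising on the first one makes it easier to fix a whole batch in a single pass.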

Train Your Model

Model: ESM2-650M → Dataset: DeepSol → Method: LoRA → 15 min → Trained model ✓

🤖 Agent-0.1: The Brain

Agent-0.1 orchestrates all tools via natural language. Powered by LangGraph + LangChain.

You: "Design thermostable mutations for PDB:1ABC"
         ↓
    🤖 Agent Planning
         ↓
  📥 Download → 🧬 Predict → 🎯 Score → 📊 Report
  RCSB PDB     ESM-2 scan    Stability   Ranked list
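
To make the "scan" step concrete: assuming it means an exhaustive single-site substitution scan (the agent's actual candidate generation is not described here), the candidate list for a sequence of length L contains 19 × L point mutations. A minimal sketch, with `single_site_mutants` as a hypothetical helper name and 1-based positions assumed:

```python
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_mutants(sequence: str) -> Iterator[str]:
    """Yield every single-residue substitution, e.g. 'M1A' (1-based positions)."""
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{index + 1}{mutant}"


candidates = list(single_site_mutants("MKTAYIAKQR"))
print(len(candidates))  # 10 residues x 19 substitutions = 190
print(candidates[:3])   # ['M1A', 'M1C', 'M1D']
```

Each candidate would then be scored by a model and ranked; that scoring step is deliberately left out of the sketch.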

✨ Agent Capabilities

Category	Features
🔬 Analysis	Mutation prediction • Function/stability scoring • Structure analysis
💾 Data	Multi-database search • Format conversion • Batch processing
🧠 Planning	Multi-step automation • Tool orchestration • Error handling
📚 Research	Literature mining • Family analysis • Report generation

💬 Example Conversations

Mutation Design:

You: "Improve thermostability of MKTAYIAKQR..."
Agent: ✓ ESM-2 scanning... ✓ Stability scoring...
→ Top 3: A4V (+2.8 kcal/mol), K8R (+1.9), T3S (+1.5)

Database Search:

You: "Find lysozyme structures <2.0Å resolution"
Agent: ✓ Searching RCSB... → Found 47 structures
→ Downloaded to: temp_outputs/lysozyme_structures/

💡 Note: The agent requires an LLM API key (OpenAI or Anthropic) and is currently in beta.

🏗️ Architecture

🌐 Interfaces: Web UI | REST API | CLI
        ↓
   🤖 Agent Layer (LangGraph + LangChain)
        ↓
   🔧 Application: Train | Eval | Predict | Tools
        ↓
   🛠️ Core Tools: 9 categories (mutation, database, search, etc.)
        ↓
   📊 Resources: 40+ Models | 30+ Datasets | 11+ Databases

📚 Integrated Resources

Models (40+): ESM, ProtBert, ProtT5, Venus/PETA/ProSST series

Databases (11+): AlphaFold • RCSB PDB • UniProt • NCBI • KEGG • STRING • BRENDA • ChEMBL • HPA • FDA • Foldseek

Datasets (30+): Function • Localization • Stability • Solubility • Mutation fitness

🔧 Tool Categories

Tool	Description	Agent	CLI
🧬 Mutation	ESM-1v, ESM-2, ProSST, ProtSSN, MIF-ST	✅	✅
🎯 Prediction	30+ fine-tuned models	✅	✅
💾 Database	11 integrations	✅	✅
🔍 Search	PubMed, FDA, patents	✅	✅
🏋️ Training	LoRA, QLoRA, DoRA, etc.	✅	✅
📁 File	Format conversion	✅	✅
🔬 Denovo	Protein design	✅	✅
🧪 Discovery	Novel discovery	✅	✅
📊 Visualize	3D viewer	✅	✅

🧬 Supported Models

40+ Protein Language Models

Venus Series (Liang's Lab): ProSST-20/128/512/1024/2048/4096 (110M) • ProPrime-690M • VenusPLM-300M • PETA-base/bpe/unigram (80M)

ESM Series (Meta AI): ESM2: 8M, 35M, 150M, 650M, 3B, 15B • ESM-1v: 5 models (650M each)

ProtBert & ProtT5: ProtBert-Uniref100/BFD (420M) • IgBert (420M) • ProtT5-XL/XXL (3B-11B) • Ankh-base/large (450M-1.2B)

Selection Guide:

GPU <8GB: ESM2-8M/35M, ProSST
GPU 8-16GB: ESM2-150M/650M, ProtBert
GPU 24GB+: ESM2-3B, ProtT5-XL
Multi-GPU: ESM2-15B, ProtT5-XXL
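
As a rough sanity check on these tiers, the memory needed just to hold model weights is the parameter count times the bytes per parameter. The sketch below uses the nominal ESM2 sizes listed above; it deliberately ignores activations, gradients, and optimizer state, which usually dominate during training, so treat the numbers as a lower bound rather than a requirement. `weight_memory_gib` is a hypothetical helper name.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

# Nominal parameter counts taken from the model names above.
ESM2_PARAMS = {
    "ESM2-8M": 8e6,
    "ESM2-35M": 35e6,
    "ESM2-150M": 150e6,
    "ESM2-650M": 650e6,
    "ESM2-3B": 3e9,
    "ESM2-15B": 15e9,
}


def weight_memory_gib(num_params: float, dtype: str = "fp16") -> float:
    """Memory for the weights alone, in GiB (no activations or optimizer state)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for name, params in ESM2_PARAMS.items():
    fp16 = weight_memory_gib(params, "fp16")
    fp32 = weight_memory_gib(params, "fp32")
    print(f"{name:>10}: {fp16:6.2f} GiB (fp16)  {fp32:6.2f} GiB (fp32)")
```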

By Task:

Classification: ESM2, ProtBert
Structure: Ankh
Generation: ProtT5
Antibody: IgBert/IgT5
Lightweight: ProSST, PETA

📚 Supported Datasets

30+ Supervised + Zero-Shot Datasets

Zero-Shot: VenusMutHub • ProteinGym (217 DMS)

Function: EC • GO_BP • GO_CC • GO_MF
Localization: DeepLocBinary • DeepLocMulti • DeepLoc2Multi
Stability: Thermostability • TAPE_Stability
Solubility: DeepSol • DeepSoluE • eSOL • ProtSolM • PETA_CHS/LGK/TEM_Sol
Mutation: FLIP_AAV (7 splits) • FLIP_GB1 (5 splits) • TAPE_Fluorescence
Others: DeepET_Topt • MetalIonBinding • SortingSignal • PaCRISPR

All datasets available on HuggingFace

📦 Installation

🍎 macOS (M1/M2/M3)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install torch_scatter torch-sparse torch-geometric -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements_for_macOS.txt

🪟 Windows / 🐧 Linux (CUDA 12.8)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
pip install -r requirements.txt

🪟 Windows / 🐧 Linux (CUDA 11.8)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.7.0+cu118.html
pip install -r requirements.txt

💻 CPU Only

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements.txt

Verify: python -c "import torch; print(torch.__version__)"
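
For a slightly more informative check than the version string alone, the following sketch also reports whether an accelerator is visible to PyTorch. It uses only standard PyTorch calls (`torch.cuda.is_available()` and, where present, `torch.backends.mps.is_available()`); `describe_torch` is a hypothetical helper name, and the script returns early instead of failing if torch is not installed.

```python
import importlib.util


def describe_torch() -> dict:
    """Summarize the local PyTorch install without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    info = {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    # The MPS backend (Apple Silicon) only exists in newer PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    info["mps_available"] = bool(mps and mps.is_available())
    return info


print(describe_torch())
```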

🚀 Usage

Web UI

WebUI v2 serves static files from frontend/dist in production mode, so run npm run build in frontend/ before starting src/webui_v2.py.

# Build WebUI v2 frontend assets first
cd frontend && npm run build && cd ..

# v1 (legacy Gradio) - local mode
python src/webui.py --mode all  # → http://localhost:7860

# v1 (legacy Gradio) - online-compatible mode (feature-limited)
WEBUI_V2_MODE=online python src/webui.py --mode all  # → http://localhost:7860

# v2 (FastAPI + React) - local mode
python src/webui_v2.py --host 0.0.0.0 --port 7861  # → http://localhost:7861

# v2 (FastAPI + React) - online mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online  # → http://localhost:7861

Configuration Entry

Main runtime configuration template: .env.example
Typical flow: run cp .env.example .env, then adjust the required keys for your mode.
Minimal local setup: keep defaults, set OPENAI_API_KEY only if you use LLM-backed features.
Minimal online setup: set WEBUI_V2_MODE=online, WEBUI_V2_SESSION_TOKEN_SECRET, and review all WEBUI_V2_*_LIMIT values.
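
A small pre-flight check can catch a missing setting before the server starts in online mode. The sketch below only looks at the variable names mentioned above; `check_online_config` is a hypothetical helper, not part of the repository, and the concrete limit name in the example (`WEBUI_V2_EXAMPLE_LIMIT`) is made up for illustration. Consult .env.example for the real keys.

```python
import fnmatch
import os
from typing import Mapping


def check_online_config(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems with an online-mode configuration."""
    problems = []
    if env.get("WEBUI_V2_MODE") != "online":
        problems.append("WEBUI_V2_MODE is not set to 'online'")
    if not env.get("WEBUI_V2_SESSION_TOKEN_SECRET"):
        problems.append("WEBUI_V2_SESSION_TOKEN_SECRET is empty or missing")
    limits = [key for key in env if fnmatch.fnmatchcase(key, "WEBUI_V2_*_LIMIT")]
    if not limits:
        problems.append("no WEBUI_V2_*_LIMIT values found to review")
    return problems


example_env = {
    "WEBUI_V2_MODE": "online",
    "WEBUI_V2_SESSION_TOKEN_SECRET": "change-me",
    "WEBUI_V2_EXAMPLE_LIMIT": "10",  # made-up name, for illustration only
}
print(check_online_config(example_env))  # no problems expected
print(check_online_config(os.environ))   # check the current process environment
```

Passing the environment in as a mapping keeps the check easy to test; note that `os.environ` only reflects .env values if something has already loaded that file into the process.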

Tab	Purpose	Features
Training	Train custom models	Model selection • PEFT methods • Real-time monitoring • Wandb
Evaluation	Benchmark testing	Load model • Select metrics • CSV export
Prediction	Inference	Single/batch prediction • Result visualization
Agent	Natural language	Multi-step automation • Tool orchestration
Quick Tools	Rapid prediction	Mutation scoring • Function prediction
Advanced	Deep analysis	Sequence/structure-based models
Download	Data retrieval	AlphaFold • UniProt • RCSB • InterPro
Manual	Documentation	Guides & tutorials

CLI & API

Command Line Examples

# Train model
bash script/train/train_plm_lora.sh \
  --model facebook/esm2_t33_650M_UR50D \
  --dataset DeepSol --batch_size 32

# Evaluate
bash script/eval/eval.sh \
  --model_path ckpt/DeepSol/best_model \
  --test_dataset DeepSol

# Download data
bash script/tools/database/alphafold/download_alphafold_structure.sh
bash script/tools/database/uniprot/download_uniprot_seq.sh

# Generate structure sequences
bash script/get_structure_seq/get_esm3_structure_seq.sh

REST API Examples

# Start server
python src/api_server.py  # → http://localhost:5000/docs

# Mutation prediction
curl -X POST http://localhost:5000/api/mutation/predict \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTAYIA...", "mutations": ["A23V", "K45R"]}'

# Function prediction
curl -X POST http://localhost:5000/api/predict/function \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTAYIA...", "tasks": ["solubility", "stability"]}'

# Database search
curl "http://localhost:5000/api/database/uniprot/search?query=lysozyme&limit=10"
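
The same requests can be prepared from Python with only the standard library. This is a sketch built from the endpoints and payload fields shown in the curl examples above; the response schema is not documented here, so the code stops at building the request and leaves sending and parsing as a commented-out step. `build_mutation_request` and `build_uniprot_search_url` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from the examples above


def build_mutation_request(sequence: str, mutations: list[str]) -> urllib.request.Request:
    """Prepare the POST request for the mutation prediction endpoint."""
    payload = json.dumps({"sequence": sequence, "mutations": mutations}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/mutation/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_uniprot_search_url(query: str, limit: int = 10) -> str:
    """Build the search URL with properly encoded query parameters."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return f"{BASE_URL}/api/database/uniprot/search?{params}"


request = build_mutation_request("MKTAYIAKQR", ["A4V", "K8R"])
print(request.get_method(), request.full_url)
print(build_uniprot_search_url("lysozyme"))

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Using `urlencode` avoids the shell-quoting pitfalls of hand-written query strings, for example when a search term contains spaces or `&`.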

Python API

from src.tools.mutation import predict_mutation_effects
from src.tools.predict import predict_protein_function
from src.tools.database import download_alphafold_structure

# Mutations
results = predict_mutation_effects(
    sequence="MKTAYIAKQR...",
    mutations=["A4V", "K8R"],  # 1-based positions matching the sequence above
    model="esm2"
)

# Function
predictions = predict_protein_function(
    sequence="MKTAYIA...",
    tasks=["solubility", "stability"]
)

# Data
pdb_file = download_alphafold_structure("P12345")

📊 Training Methods

Method	Memory	Speed	Performance	Best For
LoRA	Low	Fast	Good	General tasks
QLoRA	Very Low	Slow	Good	Limited GPU
DoRA	Low	Medium	Better	Improved LoRA
AdaLoRA	Low	Medium	Better	Adaptive rank
SES-Adapter	Medium	Medium	Better	Selective tuning
IA3	Very Low	Fast	Good	Lightweight
Freeze	Low	Fast	Good	Simple tuning
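
To give a feel for why LoRA sits in the "Low" memory row: instead of updating a full d_out × d_in weight matrix, LoRA trains two low-rank factors of shape d_out × r and r × d_in. The sketch below compares trainable parameter counts for a single projection; the width of 1280 and rank of 8 are illustrative assumptions, not values taken from the training scripts in this repository.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for the two LoRA factors (d_out x r and r x d_in)."""
    return rank * (d_in + d_out)


d_model = 1280  # illustrative projection width (assumption)
rank = 8        # illustrative LoRA rank (assumption)

full = full_param_count(d_model, d_model)
lora = lora_param_count(d_model, d_model, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 1,638,400  lora: 20,480  ratio: 1.25%
```

The ratio grows linearly with the rank, which is the main knob for trading adapter capacity against memory.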

🙌 Citation

@article{tan2026venusfactory2,
  title={Self-evolving AI agents for protein discovery and directed evolution},
  author={Tan, Yang and Zhang, Lingrong and Li, Mingchen and Yu, Yuanxi and Zhong, Bozitao and Zhou, Bingxin and Dong, Nanqing and Hong, Liang},
  journal={arXiv preprint arXiv:2603.27303},
  year={2026}
}

@inproceedings{tan2025venusfactory,
  title={VenusFactory: An Integrated System for Protein Engineering with Data Retrieval and Language Model Fine-Tuning},
  author={Tan, Yang and Liu, Chen and Gao, Jingyuan and Banghao, Wu and Li, Mingchen and Wang, Ruilin and Zhang, Lingrong and Yu, Huiqun and Fan, Guisheng and Hong, Liang and others},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages={230--241},
  year={2025}
}

🎊 Acknowledgement

Developed by Liang's Lab at Shanghai Jiao Tong University.

Resources: Docs • YouTube • Playground • Issues

Made with ❤️ for the protein engineering community

⭐ Star • 🐛 Report Bug • 💡 Request Feature
