ai4protein/VenusFactory2

🤖 Agent-Driven Protein Engineering Platform
One platform, three interfaces, infinite possibilities


🎯 What is VenusFactory2?

VenusFactory2 is an agent-driven protein engineering platform that combines 40+ AI models with 11 biological databases. It is designed for everyone, from biologists to AI researchers.

[Screenshot: VenusFactory2 Training Interface]

[Screenshot: VenusFactory2 Agent Interface]

🚀 Why VenusFactory2?

🤖 Agent-First | 🎯 Three Interfaces | ⚡ Zero to Results
Natural language → Multi-step automation | Web UI / REST API / CLI | Upload → Predict in seconds
40+ models + 11 databases | Same power, different styles | Or train custom models in minutes

📖 Easy to Learn: Designed for life science professionals; no programming background is required. An intuitive Web UI, comprehensive bilingual documentation, rich examples, and video tutorials help you grow quickly from beginner to protein AI expert.

💡 Capabilities at a Glance

Task | Solution | Time
🧬 Mutation effects | ESM-2, ProSST, ProtSSN (zero-shot) | <1 min
🎯 Protein function | 30+ fine-tuned models | <30 sec
🔬 Custom training | 7 PEFT methods (LoRA, QLoRA, etc.) | 10-60 min
💾 Data download | AlphaFold, UniProt, RCSB, KEGG, etc. | Real-time
📚 Literature | AI-powered search & analysis | <2 min

⚡ Quick Start

1. Install

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install -r requirements.txt  # Detailed guide below ↓

2. Build Frontend (required for WebUI v2)

cd frontend
npm install
npm run build
cd ..

3. Launch

# Web UI v1 (legacy Gradio, local mode)
python src/webui.py --mode all  # → http://localhost:7860

# Web UI v2 (FastAPI + React, local mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861  # → http://localhost:7861

# Web UI v2 (FastAPI + React, online mode)
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online

# REST API only
python src/api_server.py  # → http://localhost:5000/docs

# CLI
bash script/train/train_plm_lora.sh

4. Get Results

🤖 Try Agent-0.1 | ⚡ Quick Tools | 🔬 Train Models

Agent-0.1 (Natural Language)

Q: "Predict stability for sequence MKTAYIAKQRQISFV..."
→ Agent auto-selects model → Runs prediction → Returns results + explanations

Quick Mutation Scoring

Upload: PDB/FASTA → Mutations: A23V, K45R → Get: Stability scores
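
Mutation strings such as A23V follow the common point-mutation notation: wild-type residue, position, mutant residue. The sketch below shows one way to parse such strings and check them against a sequence before submitting them. It assumes 1-based positions and the 20 standard amino acids; `parse_mutation` and `check_against_sequence` are hypothetical helpers, not part of VenusFactory2.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


class Mutation(NamedTuple):
    wild_type: str
    position: int  # 1-based (assumption)
    mutant: str


def parse_mutation(text: str) -> Mutation:
    """Parse a string like 'A23V' into its three parts."""
    match = _MUTATION_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a valid point mutation: {text!r}")
    wild_type, position, mutant = match.groups()
    return Mutation(wild_type, int(position), mutant)


def check_against_sequence(mutation: Mutation, sequence: str) -> bool:
    """Return True if the wild-type residue matches the sequence at that position."""
    if not 1 <= mutation.position <= len(sequence):
        return False
    return sequence[mutation.position - 1] == mutation.wild_type


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    for label in ["A4V", "K8R", "A5V"]:
        print(label, check_against_sequence(parse_mutation(label), seq))
```

Checking the wild-type residue up front catches off-by-one position errors, which are easy to make when a structure file and a FASTA record are numbered differently.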

Train Your Model

Model: ESM2-650M → Dataset: DeepSol → Method: LoRA → 15 min → Trained model ✓


🤖 Agent-0.1: The Brain

Agent-0.1 orchestrates all tools via natural language. Powered by LangGraph + LangChain.

You: "Design thermostable mutations for PDB:1ABC"
         ↓
    🤖 Agent Planning
         ↓
  📥 Download → 🧬 Predict → 🎯 Score → 📊 Report
  RCSB PDB     ESM-2 scan    Stability   Ranked list
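
The flow in the diagram can be pictured as a chain of steps that share one state dictionary. The sketch below is plain Python written for illustration only: it does not use LangGraph or LangChain, the step functions are placeholder stubs that return no real predictions, and all names (`run_pipeline`, `download`, `predict`, `score`, `report`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Step = Tuple[str, Callable[[State], State]]


def run_pipeline(steps: List[Step], initial: State) -> State:
    """Run steps in order, merging each step's output into the shared state.

    Stops at the first failing step and records the error.
    """
    state = dict(initial)
    completed = []
    for name, func in steps:
        try:
            state.update(func(state))
        except Exception as exc:
            state["error"] = f"{name} failed: {exc}"
            break
        completed.append(name)
    state["completed"] = completed
    return state


# Placeholder stubs: a real step would call a database or a model here.
def download(state: State) -> State:
    return {"structure_source": f"RCSB PDB entry {state['pdb_id']}"}


def predict(state: State) -> State:
    return {"candidates": []}  # a real step would list candidate mutations


def score(state: State) -> State:
    return {"scores": {c: None for c in state["candidates"]}}


def report(state: State) -> State:
    return {"report": f"{len(state['scores'])} candidates ranked"}


EXAMPLE_STEPS: List[Step] = [
    ("download", download),
    ("predict", predict),
    ("score", score),
    ("report", report),
]

if __name__ == "__main__":
    print(run_pipeline(EXAMPLE_STEPS, {"pdb_id": "1ABC"})["completed"])
```

Keeping every step behind the same signature is what lets a planner reorder or skip steps; a graph framework adds branching and retries on top of the same idea.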

✨ Agent Capabilities

Category | Features
🔬 Analysis | Mutation prediction • Function/stability scoring • Structure analysis
💾 Data | Multi-database search • Format conversion • Batch processing
🧠 Planning | Multi-step automation • Tool orchestration • Error handling
📚 Research | Literature mining • Family analysis • Report generation

💬 Example Conversations

Mutation Design:

You: "Improve thermostability of MKTAYIAKQR..."
Agent: ✓ ESM-2 scanning... ✓ Stability scoring...
→ Top 3: A4V (+2.8 kcal/mol), K8R (+1.9), T3S (+1.5)

Database Search:

You: "Find lysozyme structures <2.0Å resolution"
Agent: ✓ Searching RCSB... → Found 47 structures
→ Downloaded to: temp_outputs/lysozyme_structures/

💡 Note: Agent-0.1 requires an API key (OpenAI or Anthropic) and is currently in beta.


🏗️ Architecture

🌐 Interfaces: Web UI | REST API | CLI
        ↓
   🤖 Agent Layer (LangGraph + LangChain)
        ↓
   🔧 Application: Train | Eval | Predict | Tools
        ↓
   🛠️ Core Tools: 9 categories (mutation, database, search, etc.)
        ↓
   📊 Resources: 40+ Models | 30+ Datasets | 11+ Databases

📚 Integrated Resources

Models (40+): ESM, ProtBert, ProtT5, Venus/PETA/ProSST series

Databases (11+): AlphaFold • RCSB PDB • UniProt • NCBI • KEGG • STRING • BRENDA • ChEMBL • HPA • FDA • Foldseek

Datasets (30+): Function • Localization • Stability • Solubility • Mutation fitness

🔧 Tool Categories
Tool | Description | Agent | CLI
🧬 Mutation | ESM-1v, ESM-2, ProSST, ProtSSN, MIF-ST | |
🎯 Prediction | 30+ fine-tuned models | |
💾 Database | 11 integrations | |
🔍 Search | PubMed, FDA, patents | |
🏋️ Training | LoRA, QLoRA, DoRA, etc. | |
📁 File | Format conversion | |
🔬 Denovo | Protein design | |
🧪 Discovery | Novel discovery | |
📊 Visualize | 3D viewer | |

🧬 Supported Models

40+ Protein Language Models

Venus Series (Liang's Lab): ProSST-20/128/512/1024/2048/4096 (110M) • ProPrime-690M • VenusPLM-300M • PETA-base/bpe/unigram (80M)

ESM Series (Meta AI): ESM2: 8M, 35M, 150M, 650M, 3B, 15B • ESM-1v: 5 models (650M each)

ProtBert & ProtT5: ProtBert-Uniref100/BFD (420M) • IgBert (420M) • ProtT5-XL/XXL (3B-11B) • Ankh-base/large (450M-1.2B)

Selection Guide:

  • GPU <8GB: ESM2-8M/35M, ProSST
  • GPU 8-16GB: ESM2-150M/650M, ProtBert
  • GPU 24GB+: ESM2-3B, ProtT5-XL
  • Multi-GPU: ESM2-15B, ProtT5-XXL
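
The memory tiers above can be written as a small lookup. This is a sketch rather than anything shipped with VenusFactory2: `suggest_models` is a hypothetical helper, and two choices are assumptions because the list does not cover them: GPUs between 16 GB and 24 GB are treated like the 8-16 GB tier, and any multi-GPU setup is mapped to the largest models.

```python
from typing import List


def suggest_models(gpu_memory_gb: float, num_gpus: int = 1) -> List[str]:
    """Map available GPU memory to the model tiers from the selection guide."""
    if num_gpus > 1:
        # Assumption: any multi-GPU machine can take the largest models.
        return ["ESM2-15B", "ProtT5-XXL"]
    if gpu_memory_gb >= 24:
        return ["ESM2-3B", "ProtT5-XL"]
    if gpu_memory_gb >= 8:
        # Assumption: 16-24 GB falls back to the 8-16 GB tier.
        return ["ESM2-150M", "ESM2-650M", "ProtBert"]
    return ["ESM2-8M", "ESM2-35M", "ProSST"]


if __name__ == "__main__":
    for memory in (6, 12, 24):
        print(memory, "GB:", ", ".join(suggest_models(memory)))
```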

By Task:

  • Classification: ESM2, ProtBert
  • Structure: Ankh
  • Generation: ProtT5
  • Antibody: IgBert/IgT5
  • Lightweight: ProSST, PETA

📚 Supported Datasets

30+ Supervised + Zero-Shot Datasets

Zero-Shot: VenusMutHub • ProteinGym (217 DMS)

Function: EC • GO_BP • GO_CC • GO_MF
Localization: DeepLocBinary • DeepLocMulti • DeepLoc2Multi
Stability: Thermostability • TAPE_Stability
Solubility: DeepSol • DeepSoluE • eSOL • ProtSolM • PETA_CHS/LGK/TEM_Sol
Mutation: FLIP_AAV (7 splits) • FLIP_GB1 (5 splits) • TAPE_Fluorescence
Others: DeepET_Topt • MetalIonBinding • SortingSignal • PaCRISPR

All datasets available on HuggingFace


📦 Installation

🍎 macOS (M1/M2/M3)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install torch_scatter torch-sparse torch-geometric -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements_for_macOS.txt

🪟 Windows / 🐧 Linux (CUDA 12.8)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.8.0 torchvision --index-url https://download.pytorch.org/whl/cu128
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cu128.html
pip install -r requirements.txt

🪟 Windows / 🐧 Linux (CUDA 11.8)

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.7.0+cu118.html
pip install -r requirements.txt

💻 CPU Only

git clone https://github.com/AI4Protein/VenusFactory2.git && cd VenusFactory2
conda create -n venus python=3.12 && conda activate venus
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.8.0+cpu.html
pip install -r requirements.txt

Verify: python -c "import torch; print(torch.__version__)"
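
The one-liner above only confirms that torch imports. A slightly broader check, sketched below, reports which of the packages named in the install commands can be found in the active environment, without importing them. `missing_packages` is a hypothetical helper and the package list is an assumption drawn from the commands above; adjust it to your platform.

```python
import importlib.util
from typing import Iterable, List

# Import names of packages installed by the commands above (assumed list).
EXPECTED_PACKAGES = ["torch", "torch_geometric", "torch_scatter", "torch_sparse"]


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED_PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected packages were found.")
```

Using `find_spec` avoids loading large libraries just to learn whether they are installed, so the check stays fast even on a machine without a GPU.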


🚀 Usage

Web UI

WebUI v2 serves static files from frontend/dist in production mode, so run npm run build in frontend/ before starting src/webui_v2.py.

# Build WebUI v2 frontend assets first
cd frontend && npm run build && cd ..

# v1 (legacy Gradio) - local mode
python src/webui.py --mode all  # → http://localhost:7860

# v1 (legacy Gradio) - online-compatible mode (feature-limited)
WEBUI_V2_MODE=online python src/webui.py --mode all  # → http://localhost:7860

# v2 (FastAPI + React) - local mode
python src/webui_v2.py --host 0.0.0.0 --port 7861  # → http://localhost:7861

# v2 (FastAPI + React) - online mode
python src/webui_v2.py --host 0.0.0.0 --port 7861 --online  # → http://localhost:7861

Configuration Entry

  • Main runtime configuration template: .env.example
  • Typical flow: cp .env.example .env then adjust required keys for your mode.
  • Minimal local setup: keep defaults, set OPENAI_API_KEY only if you use LLM-backed features.
  • Minimal online setup: set WEBUI_V2_MODE=online, WEBUI_V2_SESSION_TOKEN_SECRET, and review all WEBUI_V2_*_LIMIT values.
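
The online-mode checklist above can be turned into a small pre-flight check on the contents of `.env`. The sketch below is illustrative: `parse_env` and `online_mode_problems` are hypothetical helpers, the parser handles only simple `KEY=value` lines, and the limit key in the usage example is a made-up name standing in for whatever `WEBUI_V2_*_LIMIT` keys `.env.example` defines.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def online_mode_problems(env: Dict[str, str]) -> List[str]:
    """List what still needs attention before running in online mode."""
    problems = []
    if env.get("WEBUI_V2_MODE") != "online":
        problems.append("WEBUI_V2_MODE is not set to 'online'")
    if not env.get("WEBUI_V2_SESSION_TOKEN_SECRET"):
        problems.append("WEBUI_V2_SESSION_TOKEN_SECRET is empty or missing")
    limits = [k for k in env if k.startswith("WEBUI_V2_") and k.endswith("_LIMIT")]
    if not limits:
        problems.append("no WEBUI_V2_*_LIMIT values found to review")
    return problems


if __name__ == "__main__":
    sample = """
    # WEBUI_V2_EXAMPLE_LIMIT is a hypothetical key used only for this demo
    WEBUI_V2_MODE=online
    WEBUI_V2_SESSION_TOKEN_SECRET=
    WEBUI_V2_EXAMPLE_LIMIT=10
    """
    for problem in online_mode_problems(parse_env(sample)):
        print("-", problem)
```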

Tab | Purpose | Features
Training | Train custom models | Model selection • PEFT methods • Real-time monitoring • Wandb
Evaluation | Benchmark testing | Load model • Select metrics • CSV export
Prediction | Inference | Single/batch prediction • Result visualization
Agent | Natural language | Multi-step automation • Tool orchestration
Quick Tools | Rapid prediction | Mutation scoring • Function prediction
Advanced | Deep analysis | Sequence/structure-based models
Download | Data retrieval | AlphaFold • UniProt • RCSB • InterPro
Manual | Documentation | Guides & tutorials

CLI & API

Command Line Examples
# Train model
bash script/train/train_plm_lora.sh \
  --model facebook/esm2_t33_650M_UR50D \
  --dataset DeepSol --batch_size 32

# Evaluate
bash script/eval/eval.sh \
  --model_path ckpt/DeepSol/best_model \
  --test_dataset DeepSol

# Download data
bash script/tools/database/alphafold/download_alphafold_structure.sh
bash script/tools/database/uniprot/download_uniprot_seq.sh

# Generate structure sequences
bash script/get_structure_seq/get_esm3_structure_seq.sh

REST API Examples

# Start server
python src/api_server.py  # → http://localhost:5000/docs

# Mutation prediction
curl -X POST http://localhost:5000/api/mutation/predict \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTAYIA...", "mutations": ["A23V", "K45R"]}'

# Function prediction
curl -X POST http://localhost:5000/api/predict/function \
  -H "Content-Type: application/json" \
  -d '{"sequence": "MKTAYIA...", "tasks": ["solubility", "stability"]}'

# Database search
curl "http://localhost:5000/api/database/uniprot/search?query=lysozyme&limit=10"
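
The same calls can be prepared from Python with only the standard library. The sketch below builds the requests without sending them, so it runs without a server; the endpoint paths and JSON fields are copied from the curl examples above, while the helper names are hypothetical and the response schema is not shown because it is not documented here.

```python
import json
import urllib.parse
import urllib.request
from typing import List

BASE_URL = "http://localhost:5000"


def build_mutation_request(
    sequence: str, mutations: List[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare the POST request for the mutation prediction endpoint."""
    payload = json.dumps({"sequence": sequence, "mutations": mutations})
    return urllib.request.Request(
        f"{base_url}/api/mutation/predict",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_uniprot_search_url(
    query: str, limit: int = 10, base_url: str = BASE_URL
) -> str:
    """Build the GET URL for the UniProt search endpoint, with safe encoding."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return f"{base_url}/api/database/uniprot/search?{params}"


if __name__ == "__main__":
    request = build_mutation_request("MKTAYIAKQR", ["A4V", "K8R"])
    print(request.get_method(), request.full_url)
    print(build_uniprot_search_url("lysozyme"))
    # To send it, start the API server first, then:
    # with urllib.request.urlopen(request, timeout=30) as response:
    #     print(json.load(response))
```

Building the query string with `urlencode` also sidesteps the shell-quoting problem that an unquoted `&` causes on the command line.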

Python API

from src.tools.mutation import predict_mutation_effects
from src.tools.predict import predict_protein_function
from src.tools.database import download_alphafold_structure

# Mutations
results = predict_mutation_effects(
    sequence="MKTAYIAKQR...",
    mutations=["A4V", "K8R"],
    model="esm2"
)

# Function
predictions = predict_protein_function(
    sequence="MKTAYIA...",
    tasks=["solubility", "stability"]
)

# Data
pdb_file = download_alphafold_structure("P12345")

📊 Training Methods

Method | Memory | Speed | Performance | Best For
LoRA | Low | Fast | Good | General tasks
QLoRA | Very Low | Slow | Good | Limited GPU
DoRA | Low | Medium | Better | Improved LoRA
AdaLoRA | Low | Medium | Better | Adaptive rank
SES-Adapter | Medium | Medium | Better | Selective tuning
IA3 | Very Low | Fast | Good | Lightweight
Freeze | Low | Fast | Good | Simple tuning
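
As a rough illustration of why LoRA needs little memory: it freezes a weight matrix of shape d_out × d_in and trains two low-rank factors holding r × (d_in + d_out) parameters in total. The sketch below does that arithmetic for a single 1280 × 1280 projection (1280 is the hidden size of ESM2-650M) at rank 8. The rank is an illustrative choice, not a VenusFactory2 default, and the helper names are hypothetical.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the two LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d, r = 1280, 8
    dense = full_params(d, d)
    adapter = lora_params(d, d, r)
    print(f"dense: {dense}, LoRA: {adapter}, ratio: {adapter / dense:.2%}")
    # prints "dense: 1638400, LoRA: 20480, ratio: 1.25%"
```

The ratio applies per adapted matrix; the overall saving depends on how many layers receive adapters.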

🙌 Citation

@article{tan2026venusfactory2,
  title={Self-evolving AI agents for protein discovery and directed evolution},
  author={Tan, Yang and Zhang, Lingrong and Li, Mingchen and Yu, Yuanxi and Zhong, Bozitao and Zhou, Bingxin and Dong, Nanqing and Hong, Liang},
  journal={arXiv preprint arXiv:2603.27303},
  year={2026}
}

@inproceedings{tan2025venusfactory,
  title={VenusFactory: An Integrated System for Protein Engineering with Data Retrieval and Language Model Fine-Tuning},
  author={Tan, Yang and Liu, Chen and Gao, Jingyuan and Wu, Banghao and Li, Mingchen and Wang, Ruilin and Zhang, Lingrong and Yu, Huiqun and Fan, Guisheng and Hong, Liang and others},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  pages={230--241},
  year={2025}
}

🎊 Acknowledgement

Developed by Liang's Lab at Shanghai Jiao Tong University.

Resources: Docs • YouTube • Playground • Issues


Made with ❤️ for the protein engineering community
