hackfest-dev/HF26-24

🧬 Drug Discovery AI

Multi-agent precision medicine pipeline for novel drug discovery: it generates molecules, predicts selectivity, matches clinical trials, and persists discoveries in under 60 seconds.


🚀 What This Is

Drug Discovery AI is an 18-agent pipeline that takes a gene mutation query (e.g. EGFR T790M) and:

  1. Parses the mutation with LLM + regex fallback
  2. Fetches literature (PubMed), proteins (UniProt), structures (RCSB), compounds (PubChem) in parallel
  3. Downloads PDB structures and detects binding pockets (fpocket / centroid fallback)
  4. Generates novel molecules via RDKit scaffold hopping + bioisostere SMARTS mutations
  5. Docks molecules (Gnina → Vina → AI hash fallback)
  6. Dual-docks leads against off-target proteins to compute a selectivity ratio, the key differentiator
  7. Screens ADMET (Lipinski + PAINS + toxicophore images)
  8. Optimizes leads with scaffold hopping, bioisostere replacement, and fragment growing, building an evolution tree
  9. Forecasts resistance mutations with LLM
  10. Matches active clinical trials via ClinicalTrials.gov API v2
  11. Builds a knowledge graph and reasoning trace
  12. Saves discoveries to Neon PostgreSQL and exposes a full REST API
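
Step 1 mentions a regex fallback for parsing the mutation. The project's actual pattern is not shown here, so the following is only a minimal sketch of what such a fallback could look like. The names `MUTATION_RE` and `parse_mutation` and the returned field names are hypothetical, and the pattern only covers simple substitutions written like `EGFR T790M`.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the project's regex):
# a gene symbol, whitespace, then a substitution such as T790M
# (reference residue, position, variant residue in one-letter codes).
MUTATION_RE = re.compile(
    r"\b(?P<gene>[A-Z][A-Z0-9]{1,9})\s+"
    r"(?P<ref>[A-Z])(?P<pos>\d{1,5})(?P<alt>[A-Z])\b"
)


def parse_mutation(query: str) -> Optional[dict]:
    """Return the parsed mutation, or None if the query does not match."""
    match = MUTATION_RE.search(query.upper())
    if match is None:
        return None
    return {
        "gene": match.group("gene"),
        "ref": match.group("ref"),
        "position": int(match.group("pos")),
        "alt": match.group("alt"),
    }


if __name__ == "__main__":
    print(parse_mutation("EGFR T790M"))
```

A fallback like this would only run when the LLM parse fails or is unavailable. Queries it cannot match return `None`, so the caller can decide how to report the failure.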

📁 Project Structure

drug-discovery-ai/
├── backend/          # Python 3.11 + FastAPI + 18 agents
│   ├── agents/       # 18 pipeline agents
│   ├── pipeline/     # LangGraph state + orchestrator
│   ├── utils/        # LLM router, DB, ADMET, molecule utils
│   ├── routers/      # FastAPI route handlers
│   ├── data/         # JSON data files (curated profiles, resistance, etc.)
│   └── evaluation/   # Benchmark runner
│
├── frontend/         # Next.js 16 + TypeScript + Tailwind v4
│   └── app/
│       ├── components/   # All UI components (analysis, landing, settings)
│       ├── hooks/        # useSSEStream, useAnalysis, useDiscoveries, useTheme
│       ├── lib/          # api.ts, types.ts, utils.ts, theme.ts
│       ├── analysis/     # [sessionId] analysis page (10 tabs)
│       ├── discoveries/  # Discovery library browser
│       └── settings/     # Theme customizer, pipeline config, API key checker
│
├── .github/
│   └── workflows/    # backend-ci.yml + frontend-ci.yml
├── start.sh          # Unix quick-start
├── start.bat         # Windows quick-start
└── README.md

⚡ Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • (Optional) AutoDock Vina or Gnina for real docking
  • (Optional) fpocket for pocket detection

1. Backend

cd backend
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Fill in at least GROQ_API_KEY (free at console.groq.com)
uvicorn main:app --reload --port 8000

2. Frontend

cd frontend
npm install
cp .env.local.example .env.local
npm run dev

Open http://localhost:3000.


🔑 Environment Variables

Backend (backend/.env)

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | Optional | GPT-4o-mini (primary LLM) |
| GROQ_API_KEY | Recommended | Llama 3.3 70B, free at console.groq.com |
| TOGETHER_API_KEY | Optional | Mistral 7B fallback |
| NCBI_API_KEY | Optional | Higher PubMed rate limits |
| LANGCHAIN_API_KEY | Optional | LangSmith observability |
| DATABASE_URL | Optional | Neon PostgreSQL connection string |
| AUTO_SAVE_DISCOVERIES | Optional | Set true to auto-persist every run |
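
As a rough illustration of how optional variables like these could be read at startup, here is a small sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical and not taken from the project. Accepting `true` case-insensitively is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a few of the variables listed above."""
    groq_api_key: Optional[str]
    database_url: Optional[str]
    auto_save_discoveries: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Optional variables fall back to None when unset or empty.
    raw_auto_save = env.get("AUTO_SAVE_DISCOVERIES", "false")
    return Settings(
        groq_api_key=env.get("GROQ_API_KEY") or None,
        database_url=env.get("DATABASE_URL") or None,
        # Assumption: only the literal "true" (any case) enables auto-save.
        auto_save_discoveries=raw_auto_save.strip().lower() == "true",
    )


if __name__ == "__main__":
    print(load_settings({"AUTO_SAVE_DISCOVERIES": "true"}))
```

Passing the environment in as a mapping keeps the function easy to exercise in tests without touching the real process environment.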

Frontend (frontend/.env.local)

NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_APP_NAME="Drug Discovery AI"

🧪 Tech Stack

Backend

  • FastAPI 0.115 + uvicorn + SSE streaming
  • LangGraph 0.2 pipeline orchestration
  • LangSmith observability (enterprise trace dashboard)
  • RDKit: molecule generation, ADMET, depiction
  • Multi-LLM fallback: OpenAI GPT-4o-mini → Groq Llama 3.3 70B → Together Mistral 7B → deterministic template
  • Neon PostgreSQL via SQLAlchemy asyncio + asyncpg
  • External APIs: PubMed, UniProt, RCSB, PubChem, ClinicalTrials.gov
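
The multi-LLM fallback listed above can be pictured as an ordered chain: try each provider in turn and drop to a deterministic template if all of them fail. The sketch below shows that control flow only. `complete_with_fallback`, `deterministic_template`, and the provider callables are hypothetical stand-ins, and no real LLM client is called.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns text, or raises on failure (missing key, rate limit, timeout).
Provider = Tuple[str, Callable[[str], str]]


def deterministic_template(prompt: str) -> str:
    """Last-resort answer that needs no network access (hypothetical)."""
    return f"[template] No LLM available for: {prompt}"


def complete_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Return (provider_name, text) from the first provider that succeeds."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception:
            # Any failure moves on to the next provider in the chain.
            continue
    return "template", deterministic_template(prompt)


if __name__ == "__main__":
    def failing(prompt: str) -> str:
        raise RuntimeError("provider unavailable")

    def working(prompt: str) -> str:
        return "ok: " + prompt

    print(complete_with_fallback("hi", [("openai", failing), ("groq", working)]))
```

Returning the provider name alongside the text makes it easy to record which tier actually answered.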

Frontend

  • Next.js 16 App Router + TypeScript strict
  • Tailwind CSS v4 + amber minimal theme system
  • GSAP 3.12 + ScrollTrigger: hero animations, counter reveals
  • Framer Motion 11: agent status stagger, 2D/3D crossfade
  • D3.js 7: knowledge graph force-directed, evolution tree
  • Recharts: ADMET radar chart, docking score chart
  • shadcn/ui (Radix primitives): accessible components
  • NGL β€” 3D molecular viewer
  • Biome β€” linting + formatting (no ESLint, no Prettier)

🏆 Win Factor Features

| Feature | Description |
| --- | --- |
| Selectivity Ratio | Dual-docks top leads vs. off-target proteins. ratio = target / off-target. ≥3.0 = High selectivity |
| Evolution Tree | D3 visualization of how seed molecules transform through scaffold hopping + bioisostere operations |
| LangSmith Tracing | Enterprise observability: every agent call, token count, and latency, auditable in real time |
| Clinical Trial Matching | Live ClinicalTrials.gov API that links each discovery to active patient trials |
| Resistance Forecasting | LLM predicts which secondary mutations will emerge under treatment pressure |
| Discovery Database | Neon PostgreSQL persists all discoveries, browseable in the Discoveries Library |
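
To make the selectivity rule concrete, here is a small sketch of the ratio and its threshold. It assumes both inputs are positive binding-strength scores where larger means stronger binding. How the project derives such scores from docking energies is not stated here, so that conversion is left out. The function names and the "Not high" label are hypothetical; only the ≥3.0 "High" threshold comes from the table above.

```python
HIGH_SELECTIVITY_THRESHOLD = 3.0  # threshold stated in the feature table


def selectivity_ratio(target_score: float, off_target_score: float) -> float:
    """ratio = target / off-target, for positive binding-strength scores."""
    if target_score < 0 or off_target_score <= 0:
        # Assumption: scores are positive magnitudes, not raw energies.
        raise ValueError("scores must be positive binding strengths")
    return target_score / off_target_score


def selectivity_label(ratio: float) -> str:
    # Only the "High" tier is documented; the other label is a placeholder.
    return "High" if ratio >= HIGH_SELECTIVITY_THRESHOLD else "Not high"


if __name__ == "__main__":
    ratio = selectivity_ratio(6.0, 2.0)
    print(ratio, selectivity_label(ratio))
```

Rejecting non-positive scores up front avoids a misleading ratio when raw (negative) docking energies are passed in by mistake.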

🗣️ Demo Script (Hackathon)

  1. "Meet Sarah. 52 years old. Lung cancer. EGFR T790M. Erlotinib stopped working."
  2. Type EGFR T790M → Launch Analysis
  3. Watch 18 agents stream live in the PipelineStatus panel
  4. Top Leads tab: "Our AI generated molecules that don't exist in any database. This one binds 3.2× harder to the cancer target than to healthy ABL1 kinase."
  5. Evolution Tree: "Here's exactly how the seed molecule was transformed: scaffold hop +1.4 kcal/mol, bioisostere +0.8 kcal/mol."
  6. Clinical Trials: "3 active Phase II trials targeting EGFR T790M. Our molecule targets the same pocket."
  7. Open LangSmith tab: "Every decision, auditable. Enterprise ready."
  8. Click Save Discovery → "Permanently saved to our Neon database."

🛠️ Development

Lint & Format

# Backend
cd backend
ruff check .
mypy .

# Frontend
cd frontend
npm run check        # Biome check
npm run check:fix    # Auto-fix
npm run typecheck    # TypeScript

Run Benchmark

cd backend
python -c "
from evaluation.benchmark_runner import run_benchmark_cases
import asyncio
r = asyncio.run(run_benchmark_cases())
print(f'Accuracy: {r[\"accuracy\"]*100:.0f}%')
"

Verify SelectivityAgent

cd backend
python -c "
from agents.SelectivityAgent import SelectivityAgent
import asyncio
r = asyncio.run(SelectivityAgent().run({
    'docking_results': [{'smiles': 'CC(=O)Nc1ccc(O)cc1', 'binding_energy': -8.5, 'compound_name': 'Test'}],
    'mutation_context': {'gene': 'EGFR'},
    'analysis_plan': type('P', (), {'run_selectivity': True})(),
}))
print('SelectivityAgent OK:', r['selectivity_results'][0]['selectivity_label'])
"

🌿 Git Branches

| Branch | Owner | Scope |
| --- | --- | --- |
| main | - | Protected. Merge via PR only |
| feature/backend-agents | Backend dev | Python agents, pipeline, bioinformatics |
| feature/frontend-ui | Frontend dev | Next.js pages, components, GSAP |
| feature/testing-validation | QA dev | Pytest, benchmark, E2E validation |

📄 License

MIT. Built for a hackathon. Not for clinical use. All results are computational predictions only.

About

Hackfest26 repository for T24
