Commit 886365e (parent 8bcd4d8)

README: add eval philosophy — first principles, not overfitting

File tree: 1 file changed (+2, -0 lines)

README.md

Lines changed: 2 additions & 0 deletions
@@ -72,6 +72,8 @@ scAgent is evaluated on [SC-Bench](https://github.com/latchbio/scbench) (Workman

 The eval runs the full LLM agent end-to-end: the agent receives a task prompt, reasons about what analysis to perform, calls tools, and produces a structured answer that is graded automatically.

+We do not tune scAgent to pass specific eval tasks. Instead, we bake in the reasoning the single-cell community used to arrive at its defaults — best practices from Scanpy, Seurat, and the literature — and let the agent reason from first principles on any given dataset. When an eval fails, we ask *what general knowledge is the agent missing?*, not *what answer does this eval expect?*.
+
 ```bash
 pip install -e ".[eval]"
 python eval/run_llm_benchmark.py  # default: claude-opus-4-6
 ```
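The end-to-end loop the diff describes — task prompt in, tool-using reasoning, structured answer out, automatic grading — can be sketched roughly as below. All names here (`run_agent`, `grade`, `TASKS`) are hypothetical stand-ins for illustration, not the actual scAgent or SC-Bench API.

```python
# Hypothetical sketch of an end-to-end agent eval loop. run_agent, grade,
# and TASKS are illustrative stand-ins, not the real scAgent/SC-Bench code.

def run_agent(prompt: str) -> dict:
    # Stand-in for the real agent: it would reason about the task,
    # call analysis tools, and return a structured answer.
    return {"answer": prompt.rsplit(" ", 1)[-1]}

def grade(result: dict, expected: str) -> bool:
    # Automatic grading: compare the structured answer to the expected value.
    return result.get("answer") == expected

TASKS = [
    ("Which cluster expresses CD3E most highly? cluster_4", "cluster_4"),
    ("How many cells pass QC? 2718", "2718"),
]

scores = [grade(run_agent(prompt), expected) for prompt, expected in TASKS]
print(f"accuracy: {sum(scores) / len(scores):.2f}")
```

On these toy tasks the stand-in agent scores `accuracy: 1.00`; the real benchmark substitutes an LLM agent with tool access for `run_agent` and SC-Bench tasks for `TASKS`.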
