Commit 886365e (parent 8bcd4d8)

README: add eval philosophy — first principles, not overfitting

File tree: 1 file changed (+2, -0 lines)

README.md

Lines changed: 2 additions & 0 deletions
@@ -72,6 +72,8 @@ scAgent is evaluated on [SC-Bench](https://github.com/latchbio/scbench) (Workman

 The eval runs the full LLM agent end-to-end: the agent receives a task prompt, reasons about what analysis to perform, calls tools, and produces a structured answer that is graded automatically.

+We do not tune scAgent to pass specific eval tasks. Instead, we bake in the reasoning the single-cell community used to arrive at its defaults — best practices from Scanpy, Seurat, and the literature — and let the agent reason from first principles on any given dataset. When an eval fails, we ask *what general knowledge is the agent missing?*, not *what answer does this eval expect?*.
+
 ```bash
 pip install -e ".[eval]"
 python eval/run_llm_benchmark.py  # default: claude-opus-4-6
 ```
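The end-to-end loop the diff describes — task prompt in, tool-using reasoning, structured answer out, automatic grading — can be sketched roughly as below. All names here (`run_agent`, `grade`, `TASKS`) are hypothetical stand-ins for illustration, not the actual scAgent or SC-Bench API.

```python
# Hypothetical sketch of an end-to-end agent eval loop. run_agent, grade,
# and TASKS are illustrative stand-ins, not the real scAgent/SC-Bench code.

def run_agent(prompt: str) -> dict:
    # Stand-in for the real agent: it would reason about the task,
    # call analysis tools, and return a structured answer.
    return {"answer": prompt.rsplit(" ", 1)[-1]}

def grade(result: dict, expected: str) -> bool:
    # Automatic grading: compare the structured answer to the expected value.
    return result.get("answer") == expected

TASKS = [
    ("Which cluster expresses CD3E most highly? cluster_4", "cluster_4"),
    ("How many cells pass QC? 2718", "2718"),
]

scores = [grade(run_agent(prompt), expected) for prompt, expected in TASKS]
print(f"accuracy: {sum(scores) / len(scores):.2f}")
```

On these toy tasks the stand-in agent scores `accuracy: 1.00`; the real benchmark substitutes an LLM agent with tool access for `run_agent` and SC-Bench tasks for `TASKS`.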
