Commit 5512ee6

restore eval results table (6/7, trajectory failing)
1 parent a96b72a commit 5512ee6

1 file changed: +11 -1 lines changed

README.md

Lines changed: 11 additions & 1 deletion
@@ -56,7 +56,17 @@ Example prompts:
 
 ## Evaluation: SC-Bench
 
-scAgent is evaluated on [SC-Bench](https://github.com/latchbio/scbench) (Workman et al., 2026, [LatchBio](https://latch.bio)), a benchmark of 394 verifiable problems derived from practical scRNA-seq workflows. The 7 canonical Chromium evaluations are bundled in [`eval/evals_canonical_chromium/`](eval/evals_canonical_chromium/).
+scAgent is evaluated on [SC-Bench](https://github.com/latchbio/scbench) (Workman et al., 2026, [LatchBio](https://latch.bio)), a benchmark of 394 verifiable problems derived from practical scRNA-seq workflows. The current top baseline model on SC-Bench scores 52.8%. The 7 canonical Chromium evaluations are bundled in [`eval/evals_canonical_chromium/`](eval/evals_canonical_chromium/).
+
+| Task | Result |
+|------|--------|
+| QC (cell filtering) | ✅ Pass |
+| Normalization | ✅ Pass |
+| HVG / Feature Selection | ✅ Pass |
+| Clustering | ✅ Pass |
+| Cell Type Annotation | ✅ Pass |
+| Differential Expression | ✅ Pass |
+| Trajectory Analysis | ❌ Fail |
 
 The eval runs the full LLM agent end-to-end: the agent receives a task prompt, reasons about what analysis to perform, calls tools, and produces a structured answer that is graded automatically.
 
