deepmind11
diff --git a/.pi/skills/cell-communication/SKILL.md b/.pi/skills/cell-communication/SKILL.md
diff --git a/.pi/skills/composition-analysis/SKILL.md b/.pi/skills/composition-analysis/SKILL.md
diff --git a/.pi/skills/immune-repertoire/SKILL.md b/.pi/skills/immune-repertoire/SKILL.md
diff --git a/.pi/skills/multimodal/SKILL.md b/.pi/skills/multimodal/SKILL.md
diff --git a/.pi/skills/perturbation-screen/SKILL.md b/.pi/skills/perturbation-screen/SKILL.md
diff --git a/.pi/skills/temporal-analysis/SKILL.md b/.pi/skills/temporal-analysis/SKILL.md
@@ -0,0 +1,61 @@
+---
+name: cell-communication
+description: Infer cell-cell communication via ligand-receptor analysis using LIANA+. Use when analyzing tumor microenvironment, immune interactions, or any multi-cell-type system where signaling is of interest.
+---
+
+# Skill: Cell-Cell Communication
+
+Infer ligand-receptor interactions between cell types using LIANA+ (consensus meta-method).
+
+## When to Use
+
+- **Multi-cell-type systems** — tumor microenvironment, immune niches, developing tissues.
+- **User asks about cell signaling, ligand-receptor interactions, or intercellular communication.**
+- **After cell type annotation** — requires labeled cell types.
+
+## How to Use
+
+```python
+from scagent.tools.communication import run_liana
+
+result = run_liana(
+    adata,
+    cell_type_key="cell_type",
+    resource_name="consensus",  # LIANA's consensus resource, merged from several curated L-R databases
+    n_perms=1000,
+    top_n=50,
+    plot_dir="plots/communication",
+)
+
+# Top interactions
+for interaction in result["interactions"][:10]:
+    print(f"{interaction['source']} → {interaction['target']}: "
+          f"{interaction['ligand_complex']} — {interaction['receptor_complex']}")
+```
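
The `n_perms` argument sets the number of label permutations used by permutation-based scoring methods. The sketch below shows that idea for a single ligand-receptor pair, in the spirit of CellPhoneDB, which scores a pair as the mean of the ligand's mean expression in the source and the receptor's mean expression in the target. It is an illustration only: the function name is hypothetical, it is not how `run_liana` is implemented, and it omits multi-subunit complexes and expression-fraction filters.

```python
import random
from statistics import fmean


def lr_permutation_pvalue(ligand, receptor, labels, source, target,
                          n_perms=1000, seed=0):
    """Permutation p-value for one ligand-receptor pair from `source` to `target`.

    ligand, receptor: per-cell expression of the ligand and receptor genes.
    labels: per-cell cell-type labels, in the same order.
    """
    def score(labs):
        # CellPhoneDB-style score: mean of (ligand mean in source, receptor mean in target)
        lig = fmean([v for v, lab in zip(ligand, labs) if lab == source])
        rec = fmean([v for v, lab in zip(receptor, labs) if lab == target])
        return (lig + rec) / 2

    observed = score(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # breaks the link between expression and cell type
        if score(shuffled) >= observed:
            hits += 1
    # Add-one correction: p is never exactly 0 with a finite number of permutations
    return observed, (hits + 1) / (n_perms + 1)


labels = ["Macrophage"] * 10 + ["T cell"] * 10
ligand = [5.0] * 10 + [0.0] * 10    # toy data: ligand only in macrophages
receptor = [0.0] * 10 + [5.0] * 10  # toy data: receptor only in T cells
score, p = lr_permutation_pvalue(ligand, receptor, labels, "Macrophage", "T cell", n_perms=500)
```

A small p-value here only says the pair is more cell-type-specific than expected under shuffled labels; it says nothing about whether signaling actually occurs.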
+
+## Why LIANA+ Over CellChat?
+
+- **LIANA+ wraps 8 methods** (CellPhoneDB, NATMI, Connectome, etc.) + consensus ranking.
+- **Python-native**, part of the scverse ecosystem. CellChat is R-only with a single scoring function.
+- **Multiple databases.** CellChat uses only its own; LIANA provides consensus, CellPhoneDB, CellChatDB, etc.
+- **[BP-1]:** "Owing to the lack of consensus between tools, we recommend using LIANA."
+
+## Key Caveats
+
+1. **L-R databases are biased** toward specific pathways, functional categories, and tissue-enriched proteins. [BP-1] Choose the resource carefully — `"consensus"` is the safest default.
+2. **Statistical enrichment ≠ biological activity.** A significant L-R pair means co-expression, not proven signaling.
+3. **Method + database choice strongly affects results.** [BP-1] Use LIANA's consensus to reduce method-specific artifacts.
+4. **For higher confidence,** consider NicheNet, which scores ligand activity from downstream target-gene expression in the receiver cells, as a complementary intracellular readout. [BP-1]
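
To see why a consensus ranking dampens method-specific artifacts, here is a minimal sketch that combines per-method scores by mean rank. This is a deliberately simpler stand-in: LIANA's own consensus uses robust rank aggregation, and the method names and scores below are made-up illustrations.

```python
def mean_rank_consensus(method_scores):
    """Aggregate per-method interaction scores into one consensus ranking.

    method_scores: {method_name: {interaction: score}}, where a higher score means
    a stronger interaction for every method (flip p-values etc. beforehand).
    Returns [(interaction, mean_rank), ...] sorted best-first (rank 1 = strongest).
    """
    ranks = {}
    for scores in method_scores.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, interaction in enumerate(ordered, start=1):
            ranks.setdefault(interaction, []).append(rank)
    n_methods = len(method_scores)
    # Keep only interactions that every method scored, so mean ranks are comparable
    mean_ranks = {i: sum(r) / n_methods for i, r in ranks.items() if len(r) == n_methods}
    return sorted(mean_ranks.items(), key=lambda item: item[1])


consensus = mean_rank_consensus({
    "method_1": {"TGFB1-TGFBR2": 0.9, "CCL5-CCR5": 0.5, "CD274-PDCD1": 0.1},
    "method_2": {"TGFB1-TGFBR2": 3.0, "CCL5-CCR5": 4.0, "CD274-PDCD1": 1.0},
})
```

An interaction that only one method ranks highly ends up in the middle of the consensus list, whereas one that every method places near the top stays there.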
+
+## Guard Rails
+
+1. **Requires ≥2 cell types.** Refuse if only one cell type is annotated.
+2. **Warn if cell types have <10 cells.** Limited statistical power.
+3. **Always mention database bias** in results interpretation.
+4. **Don't over-interpret individual interactions.** Focus on patterns across cell-type pairs.
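
The first two guard rails can be checked before any scoring is run. The following is a minimal sketch, assuming the labels come from a column such as `adata.obs["cell_type"]`; the function name is hypothetical.

```python
import warnings
from collections import Counter


def check_communication_inputs(cell_types, min_cells=10):
    """Refuse fewer than 2 cell types; warn about cell types with few cells."""
    counts = Counter(cell_types)
    if len(counts) < 2:
        raise ValueError(
            f"Cell-cell communication needs at least 2 annotated cell types; found {len(counts)}."
        )
    small = sorted(ct for ct, n in counts.items() if n < min_cells)
    if small:
        warnings.warn(f"Cell types with <{min_cells} cells (limited statistical power): {small}")
    return counts
```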
+
+## Best-Practice References
+
+- [BP-1] §"Communication events across cells" (p. 557)
+- [BP-2] Ch. 22 — Cell-cell communication (LIANA tutorial)
+- Dimitrov et al. 2022, Nat Commun & 2024, Nat Cell Biol — LIANA/LIANA+
@@ -0,0 +1,80 @@
+---
+name: composition-analysis
+description: Detect changes in cell-type proportions between conditions using scCODA or Milo. Use when the paradigm involves multi-condition data and the user asks about proportion changes, differential abundance, or cell-type composition shifts.
+---
+
+# Skill: Compositional Analysis
+
+Detect which cell types change in relative abundance between experimental conditions.
+
+## When to Use
+
+- **Paradigm is `disease_vs_healthy` or `temporal_longitudinal`** — composition is a DAG step.
+- **User asks "which cell types change in proportion?"**
+- **Multi-condition experiment with biological replicates.**
+
+## CRITICAL: Why Not Simple Proportion Tests?
+
+Cell-type counts from scRNA-seq are **compositional data**: proportions sum to 1 per sample. If one cell type increases, the others MUST decrease in proportion, even if their absolute abundance is unchanged. Naive per-type tests (Wilcoxon, Fisher, Poisson regression) produce **systematic false positives** due to this induced negative correlation. [BP-1, BP-2 Ch. 18]
+
+Example: if disease doubles the absolute abundance of cell type A but leaves B and C unchanged, sampling a fixed 600 cells per sample makes B and C APPEAR to decrease. scCODA correctly identifies only A as changed; naive tests flag B and C as false positives.
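
The arithmetic behind this example can be checked directly. The sketch assumes, purely for illustration, that A, B, and C start at equal absolute abundance; the function name and numbers are hypothetical.

```python
def sampled_counts(absolute, n_sampled):
    """Expected count per cell type when a fixed number of cells is sampled."""
    total = sum(absolute.values())
    return {ct: n_sampled * n / total for ct, n in absolute.items()}


healthy = {"A": 1000, "B": 1000, "C": 1000}  # illustrative absolute abundances
disease = {"A": 2000, "B": 1000, "C": 1000}  # A doubles; B and C unchanged

print(sampled_counts(healthy, 600))  # {'A': 200.0, 'B': 200.0, 'C': 200.0}
print(sampled_counts(disease, 600))  # {'A': 300.0, 'B': 150.0, 'C': 150.0}
```

B and C drop from 200 to 150 observed cells (a 25% apparent decrease) with no biological change, while A appears to rise only 1.5× despite truly doubling.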
+
+## Two Approaches
+
+### 1. scCODA — Labeled clusters (DEFAULT)
+
+Use when cell types are well-defined and you want to know which types change.
+
+```python
+from scagent.tools.composition import run_sccoda
+
+result = run_sccoda(
+    adata,
+    condition_key="condition",
+    sample_key="donor",
+    cell_type_key="cell_type",
+    reference_cell_type="automatic",  # or specify a stable type
+    fdr=0.05,
+    plot_dir="plots/composition",
+)
+```
+
+**Key decisions:**
+- **Reference cell type:** Must be one believed unchanged. Use `"automatic"` if unsure. [BP-2 Ch. 18]
+- **FDR threshold:** Default 0.05. Loosen to 0.2 for exploratory analysis. [BP-2 Ch. 18]
+- **NUTS acceptance rate:** Should be 0.4–0.9. Outside this range, suspect MCMC sampling problems.
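
When choosing a reference by hand, one reasonable heuristic is to pick the cell type whose proportion varies least across samples, among types present in nearly every sample. As far as we know, scCODA's `"automatic"` option follows a similar idea (a low-dispersion type that is rarely absent), but the sketch below is our own approximation: the function name, the use of variance as the dispersion measure, and the 5% absence threshold are assumptions.

```python
from statistics import pvariance


def pick_reference(counts_per_sample, max_absent_frac=0.05):
    """Heuristic reference choice for compositional models.

    counts_per_sample: {sample: {cell_type: count}}.
    Returns the cell type whose proportions vary least across samples, among
    types absent from at most `max_absent_frac` of samples.
    """
    samples = list(counts_per_sample)
    cell_types = sorted({ct for c in counts_per_sample.values() for ct in c})
    props = {ct: [] for ct in cell_types}
    for s in samples:
        total = sum(counts_per_sample[s].values())
        for ct in cell_types:
            props[ct].append(counts_per_sample[s].get(ct, 0) / total)
    eligible = [
        ct for ct in cell_types
        if sum(p == 0 for p in props[ct]) / len(samples) <= max_absent_frac
    ]
    if not eligible:
        raise ValueError("No cell type is present in enough samples to act as reference.")
    return min(eligible, key=lambda ct: pvariance(props[ct]))
```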
+
+### 2. Milo — KNN-based (no predefined labels)
+
+Use when clusters are unclear, transitional states exist, or you want sub-cell-type resolution.
+
+```python
+from scagent.tools.composition import run_milo
+
+result = run_milo(
+    adata,
+    condition_key="condition",
+    sample_key="donor",
+    cell_type_key="cell_type",  # optional, for annotation
+    plot_dir="plots/composition",
+)
+```
+
+**Key decisions:**
+- **n_neighbors:** Must be ≥ 3 × n_samples for adequate power. [BP-2 Ch. 18]
+- **Batch correction first:** If conditions and batches are confounded, integrate with scVI before Milo.
+- **Continuous covariates:** Milo supports `design="~ timepoint"` for time-course DA testing.
+
+## Guard Rails
+
+1. **Never use naive proportion tests for cross-condition comparisons.** Always use scCODA or Milo.
+2. **scCODA requires ≥2 samples per condition.** Warn if <3.
+3. **Milo requires sufficient neighbors.** Check n_neighbors ≥ 3 × n_samples.
+4. **Composition ≠ DE.** Composition finds proportion changes; DE finds gene expression changes. Both should be checked — they answer different questions.
+5. **If conditions and batch are confounded, warn the user** and recommend scVI integration before Milo. [BP-2 Ch. 18]
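
The guard rails above reduce to a few checks on sample-level metadata. The following is a minimal sketch under stated assumptions: sample metadata are plain dicts (for example built from `adata.obs` grouped by `donor`), the function name is hypothetical, and "confounded" is taken to mean that every batch contains samples from only one condition.

```python
from collections import defaultdict


def check_composition_design(sample_condition, sample_batch=None, n_neighbors=None):
    """sample_condition: {sample: condition}; sample_batch: {sample: batch}.

    Raises if any condition has <2 samples; returns a list of warning strings.
    """
    issues = []
    per_condition = defaultdict(set)
    for sample, cond in sample_condition.items():
        per_condition[cond].add(sample)
    for cond, samples in sorted(per_condition.items()):
        if len(samples) < 2:
            raise ValueError(f"Condition {cond!r} has {len(samples)} sample(s); at least 2 are required.")
        if len(samples) < 3:
            issues.append(f"Condition {cond!r} has only {len(samples)} samples; expect low power.")
    n_samples = len(sample_condition)
    if n_neighbors is not None and n_neighbors < 3 * n_samples:
        issues.append(f"n_neighbors={n_neighbors} < 3 x {n_samples} samples; use >= {3 * n_samples} for Milo.")
    if sample_batch is not None:
        conditions_per_batch = defaultdict(set)
        for sample, batch in sample_batch.items():
            conditions_per_batch[batch].add(sample_condition[sample])
        if all(len(conds) == 1 for conds in conditions_per_batch.values()):
            issues.append("Batch and condition are confounded; integrate (e.g. scVI) before Milo.")
    return issues
```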
+
+## Best-Practice References
+
+- [BP-1] §"Deciphering changes in cell composition" (p. 555)
+- [BP-2] Ch. 18 — Compositional analysis (scCODA + Milo tutorial)
+- Büttner et al. 2021, Nat Commun — scCODA: MCC 0.64 vs ~0.20 naive
@@ -0,0 +1,41 @@
+---
+name: immune-repertoire
+description: Analyze TCR/BCR immune receptor data from 10x Chromium V(D)J. Use when the paradigm is immune_repertoire or when V(D)J data is available alongside gene expression.
+---
+
+# Skill: Immune Repertoire Analysis
+
+Analyze adaptive immune receptor (TCR/BCR) repertoires from 10x Chromium V(D)J data using Scirpy.
+
+## When to Use
+
+- **Paradigm is `immune_repertoire`** — the DAG includes VDJ loading and clonotype analysis.
+- **User has 10x Chromium V(D)J data** (filtered_contig_annotations.csv).
+- **Interest in clonotype diversity, clonal expansion, or repertoire overlap.**
+
+## Workflow
+
+### Step 1: Load V(D)J Data
+
+```python
+from scagent.tools.repertoire import load_vdj
+result = load_vdj(adata, vdj_path="filtered_contig_annotations.csv")
+```
+
+### Step 2: Clonotype Analysis
+
+```python
+from scagent.tools.repertoire import run_clonotype_analysis
+result = run_clonotype_analysis(adata, sequence="aa")
+```
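
The `sequence` argument controls whether clonotypes are defined on amino-acid or nucleotide CDR3 sequences. The simplified sketch below illustrates the difference, assuming a hypothetical per-cell layout of `(cdr3_aa, cdr3_nt)` per chain and requiring both TRA and TRB to match. Scirpy's clonotype definition is richer, handling dual receptors, configurable receptor arms, and distance-based clonotype clusters.

```python
def define_clonotypes(cells, sequence="aa"):
    """Group cells with identical CDR3s on both TRA and TRB into clonotypes.

    cells: {barcode: {"TRA": (cdr3_aa, cdr3_nt), "TRB": (cdr3_aa, cdr3_nt)}};
    a missing chain counts as None. sequence: "aa" or "nt".
    """
    idx = 0 if sequence == "aa" else 1
    key_to_id = {}
    assignment = {}
    for barcode, chains in cells.items():
        key = tuple(chains.get(arm, (None, None))[idx] for arm in ("TRA", "TRB"))
        # New CDR3 combinations get the next clonotype id; known ones reuse theirs
        assignment[barcode] = key_to_id.setdefault(key, f"clonotype_{len(key_to_id)}")
    return assignment


cells = {
    # c1 and c2 differ only by a synonymous codon (TGT vs TGC, both cysteine) in TRA
    "c1": {"TRA": ("CAVRD", "TGTGCTGTGAGAGAC"), "TRB": ("CASSL", "TGTGCCAGCAGCTTA")},
    "c2": {"TRA": ("CAVRD", "TGCGCTGTGAGAGAC"), "TRB": ("CASSL", "TGTGCCAGCAGCTTA")},
    "c3": {"TRB": ("CASSQ", "TGTGCCAGCAGCCAA")},
}
by_aa = define_clonotypes(cells, sequence="aa")  # c1 and c2 share a clonotype
by_nt = define_clonotypes(cells, sequence="nt")  # all three cells are distinct
```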
+
+## Guard Rails
+
+1. **Check barcode matching** between GEX and VDJ — warn if <10% cells have IR data.
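+2. **Clonotype definition** depends on sequence type (aa vs nt) and which receptor arms must match. Amino-acid definitions merge cells whose CDR3 nucleotide sequences differ only by synonymous codons, so always report the definition used.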
+3. **Diversity metrics** (Shannon, Simpson) require sufficient clonotype counts per group.
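
The barcode-matching check and the two diversity metrics named above are short to write down. The following is a minimal sketch with hypothetical function names. Note that definitions vary between tools: some report 1 minus the Simpson value below (Gini-Simpson) or its inverse.

```python
import math
import warnings
from collections import Counter


def ir_match_rate(gex_barcodes, vdj_barcodes, warn_below=0.10):
    """Fraction of gene-expression barcodes that also have V(D)J data."""
    gex = set(gex_barcodes)
    rate = len(gex & set(vdj_barcodes)) / len(gex)
    if rate < warn_below:
        warnings.warn(f"Only {rate:.1%} of cells have immune-receptor data; check barcode matching.")
    return rate


def shannon(clone_sizes):
    """Shannon entropy (natural log) of the clonotype size distribution."""
    n = sum(clone_sizes)
    return -sum((c / n) * math.log(c / n) for c in clone_sizes if c > 0)


def simpson(clone_sizes):
    """Probability that two cells drawn with replacement share a clonotype."""
    n = sum(clone_sizes)
    return sum((c / n) ** 2 for c in clone_sizes)


clonotype_per_cell = ["ct1", "ct1", "ct1", "ct2", "ct3"]  # toy data, one label per cell
sizes = list(Counter(clonotype_per_cell).values())        # clone sizes
print(shannon(sizes), simpson(sizes))
```

An expanded repertoire has low Shannon entropy and a high Simpson value; with only a handful of clonotypes per group, both estimates are unstable.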
+
+## Best-Practice References
+
+- [BP-1] §"Adaptive immune receptor repertoires" (pp. 559-560)
+- [BP-2] Ch. 38-39 (Scirpy tutorials)
@@ -0,0 +1,53 @@
+---
+name: multimodal
+description: Analyze CITE-seq (RNA + surface protein) data from 10x Chromium. Use when the paradigm is multimodal or when ADT / antibody-derived tag data is available.
+---
+
+# Skill: Multimodal (CITE-seq) Analysis
+
+Analyze joint RNA + surface protein data from 10x Chromium CITE-seq experiments.
+
+## When to Use
+
+- **Paradigm is `multimodal`** — the DAG includes protein loading, normalization, and WNN.
+- **User has CITE-seq data** with Antibody-Derived Tag (ADT) counts.
+- **Interest in surface protein markers, joint RNA+protein clustering, or protein-informed annotation.**
+
+## Workflow
+
+### Step 1: Load Protein Data
+
+```python
+from scagent.tools.multimodal import load_protein
+result = load_protein(adata)  # auto-detects from Cell Ranger multi output
+```
+
+### Step 2: Normalize Protein (CLR)
+
+```python
+from scagent.tools.multimodal import normalize_protein
+result = normalize_protein(adata, method="clr")
+```
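
CLR (centered log-ratio) rescales each count relative to the geometric mean of its group, which removes much of the per-cell differences in antibody capture. The sketch below computes it within each cell (across proteins) with a pseudocount of 1. Tools differ on whether the geometric mean is taken across proteins within a cell or across cells for each protein, so which convention `normalize_protein` uses is not assumed here.

```python
import math


def clr_per_cell(adt_counts):
    """Centered log-ratio transform of ADT counts, computed within each cell.

    adt_counts: list of cells, each a list of raw counts (one per antibody).
    Each value becomes log1p(x) minus the cell's mean log1p, which equals
    log((x + 1) / geometric_mean(counts + 1)).
    """
    normalized = []
    for cell in adt_counts:
        logs = [math.log1p(x) for x in cell]
        mean_log = sum(logs) / len(logs)
        normalized.append([v - mean_log for v in logs])
    return normalized


clr = clr_per_cell([[10, 0, 5], [3, 3, 3]])  # toy counts: 2 cells x 3 antibodies
```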
+
+### Step 3: WNN (Joint Graph)
+
+```python
+from scagent.tools.multimodal import run_wnn
+result = run_wnn(adata, n_neighbors=20, rna_weight=0.5)
+```
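
A fixed `rna_weight` can be read as blending per-modality distances before building the neighbor graph. The following sketch shows that simplified fixed-weight fusion. It is an assumption about what `rna_weight` does, not a description of `run_wnn`. Seurat's WNN instead learns a separate modality weight for every cell.

```python
import math


def weighted_knn(rna_emb, prot_emb, k=2, rna_weight=0.5):
    """kNN on a fixed-weight blend of RNA and protein distances.

    rna_emb, prot_emb: per-cell coordinates (e.g. PCA), same cell order.
    Returns the indices of the k nearest neighbors of each cell.
    """
    n = len(rna_emb)

    def scaled_dists(emb):
        d = [[math.dist(emb[i], emb[j]) for j in range(n)] for i in range(n)]
        # Rescale to [0, 1] so neither modality dominates just by its units
        top = max(max(row) for row in d) or 1.0
        return [[v / top for v in row] for row in d]

    d_rna, d_prot = scaled_dists(rna_emb), scaled_dists(prot_emb)
    neighbors = []
    for i in range(n):
        combined = [(rna_weight * d_rna[i][j] + (1 - rna_weight) * d_prot[i][j], j)
                    for j in range(n) if j != i]
        neighbors.append([j for _, j in sorted(combined)[:k]])
    return neighbors


# Toy data: RNA separates cells {0,1} from {2,3}; protein separates {0,2} from {1,3}
rna = [[0, 0], [0, 0.1], [5, 5], [5, 5.1]]
prot = [[0, 0], [5, 5], [0, 0.1], [5, 5.1]]
```

With `rna_weight=1.0`, cell 0's nearest neighbor is cell 1 (RNA view). With `rna_weight=0.0` it is cell 2 (protein view). Intermediate weights trade the two off.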
+
+### Step 4: Cluster + Annotate on WNN graph
+
+Use standard Leiden clustering on the WNN graph, then annotate using both RNA markers and protein markers.
+
+## Guard Rails
+
+1. **Check for isotype controls** in the ADT panel — flag if missing. [BP-2 Ch. 32]
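+2. **CLR normalization** is the default here because it needs no extra input. DSB is an alternative but requires empty-droplet (background) ADT counts.
+3. **WNN weight** between RNA and protein (`rna_weight`, default 0.5) should be tuned; the more informative modality should get the higher weight. Seurat's WNN learns per-cell modality weights from the data, so a single global weight is an approximation.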
+4. **Protein panel size** affects interpretation — small panels (<50 antibodies) limit protein-only analysis.
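
The isotype and panel-size checks can run on the list of ADT feature names. This is a minimal sketch. The naming convention (isotype controls containing "isotype" or "IgG" in their names) is an assumption that varies between vendors, and the function name is hypothetical.

```python
def check_adt_panel(protein_names, min_panel=50):
    """Flag missing isotype controls and small antibody panels."""
    issues = []
    isotypes = [p for p in protein_names if "isotype" in p.lower() or "igg" in p.lower()]
    if not isotypes:
        issues.append("No isotype controls detected in the ADT panel.")
    n_targets = len(protein_names) - len(isotypes)
    if n_targets < min_panel:
        issues.append(f"Small panel ({n_targets} antibodies): protein-only analysis is limited.")
    return isotypes, issues


isotypes, issues = check_adt_panel(["CD3", "CD4", "CD8", "Mouse_IgG1_control"])
```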
+
+## Best-Practice References
+
+- [BP-1] §"Surface protein expression" (pp. 558-559)
+- [BP-2] Ch. 32-37 (CITE-seq QC, CLR, WNN, annotation)
@@ -0,0 +1,63 @@
+---
+name: perturbation-screen
+description: Analyze Perturb-seq / CROP-seq CRISPR screen data. Use when the paradigm is perturbation_screen or when data includes CRISPR guide assignments.
+---
+
+# Skill: Perturbation Screen Analysis
+
+Analyze single-cell CRISPR perturbation screens (Perturb-seq, CROP-seq) on 10x Chromium.
+
+## When to Use
+
+- **Paradigm is `perturbation_screen`** — the DAG includes guide assignment and perturbation DE.
+- **User has CRISPR guide/feature barcode data** alongside gene expression.
+- **User asks about perturbation effects, gene knockdowns, or screen results.**
+
+## Workflow
+
+### Step 1: Guide Assignment
+
+Assign CRISPR guide identities to cells from Cell Ranger multi output.
+
+```python
+from scagent.tools.perturbation import assign_guides
+
+result = assign_guides(adata, guide_calls_key="guide_ids")
+# Adds: adata.obs["guide"], adata.obs["perturbation"], adata.obs["n_guides"]
+```
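
Deriving `guide`, `perturbation`, and `n_guides` from raw guide calls can be sketched as below. Two assumptions are made here and are not confirmed by `assign_guides`: multiple guides in one cell are joined by `"|"` (as in Cell Ranger's per-cell calls), and guide names follow a hypothetical `GENE_suffix` pattern, with controls prefixed `non-targeting`.

```python
def parse_guide_calls(guide_ids, control_prefix="non-targeting", sep="|"):
    """Turn per-cell guide-call strings into (guide, perturbation, n_guides).

    guide_ids: one string per cell, e.g. "TP53_g1", "TP53_g1|MYC_g2", or ""/None.
    """
    rows = []
    for call in guide_ids:
        guides = [g for g in (call or "").split(sep) if g]
        if len(guides) == 0:
            rows.append((None, None, 0))  # no guide detected
        elif len(guides) == 1:
            g = guides[0]
            # Target gene = name before the last "_" (hypothetical naming scheme)
            pert = control_prefix if g.startswith(control_prefix) else g.rsplit("_", 1)[0]
            rows.append((g, pert, 1))
        else:
            rows.append((sep.join(guides), "multiple", len(guides)))  # flagged for exclusion
    return rows


rows = parse_guide_calls(["TP53_g1", "", "TP53_g1|MYC_g2", "non-targeting_3", None])
```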
+
+### Step 2: Perturbation DE
+
+Compare each perturbation to non-targeting controls.
+
+```python
+from scagent.tools.perturbation import run_perturbation_de
+
+results = run_perturbation_de(
+    adata,
+    control_label="non-targeting",
+    min_cells=50,
+    alpha=0.05,
+)
+```
+
+### Step 3: Pathway Enrichment (reuse existing tool)
+
+Run GSEA on per-perturbation DE results to identify affected pathways.
+
+## Key Difference from Cross-Condition DE
+
+In Perturb-seq, **cells ARE the replicates** — each cell receives an independent guide. Cell-level Wilcoxon tests are acceptable here, unlike cross-condition DE where pseudobulk is mandatory. [BP-1, BP-2 Ch. 20]
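
For one gene, the cell-level comparison is a Wilcoxon rank-sum (Mann-Whitney U) test of perturbed cells against non-targeting controls. The self-contained sketch below uses the normal approximation with a tie correction, which matters because single-cell data have many tied zeros. In practice a library routine such as scanpy's `sc.tl.rank_genes_groups(..., method="wilcoxon", reference=...)` would be used, and whether `run_perturbation_de` does exactly this is not assumed.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    x: expression of one gene in perturbed cells; y: in control cells.
    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("Both groups need at least one cell.")
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # all values identical: no evidence of a shift
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```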
+
+## Guard Rails
+
+1. **Check guide assignment rate.** Warn if <50% of cells have guides.
+2. **Minimum cells per perturbation: 50.** Skip perturbations below this threshold.
+3. **Non-targeting controls must be present** and have sufficient cells.
+4. **Perturbation ≠ complete knockout.** Knockdown efficiency varies — effect sizes may be attenuated.
+5. **Multi-guide cells** should be flagged and excluded from per-perturbation analysis.
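
Guard rails 1, 2, 3, and 5 can be applied together before DE. The following is a minimal sketch that works on per-cell labels such as those in `adata.obs["perturbation"]` and `adata.obs["n_guides"]`. The function name is hypothetical, and the thresholds mirror the list above.

```python
import warnings
from collections import Counter


def filter_perturbations(perturbations, n_guides, control="non-targeting", min_cells=50):
    """Return (perturbations to test, perturbations skipped for too few cells)."""
    n_cells = len(perturbations)
    assigned = sum(1 for k in n_guides if k >= 1)
    if assigned / n_cells < 0.5:
        warnings.warn(f"Only {assigned / n_cells:.0%} of cells have a guide assigned.")
    # Count single-guide cells only; multi-guide cells are excluded
    counts = Counter(p for p, k in zip(perturbations, n_guides) if k == 1)
    if counts.get(control, 0) < min_cells:
        raise ValueError(f"Too few {control!r} control cells ({counts.get(control, 0)}).")
    kept = sorted(p for p, c in counts.items() if c >= min_cells and p != control)
    skipped = sorted(p for p, c in counts.items() if c < min_cells)
    return kept, skipped


perts = ["non-targeting"] * 60 + ["TP53"] * 55 + ["MYC"] * 10 + ["multiple"] * 5 + [None] * 20
guides = [1] * 60 + [1] * 55 + [1] * 10 + [2] * 5 + [0] * 20
kept, skipped = filter_perturbations(perts, guides)
```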
+
+## Best-Practice References
+
+- [BP-1] §"Inferring perturbation effects" (p. 557)
+- [BP-2] Ch. 20 — Perturbation modeling
@@ -0,0 +1,79 @@
+---
+name: temporal-analysis
+description: Analyze time-series / longitudinal scRNA-seq experiments. Use when the paradigm is temporal_longitudinal, or when data includes multiple timepoints from the same biological process.
+---
+
+# Skill: Temporal / Longitudinal Analysis
+
+Analyze scRNA-seq data with multiple timepoints (e.g., disease progression, drug response over time, developmental time courses).
+
+## When to Use
+
+- **Paradigm is `temporal_longitudinal`** — the DAG requires this workflow.
+- **User has samples from multiple timepoints** (Day 0, Day 7, Day 14, etc.).
+- **Interest in how cell composition or gene expression changes over time.**
+
+## Key Principle: Timepoints = Batches
+
+Different timepoints are usually collected and processed in separate batches. **Batch correction is mandatory** before any cross-timepoint analysis, and the DAG enforces it as a required step.
+
+## Workflow
+
+### 1. Standard Preprocessing (with mandatory batch correction)
+
+The DAG runs the standard prefix (QC → normalize → HVG → PCA) then **always** applies batch correction (Harmony or scVI). This is not optional because timepoint-batch confounding is expected. [BP-1 pp. 552-553]
+
+### 2. Pseudobulk DE (Timepoint Contrasts)
+
+Pairwise comparison between timepoints using the same pseudobulk DE pipeline as `disease_vs_healthy`.
+
+```python
+from scagent.tools.pseudobulk_de import run_pseudobulk_de
+
+# Treat timepoint as the condition
+results = run_pseudobulk_de(
+    adata,
+    cell_type_key="cell_type",
+    sample_key="donor",
+    condition_key="timepoint",  # e.g., "Day0" vs "Day7"
+)
+```
+
+**Critical:** Pseudobulk aggregation by sample is mandatory. Cell-level tests treat cells from the same sample as independent replicates, which inflates the FDR; in temporal data this is aggravated because cells within a sample share correlated time and batch effects. [BP-1 p. 555]
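
Pseudobulk aggregation sums raw counts over all cells of one cell type in one sample. In a longitudinal design the same donor usually contributes several timepoints, so the sketch below aggregates per (donor, timepoint, cell type). Whether `run_pseudobulk_de` aggregates by `sample_key` alone or by `sample_key` × `condition_key` is not specified here, so check this before running it. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (donor, timepoint, cell_type).

    cells: iterable of (donor, timepoint, cell_type, {gene: raw_count}).
    Returns {(donor, timepoint, cell_type): {gene: summed_count}}.
    """
    agg = defaultdict(lambda: defaultdict(int))
    for donor, timepoint, cell_type, counts in cells:
        for gene, count in counts.items():
            agg[(donor, timepoint, cell_type)][gene] += count
    return {key: dict(genes) for key, genes in agg.items()}


cells = [
    ("donor1", "Day0", "T cell", {"IL7R": 3, "GZMB": 0}),
    ("donor1", "Day0", "T cell", {"IL7R": 2, "GZMB": 1}),
    ("donor1", "Day7", "T cell", {"IL7R": 0, "GZMB": 4}),  # same donor, later timepoint
]
profiles = pseudobulk(cells)
```

Each key becomes one replicate in the DE model, so the number of replicates per timepoint is the number of donors, not the number of cells.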
+
+### 3. Composition Analysis (Proportion Trends)
+
+How do cell type proportions change over time?
+
+- **scCODA** for discrete timepoint comparisons (labeled clusters)
+- **Milo** for continuous time covariates — Milo supports GLM testing with continuous covariates, directly applicable to time-course DA. [BP-2 Ch. 18]
+
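+```python
+import pertpy as pt  # Milo via pertpy; API may vary between versions
+milo = pt.tl.Milo()
+mdata = milo.load(adata)  # assumes a kNN graph on the integrated embedding exists
+milo.make_nhoods(mdata["rna"], prop=0.1)
+mdata = milo.count_nhoods(mdata, sample_col="donor")
+milo.da_nhoods(mdata, design="~timepoint")  # obs["timepoint"] must be numeric
+```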
+
+### 4. Pathway Enrichment (Optional)
+
+Run GSEA on DE results to identify pathways that change over time.
+
+## Guard Rails
+
+1. **Batch correction is always required.** The DAG enforces this. Do not skip it.
+2. **Pseudobulk aggregation is mandatory.** Never do cell-level DE for cross-timepoint comparisons.
+3. **Minimum replicates per timepoint.** At least 2 biological replicates per timepoint for valid DE. Warn if <3.
+4. **Cell cycle effects should be assessed.** Time-course data often has varying cell cycle composition. Consider regressing out cell cycle effects before DE. [BP-1]
+5. **If only 2 timepoints, treat as `disease_vs_healthy`.** Temporal analysis adds value primarily with ≥3 timepoints where trends can be assessed.
+
+## Interpretation Guidance
+
+- **Composition changes ≠ gene expression changes.** A cell type can change proportion without changing its expression profile, and vice versa. Both should be checked.
+- **Pseudobulk DE between consecutive timepoints** may miss gradual trends. Consider regression on time if ≥3 timepoints.
+- **Batch effects can mimic temporal effects.** If each timepoint was processed in a separate batch, some "temporal" signals may be batch artifacts. Check with PCA on pseudobulk samples. [BP-1]
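
"Regression on time" can be illustrated with a least-squares slope of one gene's pseudobulk expression against numeric time. This sketch ignores donor pairing and count noise, and the function name and toy values are hypothetical. In a full analysis, time would enter the pseudobulk GLM as a numeric covariate, for example a `~ donor + time` design in an edgeR- or DESeq2-style model.

```python
def time_slope(times, values):
    """Ordinary least-squares slope of one gene's pseudobulk expression on time.

    times: numeric timepoint per pseudobulk sample (e.g. days).
    values: log-normalized expression of one gene in those samples.
    """
    if len(set(times)) < 3:
        raise ValueError("Need at least 3 distinct timepoints to estimate a trend.")
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


days = [0, 0, 7, 7, 14, 14]            # two donors per timepoint
expr = [1.0, 1.2, 2.0, 2.1, 3.1, 2.9]  # toy log-CPM values for one gene
slope = time_slope(days, expr)          # change in log-CPM per day
```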
+
+## Best-Practice References
+
+- [BP-1] §"Removing confounding sources of variation" (pp. 552-553)
+- [BP-1] §"Differential gene expression analysis" (p. 555)
+- [BP-2] Ch. 18 §"Without labeled clusters" — Milo time-course example
+- [BP-3] QC/normalization per-sample before integration