Standalone LangGraph pipeline that ingests a curriculum document (URL — PDF or plain text), diagnoses its knowledge architecture, derives a KUD (Know / Understand / Do) map with assessment routes, and generates learning targets aligned to Claude Education Skills MCP tools.
Downstream half of a two-tool architecture; the upstream Constitution Tool (not in this repo) ingests narrative prose and produces the structured curriculum document or exam specification that this harness consumes.
Repository rename pending. The project was renamed from
curriculum-decomposer→Curriculum Harness(package:curriculum_harness). The GitHub repo URL still readscurriculum-decomposer— a manual rename via GitHub settings is outstanding.
cd curriculum-decomposer
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# set ANTHROPIC_API_KEY in .envpython -m curriculum_harness.run --config configs/ontario_grade7_history_v1_0.json --dry-run
python -m curriculum_harness.run --config configs/ontario_grade7_history_v1_0.json
python -m curriculum_harness.run --config configs/ontario_grade7_history_v1_0.json --resumeOutputs are written under the configured outputPath (versioned *_v1.* files). Checkpoints use Sqlite (checkpointDb).
Phase 1 classifies the document into a curriculum_profile (JSON + run report section). The model infers document_family, level_model, scoping_strategy, and assessment_signals; output_conventions includes lt_statement_format and recommended_adjacent_radius (product default ±1 when using adjacent bands).
- Resolved format:
resolve_lt_statement_format(profile)picks the active format after merge: ifoutput_conventions.lt_statement_formatis set toi_can,outcome_statement, orcompetency_descriptor, it wins; otherwisehigher_ed_syllabus→outcome_statement, and all other families (includingexam_specification) default toi_can. - Phase 4 branches MCP + Sonnet prompts and validation on that format. Each emitted
LearningTargetcarrieslt_statement_formatfor downstream storage (e.g. Supabase). - Exam specifications (
exam_specification): an AQA-style command-word / mark-tariff awareness block is appended to Phase 4 instructions. Output format remainsi_canunless you overridelt_statement_formatin config. - Higher education (
higher_ed_syllabus): after the KUD loop, a supplementary Sonnet pass adds up to three inferred Type 3 dispositions (assessment_route: observation_protocol, flagHE_DISPOSITION_INFERRED). Near-duplicates vs existing targets are skipped using cosine similarity on token-frequency vectors (threshold 0.92 in code). - Phase 5 uses the profile default and per-LT
lt_statement_formatwhen generating structured rows and anchor-level fallbacks. - Run report lists the count (and truncated statements) of
HE_DISPOSITION_INFERREDlearning targets.
ltConstraints.iCanFormat in a run config does not override the profile resolver; use curriculumProfile.output_conventions.lt_statement_format to force a format.
Overrides
source.documentFamily— if set, overrides the inferred family. Values:exam_specification,national_framework,school_scoped_programme,higher_ed_syllabus,other.curriculumProfile— optional object merged into the profile (e.g.output_conventions,scoping_strategy,assessment_signals).
REAL School — use document_family: "school_scoped_programme" (or leave inference) and set source.jurisdiction (e.g. REAL School / location).
Adjacent levels — prefer outputStructure.adjacent_count: 1 for ±1 bands; you can still set 2 for wider spreads.
- Haiku: Phase 1 ingestion / parsing volume.
- Sonnet (
claude-sonnet-4-20250514): Phases 2–4, including MCP tool calls against the configured remote MCP server.