Vector SNOMED grounding mapping: hybrid retrieval update + live T22 gold validation #47

## Summary
This issue tracks the vector-SNOMED grounding mapping update for Table 22 gold evaluation: why we changed it, what was implemented, and which validation runs are currently active.

## Idea / motivation
- Baseline lexical grounding is robust but can miss semantically close SNOMED candidates when wording diverges.
- We added vector-assisted retrieval to improve candidate recall while preserving lexical precision through reranking.
- Goal: compare `vector` vs `non-vector` grounding directly against gold SNOMED IDs from Table 22, without concept/rule extraction confounds.

## Implemented change
### 1) Grounding-only eval path (gold SNOMED comparison)
- Added dedicated evaluator:
  - `src/cardio_graph_core/evaluation/table22_snomed_grounding_only_eval.py`
- Reads gold SNOMED concepts from `table_22_manual_snomed.json` and measures exact-ID hit accuracy.
- Supports both modes:
  - `--mode non-vector`
  - `--mode vector`
- Writes per-item predictions and aggregate metrics JSON.
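
A minimal sketch of the exact-ID scoring described above, not the actual evaluator code. The input shapes (a term-to-ID mapping for gold and for predictions), the function name `exact_id_accuracy`, and the output keys are assumptions made for illustration; the real layout of `table_22_manual_snomed.json` may differ.

```python
from typing import Mapping, Optional


def exact_id_accuracy(
    gold: Mapping[str, str],
    predicted: Mapping[str, Optional[str]],
) -> dict:
    """Score predictions against gold by exact SNOMED ID match.

    Both mappings are keyed by source term (hypothetical layout).
    A missing or None prediction counts as a miss.
    """
    per_item = []
    for term, gold_id in gold.items():
        pred_id = predicted.get(term)
        # Compare as strings so 123 and "123" are treated as the same ID.
        hit = pred_id is not None and str(pred_id) == str(gold_id)
        per_item.append(
            {"term": term, "gold_id": gold_id, "predicted_id": pred_id, "hit": hit}
        )

    total = len(per_item)
    hits = sum(item["hit"] for item in per_item)
    return {
        "total": total,
        "hits": hits,
        "accuracy": hits / total if total else 0.0,
        "per_item": per_item,
    }


# Placeholder IDs, not real SNOMED concepts.
example = exact_id_accuracy(
    gold={"term a": "111", "term b": "222"},
    predicted={"term a": "111", "term b": "333"},
)
```

Keeping the per-item rows next to the aggregate makes it straightforward to diff the two modes term by term later.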

### 2) Separate Slurm jobs for clean A/B comparison
- Non-vector launcher:
  - `slurm/run_table22_snomed_grounding_only_nonvector.sh`
- Vector launcher:
  - `slurm/run_table22_snomed_grounding_only_vector.sh`
- Both run on `g3:11433`; the model and embedding endpoints now point to the same host, which avoids the endpoint drift seen previously.

### 3) Vector retriever warning fix
- Updated the vector retriever to avoid direct property access in Cypher, which triggered Neo4j warnings:
  - `src/cardio_graph_core/extraction/vector_candidate_retriever.py`
- Query now returns `node, score`; concept-id/term fallbacks resolved in Python.
- This removes repeated `UnknownPropertyKeyWarning` noise and keeps fallback behavior explicit.
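
The pattern can be sketched roughly as follows. This is an illustration under assumptions, not the code in `vector_candidate_retriever.py`: the property key names in `ID_KEYS` and `TERM_KEYS`, the helper names, and the query parameters are hypothetical. The query uses Neo4j's `db.index.vector.queryNodes` procedure and returns the whole node, so no property key is named in Cypher.

```python
from typing import Any, Mapping, Optional, Sequence

# Returning the node itself means Cypher never references a property key
# that might not exist in the database.
VECTOR_QUERY = """
CALL db.index.vector.queryNodes($index_name, $top_k, $embedding)
YIELD node, score
RETURN node, score
"""

# Hypothetical fallback order; the real property names may differ.
ID_KEYS: Sequence[str] = ("conceptId", "concept_id", "id")
TERM_KEYS: Sequence[str] = ("term", "preferredTerm", "name")


def first_present(props: Mapping[str, Any], keys: Sequence[str]) -> Optional[Any]:
    """Return the first non-empty value among keys, or None."""
    for key in keys:
        value = props.get(key)
        if value not in (None, ""):
            return value
    return None


def to_candidate(node: Mapping[str, Any], score: float) -> dict:
    """Resolve concept ID and term in Python with explicit fallbacks.

    `node` only needs a dict-style `.get`, which both a plain dict and a
    neo4j driver Node provide.
    """
    concept_id = first_present(node, ID_KEYS)
    return {
        "concept_id": None if concept_id is None else str(concept_id),
        "term": first_present(node, TERM_KEYS),
        "score": float(score),
    }


# Plain dict standing in for a driver Node; values are placeholders.
candidate = to_candidate({"concept_id": 111, "name": "example term"}, 0.87)
```

With this layout the fallback order lives in one place and can be changed without touching the query.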

## Currently running tests (live)
- `625300` — `t22_gold_vec` — RUNNING
- `625298` — `t22_gold_nonvec` — RUNNING

Current log signals:
- Vector log (`slurm/run_table22_snomed_grounding_only_vector.log`)
  - vector grounding enabled against `snomed_term_embeddings_4096`
  - connected to SNOMED DB
  - ongoing candidate truncation messages (expected while iterating terms)
- Non-vector log (`slurm/run_table22_snomed_grounding_only_nonvector_625298.log`)
  - connected to SNOMED DB
  - ongoing candidate truncation messages (expected)

Expected output artifacts on completion:
- Vector:
  - `docs/table22_snomed_grounding_compare/grounding_only/vector_job_625300/vector_eval.json`
- Non-vector:
  - `docs/table22_snomed_grounding_compare/grounding_only/nonvector_job_625298/nonvector_eval.json`

## Next update
I will post final accuracy deltas (`vector` minus `non-vector`) and improved/regressed examples once both runs finish.
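
For reference, a rough sketch of how the delta and the improved/regressed lists could be derived from the two per-item outputs. The row layout (`term`, `hit`) and the function name `compare_runs` are assumptions for illustration; the actual schema of `vector_eval.json` and `nonvector_eval.json` may differ.

```python
from typing import Iterable, Mapping


def compare_runs(
    vector_items: Iterable[Mapping],
    nonvector_items: Iterable[Mapping],
) -> dict:
    """Compare per-item hits from the two modes on their shared terms.

    Each item is assumed to look like {"term": str, "hit": bool}.
    """
    vec = {item["term"]: bool(item["hit"]) for item in vector_items}
    non = {item["term"]: bool(item["hit"]) for item in nonvector_items}

    # Only score terms present in both runs so the accuracies are comparable.
    shared = sorted(vec.keys() & non.keys())
    n = len(shared)
    vec_acc = sum(vec[t] for t in shared) / n if n else 0.0
    non_acc = sum(non[t] for t in shared) / n if n else 0.0

    return {
        "n": n,
        "vector_accuracy": vec_acc,
        "nonvector_accuracy": non_acc,
        "delta": vec_acc - non_acc,  # vector minus non-vector
        "improved": [t for t in shared if vec[t] and not non[t]],
        "regressed": [t for t in shared if non[t] and not vec[t]],
    }


# Toy rows, not results from the running jobs.
report = compare_runs(
    vector_items=[
        {"term": "a", "hit": True},
        {"term": "b", "hit": True},
        {"term": "c", "hit": True},
        {"term": "d", "hit": False},
    ],
    nonvector_items=[
        {"term": "a", "hit": True},
        {"term": "b", "hit": False},
        {"term": "c", "hit": False},
        {"term": "d", "hit": True},
    ],
)
```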

