Skip to content

Latest commit

 

History

History
73 lines (54 loc) · 2.81 KB

File metadata and controls

73 lines (54 loc) · 2.81 KB

TODO: ann base branch + consolidated benchmarks

1. Create ann branch with shared code

1.1 Branch setup

  • git checkout -B ann origin/main
  • Cherry-pick 624f998 (vec0_distance_full shared distance dispatch)
  • Cherry-pick stdint.h fix for test header
  • Pull NEON cosine optimization from ivf-yolo3 into shared code
    • Currently only in ivf branch but is general-purpose (benefits all distance calcs)
    • Lives in distance_cosine_float() — ~57 lines of ARM NEON vectorized cosine

1.2 Benchmark infrastructure (benchmarks-ann/)

  • Seed data pipeline (seed/Makefile, seed/build_base_db.py)
  • Ground truth generator (ground_truth.py)
  • Results schema (schema.sql)
  • Benchmark runner with INDEX_REGISTRY extension point (bench.py)
    • Baseline configs (float, int8-rescore, bit-rescore) implemented
    • Index branches register their types via INDEX_REGISTRY dict
  • Makefile with baseline targets
  • README

1.3 Rebase feature branches onto ann

  • Rebase diskann-yolo2 onto ann (1 commit: DiskANN implementation)
  • Rebase ivf-yolo3 onto ann (1 commit: IVF implementation)
  • Rebase annoy-yolo2 onto ann (2 commits: Annoy implementation + schema fix)
  • Verify each branch has only its index-specific commits remaining
  • Force-push all 4 branches to origin

2. Per-branch: register index type in benchmarks

Each index branch should add to benchmarks-ann/ when rebased onto ann:

2.1 Register in bench.py

Add an INDEX_REGISTRY entry. Each entry provides:

  • defaults — default param values
  • create_table_sql(params) — CREATE VIRTUAL TABLE with INDEXED BY clause
  • insert_sql(params) — custom insert SQL, or None for default
  • post_insert_hook(conn, params) — training/building step, returns time
  • run_query(conn, params, query, k) — custom query, or None for default MATCH
  • describe(params) — one-line description for report output

2.2 Add configs to Makefile

Append index-specific config variables and targets. Example pattern:

DISKANN_CONFIGS = \
    "diskann-R48-binary:type=diskann,R=48,L=128,quantizer=binary" \
    ...

ALL_CONFIGS += $(DISKANN_CONFIGS)

bench-diskann: seed
    $(BENCH) --subset-size 10000 -k 10 -o runs/diskann $(BASELINES) $(DISKANN_CONFIGS)
    ...

2.3 Migrate existing benchmark results/docs

  • Move useful results docs (RESULTS.md, etc.) into benchmarks-ann/results/
  • Delete redundant per-branch benchmark directories once consolidated infra is proven

3. Future improvements

  • Reporting script (report.py) — query results.db, produce markdown comparison tables
  • Profiling targets in Makefile (lift from ivf-yolo3's Instruments/perf wrappers)
  • Pre-computed ground truth integration (use GT DB files instead of on-the-fly brute-force)