HACKE-RC/sx

searxh

Small BM25 search tool for local code/docs. No third-party dependencies.

what it does

  • Indexes text/code files into SQLite (bm25.sqlite by default)
  • Supports incremental indexing (only changed files are reprocessed)
  • Ranks with BM25
  • Shows optional snippets with line numbers
  • Supports | alternation in queries (sx "ACLLoad|ACLSetUser|load")
  • Supports path/extension filters, JSON output, and colored matches

requirements

  • Python 3.9+

install

Editable install:

python3 -m pip install -e .

After install, use either command name (searxh or sx):

searxh index .
searxh "replication backlog"
sx index .
sx "replication backlog"

With uv:

uv tool install searxh
searxh "replication backlog"
sx "replication backlog"

Upgrade with uv:

uv tool upgrade searxh

quick start

Index first (required before search):

sx index .

Check status:

sx status

Search:

sx "replication backlog"

Reindex options:

sx index .          # incremental update
sx index . --full   # full rebuild

alternation (pipe search)

Use | to search for multiple terms at once. Each alternative is tokenized and matched against the index:

sx "ACLLoad|ACLSetUser|ACLParse|load"
sx "ACLLoad|ACLSetUser" src/acl.c       # with path filter
sx --ext .c,.h "dict|hash|set"
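
As a rough illustration of how a piped query could be broken into alternatives, the sketch below splits on | and tokenizes each part. The function name split_alternatives and the tokenizer regex are assumptions made for this example; searxh's real tokenizer may behave differently (for instance with stopwords or stemming).

```python
import re

# Hypothetical tokenizer: runs of letters, digits, and underscores, lowercased.
# This is an assumption for illustration, not searxh's actual tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def split_alternatives(query: str) -> list[list[str]]:
    """Split a query on '|' and tokenize each non-empty alternative."""
    groups = []
    for part in query.split("|"):
        tokens = [t.lower() for t in TOKEN_RE.findall(part)]
        if tokens:  # ignore empty alternatives such as a trailing '|'
            groups.append(tokens)
    return groups


print(split_alternatives("ACLLoad|ACLSetUser|load"))
```

Each inner list is one alternative; a document matching any of them would be a candidate for ranking.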

command forms

sx [global-options] index [root] [index-options]
sx [global-options] status
sx [global-options] search "query"
sx [global-options] "query"              # BM25 ranked search
sx [global-options] "query" path         # BM25 search scoped to path

common examples

Full rebuild:

sx index . --full

Search with snippet + color:

sx --snippet --color "aof fsync"

Filter by path:

sx --path src/ "replication"

Filter by extension:

sx --ext .c,.h,.md "dict"

JSON output:

sx --json "cluster slots"

Custom index path:

sx index . --out /path/to/myindex.sqlite
sx --index /path/to/myindex.sqlite "term"

key options

  • --k: number of results (default 10)
  • --k1, --b: BM25 tuning knobs
  • --path-boost: extra weight for path token matches (default 1.5)
  • --stem: enable simple stemming
  • --no-stopwords: disable stopword filtering
  • --workers: indexing worker threads
  • --no-progress: hide indexing progress output
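
To make the --k1 and --b knobs more concrete, here is a minimal sketch of the textbook Okapi BM25 score for one document. It assumes the commonly used defaults k1=1.2 and b=0.75 and a smoothed IDF; searxh's own defaults, its IDF variant, and how --path-boost is applied are not stated in this README, so bm25_score and its parameters are illustrative names only.

```python
import math


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, doc_freq, n_docs,
               k1=1.2, b=0.75):
    """Textbook BM25 score of one document for a list of query terms.

    doc_tf:   term -> occurrences in this document
    doc_freq: term -> number of documents containing the term
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue  # term absent from this document contributes nothing
        df = doc_freq.get(term, 0)
        # Smoothed IDF that stays positive even for very common terms.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # Length normalization: b=0 ignores length, b=1 normalizes fully.
        norm = 1.0 - b + b * (doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1.0) / (tf + k1 * norm)
    return score
```

In this formula, a higher b penalizes long files more strongly, and a higher k1 lets repeated occurrences of a term keep adding to the score for longer before saturating.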

indexing behavior

  1. Scan files by extension/name and skip likely binary files.
  2. Compare mtime and size with index metadata.
  3. Reindex changed files and remove deleted files.
  4. Update postings and document metadata in SQLite.

If a file produces no tokens (for example, empty/whitespace-only), it is saved as a zero-length doc so incremental runs do not keep retrying it.
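
The change-detection steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the engine's code: stat_signature and plan_update are hypothetical names, and the real index stores its metadata in SQLite rather than in a plain dict.

```python
import os


def stat_signature(path):
    """Return the (mtime, size) pair used to decide whether a file changed."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def plan_update(current, indexed):
    """Compare on-disk signatures with stored ones.

    current: path -> (mtime, size) for files found by the scan
    indexed: path -> (mtime, size) recorded in the index
    Returns (changed, deleted) as sorted lists of paths.
    """
    # New files have no stored signature, so they also count as changed.
    changed = sorted(p for p, sig in current.items() if indexed.get(p) != sig)
    deleted = sorted(p for p in indexed if p not in current)
    return changed, deleted
```

Recording a signature even for a file that yields no tokens is what keeps it out of the changed list on the next run.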

tests

PYTHONPATH=src python3 -m unittest discover -s tests -p 'test_*.py' -q

troubleshooting

sqlite3.OperationalError: unable to open database file

  • Make sure the DB parent directory exists and is writable.
  • Try writing the DB in the current project first:
sx index . --out ./bm25.sqlite
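
For context, Python's sqlite3 module raises this error when the database's parent directory does not exist, because SQLite creates the file but not the directories above it. A minimal sketch of guarding against that in your own script follows; open_index is a hypothetical helper, not part of searxh.

```python
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) a SQLite file, creating its parent directory first."""
    path = Path(db_path)
    # sqlite3.connect does not create missing directories, so do it here.
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(path))
```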

Results look weak

  • Rebuild the index with sx index . --full
  • Try --stem
  • Increase --k
  • Recheck filters (--ext, --path)

files

  • src/sx_search/cli.py: CLI
  • src/sx_search/engine.py: indexing/search engine
  • src/bm25tool.py: compatibility import wrapper
  • tests/test_bm25tool.py: tests
  • docs/search.md: short usage notes

About

Extremely fast indexed keyword search for agents.
