
vectorstores/duckdb: add DuckDB vectorstore implementation#1482

Open
krus210 wants to merge 1 commit into tmc:main from krus210:feat/duckdb-vectorstore

Conversation

krus210 commented Mar 1, 2026

Fixes #1465

Description

This PR adds a new VectorStore implementation backed by DuckDB with the vss extension for vector similarity search.

DuckDB is an embedded analytical database that runs in-process and requires no external server, which makes it a good fit for local RAG pipelines and single-binary deployments. The vss extension provides HNSW-based vector indexing with support for cosine, L2, and inner product distance metrics.
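
To make the vss terminology concrete, here is a stdlib-only Go sketch. It only assembles the setup statements and does not open a DuckDB connection. The statement forms follow the DuckDB vss extension documentation: `INSTALL vss`, `LOAD vss`, `CREATE INDEX ... USING HNSW (...) WITH (metric = ...)`, and the `hnsw_enable_experimental_persistence` setting required for file-backed databases. The helper `setupStatements`, the table name, and its columns are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// distanceFuncs maps each HNSW metric supported by the DuckDB vss extension
// to the array distance function a query orders by when using that metric.
var distanceFuncs = map[string]string{
	"l2sq":   "array_distance",
	"cosine": "array_cosine_distance",
	"ip":     "array_negative_inner_product",
}

// setupStatements returns DDL for a hypothetical embeddings table with an
// HNSW index. persistent should be true for file-backed databases, where vss
// requires opting in to experimental index persistence.
// The table name is interpolated, so it must come from trusted configuration.
func setupStatements(table string, dims int, metric string, persistent bool) ([]string, error) {
	if _, ok := distanceFuncs[metric]; !ok {
		return nil, fmt.Errorf("unsupported metric %q", metric)
	}
	if dims <= 0 {
		return nil, fmt.Errorf("vector dimensions must be positive, got %d", dims)
	}
	stmts := []string{"INSTALL vss", "LOAD vss"}
	if persistent {
		stmts = append(stmts, "SET hnsw_enable_experimental_persistence = true")
	}
	stmts = append(stmts,
		fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s (id VARCHAR PRIMARY KEY, document VARCHAR, cmetadata JSON, embedding FLOAT[%d])", table, dims),
		fmt.Sprintf("CREATE INDEX IF NOT EXISTS %s_hnsw ON %s USING HNSW (embedding) WITH (metric = '%s')", table, table, metric),
	)
	return stmts, nil
}

func main() {
	stmts, err := setupStatements("documents", 1536, "cosine", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(stmts, ";\n") + ";")
}
```

The metric-to-function mapping is kept in one place for a reason. Per the vss documentation, the HNSW index is only considered for top-k queries that order by the distance function matching the metric the index was built with.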

Key features

  • Full vectorstores.VectorStore interface implementation (AddDocuments, SimilaritySearch, Search)
  • In-memory and file-based persistent storage via DuckDB connection URL
  • HNSW index with configurable distance metrics (cosine, l2sq, ip)
  • Metadata filtering using JSON extraction on the cmetadata column
  • Score threshold filtering for similarity search results
  • Document deduplication support
  • Collection-based document organization (consistent with pgvector/chroma patterns)
  • Option to pass a pre-existing *sql.DB connection (WithDB)
  • Configurable table names, collection names, and vector dimensions

New files

  • vectorstores/duckdb/duckdb.go — core Store implementation
  • vectorstores/duckdb/options.go — functional options and defaults
  • vectorstores/duckdb/doc.go — package documentation
  • vectorstores/duckdb/duckdb_test.go — comprehensive test suite
  • vectorstores/duckdb/testdata/*.httprr.gz — recorded HTTP responses for reproducible tests
  • examples/duckdb-vectorstore-example/ — usage example with OpenAI embeddings

Source of new concepts

The implementation follows existing vectorstore patterns in the repository (particularly pgvector and chroma) for consistency. DuckDB-specific concepts:

Tests

The test suite covers the following scenarios using httprr recorded fixtures:

  • TestDuckDBStoreBasic — add documents and basic similarity search
  • TestDuckDBStoreWithScoreThreshold — score threshold filtering
  • TestDuckDBStoreSimilarityScore — verify similarity score values
  • TestSimilaritySearchWithInvalidScoreThreshold — error on invalid thresholds
  • TestDuckDBAsRetriever — integration with RetrievalQA chain
  • TestDuckDBAsRetrieverWithScoreThreshold — retriever with score threshold
  • TestDuckDBAsRetrieverWithMetadataFilters — retriever with metadata filters
  • TestDeduplicater — document deduplication
  • TestWithAllOptions — all configuration options exercised

PR Checklist

  • Read the Contributing documentation.
  • Read the Code of conduct documentation.
  • Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • Describes the source of new concepts.
  • References existing implementations as appropriate.
  • Contains test coverage for new functions.
  • Passes all golangci-lint checks.

detro commented Mar 13, 2026

@krus210 I'm very conflicted on how to react to this.

On the one hand, it's great to see somebody else interested in this. On the other, I have felt completely ignored by the maintainers for weeks (#1465), and just when I gave up paying attention, I get entirely sidestepped.

Since nobody responded to my issue (either here or on Discord), I went ahead and fully implemented this feature in my own project.

Maybe I should have done what you did, @krus210, and just opened a PR, ignoring the project guidelines that call for reaching out to the team before opening PRs.

krus210 (author) commented Mar 14, 2026

Hey @detro, thanks a lot for sharing this and for all the work you’ve done on this feature.

I’m really sorry for stepping on your toes here. I clearly didn’t dig deep enough into the contributing guidelines and existing discussions before opening this PR, and I understand how frustrating it must feel to be ignored for weeks and then see a parallel implementation appear out of nowhere.

If the maintainers consider this PR a duplicate or feel it undermines the effort you’ve already put in, I’m totally fine with closing or withdrawing it. My goal wasn’t to bypass you or the process, just to contribute something useful to the project.

If you’re open to it, I’d actually be happy to align this work with what you’ve already built, or adapt this PR based on your approach/ideas so that your earlier work is properly reflected and credited.

Again, apologies for the misstep here and for not being more careful with the guidelines and prior discussion.

detro commented Mar 16, 2026

Thank you, @krus210. I think it would be a shame to drop the PR now.

I'd rather collaborate. I can start by providing my own review and maybe touching on how I solved or wired certain things.

I should also share the code publicly, so I can show rather than just tell.

Out of curiosity: did you get any reply or feedback from the maintainers? That alone would be a great achievement 😄

detro commented Mar 27, 2026

Unfortunately, I haven't been able to take a look, so I hope the maintainers can respond to and review this PR. This feature is needed.

If I can contribute anything at a later date, I will.

Development

Successfully merging this pull request may close these issues.

vectorstores: DuckDB implementation
