vectorstores/duckdb: add DuckDB vectorstore implementation#1482
@krus210 I'm very conflicted on how to react to this. On the one hand, it is great to get somebody else interested in this. On the other, I have felt completely ignored for weeks by the maintainers (#1465), and just when I gave up paying attention to this, I get entirely sidestepped. Since nobody was reacting to my issue (either here or on Discord), I have finished and implemented this feature, fully, in my own project. Maybe I should have done like you, @krus210, and just opened a PR, ignoring the project guidelines that call for reaching out to the team before coming up with PRs. |
|
Hey @detro, thanks a lot for sharing this and for all the work you’ve done on this feature. I’m really sorry for stepping on your toes here. I clearly didn’t dig deep enough into the contributing guidelines and existing discussions before opening this PR, and I understand how frustrating it must feel to be ignored for weeks and then see a parallel implementation appear out of nowhere. If the maintainers consider this PR a duplicate or feel it undermines the effort you’ve already put in, I’m totally fine with closing or withdrawing it. My goal wasn’t to bypass you or the process, just to contribute something useful to the project. If you’re open to it, I’d actually be happy to align this work with what you’ve already built, or adapt this PR based on your approach/ideas so that your earlier work is properly reflected and credited. Again, apologies for the misstep here and for not being more careful with the guidelines and prior discussion. |
|
Thank you @krus210 - I think it would be a shame to drop the PR now. I'd rather collaborate. I can start by providing my own review, and maybe touch on how I solved or wired certain things. I should also share the code publicly, so I can show instead of just telling. Out of curiosity: did you get any reply or feedback from the maintainers? Because that would be a great achievement per se 😄 |
|
Unfortunately, I haven't been able to take a look. So I hope the maintainers can react and review this PR. This feature is needed. If I can contribute anything at a later date, I will. |
Fixes #1465
Description
This PR adds a new `VectorStore` implementation backed by DuckDB with the vss extension for vector similarity search.

DuckDB is an embedded analytical database that requires no external server, making it ideal for local RAG pipelines and single-binary deployments. The `vss` extension provides HNSW-based vector indexing with support for cosine, L2, and inner product distance metrics.

Key features
- `vectorstores.VectorStore` interface implementation (`AddDocuments`, `SimilaritySearch`, `Search`)
- … (`cosine`, `l2sq`, `ip`)
- … `cmetadata` column
- … `*sql.DB` connection (`WithDB`)

New files
- `vectorstores/duckdb/duckdb.go`: core Store implementation
- `vectorstores/duckdb/options.go`: functional options and defaults
- `vectorstores/duckdb/doc.go`: package documentation
- `vectorstores/duckdb/duckdb_test.go`: comprehensive test suite
- `vectorstores/duckdb/testdata/*.httprr.gz`: recorded HTTP responses for reproducible tests
- `examples/duckdb-vectorstore-example/`: usage example with OpenAI embeddings

Source of new concepts
The implementation follows existing vectorstore patterns in the repository (particularly `pgvector` and `chroma`) for consistency. DuckDB-specific concepts:

Tests
The test suite covers the following scenarios using `httprr` recorded fixtures:
- `TestDuckDBStoreBasic`: add documents and basic similarity search
- `TestDuckDBStoreWithScoreThreshold`: score threshold filtering
- `TestDuckDBStoreSimilarityScore`: verify similarity score values
- `TestSimilaritySearchWithInvalidScoreThreshold`: error on invalid thresholds
- `TestDuckDBAsRetriever`: integration with RetrievalQA chain
- `TestDuckDBAsRetrieverWithScoreThreshold`: retriever with score threshold
- `TestDuckDBAsRetrieverWithMetadataFilters`: retriever with metadata filters
- `TestDeduplicater`: document deduplication
- `TestWithAllOptions`: all configuration options exercised

PR Checklist
- … `memory: add interfaces for X, Y` or `util: add whizzbang helpers`).
- … `Fixes #123`).
- … `golangci-lint` checks.