Skip to content

feat(collection): add get_embeddings_by_id method#70

Open
lpbirch wants to merge 7 commits intoAlayaDB-AI:mainfrom
lpbirch:feat/langchain-mmr-async
Open

feat(collection): add get_embeddings_by_id method#70
lpbirch wants to merge 7 commits intoAlayaDB-AI:mainfrom
lpbirch:feat/langchain-mmr-async

Conversation

@lpbirch
Copy link
Copy Markdown

@lpbirch lpbirch commented Feb 10, 2026


Description

This PR adds and validates the get_embeddings_by_id API in the Collection class, enabling retrieval of stored embeddings by a list of IDs while preserving input order.
It also updates and fixes the corresponding unit tests to fully comply with the project’s linting and CI requirements.

This change is part of the groundwork for upcoming MMR and async VectorStore support.


Related Issue

N/A


Type of Change

  • New feature (non-breaking change that adds functionality)
  • Bug fix (non-breaking change that fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Performance improvement
  • Code refactoring
  • Documentation update
  • CI/CD changes

Changes Made

  • Added get_embeddings_by_id method to Collection

    • Supports batch retrieval by ID
    • Preserves the order of input IDs
    • Raises KeyError when any requested ID does not exist
  • Added and updated unit tests for get_embeddings_by_id

  • Fixed test module docstring to satisfy pylint (C0114)

  • Ensured code passes ruff, pylint, and existing CI checks


Testing

The changes were tested locally using the project’s standard test and lint workflow.

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
make lint
make test

Checklist

  • My code follows the project's coding style
  • I have run make lint and fixed any issues
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass (make test)
  • I have updated documentation if needed
  • My commits follow the conventional commits format

Screenshots (if applicable)

N/A


Additional Notes

This PR focuses on establishing a reliable low-level embedding retrieval API, which will be reused by higher-level features such as Maximal Marginal Relevance (MMR) search and async VectorStore integration in subsequent PRs.


@lpbirch lpbirch changed the title Feat/langchain mmr async feat/langchain_mmr_async Feb 10, 2026
@lpbirch lpbirch changed the title feat/langchain_mmr_async feat(collection): add get_embeddings_by_id method Feb 10, 2026
@lpbirch lpbirch force-pushed the feat/langchain-mmr-async branch from 76cd8fc to e6c3d6e Compare February 10, 2026 08:02
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
python/src/alayalite/collection.py 71.42% 2 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new low-level API on Collection to fetch embeddings by external IDs (preserving input order) and introduces unit tests to validate correct ordering and missing-ID behavior, as groundwork for future MMR/VectorStore features.

Changes:

  • Added Collection.get_embeddings_by_id(ids) to retrieve stored vectors by ID in input order and raise on missing IDs.
  • Added a new test module covering successful roundtrip retrieval and missing-ID error handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
python/src/alayalite/collection.py Implements get_embeddings_by_id on Collection.
python/tests/test_collection_get_embedding_by_id.py Adds tests for order preservation and missing-ID error behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/src/alayalite/collection.py Outdated
Comment on lines +299 to +302
# this function is mainly for MMR purpose,add by liangpeiran
def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:
"""
Retrieve embeddings (vectors) from the underlying index by external ids.
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above get_embeddings_by_id includes a non-ASCII comma and personal attribution ("add by ..."). This is likely to be noisy in a public API module and can be removed or replaced with a neutral rationale in the docstring (e.g., "Used by MMR retrieval").

Suggested change
# this function is mainly for MMR purpose,add by liangpeiran
def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:
"""
Retrieve embeddings (vectors) from the underlying index by external ids.
def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:
"""
Retrieve embeddings (vectors) from the underlying index by external ids.
This method is primarily used by MMR-based retrieval algorithms.

Copilot uses AI. Check for mistakes.
Comment thread python/src/alayalite/collection.py Outdated
if not ids:
return []

_assert(self.__index_py is not None, "Index is not initialized yet")
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring says this method raises RuntimeError when the index is not initialized, but the _assert(...) helper raises ValueError (see common._assert). Either update the docstring (and any public API docs) to reflect ValueError, or raise RuntimeError explicitly here to match the documented contract.

Suggested change
_assert(self.__index_py is not None, "Index is not initialized yet")
if self.__index_py is None:
raise RuntimeError("Index is not initialized yet")

Copilot uses AI. Check for mistakes.
Comment thread python/src/alayalite/collection.py Outdated
for outer_id in ids:
inner_id = self.__outer_inner_map[outer_id]
vec = self.__index_py.get_data_by_id(inner_id)
vectors.append(np.asarray(vec, dtype=np.float32).tolist())
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unconditionally converts stored vectors to float32, which can silently change precision (e.g., float64 -> float32) and ignores the index's configured dtype (self.__index_py.get_dtype()). Consider returning the native dtype (or casting to self.__index_py.get_dtype()), or clearly documenting that the API always returns float32.

Suggested change
vectors.append(np.asarray(vec, dtype=np.float32).tolist())
dtype = None
if hasattr(self.__index_py, "get_dtype"):
index_dtype = self.__index_py.get_dtype()
if index_dtype is not None:
dtype = index_dtype
if dtype is not None:
arr = np.asarray(vec, dtype=dtype)
else:
arr = np.asarray(vec)
vectors.append(arr.tolist())

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +22
from alayalite.collection import Collection

Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests in python/tests import Collection from the package root (from alayalite import Collection). Using the same import style here would keep test code consistent and avoids relying on internal module paths.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +17
"""
Tests for Collection.get_embeddings_by_id.
"""
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test module filename is test_collection_get_embedding_by_id.py (singular) while the API under test is get_embeddings_by_id (plural). Renaming the file to match the method name would make it easier to discover/grep related tests.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants