feat(collection): add get_embeddings_by_id method by lpbirch · Pull Request #70 · AlayaDB-AI/AlayaLite

lpbirch · 2026-02-10T07:29:58Z

Description

This PR adds and validates the get_embeddings_by_id API in the Collection class, enabling retrieval of stored embeddings by a list of IDs while preserving input order.
It also updates and fixes the corresponding unit tests to fully comply with the project’s linting and CI requirements.

This change is part of the groundwork for upcoming MMR and async VectorStore support.

Related Issue

N/A

Type of Change

New feature (non-breaking change that adds functionality)
Bug fix (non-breaking change that fixes an issue)
Breaking change (fix or feature that would cause existing functionality to change)
Performance improvement
Code refactoring
Documentation update
CI/CD changes

Changes Made

Added get_embeddings_by_id method to Collection
- Supports batch retrieval by ID
- Preserves the order of input IDs
- Raises KeyError when any requested ID does not exist
Added and updated unit tests for get_embeddings_by_id
Fixed test module docstring to satisfy pylint (C0114)
Ensured code passes ruff, pylint, and existing CI checks

Testing

The changes were tested locally using the project’s standard test and lint workflow.

Unit tests added/updated
Integration tests added/updated
Manual testing performed

make lint
make test

Checklist

My code follows the project's coding style
I have run make lint and fixed any issues
I have added tests that prove my fix/feature works
All new and existing tests pass (make test)
I have updated documentation if needed
My commits follow the conventional commits format

Screenshots (if applicable)

N/A

Additional Notes

This PR focuses on establishing a reliable low-level embedding retrieval API, which will be reused by higher-level features such as Maximal Marginal Relevance (MMR) search and async VectorStore integration in subsequent PRs.

codecov · 2026-02-10T08:18:17Z

Codecov Report

❌ Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
python/src/alayalite/collection.py	71.42%	2 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Adds a new low-level API on Collection to fetch embeddings by external IDs (preserving input order) and introduces unit tests to validate correct ordering and missing-ID behavior, as groundwork for future MMR/VectorStore features.

Changes:

Added Collection.get_embeddings_by_id(ids) to retrieve stored vectors by ID in input order and raise on missing IDs.
Added a new test module covering successful roundtrip retrieval and missing-ID error handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
python/src/alayalite/collection.py	Implements `get_embeddings_by_id` on `Collection`.
python/tests/test_collection_get_embedding_by_id.py	Adds tests for order preservation and missing-ID error behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-10T08:53:10Z

+    # this function is mainly for MMR purpose，add by liangpeiran
+    def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:
+        """
+        Retrieve embeddings (vectors) from the underlying index by external ids.


The comment above get_embeddings_by_id includes a non-ASCII comma and personal attribution ("add by ..."). This is likely to be noisy in a public API module and can be removed or replaced with a neutral rationale in the docstring (e.g., "Used by MMR retrieval").

Suggested change

# this function is mainly for MMR purpose，add by liangpeiran

def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:

"""

Retrieve embeddings (vectors) from the underlying index by external ids.

def get_embeddings_by_id(self, ids: List[str]) -> List[List[float]]:

"""

Retrieve embeddings (vectors) from the underlying index by external ids.

This method is primarily used by MMR-based retrieval algorithms.

Copilot · 2026-02-10T08:53:10Z

+        if not ids:
+            return []
+
+        _assert(self.__index_py is not None, "Index is not initialized yet")


The docstring says this method raises RuntimeError when the index is not initialized, but the _assert(...) helper raises ValueError (see common._assert). Either update the docstring (and any public API docs) to reflect ValueError, or raise RuntimeError explicitly here to match the documented contract.

Suggested change

_assert(self.__index_py is not None, "Index is not initialized yet")

if self.__index_py is None:

raise RuntimeError("Index is not initialized yet")

Copilot · 2026-02-10T08:53:11Z

+        for outer_id in ids:
+            inner_id = self.__outer_inner_map[outer_id]
+            vec = self.__index_py.get_data_by_id(inner_id)
+            vectors.append(np.asarray(vec, dtype=np.float32).tolist())


This unconditionally converts stored vectors to float32, which can silently change precision (e.g., float64 -> float32) and ignores the index's configured dtype (self.__index_py.get_dtype()). Consider returning the native dtype (or casting to self.__index_py.get_dtype()), or clearly documenting that the API always returns float32.

Suggested change

vectors.append(np.asarray(vec, dtype=np.float32).tolist())

dtype = None

if hasattr(self.__index_py, "get_dtype"):

index_dtype = self.__index_py.get_dtype()

if index_dtype is not None:

dtype = index_dtype

if dtype is not None:

arr = np.asarray(vec, dtype=dtype)

else:

arr = np.asarray(vec)

vectors.append(arr.tolist())

Copilot · 2026-02-10T08:53:11Z

+from alayalite.collection import Collection
+


Other tests in python/tests import Collection from the package root (from alayalite import Collection). Using the same import style here would keep test code consistent and avoids relying on internal module paths.

Copilot · 2026-02-10T08:53:11Z

+"""
+Tests for Collection.get_embeddings_by_id.
+"""


The test module filename is test_collection_get_embedding_by_id.py (singular) while the API under test is get_embeddings_by_id (plural). Renaming the file to match the method name would make it easier to discover/grep related tests.

lpbirch changed the title ~~Feat/langchain mmr async~~ feat/langchain_mmr_async Feb 10, 2026

lpbirch changed the title ~~feat/langchain_mmr_async~~ feat(collection): add get_embeddings_by_id method Feb 10, 2026

lpbirch force-pushed the feat/langchain-mmr-async branch from 76cd8fc to e6c3d6e Compare February 10, 2026 08:02

lpbirch and others added 4 commits February 10, 2026 16:23

feat(collection): add get_embeddings_by_id method

5ad7f19

test: remove redundant comment in embedding test

37aefe9

test: add docstring for embeddings-by-id test

279dc58

chore: update tests metadata

c6c2a74

lpbirch force-pushed the feat/langchain-mmr-async branch from e6c3d6e to c6c2a74 Compare February 10, 2026 08:26

huanglune requested review from ScissorJack-ever and Copilot February 10, 2026 08:47

Copilot started reviewing on behalf of huanglune February 10, 2026 08:48 View session

Copilot AI reviewed Feb 10, 2026

View reviewed changes

lpbirch added 3 commits February 10, 2026 17:43

feat(collection): align get_embeddings_by_id error and dtype behavior

c6ab908

feat(collection): align get_embeddings_by_id error and dtype behavior

90fc024

feat(collection): align get_embeddings_by_id error and dtype behavior

9f2438d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(collection): add get_embeddings_by_id method#70

feat(collection): add get_embeddings_by_id method#70
lpbirch wants to merge 7 commits intoAlayaDB-AI:mainfrom
lpbirch:feat/langchain-mmr-async

lpbirch commented Feb 10, 2026

Uh oh!

codecov bot commented Feb 10, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 10, 2026

Uh oh!

Copilot AI Feb 10, 2026

Uh oh!

Copilot AI Feb 10, 2026

Uh oh!

Copilot AI Feb 10, 2026

Uh oh!

Copilot AI Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	_assert(self.__index_py is not None, "Index is not initialized yet")
	if self.__index_py is None:
	raise RuntimeError("Index is not initialized yet")

-            vectors.append(np.asarray(vec, dtype=np.float32).tolist())
+            dtype = None
+            if hasattr(self.__index_py, "get_dtype"):
+                index_dtype = self.__index_py.get_dtype()
+                if index_dtype is not None:
+                    dtype = index_dtype
+            if dtype is not None:
+                arr = np.asarray(vec, dtype=dtype)
+            else:
+                arr = np.asarray(vec)
+            vectors.append(arr.tolist())

Conversation

lpbirch commented Feb 10, 2026

Description

Related Issue

Type of Change

Changes Made

Testing

Checklist

Screenshots (if applicable)

Additional Notes

Uh oh!

codecov bot commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Feb 10, 2026 •

edited

Loading