Python: fix(python/redis): handle redisvl 0.5 API change in process_results and guard vector buffer decode#13899
Conversation
…nd guard vector buffer decode Two bugs prevent FT.SEARCH from working in the Python Redis connector: 1. redisvl 0.5.0 changed the signature of process_results() to accept an IndexSchema instead of StorageType. Detect the active API at import time via inspect.signature and pass the appropriate argument. When IndexSchema is required, construct a minimal schema from the collection name and storage type so no search-path code needs to know the redisvl version. 2. RedisHashsetCollection._deserialize_store_models_to_dicts calls buffer_to_array() unconditionally for every vector field. When include_vectors=False (the default for search), the vector field is absent from the result dict and a KeyError is raised. Add a guard so vector decoding is skipped for absent fields. Fixes microsoft#13896 Signed-off-by: Doondi-Ashlesh <doondiashlesh@gmail.com>
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 89%
✓ Correctness
This PR adds backward-compatible support for the redisvl API change where
process_resultsswitched its third parameter fromStorageTypetoIndexSchema. It uses runtime introspection to detect which API is available and constructs a minimalIndexSchemawhen needed. It also adds a guard for missing vector fields during deserialization. The changes are logically correct for the current usage patterns. The minimal IndexSchema (with emptyfields) would cause a KeyError ifnormalize_vector_distance=Truewere ever passed to VectorQuery, but the SK code never sets that flag, so this is not a current bug — just a latent fragility.
✓ Security Reliability
This diff introduces two changes: (1) a module-level compatibility shim that uses inspect.signature to detect whether the redisvl process_results function accepts a schema parameter (new API) or a StorageType (old API), and branches accordingly at search time; (2) a defensive guard in RedisHashsetCollection._deserialize_store_models_to_dicts that checks if a vector field exists in the record before attempting buffer_to_array deserialization, preventing a KeyError when vectors are excluded from search results (include_vectors=False). Both changes are straightforward reliability improvements with no security concerns. The try/except ImportError at module level correctly handles older redisvl versions that lack IndexSchema, and the IndexSchema.from_dict call uses only trusted internal values (collection_name and storage type enum value).
✗ Test Coverage
This PR makes two defensive changes to the Redis connector: (1) a compatibility shim for the redisvl
process_resultsAPI change (new versions require anIndexSchemainstead ofStorageType), and (2) a guard againstKeyErrorwhen vector fields are absent from search results (e.g., wheninclude_vectors=False). Neither change has any corresponding unit tests. The existing test file (test_redis_store.py) has no tests for_inner_searchor for_deserialize_store_models_to_dictswith missing vector fields. Both behaviors are easily unit-testable with mocks already present in the test file.
✗ Design Approach
The diff has two changes. The vector-field presence guard (change 2) is a reasonable defensive fix. The first change — using runtime
inspect.signatureto detect whetherprocess_resultsaccepts aschemaorStorageTypeargument — is a fragile, symptom-level workaround for an API-breaking change in redisvl. The project already pinsredisvl ~= 0.4in pyproject.toml, so the right fix is to either tighten the version constraint to the sub-version that introduced the new signature, or migrate to the new API exclusively and drop the old branch. Additionally, thetry/except ImportErrorguard aroundIndexSchemais redundant:StorageTypeis already imported from the sameredisvl.schemamodule unconditionally at line 24, so if redisvl is importable,IndexSchemawill always be importable too, making the fallback path unreachable.
Flagged Issues
- Runtime signature introspection (
inspect.signature(process_results).parameters) is used to branch between two incompatibleprocess_resultscall sites. This couples SK permanently to detecting a transient redisvl API break instead of expressing the version requirement explicitly. The proper fix is to tighten theredisvl ~= 0.4pin to the sub-version whereprocess_resultsgained theschemaparameter (and use only that API path), or cap the upper bound and keep only the old path. Signature sniffing will silently do the wrong thing if redisvl's signature changes again.
Suggestions
- Add unit tests for both new code paths: (1) a test for
_deserialize_store_models_to_dictswith a record missing the vector field key, verifying noKeyErroris raised; (2) a parametrized test for_inner_searchthat patches_PROCESS_RESULTS_USES_SCHEMAtoTrue/Falseand asserts the correctprocess_resultscall signature is used.
Automated review by Doondi-Ashlesh's agents
…tests Address review feedback on the earlier compatibility shim approach: - Drop the inspect.signature dual-path. redisvl ~= 0.4 allowed 0.5.x per PEP 440, but the right fix is to express the requirement explicitly. Update pyproject.toml to redisvl >= 0.5 and use only the IndexSchema API path in _inner_search. The redundant try/except ImportError guard is removed since StorageType is already imported from the same module. - Add test_deserialize_hashset_skips_missing_vector_field: calls _deserialize_store_models_to_dicts with a record that has no vector key (as returned by search with include_vectors=False) and asserts no KeyError is raised and other fields are intact. - Add test_inner_search_passes_index_schema_to_process_results: parametrized over hashset and json, patches process_results and AsyncSearch.search, calls _inner_search with a direct vector, and asserts that the third argument to process_results is an IndexSchema with the correct storage_type - not a bare StorageType enum. Fixes microsoft#13896 Signed-off-by: Doondi-Ashlesh <doondiashlesh@gmail.com>
Summary
Fixes #13896
Two bugs in the Python Redis connector cause
FT.SEARCH(collection.search(...)) to fail completely:Bug 1 - redisvl 0.5.0 broke
process_resultssignatureredisvl0.5.0 changed the third parameter ofprocess_results()fromStorageTypetoIndexSchema. Becausepyproject.tomlpinsredisvl ~= 0.4(which allows 0.5.x per PEP 440), users who upgrade redisvl to 0.5.x get:Fix: detect the active API at import time via
inspect.signature. WhenIndexSchemais required, construct a minimal schema from the collection name and storage type before callingprocess_results.Bug 2 - KeyError on missing vector field during deserialization
RedisHashsetCollection._deserialize_store_models_to_dictscallsbuffer_to_array()for every vector field unconditionally. Wheninclude_vectors=False(the default for search), the vector field is absent from the result dict, raising:Fix: skip the vector buffer decode when the field is absent from the record.
Test plan
cd python && make test TEST_FILTERS="connectors/memory/test_redis"(no new failures)create_collection,upsert,searchwithinclude_vectors=False(default) completes without errorinclude_vectors=Truestill decodes vectors correctlySigned-off-by: Doondi-Ashlesh doondiashlesh@gmail.com