Conversation
📝 Walkthrough

Adds a SAM2-based segmentation processor and associated event type, thread-safe model loading helpers, an example CLI demonstrating agent-based segmentation, and a comprehensive test suite for segmentation/detection integration and edge cases.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Agent
    participant SegProc as TransformersSegmentationProcessor
    participant Track as QueuedVideoTrack
    participant Detector as DetectionProcessor / EventPublisher
    participant Model as SAM2 Model
    participant Events as EventManager
    Agent->>SegProc: process_video(track, shared_forwarder?)
    SegProc->>Detector: subscribe to DetectionCompletedEvent
    SegProc->>Track: attach _process_frame handler
    loop per frame
        Track->>SegProc: _process_frame(frame)
        SegProc->>SegProc: check resources & _latest_detections
        alt detections present
            SegProc->>Model: run inference on ThreadPool (_segment)
            Model-->>SegProc: masks, iou_scores
            SegProc->>SegProc: post-process masks, build SegmentedObject list
            alt annotate enabled
                SegProc->>SegProc: annotate image (async thread)
            end
            SegProc->>Events: emit SegmentationCompletedEvent(objects, masks, metadata)
            SegProc->>Track: append annotated/original frame
        else no detections
            SegProc->>Track: forward original frame
        end
    end
    Agent->>SegProc: close()
    SegProc->>SegProc: shutdown executor, unload model, clear caches
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🧹 Nitpick comments (4)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py (2)
164-167: Hoist the `transformers` import to the module top.

Per the coding guidelines, avoid function-local imports. The `transformers` dependency is already handled by the optional-dependency `try/except ImportError` in `vision_agents/plugins/huggingface/__init__.py`, so nothing is gained by importing `AutoModelForMaskGeneration`/`AutoProcessor` lazily inside `_load_model_sync`.

♻️ Proposed change

```diff
 import torch
 from PIL import Image
+from transformers import AutoModelForMaskGeneration, AutoProcessor

 from vision_agents.core import Agent
@@
-    def _load_model_sync(self) -> SegmentationResources:
-        from transformers import AutoModelForMaskGeneration, AutoProcessor
-
-        device = self._resolve_device()
+    def _load_model_sync(self) -> SegmentationResources:
+        device = self._resolve_device()
```

As per coding guidelines: "Do not use local imports; import at the top of the module".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py` around lines 164 - 167, Move the transformers imports out of the function and into the module top: remove the local import of AutoModelForMaskGeneration and AutoProcessor inside _load_model_sync and instead add "from transformers import AutoModelForMaskGeneration, AutoProcessor" at the top of the file; keep using the same symbols in the _load_model_sync method (and ensure any optional-dependency try/except logic remains in vision_agents/plugins/huggingface/__init__.py so no additional runtime checks are needed).
309-319: `masks=[m for m in masks]` can just be `list(masks)`.

`masks` here is already the `(N, H, W)` numpy array returned from `_segment`, and iterating it yields the per-object 2D mask arrays. The comprehension is equivalent to `list(masks)` but heavier-looking.

♻️ Proposed tweak

```diff
-                masks=[m for m in masks],
+                masks=list(masks),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py` around lines 309 - 319, The code creates a list of masks using a comprehension masks=[m for m in masks], which is redundant and less clear; replace that comprehension with list(masks) when constructing the SegmentationCompletedEvent in the call to SegmentationCompletedEvent (reference: variable masks, event sender self.events.send, and class SegmentationCompletedEvent) so the masks are converted to a list more succinctly and clearly (the upstream _segment return remains unchanged).

plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py (1)
142-160: Consider also guarding `AutoImageProcessor.from_pretrained` with the same lock.

The comment on `_model_load_lock` warns that concurrent `from_pretrained` calls are not thread-safe. You serialize the model load on lines 146-148, but `AutoImageProcessor.from_pretrained(...)` on line 152 is still outside the lock. In practice image-processor loading is lightweight, but if two detection/segmentation processors warm up in parallel threads, it's still a (small) concurrent `from_pretrained` call that could hit the same underlying hub/cache code paths.

♻️ Optional tightening

```diff
-        device = self._resolve_device()
-        with _model_load_lock:
-            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
-            model = model.to(device)
-
-        model.eval()
-
-        image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+        device = self._resolve_device()
+        with _model_load_lock:
+            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
+            model = model.to(device)
+            image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+
+        model.eval()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py` around lines 142 - 160, In _load_model_sync move the AutoImageProcessor.from_pretrained call inside the same _model_load_lock used for AutoModelForObjectDetection so both from_pretrained calls are serialized; specifically, acquire _model_load_lock around the model load (AutoModelForObjectDetection.from_pretrained) and the image processor load (AutoImageProcessor.from_pretrained) in the _load_model_sync method so the hub/cache code paths are not invoked concurrently, then release the lock before calling model.eval and returning DetectionResources (retain using first_param.device/first_param.dtype).

plugins/huggingface/vision_agents/plugins/huggingface/events.py (1)
70-88: Prefer a concrete mask type over `list[Any]`.

The docstring already commits to "numpy arrays... of shape `(H, W)`", and the processor emits exactly that (see `transformers_segmentation.py` lines 373, 391). Tightening the annotation to `list[np.ndarray]` gives callers proper type info and aligns with the coding guideline to avoid `Any` in annotations.

♻️ Proposed refactor

```diff
+import numpy as np
 from dataclasses import dataclass, field
 from typing import Any, Optional, TypedDict
@@
-    objects: list[SegmentedObject] = field(default_factory=list)
-    masks: list[Any] = field(default_factory=list)
+    objects: list[SegmentedObject] = field(default_factory=list)
+    masks: list[np.ndarray] = field(default_factory=list)
```

As per coding guidelines: "Avoid using `Any` type in type annotations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/vision_agents/plugins/huggingface/events.py` around lines 70 - 88, The masks field in SegmentationCompletedEvent is annotated as list[Any] but the docstring and the producer (transformers_segmentation) emit numpy arrays of shape (H, W); change the annotation on SegmentationCompletedEvent.masks from list[Any] to list[np.ndarray] (import numpy as np if necessary) so callers get concrete typing and update any related type imports/usages to avoid Any while preserving default_factory=list and runtime behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/tests/test_transformers_segmentation.py`:
- Around line 59-194: Move all unit tests from TestSegmentWithMockedModel and
TestSegmentationAnnotation into the TestTransformersSegmentationProcessor test
class so all unit tests for TransformersSegmentationProcessor live in one place;
specifically relocate methods _make_resources,
test_segment_returns_objects_and_masks,
test_segment_passes_correct_boxes_to_processor,
test_segment_extracts_sizes_before_device_move,
test_segment_casts_float_inputs_to_model_dtype, test_segment_empty_detections,
test_set_detections_updates_latest, and test_annotate_changes_image into
TestTransformersSegmentationProcessor (you can use nested classes or method name
prefixes to separate "segment" vs "annotation" groups), remove the now-empty
TestSegmentWithMockedModel and TestSegmentationAnnotation classes, and ensure
the class-level `@pytest.mark.integration` is removed and any integration tests
keep `@pytest.mark.integration` at the method level only.
- Line 7: Replace all uses of unittest.mock.MagicMock in this module (notably
the agent_mock fixture, the _make_resources helper, and tests in
TestSegmentWithMockedModel and TestSegmentationAnnotation) with either: (A) a
small deterministic fake class that mimics the SAM2 model/processor API by
exposing attributes and methods .pred_masks, .iou_scores, and
post_process_masks(...) returning correctly shaped arrays/values matching SAM2
expectations; or (B) convert those tests to `@pytest.mark.integration` and run the
real tiny HF/SAM2 model as the existing integration test does. Update places
referencing agent_mock and _make_resources to instantiate the fake object (or
call the real model) so tests no longer import or use MagicMock.
- Around line 268-295: The test creates a QueuedVideoTrack but never injects a
real frame, so original_frame is the default placeholder; fix by adding a real
VideoFrame to input_track before calling segmentation.process_video (follow the
existing cat_video_track pattern or synthesize a VideoFrame and push it into
QueuedVideoTrack via its add/enqueue method), then call await input_track.recv()
to capture that real frame and continue the existing assertions (references:
test_no_detections_passes_through, QueuedVideoTrack, input_track.recv(),
segmentation.process_video, output_track.recv()).
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`:
- Around line 106-120: The constructor currently validates mask_opacity but not
fps; add a one-line guard in __init__ (in transformers_segmentation.py) to
validate fps (e.g. if not isinstance(fps, int) or fps <= 0: raise
ValueError("fps must be a positive integer")) before assigning self.fps so
invalid values can't flow into QueuedVideoTrack(fps=self.fps,
max_queue_size=self.fps) or VideoForwarder(..., fps=self.fps).
- Around line 331-391: The _segment function assumes SAM/SAM2 output shapes and
currently indexes outputs.pred_masks, masks_tensor, and outputs.iou_scores as if
they are (batch, num_masks, H, W) and (batch, num_masks) respectively; add a
defensive shape check after running the model (inspect outputs.pred_masks.shape
and outputs.iou_scores.shape and masks_tensor.shape) to assert they match the
expected 4D/2D layouts (e.g., batch_size x num_masks x H x W and batch_size x
num_masks) and raise a clear error mentioning _segment and the model class if an
unexpected extra dimension like point_batch_size is present (suggested message:
unsupported model output layout for EdgeTAM/perceiver-based models); update the
docstring for _segment to state explicitly which model output layouts are
supported (SAM/SAM2/SAM-HQ) so callers know the compatibility.
- Around line 237-243: attach_agent defines an on_detection closure via
self._events.subscribe that captures self but is never unsubscribed, so store
the returned handler reference (e.g., self._on_detection_handler) when
subscribing in attach_agent and then unsubscribe it in close by calling
self._events.unsubscribe(self._on_detection_handler) (ensure you check for None
and that self._events is set), before setting self._closed; reference symbols:
attach_agent, on_detection, self._on_detection_handler, close,
self._events.unsubscribe.
---
Nitpick comments:
In `@plugins/huggingface/vision_agents/plugins/huggingface/events.py`:
- Around line 70-88: The masks field in SegmentationCompletedEvent is annotated
as list[Any] but the docstring and the producer (transformers_segmentation) emit
numpy arrays of shape (H, W); change the annotation on
SegmentationCompletedEvent.masks from list[Any] to list[np.ndarray] (import
numpy as np if necessary) so callers get concrete typing and update any related
type imports/usages to avoid Any while preserving default_factory=list and
runtime behavior.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py`:
- Around line 142-160: In _load_model_sync move the
AutoImageProcessor.from_pretrained call inside the same _model_load_lock used
for AutoModelForObjectDetection so both from_pretrained calls are serialized;
specifically, acquire _model_load_lock around the model load
(AutoModelForObjectDetection.from_pretrained) and the image processor load
(AutoImageProcessor.from_pretrained) in the _load_model_sync method so the
hub/cache code paths are not invoked concurrently, then release the lock before
calling model.eval and returning DetectionResources (retain using
first_param.device/first_param.dtype).
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`:
- Around line 164-167: Move the transformers imports out of the function and
into the module top: remove the local import of AutoModelForMaskGeneration and
AutoProcessor inside _load_model_sync and instead add "from transformers import
AutoModelForMaskGeneration, AutoProcessor" at the top of the file; keep using
the same symbols in the _load_model_sync method (and ensure any
optional-dependency try/except logic remains in
vision_agents/plugins/huggingface/__init__.py so no additional runtime checks
are needed).
- Around line 309-319: The code creates a list of masks using a comprehension
masks=[m for m in masks], which is redundant and less clear; replace that
comprehension with list(masks) when constructing the SegmentationCompletedEvent
in the call to SegmentationCompletedEvent (reference: variable masks, event
sender self.events.send, and class SegmentationCompletedEvent) so the masks are
converted to a list more succinctly and clearly (the upstream _segment return
remains unchanged).
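The defensive shape check described in the `_segment` item above can be sketched as follows. The helper name, the exact error message, and the leading-axis layout are assumptions for illustration; they follow the SAM-style shapes used elsewhere in this review (`pred_masks` with leading batch and per-box mask axes, `iou_scores` mirroring those first two axes), not the plugin's actual code.

```python
# Hypothetical defensive shape check for _segment (names and message are
# illustrative). It rejects output layouts whose leading axes do not match
# (batch=1, num_masks=num_boxes), e.g. an extra point_batch_size axis from
# EdgeTAM/perceiver-based models.
def check_sam_output_shapes(
    pred_masks_shape: tuple, iou_scores_shape: tuple, num_boxes: int
) -> None:
    expected_prefix = (1, num_boxes)  # (batch, num_masks) leading axes
    if (
        tuple(pred_masks_shape[:2]) != expected_prefix
        or tuple(iou_scores_shape[:2]) != expected_prefix
    ):
        raise ValueError(
            "_segment: unsupported model output layout "
            f"(pred_masks {tuple(pred_masks_shape)}, "
            f"iou_scores {tuple(iou_scores_shape)}); "
            "only SAM/SAM2/SAM-HQ style outputs are supported."
        )
```

Calling it right after the model forward pass, before any indexing, turns a confusing downstream `IndexError` into a clear, model-class-specific message.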
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ece20194-20e8-4ed6-877c-feebe480a5e9
📒 Files selected for processing (7)
- plugins/huggingface/examples/transformers/transformers_segmentation_example.py
- plugins/huggingface/tests/test_transformers_segmentation.py
- plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
- plugins/huggingface/vision_agents/plugins/huggingface/events.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
```python
import io
import os
import pathlib
from unittest.mock import MagicMock
```
`unittest.mock.MagicMock` usage contradicts the repo's testing rules.

Every `MagicMock(...)` in this module (the `agent_mock` fixture, the `_make_resources` helper, and all tests inside `TestSegmentWithMockedModel` / `TestSegmentationAnnotation` that rely on mocked HF model/processor return values) violates the testing guideline. Since SAM2 output shapes are quite specific, the cleanest replacement is a hand-rolled fake class exposing `.pred_masks`, `.iou_scores`, `.post_process_masks(...)` — no `MagicMock`, just small deterministic Python objects — or moving these cases under `@pytest.mark.integration` and running the real tiny model as the integration block already does.
As per coding guidelines: "Framework: pytest. Never mock." and "Never mock in tests; use pytest for testing".
Also applies to: 47-50, 59-163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/tests/test_transformers_segmentation.py` at line 7,
Replace all uses of unittest.mock.MagicMock in this module (notably the
agent_mock fixture, the _make_resources helper, and tests in
TestSegmentWithMockedModel and TestSegmentationAnnotation) with either: (A) a
small deterministic fake class that mimics the SAM2 model/processor API by
exposing attributes and methods .pred_masks, .iou_scores, and
post_process_masks(...) returning correctly shaped arrays/values matching SAM2
expectations; or (B) convert those tests to `@pytest.mark.integration` and run the
real tiny HF/SAM2 model as the existing integration test does. Update places
referencing agent_mock and _make_resources to instantiate the fake object (or
call the real model) so tests no longer import or use MagicMock.
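Option (A) from the prompt above can be sketched as a pair of small deterministic fakes. The class names (`FakeSamModel`, `FakeSamProcessor`, `FakeSamOutput`) are assumptions for illustration, and numpy arrays stand in for the torch tensors the real fakes would return; the shapes mirror the SAM2 layout used by the existing `_make_resources` fixture.

```python
# Hand-rolled deterministic fakes instead of MagicMock. Names are
# hypothetical; numpy arrays stand in for torch tensors here.
import numpy as np

class FakeSamOutput:
    def __init__(self, num_objects: int):
        # (batch, num_masks, multimask_dim, H, W) low-res mask logits
        self.pred_masks = np.random.randn(1, num_objects, 1, 256, 256)
        # (batch, num_masks, 1) predicted IoU per mask
        self.iou_scores = np.full((1, num_objects, 1), 0.9)

class FakeSamModel:
    def __call__(self, **inputs) -> FakeSamOutput:
        num_objects = inputs["input_boxes"].shape[1]
        return FakeSamOutput(num_objects)

class FakeSamProcessor:
    def __call__(self, images, input_boxes, return_tensors="pt"):
        num_objects = len(input_boxes[0])
        h, w = images.shape[:2]
        return {
            "pixel_values": np.zeros((1, 3, 1024, 1024)),
            "original_sizes": np.array([[h, w]]),
            "reshaped_input_sizes": np.array([[1024, 1024]]),
            "input_boxes": np.zeros((1, num_objects, 4)),
        }

    def post_process_masks(self, masks, original_sizes, *args, **kwargs):
        h, w = original_sizes[0]
        # one boolean (num_objects, 1, H, W) mask batch per image
        return [np.ones((masks.shape[1], 1, h, w), dtype=bool)]
```

Because the fakes are plain objects, the tests assert on recorded inputs and fixed outputs without importing `unittest.mock` at all.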
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
```python
class TestSegmentWithMockedModel:
    """Test _segment() with mocked SAM model and processor."""

    def _make_resources(
        self, num_objects: int = 2, img_h: int = 480, img_w: int = 640
    ) -> SegmentationResources:
        model = MagicMock()
        pred_masks = torch.randn(1, num_objects, 1, 256, 256)
        iou_scores = torch.tensor([[[0.85], [0.92]]])[:, :num_objects, :]
        model.return_value = MagicMock(pred_masks=pred_masks, iou_scores=iou_scores)

        processor = MagicMock()
        processor.return_value = {
            "pixel_values": torch.zeros(1, 3, 1024, 1024),
            "original_sizes": torch.tensor([[img_h, img_w]]),
            "reshaped_input_sizes": torch.tensor([[1024, 1024]]),
            "input_boxes": torch.zeros(1, num_objects, 4),
        }
        binary_masks = torch.ones(num_objects, 1, img_h, img_w, dtype=torch.bool)
        processor.post_process_masks.return_value = [binary_masks]

        return SegmentationResources(
            model=model,
            processor=processor,
            device=torch.device("cpu"),
            dtype=torch.float32,
        )

    def test_segment_returns_objects_and_masks(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, FAKE_DETECTIONS)

        assert len(objects) == 2
        assert objects[0]["label"] == "cat"
        assert objects[1]["label"] == "dog"
        assert objects[0]["confidence"] == 0.85
        assert objects[1]["confidence"] == 0.92
        assert objects[0]["mask_area"] == 480 * 640
        assert masks.shape == (2, 480, 640)

    def test_segment_passes_correct_boxes_to_processor(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS)

        call_kwargs = resources.processor.call_args
        input_boxes = call_kwargs.kwargs.get(
            "input_boxes", call_kwargs.args[0] if call_kwargs.args else None
        )
        assert input_boxes == [[[10, 20, 100, 200], [150, 60, 300, 250]]]

    def test_segment_extracts_sizes_before_device_move(self):
        resources = self._make_resources(num_objects=1)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        post_process_call = resources.processor.post_process_masks.call_args
        original_sizes = post_process_call.args[1]
        assert isinstance(original_sizes, torch.Tensor)
        assert original_sizes.device == torch.device("cpu")

    def test_segment_casts_float_inputs_to_model_dtype(self):
        resources = self._make_resources(num_objects=1)
        resources.dtype = torch.float16
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        call_kwargs = resources.model.call_args
        pixel_values = call_kwargs.kwargs.get(
            "pixel_values", call_kwargs.args[0] if call_kwargs.args else None
        )
        if pixel_values is not None and isinstance(pixel_values, torch.Tensor):
            assert pixel_values.dtype == torch.float16

    def test_segment_empty_detections(self):
        resources = self._make_resources(num_objects=0)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, [])

        assert objects == []
        resources.model.assert_not_called()

    def test_set_detections_updates_latest(self):
        processor = TransformersSegmentationProcessor()
        assert processor._latest_detections == []

        processor.set_detections(FAKE_DETECTIONS)
        assert len(processor._latest_detections) == 2
        assert processor._latest_detections[0]["label"] == "cat"


class TestSegmentationAnnotation:
    """Test that annotation draws masks on the image."""

    def test_annotate_changes_image(self):
        processor = TransformersSegmentationProcessor(mask_opacity=0.5)

        image = np.full((100, 100, 3), 128, dtype=np.uint8)
        objects = [
            FAKE_DETECTIONS[0],
        ]
        masks = np.ones((1, 100, 100), dtype=bool)

        annotated = processor._annotate(image, objects, masks)

        assert annotated.shape == image.shape
        assert (annotated != image).any()


SEGMENTATION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_SEGMENTATION_MODEL", "facebook/sam2-hiera-tiny"
)
DETECTION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_DETECTION_MODEL", "PekingU/rtdetr_v2_r18vd"
)


@pytest.mark.integration
@skip_blockbuster
class TestTransformersSegmentationProcessor:
```
🛠️ Refactor suggestion | 🟠 Major
Consolidate unit tests under TestTransformersSegmentationProcessor.
Unit tests for TransformersSegmentationProcessor are currently split across TestSegmentWithMockedModel, TestSegmentationAnnotation, and the integration-marked TestTransformersSegmentationProcessor. The guideline asks for a single class per subject under test. Suggest merging the unit cases into TestTransformersSegmentationProcessor (using nested classes or method naming to distinguish segment / annotation / integration groups) and keeping @pytest.mark.integration at the method level where appropriate instead of at class level.
As per coding guidelines: "Keep unit tests for the class under the same test class. For example, tests for Agent must be inside TestAgent. Do not spread them around different test classes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 59
- 194, Move all unit tests from TestSegmentWithMockedModel and
TestSegmentationAnnotation into the TestTransformersSegmentationProcessor test
class so all unit tests for TransformersSegmentationProcessor live in one place;
specifically relocate methods _make_resources,
test_segment_returns_objects_and_masks,
test_segment_passes_correct_boxes_to_processor,
test_segment_extracts_sizes_before_device_move,
test_segment_casts_float_inputs_to_model_dtype, test_segment_empty_detections,
test_set_detections_updates_latest, and test_annotate_changes_image into
TestTransformersSegmentationProcessor (you can use nested classes or method name
prefixes to separate "segment" vs "annotation" groups), remove the now-empty
TestSegmentWithMockedModel and TestSegmentationAnnotation classes, and ensure
the class-level `@pytest.mark.integration` is removed and any integration tests
keep `@pytest.mark.integration` at the method level only.
```python
    async def test_no_detections_passes_through(self, agent_mock, events_manager):
        """Without detection events, frames pass through unmodified."""
        segmentation = TransformersSegmentationProcessor(
            model=SEGMENTATION_MODEL_ID, fps=10
        )
        await segmentation.warmup()
        segmentation.attach_agent(agent_mock)

        seg_future: asyncio.Future[SegmentationCompletedEvent] = asyncio.Future()

        @events_manager.subscribe
        async def on_seg_event(event: SegmentationCompletedEvent):
            if not seg_future.done():
                seg_future.set_result(event)

        input_track = QueuedVideoTrack()
        original_frame = await input_track.recv()
        assert isinstance(original_frame, VideoFrame)

        output_track = segmentation.publish_video_track()
        await segmentation.process_video(input_track, "user_id")

        with pytest.raises(asyncio.TimeoutError):
            await asyncio.wait_for(seg_future, 5)

        output_frame = await output_track.recv()
        assert isinstance(output_frame, VideoFrame)
        assert (original_frame.to_ndarray() == output_frame.to_ndarray()).all()
```
🧩 Analysis chain (repository: GetStream/Vision-Agents)

🏁 Scripts executed:

```bash
#!/bin/bash
# Confirm QueuedVideoTrack.recv() behavior on an empty queue.
fd -t f video_track.py
rg -nP -C10 '(class QueuedVideoTrack|async def recv)' --type=py

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,85p'
cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,45p'
cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,52p'
cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,95p'
cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '29,38p'
cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,295p'
cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,300p'
```
`test_no_detections_passes_through` compares empty placeholder frames instead of real video content.

`QueuedVideoTrack()` on line 283 is created empty. When `await input_track.recv()` is called on line 284, it will wait for up to 1 second for a frame, then catch the timeout and return `self.last_frame`, which is initialized as an empty blue placeholder frame, not real input data.

The test does not hang, but `original_frame` becomes this placeholder rather than actual video content. Comparing this empty frame at line 295 does not test whether frames pass through unmodified. Add frames to `input_track` before processing (following the pattern from `cat_video_track`), or synthesize and add a real `VideoFrame` to the track.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 268
- 295, The test creates a QueuedVideoTrack but never injects a real frame, so
original_frame is the default placeholder; fix by adding a real VideoFrame to
input_track before calling segmentation.process_video (follow the existing
cat_video_track pattern or synthesize a VideoFrame and push it into
QueuedVideoTrack via its add/enqueue method), then call await input_track.recv()
to capture that real frame and continue the existing assertions (references:
test_no_detections_passes_through, QueuedVideoTrack, input_track.recv(),
segmentation.process_video, output_track.recv()).
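The frame-injection idea from the prompt above can be sketched with plain numpy. The queue below is only a stand-in for `QueuedVideoTrack` (the review excerpt does not show the track's actual enqueue-method name), and a real test would wrap the array in a PyAV `VideoFrame` rather than pass a bare ndarray.

```python
# Sketch: build a deterministic, non-uniform frame so the pass-through
# comparison actually exercises frame content. The deque stands in for
# QueuedVideoTrack, whose enqueue-method name is assumed here.
from collections import deque

import numpy as np

def make_test_frame(h: int = 480, w: int = 640) -> np.ndarray:
    frame = np.zeros((h, w, 3), dtype=np.uint8)
    # horizontal gradient on the red channel: not a flat placeholder
    frame[..., 0] = np.linspace(0, 255, w, dtype=np.uint8)
    return frame

# push a real frame before "processing", then verify pass-through
track_queue: deque = deque()
track_queue.append(make_test_frame())
received = track_queue.popleft()
assert (received == make_test_frame()).all()
assert received.std() > 0  # distinguishable from an empty blue frame
```

With content like this, an unchanged output frame proves pass-through, whereas two identical empty placeholders prove nothing.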
```python
    def __init__(
        self,
        model: str = "facebook/sam2.1-hiera-tiny",
        fps: int = 5,
        device: DeviceType = "auto",
        torch_dtype: TorchDtypeType = "auto",
        annotate: bool = True,
        mask_opacity: float = 0.35,
        mask_threshold: float = 0.0,
    ):
        if not 0 <= mask_opacity <= 1.0:
            raise ValueError("mask_opacity must be between 0 and 1.")

        self.model_id = model
        self.fps = fps
```
Also validate `fps`.

`mask_opacity` is range-checked in the constructor, but `fps` flows directly into `QueuedVideoTrack(fps=self.fps, max_queue_size=self.fps)` and `VideoForwarder(..., fps=self.fps)`. A zero or negative `fps` will silently produce a broken track / zero-sized queue. A one-line guard mirrors the existing input-validation style.
🛡️ Proposed guard

```diff
         if not 0 <= mask_opacity <= 1.0:
             raise ValueError("mask_opacity must be between 0 and 1.")
+        if fps <= 0:
+            raise ValueError("fps must be positive.")
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    def __init__(
        self,
        model: str = "facebook/sam2.1-hiera-tiny",
        fps: int = 5,
        device: DeviceType = "auto",
        torch_dtype: TorchDtypeType = "auto",
        annotate: bool = True,
        mask_opacity: float = 0.35,
        mask_threshold: float = 0.0,
    ):
        if not 0 <= mask_opacity <= 1.0:
            raise ValueError("mask_opacity must be between 0 and 1.")
        if fps <= 0:
            raise ValueError("fps must be positive.")

        self.model_id = model
        self.fps = fps
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 106 - 120, The constructor currently validates mask_opacity but not
fps; add a one-line guard in __init__ (in transformers_segmentation.py) to
validate fps (e.g. if not isinstance(fps, int) or fps <= 0: raise
ValueError("fps must be a positive integer")) before assigning self.fps so
invalid values can't flow into QueuedVideoTrack(fps=self.fps,
max_queue_size=self.fps) or VideoForwarder(..., fps=self.fps).
```python
    def attach_agent(self, agent: Agent) -> None:
        self._events = agent.events
        self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

        @self._events.subscribe
        async def on_detection(event: DetectionCompletedEvent) -> None:
            self._latest_detections = list(event.objects)
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check what unsubscribe mechanism EventManager offers.
fd -t f events.py --full-path 'agents-core'
rg -nP -C3 '(def subscribe|def unsubscribe|def remove_handler)' --type=py -g '**/events.py' -g '**/event_manager*.py'
```

Repository: GetStream/Vision-Agents
Length of output: 396

🏁 Script executed:

```shell
cat -n agents-core/vision_agents/core/agents/events.py | head -100
```

Length of output: 2219

🏁 Script executed:

```shell
rg -n "class EventManager|def subscribe|def unsubscribe|def remove" agents-core/vision_agents/core/agents/events.py -A 5
```

Length of output: 49

🏁 Script executed:

```shell
cat -n plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '200,300p'
```

Length of output: 4480

🏁 Script executed:

```shell
rg "class EventManager" --type=py -B 2 -A 15
```

Length of output: 1564

🏁 Script executed:

```shell
fd -type f "event_manager.py"
```

Length of output: 237

🏁 Script executed:

```shell
rg "def subscribe|def unsubscribe" --type=py -B 1 -A 10 agents-core/vision_agents/core/agents/events.py
```

Length of output: 49

🏁 Script executed:

```shell
cat agents-core/vision_agents/core/events/manager.py | head -200
```

Length of output: 7296

🏁 Script executed:

```shell
rg "def subscribe|def unsubscribe|def remove_handler|def unregister_handler" agents-core/vision_agents/core/events/manager.py -A 8
```

Length of output: 682

🏁 Script executed:

```shell
rg "def subscribe|def unsubscribe" agents-core/vision_agents/core/events/manager.py -A 25
```

Length of output: 1940

🏁 Script executed:

```shell
rg "def subscribe" agents-core/vision_agents/core/events/manager.py -A 40 | head -60
```

Length of output: 1712

🏁 Script executed:

```shell
cat agents-core/vision_agents/core/events/manager.py | sed -n '290,360p'
```

Length of output: 2754

🏁 Script executed:

```shell
rg "@self._events.subscribe|@.*\.events\.subscribe" plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py -B 2 -A 5
```

Length of output: 1146

🏁 Script executed:

```shell
cat agents-core/vision_agents/core/events/manager.py | sed -n '340,380p'
```

Length of output: 1802

🏁 Script executed:

```shell
cat plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '1,50p'
```

Length of output: 1630
on_detection handler should be unsubscribed in close().
The attach_agent method registers an event handler as a local closure that captures self. Since EventManager.subscribe() returns the function, you can store the reference and call self._events.unsubscribe(handler) in close(). Without explicit unsubscription, the handler remains in the event manager's handler list indefinitely, preventing garbage collection of the processor and its resources if the agent outlives this processor instance (e.g., hot-reload or re-attach scenarios).
Although _process_frame includes a _closed guard, this doesn't remove the handler from memory—it only prevents it from processing. Store the handler reference during attach_agent and unsubscribe it in close():
Fix approach:

```python
def attach_agent(self, agent: Agent) -> None:
    self._events = agent.events
    self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

    @self._events.subscribe
    async def on_detection(event: DetectionCompletedEvent) -> None:
        self._latest_detections = list(event.objects)

    self._on_detection_handler = on_detection


async def close(self) -> None:
    await self.stop_processing()
    if self._on_detection_handler is not None:
        self._events.unsubscribe(self._on_detection_handler)
    self._closed = True
    # ... rest of close() ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 237 - 243, attach_agent defines an on_detection closure via
self._events.subscribe that captures self but is never unsubscribed, so store
the returned handler reference (e.g., self._on_detection_handler) when
subscribing in attach_agent and then unsubscribe it in close by calling
self._events.unsubscribe(self._on_detection_handler) (ensure you check for None
and that self._events is set), before setting self._closed; reference symbols:
attach_agent, on_detection, self._on_detection_handler, close,
self._events.unsubscribe.
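The subscribe/unsubscribe lifecycle this prompt asks for can be illustrated with a toy event bus. `MiniEventManager` below is a hypothetical stand-in used only to show the pattern; it is not the real `EventManager` API:

```python
class MiniEventManager:
    """Toy stand-in for the real EventManager (not its actual API)."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, fn):
        self._handlers.append(fn)
        return fn  # returning fn lets callers keep a reference for later removal

    def unsubscribe(self, fn):
        self._handlers.remove(fn)

    def emit(self, event):
        for fn in list(self._handlers):
            fn(event)


events = MiniEventManager()
seen = []

@events.subscribe
def on_detection(event):
    seen.append(event)

events.emit("frame-1")
events.unsubscribe(on_detection)  # the step close() should perform
events.emit("frame-2")            # handler no longer fires

assert seen == ["frame-1"]
```

Because `subscribe` returns the handler, storing that reference at attach time is all that is needed to remove it cleanly at close time.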
```python
def _segment(
    self,
    image: np.ndarray,
    detections: list[DetectedObject],
) -> tuple[list[SegmentedObject], np.ndarray]:
    """Run segmentation on a single frame (called in thread pool)."""
    resources = self._resources
    if resources is None or not detections:
        return [], np.empty(0)

    pil_image = Image.fromarray(image)
    boxes = [[d["x1"], d["y1"], d["x2"], d["y2"]] for d in detections]

    inputs = resources.processor(
        images=pil_image, input_boxes=[boxes], return_tensors="pt"
    )

    original_sizes = inputs.pop("original_sizes")
    reshaped_input_sizes = inputs.pop("reshaped_input_sizes")

    inputs = {
        k: (
            v.to(device=resources.device, dtype=resources.dtype)
            if v.is_floating_point()
            else v.to(device=resources.device)
        )
        if isinstance(v, torch.Tensor)
        else v
        for k, v in inputs.items()
    }

    with torch.no_grad():
        outputs = resources.model(**inputs, multimask_output=False)

    masks_tensor = resources.processor.post_process_masks(
        outputs.pred_masks.cpu(),
        original_sizes,
        reshaped_input_sizes=reshaped_input_sizes,
        binarize=True,
        mask_threshold=self._mask_threshold,
    )

    per_object_masks = masks_tensor[0].squeeze(1).numpy()
    iou_scores = outputs.iou_scores[0].squeeze(-1).cpu()

    objects: list[SegmentedObject] = []
    for i, detection in enumerate(detections):
        mask = per_object_masks[i]
        objects.append(
            SegmentedObject(
                label=detection["label"],
                confidence=round(iou_scores[i].item(), 4),
                x1=detection["x1"],
                y1=detection["y1"],
                x2=detection["x2"],
                y2=detection["y2"],
                mask_area=int(mask.sum()),
            )
        )

    return objects, per_object_masks
```
🧩 Analysis chain
🌐 Web query:
Do Hugging Face SAM-HQ and EdgeTAM checkpoints return pred_masks and iou_scores in the same shape as SAM/SAM2 via AutoModelForMaskGeneration?
💡 Result:
No, Hugging Face SAM-HQ and EdgeTAM checkpoints do not return pred_masks and iou_scores in exactly the same shape as SAM/SAM2 via AutoModelForMaskGeneration:

- SAM/SAM2 (SamModel/Sam2Model): pred_masks shape (batch_size, num_masks, height, width), iou_scores (batch_size, num_masks). For a single point prompt, num_masks=3 (or 1 if multimask_output=False), with no point_batch_size dimension.
- SAM-HQ (SamHQModel): similar usage and code examples (outputs.pred_masks, outputs.iou_scores post-processed identically), but the source code shows potential for (batch_size, num_prompts, num_masks, height, width) due to HQ token modifications. Docs/examples treat it as compatible.
- EdgeTAM (EdgeTamModel): pred_masks (batch_size, point_batch_size, num_masks, height, width), iou_scores (batch_size, point_batch_size, num_masks). It includes an explicit point_batch_size dimension, differing from SAM/SAM2 for batched points.

All use compatible processors (SamProcessor/Sam2Processor) and post_process_masks, but raw output shapes vary slightly due to architectural differences (SAM-HQ enhancements, EdgeTAM's SAM2 adaptation with a perceiver). For single prompts, post-processed masks/iou_scores align, but raw pred_masks/iou_scores shapes differ for EdgeTAM.
Citations:
- 1: https://huggingface.co/docs/transformers/model_doc/sam_hq
- 2: https://huggingface.co/docs/transformers/en/model_doc/sam
- 3: https://huggingface.co/docs/transformers/model_doc/edgetam
- 4: https://github.com/huggingface/transformers/blob/main/src/transformers/models/sam_hq/modeling_sam_hq.py
- 5: https://huggingface.co/docs/transformers/v4.57.1/model_doc/sam2
_segment assumes SAM/SAM2 output shapes — not compatible with all advertised models.
The code directly indexes raw outputs — outputs.pred_masks[0], masks_tensor[0].squeeze(1), and outputs.iou_scores[0].squeeze(-1) — which assumes batch-first layout matching SAM/SAM2. However, EdgeTAM outputs pred_masks with shape (batch_size, point_batch_size, num_masks, H, W) and iou_scores with (batch_size, point_batch_size, num_masks) due to its perceiver-based architecture, introducing an extra point_batch_size dimension that breaks the squeeze chain. Since the docstring advertises support for "SAM2, SAM-HQ, EdgeTAM, etc.", either narrow the docstring to SAM/SAM-HQ only or add a defensive shape assertion before indexing/squeezing to clarify supported models and fail clearly on unsupported variants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 331 - 391, The _segment function assumes SAM/SAM2 output shapes and
currently indexes outputs.pred_masks, masks_tensor, and outputs.iou_scores as if
they are (batch, num_masks, H, W) and (batch, num_masks) respectively; add a
defensive shape check after running the model (inspect outputs.pred_masks.shape
and outputs.iou_scores.shape and masks_tensor.shape) to assert they match the
expected 4D/2D layouts (e.g., batch_size x num_masks x H x W and batch_size x
num_masks) and raise a clear error mentioning _segment and the model class if an
unexpected extra dimension like point_batch_size is present (suggested message:
unsupported model output layout for EdgeTAM/perceiver-based models); update the
docstring for _segment to state explicitly which model output layouts are
supported (SAM/SAM2/SAM-HQ) so callers know the compatibility.
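A minimal, framework-free sketch of the defensive check this prompt describes. The function name, the exact expected ranks, and the error wording are illustrative assumptions, not part of the plugin:

```python
def check_sam_output_shapes(pred_masks_shape: tuple, iou_scores_shape: tuple) -> None:
    """Reject model outputs whose layout doesn't match the assumed SAM/SAM2 convention.

    Assumed layouts (hypothetical, matching what _segment indexes):
      pred_masks: (batch, num_masks, H, W)  -> rank 4
      iou_scores: (batch, num_masks)        -> rank 2

    EdgeTAM-style outputs carry an extra point_batch_size dimension
    (rank 5 / rank 3), which would silently break the squeeze chain.
    """
    if len(pred_masks_shape) != 4 or len(iou_scores_shape) != 2:
        raise ValueError(
            "_segment: unsupported model output layout "
            f"(pred_masks {pred_masks_shape}, iou_scores {iou_scores_shape}); "
            "expected SAM/SAM2/SAM-HQ style (batch, num_masks, H, W). "
            "Perceiver-based models such as EdgeTAM add a point_batch_size "
            "dimension and are not supported here."
        )


# A SAM2-style layout passes silently.
check_sam_output_shapes((1, 3, 256, 256), (1, 3))

# An EdgeTAM-style layout (extra point_batch_size dim) is rejected loudly.
try:
    check_sam_output_shapes((1, 2, 3, 256, 256), (1, 2, 3))
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```

Failing fast with a named model class in the message is cheap insurance compared to a silently mis-squeezed mask tensor downstream.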
```python
import io
import os
import pathlib
from unittest.mock import MagicMock
```
Actionable comments posted: 2
🧹 Nitpick comments (1)
plugins/huggingface/tests/test_transformers_segmentation.py (1)
231-232: Redundant `is not None` check on `inference_time_ms`.
`SegmentationCompletedEvent.inference_time_ms` is a float populated unconditionally in `_process_frame`, so `assert result.inference_time_ms is not None` followed by `> 0` is slightly belt-and-braces. Dropping the first line keeps the intent (`> 0`) and avoids implying the field is nullable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 231 - 232, Remove the redundant null-check assertion on inference_time_ms: in the test in plugins/huggingface/tests/test_transformers_segmentation.py drop the line asserting result.inference_time_ms is not None and keep only the numeric check (assert result.inference_time_ms > 0), since SegmentationCompletedEvent.inference_time_ms is always populated by _process_frame.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/huggingface/tests/test_transformers_segmentation.py`:
- Around line 222-226: The test races two independent forwarders over the same
QueuedVideoTrack because detection.process_video(cat_video_track, ...) and
segmentation.process_video(cat_video_track, ...) are called without a
shared_forwarder; each call creates its own VideoForwarder and both call recv(),
causing frames to be dequeued by only one consumer. Fix by wiring both
processors to the same forwarder: create or reuse a single
VideoForwarder/shared_forwarder (or use the Agent-managed forwarder) and pass
that shared_forwarder into both detection.process_video(...) and
segmentation.process_video(...) so detection and segmentation deterministically
see the same frames in order.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`:
- Around line 212-218: The close() method races with in-flight _segment calls
because self._executor.shutdown(wait=False) returns immediately while worker
threads may still be executing model(**inputs); change the shutdown logic in
transformers_segmentation.py::close (and the analogous close in
transformers_detection.py) to wait for running tasks to finish before unloading
resources: either call self._executor.shutdown(wait=True) or explicitly
await/track the current in-flight Future(s) and ensure they complete prior to
calling self.unload()/torch.cuda.empty_cache(), then set _closed and stop tracks
after shutdown to avoid tearing down CUDA while model execution is ongoing.
---
Nitpick comments:
In `@plugins/huggingface/tests/test_transformers_segmentation.py`:
- Around line 231-232: Remove the redundant null-check assertion on
inference_time_ms: in the test in
plugins/huggingface/tests/test_transformers_segmentation.py drop the line
asserting result.inference_time_ms is not None and keep only the numeric check
(assert result.inference_time_ms > 0), since
SegmentationCompletedEvent.inference_time_ms is always populated by
_process_frame.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 94525557-3869-47d2-9175-b962e52cab37
📒 Files selected for processing (4)
- plugins/huggingface/tests/test_transformers_segmentation.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
```python
seg_output_track = segmentation.publish_video_track()
detection.publish_video_track()
await detection.process_video(cat_video_track, "user_id")
await segmentation.process_video(cat_video_track, "user_id")
```
Two forwarders racing on one cat_video_track.
detection.process_video(cat_video_track, ...) and segmentation.process_video(cat_video_track, ...) are called without a shared_forwarder, so each processor spins up its own VideoForwarder, and both end up calling recv() on the same QueuedVideoTrack. Since recv() dequeues, every frame goes to only one of the two forwarders — detection and segmentation see disjoint interleavings of the 100 queued frames. The test likely still passes because 100 frames is enough for both to eventually process one each, but the cache-and-chain contract (detection produces a box, then the segmenter segments that same frame) isn't actually being exercised deterministically. Consider sharing a single VideoForwarder between the two processors, or wiring them through the Agent-managed forwarder the way production does.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 222
- 226, The test races two independent forwarders over the same QueuedVideoTrack
because detection.process_video(cat_video_track, ...) and
segmentation.process_video(cat_video_track, ...) are called without a
shared_forwarder; each call creates its own VideoForwarder and both call recv(),
causing frames to be dequeued by only one consumer. Fix by wiring both
processors to the same forwarder: create or reuse a single
VideoForwarder/shared_forwarder (or use the Agent-managed forwarder) and pass
that shared_forwarder into both detection.process_video(...) and
segmentation.process_video(...) so detection and segmentation deterministically
see the same frames in order.
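The failure mode is independent of the library: two consumers dequeuing from one queue split the stream, while a fan-out step gives each consumer its own copy of every frame. A small asyncio sketch of that pattern (the names are illustrative, not the `VideoForwarder` API):

```python
import asyncio


async def fan_out(frames):
    """Copy every source frame into one queue per consumer,
    instead of letting two consumers dequeue from a single queue."""
    det_q: asyncio.Queue = asyncio.Queue()
    seg_q: asyncio.Queue = asyncio.Queue()
    for frame in frames:
        # Each frame is duplicated, so neither consumer starves the other.
        await det_q.put(frame)
        await seg_q.put(frame)
    det = [det_q.get_nowait() for _ in range(det_q.qsize())]
    seg = [seg_q.get_nowait() for _ in range(seg_q.qsize())]
    return det, seg


det, seg = asyncio.run(fan_out(range(6)))
assert det == seg == [0, 1, 2, 3, 4, 5]  # identical, ordered streams
```

Passing one shared forwarder to both `process_video` calls achieves the same effect: a single reader drains the track, and each processor sees the full, ordered frame stream.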
```python
async def close(self) -> None:
    await self.stop_processing()
    self._closed = True
    self._executor.shutdown(wait=False)
    self._video_track.stop()
    self.unload()
    logger.info("Transformers segmentation processor closed")
```
close() races with in-flight _segment calls.
self._executor.shutdown(wait=False) returns immediately, so a running _segment task keeps executing in the worker thread while self.unload() drops self._resources and calls gc.collect() / torch.cuda.empty_cache(). The in-flight call already captured resources locally on line 326 so it won't NPE, but tearing down CUDA caches while a worker thread is still doing model(**inputs) is a recipe for intermittent CUDA errors / device-side asserts in production shutdown paths. Consider shutdown(wait=True) (or awaiting the current future) before unload().
🛡️ Proposed fix

```diff
 async def close(self) -> None:
     await self.stop_processing()
     self._closed = True
-    self._executor.shutdown(wait=False)
+    # Drain in-flight inference before releasing GPU memory.
+    await asyncio.to_thread(self._executor.shutdown, True)
     self._video_track.stop()
     self.unload()
     logger.info("Transformers segmentation processor closed")
```

Same pattern exists in transformers_detection.py::close (line 189) — worth fixing both together.
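Why `wait=True` matters can be demonstrated without a GPU: with a plain `ThreadPoolExecutor`, `shutdown(wait=True)` blocks until the in-flight task has finished, so teardown code can never overlap a running task. A small stdlib-only sketch (the `work` and `state` names are illustrative stand-ins for the model call and the unload step):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# `work` stands in for the model(**inputs) call; `state` records ordering.
state = {"done": False, "released_before_done": False}


def work() -> None:
    time.sleep(0.2)  # simulated in-flight inference
    state["done"] = True


executor = ThreadPoolExecutor(max_workers=1)
executor.submit(work)

# wait=True blocks here until `work` returns, so the teardown below
# cannot race the worker thread.
executor.shutdown(wait=True)
if not state["done"]:
    state["released_before_done"] = True  # would be the CUDA-teardown race

assert state["done"] and not state["released_before_done"]
```

Wrapping the blocking `shutdown` in `asyncio.to_thread`, as the proposed fix does, keeps the event loop responsive while the drain happens.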
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 212 - 218, The close() method races with in-flight _segment calls
because self._executor.shutdown(wait=False) returns immediately while worker
threads may still be executing model(**inputs); change the shutdown logic in
transformers_segmentation.py::close (and the analogous close in
transformers_detection.py) to wait for running tasks to finish before unloading
resources: either call self._executor.shutdown(wait=True) or explicitly
await/track the current in-flight Future(s) and ensure they complete prior to
calling self.unload()/torch.cuda.empty_cache(), then set _closed and stop tracks
after shutdown to avoid tearing down CUDA while model execution is ongoing.
Summary by CodeRabbit
New Features
Tests