
Add hf segmentation processor #498

Open

maxkahan wants to merge 3 commits into main from add-hf-segmentation

Conversation

maxkahan (Contributor) commented Apr 20, 2026

Summary by CodeRabbit

  • New Features

    • Local image/video segmentation with mask generation, configurable thresholds, optional annotation, and async agent integration.
    • Example CLI demonstrating agent-based segmentation workflows and call-join flow.
    • Public API additions for segmentation results/events and a new segmentation processor export.
  • Tests

    • New unit and integration tests covering segmentation inference, mask post-processing, annotation behavior, and detection→segmentation chaining.

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Adds a SAM2-based segmentation processor and associated event type, thread-safe model loading helpers, an example CLI demonstrating agent-based segmentation, and a comprehensive test suite for segmentation/detection integration and edge cases.

Changes

Cohort: Segmentation Processor Core
File(s): plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
Summary: New TransformersSegmentationProcessor: warmup/load SAM2 model, resolve device/dtype, attach to agent, subscribe to detections, run threaded per-frame segmentation, optional annotation, emit SegmentationCompletedEvent, and lifecycle (stop/close/unload).

Cohort: Event Types
File(s): plugins/huggingface/vision_agents/plugins/huggingface/events.py
Summary: Added SegmentedObject TypedDict and SegmentationCompletedEvent dataclass with per-object mask_area, masks, and image dimensions; updates detection_count in post-init.

Cohort: Module Exports
File(s): plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
Summary: Export list extended to include SegmentationCompletedEvent and TransformersSegmentationProcessor.

Cohort: Model Loading & Device Resolution
File(s): plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py, .../transformers_detection.py
Summary: Added resolve_device(...) helper and _model_load_lock to serialize from_pretrained loads; detection loader updated to use resolved device and perform model.to(device) inside the lock.

Cohort: Example Script
File(s): plugins/huggingface/examples/transformers/transformers_segmentation_example.py
Summary: New CLI example that constructs an Agent with segmentation and detection processors, subscribes to SegmentationCompletedEvent, creates/joins a call, and runs via Runner(AgentLauncher(...)).cli().

Cohort: Tests
File(s): plugins/huggingface/tests/test_transformers_segmentation.py
Summary: New test module with async fixtures, unit tests (mocked model/processor behavior, mask shapes, dtype/cpu checks, empty-detections behavior), and integration tests (warmup/detection→segmentation chaining, annotation effects, no-detections path).
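
For orientation, a minimal sketch of how these pieces compose, inferred from the summary above; the detection processor's class name and the Agent wiring are assumptions, not the example's exact code:

```python
# Sketch only: processor/agent constructor signatures are assumptions.
from vision_agents.plugins.huggingface import (
    SegmentationCompletedEvent,
    TransformersSegmentationProcessor,
)

segmentation = TransformersSegmentationProcessor(
    model="facebook/sam2.1-hiera-tiny",  # default per this PR
    fps=5,
    annotate=True,       # draw masks onto outgoing frames
    mask_opacity=0.35,
)

# The example pairs this with a detection processor (class name assumed):
# detection emits DetectionCompletedEvent; the segmenter consumes the boxes,
# prompts SAM2 with them, and emits SegmentationCompletedEvent.
#
# agent = Agent(processors=[detection, segmentation], ...)
#
# @agent.events.subscribe
# async def on_segmentation(event: SegmentationCompletedEvent) -> None:
#     for obj in event.objects:
#         print(obj["label"], obj["mask_area"])
```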

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant SegProc as TransformersSegmentationProcessor
    participant Track as QueuedVideoTrack
    participant Detector as DetectionProcessor / EventPublisher
    participant Model as SAM2 Model
    participant Events as EventManager

    Agent->>SegProc: process_video(track, shared_forwarder?)
    SegProc->>Detector: subscribe to DetectionCompletedEvent
    SegProc->>Track: attach _process_frame handler

    loop per frame
        Track->>SegProc: _process_frame(frame)
        SegProc->>SegProc: check resources & _latest_detections
        alt detections present
            SegProc->>Model: run inference on ThreadPool (_segment)
            Model-->>SegProc: masks, iou_scores
            SegProc->>SegProc: post-process masks, build SegmentedObject list
            alt annotate enabled
                SegProc->>SegProc: annotate image (async thread)
            end
            SegProc->>Events: emit SegmentationCompletedEvent(objects, masks, metadata)
            SegProc->>Track: append annotated/original frame
        else no detections
            SegProc->>Track: forward original frame
        end
    end

    Agent->>SegProc: close()
    SegProc->>SegProc: shutdown executor, unload model, clear caches

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I knifed the light; the masks bloom under skin—
the tiny dead geometry of an eye, precise, obscene.
Each pixel counts its small betrayals, a census of holes.
The model breathes and does not apologize, only names.
I keep watch while the silence learns its shapes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 24.39%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add hf segmentation processor' directly and clearly describes the main change: introducing a new TransformersSegmentationProcessor for the Hugging Face plugin.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py (2)

164-167: Hoist the transformers import to the module top.

Per the coding guidelines, avoid function-local imports. The transformers dependency is already handled by the optional-dependency try/except ImportError in vision_agents/plugins/huggingface/__init__.py, so nothing is gained by importing AutoModelForMaskGeneration / AutoProcessor lazily inside _load_model_sync.

♻️ Proposed change
 import torch
 from PIL import Image
+from transformers import AutoModelForMaskGeneration, AutoProcessor
 from vision_agents.core import Agent
@@
-    def _load_model_sync(self) -> SegmentationResources:
-        from transformers import AutoModelForMaskGeneration, AutoProcessor
-
-        device = self._resolve_device()
+    def _load_model_sync(self) -> SegmentationResources:
+        device = self._resolve_device()

As per coding guidelines: "Do not use local imports; import at the top of the module".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 164 - 167, Move the transformers imports out of the function and
into the module top: remove the local import of AutoModelForMaskGeneration and
AutoProcessor inside _load_model_sync and instead add "from transformers import
AutoModelForMaskGeneration, AutoProcessor" at the top of the file; keep using
the same symbols in the _load_model_sync method (and ensure any
optional-dependency try/except logic remains in
vision_agents/plugins/huggingface/__init__.py so no additional runtime checks
are needed).
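
The optional-dependency guard referenced above commonly follows this shape; a generic sketch of the pattern, not the plugin's actual __init__.py:

```python
# Generic optional-dependency pattern (sketch, not the actual file contents).
try:
    from .transformers_segmentation import TransformersSegmentationProcessor
except ImportError as exc:  # transformers/torch extras not installed
    _import_error = exc

    class TransformersSegmentationProcessor:  # type: ignore[no-redef]
        """Stub that fails loudly when the optional dependency is missing."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "TransformersSegmentationProcessor requires the huggingface "
                "extras (transformers/torch) to be installed"
            ) from _import_error
```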

309-319: masks=[m for m in masks] can just be list(masks).

masks here is already the (N, H, W) numpy array returned from _segment, and iterating it yields the per-object 2D mask arrays. The comprehension is equivalent to list(masks) but heavier-looking.

♻️ Proposed tweak
-                masks=[m for m in masks],
+                masks=list(masks),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 309 - 319, The code creates a list of masks using a comprehension
masks=[m for m in masks], which is redundant and less clear; replace that
comprehension with list(masks) when constructing the SegmentationCompletedEvent
in the call to SegmentationCompletedEvent (reference: variable masks, event
sender self.events.send, and class SegmentationCompletedEvent) so the masks are
converted to a list more succinctly and clearly (the upstream _segment return
remains unchanged).
plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py (1)

142-160: Consider also guarding AutoImageProcessor.from_pretrained with the same lock.

The comment on _model_load_lock warns that concurrent from_pretrained calls are not thread-safe. You serialize the model load on lines 146-148 but AutoImageProcessor.from_pretrained(...) on line 152 is still outside the lock. In practice image-processor loading is lightweight, but if two detection/segmentation processors warm up in parallel threads it's still a (small) concurrent from_pretrained call that could hit the same underlying hub/cache code paths.

♻️ Optional tightening
-        device = self._resolve_device()
-        with _model_load_lock:
-            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
-            model = model.to(device)
-
-        model.eval()
-
-        image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+        device = self._resolve_device()
+        with _model_load_lock:
+            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
+            model = model.to(device)
+            image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+
+        model.eval()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py`
around lines 142 - 160, In _load_model_sync move the
AutoImageProcessor.from_pretrained call inside the same _model_load_lock used
for AutoModelForObjectDetection so both from_pretrained calls are serialized;
specifically, acquire _model_load_lock around the model load
(AutoModelForObjectDetection.from_pretrained) and the image processor load
(AutoImageProcessor.from_pretrained) in the _load_model_sync method so the
hub/cache code paths are not invoked concurrently, then release the lock before
calling model.eval and returning DetectionResources (retain using
first_param.device/first_param.dtype).
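
For reference, a minimal sketch of the serialized-load pattern under discussion, using the names from the diff above; treat the function shape as illustrative rather than the plugin's actual loader:

```python
import threading

from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Module-level lock shared by the plugin's loaders; concurrent
# from_pretrained() calls can race on the same hub/cache code paths.
_model_load_lock = threading.Lock()


def load_detection_resources(model_id: str, device: str = "cpu"):
    """Illustrative loader: both from_pretrained calls run under one lock."""
    with _model_load_lock:
        model = AutoModelForObjectDetection.from_pretrained(model_id).to(device)
        image_processor = AutoImageProcessor.from_pretrained(model_id)
    model.eval()
    return model, image_processor
```
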
plugins/huggingface/vision_agents/plugins/huggingface/events.py (1)

70-88: Prefer a concrete mask type over list[Any].

The docstring already commits to "numpy arrays... of shape (H, W)", and the processor emits exactly that (see transformers_segmentation.py lines 373, 391). Tightening the annotation to list[np.ndarray] gives callers proper type info and aligns with the coding guideline to avoid Any in annotations.

♻️ Proposed refactor
+import numpy as np
 from dataclasses import dataclass, field
 from typing import Any, Optional, TypedDict
@@
-    objects: list[SegmentedObject] = field(default_factory=list)
-    masks: list[Any] = field(default_factory=list)
+    objects: list[SegmentedObject] = field(default_factory=list)
+    masks: list[np.ndarray] = field(default_factory=list)

As per coding guidelines: "Avoid using Any type in type annotations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/events.py` around lines
70 - 88, The masks field in SegmentationCompletedEvent is annotated as list[Any]
but the docstring and the producer (transformers_segmentation) emit numpy arrays
of shape (H, W); change the annotation on SegmentationCompletedEvent.masks from
list[Any] to list[np.ndarray] (import numpy as np if necessary) so callers get
concrete typing and update any related type imports/usages to avoid Any while
preserving default_factory=list and runtime behavior.
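
Applied, the tightened annotation would look roughly like this sketch; the SegmentedObject fields are taken from the processor code quoted later in this review, and other event fields (image dimensions, inference_time_ms) are omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import TypedDict

import numpy as np


class SegmentedObject(TypedDict):
    label: str
    confidence: float
    x1: int
    y1: int
    x2: int
    y2: int
    mask_area: int


@dataclass
class SegmentationCompletedEvent:
    # Concrete typing instead of list[Any]: each mask is an (H, W) numpy array.
    objects: list[SegmentedObject] = field(default_factory=list)
    masks: list[np.ndarray] = field(default_factory=list)
```
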
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ece20194-20e8-4ed6-877c-feebe480a5e9

📥 Commits

Reviewing files that changed from the base of the PR and between d887697 and 59048cc.

📒 Files selected for processing (7)
  • plugins/huggingface/examples/transformers/transformers_segmentation_example.py
  • plugins/huggingface/tests/test_transformers_segmentation.py
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • plugins/huggingface/vision_agents/plugins/huggingface/events.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py

Comment on line 7

import io
import os
import pathlib
from unittest.mock import MagicMock

coderabbitai Bot commented Apr 20, 2026

⚠️ Potential issue | 🟠 Major

unittest.mock.MagicMock usage contradicts the repo's testing rules.

Every MagicMock(...) in this module (the agent_mock fixture, the _make_resources helper, and all tests inside TestSegmentWithMockedModel / TestSegmentationAnnotation that rely on mocked HF model/processor return values) violates the testing guideline. Since SAM2 output shapes are quite specific, the cleanest replacement is a hand-rolled fake class exposing .pred_masks, .iou_scores, .post_process_masks(...) — no MagicMock, just small deterministic Python objects — or moving these cases under @pytest.mark.integration and running the real tiny model as the integration block already does.

As per coding guidelines: "Framework: pytest. Never mock." and "Never mock in tests; use pytest for testing".

Also applies to: 47-50, 59-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` at line 7,
Replace all uses of unittest.mock.MagicMock in this module (notably the
agent_mock fixture, the _make_resources helper, and tests in
TestSegmentWithMockedModel and TestSegmentationAnnotation) with either: (A) a
small deterministic fake class that mimics the SAM2 model/processor API by
exposing attributes and methods .pred_masks, .iou_scores, and
post_process_masks(...) returning correctly shaped arrays/values matching SAM2
expectations; or (B) convert those tests to `@pytest.mark.integration` and run the
real tiny HF/SAM2 model as the existing integration test does. Update places
referencing agent_mock and _make_resources to instantiate the fake object (or
call the real model) so tests no longer import or use MagicMock.
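
Option (A) might look like the following: small deterministic fakes whose output shapes mirror the `_make_resources` mock quoted below. Names like FakeSamModel are illustrative, and only `multimask_output=False` single-mask outputs are modeled:

```python
import torch


class FakeSamOutputs:
    """Deterministic stand-in for the model's output object."""

    def __init__(self, num_objects: int):
        self.pred_masks = torch.zeros(1, num_objects, 1, 256, 256)
        self.iou_scores = torch.full((1, num_objects, 1), 0.9)


class FakeSamModel:
    """Callable fake mirroring model(**inputs, multimask_output=False)."""

    def __init__(self, num_objects: int):
        self.num_objects = num_objects

    def __call__(self, *args, **kwargs) -> FakeSamOutputs:
        return FakeSamOutputs(self.num_objects)


class FakeSamProcessor:
    """Fake mirroring the processor call and post_process_masks()."""

    def __init__(self, num_objects: int, img_h: int = 480, img_w: int = 640):
        self.num_objects, self.img_h, self.img_w = num_objects, img_h, img_w

    def __call__(self, images, input_boxes, return_tensors="pt"):
        return {
            "pixel_values": torch.zeros(1, 3, 1024, 1024),
            "original_sizes": torch.tensor([[self.img_h, self.img_w]]),
            "reshaped_input_sizes": torch.tensor([[1024, 1024]]),
            "input_boxes": torch.zeros(1, self.num_objects, 4),
        }

    def post_process_masks(self, masks, original_sizes, **kwargs):
        # One (num_objects, 1, H, W) boolean tensor per image in the batch.
        return [
            torch.ones(self.num_objects, 1, self.img_h, self.img_w, dtype=torch.bool)
        ]
```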

coderabbitai Bot: Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +59 to +194
class TestSegmentWithMockedModel:
    """Test _segment() with mocked SAM model and processor."""

    def _make_resources(
        self, num_objects: int = 2, img_h: int = 480, img_w: int = 640
    ) -> SegmentationResources:
        model = MagicMock()
        pred_masks = torch.randn(1, num_objects, 1, 256, 256)
        iou_scores = torch.tensor([[[0.85], [0.92]]])[:, :num_objects, :]
        model.return_value = MagicMock(pred_masks=pred_masks, iou_scores=iou_scores)

        processor = MagicMock()
        processor.return_value = {
            "pixel_values": torch.zeros(1, 3, 1024, 1024),
            "original_sizes": torch.tensor([[img_h, img_w]]),
            "reshaped_input_sizes": torch.tensor([[1024, 1024]]),
            "input_boxes": torch.zeros(1, num_objects, 4),
        }
        binary_masks = torch.ones(num_objects, 1, img_h, img_w, dtype=torch.bool)
        processor.post_process_masks.return_value = [binary_masks]

        return SegmentationResources(
            model=model,
            processor=processor,
            device=torch.device("cpu"),
            dtype=torch.float32,
        )

    def test_segment_returns_objects_and_masks(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, FAKE_DETECTIONS)

        assert len(objects) == 2
        assert objects[0]["label"] == "cat"
        assert objects[1]["label"] == "dog"
        assert objects[0]["confidence"] == 0.85
        assert objects[1]["confidence"] == 0.92
        assert objects[0]["mask_area"] == 480 * 640
        assert masks.shape == (2, 480, 640)

    def test_segment_passes_correct_boxes_to_processor(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS)

        call_kwargs = resources.processor.call_args
        input_boxes = call_kwargs.kwargs.get(
            "input_boxes", call_kwargs.args[0] if call_kwargs.args else None
        )
        assert input_boxes == [[[10, 20, 100, 200], [150, 60, 300, 250]]]

    def test_segment_extracts_sizes_before_device_move(self):
        resources = self._make_resources(num_objects=1)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        post_process_call = resources.processor.post_process_masks.call_args
        original_sizes = post_process_call.args[1]
        assert isinstance(original_sizes, torch.Tensor)
        assert original_sizes.device == torch.device("cpu")

    def test_segment_casts_float_inputs_to_model_dtype(self):
        resources = self._make_resources(num_objects=1)
        resources.dtype = torch.float16
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        call_kwargs = resources.model.call_args
        pixel_values = call_kwargs.kwargs.get(
            "pixel_values", call_kwargs.args[0] if call_kwargs.args else None
        )
        if pixel_values is not None and isinstance(pixel_values, torch.Tensor):
            assert pixel_values.dtype == torch.float16

    def test_segment_empty_detections(self):
        resources = self._make_resources(num_objects=0)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, [])

        assert objects == []
        resources.model.assert_not_called()

    def test_set_detections_updates_latest(self):
        processor = TransformersSegmentationProcessor()
        assert processor._latest_detections == []

        processor.set_detections(FAKE_DETECTIONS)
        assert len(processor._latest_detections) == 2
        assert processor._latest_detections[0]["label"] == "cat"


class TestSegmentationAnnotation:
    """Test that annotation draws masks on the image."""

    def test_annotate_changes_image(self):
        processor = TransformersSegmentationProcessor(mask_opacity=0.5)

        image = np.full((100, 100, 3), 128, dtype=np.uint8)
        objects = [
            FAKE_DETECTIONS[0],
        ]
        masks = np.ones((1, 100, 100), dtype=bool)

        annotated = processor._annotate(image, objects, masks)

        assert annotated.shape == image.shape
        assert (annotated != image).any()


SEGMENTATION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_SEGMENTATION_MODEL", "facebook/sam2-hiera-tiny"
)
DETECTION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_DETECTION_MODEL", "PekingU/rtdetr_v2_r18vd"
)


@pytest.mark.integration
@skip_blockbuster
class TestTransformersSegmentationProcessor:

🛠️ Refactor suggestion | 🟠 Major

Consolidate unit tests under TestTransformersSegmentationProcessor.

Unit tests for TransformersSegmentationProcessor are currently split across TestSegmentWithMockedModel, TestSegmentationAnnotation, and the integration-marked TestTransformersSegmentationProcessor. The guideline asks for a single class per subject under test. Suggest merging the unit cases into TestTransformersSegmentationProcessor (using nested classes or method naming to distinguish segment / annotation / integration groups) and keeping @pytest.mark.integration at the method level where appropriate instead of at class level.

As per coding guidelines: "Keep unit tests for the class under the same test class. For example, tests for Agent must be inside TestAgent. Do not spread them around different test classes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 59
- 194, Move all unit tests from TestSegmentWithMockedModel and
TestSegmentationAnnotation into the TestTransformersSegmentationProcessor test
class so all unit tests for TransformersSegmentationProcessor live in one place;
specifically relocate methods _make_resources,
test_segment_returns_objects_and_masks,
test_segment_passes_correct_boxes_to_processor,
test_segment_extracts_sizes_before_device_move,
test_segment_casts_float_inputs_to_model_dtype, test_segment_empty_detections,
test_set_detections_updates_latest, and test_annotate_changes_image into
TestTransformersSegmentationProcessor (you can use nested classes or method name
prefixes to separate "segment" vs "annotation" groups), remove the now-empty
TestSegmentWithMockedModel and TestSegmentationAnnotation classes, and ensure
the class-level `@pytest.mark.integration` is removed and any integration tests
keep `@pytest.mark.integration` at the method level only.

Comment on lines +268 to +295
async def test_no_detections_passes_through(self, agent_mock, events_manager):
    """Without detection events, frames pass through unmodified."""
    segmentation = TransformersSegmentationProcessor(
        model=SEGMENTATION_MODEL_ID, fps=10
    )
    await segmentation.warmup()
    segmentation.attach_agent(agent_mock)

    seg_future: asyncio.Future[SegmentationCompletedEvent] = asyncio.Future()

    @events_manager.subscribe
    async def on_seg_event(event: SegmentationCompletedEvent):
        if not seg_future.done():
            seg_future.set_result(event)

    input_track = QueuedVideoTrack()
    original_frame = await input_track.recv()
    assert isinstance(original_frame, VideoFrame)

    output_track = segmentation.publish_video_track()
    await segmentation.process_video(input_track, "user_id")

    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(seg_future, 5)

    output_frame = await output_track.recv()
    assert isinstance(output_frame, VideoFrame)
    assert (original_frame.to_ndarray() == output_frame.to_ndarray()).all()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm QueuedVideoTrack.recv() behavior on an empty queue.
fd -t f video_track.py
rg -nP -C10 '(class QueuedVideoTrack|async def recv)' --type=py

Repository: GetStream/Vision-Agents

Length of output: 25353


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,85p'

Repository: GetStream/Vision-Agents

Length of output: 1026


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,45p'

Repository: GetStream/Vision-Agents

Length of output: 750


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,52p'

Repository: GetStream/Vision-Agents

Length of output: 1111


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,95p'

Repository: GetStream/Vision-Agents

Length of output: 1297


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '29,38p'

Repository: GetStream/Vision-Agents

Length of output: 565


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,295p'

Repository: GetStream/Vision-Agents

Length of output: 1436


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,300p'

Repository: GetStream/Vision-Agents

Length of output: 1529


test_no_detections_passes_through compares empty placeholder frames instead of real video content.

QueuedVideoTrack() on line 283 is created empty. When await input_track.recv() is called on line 284, it will wait for up to 1 second for a frame, then catch the timeout and return self.last_frame—which is initialized as an empty blue placeholder frame, not real input data.

The test does not hang, but original_frame becomes this placeholder rather than actual video content. Comparing this empty frame at line 295 does not test whether frames pass through unmodified. Add frames to input_track before processing (following the pattern from cat_video_track), or synthesize and add a real VideoFrame to the track.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 268
- 295, The test creates a QueuedVideoTrack but never injects a real frame, so
original_frame is the default placeholder; fix by adding a real VideoFrame to
input_track before calling segmentation.process_video (follow the existing
cat_video_track pattern or synthesize a VideoFrame and push it into
QueuedVideoTrack via its add/enqueue method), then call await input_track.recv()
to capture that real frame and continue the existing assertions (references:
test_no_detections_passes_through, QueuedVideoTrack, input_track.recv(),
segmentation.process_video, output_track.recv()).
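
A hedged sketch of the suggested fix; the `add_frame` method name is hypothetical (use whatever enqueue method QueuedVideoTrack actually exposes), and the PyAV-style frame constructor is an assumption based on the `to_ndarray()` calls in this test:

```python
import av  # assumption: VideoFrame here is PyAV-compatible (it has to_ndarray())
import numpy as np

# Inside the async test, before calling segmentation.process_video:
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
rgb[:, :, 1] = 255  # solid green, easy to tell apart from the blue placeholder
frame = av.VideoFrame.from_ndarray(rgb, format="rgb24")

input_track = QueuedVideoTrack()
input_track.add_frame(frame)  # hypothetical enqueue method; see note above

original_frame = await input_track.recv()  # now the real frame, not the placeholder
```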

Comment on lines +106 to +120
def __init__(
    self,
    model: str = "facebook/sam2.1-hiera-tiny",
    fps: int = 5,
    device: DeviceType = "auto",
    torch_dtype: TorchDtypeType = "auto",
    annotate: bool = True,
    mask_opacity: float = 0.35,
    mask_threshold: float = 0.0,
):
    if not 0 <= mask_opacity <= 1.0:
        raise ValueError("mask_opacity must be between 0 and 1.")

    self.model_id = model
    self.fps = fps

⚠️ Potential issue | 🟡 Minor

Also validate fps.

mask_opacity is range-checked in the constructor, but fps flows directly into QueuedVideoTrack(fps=self.fps, max_queue_size=self.fps) and VideoForwarder(..., fps=self.fps). A zero or negative fps will silently produce a broken track / zero-sized queue. A one-line guard mirrors the existing input validation style.

🛡️ Proposed guard
         if not 0 <= mask_opacity <= 1.0:
             raise ValueError("mask_opacity must be between 0 and 1.")
+        if fps <= 0:
+            raise ValueError("fps must be positive.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def __init__(
     self,
     model: str = "facebook/sam2.1-hiera-tiny",
     fps: int = 5,
     device: DeviceType = "auto",
     torch_dtype: TorchDtypeType = "auto",
     annotate: bool = True,
     mask_opacity: float = 0.35,
     mask_threshold: float = 0.0,
 ):
     if not 0 <= mask_opacity <= 1.0:
         raise ValueError("mask_opacity must be between 0 and 1.")
+    if fps <= 0:
+        raise ValueError("fps must be positive.")
     self.model_id = model
     self.fps = fps
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 106 - 120, The constructor currently validates mask_opacity but not
fps; add a one-line guard in __init__ (in transformers_segmentation.py) to
validate fps (e.g. if not isinstance(fps, int) or fps <= 0: raise
ValueError("fps must be a positive integer")) before assigning self.fps so
invalid values can't flow into QueuedVideoTrack(fps=self.fps,
max_queue_size=self.fps) or VideoForwarder(..., fps=self.fps).

Comment on lines +237 to +243
def attach_agent(self, agent: Agent) -> None:
    self._events = agent.events
    self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

    @self._events.subscribe
    async def on_detection(event: DetectionCompletedEvent) -> None:
        self._latest_detections = list(event.objects)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what unsubscribe mechanism EventManager offers.
fd -t f events.py --full-path 'agents-core'
rg -nP -C3 '(def subscribe|def unsubscribe|def remove_handler)' --type=py -g '**/events.py' -g '**/event_manager*.py'

Repository: GetStream/Vision-Agents

Length of output: 396


🏁 Script executed:

cat -n agents-core/vision_agents/core/agents/events.py | head -100

Repository: GetStream/Vision-Agents

Length of output: 2219


🏁 Script executed:

rg -n "class EventManager|def subscribe|def unsubscribe|def remove" agents-core/vision_agents/core/agents/events.py -A 5

Repository: GetStream/Vision-Agents

Length of output: 49


🏁 Script executed:

cat -n plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '200,300p'

Repository: GetStream/Vision-Agents

Length of output: 4480


🏁 Script executed:

rg "class EventManager" --type=py -B 2 -A 15

Repository: GetStream/Vision-Agents

Length of output: 1564


🏁 Script executed:

fd -type f "event_manager.py"

Repository: GetStream/Vision-Agents

Length of output: 237


🏁 Script executed:

rg "def subscribe|def unsubscribe" --type=py -B 1 -A 10 agents-core/vision_agents/core/agents/events.py

Repository: GetStream/Vision-Agents

Length of output: 49


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | head -200

Repository: GetStream/Vision-Agents

Length of output: 7296


🏁 Script executed:

rg "def subscribe|def unsubscribe|def remove_handler|def unregister_handler" agents-core/vision_agents/core/events/manager.py -A 8

Repository: GetStream/Vision-Agents

Length of output: 682


🏁 Script executed:

rg "def subscribe|def unsubscribe" agents-core/vision_agents/core/events/manager.py -A 25

Repository: GetStream/Vision-Agents

Length of output: 1940


🏁 Script executed:

rg "def subscribe" agents-core/vision_agents/core/events/manager.py -A 40 | head -60

Repository: GetStream/Vision-Agents

Length of output: 1712


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | sed -n '290,360p'

Repository: GetStream/Vision-Agents

Length of output: 2754


🏁 Script executed:

rg "@self._events.subscribe|@.*\.events\.subscribe" plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py -B 2 -A 5

Repository: GetStream/Vision-Agents

Length of output: 1146


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | sed -n '340,380p'

Repository: GetStream/Vision-Agents

Length of output: 1802


🏁 Script executed:

cat plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '1,50p'

Repository: GetStream/Vision-Agents

Length of output: 1630


on_detection handler should be unsubscribed in close().

The attach_agent method registers an event handler as a local closure that captures self. Since EventManager.subscribe() returns the function, you can store the reference and call self._events.unsubscribe(handler) in close(). Without explicit unsubscription, the handler remains in the event manager's handler list indefinitely, preventing garbage collection of the processor and its resources if the agent outlives this processor instance (e.g., hot-reload or re-attach scenarios).

Although _process_frame includes a _closed guard, this doesn't remove the handler from memory—it only prevents it from processing. Store the handler reference during attach_agent and unsubscribe it in close():

Fix approach
def attach_agent(self, agent: Agent) -> None:
    self._events = agent.events
    self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

    @self._events.subscribe
    async def on_detection(event: DetectionCompletedEvent) -> None:
        self._latest_detections = list(event.objects)
    
    self._on_detection_handler = on_detection

async def close(self) -> None:
    await self.stop_processing()
    if self._on_detection_handler is not None:
        self._events.unsubscribe(self._on_detection_handler)
    self._closed = True
    # ... rest of close() ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 237 - 243, attach_agent defines an on_detection closure via
self._events.subscribe that captures self but is never unsubscribed, so store
the returned handler reference (e.g., self._on_detection_handler) when
subscribing in attach_agent and then unsubscribe it in close by calling
self._events.unsubscribe(self._on_detection_handler) (ensure you check for None
and that self._events is set), before setting self._closed; reference symbols:
attach_agent, on_detection, self._on_detection_handler, close,
self._events.unsubscribe.

Comment on lines +331 to +391
def _segment(
    self,
    image: np.ndarray,
    detections: list[DetectedObject],
) -> tuple[list[SegmentedObject], np.ndarray]:
    """Run segmentation on a single frame (called in thread pool)."""
    resources = self._resources
    if resources is None or not detections:
        return [], np.empty(0)

    pil_image = Image.fromarray(image)
    boxes = [[d["x1"], d["y1"], d["x2"], d["y2"]] for d in detections]

    inputs = resources.processor(
        images=pil_image, input_boxes=[boxes], return_tensors="pt"
    )

    original_sizes = inputs.pop("original_sizes")
    reshaped_input_sizes = inputs.pop("reshaped_input_sizes")

    inputs = {
        k: (
            v.to(device=resources.device, dtype=resources.dtype)
            if v.is_floating_point()
            else v.to(device=resources.device)
        )
        if isinstance(v, torch.Tensor)
        else v
        for k, v in inputs.items()
    }

    with torch.no_grad():
        outputs = resources.model(**inputs, multimask_output=False)

    masks_tensor = resources.processor.post_process_masks(
        outputs.pred_masks.cpu(),
        original_sizes,
        reshaped_input_sizes=reshaped_input_sizes,
        binarize=True,
        mask_threshold=self._mask_threshold,
    )

    per_object_masks = masks_tensor[0].squeeze(1).numpy()
    iou_scores = outputs.iou_scores[0].squeeze(-1).cpu()

    objects: list[SegmentedObject] = []
    for i, detection in enumerate(detections):
        mask = per_object_masks[i]
        objects.append(
            SegmentedObject(
                label=detection["label"],
                confidence=round(iou_scores[i].item(), 4),
                x1=detection["x1"],
                y1=detection["y1"],
                x2=detection["x2"],
                y2=detection["y2"],
                mask_area=int(mask.sum()),
            )
        )

    return objects, per_object_masks

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Do Hugging Face SAM-HQ and EdgeTAM checkpoints return pred_masks and iou_scores in the same shape as SAM/SAM2 via AutoModelForMaskGeneration?

💡 Result:

No. Hugging Face SAM-HQ and EdgeTAM checkpoints do not return pred_masks and iou_scores in exactly the same shape as SAM/SAM2 via AutoModelForMaskGeneration.

  • SAM/SAM2 (SamModel/Sam2Model): pred_masks shape (batch_size, num_masks, height, width), iou_scores (batch_size, num_masks). For a single point prompt, num_masks=3 (or 1 if multimask_output=False); no point_batch_size dimension.
  • SAM-HQ (SamHQModel): similar usage and code examples (outputs.pred_masks, outputs.iou_scores post-processed identically), but the source code shows potential for (batch_size, num_prompts, num_masks, height, width) due to HQ token modifications. Docs/examples treat it as compatible.
  • EdgeTAM (EdgeTamModel): pred_masks (batch_size, point_batch_size, num_masks, height, width), iou_scores (batch_size, point_batch_size, num_masks). The explicit point_batch_size dimension differs from SAM/SAM2 for batched points.

All use compatible processors (SamProcessor/Sam2Processor) and post_process_masks, but raw output shapes vary due to architectural differences (SAM-HQ enhancements, EdgeTAM's perceiver-based SAM2 adaptation). For single prompts, the post-processed masks/iou_scores align, but the raw pred_masks/iou_scores shapes differ for EdgeTAM.


_segment assumes SAM/SAM2 output shapes — not compatible with all advertised models.

The code directly indexes raw outputs — outputs.pred_masks[0], masks_tensor[0].squeeze(1), and outputs.iou_scores[0].squeeze(-1) — which assumes batch-first layout matching SAM/SAM2. However, EdgeTAM outputs pred_masks with shape (batch_size, point_batch_size, num_masks, H, W) and iou_scores with (batch_size, point_batch_size, num_masks) due to its perceiver-based architecture, introducing an extra point_batch_size dimension that breaks the squeeze chain. Since the docstring advertises support for "SAM2, SAM-HQ, EdgeTAM, etc.", either narrow the docstring to SAM/SAM-HQ only or add a defensive shape assertion before indexing/squeezing to clarify supported models and fail clearly on unsupported variants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 331 - 391, The _segment function assumes SAM/SAM2 output shapes and
currently indexes outputs.pred_masks, masks_tensor, and outputs.iou_scores as if
they are (batch, num_masks, H, W) and (batch, num_masks) respectively; add a
defensive shape check after running the model (inspect outputs.pred_masks.shape
and outputs.iou_scores.shape and masks_tensor.shape) to assert they match the
expected 4D/2D layouts (e.g., batch_size x num_masks x H x W and batch_size x
num_masks) and raise a clear error mentioning _segment and the model class if an
unexpected extra dimension like point_batch_size is present (suggested message:
unsupported model output layout for EdgeTAM/perceiver-based models); update the
docstring for _segment to state explicitly which model output layouts are
supported (SAM/SAM2/SAM-HQ) so callers know the compatibility.
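
For concreteness, a minimal sketch of such a guard, written against the tensors already named in `_segment` above; the exact condition and message are suggestions, not the PR's code:

```python
# After post_process_masks() in _segment: masks_tensor[0] should be
# (num_objects, 1, H, W) for SAM/SAM2/SAM-HQ with multimask_output=False.
per_object = masks_tensor[0]
if per_object.ndim != 4 or per_object.shape[1] != 1:
    raise ValueError(
        f"_segment: unsupported mask layout {tuple(per_object.shape)} from "
        f"{type(resources.model).__name__}; expected (num_objects, 1, H, W). "
        "EdgeTAM/perceiver-based models add a point_batch_size dimension "
        "and are not supported."
    )
```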


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/huggingface/tests/test_transformers_segmentation.py (1)

231-232: Redundant is not None on inference_time_ms.

SegmentationCompletedEvent.inference_time_ms is a float populated unconditionally in _process_frame, so assert result.inference_time_ms is not None followed by > 0 is slightly belt-and-braces. Dropping the first line keeps the intent (> 0) and avoids implying the field is nullable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 231
- 232, Remove the redundant null-check assertion on inference_time_ms: in the
test in plugins/huggingface/tests/test_transformers_segmentation.py drop the
line asserting result.inference_time_ms is not None and keep only the numeric
check (assert result.inference_time_ms > 0), since
SegmentationCompletedEvent.inference_time_ms is always populated by
_process_frame.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94525557-3869-47d2-9175-b962e52cab37

📥 Commits

Reviewing files that changed from the base of the PR and between 59048cc and 1fe0f14.

📒 Files selected for processing (4)
  • plugins/huggingface/tests/test_transformers_segmentation.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py

Comment on lines +222 to +226
seg_output_track = segmentation.publish_video_track()
detection.publish_video_track()
await detection.process_video(cat_video_track, "user_id")
await segmentation.process_video(cat_video_track, "user_id")


⚠️ Potential issue | 🟡 Minor

Two forwarders racing on one cat_video_track.

detection.process_video(cat_video_track, ...) and segmentation.process_video(cat_video_track, ...) are called without a shared_forwarder, so each processor spins up its own VideoForwarder, and both end up calling recv() on the same QueuedVideoTrack. Since recv() dequeues, every frame goes to only one of the two forwarders — detection and segmentation see disjoint interleavings of the 100 queued frames. The test likely still passes because 100 frames is enough for both to eventually process one each, but the cache-and-chain contract (detection produces a box, then the segmenter segments that same frame) isn't actually being exercised deterministically. Consider sharing a single VideoForwarder between the two processors, or wiring them through the Agent-managed forwarder the way production does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 222
- 226, The test races two independent forwarders over the same QueuedVideoTrack
because detection.process_video(cat_video_track, ...) and
segmentation.process_video(cat_video_track, ...) are called without a
shared_forwarder; each call creates its own VideoForwarder and both call recv(),
causing frames to be dequeued by only one consumer. Fix by wiring both
processors to the same forwarder: create or reuse a single
VideoForwarder/shared_forwarder (or use the Agent-managed forwarder) and pass
that shared_forwarder into both detection.process_video(...) and
segmentation.process_video(...) so detection and segmentation deterministically
see the same frames in order.
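
A sketch of the shared-forwarder wiring the comment asks for; the `shared_forwarder` parameter appears in the walkthrough's process_video signature, but the VideoForwarder constructor arguments shown here are assumptions:

```python
# Assumed constructor; the review only confirms VideoForwarder(..., fps=...) exists.
shared_forwarder = VideoForwarder(cat_video_track, fps=10)

seg_output_track = segmentation.publish_video_track()
detection.publish_video_track()

# Both processors read from one forwarder, so detection and segmentation
# observe the same frames in the same order instead of racing recv().
await detection.process_video(
    cat_video_track, "user_id", shared_forwarder=shared_forwarder
)
await segmentation.process_video(
    cat_video_track, "user_id", shared_forwarder=shared_forwarder
)
```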

Comment on lines +212 to +218
async def close(self) -> None:
    await self.stop_processing()
    self._closed = True
    self._executor.shutdown(wait=False)
    self._video_track.stop()
    self.unload()
    logger.info("Transformers segmentation processor closed")

⚠️ Potential issue | 🟠 Major

close() races with in-flight _segment calls.

self._executor.shutdown(wait=False) returns immediately, so a running _segment task keeps executing in the worker thread while self.unload() drops self._resources and calls gc.collect() / torch.cuda.empty_cache(). The in-flight call already captured resources locally on line 326 so it won't NPE, but tearing down CUDA caches while a worker thread is still doing model(**inputs) is a recipe for intermittent CUDA errors / device-side asserts in production shutdown paths. Consider shutdown(wait=True) (or awaiting the current future) before unload().

🛡️ Proposed fix
     async def close(self) -> None:
         await self.stop_processing()
         self._closed = True
-        self._executor.shutdown(wait=False)
+        # Drain in-flight inference before releasing GPU memory.
+        await asyncio.to_thread(self._executor.shutdown, True)
         self._video_track.stop()
         self.unload()
         logger.info("Transformers segmentation processor closed")

Same pattern exists in transformers_detection.py::close (line 189) — worth fixing both together.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 212 - 218, The close() method races with in-flight _segment calls
because self._executor.shutdown(wait=False) returns immediately while worker
threads may still be executing model(**inputs); change the shutdown logic in
transformers_segmentation.py::close (and the analogous close in
transformers_detection.py) to wait for running tasks to finish before unloading
resources: either call self._executor.shutdown(wait=True) or explicitly
await/track the current in-flight Future(s) and ensure they complete prior to
calling self.unload()/torch.cuda.empty_cache(), then set _closed and stop tracks
after shutdown to avoid tearing down CUDA while model execution is ongoing.
