
Add hf segmentation processor #498

Open

maxkahan wants to merge 3 commits into main from add-hf-segmentation

Conversation

maxkahan (Contributor) commented Apr 20, 2026

Summary by CodeRabbit

  • New Features

    • Local image/video segmentation with mask generation, configurable thresholds, optional annotation, and async agent integration.
    • Example CLI demonstrating agent-based segmentation workflows and call-join flow.
    • Public API additions for segmentation results/events and a new segmentation processor export.
  • Tests

    • New unit and integration tests covering segmentation inference, mask post-processing, annotation behavior, and detection→segmentation chaining.

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Adds a SAM2-based segmentation processor and associated event type, thread-safe model loading helpers, an example CLI demonstrating agent-based segmentation, and a comprehensive test suite for segmentation/detection integration and edge cases.

Changes

Cohort: Segmentation Processor Core
File(s): plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
Summary: New TransformersSegmentationProcessor: warmup/load SAM2 model, resolve device/dtype, attach to agent, subscribe to detections, run threaded per-frame segmentation, optional annotation, emit SegmentationCompletedEvent, and lifecycle (stop/close/unload).

Cohort: Event Types
File(s): plugins/huggingface/vision_agents/plugins/huggingface/events.py
Summary: Added SegmentedObject TypedDict and SegmentationCompletedEvent dataclass with per-object mask_area, masks, and image dimensions; updates detection_count in post-init.

Cohort: Module Exports
File(s): plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
Summary: Export list extended to include SegmentationCompletedEvent and TransformersSegmentationProcessor.

Cohort: Model Loading & Device Resolution
File(s): plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py, .../transformers_detection.py
Summary: Added resolve_device(...) helper and _model_load_lock to serialize from_pretrained loads; detection loader updated to use resolved device and perform model.to(device) inside the lock.

Cohort: Example Script
File(s): plugins/huggingface/examples/transformers/transformers_segmentation_example.py
Summary: New CLI example that constructs an Agent with segmentation and detection processors, subscribes to SegmentationCompletedEvent, creates/joins a call, and runs via Runner(AgentLauncher(...)).cli().

Cohort: Tests
File(s): plugins/huggingface/tests/test_transformers_segmentation.py
Summary: New test module with async fixtures, unit tests (mocked model/processor behavior, mask shapes, dtype/cpu checks, empty-detections behavior), and integration tests (warmup/detection→segmentation chaining, annotation effects, no-detections path).
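
For orientation, a minimal sketch of how these pieces compose, inferred from the summary above; the detection processor's class name and the Agent wiring are assumptions, not the example's exact code:

```python
# Sketch only: processor/agent constructor signatures are assumptions.
from vision_agents.plugins.huggingface import (
    SegmentationCompletedEvent,
    TransformersSegmentationProcessor,
)

segmentation = TransformersSegmentationProcessor(
    model="facebook/sam2.1-hiera-tiny",  # default per this PR
    fps=5,
    annotate=True,       # draw masks onto outgoing frames
    mask_opacity=0.35,
)

# The example pairs this with a detection processor (class name assumed):
# detection emits DetectionCompletedEvent; the segmenter consumes the boxes,
# prompts SAM2 with them, and emits SegmentationCompletedEvent.
#
# agent = Agent(processors=[detection, segmentation], ...)
#
# @agent.events.subscribe
# async def on_segmentation(event: SegmentationCompletedEvent) -> None:
#     for obj in event.objects:
#         print(obj["label"], obj["mask_area"])
```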

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant SegProc as TransformersSegmentationProcessor
    participant Track as QueuedVideoTrack
    participant Detector as DetectionProcessor / EventPublisher
    participant Model as SAM2 Model
    participant Events as EventManager

    Agent->>SegProc: process_video(track, shared_forwarder?)
    SegProc->>Detector: subscribe to DetectionCompletedEvent
    SegProc->>Track: attach _process_frame handler

    loop per frame
        Track->>SegProc: _process_frame(frame)
        SegProc->>SegProc: check resources & _latest_detections
        alt detections present
            SegProc->>Model: run inference on ThreadPool (_segment)
            Model-->>SegProc: masks, iou_scores
            SegProc->>SegProc: post-process masks, build SegmentedObject list
            alt annotate enabled
                SegProc->>SegProc: annotate image (async thread)
            end
            SegProc->>Events: emit SegmentationCompletedEvent(objects, masks, metadata)
            SegProc->>Track: append annotated/original frame
        else no detections
            SegProc->>Track: forward original frame
        end
    end

    Agent->>SegProc: close()
    SegProc->>SegProc: shutdown executor, unload model, clear caches

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I knifed the light; the masks bloom under skin—
the tiny dead geometry of an eye, precise, obscene.
Each pixel counts its small betrayals, a census of holes.
The model breathes and does not apologize, only names.
I keep watch while the silence learns its shapes.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 24.39%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Add hf segmentation processor' directly and clearly describes the main change: introducing a new TransformersSegmentationProcessor for the Hugging Face plugin.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py (2)

164-167: Hoist the transformers import to the module top.

Per the coding guidelines, avoid function-local imports. The transformers dependency is already handled by the optional-dependency try/except ImportError in vision_agents/plugins/huggingface/__init__.py, so nothing is gained by importing AutoModelForMaskGeneration / AutoProcessor lazily inside _load_model_sync.

♻️ Proposed change
 import torch
 from PIL import Image
+from transformers import AutoModelForMaskGeneration, AutoProcessor
 from vision_agents.core import Agent
@@
-    def _load_model_sync(self) -> SegmentationResources:
-        from transformers import AutoModelForMaskGeneration, AutoProcessor
-
-        device = self._resolve_device()
+    def _load_model_sync(self) -> SegmentationResources:
+        device = self._resolve_device()

As per coding guidelines: "Do not use local imports; import at the top of the module".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 164 - 167, Move the transformers imports out of the function and
into the module top: remove the local import of AutoModelForMaskGeneration and
AutoProcessor inside _load_model_sync and instead add "from transformers import
AutoModelForMaskGeneration, AutoProcessor" at the top of the file; keep using
the same symbols in the _load_model_sync method (and ensure any
optional-dependency try/except logic remains in
vision_agents/plugins/huggingface/__init__.py so no additional runtime checks
are needed).
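
The optional-dependency guard referenced above commonly follows this shape; a generic sketch of the pattern, not the plugin's actual __init__.py:

```python
# Generic optional-dependency pattern (sketch, not the actual file contents).
try:
    from .transformers_segmentation import TransformersSegmentationProcessor
except ImportError as exc:  # transformers/torch extras not installed
    _import_error = exc

    class TransformersSegmentationProcessor:  # type: ignore[no-redef]
        """Stub that fails loudly when the optional dependency is missing."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "TransformersSegmentationProcessor requires the huggingface "
                "extras (transformers/torch) to be installed"
            ) from _import_error
```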

309-319: masks=[m for m in masks] can just be list(masks).

masks here is already the (N, H, W) numpy array returned from _segment, and iterating it yields the per-object 2D mask arrays. The comprehension is equivalent to list(masks) but heavier-looking.

♻️ Proposed tweak
-                masks=[m for m in masks],
+                masks=list(masks),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 309 - 319, The code creates a list of masks using a comprehension
masks=[m for m in masks], which is redundant and less clear; replace that
comprehension with list(masks) when constructing the SegmentationCompletedEvent
in the call to SegmentationCompletedEvent (reference: variable masks, event
sender self.events.send, and class SegmentationCompletedEvent) so the masks are
converted to a list more succinctly and clearly (the upstream _segment return
remains unchanged).
plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py (1)

142-160: Consider also guarding AutoImageProcessor.from_pretrained with the same lock.

The comment on _model_load_lock warns that concurrent from_pretrained calls are not thread-safe. You serialize the model load on lines 146-148 but AutoImageProcessor.from_pretrained(...) on line 152 is still outside the lock. In practice image-processor loading is lightweight, but if two detection/segmentation processors warm up in parallel threads it's still a (small) concurrent from_pretrained call that could hit the same underlying hub/cache code paths.

♻️ Optional tightening
-        device = self._resolve_device()
-        with _model_load_lock:
-            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
-            model = model.to(device)
-
-        model.eval()
-
-        image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+        device = self._resolve_device()
+        with _model_load_lock:
+            model = AutoModelForObjectDetection.from_pretrained(self.model_id)
+            model = model.to(device)
+            image_processor = AutoImageProcessor.from_pretrained(self.model_id)
+
+        model.eval()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py`
around lines 142 - 160, In _load_model_sync move the
AutoImageProcessor.from_pretrained call inside the same _model_load_lock used
for AutoModelForObjectDetection so both from_pretrained calls are serialized;
specifically, acquire _model_load_lock around the model load
(AutoModelForObjectDetection.from_pretrained) and the image processor load
(AutoImageProcessor.from_pretrained) in the _load_model_sync method so the
hub/cache code paths are not invoked concurrently, then release the lock before
calling model.eval and returning DetectionResources (retain using
first_param.device/first_param.dtype).
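
For reference, a minimal sketch of the serialized-load pattern under discussion, using the names from the diff above; treat the function shape as illustrative rather than the plugin's actual loader:

```python
import threading

from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Module-level lock shared by the plugin's loaders; concurrent
# from_pretrained() calls can race on the same hub/cache code paths.
_model_load_lock = threading.Lock()


def load_detection_resources(model_id: str, device: str = "cpu"):
    """Illustrative loader: both from_pretrained calls run under one lock."""
    with _model_load_lock:
        model = AutoModelForObjectDetection.from_pretrained(model_id).to(device)
        image_processor = AutoImageProcessor.from_pretrained(model_id)
    model.eval()
    return model, image_processor
```
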
plugins/huggingface/vision_agents/plugins/huggingface/events.py (1)

70-88: Prefer a concrete mask type over list[Any].

The docstring already commits to "numpy arrays... of shape (H, W)", and the processor emits exactly that (see transformers_segmentation.py lines 373, 391). Tightening the annotation to list[np.ndarray] gives callers proper type info and aligns with the coding guideline to avoid Any in annotations.

♻️ Proposed refactor
+import numpy as np
 from dataclasses import dataclass, field
 from typing import Any, Optional, TypedDict
@@
-    objects: list[SegmentedObject] = field(default_factory=list)
-    masks: list[Any] = field(default_factory=list)
+    objects: list[SegmentedObject] = field(default_factory=list)
+    masks: list[np.ndarray] = field(default_factory=list)

As per coding guidelines: "Avoid using Any type in type annotations".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/vision_agents/plugins/huggingface/events.py` around lines
70 - 88, The masks field in SegmentationCompletedEvent is annotated as list[Any]
but the docstring and the producer (transformers_segmentation) emit numpy arrays
of shape (H, W); change the annotation on SegmentationCompletedEvent.masks from
list[Any] to list[np.ndarray] (import numpy as np if necessary) so callers get
concrete typing and update any related type imports/usages to avoid Any while
preserving default_factory=list and runtime behavior.
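
Applied, the tightened annotation would look roughly like this sketch; the SegmentedObject fields are taken from the processor code quoted later in this review, and other event fields (image dimensions, inference_time_ms) are omitted for brevity:

```python
from dataclasses import dataclass, field
from typing import TypedDict

import numpy as np


class SegmentedObject(TypedDict):
    label: str
    confidence: float
    x1: int
    y1: int
    x2: int
    y2: int
    mask_area: int


@dataclass
class SegmentationCompletedEvent:
    # Concrete typing instead of list[Any]: each mask is an (H, W) numpy array.
    objects: list[SegmentedObject] = field(default_factory=list)
    masks: list[np.ndarray] = field(default_factory=list)
```
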
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ece20194-20e8-4ed6-877c-feebe480a5e9

📥 Commits

Reviewing files that changed from the base of the PR and between d887697 and 59048cc.

📒 Files selected for processing (7)
  • plugins/huggingface/examples/transformers/transformers_segmentation_example.py
  • plugins/huggingface/tests/test_transformers_segmentation.py
  • plugins/huggingface/vision_agents/plugins/huggingface/__init__.py
  • plugins/huggingface/vision_agents/plugins/huggingface/events.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py

Comment on line 7

import io
import os
import pathlib
from unittest.mock import MagicMock

coderabbitai Bot commented Apr 20, 2026

⚠️ Potential issue | 🟠 Major

unittest.mock.MagicMock usage contradicts the repo's testing rules.

Every MagicMock(...) in this module (the agent_mock fixture, the _make_resources helper, and all tests inside TestSegmentWithMockedModel / TestSegmentationAnnotation that rely on mocked HF model/processor return values) violates the testing guideline. Since SAM2 output shapes are quite specific, the cleanest replacement is a hand-rolled fake class exposing .pred_masks, .iou_scores, .post_process_masks(...) — no MagicMock, just small deterministic Python objects — or moving these cases under @pytest.mark.integration and running the real tiny model as the integration block already does.

As per coding guidelines: "Framework: pytest. Never mock." and "Never mock in tests; use pytest for testing".

Also applies to: 47-50, 59-163

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` at line 7,
Replace all uses of unittest.mock.MagicMock in this module (notably the
agent_mock fixture, the _make_resources helper, and tests in
TestSegmentWithMockedModel and TestSegmentationAnnotation) with either: (A) a
small deterministic fake class that mimics the SAM2 model/processor API by
exposing attributes and methods .pred_masks, .iou_scores, and
post_process_masks(...) returning correctly shaped arrays/values matching SAM2
expectations; or (B) convert those tests to `@pytest.mark.integration` and run the
real tiny HF/SAM2 model as the existing integration test does. Update places
referencing agent_mock and _make_resources to instantiate the fake object (or
call the real model) so tests no longer import or use MagicMock.
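
Option (A) might look like the following: small deterministic fakes whose output shapes mirror the `_make_resources` mock quoted below. Names like FakeSamModel are illustrative, and only `multimask_output=False` single-mask outputs are modeled:

```python
import torch


class FakeSamOutputs:
    """Deterministic stand-in for the model's output object."""

    def __init__(self, num_objects: int):
        self.pred_masks = torch.zeros(1, num_objects, 1, 256, 256)
        self.iou_scores = torch.full((1, num_objects, 1), 0.9)


class FakeSamModel:
    """Callable fake mirroring model(**inputs, multimask_output=False)."""

    def __init__(self, num_objects: int):
        self.num_objects = num_objects

    def __call__(self, *args, **kwargs) -> FakeSamOutputs:
        return FakeSamOutputs(self.num_objects)


class FakeSamProcessor:
    """Fake mirroring the processor call and post_process_masks()."""

    def __init__(self, num_objects: int, img_h: int = 480, img_w: int = 640):
        self.num_objects, self.img_h, self.img_w = num_objects, img_h, img_w

    def __call__(self, images, input_boxes, return_tensors="pt"):
        return {
            "pixel_values": torch.zeros(1, 3, 1024, 1024),
            "original_sizes": torch.tensor([[self.img_h, self.img_w]]),
            "reshaped_input_sizes": torch.tensor([[1024, 1024]]),
            "input_boxes": torch.zeros(1, self.num_objects, 4),
        }

    def post_process_masks(self, masks, original_sizes, **kwargs):
        # One (num_objects, 1, H, W) boolean tensor per image in the batch.
        return [
            torch.ones(self.num_objects, 1, self.img_h, self.img_w, dtype=torch.bool)
        ]
```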

coderabbitai Bot: Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +59 to +194
class TestSegmentWithMockedModel:
    """Test _segment() with mocked SAM model and processor."""

    def _make_resources(
        self, num_objects: int = 2, img_h: int = 480, img_w: int = 640
    ) -> SegmentationResources:
        model = MagicMock()
        pred_masks = torch.randn(1, num_objects, 1, 256, 256)
        iou_scores = torch.tensor([[[0.85], [0.92]]])[:, :num_objects, :]
        model.return_value = MagicMock(pred_masks=pred_masks, iou_scores=iou_scores)

        processor = MagicMock()
        processor.return_value = {
            "pixel_values": torch.zeros(1, 3, 1024, 1024),
            "original_sizes": torch.tensor([[img_h, img_w]]),
            "reshaped_input_sizes": torch.tensor([[1024, 1024]]),
            "input_boxes": torch.zeros(1, num_objects, 4),
        }
        binary_masks = torch.ones(num_objects, 1, img_h, img_w, dtype=torch.bool)
        processor.post_process_masks.return_value = [binary_masks]

        return SegmentationResources(
            model=model,
            processor=processor,
            device=torch.device("cpu"),
            dtype=torch.float32,
        )

    def test_segment_returns_objects_and_masks(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, FAKE_DETECTIONS)

        assert len(objects) == 2
        assert objects[0]["label"] == "cat"
        assert objects[1]["label"] == "dog"
        assert objects[0]["confidence"] == 0.85
        assert objects[1]["confidence"] == 0.92
        assert objects[0]["mask_area"] == 480 * 640
        assert masks.shape == (2, 480, 640)

    def test_segment_passes_correct_boxes_to_processor(self):
        resources = self._make_resources(num_objects=2)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS)

        call_kwargs = resources.processor.call_args
        input_boxes = call_kwargs.kwargs.get(
            "input_boxes", call_kwargs.args[0] if call_kwargs.args else None
        )
        assert input_boxes == [[[10, 20, 100, 200], [150, 60, 300, 250]]]

    def test_segment_extracts_sizes_before_device_move(self):
        resources = self._make_resources(num_objects=1)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        post_process_call = resources.processor.post_process_masks.call_args
        original_sizes = post_process_call.args[1]
        assert isinstance(original_sizes, torch.Tensor)
        assert original_sizes.device == torch.device("cpu")

    def test_segment_casts_float_inputs_to_model_dtype(self):
        resources = self._make_resources(num_objects=1)
        resources.dtype = torch.float16
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        processor._segment(image, FAKE_DETECTIONS[:1])

        call_kwargs = resources.model.call_args
        pixel_values = call_kwargs.kwargs.get(
            "pixel_values", call_kwargs.args[0] if call_kwargs.args else None
        )
        if pixel_values is not None and isinstance(pixel_values, torch.Tensor):
            assert pixel_values.dtype == torch.float16

    def test_segment_empty_detections(self):
        resources = self._make_resources(num_objects=0)
        processor = TransformersSegmentationProcessor()
        processor.on_warmed_up(resources)

        image = np.zeros((480, 640, 3), dtype=np.uint8)
        objects, masks = processor._segment(image, [])

        assert objects == []
        resources.model.assert_not_called()

    def test_set_detections_updates_latest(self):
        processor = TransformersSegmentationProcessor()
        assert processor._latest_detections == []

        processor.set_detections(FAKE_DETECTIONS)
        assert len(processor._latest_detections) == 2
        assert processor._latest_detections[0]["label"] == "cat"


class TestSegmentationAnnotation:
    """Test that annotation draws masks on the image."""

    def test_annotate_changes_image(self):
        processor = TransformersSegmentationProcessor(mask_opacity=0.5)

        image = np.full((100, 100, 3), 128, dtype=np.uint8)
        objects = [
            FAKE_DETECTIONS[0],
        ]
        masks = np.ones((1, 100, 100), dtype=bool)

        annotated = processor._annotate(image, objects, masks)

        assert annotated.shape == image.shape
        assert (annotated != image).any()


SEGMENTATION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_SEGMENTATION_MODEL", "facebook/sam2-hiera-tiny"
)
DETECTION_MODEL_ID = os.getenv(
    "TRANSFORMERS_TEST_DETECTION_MODEL", "PekingU/rtdetr_v2_r18vd"
)


@pytest.mark.integration
@skip_blockbuster
class TestTransformersSegmentationProcessor:

🛠️ Refactor suggestion | 🟠 Major

Consolidate unit tests under TestTransformersSegmentationProcessor.

Unit tests for TransformersSegmentationProcessor are currently split across TestSegmentWithMockedModel, TestSegmentationAnnotation, and the integration-marked TestTransformersSegmentationProcessor. The guideline asks for a single class per subject under test. Suggest merging the unit cases into TestTransformersSegmentationProcessor (using nested classes or method naming to distinguish segment / annotation / integration groups) and keeping @pytest.mark.integration at the method level where appropriate instead of at class level.

As per coding guidelines: "Keep unit tests for the class under the same test class. For example, tests for Agent must be inside TestAgent. Do not spread them around different test classes".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 59
- 194, Move all unit tests from TestSegmentWithMockedModel and
TestSegmentationAnnotation into the TestTransformersSegmentationProcessor test
class so all unit tests for TransformersSegmentationProcessor live in one place;
specifically relocate methods _make_resources,
test_segment_returns_objects_and_masks,
test_segment_passes_correct_boxes_to_processor,
test_segment_extracts_sizes_before_device_move,
test_segment_casts_float_inputs_to_model_dtype, test_segment_empty_detections,
test_set_detections_updates_latest, and test_annotate_changes_image into
TestTransformersSegmentationProcessor (you can use nested classes or method name
prefixes to separate "segment" vs "annotation" groups), remove the now-empty
TestSegmentWithMockedModel and TestSegmentationAnnotation classes, and ensure
the class-level `@pytest.mark.integration` is removed and any integration tests
keep `@pytest.mark.integration` at the method level only.

Comment on lines +268 to +295
async def test_no_detections_passes_through(self, agent_mock, events_manager):
    """Without detection events, frames pass through unmodified."""
    segmentation = TransformersSegmentationProcessor(
        model=SEGMENTATION_MODEL_ID, fps=10
    )
    await segmentation.warmup()
    segmentation.attach_agent(agent_mock)

    seg_future: asyncio.Future[SegmentationCompletedEvent] = asyncio.Future()

    @events_manager.subscribe
    async def on_seg_event(event: SegmentationCompletedEvent):
        if not seg_future.done():
            seg_future.set_result(event)

    input_track = QueuedVideoTrack()
    original_frame = await input_track.recv()
    assert isinstance(original_frame, VideoFrame)

    output_track = segmentation.publish_video_track()
    await segmentation.process_video(input_track, "user_id")

    with pytest.raises(asyncio.TimeoutError):
        await asyncio.wait_for(seg_future, 5)

    output_frame = await output_track.recv()
    assert isinstance(output_frame, VideoFrame)
    assert (original_frame.to_ndarray() == output_frame.to_ndarray()).all()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm QueuedVideoTrack.recv() behavior on an empty queue.
fd -t f video_track.py
rg -nP -C10 '(class QueuedVideoTrack|async def recv)' --type=py

Repository: GetStream/Vision-Agents

Length of output: 25353


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,85p'

Repository: GetStream/Vision-Agents

Length of output: 1026


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,45p'

Repository: GetStream/Vision-Agents

Length of output: 750


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '29,52p'

Repository: GetStream/Vision-Agents

Length of output: 1111


🏁 Script executed:

cat -n agents-core/vision_agents/core/utils/video_track.py | sed -n '63,95p'

Repository: GetStream/Vision-Agents

Length of output: 1297


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '29,38p'

Repository: GetStream/Vision-Agents

Length of output: 565


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,295p'

Repository: GetStream/Vision-Agents

Length of output: 1436


🏁 Script executed:

cat -n plugins/huggingface/tests/test_transformers_segmentation.py | sed -n '268,300p'

Repository: GetStream/Vision-Agents

Length of output: 1529


test_no_detections_passes_through compares empty placeholder frames instead of real video content.

QueuedVideoTrack() on line 283 is created empty. When await input_track.recv() is called on line 284, it will wait for up to 1 second for a frame, then catch the timeout and return self.last_frame—which is initialized as an empty blue placeholder frame, not real input data.

The test does not hang, but original_frame becomes this placeholder rather than actual video content. Comparing this empty frame at line 295 does not test whether frames pass through unmodified. Add frames to input_track before processing (following the pattern from cat_video_track), or synthesize and add a real VideoFrame to the track.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 268
- 295, The test creates a QueuedVideoTrack but never injects a real frame, so
original_frame is the default placeholder; fix by adding a real VideoFrame to
input_track before calling segmentation.process_video (follow the existing
cat_video_track pattern or synthesize a VideoFrame and push it into
QueuedVideoTrack via its add/enqueue method), then call await input_track.recv()
to capture that real frame and continue the existing assertions (references:
test_no_detections_passes_through, QueuedVideoTrack, input_track.recv(),
segmentation.process_video, output_track.recv()).
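
A hedged sketch of the suggested fix; the `add_frame` method name is hypothetical (use whatever enqueue method QueuedVideoTrack actually exposes), and the PyAV-style frame constructor is an assumption based on the `to_ndarray()` calls in this test:

```python
import av  # assumption: VideoFrame here is PyAV-compatible (it has to_ndarray())
import numpy as np

# Inside the async test, before calling segmentation.process_video:
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
rgb[:, :, 1] = 255  # solid green, easy to tell apart from the blue placeholder
frame = av.VideoFrame.from_ndarray(rgb, format="rgb24")

input_track = QueuedVideoTrack()
input_track.add_frame(frame)  # hypothetical enqueue method; see note above

original_frame = await input_track.recv()  # now the real frame, not the placeholder
```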

Comment on lines +106 to +120
def __init__(
    self,
    model: str = "facebook/sam2.1-hiera-tiny",
    fps: int = 5,
    device: DeviceType = "auto",
    torch_dtype: TorchDtypeType = "auto",
    annotate: bool = True,
    mask_opacity: float = 0.35,
    mask_threshold: float = 0.0,
):
    if not 0 <= mask_opacity <= 1.0:
        raise ValueError("mask_opacity must be between 0 and 1.")

    self.model_id = model
    self.fps = fps

⚠️ Potential issue | 🟡 Minor

Also validate fps.

mask_opacity is range-checked in the constructor, but fps flows directly into QueuedVideoTrack(fps=self.fps, max_queue_size=self.fps) and VideoForwarder(..., fps=self.fps). A zero or negative fps will silently produce a broken track / zero-sized queue. A one-line guard mirrors the existing input validation style.

🛡️ Proposed guard
         if not 0 <= mask_opacity <= 1.0:
             raise ValueError("mask_opacity must be between 0 and 1.")
+        if fps <= 0:
+            raise ValueError("fps must be positive.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def __init__(
     self,
     model: str = "facebook/sam2.1-hiera-tiny",
     fps: int = 5,
     device: DeviceType = "auto",
     torch_dtype: TorchDtypeType = "auto",
     annotate: bool = True,
     mask_opacity: float = 0.35,
     mask_threshold: float = 0.0,
 ):
     if not 0 <= mask_opacity <= 1.0:
         raise ValueError("mask_opacity must be between 0 and 1.")
+    if fps <= 0:
+        raise ValueError("fps must be positive.")
     self.model_id = model
     self.fps = fps
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 106 - 120, The constructor currently validates mask_opacity but not
fps; add a one-line guard in __init__ (in transformers_segmentation.py) to
validate fps (e.g. if not isinstance(fps, int) or fps <= 0: raise
ValueError("fps must be a positive integer")) before assigning self.fps so
invalid values can't flow into QueuedVideoTrack(fps=self.fps,
max_queue_size=self.fps) or VideoForwarder(..., fps=self.fps).

Comment on lines +237 to +243
def attach_agent(self, agent: Agent) -> None:
    self._events = agent.events
    self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

    @self._events.subscribe
    async def on_detection(event: DetectionCompletedEvent) -> None:
        self._latest_detections = list(event.objects)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check what unsubscribe mechanism EventManager offers.
fd -t f events.py --full-path 'agents-core'
rg -nP -C3 '(def subscribe|def unsubscribe|def remove_handler)' --type=py -g '**/events.py' -g '**/event_manager*.py'

Repository: GetStream/Vision-Agents

Length of output: 396


🏁 Script executed:

cat -n agents-core/vision_agents/core/agents/events.py | head -100

Repository: GetStream/Vision-Agents

Length of output: 2219


🏁 Script executed:

rg -n "class EventManager|def subscribe|def unsubscribe|def remove" agents-core/vision_agents/core/agents/events.py -A 5

Repository: GetStream/Vision-Agents

Length of output: 49


🏁 Script executed:

cat -n plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '200,300p'

Repository: GetStream/Vision-Agents

Length of output: 4480


🏁 Script executed:

rg "class EventManager" --type=py -B 2 -A 15

Repository: GetStream/Vision-Agents

Length of output: 1564


🏁 Script executed:

fd -type f "event_manager.py"

Repository: GetStream/Vision-Agents

Length of output: 237


🏁 Script executed:

rg "def subscribe|def unsubscribe" --type=py -B 1 -A 10 agents-core/vision_agents/core/agents/events.py

Repository: GetStream/Vision-Agents

Length of output: 49


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | head -200

Repository: GetStream/Vision-Agents

Length of output: 7296


🏁 Script executed:

rg "def subscribe|def unsubscribe|def remove_handler|def unregister_handler" agents-core/vision_agents/core/events/manager.py -A 8

Repository: GetStream/Vision-Agents

Length of output: 682


🏁 Script executed:

rg "def subscribe|def unsubscribe" agents-core/vision_agents/core/events/manager.py -A 25

Repository: GetStream/Vision-Agents

Length of output: 1940


🏁 Script executed:

rg "def subscribe" agents-core/vision_agents/core/events/manager.py -A 40 | head -60

Repository: GetStream/Vision-Agents

Length of output: 1712


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | sed -n '290,360p'

Repository: GetStream/Vision-Agents

Length of output: 2754


🏁 Script executed:

rg "@self._events.subscribe|@.*\.events\.subscribe" plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py -B 2 -A 5

Repository: GetStream/Vision-Agents

Length of output: 1146


🏁 Script executed:

cat agents-core/vision_agents/core/events/manager.py | sed -n '340,380p'

Repository: GetStream/Vision-Agents

Length of output: 1802


🏁 Script executed:

cat plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py | sed -n '1,50p'

Repository: GetStream/Vision-Agents

Length of output: 1630


on_detection handler should be unsubscribed in close().

The attach_agent method registers an event handler as a local closure that captures self. Since EventManager.subscribe() returns the function, you can store the reference and call self._events.unsubscribe(handler) in close(). Without explicit unsubscription, the handler remains in the event manager's handler list indefinitely, preventing garbage collection of the processor and its resources if the agent outlives this processor instance (e.g., hot-reload or re-attach scenarios).

Although _process_frame includes a _closed guard, this doesn't remove the handler from memory—it only prevents it from processing. Store the handler reference during attach_agent and unsubscribe it in close():

Fix approach
def attach_agent(self, agent: Agent) -> None:
    self._events = agent.events
    self._events.register(DetectionCompletedEvent, SegmentationCompletedEvent)

    @self._events.subscribe
    async def on_detection(event: DetectionCompletedEvent) -> None:
        self._latest_detections = list(event.objects)
    
    self._on_detection_handler = on_detection

async def close(self) -> None:
    await self.stop_processing()
    if self._on_detection_handler is not None:
        self._events.unsubscribe(self._on_detection_handler)
    self._closed = True
    # ... rest of close() ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 237 - 243, attach_agent defines an on_detection closure via
self._events.subscribe that captures self but is never unsubscribed, so store
the returned handler reference (e.g., self._on_detection_handler) when
subscribing in attach_agent and then unsubscribe it in close by calling
self._events.unsubscribe(self._on_detection_handler) (ensure you check for None
and that self._events is set), before setting self._closed; reference symbols:
attach_agent, on_detection, self._on_detection_handler, close,
self._events.unsubscribe.

Comment on lines +331 to +391
def _segment(
    self,
    image: np.ndarray,
    detections: list[DetectedObject],
) -> tuple[list[SegmentedObject], np.ndarray]:
    """Run segmentation on a single frame (called in thread pool)."""
    resources = self._resources
    if resources is None or not detections:
        return [], np.empty(0)

    pil_image = Image.fromarray(image)
    boxes = [[d["x1"], d["y1"], d["x2"], d["y2"]] for d in detections]

    inputs = resources.processor(
        images=pil_image, input_boxes=[boxes], return_tensors="pt"
    )

    original_sizes = inputs.pop("original_sizes")
    reshaped_input_sizes = inputs.pop("reshaped_input_sizes")

    inputs = {
        k: (
            v.to(device=resources.device, dtype=resources.dtype)
            if v.is_floating_point()
            else v.to(device=resources.device)
        )
        if isinstance(v, torch.Tensor)
        else v
        for k, v in inputs.items()
    }

    with torch.no_grad():
        outputs = resources.model(**inputs, multimask_output=False)

    masks_tensor = resources.processor.post_process_masks(
        outputs.pred_masks.cpu(),
        original_sizes,
        reshaped_input_sizes=reshaped_input_sizes,
        binarize=True,
        mask_threshold=self._mask_threshold,
    )

    per_object_masks = masks_tensor[0].squeeze(1).numpy()
    iou_scores = outputs.iou_scores[0].squeeze(-1).cpu()

    objects: list[SegmentedObject] = []
    for i, detection in enumerate(detections):
        mask = per_object_masks[i]
        objects.append(
            SegmentedObject(
                label=detection["label"],
                confidence=round(iou_scores[i].item(), 4),
                x1=detection["x1"],
                y1=detection["y1"],
                x2=detection["x2"],
                y2=detection["y2"],
                mask_area=int(mask.sum()),
            )
        )

    return objects, per_object_masks

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Do Hugging Face SAM-HQ and EdgeTAM checkpoints return pred_masks and iou_scores in the same shape as SAM/SAM2 via AutoModelForMaskGeneration?

💡 Result:

No. Hugging Face SAM-HQ and EdgeTAM checkpoints do not return pred_masks and iou_scores in exactly the same shape as SAM/SAM2 via AutoModelForMaskGeneration.

  • SAM/SAM2 (SamModel/Sam2Model): pred_masks shape (batch_size, num_masks, height, width), iou_scores (batch_size, num_masks). For a single point prompt, num_masks=3 (or 1 if multimask_output=False); no point_batch_size dimension.
  • SAM-HQ (SamHQModel): similar usage and code examples (outputs.pred_masks, outputs.iou_scores post-processed identically), but the source code shows potential for (batch_size, num_prompts, num_masks, height, width) due to HQ token modifications. Docs/examples treat it as compatible.
  • EdgeTAM (EdgeTamModel): pred_masks (batch_size, point_batch_size, num_masks, height, width), iou_scores (batch_size, point_batch_size, num_masks). The explicit point_batch_size dimension differs from SAM/SAM2 for batched points.

All use compatible processors (SamProcessor/Sam2Processor) and post_process_masks, but raw output shapes vary due to architectural differences (SAM-HQ enhancements, EdgeTAM's perceiver-based SAM2 adaptation). For single prompts, the post-processed masks/iou_scores align, but the raw pred_masks/iou_scores shapes differ for EdgeTAM.


_segment assumes SAM/SAM2 output shapes — not compatible with all advertised models.

The code directly indexes raw outputs — outputs.pred_masks[0], masks_tensor[0].squeeze(1), and outputs.iou_scores[0].squeeze(-1) — which assumes batch-first layout matching SAM/SAM2. However, EdgeTAM outputs pred_masks with shape (batch_size, point_batch_size, num_masks, H, W) and iou_scores with (batch_size, point_batch_size, num_masks) due to its perceiver-based architecture, introducing an extra point_batch_size dimension that breaks the squeeze chain. Since the docstring advertises support for "SAM2, SAM-HQ, EdgeTAM, etc.", either narrow the docstring to SAM/SAM-HQ only or add a defensive shape assertion before indexing/squeezing to clarify supported models and fail clearly on unsupported variants.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 331 - 391, The _segment function assumes SAM/SAM2 output shapes and
currently indexes outputs.pred_masks, masks_tensor, and outputs.iou_scores as if
they are (batch, num_masks, H, W) and (batch, num_masks) respectively; add a
defensive shape check after running the model (inspect outputs.pred_masks.shape
and outputs.iou_scores.shape and masks_tensor.shape) to assert they match the
expected 4D/2D layouts (e.g., batch_size x num_masks x H x W and batch_size x
num_masks) and raise a clear error mentioning _segment and the model class if an
unexpected extra dimension like point_batch_size is present (suggested message:
unsupported model output layout for EdgeTAM/perceiver-based models); update the
docstring for _segment to state explicitly which model output layouts are
supported (SAM/SAM2/SAM-HQ) so callers know the compatibility.
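
For concreteness, a minimal sketch of such a guard, written against the tensors already named in `_segment` above; the exact condition and message are suggestions, not the PR's code:

```python
# After post_process_masks() in _segment: masks_tensor[0] should be
# (num_objects, 1, H, W) for SAM/SAM2/SAM-HQ with multimask_output=False.
per_object = masks_tensor[0]
if per_object.ndim != 4 or per_object.shape[1] != 1:
    raise ValueError(
        f"_segment: unsupported mask layout {tuple(per_object.shape)} from "
        f"{type(resources.model).__name__}; expected (num_objects, 1, H, W). "
        "EdgeTAM/perceiver-based models add a point_batch_size dimension "
        "and are not supported."
    )
```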


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/huggingface/tests/test_transformers_segmentation.py (1)

231-232: Redundant is not None on inference_time_ms.

SegmentationCompletedEvent.inference_time_ms is a float populated unconditionally in _process_frame, so assert result.inference_time_ms is not None followed by > 0 is slightly belt-and-braces. Dropping the first line keeps the intent (> 0) and avoids implying the field is nullable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 231
- 232, Remove the redundant null-check assertion on inference_time_ms: in the
test in plugins/huggingface/tests/test_transformers_segmentation.py drop the
line asserting result.inference_time_ms is not None and keep only the numeric
check (assert result.inference_time_ms > 0), since
SegmentationCompletedEvent.inference_time_ms is always populated by
_process_frame.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94525557-3869-47d2-9175-b962e52cab37

📥 Commits

Reviewing files that changed from the base of the PR and between 59048cc and 1fe0f14.

📒 Files selected for processing (4)
  • plugins/huggingface/tests/test_transformers_segmentation.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_detection.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/huggingface/vision_agents/plugins/huggingface/transformers_llm.py

Comment on lines +222 to +226
seg_output_track = segmentation.publish_video_track()
detection.publish_video_track()
await detection.process_video(cat_video_track, "user_id")
await segmentation.process_video(cat_video_track, "user_id")


⚠️ Potential issue | 🟡 Minor

Two forwarders racing on one cat_video_track.

detection.process_video(cat_video_track, ...) and segmentation.process_video(cat_video_track, ...) are called without a shared_forwarder, so each processor spins up its own VideoForwarder, and both end up calling recv() on the same QueuedVideoTrack. Since recv() dequeues, every frame goes to only one of the two forwarders — detection and segmentation see disjoint interleavings of the 100 queued frames. The test likely still passes because 100 frames is enough for both to eventually process one each, but the cache-and-chain contract (detection produces a box, then the segmenter segments that same frame) isn't actually being exercised deterministically. Consider sharing a single VideoForwarder between the two processors, or wiring them through the Agent-managed forwarder the way production does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/huggingface/tests/test_transformers_segmentation.py` around lines 222
- 226, The test races two independent forwarders over the same QueuedVideoTrack
because detection.process_video(cat_video_track, ...) and
segmentation.process_video(cat_video_track, ...) are called without a
shared_forwarder; each call creates its own VideoForwarder and both call recv(),
causing frames to be dequeued by only one consumer. Fix by wiring both
processors to the same forwarder: create or reuse a single
VideoForwarder/shared_forwarder (or use the Agent-managed forwarder) and pass
that shared_forwarder into both detection.process_video(...) and
segmentation.process_video(...) so detection and segmentation deterministically
see the same frames in order.
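
A sketch of the shared-forwarder wiring the comment asks for; the `shared_forwarder` parameter appears in the walkthrough's process_video signature, but the VideoForwarder constructor arguments shown here are assumptions:

```python
# Assumed constructor; the review only confirms VideoForwarder(..., fps=...) exists.
shared_forwarder = VideoForwarder(cat_video_track, fps=10)

seg_output_track = segmentation.publish_video_track()
detection.publish_video_track()

# Both processors read from one forwarder, so detection and segmentation
# observe the same frames in the same order instead of racing recv().
await detection.process_video(
    cat_video_track, "user_id", shared_forwarder=shared_forwarder
)
await segmentation.process_video(
    cat_video_track, "user_id", shared_forwarder=shared_forwarder
)
```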

Comment on lines +212 to +218
async def close(self) -> None:
    await self.stop_processing()
    self._closed = True
    self._executor.shutdown(wait=False)
    self._video_track.stop()
    self.unload()
    logger.info("Transformers segmentation processor closed")

⚠️ Potential issue | 🟠 Major

close() races with in-flight _segment calls.

self._executor.shutdown(wait=False) returns immediately, so a running _segment task keeps executing in the worker thread while self.unload() drops self._resources and calls gc.collect() / torch.cuda.empty_cache(). The in-flight call already captured resources locally on line 326 so it won't NPE, but tearing down CUDA caches while a worker thread is still doing model(**inputs) is a recipe for intermittent CUDA errors / device-side asserts in production shutdown paths. Consider shutdown(wait=True) (or awaiting the current future) before unload().

🛡️ Proposed fix
     async def close(self) -> None:
         await self.stop_processing()
         self._closed = True
-        self._executor.shutdown(wait=False)
+        # Drain in-flight inference before releasing GPU memory.
+        await asyncio.to_thread(self._executor.shutdown, True)
         self._video_track.stop()
         self.unload()
         logger.info("Transformers segmentation processor closed")

Same pattern exists in transformers_detection.py::close (line 189) — worth fixing both together.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@plugins/huggingface/vision_agents/plugins/huggingface/transformers_segmentation.py`
around lines 212 - 218, The close() method races with in-flight _segment calls
because self._executor.shutdown(wait=False) returns immediately while worker
threads may still be executing model(**inputs); change the shutdown logic in
transformers_segmentation.py::close (and the analogous close in
transformers_detection.py) to wait for running tasks to finish before unloading
resources: either call self._executor.shutdown(wait=True) or explicitly
await/track the current in-flight Future(s) and ensure they complete prior to
calling self.unload()/torch.cuda.empty_cache(), then set _closed and stop tracks
after shutdown to avoid tearing down CUDA while model execution is ongoing.
