feat(encoder): add Qwen3-VL vision encoder modeling #729

sungwook-son wants to merge 2 commits into ai-dynamo:main from …
Conversation
Walkthrough: This pull request adds comprehensive Vision-Language (VL) model support by introducing a vision encoder infrastructure with configurable …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)

⚠️ Review ran into problems: timed out fetching pipeline failures after 30000ms.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
collector/trtllm/collect_attn.py (1)
298-316: ⚠️ Potential issue | 🟡 Minor

Use `h` in the INT32 overflow guard to avoid under-collecting `head_dim=72` cases. At line 315, the guard still uses a fixed `128`, so the newly added `72` configs can be skipped unnecessarily.

Suggested fix

```diff
- if b * s * num_kv_heads * 128 * 2 >= 2147483647:
+ if b * s * num_kv_heads * h * 2 >= 2147483647:
      continue
```
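The guard's purpose is keeping buffer element counts under INT32; a minimal sketch of why the hard-coded 128 over-triggers for `head_dim=72` (variable names follow the collector's loops, and the buffer-size formula is the one quoted in the comment — the concrete sizes are illustrative):

```python
INT32_MAX = 2_147_483_647

def overflows_int32(b: int, s: int, num_kv_heads: int, h: int) -> bool:
    # Element count of a K/V buffer: batch * seq * kv_heads * head_dim * 2.
    return b * s * num_kv_heads * h * 2 >= INT32_MAX

# With the hard-coded 128, this head_dim=72 case is judged by the wrong size
# and skipped, even though its real element count fits in INT32.
b, s, num_kv_heads = 8, 131072, 8
print(overflows_int32(b, s, num_kv_heads, 128))  # True  -> case skipped
print(overflows_int32(b, s, num_kv_heads, 72))   # False -> case should run
```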
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@collector/trtllm/collect_attn.py` around lines 298-316, the INT32 overflow check multiplies by a hard-coded 128, which ignores the loop-variable head size and causes configurations with head_dim=72 to be skipped; update the overflow guard in the nested loops (the block iterating head_dim, n, s, b, n_kv) to use the loop variable h instead of the constant 128 (i.e., b * s * num_kv_heads * h * 2 >= 2147483647) so all head_dim values are evaluated correctly.

src/aiconfigurator/webapp/events/event_fn.py (1)
230-236: ⚠️ Potential issue | 🟡 Minor

Keep the error message on the summary output. After the new encoder box was inserted, output #1 is the summary and output #2 is the encoder breakdown, but the error tuple still puts "ERROR!!!" in slot 2. On failures the summary box stays blank and the encoder box shows the error instead.

Suggested fix

```diff
  if is_error:
      return (
-         gr.update(value=""),
-         gr.update(value="ERROR!!!"),
+         gr.update(value="ERROR!!!"),
+         gr.update(value=""),
          gr.update(value=""),
          gr.update(value=""),
          gr.update(value=record_df),
          gr.update(value=stdout_text + stderr_text + traceback_log),
      )
```
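The intended slot ordering can be sketched without Gradio (the six-slot wiring — summary, encoder breakdown, two blanks, record table, logs — is read off the review comment; the function name is hypothetical):

```python
def build_error_outputs(record_df, stdout_text, stderr_text, traceback_log):
    # Slot 1 is the summary box, slot 2 the encoder breakdown; on error the
    # summary should carry the marker and the encoder box should stay blank.
    return (
        "ERROR!!!",  # summary
        "",          # encoder breakdown
        "",
        "",
        record_df,
        stdout_text + stderr_text + traceback_log,
    )

outputs = build_error_outputs([], "out", "err", "tb")
print(outputs[0], outputs[1] == "")  # ERROR!!! True
```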
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/webapp/events/event_fn.py` around lines 230-236, the tuple of gr.update outputs puts the error message ("ERROR!!!") into the second output (encoder) rather than the first (summary); reorder the gr.update(...) entries in the return tuple so the summary output is populated with the error and the encoder breakdown output receives an empty string.

src/aiconfigurator/sdk/inference_summary.py (1)
248-266: ⚠️ Potential issue | 🟡 Minor

Docstring copy-paste error: says "context" but should say "encoder". Both `set_encoder_power_avg` and `get_encoder_power_avg` have docstrings that incorrectly say "context phase average power" instead of "encoder phase average power".

📝 Proposed fix

```diff
  def set_encoder_power_avg(self, power_avg: float) -> None:
-     """Set context phase average power (watts)."""
+     """Set encoder phase average power (watts)."""
      self._encoder_power_avg = power_avg

  # ... other methods ...

  def get_encoder_power_avg(self) -> float:
-     """Get context phase average power (watts)."""
+     """Get encoder phase average power (watts)."""
      return self._encoder_power_avg
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/inference_summary.py` around lines 248-266, the docstrings for set_encoder_power_avg and get_encoder_power_avg incorrectly say "context phase average power"; update both docstrings to read "encoder phase average power (watts)" so they match the methods' purpose.

src/aiconfigurator/sdk/backends/base_backend.py (1)
296-319: ⚠️ Potential issue | 🟠 Major

Account for image context tokens in the memory calculation for VL models. The context phase processes `isl + img_ctx_tokens` tokens, but `_get_memory_usage` is called with the original `isl`. Since activation memory scales with token count (activations = 2 * num_tokens * h * constants), the memory estimate understates the activation footprint during context processing. Update the memory calls in the `static_ctx`, `static_gen`, and `static` modes to pass the augmented sequence length (or adjust the `num_tokens` parameter explicitly).
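The size of the undercount follows from the linear scaling quoted above; a rough illustration (token counts, hidden size, and byte width are made-up values, not the SDK's):

```python
def activation_bytes(num_tokens: int, hidden_size: int, bytes_per_elem: int = 2) -> int:
    # Activation footprint grows linearly with the tokens processed in the
    # context phase (formula per the review comment; constants illustrative).
    return 2 * num_tokens * hidden_size * bytes_per_elem

isl, img_ctx_tokens, h = 4096, 1024, 5120
estimated = activation_bytes(isl, h)                # what the code computes today
actual = activation_bytes(isl + img_ctx_tokens, h)  # what it should compute
print(actual - estimated)  # 20971520 bytes missing from the estimate
```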
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/backends/base_backend.py` around lines 296-319, the memory calculation underestimates activation usage for vision-language models because _get_memory_usage is called with isl instead of the augmented token count; update the calls in the "static_ctx", "static_gen", and "static" branches so _get_memory_usage receives isl + img_ctx_tokens (or explicitly set num_tokens to include img_ctx_tokens), keeping any prefix/osl arguments unchanged; reference _get_memory_usage and _run_encoder (img_ctx_tokens) to locate and modify the calls.
🧹 Nitpick comments (8)
collector/vllm/collect_attn.py (1)
517-521: Move the SM100 GQA-ratio validation to runtime error handling instead of preventive test-case filtering. The `continue` statement at line 520 pre-filters test cases during generation, reducing coverage and masking potential compatibility changes. The collector framework is designed to catch version-incompatible test cases at runtime (non-fatally) rather than preventively gate them during test-case construction. Let the kernel fail at runtime and be caught by the collection error handler.
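The runtime-catch pattern the reviewer describes can be sketched generically (`collect`, `run_kernel`, and the case tuples are hypothetical stand-ins, not the collector's real API):

```python
def collect(cases, run_kernel):
    # Run every generated case; record failures non-fatally instead of
    # pre-filtering cases (e.g., by GQA ratio) at generation time.
    results, skipped = [], []
    for case in cases:
        try:
            results.append((case, run_kernel(case)))
        except Exception as exc:
            skipped.append((case, repr(exc)))
    return results, skipped

def fake_kernel(case):
    n, n_kv = case
    if n // n_kv > 16:  # stand-in for an unsupported-config runtime error
        raise ValueError("unsupported GQA ratio")
    return n * n_kv

ok, bad = collect([(32, 4), (64, 2)], fake_kernel)
print(len(ok), len(bad))  # 1 1
```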
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@collector/vllm/collect_attn.py` around lines 517-521, remove the preventive filter that uses get_sm_version() and the n // n_kv > 16 check to skip test cases on SM100; generate the cases regardless and let trtllm_batch_decode_with_kv_cache failures be caught non-fatally by the collector's error handler at collection time.

src/aiconfigurator/sdk/task.py (1)
1334-1335: Drop the placeholder encoder locals until they carry real values. These variables are always `None` and only forwarded, which adds noise in `run_disagg`. Prefer passing real config-derived values or omitting these kwargs for now.

Also applies to: 1369-1370
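One conventional way to "omit the kwargs unless set" is to build the kwargs dict conditionally (names here are illustrative, not the SDK's real signatures):

```python
def run_disagg(task, **kwargs):
    # Stand-in for the real entry point; it only ever sees encoder kwargs
    # that carry meaningful values.
    return sorted(kwargs)

encoder_database = None      # placeholder today
encoder_backend_name = None  # placeholder today

extra = {}
if encoder_database is not None:
    extra["encoder_database"] = encoder_database
if encoder_backend_name is not None:
    extra["encoder_backend_name"] = encoder_backend_name

print(run_disagg("task", **extra))  # [] -> no noisy None kwargs forwarded
```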
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/task.py` around lines 1334-1335, remove the always-None placeholder locals encoder_database and encoder_backend_name, or derive real config values for them; at call sites, include the kwargs only when they are not None (or drop them entirely) so run_disagg only receives meaningful encoder-related arguments.

src/aiconfigurator/sdk/backends/trtllm_backend.py (1)
592-596: Encoder weights are added after PP normalization. Line 592 normalizes to per-GPU weight memory, but lines 594-595 add unnormalized encoder weights. This can inflate memory estimates for pipeline-parallel setups and bias OOM checks.

♻️ Proposed fix

```diff
- # count weights on a single GPU
- weights /= model.config.pp_size
-
- for op in model.encoder_ops:
-     weights += op.get_weights()
+ for op in model.encoder_ops:
+     weights += op.get_weights()
+
+ # count weights on a single GPU
+ weights /= model.config.pp_size
```
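The inflation is purely an ordering effect, assuming the review's premise that encoder weights should also be split across pipeline stages; a toy sketch with made-up sizes:

```python
llm_weights = 64.0      # GiB of text-backbone weights (illustrative)
encoder_weights = 8.0   # GiB of vision-encoder weights (illustrative)
pp_size = 4

# Current order: normalize first, then add full encoder weights.
wrong = llm_weights / pp_size + encoder_weights       # 24.0 GiB per GPU

# Proposed order: add encoder weights, then normalize by pp_size.
right = (llm_weights + encoder_weights) / pp_size     # 18.0 GiB per GPU
print(wrong, right)  # 24.0 18.0
```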
🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/backends/trtllm_backend.py` around lines 592-596, the code divides weights by model.config.pp_size and then adds unnormalized encoder weights from model.encoder_ops; normalize encoder weights by the same pipeline-parallel factor (add them before dividing, or divide each op.get_weights() by pp_size) so the final value reflects per-GPU memory correctly.

src/aiconfigurator/sdk/pareto_analysis.py (1)
216-229: Document the new encoder kwargs in the `disagg_pareto` docstring. `encoder_database` and `encoder_backend_name` are now accepted and used, but they're missing from the `**kwargs` contract. Adding them will make this API easier to use correctly.

Also applies to: 255-262
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/pareto_analysis.py` around lines 216-229, the disagg_pareto docstring's kwargs contract is missing the new encoder kwargs; document encoder_database and encoder_backend_name (expected types and a brief note on their effect) in both occurrences — the block around lines 216-229 and the repeated block around lines 255-262 — consistently.

src/aiconfigurator/sdk/backends/sglang_backend.py (1)
522-526: Scale encoder weights consistently in the per-GPU memory calculation. On line 522, weights are converted to per-GPU (`/ pp_size`), but lines 524-525 add full encoder weights afterward. For `pp_size > 1`, this can overestimate memory and incorrectly trigger OOM filtering.

♻️ Proposed fix

```diff
- # Count weights on a single GPU
- weights /= model.config.pp_size
-
- for op in model.encoder_ops:
-     weights += op.get_weights()
+ for op in model.encoder_ops:
+     weights += op.get_weights()
+
+ # Count weights on a single GPU
+ weights /= model.config.pp_size
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/aiconfigurator/sdk/backends/sglang_backend.py` around lines 522-526, the per-GPU scaling is inconsistent: weights are divided by model.config.pp_size but full encoder weights from model.encoder_ops are added afterward, which overcounts when pp_size > 1; apply the same per-GPU scaling to encoder weights (divide each op.get_weights() by pp_size, or sum encoder weights first and then divide) so all components are scaled consistently.

tests/unit/sdk/test_utils.py (1)
663-699: Add a regression assert for `deepstack_visual_indexes`. The new parser branch preserves this field on `VisionEncoderConfig`, but the fixture omits it and none of the new tests verify tuple conversion. A regression there would still pass even though the VL encoder wiring depends on it.

Suggested test addition

```diff
  _QWEN3VL_HF_CONFIG = {
      "architectures": [_QWEN3VL_ARCH],
      @@
      "vision_config": {
+         "deepstack_visual_indexes": [8, 16, 24],
          "depth": 27,
          "hidden_size": 1152,
          "num_heads": 16,
          "intermediate_size": 4304,
          "patch_size": 16,
          "spatial_merge_size": 2,
          "temporal_patch_size": 2,
          "out_hidden_size": 5120,
      },
  }
  @@
  class TestQwen3VLVisionEncoderParsing:
      @@
      def test_vision_config_not_lost_after_text_config_unwrap(self):
          """Regression: vision_config must be captured before the text_config overwrite."""
          result = _parse_hf_config_json(_QWEN3VL_HF_CONFIG)
          assert result["extra_params"] is not None
+
+     def test_vision_encoder_deepstack_visual_indexes(self):
+         result = _parse_hf_config_json(_QWEN3VL_HF_CONFIG)
+         assert result["extra_params"].deepstack_visual_indexes == (8, 16, 24)
```

Also applies to: 743-793
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/sdk/test_utils.py` around lines 663-699, add a regression assert that, after parsing _QWEN3VL_HF_CONFIG into a VisionEncoderConfig, the deepstack_visual_indexes field exists, is a tuple, and contains the expected values, so tuple conversion is enforced.

tests/unit/sdk/models/test_model_config.py (1)
403-413: Consider adding a test for the `encoder_proj_to_llm_gemm` op. The PR summary mentions the projection to LLM as a key encoder component, but this op name isn't asserted in `test_encoder_op_names`. If this op exists, adding it would improve coverage.

💡 Suggested addition

```diff
  def test_encoder_op_names(self, vl_model):
      """All expected encoder op names must be present."""
      names = [op._name for op in vl_model.encoder_ops]
      assert "encoder_qkv_gemm" in names
      assert "encoder_attention" in names
      assert "encoder_proj_gemm" in names
      assert "encoder_ffn1_gemm" in names
      assert "encoder_ffn2_gemm" in names
      assert "encoder_merger_fc1" in names
      assert "encoder_merger_act" in names
      assert "encoder_merger_fc2" in names
+     assert "encoder_proj_to_llm_gemm" in names  # projection from ViT to LLM hidden dim
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/sdk/models/test_model_config.py` around lines 403-413, add an assertion in test_encoder_op_names that "encoder_proj_to_llm_gemm" is present in the list built from vl_model.encoder_ops, so the projection-to-LLM op is covered.

tests/unit/sdk/backends/test_encoder.py (1)
207-217: Add at least one `tp_size > 1` encoder instantiation here. All current model-level coverage uses the default config, so it never exercises the new TP-sharded encoder path. A small VL case with `tp_size > 1` would catch regressions like invalid local ViT head counts or missing encoder collectives much earlier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/sdk/backends/test_encoder.py` around lines 207 - 217, Add a new test that exercises the TP-sharded encoder path by creating a ModelConfig with tp_size > 1 (e.g., set model_config.tp_size = 2 or instantiate a new config with tp_size=2), then call get_model for a visual-LM model (e.g., "Qwen/Qwen3-VL-32B-Instruct") using that config and backend "trtllm", and assert encoder_ops is non-empty; reference the existing test helpers and symbols get_model and model.encoder_ops so the test runs alongside test_vl_model_has_non_empty_encoder_ops and will catch TP-sharded encoder regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/aiconfigurator/cli/report_and_save.py`:
- Around line 175-184: The code fails on Pandas NaN for encoder fields and omits
encoder contribution from the replica GPU math; update the parsing of encoder
fields (e_tp, e_pp and any encoder worker/replica counts like "(e)workers" and
"(e)dp") to safely coerce NaN to 1 (e.g., use pandas.isna(row.get(...)) or a
small helper that returns int(value) if valid else 1) before computing e_gpus
and e_gpus_str, and include the encoder term in gpus_replica_str arithmetic and
string composition (add the encoder part = (e)workers x ((e)pp * (e)tp * (e)dp)
and reflect it in the total GPU count) so replica composition and total_gpus
account for encoders consistently with the existing p/d terms.
In `@src/aiconfigurator/sdk/models.py`:
- Around line 938-945: The vision encoder TP split isn't validated before doing
integer division (e.g., computing head_size_vit = h_vit // n_vit and other
usages of // tp_size), so TP sizes that don't evenly divide vision dims can drop
channels or yield zero; add the same divisibility checks used for the text
backbone to validate tp_size against the vision encoder dimensions (h_vit,
n_vit, inter_vit and any merger/FFN widths derived from h_vit and h_llm) before
performing floor-division, and raise a clear exception or assert (with context
mentioning tp_size and the offending dimension) if the split is invalid so the
code doesn't silently continue with truncated/zero sizes.
- Around line 951-1025: The encoder GEMMs ("encoder_proj_gemm",
"encoder_ffn2_gemm", "encoder_merger_fc2") are TP-sharded but lack the
corresponding TP collectives; add matching all-reduce (or the same collective
primitive used elsewhere for row-parallel GEMMs) immediately after each of these
ops when tp_size > 1 so the encoder path mirrors the LLM row-parallel pattern
and yields correct TTFT/energy estimates; ensure you use the same collective op
type and semantics (datatype/scale/axis) as used for other GEMM collectives in
the file and honor low_precision_input for "encoder_proj_gemm" and
"encoder_ffn2_gemm" when constructing the collective.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ce9b18c5-a8c1-4781-b439-ac3ccd08afc5
📒 Files selected for processing (31)
collector/sglang/collect_attn.py
collector/trtllm/collect_attn.py
collector/vllm/collect_attn.py
src/aiconfigurator/cli/report_and_save.py
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-235B-A22B-Instruct_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-2B-Instruct_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-30B-A3B-Instruct_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-32B-Instruct_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-32B-Thinking_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-4B-Instruct_config.json
src/aiconfigurator/model_configs/Qwen--Qwen3-VL-8B-Instruct_config.json
src/aiconfigurator/sdk/backends/base_backend.py
src/aiconfigurator/sdk/backends/sglang_backend.py
src/aiconfigurator/sdk/backends/trtllm_backend.py
src/aiconfigurator/sdk/common.py
src/aiconfigurator/sdk/config.py
src/aiconfigurator/sdk/inference_session.py
src/aiconfigurator/sdk/inference_summary.py
src/aiconfigurator/sdk/models.py
src/aiconfigurator/sdk/pareto_analysis.py
src/aiconfigurator/sdk/picking.py
src/aiconfigurator/sdk/task.py
src/aiconfigurator/sdk/utils.py
src/aiconfigurator/systems/data/a100_sxm/vllm/0.14.0/context_attention_perf.txt
src/aiconfigurator/systems/data/a100_sxm/vllm/0.14.0/generation_attention_perf.txt
src/aiconfigurator/webapp/events/event_fn.py
src/aiconfigurator/webapp/events/event_handler.py
tests/unit/sdk/backends/test_encoder.py
tests/unit/sdk/models/test_model_config.py
tests/unit/sdk/test_common.py
tests/unit/sdk/test_utils.py
```python
e_tp = int(row.get("(e)tp", 1) or 1)
e_pp = int(row.get("(e)pp", 1) or 1)
e_gpus = e_tp * e_pp
e_gpus_str = f"{e_gpus} (=\033[4m{e_tp}\033[0mx\033[4m{e_pp}\033[0m)"
e_parallel_str = f"tp\033[4m{e_tp}\033[0mpp\033[4m{e_pp}\033[0m"
gpus_replica_str = (
    f"{row['num_total_gpus']} "
    f"(={row['(p)workers']}x{row['(p)pp'] * row['(p)tp'] * row['(p)dp']}"
    f"+{row['(d)workers']}x{row['(d)pp'] * row['(d)tp'] * row['(d)dp']})"
)
```
Handle missing encoder fields safely and include them in the replica GPU math.
These encoder columns are optional, but int(row.get("(e)tp", 1) or 1) will still blow up on Pandas NaN, and the rendered gpus/replica string still ignores encoder workers entirely. That makes the new disagg summary unreliable for non-VL rows and undercounts replica composition for VL rows.
Suggested fix
```diff
- e_tp = int(row.get("(e)tp", 1) or 1)
- e_pp = int(row.get("(e)pp", 1) or 1)
+ e_workers = 0 if pd.isna(row.get("(e)workers", 0)) else int(row.get("(e)workers", 0))
+ e_tp = 1 if pd.isna(row.get("(e)tp", 1)) else int(row.get("(e)tp", 1))
+ e_pp = 1 if pd.isna(row.get("(e)pp", 1)) else int(row.get("(e)pp", 1))
  e_gpus = e_tp * e_pp
- e_gpus_str = f"{e_gpus} (=\033[4m{e_tp}\033[0mx\033[4m{e_pp}\033[0m)"
- e_parallel_str = f"tp\033[4m{e_tp}\033[0mpp\033[4m{e_pp}\033[0m"
- gpus_replica_str = (
-     f"{row['num_total_gpus']} "
-     f"(={row['(p)workers']}x{row['(p)pp'] * row['(p)tp'] * row['(p)dp']}"
-     f"+{row['(d)workers']}x{row['(d)pp'] * row['(d)tp'] * row['(d)dp']})"
- )
+ replica_terms = [
+     f"{row['(p)workers']}x{row['(p)pp'] * row['(p)tp'] * row['(p)dp']}",
+     f"{row['(d)workers']}x{row['(d)pp'] * row['(d)tp'] * row['(d)dp']}",
+ ]
+ if e_workers:
+     replica_terms.append(f"{e_workers}x{e_gpus}")
+ gpus_replica_str = f"{row['num_total_gpus']} (={' + '.join(replica_terms)})"
```
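The `or 1` idiom in the original code does not guard against NaN, because NaN is truthy; a small standalone sketch of the hazard and a coercion helper (the helper is illustrative — the suggested fix uses `pd.isna` instead):

```python
import math

def safe_int(value, default=1):
    # Coerce missing/NaN cells to a default before integer conversion.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return default
    return int(value)

nan = float("nan")
# int(nan or 1) raises: NaN is truthy, so `or 1` never supplies the default.
try:
    int(nan or 1)
except ValueError as exc:
    print("unguarded:", exc)

print(safe_int(nan))   # 1
print(safe_int(4.0))   # 4
```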
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiconfigurator/cli/report_and_save.py` around lines 175 - 184, The code
fails on Pandas NaN for encoder fields and omits encoder contribution from the
replica GPU math; update the parsing of encoder fields (e_tp, e_pp and any
encoder worker/replica counts like "(e)workers" and "(e)dp") to safely coerce
NaN to 1 (e.g., use pandas.isna(row.get(...)) or a small helper that returns
int(value) if valid else 1) before computing e_gpus and e_gpus_str, and include
the encoder term in gpus_replica_str arithmetic and string composition (add the
encoder part = (e)workers x ((e)pp * (e)tp * (e)dp) and reflect it in the total
GPU count) so replica composition and total_gpus account for encoders
consistently with the existing p/d terms.
```python
tp_size = self.config.tp_size
depth = encoder_config.depth
h_vit = encoder_config.hidden_size
n_vit = encoder_config.num_heads
inter_vit = encoder_config.intermediate_size
h_llm = encoder_config.out_hidden_size
head_size_vit = h_vit // n_vit  # 1152 // 16 = 72
```
|
Validate the vision encoder TP split before floor-dividing it.
The new encoder uses `// tp_size` for ViT heads, FFN width, and merger width, but only the text backbone is validated for TP divisibility. A TP size can be valid for the LLM and still invalid for the vision encoder, silently dropping channels or even producing 0 local ViT heads.
Suggested fix
```diff
 tp_size = self.config.tp_size
 depth = encoder_config.depth
 h_vit = encoder_config.hidden_size
 n_vit = encoder_config.num_heads
 inter_vit = encoder_config.intermediate_size
 h_llm = encoder_config.out_hidden_size
 head_size_vit = h_vit // n_vit
+merger_dim = h_vit * encoder_config.spatial_merge_size ** 2
+
+if n_vit % tp_size != 0:
+    raise ValueError(f"vision num_heads ({n_vit}) must be divisible by tp_size ({tp_size})")
+if inter_vit % tp_size != 0:
+    raise ValueError(
+        f"vision intermediate_size ({inter_vit}) must be divisible by tp_size ({tp_size})"
+    )
+if merger_dim % tp_size != 0:
+    raise ValueError(
+        f"vision merger dim ({merger_dim}) must be divisible by tp_size ({tp_size})"
+    )
```

Also applies to: 979-1023, 1368-1378, 1408-1449
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiconfigurator/sdk/models.py` around lines 938 - 945, The vision encoder
TP split isn't validated before doing integer division (e.g., computing
head_size_vit = h_vit // n_vit and other usages of // tp_size), so TP sizes that
don't evenly divide vision dims can drop channels or yield zero; add the same
divisibility checks used for the text backbone to validate tp_size against the
vision encoder dimensions (h_vit, n_vit, inter_vit and any merger/FFN widths
derived from h_vit and h_llm) before performing floor-division, and raise a
clear exception or assert (with context mentioning tp_size and the offending
dimension) if the split is invalid so the code doesn't silently continue with
truncated/zero sizes.
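To see why the guard matters: floor division silently shrinks the sharded dimension whenever `tp_size` does not divide it. A quick illustration using the 16 ViT heads quoted in the comment above (the `tp_size=3` value is chosen here purely to show the failure mode):

```python
n_vit, tp_size = 16, 3            # 3 does not divide 16

local_heads = n_vit // tp_size    # 5 heads per rank after floor division
covered = local_heads * tp_size   # only 15 of the 16 heads are actually computed
print(n_vit - covered)            # 1 head silently dropped

def check_divisible(name, dim, tp):
    # the guard the review asks for: fail loudly instead of truncating
    if dim % tp != 0:
        raise ValueError(f"{name} ({dim}) must be divisible by tp_size ({tp})")

try:
    check_divisible("vision num_heads", n_vit, tp_size)
except ValueError as err:
    print(err)  # vision num_heads (16) must be divisible by tp_size (3)
```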
```python
self.encoder_ops.extend(
    [
        ops.ElementWise("encoder_add_norm_1", depth, 2 * h_vit, 2 * h_vit, 0.8),
        ops.GEMM(
            "encoder_qkv_gemm",
            depth,
            3 * n_vit * head_size_vit // tp_size,
            h_vit,
            vit_gemm_mode,
        ),
        ops.ContextAttention(
            "encoder_attention",
            depth,
            n_vit // tp_size,
            n_vit // tp_size,  # ViT has no GQA: n_kv == n
            vit_kvcache_mode,
            vit_fmha_mode,
            head_size=head_size_vit,
        ),
        ops.GEMM(
            "encoder_proj_gemm",
            depth,
            h_vit,
            n_vit * head_size_vit // tp_size,
            vit_gemm_mode,
            low_precision_input=True,
        ),
        ops.ElementWise("encoder_add_norm_2", depth, 2 * h_vit, 2 * h_vit, 0.8),
        ops.GEMM(
            "encoder_ffn1_gemm",
            depth,
            inter_vit // tp_size,
            h_vit,
            vit_gemm_mode,
        ),
        ops.ElementWise(
            "encoder_act",
            depth,
            inter_vit // tp_size,
            inter_vit // tp_size,
            0.8,
        ),
        ops.GEMM(
            "encoder_ffn2_gemm",
            depth,
            h_vit,
            inter_vit // tp_size,
            vit_gemm_mode,
            low_precision_input=True,
        ),
        # PatchMerger MLP: runs n_mergers times (1 final + deepstack instances)
        # Each merger: Linear(4*h_vit→4*h_vit) + GELU + Linear(4*h_vit→h_llm)
        # Operates on post-merge tokens (spatial_merge_size² patches fused per token)
        ops.GEMM(
            "encoder_merger_fc1",
            1 + len(encoder_config.deepstack_visual_indexes),
            (h_vit * encoder_config.spatial_merge_size ** 2) // tp_size,
            h_vit * encoder_config.spatial_merge_size ** 2,
            vit_gemm_mode,
        ),
        ops.ElementWise(
            "encoder_merger_act",
            1 + len(encoder_config.deepstack_visual_indexes),
            (h_vit * encoder_config.spatial_merge_size ** 2) // tp_size,
            (h_vit * encoder_config.spatial_merge_size ** 2) // tp_size,
            0.8,
        ),
        ops.GEMM(
            "encoder_merger_fc2",
            1 + len(encoder_config.deepstack_visual_indexes),
            h_llm,
            (h_vit * encoder_config.spatial_merge_size ** 2) // tp_size,
            vit_gemm_mode,
        ),
    ]
)
```
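For concreteness, the merger shapes implied by those ops can be checked with the Qwen3-VL-style numbers quoted earlier in the review (h_vit=1152, spatial_merge_size=2); the `h_llm` and `deepstack_visual_indexes` values below are illustrative placeholders, not read from this PR:

```python
h_vit, spatial_merge_size = 1152, 2        # values quoted in the review comment
h_llm = 2048                               # illustrative out_hidden_size
deepstack_visual_indexes = [5, 11, 17]     # illustrative deepstack layer taps

merger_in = h_vit * spatial_merge_size ** 2     # 4 * h_vit = 4608: 2x2 patches per token
n_mergers = 1 + len(deepstack_visual_indexes)   # 1 final merger + one per deepstack tap

# Per merger: Linear(merger_in -> merger_in) + GELU + Linear(merger_in -> h_llm)
params_per_merger = merger_in * merger_in + merger_in * h_llm
print(merger_in, n_mergers, params_per_merger)  # 4608 4 30670848
```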
Model the encoder TP collectives too.
These encoder GEMMs are TP-sharded, but unlike the LLM paths there is no matching all-reduce after `encoder_proj_gemm`, `encoder_ffn2_gemm`, or `encoder_merger_fc2`. For `tp_size > 1` that will systematically under-estimate encoder TTFT/energy and make the encoder path inconsistent with the row-parallel pattern used everywhere else.
Suggested fix
```diff
         ops.GEMM(
             "encoder_proj_gemm",
             depth,
             h_vit,
             n_vit * head_size_vit // tp_size,
             vit_gemm_mode,
             low_precision_input=True,
         ),
+        ops.CustomAllReduce("encoder_ar_1", depth, h_vit, tp_size),
         ops.ElementWise("encoder_add_norm_2", depth, 2 * h_vit, 2 * h_vit, 0.8),
         ops.GEMM(
             "encoder_ffn1_gemm",
             depth,
             inter_vit // tp_size,
@@
         ops.GEMM(
             "encoder_ffn2_gemm",
             depth,
             h_vit,
             inter_vit // tp_size,
             vit_gemm_mode,
             low_precision_input=True,
         ),
+        ops.CustomAllReduce("encoder_ar_2", depth, h_vit, tp_size),
@@
         ops.GEMM(
             "encoder_merger_fc2",
             1 + len(encoder_config.deepstack_visual_indexes),
             h_llm,
             (h_vit * encoder_config.spatial_merge_size ** 2) // tp_size,
             vit_gemm_mode,
         ),
+        ops.CustomAllReduce(
+            "encoder_merger_ar",
+            1 + len(encoder_config.deepstack_visual_indexes),
+            h_llm,
+            tp_size,
+        ),
```

Also applies to: 1380-1452
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/aiconfigurator/sdk/models.py` around lines 951 - 1025, The encoder GEMMs
("encoder_proj_gemm", "encoder_ffn2_gemm", "encoder_merger_fc2") are TP-sharded
but lack the corresponding TP collectives; add matching all-reduce (or the same
collective primitive used elsewhere for row-parallel GEMMs) immediately after
each of these ops when tp_size > 1 so the encoder path mirrors the LLM
row-parallel pattern and yields correct TTFT/energy estimates; ensure you use
the same collective op type and semantics (datatype/scale/axis) as used for
other GEMM collectives in the file and honor low_precision_input for
"encoder_proj_gemm" and "encoder_ffn2_gemm" when constructing the collective.
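The row-parallel pattern this comment refers to can be sketched numerically: each rank holds a column slice of the activations and the matching row slice of the weight, its local GEMM produces a full-width partial result, and the all-reduce sums those partials. A toy NumPy check of that identity (not the simulator's op classes; all sizes here are arbitrary small values):

```python
import numpy as np

rng = np.random.default_rng(0)
tp_size, tokens, k, n = 4, 8, 32, 16       # k is the sharded reduction dimension

x = rng.standard_normal((tokens, k))
w = rng.standard_normal((k, n))

# Row-parallel GEMM: rank r owns rows [r*shard, (r+1)*shard) of W and the
# matching columns of X; each rank's local GEMM yields a (tokens, n) partial sum.
shard = k // tp_size
partials = [
    x[:, r * shard:(r + 1) * shard] @ w[r * shard:(r + 1) * shard, :]
    for r in range(tp_size)
]
out = sum(partials)                        # this summation is the all-reduce step

assert np.allclose(out, x @ w)             # matches the unsharded GEMM
```

Without the reduce, each rank holds only its partial sum, which is why a cost model that shards the GEMMs but omits the collective under-counts the critical path.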
force-pushed from 98011da to 1937c88
force-pushed from 1937c88 to 7b220d0

Signed-off-by: Son, Sungwook <sungwook.son@intel.com>

force-pushed from 7b220d0 to e3aaaf7

- Replace ambiguous multiplication sign (×) with x in docstrings
- Break long lines >120 chars in base_backend.py and inference_summary.py
- Remove trailing whitespace in models.py and inference_summary.py
- Auto-format pareto_analysis.py function call arguments

Signed-off-by: Son, Sungwook <sungwook.son@intel.com>

force-pushed from 8405ddf to a519a6e
Overview:
Adds end-to-end latency estimation for the Qwen3-VL family of vision-language models. This includes the full ViT encoder pipeline (attention, FFN, PatchMerger MLP, and the projection to the LLM) and correctly feeds post-merge image tokens into the LLM backbone ISL, so that TTFT accounts for both encoder and prefill compute.
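The token accounting described above can be sketched as a small function. The patch size (16) and merge factor (2) below follow common Qwen-VL conventions but are assumptions for illustration, not values read from this PR:

```python
def image_tokens(height, width, patch=16, spatial_merge_size=2):
    """Post-merge vision tokens an image contributes to the backbone ISL."""
    grid_h, grid_w = height // patch, width // patch       # raw patch grid
    return (grid_h * grid_w) // (spatial_merge_size ** 2)  # 2x2 patches fuse per token

text_isl = 1024
vision_tokens = image_tokens(1024, 1024)   # 64*64 patches -> 1024 merged tokens
effective_isl = text_isl + vision_tokens   # ISL the LLM prefill actually sees
print(vision_tokens, effective_isl)        # 1024 2048
```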
Details:
integration (55 tests total, all passing).
Where should the reviewer start?
Summary by CodeRabbit
Release Notes
New Features
Improvements
Tests