FudanCVL/ROSE

ROSE: Retrieval-Oriented Segmentation Enhancement

Song Tang*, Guangquan Jie*, Henghui Ding, Yu-Gang Jiang

* Equal Contribution, ✉ Corresponding Author

Fudan University

🎉 News

  • [2026/03/26] Released the NEST dataset.
  • [2026/02/21] ROSE was accepted to CVPR 2026 Findings. 👏👏

😊 Introduction

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize due to their absence from training data, and (ii) emerging entities that exist within the model's knowledge but demand up-to-date external information for accurate recognition.

[Figure: teaser]

We propose ROSE: Retrieval-Oriented Segmentation Enhancement, a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE enables effective segmentation of both novel and emerging objects by leveraging the latest online multimodal information, outperforming a strong Gemini-2.0 Flash-based retrieval baseline by 19.2% in gIoU.
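The gIoU metric is not defined in this README. As a minimal sketch, assuming the convention common in MLLM-based segmentation work such as LISA (gIoU as the mean of per-image IoU scores), it could be computed as follows. The function names and the list-of-rows mask format are illustrative assumptions, not part of the ROSE codebase.

```python
from typing import Sequence

# Assumption: a binary mask is given as rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def giou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Mean of per-image IoU scores (assumed definition of gIoU)."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)
```

Under this definition every image contributes equally regardless of object size, which is why it is often reported alongside a cumulative IoU in related work.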

🔧 Key Features

  • NEST Benchmark: A continuously updated benchmark containing over 1,500 image-question-answer-mask pairs for evaluating novel emerging segmentation capabilities.
  • ROSE Framework: A plug-and-play method that augments any MLLM-based segmentation model with the ability to segment novel and emerging entities.
  • Automated Data Engine: A scalable pipeline that continuously retrieves and updates the latest image-news pairs from the web for constructing evaluation data.

🏗️ Architecture Overview

Given a user input (image and question), ROSE first employs the WebSense module to determine whether internet retrieval is needed. If so, the Internet Retrieval-Augmented Generation (IRAG) module retrieves relevant textual and visual data from the web. The retrieved content is then processed by the Textual Prompt Enhancer (TPE) and Visual Prompt Enhancer (VPE) to generate enriched prompts for the MLLM-based segmentation model, which ultimately produces accurate segmentation masks for novel and emerging entities.

[Figure: architecture]
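The control flow described above can be sketched as follows. This is an illustrative outline only: the ROSE code has not been released, so every class, field, and signature here (`RosePipeline`, `Retrieved`, the callable arguments) is a hypothetical stand-in for the corresponding module, and the exact inputs of TPE and VPE are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Retrieved:
    """Hypothetical container for web content returned by IRAG."""
    texts: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)  # e.g. URLs or file paths


@dataclass
class RosePipeline:
    """Hypothetical wiring of the modules named in the overview."""
    web_sense: Callable[[str, str], bool]            # (image, question) -> retrieval needed?
    irag: Callable[[str, str], Retrieved]            # (image, question) -> web content
    tpe: Callable[[str, Retrieved], str]             # question + content -> enriched text prompt
    vpe: Callable[[str, Retrieved], List[str]]       # image + content -> visual prompts
    segmenter: Callable[[str, str, List[str]], Any]  # MLLM-based segmentation model

    def run(self, image: str, question: str) -> Any:
        # WebSense decides whether the query can be answered without the web.
        if not self.web_sense(image, question):
            return self.segmenter(image, question, [])
        # Otherwise retrieve content and enrich both prompt types.
        retrieved = self.irag(image, question)
        text_prompt = self.tpe(question, retrieved)
        visual_prompts = self.vpe(image, retrieved)
        return self.segmenter(image, text_prompt, visual_prompts)
```

Because the segmentation model is passed in as a plain callable, the sketch mirrors the plug-and-play claim: the retrieval and prompt-enhancement steps sit in front of the model and do not depend on its internals.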

📊 NEST Data Engine

We introduce an automated annotation pipeline that efficiently generates high-quality evaluation samples for the novel emerging segmentation task. The pipeline uses time-specific queries to continuously collect news content and the corresponding images, builds VQA pairs from them, and automatically generates mask annotations. This enables a comprehensive and reliable evaluation of a model's ability to segment emerging entities.

[Figure: data-engine]
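As a rough illustration of two ideas in this section, the sketch below shows one possible record layout for an image-question-answer-mask sample and one way to build time-specific queries. The data engine code has not been released, so the `NestSample` fields, the `time_specific_queries` helper, and the query string format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class NestSample:
    """Hypothetical layout of one image-question-answer-mask sample."""
    image: str          # path or URL of the news image
    question: str       # question about a novel or emerging entity
    answer: str         # textual answer grounded in the news content
    mask: str           # path to the automatically generated mask
    collected_on: date  # date used in the time-specific query


def time_specific_queries(topics: List[str], start: date, days: int) -> List[str]:
    """Build one dated news query per topic per day (assumed query format)."""
    queries = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        for topic in topics:
            queries.append(f"{topic} news {day.isoformat()}")
    return queries
```

Dating each query is what lets the benchmark keep growing: rerunning the same topics with a later start date yields fresh candidate samples.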

🖥️ Visual Results

We present qualitative results comparing ROSE with LISA and READ on novel and emerging entities. ROSE accurately segments unseen and newly emerging targets, while existing methods struggle due to a lack of up-to-date knowledge or inability to recognize new entities.

[Figure: qualitative-results]

📄 TODO List

  • Upload the code of ROSE
  • Upload the code of the Automated NEST Data Engine

💗 Citation

@inproceedings{tang2026rose,
    title={{ROSE}: Retrieval-Oriented Segmentation Enhancement},
    author={Tang, Song and Jie, Guangquan and Ding, Henghui and Jiang, Yu-Gang},
    booktitle={CVPR Findings},
    year={2026}
}
