Song Tang*, Guangquan Jie*, Henghui Ding✉, Yu-Gang Jiang
* Equal Contribution, ✉ Corresponding Author
Fudan University
- [2026/03/26] Release the NEST dataset.
- [2026/02/21] ROSE is accepted to CVPR 2026 Findings. 👏👏
Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize due to their absence from training data, and (ii) emerging entities that exist within the model's knowledge but demand up-to-date external information for accurate recognition.
We propose ROSE: Retrieval-Oriented Segmentation Enhancement, a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE enables effective segmentation of both novel and emerging objects by leveraging the latest online multimodal information, outperforming a strong Gemini-2.0 Flash-based retrieval baseline by 19.2% in gIoU.
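For readers unfamiliar with the metric, the snippet below sketches how gIoU is commonly computed. It assumes the convention used in LISA-style reasoning segmentation benchmarks, where gIoU is the mean of per-image IoUs. The function names and the handling of two empty masks are illustrative assumptions, not taken from the ROSE code.

```python
def per_image_iou(pred, gt):
    """IoU between two binary masks given as nested lists of 0/1 values."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p) and bool(g)
            union += bool(p) or bool(g)
    # Assumption: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def giou(preds, gts):
    """Mean of per-image IoUs over a dataset (assumed gIoU definition)."""
    scores = [per_image_iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


# Toy example: one half-overlapping prediction and one exact match.
preds = [[[1, 1], [0, 0]], [[1]]]
gts = [[[1, 0], [0, 0]], [[1]]]
score = giou(preds, gts)
```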
- NEST Benchmark: A continuously updated benchmark containing over 1,500 image-question-answer-mask pairs for evaluating novel emerging segmentation capabilities.
- ROSE Framework: A plug-and-play method that augments any MLLM-based segmentation model with the ability to segment novel and emerging entities.
- Automated Data Engine: A scalable pipeline that continuously retrieves and updates the latest image-news pairs from the web for constructing evaluation data.
Given a user input (image and question), ROSE first employs the WebSense module to determine whether internet retrieval is needed. If so, the Internet Retrieval-Augmented Generation (IRAG) module retrieves relevant textual and visual data from the web. The retrieved content is then processed by the Textual Prompt Enhancer (TPE) and Visual Prompt Enhancer (VPE) to generate enriched prompts for the MLLM-based segmentation model, which ultimately produces accurate segmentation masks for novel and emerging entities.
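The control flow described above can be sketched as follows. This is a minimal illustration rather than the released implementation: the `RosePipeline` and `Retrieved` classes, the callable signatures, and the stub modules are all hypothetical placeholders for WebSense, IRAG, TPE, VPE, and the segmentation model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Retrieved:
    """Hypothetical container for web content returned by IRAG."""
    texts: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)  # e.g. image paths or URLs


@dataclass
class RosePipeline:
    """Hypothetical wiring of the modules; each field is a stand-in callable."""
    needs_retrieval: Callable[[str, str], bool]            # WebSense
    retrieve: Callable[[str, str], Retrieved]              # IRAG
    enhance_text: Callable[[str, List[str]], str]          # TPE
    enhance_visual: Callable[[str, List[str]], List[str]]  # VPE
    segment: Callable[[str, str, List[str]], str]          # segmentation model

    def run(self, image: str, question: str) -> str:
        # WebSense decides whether internet retrieval is needed.
        if not self.needs_retrieval(image, question):
            return self.segment(image, question, [])
        # IRAG fetches textual and visual data; TPE and VPE build the prompts.
        retrieved = self.retrieve(image, question)
        text_prompt = self.enhance_text(question, retrieved.texts)
        visual_prompts = self.enhance_visual(image, retrieved.images)
        return self.segment(image, text_prompt, visual_prompts)


# Stub modules, for illustration only.
demo = RosePipeline(
    needs_retrieval=lambda image, question: "latest" in question,
    retrieve=lambda image, question: Retrieved(texts=["fact"], images=["ref.jpg"]),
    enhance_text=lambda question, texts: question + " | context: " + "; ".join(texts),
    enhance_visual=lambda image, images: list(images),
    segment=lambda image, prompt, visuals: f"mask({prompt!r}, refs={len(visuals)})",
)

direct = demo.run("img.jpg", "segment the dog")
augmented = demo.run("img.jpg", "segment the latest phone")
```

Passing the modules in as callables keeps the segmentation model swappable, which mirrors the plug-and-play claim: only `segment` needs to change to target a different MLLM-based segmenter.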
We introduce an automated annotation pipeline that efficiently generates high-quality evaluation samples for the novel emerging segmentation task. The pipeline leverages time-specific queries to continuously collect news content and corresponding relevant images for constructing VQA pairs and automatically generating mask annotations, enabling a comprehensive and reliable evaluation of models' abilities to segment emerging entities.
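As one possible reading of "time-specific queries", the sketch below builds search queries restricted to consecutive date windows. The function name, the query template, and the `after`/`before` fields are assumptions made for illustration; the actual data engine may construct its queries differently.

```python
from datetime import date, timedelta


def time_specific_queries(topics, start, end, step_days=7):
    """Build one query per topic for each date window in [start, end)."""
    queries = []
    window_start = start
    while window_start < end:
        window_end = min(window_start + timedelta(days=step_days), end)
        for topic in topics:
            queries.append({
                "query": f"{topic} news",  # hypothetical query template
                "after": window_start.isoformat(),
                "before": window_end.isoformat(),
            })
        window_start = window_end
    return queries


# Two weekly windows for a single hypothetical topic.
queries = time_specific_queries(["smartphone"], date(2026, 1, 1), date(2026, 1, 15))
```

Each window could then be passed to a news or image search backend, so that re-running the engine on later windows yields fresh samples.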
We present qualitative results comparing ROSE with LISA and READ on novel and emerging entities. ROSE accurately segments unseen and newly emerging targets, whereas existing methods struggle because they lack up-to-date knowledge or cannot recognize new entities.
- Upload the ROSE code
- Upload the code for the automated NEST data engine
@inproceedings{tang2026rose,
title={{ROSE}: Retrieval-Oriented Segmentation Enhancement},
author={Tang, Song and Jie, Guangquan and Ding, Henghui and Jiang, Yu-Gang},
booktitle={CVPR Findings},
year={2026}
}


