Generalized Referring Expression Segmentation
8 papers with code • 1 benchmark • 1 dataset
Generalized Referring Expression Segmentation (GRES), introduced by Liu et al. at CVPR 2023, allows expressions that refer to any number of target objects. GRES takes an image and a referring expression as input and requires predicting a mask of the target object(s).
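The sketch below illustrates how a GRES target can be formed for an expression. It assumes the ground truth is the union of the masks of every object the expression refers to, and an all-zero mask when it refers to no object. The function and argument names are hypothetical and do not come from the GRES codebase.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def gres_target_mask(instance_masks: List[Mask], target_ids: List[int],
                     height: int, width: int) -> Mask:
    """Hypothetical helper: union of the masks of all referred objects.

    Assumption: a multi-target expression maps to the union of its targets'
    masks, and an empty target_ids list (no-target expression) maps to an
    all-zero mask.
    """
    out = [[0] * width for _ in range(height)]
    for idx in target_ids:
        m = instance_masks[idx]
        for y in range(height):
            for x in range(width):
                if m[y][x]:
                    out[y][x] = 1
    return out


# Three toy objects on a 2x3 image; the expression refers to objects 0 and 2.
masks = [
    [[1, 0, 0], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1]],
]
print(gres_target_mask(masks, [0, 2], 2, 3))
```

Classic single-target RES is the special case where `target_ids` always has exactly one element.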
Most implemented papers
GRES: Generalized Referring Expression Segmentation
Existing classic RES datasets and methods commonly support single-target expressions only, i.e., one expression refers to one target object.
MAttNet: Modular Attention Network for Referring Expression Comprehension
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression.
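Referring expression comprehension can be framed as picking, from a set of candidate regions, the one that best matches the expression. The sketch below does this with cosine similarity between precomputed embeddings. It is a generic matching baseline under that assumption, not MAttNet's modular attention method, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize(expr_emb: Sequence[float],
             region_embs: List[Sequence[float]]) -> int:
    """Hypothetical helper: index of the candidate region most similar
    to the expression embedding."""
    scores = [cosine(expr_emb, r) for r in region_embs]
    return max(range(len(scores)), key=scores.__getitem__)


print(localize([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]))
```

Returning a single index reflects the one-region output of comprehension. Segmentation tasks instead predict a pixel mask.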
Vision-Language Transformer and Query Generation for Referring Segmentation
We introduce a transformer and multi-head attention to build a network with an encoder-decoder attention architecture that "queries" the given image with the language expression.
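The idea of language "querying" the image can be illustrated with cross-attention. In the sketch below, each language-derived query vector attends over per-pixel image features and returns a weighted summary of them. For brevity it assumes a single head and no learned projections, and the function names are hypothetical. VLT itself uses multi-head attention inside a full encoder-decoder with its own query generation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention.

    queries: one vector per language-derived query (n_q x d)
    keys:    one vector per image location (n_pix x d)
    values:  one vector per image location (n_pix x d_v)
    Returns one attended image summary per query (n_q x d_v).
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every image location, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the image values.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


print(cross_attention([[10.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]], [[1.0], [0.0]]))
```

In the call above, the query aligns with the first image location, so the output is dominated by that location's value.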
CRIS: CLIP-Driven Referring Image Segmentation
In addition, we present text-to-pixel contrastive learning to explicitly make the text feature similar to the related pixel-level features and dissimilar to the irrelevant ones.
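A text-to-pixel contrastive objective can be sketched as a per-pixel binary classification. The sigmoid of the dot product between the text feature and each pixel feature is pushed toward 1 on the referred object and toward 0 elsewhere. This sketch assumes that sigmoid and binary cross-entropy formulation and uses a hypothetical function name. It is not CRIS's released code.

```python
import math
from typing import List, Sequence


def text_to_pixel_loss(text_feat: Sequence[float],
                       pixel_feats: List[Sequence[float]],
                       labels: List[int]) -> float:
    """Hypothetical helper: mean binary cross-entropy over pixels.

    The logit for each pixel is the dot product text_feat . pixel_feat.
    Label 1 marks pixels of the referred object (pulled toward the text
    feature); label 0 marks irrelevant pixels (pushed away from it).
    """
    total = 0.0
    for p, y in zip(pixel_feats, labels):
        logit = sum(t * v for t, v in zip(text_feat, p))
        # Numerically stable BCE with logits:
        # max(x, 0) - x * y + log(1 + exp(-|x|))
        total += max(logit, 0.0) - logit * y + math.log1p(math.exp(-abs(logit)))
    return total / len(labels)


pixels = [[3.0, 0.0], [-3.0, 0.0]]
print(text_to_pixel_loss([1.0, 0.0], pixels, [1, 0]))  # aligned: small loss
print(text_to_pixel_loss([1.0, 0.0], pixels, [0, 1]))  # misaligned: large loss
```

Using a dot product between the text feature and each pixel feature lets the trained similarity map be thresholded directly into a mask.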
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Referring image segmentation is a fundamental vision-language task that aims to segment out an object referred to by a natural language expression from an image.
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
PSALM is a powerful extension of the Large Multi-modal Model (LMM) that addresses the challenges of segmentation tasks.
HDC: Hierarchical Semantic Decoding with Counting Assistance for Generalized Referring Expression Segmentation
The newly proposed Generalized Referring Expression Segmentation (GRES) extends the formulation of classic RES to multi-target and no-target scenarios.
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Referring Expression Segmentation (RES) has attracted growing attention; it aims to identify and segment objects based on natural language expressions.