Unsupervised Semantic Segmentation with Language-image Pre-training

8 papers with code • 11 benchmarks • 7 datasets

A segmentation task that uses no human-level supervision for semantic segmentation, except for a backbone initialised with features pre-trained on image-level labels.

Benchmarks

These leaderboards are used to track progress in Unsupervised Semantic Segmentation with Language-image Pre-training.

Dataset	Best Model
Cityscapes val	TTD (MaskCLIP)
ADE20K	TagAlign
COCO-Object	TTD (TCL)
PASCAL Context-59	TagAlign
COCO-Stuff-171	TagAlign
PASCAL VOC	CLS-SEG
PascalVOC-20	TagAlign
COCO-Stuff-27	ReCo+
KITTI-STEP	ReCo+
MS COCO	CLIPpy ViT-B
PASCAL VOC 2007	CLIPpy ViT-B

Most implemented papers

GroupViT: Semantic Segmentation Emerges from Text Supervision

NVlabs/GroupViT • CVPR 2022

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.
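Roughly speaking, zero-shot transfer from learned groups to a segmentation mask works like this: once an image has been partitioned into groups, each group's embedding is compared with text embeddings of the candidate class names, the group takes the closest class, and every pixel inherits the label of its group. The sketch below is a simplified NumPy illustration under stated assumptions, not the NVlabs/GroupViT code: the group embeddings, text embeddings and pixel-to-group map are taken as given, the functions `label_groups` and `groups_to_mask` are hypothetical, and any refinements the real model applies are omitted.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def label_groups(group_embeddings: np.ndarray, class_text_embeddings: np.ndarray) -> np.ndarray:
    """Assign each group the class whose text embedding is most similar (hypothetical helper).

    group_embeddings: (G, D), one embedding per group.
    class_text_embeddings: (C, D), one embedding per class-name prompt.
    Returns an int array of shape (G,) holding a class index per group.
    """
    sims = l2_normalize(group_embeddings) @ l2_normalize(class_text_embeddings).T  # (G, C)
    return sims.argmax(axis=1)


def groups_to_mask(pixel_to_group: np.ndarray, group_labels: np.ndarray) -> np.ndarray:
    """Turn an (H, W) map of group ids into an (H, W) map of class ids."""
    return group_labels[pixel_to_group]


# Toy 2-D embeddings, not real GroupViT outputs.
groups = np.array([[1.0, 0.1], [0.1, 1.0]])        # 2 groups
texts = np.array([[0.0, 1.0], [1.0, 0.0]])         # class 0, class 1 (hypothetical names, e.g. "sky", "grass")
pixel_to_group = np.array([[0, 0, 1], [0, 1, 1]])  # a 2 x 3 image

labels = label_groups(groups, texts)
mask = groups_to_mask(pixel_to_group, labels)
print(labels)         # [1 0]
print(mask.tolist())  # [[1, 1, 0], [1, 0, 0]]
```

Because labelling happens per group rather than per pixel, every pixel in a group receives the same class, which is why the quality of the grouping matters for the final mask.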

ReCo: Retrieve and Co-segment for Zero-shot Transfer

NoelShin/reco • 14 Jun 2022

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment.

Extract Free Dense Labels from CLIP

chongzhou96/maskclip • 2 Dec 2021

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition.
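One way to picture extracting dense labels from CLIP is to move CLIP's zero-shot recipe (cosine similarity between image and text embeddings, then pick the best-matching class name) from the whole image down to individual patches. The sketch below is a simplified illustration under assumptions, not the chongzhou96/maskclip code: it takes per-patch features that are assumed to already live in the text embedding space, and it does not show how MaskCLIP obtains such features from the CLIP image encoder. The function name `dense_zero_shot_labels` is hypothetical.

```python
import numpy as np


def dense_zero_shot_labels(patch_features: np.ndarray,
                           text_embeddings: np.ndarray,
                           patch_size: int) -> np.ndarray:
    """Label every pixel with the class whose text embedding best matches its patch.

    patch_features: (h, w, D) features for an h x w grid of patches, ASSUMED to be
        in the same embedding space as the text encoder.
    text_embeddings: (C, D), one embedding per class prompt.
    patch_size: side length in pixels of each square patch.
    Returns an (h * patch_size, w * patch_size) array of class indices.
    """
    # Normalise so the dot product is a cosine similarity, as in CLIP.
    f = patch_features / np.linalg.norm(patch_features, axis=-1, keepdims=True)
    t = text_embeddings / np.linalg.norm(text_embeddings, axis=-1, keepdims=True)
    sims = np.einsum("hwd,cd->hwc", f, t)  # (h, w, C) patch-to-class similarity
    patch_labels = sims.argmax(axis=-1)    # (h, w) class index per patch
    # Nearest-neighbour upsampling from the patch grid to the pixel grid.
    return np.repeat(np.repeat(patch_labels, patch_size, axis=0), patch_size, axis=1)


feats = np.array([[[1.0, 0.0], [0.0, 1.0]]])  # 1 x 2 patch grid, D = 2 (toy values)
texts = np.array([[0.0, 1.0], [1.0, 0.0]])    # 2 toy classes
print(dense_zero_shot_labels(feats, texts, patch_size=2).tolist())  # [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Nearest-neighbour upsampling keeps the example short; a practical pipeline would more likely interpolate the similarity maps before taking the argmax to get smoother boundaries.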

Perceptual Grouping in Contrastive Vision-Language Models

kahnchana/clippy • ICCV 2023

In this work we examine how well vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

kakaobrain/tcl • CVPR 2023

Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.
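The contrastive learning (CL) these methods build on can be sketched as the symmetric image-text InfoNCE objective popularised by CLIP: within a batch of matched pairs, each image should score highest with its own caption, and each caption with its own image. This is a generic NumPy illustration that assumes precomputed embeddings and a fixed temperature of 0.07; it is not TCL's training objective, which, as the title indicates, additionally learns to generate text-grounded masks. The function name `clip_style_contrastive_loss` is hypothetical.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_contrastive_loss(image_emb: np.ndarray,
                                text_emb: np.ndarray,
                                temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch of N matched image-text pairs.

    image_emb, text_emb: (N, D); row i of each array belongs to the same pair.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N); the diagonal holds the positive pairs
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image vs. all captions
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each caption vs. all images
    return float((loss_i2t + loss_t2i) / 2)


rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
aligned = clip_style_contrastive_loss(img, img)         # every caption matches its image
shuffled = clip_style_contrastive_loss(img, img[::-1])  # every caption paired with the wrong image
print(aligned < shuffled)
```

Only image-level pairs enter this loss, which is the gap the excerpt points to: the learned understanding is image-level and still has to be transferred to the pixel-level segmentation task.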
