Search Results for author: Chunhua Shen

Found 375 papers, 166 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

Floating Anchor Diffusion Model for Multi-motif Scaffolding

1 code implementation • 5 Jun 2024 • Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes.

Generative Active Learning for Long-tailed Instance Segmentation

1 code implementation • 4 Jun 2024 • Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data.

Active Learning Instance Segmentation +2

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

1 code implementation • 22 May 2024 • Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

Experiments show that our method's produced images are consistent with the given concepts and better aligned with the input text.

Image Generation

On the Trajectory Regularity of ODE-based Diffusion Sampling

2 code implementations • 18 May 2024 • Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, Siwei Lyu

Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution.

Denoising Image Generation

DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

1 code implementation • 16 May 2024 • Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen

Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy.

Ranked #7 on Instance Segmentation on LVIS v1.0 val

Data Augmentation Instance Segmentation +2

VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

1 code implementation • 30 Apr 2024 • Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.

Domain Generalization Text Spotting

Deepfake Generation and Detection: A Benchmark and Survey

1 code implementation • 26 Mar 2024 • Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao

Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.

Attribute Face Reenactment +2

126

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen

For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training.

Ranked #1 on Surface Normals Estimation on NYU Depth v2 (using extra training data)

Depth Estimation Surface Normal Estimation +1

865

Zippo: Zipping Color and Transparency Distributions into a Single Diffusion Model

no code implementations • 17 Mar 2024 • Kangyang Xie, BinBin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen

Beyond the superiority of the text-to-image diffusion model in generating high-quality images, recent studies have attempted to uncover its potential for adapting the learned semantic knowledge to visual perception tasks.

Image Generation

3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models

no code implementations • 17 Mar 2024 • Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen

In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS).

Ranked #10 on 3D Human Pose Estimation on 3DPW

3D human pose and shape estimation 3D Human Reconstruction

Diffusion Models Trained with Large Data Are Transferable Visual Models

no code implementations • 10 Mar 2024 • Guangkai Xu, Yongtao Ge, MingYu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen

We show that, by simply initializing image understanding models using a pre-trained UNet (or transformer) of diffusion models, it is possible to achieve remarkable transferable performance on fundamental vision perception tasks using a moderate amount of target data (even synthetic data only), including monocular depth, surface normal, image segmentation, matting, and human pose estimation, among many others.

Image Matting Image Segmentation +2

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

1 code implementation • 1 Mar 2024 • Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen

Large language models are built on top of a transformer-based architecture to process textual inputs.

Image Classification Image Generation +2

303

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.

AutoML Language Modelling

844

MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices

1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen

We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.

AutoML Language Modelling

844

GenDeF: Learning Generative Deformation Field for Video Generation

no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen

We offer a new perspective on approaching the task of video generation.

Disentanglement Video Editing +3

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which transfers the extensive semantic comprehension capabilities of large language models to the task of image generation.

Image Generation Language Modelling +1

DA-STC: Domain Adaptive Video Semantic Segmentation via Spatio-Temporal Consistency

1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai

To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.

Representation Learning Segmentation +2

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

no code implementations • 19 Nov 2023 • Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content.

Image Generation Story Visualization

De novo protein design using geometric vector field networks

no code implementations • 18 Oct 2023 • Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen

Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context.

Protein Design

RGM: A Robust Generalizable Matching Model

1 code implementation • 18 Oct 2023 • Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen

Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.

Optical Flow Estimation

Object-aware Inversion and Reassembly for Image Editing

no code implementations • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen

Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.

Benchmarking Denoising +1

Self-Supervised 3D Scene Flow Estimation and Motion Prediction using Local Rigidity Prior

1 code implementation • 17 Oct 2023 • Ruibo Li, Chi Zhang, Zhe Wang, Chunhua Shen, Guosheng Lin

By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels.

Motion Estimation motion prediction +2

PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm

1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang

In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.

Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)

3D Object Detection 3D Reconstruction +5

304

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.

Monocular Depth Estimation

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.

Ranked #69 on Visual Question Answering on MM-Vet

Visual Question Answering

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

1 code implementation • 13 Aug 2023 • Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen

In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.

Ranked #1 on Supervised Anomaly Detection on BTAD

Supervised Anomaly Detection

SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning

1 code implementation • ICCV 2023 • Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen

In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.

Open-World Instance Segmentation Segmentation +1

108

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Decoder Depth Estimation +6

288

FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models

no code implementations • ICCV 2023 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Zhao

3D scene reconstruction is a long-standing vision task.

3D Scene Reconstruction Monocular Depth Estimation

CTVIS: Consistent Training for Online Video Instance Segmentation

1 code implementation • ICCV 2023 • Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen

The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).

Ranked #2 on Video Instance Segmentation on Youtube-VIS 2022 Validation (using extra training data)

Instance Segmentation Semantic Segmentation +1

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction Monocular Depth Estimation +1

865

Generative Prompt Model for Weakly Supervised Object Localization

1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.

Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

Image Denoising Language Modelling +2

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation • 9 Jun 2023 • BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.

Ranked #16 on Semantic Segmentation on ADE20K

Continual Learning Continual Semantic Segmentation +3

187

A Dynamic Feature Interaction Framework for Multi-task Visual Perception

no code implementations • 8 Jun 2023 • Yuling Xi, Hao Chen, Ning Wang, Peng Wang, Yanning Zhang, Chunhua Shen, Yifan Liu

In particular, one feature merge branch is designed for instance-level recognition and the other for dense predictions.

Autonomous Driving Depth Estimation +3

Efficient Anomaly Detection with Budget Annotation Using Semi-Supervised Residual Transformer

no code implementations • 6 Jun 2023 • Hanxi Li, Jingqi Wu, Hao Chen, Mingwen Wang, Chunhua Shen

Thus the sliding transformer can attain even higher accuracy with much less annotation labor.

Ranked #1 on Anomaly Detection on MVTec AD (Segmentation AUROC metric)

Supervised Anomaly Detection Unsupervised Anomaly Detection

A Geometric Perspective on Diffusion Models

no code implementations • 31 May 2023 • Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang

Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models.

Denoising

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen

The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.

3D Generation Attribute +1

500

Learning Conditional Attributes for Compositional Zero-Shot Learning

1 code implementation • CVPR 2023 • Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen

Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.

Ranked #1 on Compositional Zero-Shot Learning on MIT-States

Attribute Compositional Zero-Shot Learning

LoRAPrune: Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

no code implementations • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang

This is due to their utilization of unstructured pruning on LPMs, impeding the merging of LoRA weights, or their dependence on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.

Model Compression Network Pruning

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

1 code implementation • 22 May 2023 • Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.

Ranked #3 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)

Few-Shot Semantic Segmentation Segmentation +1

380

SegGPT: Segmenting Everything In Context

1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation In-Context Learning +5

2,445

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Image Generation Video Alignment +1

326

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

1 code implementation • ICCV 2023 • Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which cannot tackle the problem of torso, limb, hand, or face distortion caused by perspective camera projection when the camera is close to the human body.

Ranked #7 on 3D Human Pose Estimation on 3DPW

3D Human Pose Estimation 3D Reconstruction

DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models

1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen

In contrast, synthetic data can be freely available using a generative model (e.g., DALL-E, Stable Diffusion).

Image Generation Semantic Segmentation

Background Matters: Enhancing Out-of-distribution Detection with Domain Features

no code implementations • 15 Mar 2023 • Choubo Ding, Guansong Pang, Chunhua Shen

To this end, we propose a novel generic framework that can learn the domain features from the ID training samples by a dense prediction approach, with which different existing semantic-feature-based OOD detection methods can be seamlessly combined to jointly learn the in-distribution features from both the semantic and domain dimensions.

Object Recognition Out-of-Distribution Detection

Traffic Scene Parsing through the TSP6K Dataset

1 code implementation • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen

To date, most existing datasets focus on autonomous driving scenes.

Autonomous Driving Decoder +4

A Survey on Efficient Training of Transformers

no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.

SPTS v2: Single-Point Scene Text Spotting

3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Ranked #15 on Text Spotting on ICDAR 2015

Decoder Text Detection +1

132

SegGPT: Towards Segmenting Everything in Context

no code implementations • ICCV 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Few-Shot Semantic Segmentation In-Context Learning +4

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.

Ranked #6 on Personalized Segmentation on PerSeg

In-Context Learning Keypoint Detection +2

2,445

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

Learning from partially labeled data for multi-organ and tumor segmentation

1 code implementation • 13 Nov 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen

To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

168

SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes

2 code implementations • 7 Nov 2022 • Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen

Self-supervised monocular depth estimation has shown impressive results in static scenes.

Indoor Monocular Depth Estimation Monocular Depth Estimation +1

400

Hierarchical Normalization for Robust Monocular Depth Estimation

no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen

In this paper, we address monocular depth estimation with deep neural networks.

Monocular Depth Estimation

Adv-Attribute: Inconspicuous and Transferable Adversarial Attack on Face Recognition

no code implementations • 13 Oct 2022 • Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma

For face recognition attacks, existing methods typically generate l_p-norm perturbations on pixels, which, however, results in low attack transferability and high vulnerability to denoising defense models.

Adversarial Attack Attribute +2

SegViT: Semantic Segmentation with Plain Vision Transformers

1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu

We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose SegViT.

Ranked #4 on Semantic Segmentation on COCO-Stuff test

Segmentation Semantic Segmentation

187

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

no code implementations • 27 Sep 2022 • Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.

Cross-Modal Retrieval Retrieval +2

Multi-dataset Training of Transformers for Robust Action Recognition

1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.

Action Recognition Temporal Action Localization

Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image

1 code implementation • 28 Aug 2022 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen

To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.

Depth Estimation Depth Prediction

1,035

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

1 code implementation • 18 Jul 2022 • Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo

Our contributions are three-fold: 1) CoText simultaneously addresses the three tasks (i.e., text detection, tracking, and recognition) in a real-time end-to-end trainable framework.

Contrastive Learning Representation Learning +2

Efficient Decoder-free Object Detection with Transformers

2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Decoder Object +1

Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations

no code implementations • 1 Jun 2022 • Yongtao Ge, Qiang Zhou, Xinlong Wang, Zhibin Wang, Hao Li, Chunhua Shen

Point annotations are considerably more time-efficient than bounding box annotations.

Data Augmentation Multiple Instance Learning +4

Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images

no code implementations • 27 May 2022 • Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, Chunhua Shen

In this work, we tackle this challenging issue with a novel range view projection mechanism, and for the first time demonstrate the benefits of fusing multi-frame point clouds for a range-view based detector.

3D Object Detection Autonomous Driving +2

Super Vision Transformer

1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao

Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even increasing performance.

PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining

no code implementations • 29 Apr 2022 • Yuting Gao, Jinfeng Liu, Zihan Xu, Jun Zhang, Ke Li, Rongrong Ji, Chunhua Shen

Large-scale vision-language pre-training has achieved promising results on downstream tasks.

Image Classification Language Modelling +3

PointInst3D: Segmenting 3D Instances by Points

no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel

The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.

3D Instance Segmentation Clustering +2

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

373

Improving Monocular Visual Odometry Using Learned Depth

no code implementations • 4 Apr 2022 • Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen

The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.

Monocular Depth Estimation Monocular Visual Odometry

Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

1 code implementation • CVPR 2022 • Choubo Ding, Guansong Pang, Chunhua Shen

Although most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc.

Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Supervised Anomaly Detection

End-to-End Video Text Spotting with Transformer

1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo

Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing localized text, and tracking text streams with post-processing to generate final results.

Text Detection Text Spotting

Paper
Code

PointAttN: You Only Need Attention for Point Cloud Completion

1 code implementation • 16 Mar 2022 • Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen

To solve the problems, we leverage the cross-attention and self-attention mechanisms to design a novel neural network that processes point clouds in a per-point manner to eliminate kNNs.

Decoder Point Cloud Completion

Paper
Code

Training Protocol Matters: Towards Accurate Scene Text Recognition via Training Protocol Searching

2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu

The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.

Scene Text Recognition

Paper
Code

Efficient Video Segmentation Models with Per-frame Inference

no code implementations • 24 Feb 2022 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

To this end, we perform inference at each frame.

Image Matting Instance Segmentation +6

Paper
Add Code

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

311

Paper
Code

Retrieval Augmented Classification for Long-Tail Visual Recognition

no code implementations • CVPR 2022 • Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton Van Den Hengel

We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module.

Ranked #6 on Long-tail Learning on iNaturalist 2018

Classification Image Classification +2

Paper
Add Code

Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings

no code implementations • 4 Feb 2022 • Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton Van Den Hengel

The resulting merged semantic segmentation dataset of over 2 million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom.

Ranked #1 on Semantic Segmentation on WildDash

Instance Segmentation Monocular Depth Estimation +4

Paper
Add Code

Towards 3D Scene Reconstruction from Locally Scale-Aligned Monocular Video Depth

no code implementations • 3 Feb 2022 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao

However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.

3D Scene Reconstruction Depth Completion +1

Paper
Add Code

Poseur: Direct Human Pose Regression with Transformers

1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel

We propose a direct, regression-based approach to 2D human pose estimation from single images.

Ranked #2 on Keypoint Detection on MS COCO

2D Human Pose Estimation Keypoint Detection +1

172

Paper
Code

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation

no code implementations • CVPR 2022 • Yutong Dai, Brian Price, He Zhang, Chunhua Shen

Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k/alphamatting.com).

Data Augmentation Decoder +1

Paper
Add Code

RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior

no code implementations • CVPR 2022 • Ruibo Li, Chi Zhang, Guosheng Lin, Zhe Wang, Chunhua Shen

In this work, we focus on scene flow learning on point clouds in a self-supervised manner.

Motion Estimation Self-Supervised Learning

Paper
Add Code

DENSE: Data-Free One-Shot Federated Learning

1 code implementation • 23 Dec 2021 • Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.

Federated Learning

Paper
Code

SPTS: Single-Point Text Spotting

1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.

Ranked #3 on Text Spotting on SCUT-CTW1500

Language Modelling Text Detection +1

132

Paper
Code

NAS-FCOS: Efficient Search for Object Detection Architectures

1 code implementation • 24 Oct 2021 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.

Neural Architecture Search Object +2

188

Paper
Code

TSGB: Target-Selective Gradient Backprop for Probing CNN Visual Saliency

1 code implementation • 11 Oct 2021 • Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang

Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.

Paper
Code

Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning

no code implementations • ICCV 2021 • Chi Zhang, Henghui Ding, Guosheng Lin, Ruibo Li, Changhu Wang, Chunhua Shen

Inspired by the recent success of the Automated Machine Learning (AutoML) literature, in this paper we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and automating the selection from various few-shot learning designs.

AutoML Few-Shot Learning

Paper
Add Code

Explainable Deep Few-shot Anomaly Detection with Deviation Networks

1 code implementation • 1 Aug 2021 • Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel

Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.

Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)

Multiple Instance Learning Supervised Anomaly Detection +1

Paper
Code

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation • NeurIPS 2021 • BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Decoder Segmentation +2

Paper
Code

Dynamic Convolution for 3D Point Cloud Instance Segmentation

1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.

Instance Segmentation Semantic Segmentation

118

Paper
Code

SOLO: A Simple Framework for Instance Segmentation

no code implementations • 30 Jun 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation.

Image Matting Instance Segmentation +4

Paper
Add Code

Learning Spatial-Semantic Relationship for Facial Attribute Recognition With Limited Labeled Data

no code implementations • CVPR 2021 • Ying Shu, Yan Yan, Si Chen, Jing-Hao Xue, Chunhua Shen, Hanzi Wang

First, three auxiliary tasks, consisting of a Patch Rotation Task (PRT), a Patch Segmentation Task (PST), and a Patch Classification Task (PCT), are jointly developed to learn the spatial-semantic relationship from large-scale unlabeled facial data.

Ranked #3 on Facial Attribute Classification on LFWA

Attribute Facial Attribute Classification +1

Paper
Add Code

FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions

3 code implementations • CVPR 2021 • Weian Mao, Zhi Tian, Xinlong Wang, Chunhua Shen

We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose.

Keypoint Estimation Multi-Person Pose Estimation

3,337

Paper
Code

Unsupervised Scale-consistent Depth Learning from Video

2 code implementations • 25 May 2021 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid

We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.

Ranked #6 on Monocular Depth Estimation on NYU-Depth V2 self-supervised

Monocular Depth Estimation Monocular Visual Odometry +1

721

Paper
Code

HCRF-Flow: Scene Flow from Point Clouds with Continuous High-order CRFs and Position-aware Flow Embedding

no code implementations • CVPR 2021 • Ruibo Li, Guosheng Lin, Tong He, Fayao Liu, Chunhua Shen

Scene flow in 3D point clouds plays an important role in understanding dynamic environments.

Position

Paper
Add Code

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Ranked #7 on Text Spotting on Inverse-Text

Text Spotting

Paper
Code

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection Text Detection +1

435

Paper
Code

Twins: Revisiting the Design of Spatial Attention in Vision Transformers

9 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.

Ranked #48 on Semantic Segmentation on ADE20K val

Image Classification Semantic Segmentation

30,332

Paper
Code

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

2 code implementations • 19 Apr 2021 • Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen

Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.

Contrastive Learning Representation Learning +1

Paper
Code

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Ranked #14 on Semi-Supervised Semantic Segmentation on Cityscapes 25% labeled

Data Augmentation Image Classification +3

Paper
Code

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

no code implementations • CVPR 2021 • Delian Ruan, Yan Yan, Shenqi Lai, Zhenhua Chai, Chunhua Shen, Hanzi Wang

In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Paper
Add Code

TFPose: Direct Human Pose Estimation with Transformers

no code implementations • 29 Mar 2021 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang

We propose a human pose estimation framework that solves the task in a regression-based fashion.

Ranked #26 on Pose Estimation on MPII Human Pose (using extra training data)

Pose Estimation regression

Paper
Add Code

An Adversarial Human Pose Estimation Network Injected with Graph Structure

no code implementations • 29 Mar 2021 • Lei Tian, Guoqiang Liang, Peng Wang, Chunhua Shen

Because human keypoints can be invisible in images due to illumination, occlusion and overlap, most current human pose estimation methods are likely to produce unreasonable pose predictions.

Generative Adversarial Network Pose Estimation +1

Paper
Add Code

Generic Perceptual Loss for Modeling Structured Output Dependencies

no code implementations • CVPR 2021 • Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen

We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.

Depth Estimation Image Generation +4

Paper
Add Code

FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation

2 code implementations • 8 Mar 2021 • Lingtong Kong, Chunhua Shen, Jie Yang

Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 of the computation of comparable networks to achieve on-par accuracy.

Ranked #12 on Optical Flow Estimation on KITTI 2012

Decoder Optical Flow Estimation

237

Paper
Code

Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction

3 code implementations • 7 Mar 2021 • Wei Yin, Yifan Liu, Chunhua Shen

In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.

Depth Prediction Monocular Depth Estimation

1,035

Paper
Code

CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation

1 code implementation • 4 Mar 2021 • Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia

Convolutional neural networks (CNNs) have been the de facto standard for current 3D medical image segmentation.

Image Segmentation Inductive Bias +4

285

Paper
Code

Conditional Positional Encodings for Vision Transformers

2 code implementations • 22 Feb 2021 • Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

Built on PEG, we present Conditional Position encoding Vision Transformer (CPVT).

AutoML Classification +4

179

Paper
Code

Instance and Panoptic Segmentation Using Conditional Convolutions

no code implementations • 5 Feb 2021 • Zhi Tian, BoWen Zhang, Hao Chen, Chunhua Shen

In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance.

Instance Segmentation Panoptic Segmentation +1

Paper
Add Code

Object Detection Made Simpler by Eliminating Heuristic NMS

no code implementations • 28 Jan 2021 • Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, Hao Li

On the COCO dataset, our simple design achieves superior performance compared to both the FCOS baseline detector with NMS post-processing and the recent end-to-end NMS-free detectors.

Object object-detection +1

Paper
Add Code

Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline

no code implementations • 24 Jan 2021 • Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen

To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.

Decoder

Paper
Add Code

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression Network Pruning +1

Paper
Add Code

BV-Person: A Large-Scale Dataset for Bird-View Person Re-Identification

no code implementations • ICCV 2021 • Cheng Yan, Guansong Pang, Lei Wang, Jile Jiao, Xuetao Feng, Chunhua Shen, Jingjing Li

In this work we introduce a new ReID task, bird-view person ReID, which aims at searching for a person in a gallery of horizontal-view images with the query images taken from a bird's-eye view, i.e., an elevated view of an object from above.

Person Re-Identification

Paper
Add Code

Occluded Person Re-Identification With Single-Scale Global Representations

no code implementations • ICCV 2021 • Cheng Yan, Guansong Pang, Jile Jiao, Xiao Bai, Xuetao Feng, Chunhua Shen

However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians.

Graph Matching Person Re-Identification +1

Paper
Add Code

Memory-Efficient Hierarchical Neural Architecture Search for Image Restoration

1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen

For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.

Image Denoising Image Restoration +2

Paper
Code

Diverse Knowledge Distillation for End-to-End Person Search

no code implementations • 21 Dec 2020 • Xinyu Zhang, Xinlong Wang, Jia-Wang Bian, Chunhua Shen, Mingyu You

Person search aims to localize and identify a specific person from a gallery of images.

Human Detection Knowledge Distillation +1

Paper
Add Code

Learning to Recover 3D Scene Shape from a Single Image

1 code implementation • CVPR 2021 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen

Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.

Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)

3D Scene Reconstruction Depth Prediction +3

1,035

Paper
Code

Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning

2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen

In contrast to previous approaches, we do not impose restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets.

Classification General Classification +1

Paper
Code

BoxInst: High-Performance Instance Segmentation with Box Annotations

2 code implementations • CVPR 2021 • Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen

We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.

Ranked #2 on Box-supervised Instance Segmentation on PASCAL VOC 2012 val

Box-supervised Instance Segmentation Segmentation +3

3,337

Paper
Code

End-to-End Video Instance Segmentation with Transformers

2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Ranked #33 on Video Instance Segmentation on YouTube-VIS validation

Instance Segmentation Segmentation +3

738

Paper
Code

Fully Quantized Image Super-Resolution Networks

1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen

With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, real-time and energy-efficient image Super-Resolution (SR) inference methods.

Image Super-Resolution Quantization

Paper
Code

Learning Affinity-Aware Upsampling for Deep Image Matting

1 code implementation • CVPR 2021 • Yutong Dai, Hao Lu, Chunhua Shen

By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a lightweight low-rank bilinear model and are conditioned on second-order features.

Image Matting Image Reconstruction

Paper
Code

Channel-wise Knowledge Distillation for Dense Prediction

3 code implementations • ICCV 2021 • Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen

Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogous to class activation mapping), we propose to align features channel-wise between the student and teacher networks.

Knowledge Distillation Segmentation +1

1,393

Paper
Code

DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution

1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel

Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.

Instance Segmentation Semantic Segmentation

118

Paper
Code

PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation

no code implementations • 25 Nov 2020 • Yutong Xie, Jianpeng Zhang, Zehui Liao, Yong Xia, Chunhua Shen

In this paper, we propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.

Image Segmentation Medical Image Segmentation +3

Paper
Add Code

Graph Attention Tracking

no code implementations • CVPR 2021 • Dongyan Guo, Yanyan Shao, Ying Cui, Zhenhua Wang, Liyan Zhang, Chunhua Shen

We propose to establish part-to-part correspondence between the target and the search region with a complete bipartite graph, and apply the graph attention mechanism to propagate target information from the template feature to the search feature.

Graph Attention Object Tracking +1

Paper
Add Code

Robust Data Hiding Using Inverse Gradient Attention

1 code implementation • 21 Nov 2020 • Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li

In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w.r.t.

Paper
Code

DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

1 code implementation • CVPR 2021 • Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen

To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.

Image Segmentation Medical Image Segmentation +4

168

Paper
Code

Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions

no code implementations • 19 Nov 2020 • Hao Chen, Chunhua Shen, Zhi Tian

To our knowledge, DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation by considering both efficacy and efficiency.

Instance Segmentation Multi-Task Learning +2

Paper
Add Code

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning Image Classification +7

3,114

Paper
Code

Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data

2 code implementations • 15 Sep 2020 • Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao

We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Code

Representative Graph Neural Network

no code implementations • ECCV 2020 • Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang

In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.

Graph Neural Network object-detection +2

Paper
Add Code

FATNN: Fast and Accurate Ternary Neural Networks

no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen

Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.

Image Classification Quantization

Paper
Add Code

Pairwise Relation Learning for Semi-supervised Gland Segmentation

no code implementations • 6 Aug 2020 • Yutong Xie, Jianpeng Zhang, Zhibin Liao, Chunhua Shen, Johan Verjans, Yong Xia

In this paper, we propose the pairwise relation-based semi-supervised (PRS^2) model for gland segmentation on histology images.

Relation Relation Network +1

Paper
Add Code

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Paper
Code

Improving Generative Adversarial Networks with Local Coordinate Coding

1 code implementation • 28 Jul 2020 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan

In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.

Paper
Code

Soft Expert Reward Learning for Vision-and-Language Navigation

no code implementations • ECCV 2020 • Hu Wang, Qi Wu, Chunhua Shen

In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering design and generalisation problems of the VLN task.

Reinforcement Learning (RL) Vision and Language Navigation

Paper
Add Code

Weighing Counts: Sequential Crowd Counting by Reinforcement Learning

1 code implementation • ECCV 2020 • Liang Liu, Hao Lu, Hongwei Zou, Haipeng Xiong, Zhiguo Cao, Chunhua Shen

Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet where the count value is analogized by weight.

Crowd Counting reinforcement-learning +1

Paper
Code

AQD: Towards Accurate Fully-Quantized Object Detection

1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen

Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.

Image Classification Object +3

Paper
Code

Deep Learning for Anomaly Detection: A Review

no code implementations • 6 Jul 2020 • Guansong Pang, Chunhua Shen, Longbing Cao, Anton Van Den Hengel

This paper surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in three high-level categories and 11 fine-grained categories of the methods.

Anomaly Detection Novelty Detection +1

Paper
Add Code

FCOS: A simple and strong anchor-free object detector

no code implementations • 14 Jun 2020 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He

In computer vision, object detection is one of the most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.

Object Object Detection +1

Paper
Add Code

A Robust Attentional Framework for License Plate Recognition in the Wild

no code implementations • 6 Jun 2020 • Linjiang Zhang, Peng Wang, Hui Li, Zhen Li, Chunhua Shen, Yanning Zhang

On the other hand, the 2D attention-based license plate recognizer with an Xception-based CNN encoder is capable of recognizing license plates with different patterns under various scenarios accurately and robustly.

Image Generation License Plate Recognition

Paper
Add Code

Auto-Rectify Network for Unsupervised Indoor Depth Estimation

1 code implementation • 4 Jun 2020 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid

However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.

Ranked #49 on Monocular Depth Estimation on NYU-Depth V2

Monocular Depth Estimation Self-Supervised Learning +1

400

Paper
Code

Scope Head for Accurate Localization in Object Detection

no code implementations • 11 May 2020 • Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.

Object object-detection +2

Paper
Add Code

Scene Text Image Super-Resolution in the Wild

4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai

For example, it outperforms LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN.

Image Super-Resolution

427

Paper
Code

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for real-time semantic segmentation.

Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff

Real-Time Semantic Segmentation Segmentation

8,355

Paper
Code

Context Prior for Scene Segmentation

2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang

Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.

Ranked #1 on Scene Understanding on ADE20K val

Scene Segmentation Scene Understanding +1

246

Paper
Code

Segmenting Transparent Objects in the Wild

1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10,428 images of real scenarios with careful manual annotations, which is 10 times larger than existing datasets.

Ranked #4 on Semantic Segmentation on Trans10K

Segmentation Semantic Segmentation +1

Paper
Code

Viral Pneumonia Screening on Chest X-ray Images Using Confidence-Aware Anomaly Detection

1 code implementation • 27 Mar 2020 • Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia

In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls as a one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.

Binary Classification Classification +2

Paper
Code

Mask Encoding for Single Shot Instance Segmentation

7 code implementations • CVPR 2020 • Rufeng Zhang, Zhi Tian, Chunhua Shen, Mingyu You, Youliang Yan

To date, instance segmentation is dominated by two-stage methods, as pioneered by Mask R-CNN.

Instance Segmentation Segmentation +1

3,337

Paper
Code

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

Ranked #10 on Real-time Instance Segmentation on MSCOCO

object-detection Object Detection +4

28,266

Paper
Code

Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection

no code implementations • CVPR 2020 • Guansong Pang, Cheng Yan, Chunhua Shen, Anton Van Den Hengel, Xiao Bai

Video anomaly detection is of critical practical importance to a variety of real applications because it allows human attention to be focused on events that are likely to be of interest, in spite of an otherwise overwhelming volume of video.

Anomaly Detection regression +2

Paper
Add Code

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning

5 code implementations • 15 Mar 2020 • Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen

We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.

Classification Few-Shot Image Classification +4

567

Paper
Code

Conditional Convolutions for Instance Segmentation

7 code implementations • ECCV 2020 • Zhi Tian, Chunhua Shen, Hao Chen

We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).

Instance Segmentation Segmentation +1

3,337

Paper
Code

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

no code implementations • 11 Mar 2020 • Genshun Dong, Yan Yan, Chunhua Shen, Hanzi Wang

Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information.

Image Segmentation Segmentation +2

Paper
Add Code

Efficient Semantic Video Segmentation with Per-frame Inference

1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang

For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.

Ranked #2 on Video Semantic Segmentation on CamVid

Knowledge Distillation Optical Flow Estimation +4

294

Paper
Code

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.

Ranked #9 on Text Spotting on Inverse-Text

Scene Text Detection Text Detection +1

3,337

Paper
Code

On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering

no code implementations • CVPR 2020 • Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton Van Den Hengel, Liangwei Wang

Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize.

Question Answering Referring Expression +1

Paper
Add Code

Joint Deep Learning of Facial Expression Synthesis and Recognition

no code implementations • 6 Feb 2020 • Yan Yan, Ying Huang, Si Chen, Chunhua Shen, Hanzi Wang

Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.

Facial Expression Recognition Facial Expression Recognition (FER) +1

Paper
Add Code

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation Depth Prediction

218

Paper
Code

Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild

no code implementations • 13 Jan 2020 • Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen

Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training.

Style Transfer

Paper
Add Code

Memorizing Comprehensively to Learn Adaptively: Unsupervised Cross-Domain Person Re-ID with Multi-level Memory

no code implementations • 13 Jan 2020 • Xin-Yu Zhang, Dong Gong, Jiewei Cao, Chunhua Shen

Due to the lack of supervision in the target domain, it is crucial to identify the underlying similarity-and-dissimilarity relationships among the unlabelled samples in the target domain.

Person Re-Identification

Paper
Add Code

From Open Set to Closed Set: Supervised Spatial Divide-and-Conquer for Object Counting

3 code implementations • 7 Jan 2020 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao

Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i.e., the number of objects can vary in [0, ∞) in theory.

Object Counting

132

Paper
Code

Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation

no code implementations • ECCV 2020 • Tong He, Dong Gong, Zhi Tian, Chunhua Shen

3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.

Ranked #28 on 3D Instance Segmentation on ScanNet(v2)

3D Instance Segmentation Scene Understanding +1

Paper
Add Code

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

9 code implementations • CVPR 2020 • Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan

The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.

Ranked #13 on Real-time Instance Segmentation on MSCOCO

Real-time Instance Segmentation Segmentation +1

3,337

Paper
Code

Ordered or Orderless: A Revisit for Video based Person Re-Identification

no code implementations • 24 Dec 2019 • Le Zhang, Zenglin Shi, Joey Tianyi Zhou, Ming-Ming Cheng, Yun Liu, Jia-Wang Bian, Zeng Zeng, Chunhua Shen

Specifically, with a diagnostic analysis, we show that the recurrent structure may be less effective at learning temporal dependencies than expected and implicitly yields an orderless representation.

Video-Based Person Re-Identification

Paper
Add Code

Unsupervised Representation Learning by Predicting Random Distances

2 code implementations • 22 Dec 2019 • Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma

To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.

Anomaly Detection Clustering +1

318

Paper
Code

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Ranked #3 on Scene Text Detection on ICDAR 2017 MLT

Scene Text Detection Text Detection

272

Paper
Code

SOLO: Segmenting Objects by Locations

24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI

We present a new, embarrassingly simple approach to instance segmentation in images.

Ranked #67 on Instance Segmentation on COCO test-dev

Clustering General Classification +3

28,266

Paper
Code

To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions

no code implementations • 10 Dec 2019 • Jun-Jie Zhang, Lingqiao Liu, Peng Wang, Chunhua Shen

Such an imbalanced distribution poses a great challenge for learning a deep neural network, which boils down to a dilemma: on the one hand, we prefer to increase the exposure of tail class samples to avoid the excessive dominance of head classes in the classifier training.

Auxiliary Learning Self-Supervised Learning

Paper
Add Code

Unified Multifaceted Feature Learning for Person Re-Identification

no code implementations • 20 Nov 2019 • Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen

The loss structures the augmented images resulting from the two types of image erasing in a two-level hierarchy and enforces multifaceted attention to different parts.

Person Re-Identification

Paper
Add Code

Deep Anomaly Detection with Deviation Networks

6 code implementations • 19 Nov 2019 • Guansong Pang, Chunhua Shen, Anton Van Den Hengel

Instead of representation learning, our method learns anomaly scores end-to-end through neural deviation learning, in which we leverage a few (e.g., several to dozens of) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from those of normal data objects in the upper tail.

Ranked #1 on Network Intrusion Detection on NB15-Backdoor

Anomaly Detection Cyber Attack Detection +3

318

Paper
Code

DirectPose: Direct End-to-End Multi-Person Pose Estimation

8 code implementations • 18 Nov 2019 • Zhi Tian, Hao Chen, Chunhua Shen

We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.

Ranked #13 on Keypoint Detection on COCO test-dev

Multi-Person Pose Estimation

3,337

Paper
Code

Multi-marginal Wasserstein GAN

3 code implementations • NeurIPS 2019 • Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

The multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains, and it has attracted great attention in many applications, such as multi-domain image translation.

Image Generation Translation

Paper
Code

Deep Weakly-supervised Anomaly Detection

3 code implementations • 30 Oct 2019 • Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel

To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.

Relation Semi-supervised Anomaly Detection +3

318

Paper
Code

PolarMask: Single Shot Instance Segmentation with Polar Representation

2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo

In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.

Ranked #100 on Instance Segmentation on COCO test-dev

Distance regression Instance Segmentation +4

869

Paper
Code

Structured Binary Neural Networks for Image Recognition

no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid

Experiments on classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.

object-detection Object Detection +2

Paper
Add Code

Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising

1 code implementation • CVPR 2020 • Haokui Zhang, Ying Li, Hao Chen, Chunhua Shen

We also present an analysis of the architectures found by NAS.

Image Denoising Image Restoration +1

Paper
Code

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

Paper
Code

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo

Nonetheless, most previous methods may not work well in recognizing low-resolution text, which is often seen in natural scene images.

Scene Text Recognition Super-Resolution

Paper
Code

Part-Guided Attention Learning for Vehicle Instance Retrieval

1 code implementation • 13 Sep 2019 • Xin-Yu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen

Finally, we aggregate the global appearance and part features to improve the feature performance further.

Fine-Grained Image Classification Retrieval +1

Paper
Code

Auxiliary Learning for Deep Multi-task Learning

no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin

Most current methods can be categorized as either: (i) hard parameter sharing, where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing, where all parameters are task-specific but they are jointly regularized.

Auxiliary Learning Depth Estimation +3

Paper
Add Code

Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

2 code implementations • NeurIPS 2019 • Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.

Ranked #61 on Monocular Depth Estimation on KITTI Eigen split

Depth And Camera Motion Monocular Depth Estimation +1

721

Paper
Code

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen

Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.

Ranked #8 on Scene Text Detection on SCUT-CTW1500

Scene Text Detection Segmentation +1

4,143

Paper
Code

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

5 code implementations • ICCV 2019 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen

A dense region can always be divided until sub-region counts are within the previously observed closed set.

Ranked #3 on Crowd Counting on TRANCOS

Crowd Counting

132

Paper
Code

Index Network

6 code implementations • 11 Aug 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.

Ranked #2 on Grayscale Image Denoising on Set12 sigma30

Decoder Grayscale Image Denoising +4

387

Paper
Code

MobileFAN: Transferring Deep Hidden Representation for Face Alignment

no code implementations • 11 Aug 2019 • Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong

To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder.

Decoder Face Alignment +1

Paper
Add Code

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen

Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.

Knowledge Distillation Quantization

Paper
Add Code

Exploiting temporal consistency for real-time video depth estimation

2 code implementations • ICCV 2019 • Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan

The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.

Ranked #5 on Monocular Depth Estimation on Mid-Air Dataset

Monocular Depth Estimation

Paper
Code

Indices Matter: Learning to Index for Deep Image Matting

1 code implementation • ICCV 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu

We show that existing upsampling operators can be unified with the notion of the index function.

Ranked #4 on Semantic Image Matting on Semantic Image Matting Dataset

Decoder Semantic Image Matting

387

Paper
Code

Self-training with progressive augmentation for unsupervised cross-domain person re-identification

1 code implementation • ICCV 2019 • Xin-Yu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You

In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset.

Ranked #12 on Unsupervised Domain Adaptation on Market to Duke

Person Re-Identification Unsupervised Domain Adaptation

Paper
Code

Enforcing geometric constraints of virtual normal for depth prediction

3 code implementations • ICCV 2019 • Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan

Monocular depth prediction plays a crucial role in understanding 3D scene geometry.

Ranked #10 on Depth Estimation on NYU-Depth V2

Depth Prediction Monocular Depth Estimation

1,035

Paper
Code

Regularizing Proxies with Multi-Adversarial Training for Unsupervised Domain-Adaptive Semantic Segmentation

1 code implementation • 29 Jul 2019 • Tong Shen, Dong Gong, Wei Zhang, Chunhua Shen, Tao Mei

To tackle the unsupervised domain adaptation problem, we explore the possibility of generating high-quality labels as proxy labels to supervise the training on target data.

Semantic Segmentation Unsupervised Domain Adaptation

Paper
Code

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

no code implementations • 29 Jul 2019 • Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced.

Visual Reasoning

Paper
Add Code

Towards End-to-End Text Spotting in Natural Scenes

no code implementations • 14 Jun 2019 • Peng Wang, Hui Li, Chunhua Shen

Text spotting in natural scene images is of great importance for many image understanding tasks.

Image Cropping Text Detection +1

Paper
Add Code

NAS-FCOS: Fast Neural Architecture Search for Object Detection

3 code implementations • CVPR 2020 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang

The success of deep neural networks relies on significant architecture engineering.

Ranked #124 on Object Detection on COCO test-dev

Decoder Neural Architecture Search +3

28,266

Paper
Code

Attention-guided Network for Ghost-free High Dynamic Range Imaging

5 code implementations • CVPR 2019 • Qingsen Yan, Dong Gong, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang

Ghosting artifacts caused by moving objects or misalignments are a key challenge in high dynamic range (HDR) imaging for dynamic scenes.

Optical Flow Estimation Vocal Bursts Intensity Prediction

143

Paper
Code

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

106

Paper
Code

Template-Based Automatic Search of Compact Semantic Segmentation Architectures

1 code implementation • 4 Apr 2019 • Vladimir Nekrasov, Chunhua Shen, Ian Reid

Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool as it allows the discovery of high-performing structures on any dataset of interest.

Ranked #13 on Semantic Segmentation on CamVid

General Classification Holdout Set +1

151

Paper
Code

Architecture Search of Dynamic Cells for Semantic Video Segmentation

no code implementations • 4 Apr 2019 • Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid

In semantic video segmentation the goal is to acquire consistent dense semantic labelling across image frames.

Neural Architecture Search Optical Flow Estimation +3

Paper
Add Code

FCOS: Fully Convolutional One-Stage Object Detection

86 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes, such as calculating overlap during training.

Ranked #4 on Pedestrian Detection on TJU-Ped-campus

Object Object Detection +2

28,266

Paper
Code
