no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun
However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.
1 code implementation • 5 Jun 2024 • Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen
Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes.
1 code implementation • 4 Jun 2024 • Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen
However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data.
1 code implementation • 22 May 2024 • Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen
Experiments show that our method's produced images are consistent with the given concepts and better aligned with the input text.
2 code implementations • 18 May 2024 • Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, Siwei Lyu
Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution.
1 code implementation • 16 May 2024 • Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen
Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving the accuracy.
Ranked #7 on Instance Segmentation on LVIS v1.0 val
1 code implementation • 30 Apr 2024 • Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai
Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.
1 code implementation • 26 Mar 2024 • Gan Pei, Jiangning Zhang, Menghan Hu, Zhenyu Zhang, Chengjie Wang, Yunsheng Wu, Guangtao Zhai, Jian Yang, Chunhua Shen, DaCheng Tao
Deepfake is a technology dedicated to creating highly realistic facial images and videos under specific conditions, which has significant application potential in fields such as entertainment, movie production, digital human creation, to name a few.
1 code implementation • Under review for Transaction 2024 • Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, Shaojie Shen
For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training.
Ranked #1 on Surface Normals Estimation on NYU Depth v2 (using extra training data)
no code implementations • 17 Mar 2024 • Kangyang Xie, BinBin Yang, Hao Chen, Meng Wang, Cheng Zou, Hui Xue, Ming Yang, Chunhua Shen
Beyond the superiority of the text-to-image diffusion model in generating high-quality images, recent studies have attempted to uncover its potential for adapting the learned semantic knowledge to visual perception tasks.
no code implementations • 17 Mar 2024 • Yongtao Ge, Wenjia Wang, Yongfan Chen, Hao Chen, Chunhua Shen
In this work, we show that synthetic data created by generative models is complementary to computer graphics (CG) rendered data for achieving remarkable generalization performance on diverse real-world scenes for 3D human pose and shape estimation (HPS).
Ranked #10 on 3D Human Pose Estimation on 3DPW
no code implementations • 10 Mar 2024 • Guangkai Xu, Yongtao Ge, MingYu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen
We show that, simply initializing image understanding models using a pre-trained UNet (or transformer) of diffusion models, it is possible to achieve remarkable transferable performance on fundamental vision perception tasks using a moderate amount of target data (even synthetic data only), including monocular depth, surface normal, image segmentation, matting, human pose estimation, among virtually many others.
1 code implementation • 1 Mar 2024 • Xiangxiang Chu, Jianlin Su, Bo Zhang, Chunhua Shen
Large language models are built on top of a transformer-based architecture to process textual inputs.
1 code implementation • 6 Feb 2024 • Xiangxiang Chu, Limeng Qiao, Xinyu Zhang, Shuang Xu, Fei Wei, Yang Yang, Xiaofei Sun, Yiming Hu, Xinyang Lin, Bo Zhang, Chunhua Shen
We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance.
1 code implementation • 28 Dec 2023 • Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, Yiming Hu, Fei Wei, Xinyu Zhang, Bo Zhang, Xiaolin Wei, Chunhua Shen
We present MobileVLM, a competent multimodal vision language model (MMVLM) targeted to run on mobile devices.
no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen
We offer a new perspective on approaching the task of video generation.
1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.
1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai
To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.
no code implementations • 19 Nov 2023 • Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen
We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e. g., sketches and keypoints, are suitable for generating high-quality image content.
no code implementations • 18 Oct 2023 • Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen
Most prior encoders rely on atom-wise features, such as angles and distances between atoms, which are not available in this context.
1 code implementation • 18 Oct 2023 • Songyan Zhang, Xinyu Sun, Hao Chen, Bo Li, Chunhua Shen
Finding corresponding pixels within a pair of images is a fundamental computer vision task with various applications.
no code implementations • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen
Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.
1 code implementation • 17 Oct 2023 • Ruibo Li, Chi Zhang, Zhe Wang, Chunhua Shen, Guosheng Lin
By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels.
1 code implementation • 12 Oct 2023 • Haoyi Zhu, Honghui Yang, Xiaoyang Wu, Di Huang, Sha Zhang, Xianglong He, Hengshuang Zhao, Chunhua Shen, Yu Qiao, Tong He, Wanli Ouyang
In this paper, we introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation, thereby establishing a pathway to 3D foundational models.
Ranked #2 on Semantic Segmentation on ScanNet (using extra training data)
no code implementations • ICCV 2023 • Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen
In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
1 code implementation • 20 Aug 2023 • Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models.
Ranked #69 on Visual Question Answering on MM-Vet
1 code implementation • 13 Aug 2023 • Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen
In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion.
Ranked #1 on Supervised Anomaly Detection on BTAD
1 code implementation • ICCV 2023 • Muzhi Zhu, Hengtao Li, Hao Chen, Chengxiang Fan, Weian Mao, Chenchen Jing, Yifan Liu, Chunhua Shen
In this work, we propose a novel training mechanism termed SegPrompt that uses category information to improve the model's class-agnostic segmentation ability for both known and unknown categories.
1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen
To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.
no code implementations • ICCV 2023 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Zhao
3D scene reconstruction is a long-standing vision task.
1 code implementation • ICCV 2023 • Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen
The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Ranked #2 on Video Instance Segmentation on Youtube-VIS 2022 Validation (using extra training data)
1 code implementation • ICCV 2023 • Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen
State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.
Ranked #19 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
1 code implementation • ICCV 2023 • Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan
During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings.
Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)
1 code implementation • 9 Jun 2023 • BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces \textbf{SegViTv2}.
Ranked #16 on Semantic Segmentation on ADE20K
no code implementations • 8 Jun 2023 • Yuling Xi, Hao Chen, Ning Wang, Peng Wang, Yanning Zhang, Chunhua Shen, Yifan Liu
In particular, one feature merge branch is designed for instance-level recognition the other for dense predictions.
no code implementations • 6 Jun 2023 • Hanxi Li, Jingqi Wu, Hao Chen, Mingwen Wang, Chunhua Shen
Thus the sliding transformer can attain even higher accuracy with much less annotation labor.
Ranked #1 on Anomaly Detection on MVTec AD (Segmentation AUROC metric)
no code implementations • 31 May 2023 • Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
Recent years have witnessed significant progress in developing effective training and fast sampling techniques for diffusion models.
2 code implementations • 30 May 2023 • Chi Zhang, YiWen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen
The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models.
1 code implementation • CVPR 2023 • Qingsheng Wang, Lingqiao Liu, Chenchen Jing, Hao Chen, Guoqiang Liang, Peng Wang, Chunhua Shen
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations.
Ranked #1 on Compositional Zero-Shot Learning on MIT-States
no code implementations • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang
This is due to their utilization of unstructured pruning on LPMs, impeding the merging of LoRA weights, or their dependence on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.
1 code implementation • 22 May 2023 • Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen
In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.
Ranked #3 on Few-Shot Semantic Segmentation on COCO-20i (5-shot)
1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)
1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.
1 code implementation • ICCV 2023 • Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura
As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which can not tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body.
Ranked #7 on 3D Human Pose Estimation on 3DPW
1 code implementation • ICCV 2023 • Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen
In contrast, synthetic data can be freely available using a generative model (e. g., DALL-E, Stable Diffusion).
no code implementations • 15 Mar 2023 • Choubo Ding, Guansong Pang, Chunhua Shen
To this end, we propose a novel generic framework that can learn the domain features from the ID training samples by a dense prediction approach, with which different existing semantic-feature-based OOD detection methods can be seamlessly combined to jointly learn the in-distribution features from both the semantic and domain dimensions.
1 code implementation • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen
To date, most existing datasets focus on autonomous driving scenes.
no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.
3 code implementations • 4 Jan 2023 • Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.
Ranked #15 on Text Spotting on ICDAR 2015
no code implementations • ICCV 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang
In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.
Ranked #6 on Personalized Segmentation on PerSeg
1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen
Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.
1 code implementation • 13 Nov 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Chunhua Shen
To address this, we propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple partially labeled datasets.
2 code implementations • 7 Nov 2022 • Libo Sun, Jia-Wang Bian, Huangying Zhan, Wei Yin, Ian Reid, Chunhua Shen
Self-supervised monocular depth estimation has shown impressive results in static scenes.
Indoor Monocular Depth Estimation Monocular Depth Estimation +1
no code implementations • 18 Oct 2022 • Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen
In this paper, we address monocular depth estimation with deep neural networks.
no code implementations • 13 Oct 2022 • Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma
For face recognition attacks, existing methods typically generate the l_p-norm perturbations on pixels, however, resulting in low attack transferability and high vulnerability to denoising defense models.
1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit.
Ranked #4 on Semantic Segmentation on COCO-Stuff test
no code implementations • 27 Sep 2022 • Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen
To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.
1 code implementation • 26 Sep 2022 • Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
1 code implementation • 28 Aug 2022 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Simon Chen, Yifan Liu, Chunhua Shen
To do so, we propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image, and then exploits 3D point cloud data to predict the depth shift and the camera's focal length that allow us to recover 3D scene shapes.
1 code implementation • 18 Jul 2022 • Wejia Wu, Zhuang Li, Jiahong Li, Chunhua Shen, Hong Zhou, Size Li, Zhongyuan Wang, Ping Luo
Our contributions are three-fold: 1) CoText simultaneously address the three tasks (e. g., text detection, tracking, recognition) in a real-time end-to-end trainable framework.
2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen
A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.
no code implementations • 1 Jun 2022 • Yongtao Ge, Qiang Zhou, Xinlong Wang, Zhibin Wang, Hao Li, Chunhua Shen
Point annotations are considerably more time-efficient than bounding box annotations.
no code implementations • 27 May 2022 • Zhi Tian, Xiangxiang Chu, Xiaoming Wang, Xiaolin Wei, Chunhua Shen
In this work, we tackle this challenging issue with a novel range view projection mechanism, and for the first time demonstrate the benefits of fusing multi-frame point clouds for a range-view based detector.
1 code implementation • 23 May 2022 • Mingbao Lin, Mengzhao Chen, Yuxin Zhang, Chunhua Shen, Rongrong Ji, Liujuan Cao
Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models with even performance increase.
no code implementations • 29 Apr 2022 • Yuting Gao, Jinfeng Liu, Zihan Xu, Jun Zhang, Ke Li, Rongrong Ji, Chunhua Shen
Large-scale vision-language pre-training has achieved promising results on downstream tasks.
no code implementations • 25 Apr 2022 • Tong He, Wei Yin, Chunhua Shen, Anton Van Den Hengel
The current state-of-the-art methods in 3D instance segmentation typically involve a clustering step, despite the tendency towards heuristics, greedy algorithms, and a lack of robustness to the changes in data statistics.
3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen
Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.
no code implementations • 4 Apr 2022 • Libo Sun, Wei Yin, Enze Xie, Zhengrong Li, Changming Sun, Chunhua Shen
The core of our framework is a monocular depth estimation module with a strong generalization capability for diverse scenes.
1 code implementation • CVPR 2022 • Choubo Ding, Guansong Pang, Chunhua Shen
Despite most existing anomaly detection studies assume the availability of normal training samples only, a few labeled anomaly examples are often available in many real-world applications, such as defect samples identified during random quality inspection, lesion images confirmed by radiologists in daily medical screening, etc.
Ranked #4 on Supervised Anomaly Detection on MVTec AD (using extra training data)
1 code implementation • 20 Mar 2022 • Weijia Wu, Yuanqiang Cai, Chunhua Shen, Debing Zhang, Ying Fu, Hong Zhou, Ping Luo
Recent video text spotting methods usually require the three-staged pipeline, i. e., detecting text in individual images, recognizing localized text, tracking text streams with post-processing to generate final results.
1 code implementation • 16 Mar 2022 • Jun Wang, Ying Cui, Dongyan Guo, Junxia Li, Qingshan Liu, Chunhua Shen
To solve the problems, we leverage the cross-attention and self-attention mechanisms to design novel neural network for processing point cloud in a per-point manner to eliminate kNNs.
2 code implementations • 13 Mar 2022 • Xiaojie Chu, Yongtao Wang, Chunhua Shen, Jingdong Chen, Wei Chu
The development of scene text recognition (STR) in the era of deep learning has been mainly focused on novel architectures of STR models.
no code implementations • 24 Feb 2022 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
To this end, we perform inference at each frame.
1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9. 8% AP when fine-tuning instance segmentation with only 5% COCO masks.
no code implementations • CVPR 2022 • Alexander Long, Wei Yin, Thalaiyasingam Ajanthan, Vu Nguyen, Pulak Purkait, Ravi Garg, Alan Blair, Chunhua Shen, Anton Van Den Hengel
We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module.
Ranked #6 on Long-tail Learning on iNaturalist 2018
no code implementations • 4 Feb 2022 • Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton Van Den Hengel
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets, despite not using any images therefrom.
Ranked #1 on Semantic Segmentation on WildDash
no code implementations • 3 Feb 2022 • Guangkai Xu, Wei Yin, Hao Chen, Chunhua Shen, Kai Cheng, Feng Wu, Feng Zhao
However, in some video-based scenarios such as video depth estimation and 3D scene reconstruction from a video, the unknown scale and shift residing in per-frame prediction may cause the depth inconsistency.
1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel
We propose a direct, regression-based approach to 2D human pose estimation from single images.
Ranked #2 on Keypoint Detection on MS COCO
no code implementations • CVPR 2022 • Yutong Dai, Brian Price, He Zhang, Chunhua Shen
Deep image matting methods have achieved increasingly better results on benchmarks (e. g., Composition-1k/alphamatting. com).
no code implementations • CVPR 2022 • Ruibo Li, Chi Zhang, Guosheng Lin, Zhe Wang, Chunhua Shen
In this work, we focus on scene flow learning on point clouds in a self-supervised manner.
1 code implementation • 23 Dec 2021 • Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Shouhong Ding, Chunhua Shen, Chao Wu
One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round.
1 code implementation • 15 Dec 2021 • Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin
For the first time, we demonstrate that training scene text spotting models can be achieved with an extremely low-cost annotation of a single-point for each instance.
Ranked #3 on Text Spotting on SCUT-CTW1500
1 code implementation • 24 Oct 2021 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang
Neural Architecture Search (NAS) has shown great potential in effectively reducing manual effort in network design by automatically discovering optimal architectures.
1 code implementation • 11 Oct 2021 • Lin Cheng, Pengfei Fang, Yanjie Liang, Liao Zhang, Chunhua Shen, Hanzi Wang
Inspired by those observations, we propose a novel visual saliency method, termed Target-Selective Gradient Backprop (TSGB), which leverages rectification operations to effectively emphasize target classes and further efficiently propagate the saliency to the image space, thereby generating target-selective and fine-grained saliency maps.
no code implementations • ICCV 2021 • Chi Zhang, Henghui Ding, Guosheng Lin, Ruibo Li, Changhu Wang, Chunhua Shen
Inspired by the recent success in Automated Machine Learning literature (AutoML), in this paper, we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and proffer to automate the selection from various few-shot learning designs.
1 code implementation • 1 Aug 2021 • Guansong Pang, Choubo Ding, Chunhua Shen, Anton Van Den Hengel
Here, we study the problem of few-shot anomaly detection, in which we aim at using a few labeled anomaly examples to train sample-efficient discriminative detection models.
Ranked #5 on Supervised Anomaly Detection on MVTec AD (using extra training data)
1 code implementation • NeurIPS 2021 • BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
1 code implementation • 18 Jul 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
no code implementations • 30 Jun 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI
Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation.
no code implementations • CVPR 2021 • Ying Shu, Yan Yan, Si Chen, Jing-Hao Xue, Chunhua Shen, Hanzi Wang
First, three auxiliary tasks, consisting of a Patch Rotation Task (PRT), a Patch Segmentation Task (PST), and a Patch Classification Task (PCT), are jointly developed to learn the spatial-semantic relationship from large-scale unlabeled facial data.
Ranked #3 on Facial Attribute Classification on LFWA
3 code implementations • CVPR 2021 • Weian Mao, Zhi Tian, Xinlong Wang, Chunhua Shen
We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose.
2 code implementations • 25 May 2021 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Zhichao Li, Le Zhang, Chunhua Shen, Ming-Ming Cheng, Ian Reid
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training and enables the scale-consistent prediction at inference time.
no code implementations • CVPR 2021 • Ruibo Li, Guosheng Lin, Tong He, Fayao Liu, Chunhua Shen
Scene flow in 3D point clouds plays an important role in understanding dynamic environments.
1 code implementation • 8 May 2021 • Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen
Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.
Ranked #7 on Text Spotting on Inverse-Text
1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.
9 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed and they show that the design of spatial attention is critical to their success in these tasks.
Ranked #48 on Semantic Segmentation on ADE20K val
2 code implementations • 19 Apr 2021 • Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen
Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.
1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li
Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.
no code implementations • CVPR 2021 • Delian Ruan, Yan Yan, Shenqi Lai, Zhenhua Chai, Chunhua Shen, Hanzi Wang
In this paper, we propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition.
Facial Expression Recognition Facial Expression Recognition (FER) +1
no code implementations • 29 Mar 2021 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang
We propose a human pose estimation framework that solves the task in the regression-based fashion.
Ranked #26 on Pose Estimation on MPII Human Pose (using extra training data)
no code implementations • 29 Mar 2021 • Lei Tian, Guoqiang Liang, Peng Wang, Chunhua Shen
Because of the invisible human keypoints in images caused by illumination, occlusion and overlap, it is likely to produce unreasonable human pose prediction for most of the current human pose estimation methods.
no code implementations • CVPR 2021 • Yifan Liu, Hao Chen, Yu Chen, Wei Yin, Chunhua Shen
We hope that this simple, extended perceptual loss may serve as a generic structured-output loss that is applicable to most structured output learning tasks.
2 code implementations • 8 Mar 2021 • Lingtong Kong, Chunhua Shen, Jie Yang
Experiments on both synthetic Sintel data and real-world KITTI datasets demonstrate the effectiveness of the proposed approach, which needs only 1/10 computation of comparable networks to achieve on par accuracy.
Ranked #12 on Optical Flow Estimation on KITTI 2012
3 code implementations • 7 Mar 2021 • Wei Yin, Yifan Liu, Chunhua Shen
In this work, we show the importance of the high-order 3D geometric constraints for depth prediction.
1 code implementation • 4 Mar 2021 • Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
2 code implementations • 22 Feb 2021 • Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen
Built on PEG, we present Conditional Position encoding Vision Transformer (CPVT).
no code implementations • 5 Feb 2021 • Zhi Tian, BoWen Zhang, Hao Chen, Chunhua Shen
In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance.
no code implementations • 28 Jan 2021 • Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, Hao Li
On the COCO dataset, our simple design achieves superior performance compared to both the FCOS baseline detector with NMS post-processing and the recent end-to-end NMS-free detectors.
no code implementations • 24 Jan 2021 • Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen
To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.
no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan
By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.
no code implementations • ICCV 2021 • Cheng Yan, Guansong Pang, Lei Wang, Jile Jiao, Xuetao Feng, Chunhua Shen, Jingjing Li
In this work we introduce a new ReID task, bird-view person ReID, which aims at searching for a person in a gallery of horizontal-view images with the query images taken from a bird's-eye view, i. e., an elevated view of an object from above.
no code implementations • ICCV 2021 • Cheng Yan, Guansong Pang, Jile Jiao, Xiao Bai, Xuetao Feng, Chunhua Shen
However, real-world ReID applications typically have highly diverse occlusions and involve a hybrid of occluded and non-occluded pedestrians.
1 code implementation • 24 Dec 2020 • Haokui Zhang, Ying Li, Hao Chen, Chengrong Gong, Zongwen Bai, Chunhua Shen
For the inner search space, we propose a layer-wise architecture sharing strategy (LWAS), resulting in more flexible architectures and better performance.
no code implementations • 21 Dec 2020 • Xinyu Zhang, Xinlong Wang, Jia-Wang Bian, Chunhua Shen, Mingyu You
Person search aims to localize and identify a specific person from a gallery of images.
1 code implementation • CVPR 2021 • Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen
Despite significant progress in monocular depth estimation in the wild, recent state-of-the-art methods cannot be used to recover accurate 3D scene shape due to an unknown depth shift induced by shift-invariant reconstruction losses used in mixed-data depth prediction training, and possible unknown camera focal length.
Ranked #1 on Indoor Monocular Depth Estimation on DIODE (using extra training data)
2 code implementations • 7 Dec 2020 • Haokui Zhang, Ying Li, Yenan Jiang, Peng Wang, Qiang Shen, Chunhua Shen
In contrast to previous approaches, we do not impose restrictions over the source data sets, in which they do not have to be collected by the same sensors as the target data sets.
2 code implementations • CVPR 2021 • Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen
We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.
2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
Ranked #33 on Video Instance Segmentation on YouTube-VIS validation
1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods.
1 code implementation • CVPR 2021 • Yutong Dai, Hao Lu, Chunhua Shen
By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a light-weight lowrank bilinear model and are conditioned on second-order features.
3 code implementations • ICCV 2021 • Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen
Observing that in semantic segmentation, some layers' feature activations of each channel tend to encode saliency of scene categories (analogue to class activation mapping), we propose to align features channel-wise between the student and teacher networks.
1 code implementation • CVPR 2021 • Tong He, Chunhua Shen, Anton Van Den Hengel
Previous top-performing approaches for point cloud instance segmentation involve a bottom-up strategy, which often includes inefficient operations or complex pipelines, such as grouping over-segmented components, introducing additional steps for refining, or designing complicated loss functions.
no code implementations • 25 Nov 2020 • Yutong Xie, Jianpeng Zhang, Zehui Liao, Yong Xia, Chunhua Shen
In this paper, we propose a PriorGuided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
no code implementations • CVPR 2021 • Dongyan Guo, Yanyan Shao, Ying Cui, Zhenhua Wang, Liyan Zhang, Chunhua Shen
We propose to establish part-to-part correspondence between the target and the search region with a complete bipartite graph, and apply the graph attention mechanism to propagate target information from the template feature to the search feature.
1 code implementation • 21 Nov 2020 • Honglei Zhang, Hu Wang, Yuanzhouhan Cao, Chunhua Shen, Yidong Li
In deep data hiding models, to maximize the encoding capacity, each pixel of the cover image ought to be treated differently since they have different sensitivities w. r. t.
1 code implementation • CVPR 2021 • Jianpeng Zhang, Yutong Xie, Yong Xia, Chunhua Shen
To address this, we propose a dynamic on-demand network (DoDNet) that learns to segment multiple organs and tumors on partially labeled datasets.
no code implementations • 19 Nov 2020 • Hao Chen, Chunhua Shen, Zhi Tian
To our knowledge, DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation by considering both efficacy and efficiency.
6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.
2 code implementations • 15 Sep 2020 • Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
no code implementations • ECCV 2020 • Changqian Yu, Yifan Liu, Changxin Gao, Chunhua Shen, Nong Sang
In this paper, we present a Representative Graph (RepGraph) layer to dynamically sample a few representative features, which dramatically reduces redundancy.
no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
no code implementations • 6 Aug 2020 • Yutong Xie, Jianpeng Zhang, Zhibin Liao, Chunhua Shen, Johan Verjans, Yong Xia
In this paper, we propose the pairwise relation-based semi-supervised (PRS^2) model for gland segmentation on histology images.
2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo
Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.
1 code implementation • 28 Jul 2020 • Jiezhang Cao, Yong Guo, Qingyao Wu, Chunhua Shen, Junzhou Huang, Mingkui Tan
In this paper, rather than sampling from the predefined prior distribution, we propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
no code implementations • ECCV 2020 • Hu Wang, Qi Wu, Chunhua Shen
In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task.
1 code implementation • ECCV 2020 • Liang Liu, Hao Lu, Hongwei Zou, Haipeng Xiong, Zhiguo Cao, Chunhua Shen
Inspired by scale weighing, we propose a novel 'counting scale' termed LibraNet where the count value is analogized by weight.
1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen
Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.
no code implementations • 6 Jul 2020 • Guansong Pang, Chunhua Shen, Longbing Cao, Anton Van Den Hengel
This paper surveys the research of deep anomaly detection with a comprehensive taxonomy, covering advancements in three high-level categories and 11 fine-grained categories of the methods.
no code implementations • 14 Jun 2020 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications.
no code implementations • 6 Jun 2020 • Linjiang Zhang, Peng Wang, Hui Li, Zhen Li, Chunhua Shen, Yanning Zhang
On the other hand, the 2D attentional based license plate recognizer with an Xception-based CNN encoder is capable of recognizing license plates with different patterns under various scenarios accurately and robustly.
1 code implementation • 4 Jun 2020 • Jia-Wang Bian, Huangying Zhan, Naiyan Wang, Tat-Jun Chin, Chunhua Shen, Ian Reid
However, excellent results have mostly been obtained in street-scene driving scenarios, and such methods often fail in other settings, particularly indoor videos taken by handheld devices.
Ranked #49 on Monocular Depth Estimation on NYU-Depth V2
no code implementations • 11 May 2020 • Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang
Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.
4 code implementations • ECCV 2020 • Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua Shen, Xiang Bai
For example, it outperforms LapSRN by over 5% and 8%on the recognition accuracy of ASTER and CRNN.
7 code implementations • 5 Apr 2020 • Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang
We propose to treat these spatial details and categorical semantics separately to achieve high accuracy and high efficiency for realtime semantic segmentation.
Ranked #1 on Real-Time Semantic Segmentation on COCO-Stuff
2 code implementations • CVPR 2020 • Changqian Yu, Jingbo Wang, Changxin Gao, Gang Yu, Chunhua Shen, Nong Sang
Given an input image and corresponding ground truth, Affinity Loss constructs an ideal affinity map to supervise the learning of Context Prior.
Ranked #1 on Scene Understanding on ADE20K val
1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo
To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10, 428 images of real scenarios with carefully manual annotations, which are 10 times larger than the existing datasets.
Ranked #4 on Semantic Segmentation on Trans10K
1 code implementation • 27 Mar 2020 • Jianpeng Zhang, Yutong Xie, Guansong Pang, Zhibin Liao, Johan Verjans, Wenxin Li, Zongji Sun, Jian He, Yi Li, Chunhua Shen, Yong Xia
In this paper, we formulate the task of differentiating viral pneumonia from non-viral pneumonia and healthy controls into an one-class classification-based anomaly detection problem, and thus propose the confidence-aware anomaly detection (CAAD) model, which consists of a shared feature extractor, an anomaly detection module, and a confidence prediction module.
7 code implementations • CVPR 2020 • Rufeng Zhang, Zhi Tian, Chunhua Shen, Mingyu You, Youliang Yan
To date, instance segmentation is dominated by twostage methods, as pioneered by Mask R-CNN.
18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen
Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.
Ranked #10 on Real-time Instance Segmentation on MSCOCO
no code implementations • CVPR 2020 • Guansong Pang, Cheng Yan, Chunhua Shen, Anton Van Den Hengel, Xiao Bai
Video anomaly detection is of critical practical importance to a variety of real applications because it allows human attention to be focused on events that are likely to be of interest, in spite of an otherwise overwhelming volume of video.
5 code implementations • 15 Mar 2020 • Chi Zhang, Yujun Cai, Guosheng Lin, Chunhua Shen
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance.
7 code implementations • ECCV 2020 • Zhi Tian, Chunhua Shen, Hao Chen
We propose a simple yet effective instance segmentation framework, termed CondInst (conditional convolutions for instance segmentation).
no code implementations • 11 Mar 2020 • Genshun Dong, Yan Yan, Chunhua Shen, Hanzi Wang
Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information.
1 code implementation • ECCV 2020 • Yifan Liu, Chunhua Shen, Changqian Yu, Jingdong Wang
For semantic segmentation, most existing real-time deep models trained with each frame independently may produce inconsistent results for a video sequence.
Ranked #2 on Video Semantic Segmentation on CamVid
15 code implementations • CVPR 2020 • Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang
Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve.
Ranked #9 on Text Spotting on Inverse-Text
no code implementations • CVPR 2020 • Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton Van Den Hengel, Liangwei Wang
Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize.
no code implementations • 6 Feb 2020 • Yan Yan, Ying Huang, Si Chen, Chunhua Shen, Hanzi Wang
Firstly, a facial expression synthesis generative adversarial network (FESGAN) is pre-trained to generate facial images with different facial expressions.
Facial Expression Recognition Facial Expression Recognition (FER) +1
2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin
Compared with previous learning objectives, i. e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.
no code implementations • 13 Jan 2020 • Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen
Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training.
no code implementations • 13 Jan 2020 • Xin-Yu Zhang, Dong Gong, Jiewei Cao, Chunhua Shen
Due to the lack of supervision in the target domain, it is crucial to identify the underlying similarity-and-dissimilarity relationships among the unlabelled samples in the target domain.
3 code implementations • 7 Jan 2020 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Chunhua Shen, Zhiguo Cao
Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i. e., the number of population can vary in [0, inf) in theory.
no code implementations • ECCV 2020 • Tong He, Dong Gong, Zhi Tian, Chunhua Shen
3D point cloud semantic and instance segmentation is crucial and fundamental for 3D scene understanding.
Ranked #28 on 3D Instance Segmentation on ScanNet(v2)
9 code implementations • CVPR 2020 • Hao Chen, Kunyang Sun, Zhi Tian, Chunhua Shen, Yongming Huang, Youliang Yan
The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference.
Ranked #13 on Real-time Instance Segmentation on MSCOCO
no code implementations • 24 Dec 2019 • Le Zhang, Zenglin Shi, Joey Tianyi Zhou, Ming-Ming Cheng, Yun Liu, Jia-Wang Bian, Zeng Zeng, Chunhua Shen
Specifically, with a diagnostic analysis, we show that the recurrent structure may not be effective to learn temporal dependencies than what we expected and implicitly yields an orderless representation.
2 code implementations • 22 Dec 2019 • Hu Wang, Guansong Pang, Chunhua Shen, Congbo Ma
To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space.
1 code implementation • 20 Dec 2019 • Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin
More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.
Ranked #3 on Scene Text Detection on ICDAR 2017 MLT
24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI
We present a new, embarrassingly simple approach to instance segmentation in images.
Ranked #67 on Instance Segmentation on COCO test-dev
no code implementations • 10 Dec 2019 • Jun-Jie Zhang, Lingqiao Liu, Peng Wang, Chunhua Shen
Such imbalanced distribution causes a great challenge for learning a deep neural network, which can be boiled down into a dilemma: on the one hand, we prefer to increase the exposure of tail class samples to avoid the excessive dominance of head classes in the classifier training.
no code implementations • 20 Nov 2019 • Cheng Yan, Guansong Pang, Xiao Bai, Chunhua Shen
The loss structures the augmented images resulted by the two types of image erasing in a two-level hierarchy and enforces multifaceted attention to different parts.
6 code implementations • 19 Nov 2019 • Guansong Pang, Chunhua Shen, Anton Van Den Hengel
Instead of representation learning, our method fulfills an end-to-end learning of anomaly scores by a neural deviation learning, in which we leverage a few (e. g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from that of normal data objects in the upper tail.
Ranked #1 on Network Intrusion Detection on NB15-Backdoor
8 code implementations • 18 Nov 2019 • Zhi Tian, Hao Chen, Chunhua Shen
We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose.
Ranked #13 on Keypoint Detection on COCO test-dev
3 code implementations • NeurIPS 2019 • Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan
Multiple marginal matching problem aims at learning mappings to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation.
3 code implementations • 30 Oct 2019 • Guansong Pang, Chunhua Shen, Huidong Jin, Anton Van Den Hengel
To detect both seen and unseen anomalies, we introduce a novel deep weakly-supervised approach, namely Pairwise Relation prediction Network (PReNet), that learns pairwise relation features and anomaly scores by predicting the relation of any two randomly sampled training instances, in which the pairwise relation can be anomaly-anomaly, anomaly-unlabeled, or unlabeled-unlabeled.
2 code implementations • CVPR 2020 • Enze Xie, Peize Sun, Xiaoge Song, Wenhai Wang, Ding Liang, Chunhua Shen, Ping Luo
In this paper, we introduce an anchor-box free and single shot instance segmentation method, which is conceptually simple, fully convolutional and can be used as a mask prediction module for instance segmentation, by easily embedding it into most off-the-shelf detection methods.
Ranked #100 on Instance Segmentation on COCO test-dev
no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid
Experiments on both classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.
1 code implementation • CVPR 2020 • Haokui Zhang, Ying Li, Hao Chen, Chunhua Shen
We also present analysis on the architectures found by NAS.
1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen
In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.
1 code implementation • 16 Sep 2019 • Wenjia Wang, Enze Xie, Peize Sun, Wenhai Wang, Lixun Tian, Chunhua Shen, Ping Luo
Nonetheless, most of the previous methods may not work well in recognizing text with low resolution which is often seen in natural scene images.
1 code implementation • 13 Sep 2019 • Xin-Yu Zhang, Rufeng Zhang, Jiewei Cao, Dong Gong, Mingyu You, Chunhua Shen
Finally, we aggregate the global appearance and part features to improve the feature performance further.
no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin
The most current methods can be categorized as either: (i) hard parameter sharing where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing where all parameters are task-specific but they are jointly regularized.
2 code implementations • NeurIPS 2019 • Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid
To the best of our knowledge, this is the first work to show that deep networks trained using unlabelled monocular videos can predict globally scale-consistent camera trajectories over a long video sequence.
Ranked #61 on Monocular Depth Estimation on KITTI Eigen split
6 code implementations • ICCV 2019 • Wenhai Wang, Enze Xie, Xiaoge Song, Yuhang Zang, Wenjia Wang, Tong Lu, Gang Yu, Chunhua Shen
Recently, some methods have been proposed to tackle arbitrary-shaped text detection, but they rarely take the speed of the entire pipeline into consideration, which may fall short in practical applications. In this paper, we propose an efficient and accurate arbitrary-shaped text detector, termed Pixel Aggregation Network (PAN), which is equipped with a low computational-cost segmentation head and a learnable post-processing.
Ranked #8 on Scene Text Detection on SCUT-CTW1500
5 code implementations • ICCV 2019 • Haipeng Xiong, Hao Lu, Chengxin Liu, Liang Liu, Zhiguo Cao, Chunhua Shen
A dense region can always be divided until sub-region counts are within the previously observed closed set.
Ranked #3 on Crowd Counting on TRANCOS
6 code implementations • 11 Aug 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu
By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
Ranked #2 on Grayscale Image Denoising on Set12 sigma30
no code implementations • 11 Aug 2019 • Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong
To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder.
no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen
Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.
2 code implementations • ICCV 2019 • Haokui Zhang, Chunhua Shen, Ying Li, Yuanzhouhan Cao, Yu Liu, Youliang Yan
The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion.
Ranked #5 on Monocular Depth Estimation on Mid-Air Dataset
1 code implementation • ICCV 2019 • Hao Lu, Yutong Dai, Chunhua Shen, Songcen Xu
We show that existing upsampling operators can be unified with the notion of the index function.
1 code implementation • ICCV 2019 • Xin-Yu Zhang, Jiewei Cao, Chunhua Shen, Mingyu You
In this work, we develop a self-training method with progressive augmentation framework (PAST) to promote the model performance progressively on the target dataset.
Ranked #12 on Unsupervised Domain Adaptation on Market to Duke
3 code implementations • ICCV 2019 • Wei Yin, Yifan Liu, Chunhua Shen, Youliang Yan
Monocular depth prediction plays a crucial role in understanding 3D scene geometry.
Ranked #10 on Depth Estimation on NYU-Depth V2
1 code implementation • 29 Jul 2019 • Tong Shen, Dong Gong, Wei zhang, Chunhua Shen, Tao Mei
To tackle the unsupervised domain adaptation problem, we explore the possibilities to generate high-quality labels as proxy labels to supervise the training on target data.
no code implementations • 29 Jul 2019 • Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced.
no code implementations • 14 Jun 2019 • Peng Wang, Hui Li, Chunhua Shen
Text spotting in natural scene images is of great importance for many image understanding tasks.
3 code implementations • CVPR 2020 • Ning Wang, Yang Gao, Hao Chen, Peng Wang, Zhi Tian, Chunhua Shen, Yanning Zhang
The success of deep neural networks relies on significant architecture engineering.
Ranked #124 on Object Detection on COCO test-dev
5 code implementations • CVPR 2019 • Qingsen Yan, Dong Gong, Qinfeng Shi, Anton Van Den Hengel, Chunhua Shen, Ian Reid, Yanning Zhang
Ghosting artifacts caused by moving objects or misalignments is a key challenge in high dynamic range (HDR) imaging for dynamic scenes.
1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel
One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.
1 code implementation • 4 Apr 2019 • Vladimir Nekrasov, Chunhua Shen, Ian Reid
Automatic search of neural architectures for various vision and natural language tasks is becoming a prominent tool as it allows to discover high-performing structures on any dataset of interest.
Ranked #13 on Semantic Segmentation on CamVid
no code implementations • 4 Apr 2019 • Vladimir Nekrasov, Hao Chen, Chunhua Shen, Ian Reid
In semantic video segmentation the goal is to acquire consistent dense semantic labelling across image frames.
86 code implementations • ICCV 2019 • Zhi Tian, Chunhua Shen, Hao Chen, Tong He
By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.
Ranked #4 on Pedestrian Detection on TJU-Ped-campus