1 code implementation • 14 Dec 2021 • Yi Li, Yiqun Duan, Zhanghui Kuang, Yimin Chen, Wayne Zhang, Xiaomeng Li
We therefore aim to improve WSSS through noise mitigation.
Ranked #23 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
2 code implementations • ICCV 2021 • Yi Li, Zhanghui Kuang, Liyang Liu, Yimin Chen, Wayne Zhang
To address these issues, we propose the following designs to push performance to a new state of the art: (i) Coefficient of Variation Smoothing, which smooths the CAMs adaptively; (ii) Proportional Pseudo-mask Generation, which projects the expanded CAMs to pseudo-masks based on a new metric indicating the importance of each class at each location, instead of the scores trained from binary classifiers.
Ranked #27 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
1 code implementation • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin
We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.
1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin
As a typical example, the Vision Transformer (ViT) applies a pure transformer architecture directly to image classification by simply splitting images into tokens of a fixed length and employing transformers to learn the relations between these tokens.
2 code implementations • 2 Aug 2021 • Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang
Our method can be used to prune any structures including those with coupled channels.
Ranked #4 on Network Pruning on ImageNet
8 code implementations • CVPR 2021 • Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin, Wayne Zhang
One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances.
2 code implementations • 26 Mar 2021 • Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang
To evaluate our proposed method thoroughly and to support future research, we release a new dataset named WildReceipt, which is collected and annotated specifically for evaluating key information extraction from document images of unseen templates in the wild.
2 code implementations • ICLR 2021 • Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang
Multi-task learning (MTL) has been widely used in representation learning.
3 code implementations • ECCV 2020 • Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu
In this work, we first find empirically that recognition accuracy is highly correlated with the bounding box size of an actor, and thus that higher actor resolution contributes to better performance.
4 code implementations • ECCV 2020 • Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang
Theoretically, our proposed method, dubbed RobustScanner, decodes individual characters with a dynamic ratio between context and positional clues, relying more on positional clues when the sequence being decoded has scarce context, and is thus robust and practical.
1 code implementation • ICCV 2019 • Youjiang Xu, Jiaqi Duan, Zhanghui Kuang, Xiaoyu Yue, Hongbin Sun, Yue Guan, Wayne Zhang
Large geometric variances (e.g., in orientation) are the key challenges in scene text detection.
Ranked #10 on Scene Text Detection on ICDAR 2017 MLT
no code implementations • ICCV 2019 • Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang
To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery clothing image by using both global and local representations at multiple scales.
Ranked #4 on Image Retrieval on DeepFashion - Consumer-to-shop (Rank-1 metric)
1 code implementation • CVPR 2019 • Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang
The most informative output neurons in each block are preserved while others are discarded, and thus neurons for multiple scales are competitively and adaptively allocated.
Ranked #703 on Image Classification on ImageNet
1 code implementation • 2 Jan 2019 • Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei Zhang, Yimin Chen
ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces the distillation loss for the dominant easy samples, enabling distillation to work on a single-stage detector for the first time, even if the student and the teacher are identical.
no code implementations • 15 Aug 2018 • Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei Zhang
Secondly, TSD significantly reduces the computation needed to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracy.
no code implementations • 10 May 2018 • Xiaoyu Yue, Zhanghui Kuang, Zhaoyang Zhang, Zhenfang Chen, Pan He, Yu Qiao, Wei Zhang
Deep CNNs have achieved great success in text detection.
1 code implementation • CVPR 2018 • Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang
In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.
Ranked #36 on Action Recognition on UCF101