Search Results for author: Zhanghui Kuang

Found 17 papers, 14 papers with code

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

1 code implementation • 14 Dec 2021 • Yi Li, Yiqun Duan, Zhanghui Kuang, Yimin Chen, Wayne Zhang, Xiaomeng Li

We therefore try to improve WSSS from the perspective of noise mitigation.

Ranked #23 on Weakly-Supervised Semantic Segmentation on COCO 2014 val

Saliency Detection Segmentation +2

Pseudo-mask Matters in Weakly-supervised Semantic Segmentation

2 code implementations • ICCV 2021 • Yi Li, Zhanghui Kuang, Liyang Liu, Yimin Chen, Wayne Zhang

To address these issues, we propose the following designs to push performance to a new state of the art: (i) Coefficient of Variation Smoothing to smooth the CAMs adaptively; (ii) Proportional Pseudo-mask Generation to project the expanded CAMs to pseudo-masks based on a new metric indicating the importance of each class at each location, instead of the scores trained from binary classifiers.

Ranked #27 on Weakly-Supervised Semantic Segmentation on COCO 2014 val

Segmentation Weakly supervised Semantic Segmentation +1

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

1 code implementation • 14 Aug 2021 • Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin

We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as their downstream tasks such as named entity recognition and key information extraction.

Key Information Extraction named-entity-recognition +4

4,142

Vision Transformer with Progressive Sampling

1 code implementation • ICCV 2021 • Xiaoyu Yue, Shuyang Sun, Zhanghui Kuang, Meng Wei, Philip Torr, Wayne Zhang, Dahua Lin

As a typical example, the Vision Transformer (ViT) directly applies a pure transformer architecture on image classification, by simply splitting images into tokens with a fixed length, and employing transformers to learn relations between these tokens.

Image Classification

148

Group Fisher Pruning for Practical Network Compression

2 code implementations • 2 Aug 2021 • Liyang Liu, Shilong Zhang, Zhanghui Kuang, Aojun Zhou, Jing-Hao Xue, Xinjiang Wang, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang

Our method can be used to prune any structures including those with coupled channels.

Ranked #4 on Network Pruning on ImageNet

Image Classification Network Pruning +2

150

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

8 code implementations • CVPR 2021 • Yiqin Zhu, Jianyong Chen, Lingyu Liang, Zhanghui Kuang, Lianwen Jin, Wayne Zhang

One of the main challenges for arbitrary-shaped text detection is to design a good text instance representation that allows networks to learn diverse text geometry variances.

Scene Text Detection Text Detection

39,605

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

2 code implementations • 26 Mar 2021 • Hongbin Sun, Zhanghui Kuang, Xiaoyu Yue, Chenhao Lin, Wayne Zhang

To comprehensively evaluate our proposed method and to support future research, we release a new dataset named WildReceipt, collected and annotated specifically for evaluating key information extraction from document images of unseen templates in the wild.

Key Information Extraction Template Matching

39,605

Towards Impartial Multi-task Learning

2 code implementations • ICLR 2021 • Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang

Multi-task learning (MTL) has been widely used in representation learning.

Multi-Task Learning Representation Learning

195

Context-Aware RCNN: A Baseline for Action Detection in Videos

3 code implementations • ECCV 2020 • Jianchao Wu, Zhanghui Kuang, Li-Min Wang, Wayne Zhang, Gangshan Wu

In this work, we first find empirically that recognition accuracy is highly correlated with the bounding box size of an actor, and thus that higher resolution of actors contributes to better performance.

Action Detection Action Recognition

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

4 code implementations • ECCV 2020 • Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang

Theoretically, our proposed method, dubbed RobustScanner, decodes individual characters with a dynamic ratio between context and positional clues, relying more on positional clues when decoding sequences with scarce context, and is thus robust and practical.

Decoder Irregular Text Recognition +2

39,605

Geometry Normalization Networks for Accurate Scene Text Detection

1 code implementation • ICCV 2019 • Youjiang Xu, Jiaqi Duan, Zhanghui Kuang, Xiaoyu Yue, Hongbin Sun, Yue Guan, Wayne Zhang

Large geometry variances (e.g., in orientation) are key challenges in scene text detection.

Ranked #10 on Scene Text Detection on ICDAR 2017 MLT

Scene Text Detection Text Detection

Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

no code implementations • ICCV 2019 • Zhanghui Kuang, Yiming Gao, Guanbin Li, Ping Luo, Yimin Chen, Liang Lin, Wayne Zhang

To address this issue, we propose a novel Graph Reasoning Network (GRNet) on a Similarity Pyramid, which learns similarities between a query and a gallery clothing item by using both global and local representations at multiple scales.

Ranked #4 on Image Retrieval on DeepFashion - Consumer-to-shop (Rank-1 metric)

Image Retrieval Retrieval

Data-Driven Neuron Allocation for Scale Aggregation Networks

1 code implementation • CVPR 2019 • Yi Li, Zhanghui Kuang, Yimin Chen, Wayne Zhang

The most informative output neurons in each block are preserved while others are discarded, and thus neurons for multiple scales are competitively and adaptively allocated.

Ranked #703 on Image Classification on ImageNet

Image Classification object-detection +1

Learning Efficient Detector with Semi-supervised Adaptive Distillation

1 code implementation • 2 Jan 2019 • Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei Zhang, Yimin Chen

ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces the distillation loss for the dominant easy samples, enabling distillation to work on a single-stage detector for the first time, even if the student and the teacher are identical.

Image Classification Knowledge Distillation +1

Temporal Sequence Distillation: Towards Few-Frame Action Recognition in Videos

no code implementations • 15 Aug 2018 • Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei Zhang

Secondly, TSD significantly reduces the computation required to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracy.

Action Recognition In Videos Temporal Action Localization

Boosting up Scene Text Detectors with Guided CNN

no code implementations • 10 May 2018 • Xiaoyu Yue, Zhanghui Kuang, Zhaoyang Zhang, Zhenfang Chen, Pan He, Yu Qiao, Wei Zhang

Deep CNNs have achieved great success in text detection.

Text Detection

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation • CVPR 2018 • Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Ranked #36 on Action Recognition on UCF101

Action Recognition In Videos Optical Flow Estimation +1

196
