no code implementations • ECCV 2020 • Kaihao Zhang, Wenhan Luo, Wenqi Ren, Jingwen Wang Fang Zhao, Lin Ma , Hongdong Li
Moreover, even for single image based monocular deraining, many current methods fail to complete the task satisfactorily because they mostly rely on per pixel loss functions and ignoring semantic information.
no code implementations • 5 Jun 2024 • Jingyun Xue, Hongfa Wang, Qi Tian, Yue Ma, Andong Wang, Zhiyuan Zhao, Shaobo Min, Wenzhe Zhao, Kaihao Zhang, Heung-Yeung Shum, Wei Liu, Mengyang Liu, Wenhan Luo
While existing character image animation methods using pose sequences and reference images have shown promising performance, they tend to struggle with incoherent animation in complex scenarios, such as multiple character animation and body occlusion.
no code implementations • 27 May 2024 • Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo
Here, we construct the audio ControlNet through a trainable copy of our pre-trained diffusion model.
no code implementations • 19 May 2024 • Peng Li, YuAn Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo
Specifically, these methods assume that the input images should comply with a predefined camera type, e. g. a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails.
1 code implementation • 18 May 2024 • Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang
Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities.
Ranked #85 on Visual Question Answering on MM-Vet
no code implementations • 9 May 2024 • Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma
We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task.
2 code implementations • ICCV 2023 • Shan Wang, Chuong Nguyen, Jiawei Liu, Kaihao Zhang, Wenhan Luo, Yanhao Zhang, Sundaram Muthu, Fahira Afzal Maken, Hongdong Li
Reliable segmentation of road lines and markings is critical to autonomous driving.
no code implementations • 29 Mar 2024 • Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen
Tracking by natural language specification (TNL) aims to consistently localize a target in a video sequence given a linguistic description in the initial frame.
1 code implementation • 16 Mar 2024 • Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, GuanYing Chen, Wei Liu, Wenhan Luo
We also observe that the initiation denoising timestep for noise blending is the key to identity preservation and layout.
no code implementations • 11 Mar 2024 • Zhenbo Song, Wenhao Gao, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu
Extensive experiments demonstrate the efficacy of the degradation objective on state-of-the-art face restoration models.
no code implementations • 9 Mar 2024 • Jingyun Xue, Tao Wang, Jun Wang, Kaihao Zhang, Wenhan Luo, Wenqi Ren, Zikun Liu, Hyunhee Park, Xiaochun Cao
Specifically, we utilize sparse self-attention to filter out redundant information and noise, directing the model's attention to focus on the features more relevant to the degraded regions in need of reconstruction.
1 code implementation • 7 Mar 2024 • Tao Zhou, Wenhan Luo, Qi Ye, Zhiguo Shi, Jiming Chen
Recently, promptable segmentation models, such as the Segment Anything Model (SAM), have demonstrated robust zero-shot generalization capabilities on static images.
1 code implementation • 21 Feb 2024 • Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang
For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.
no code implementations • 21 Feb 2024 • Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu
This study delves into the enhancement of Under-Display Camera (UDC) image restoration models, focusing on their robustness against adversarial attacks.
1 code implementation • 4 Feb 2024 • Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang
For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts.
no code implementations • 2 Jan 2024 • Zhe Kong, Wentian Zhang, Tao Wang, Kaihao Zhang, Yuexiang Li, Xiaoying Tang, Wenhan Luo
In this paper, we propose a domain adversarial attack (DAA) method to mitigate the training instability problem by adding perturbations to the input images, which makes them indistinguishable across domains and enables domain alignment.
1 code implementation • 25 Dec 2023 • Xiaoxu Chen, Jingfan Tan, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaochun Cao
We propose BFRffusion which is thoughtfully designed to effectively extract features from low-quality face images and could restore realistic and faithful facial details with the generative prior of the pretrained Stable Diffusion.
1 code implementation • 29 Nov 2023 • Xiaowei Chi, Rongyu Zhang, Zhengkai Jiang, Yijiang Liu, Yatian Wang, Xingqun Qi, Wenhan Luo, Peng Gao, Shanghang Zhang, Qifeng Liu, Yike Guo
Moreover, to further enhance the effectiveness of $M^{3}Adapter$ while preserving the coherence of semantic context comprehension, we introduce a two-stage $M^{3}FT$ fine-tuning strategy.
no code implementations • 29 Nov 2023 • Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike Guo
In addition, the lack of large-scale available datasets with emotional transition speech and corresponding 3D human gestures also limits the addressing of this task.
no code implementations • 9 Sep 2023 • Xuanxi Chen, Tao Wang, Ziqian Shao, Kaihao Zhang, Wenhan Luo, Tong Lu, Zikun Liu, Tae-Kyun Kim, Hongdong Li
With the pipeline, we build the first large-scale UDC video restoration dataset called PexelsUDC, which includes two subsets named PexelsUDC-T and PexelsUDC-P corresponding to different displays for UDC.
1 code implementation • ICCV 2023 • Yuwei Qiu, Kaihao Zhang, Chenxi Wang, Wenhan Luo, Hongdong Li, Zhi Jin
To address this issue, we propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity.
no code implementations • 20 Aug 2023 • Jingfan Tan, Xiaoxu Chen, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaocun Cao
However, due to the characteristics of the display, images taken by UDC suffer from significant quality degradation.
no code implementations • 6 Aug 2023 • Yanyan Shao, Qi Ye, Wenhan Luo, Kaihao Zhang, Jiming Chen
Understanding human interaction with objects is an important research topic for embodied Artificial Intelligence and identifying the objects that humans are interacting with is a primary problem for interaction understanding.
1 code implementation • 1 Aug 2023 • Zhenyuan Zhang, Zhenbo Song, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu
To the best of our knowledge, these two datasets are the first largest-scale UHD datasets for SIRR.
1 code implementation • 27 Jul 2023 • Zhifeng Wang, Kaihao Zhang, Wenhan Luo, Ramesh Sankaranarayana
The transformer layer is used to focus on representing local minor muscle movement with local self-attention in each area.
Ranked #1 on Micro-Expression Recognition on CASME II
1 code implementation • 27 Jul 2023 • Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tae-Kyun Kim, Wei Liu, Hongdong Li
In this paper, we address this limitation by proposing a degradation-aware learning scheme for LLIE using diffusion models, which effectively integrates degradation and image priors into the diffusion process, resulting in improved image enhancement.
1 code implementation • ICCV 2023 • Pujin Cheng, Li Lin, Junyan Lyu, Yijin Huang, Wenhan Luo, Xiaoying Tang
In this paper, we present a prototype representation learning framework incorporating both global and local alignment between medical images and reports.
no code implementations • 20 Jul 2023 • Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang
There is a crucial limitation: these works assume the dataset used for training the target model to be known beforehand and leverage this dataset for model attribute attack.
no code implementations • 29 May 2023 • Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, Hongdong Li
Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer.
1 code implementation • 8 Mar 2023 • Puijin Cheng, Li Lin, Yijin Huang, Huaqing He, Wenhan Luo, Xiaoying Tang
In this paper, we introduce a novel diffusion model based framework, named Learning Enhancement from Degradation (LED), for enhancing fundus images.
no code implementations • ICCV 2023 • Tao Zhou, Qi Ye, Wenhan Luo, Kaihao Zhang, Zhiguo Shi, Jiming Chen
Multi-object tracking (MOT) aims to build moving trajectories for number-agnostic objects.
1 code implementation • CVPR 2023 • Zhenbo Song, Zhenyuan Zhang, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Wenqi Ren, Jianfeng Lu
This paper addresses the problem of robust deep single-image reflection removal (SIRR) against adversarial attacks.
Ranked #2 on Reflection Removal on Real20
1 code implementation • 22 Dec 2022 • Tao Wang, Kaihao Zhang, Tianrun Shen, Wenhan Luo, Bjorn Stenger, Tong Lu
In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
no code implementations • 22 Dec 2022 • Tao Wang, Guangpin Tao, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Xiaoqin Zhang, Tong Lu
HCD consists of a hierarchical dehazing network (HDN) and a novel hierarchical contrastive loss (HCL).
1 code implementation • 5 Nov 2022 • Tao Wang, Kaihao Zhang, Xuanxi Chen, Wenhan Luo, Jiankang Deng, Tong Lu, Xiaochun Cao, Wei Liu, Hongdong Li, Stefanos Zafeiriou
Second, we discuss the challenges of face restoration.
no code implementations • 5 Sep 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Lei Cui, Yu Zheng, Xinyi Le
Second, CNN tends to reconstruct both normal samples and anomalies well, making them still hard to distinguish.
1 code implementation • 28 Jun 2022 • Yanjiang Yu, Puyang Zhang, Kaihao Zhang, Wenhan Luo, Changsheng Li, Ye Yuan, Guoren Wang
To this end, we propose a Face Restoration Searching Network (FRSNet) to adaptively search the suitable feature extraction architecture within our specified search space, which can directly contribute to the restoration quality.
2 code implementations • 8 Jun 2022 • Puyang Zhang, Kaihao Zhang, Wenhan Luo, Changsheng Li, Guoren Wang
To address this problem, we first synthesize two blind face restoration benchmark datasets called EDFace-Celeb-1M (BFR128) and EDFace-Celeb-150K (BFR512).
1 code implementation • CVPR 2022 • Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, Zhouhui Lian
To train and evaluate our approach, we construct a dataset named as TextLogo3K, consisting of about 3, 500 text logo images and their pixel-level annotations.
no code implementations • 26 Jan 2022 • Kaihao Zhang, Wenqi Ren, Wenhan Luo, Wei-Sheng Lai, Bjorn Stenger, Ming-Hsuan Yang, Hongdong Li
Image deblurring is a classic problem in low-level computer vision with the aim to recover a sharp image from a blurred input image.
1 code implementation • 22 Jan 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Xin Lu, Lei Cui, Xinyi Le
This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i. e., described by one or several support images) occurring in the query image.
Ranked #2 on Object Counting on CARPK
2 code implementations • 1 Dec 2021 • Kaihao Zhang, Tao Wang, Wenhan Luo, Boheng Chen, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Blur artifacts can seriously degrade the visual quality of images, and numerous deblurring methods have been proposed for specific scenarios.
1 code implementation • 11 Oct 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Jingyu Liu, Jiankang Deng, Wei Liu, Stefanos Zafeiriou
It is thus unclear how these algorithms perform on public face hallucination datasets.
Ranked #1 on Image Super-Resolution on WLFW
1 code implementation • 9 Sep 2021 • Zhe Kong, Wentian Zhang, Feng Liu, Wenhan Luo, Haozhe Liu, Linlin Shen, Raghavendra Ramachandra
Even though there are numerous Presentation Attack Detection (PAD) techniques based on both deep learning and hand-crafted features, the generalization of PAD for unknown PAI is still a challenging problem.
no code implementations • 18 Jun 2021 • Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang
In active visual tracking, it is notoriously difficult when distracting objects appear, as distractors often mislead the tracker by occluding the target or bringing a confusing appearance.
no code implementations • 5 Jun 2021 • Lirong Zheng, Yanshan Li, Kaihao Zhang, Wenhan Luo
In order to reduce network parameters, the intra-stage recursive computation of ResNet is adopted in our Stack T-Net.
no code implementations • 27 May 2021 • Wenjia Niu, Kaihao Zhang, Wenhan Luo, Yiran Zhong
Single-image super-resolution (SR) and multi-frame SR are two ways to super resolve low-resolution images.
no code implementations • 9 May 2021 • Kaihao Zhang, Wenhan Luo, Yanjiang Yu, Wenqi Ren, Fang Zhao, Changsheng Li, Lin Ma, Wei Liu, Hongdong Li
We first use a coarse deraining network to reduce the rain streaks on the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image.
1 code implementation • 23 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu
Video deraining is an important task in computer vision as the unwanted rain hampers the visibility of videos and deteriorates the robustness of most outdoor vision systems.
no code implementations • 21 Mar 2021 • Kaihao Zhang, Rongqing Li, Yanjiang Yu, Wenhan Luo, Changsheng Li, Hongdong Li
Images captured in snowy days suffer from noticeable degradation of scene visibility, which degenerates the performance of current vision-based intelligent systems.
no code implementations • 12 Mar 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren
In addition, to further refine the result, a Differential-driven Dual Attention-in-Attention Model (D-DAiAM) is proposed with a "heavy-to-light" scheme to remove rain via addressing the unsatisfying deraining regions.
no code implementations • ICCV 2021 • Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Bjorn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang
Increasingly, modern mobile devices allow capturing images at Ultra-High-Definition (UHD) resolution, which includes 4K and 8K images.
2 code implementations • 18 Nov 2020 • Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, Shenghua Gao
Also, we build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.
1 code implementation • CVPR 2020 • Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, Hongdong Li
To address this problem, we propose a new method which combines two GAN models, i. e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.
Ranked #20 on Deblurring on HIDE (trained on GOPRO)
no code implementations • 25 Jan 2020 • Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong
In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video.
no code implementations • CVPR 2020 • Wei Xiong, Yutong He, Yixuan Zhang, Wenhan Luo, Lin Ma, Jiebo Luo
In this paper, we aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks.
no code implementations • 18 Dec 2019 • Tianrui Liu, Wenhan Luo, Lin Ma, Jun-Jie Huang, Tania Stathaki, Tianhong Dai
Ablation studies have validated the effectiveness of both the proposed gated multi-layer feature extraction sub-network and the deformable occlusion handling sub-network.
2 code implementations • ICCV 2019 • Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao
In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape.
1 code implementation • ACL 2019 • Zhenfang Chen, Lin Ma, Wenhan Luo, Kwan-Yee K. Wong
In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video.
no code implementations • ICLR 2019 • Fangwei Zhong, Peng Sun, Wenhan Luo, Tingyun Yan, Yizhou Wang
In AD-VAT, both the tracker and the target are approximated by end-to-end neural networks, and are trained via RL in a dueling/competitive manner: i. e., the tracker intends to lockup the target, while the target tries to escape from the tracker.
6 code implementations • CVPR 2019 • Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu
We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.
Ranked #7 on Panoptic Scene Graph Generation on PSG Dataset
1 code implementation • 4 Nov 2018 • Zechun Liu, Wenhan Luo, Baoyuan Wu, Xin Yang, Wei Liu, Kwang-Ting Cheng
To address the training difficulty, we propose a training algorithm using a tighter approximation to the derivative of the sign function, a magnitude-aware gradient for weight updating, a better initialization method, and a two-step scheme for training a deep network.
no code implementations • 10 Aug 2018 • Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang
We further propose an environment augmentation technique and a customized reward function, which are crucial for successful training.
4 code implementations • ECCV 2018 • Zechun Liu, Baoyuan Wu, Wenhan Luo, Xin Yang, Wei Liu, Kwang-Ting Cheng
In this work, we study the 1-bit convolutional neural networks (CNNs), of which both the weights and activations are binary.
1 code implementation • 28 Mar 2018 • Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Wei Liu, Hongdong Li
To tackle the second challenge, we leverage the developed DBLRNet as a generator in the GAN (generative adversarial network) architecture, and employ a content loss in addition to an adversarial loss for efficient adversarial training.
3 code implementations • CVPR 2018 • Wei Xiong, Wenhan Luo, Lin Ma, Wei Liu, Jiebo Luo
The first stage generates videos of realistic contents for each frame.
no code implementations • CVPR 2017 • Hao-Zhi Huang, Hao Wang, Wenhan Luo, Lin Ma, Wenhao Jiang, Xiaolong Zhu, Zhifeng Li, Wei Liu
More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames.
no code implementations • ICML 2018 • Wenhan Luo, Peng Sun, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang
We study active object tracking, where a tracker takes as input the visual observation (i. e., frame sequence) and produces the camera control signal (e. g., move forward, turn left, etc.).
no code implementations • 26 Sep 2014 • Wenhan Luo, Junliang Xing, Anton Milan, Xiaoqin Zhang, Wei Liu, Tae-Kyun Kim
We inspect the recent advances in various aspects and propose some interesting directions for future research.
no code implementations • CVPR 2014 • Wenhan Luo, Tae-Kyun Kim, Bjorn Stenger, Xiaowei Zhao, Roberto Cipolla
In this paper, we propose a label propagation framework to handle the multiple object tracking (MOT) problem for a generic object type (cf.
no code implementations • CVPR 2014 • Xiaowei Zhao, Tae-Kyun Kim, Wenhan Luo
In this paper, we present a unified method for joint face image analysis, i. e., simultaneously estimating head pose, facial expression and landmark positions in real-world face images.