Search Results for author: Xiaodong Cun

Found 46 papers, 37 papers with code

ZeroPur: Succinct Training-Free Adversarial Purification

1 code implementation • 5 Jun 2024 • Xiuli Bi, Zonglin Yang, Bo Liu, Xiaodong Cun, Chi-Man Pun, Pietro Lio, Bin Xiao

In this work, we suppose that adversarial images are outliers of the natural image manifold and the purification process can be considered as returning them to this manifold.

Adversarial Purification

618

Paper
Code

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

1 code implementation • 3 Jun 2024 • Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He

Previous methods promote the frame rate by either training a video interpolation model in pixel space as a postprocessing stage or training an interpolation model in latent space for a specific base video model.

Video Generation

Paper
Code

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

1 code implementation • 30 May 2024 • Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng

We present MOFA-Video, an advanced controllable image animation method that generates video from the given image using various additional controllable signals (such as human landmarks reference, manual trajectories, and another even provided video) or their combinations.

Image Animation Video Generation

131

Paper
Code

CV-VAE: A Compatible Video VAE for Latent Generative Video Models

1 code implementation • 30 May 2024 • Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, WenBo Hu, Ying Shan

Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them, which will take huge computational resources for training to bridge the gap even with the T2I models as initialization.

Quantization

129

Paper
Code

Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework

1 code implementation • 25 Mar 2024 • Ziyao Huang, Fan Tang, Yong Zhang, Xiaodong Cun, Juan Cao, Jintao Li, Tong-Yee Lee

We adopt a two-stage training strategy for the diffusion model, effectively binding movements with specific appearances.

Denoising

263

Paper
Code

Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

no code implementations • 7 Mar 2024 • Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun

In this work, we introduce a test-time training (TTT) strategy to address the problem.

Depth Estimation Depth Prediction +4

Paper
Add Code

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation • 16 Feb 2024 • Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

Paper
Code

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

2 code implementations • 17 Jan 2024 • Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

Based on this stronger coupling, we shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.

Ranked #1 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation

4,208

Paper
Code

Towards A Better Metric for Text-to-Video Generation

no code implementations • 15 Jan 2024 • Jay Zhangjie Wu, Guian Fang, HaoNing Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, YuChao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou

Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore on offering a better metric for text-to-video generation.

Text-to-Video Generation Video Alignment +1

Paper
Add Code

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

1 code implementation • 11 Dec 2023 • Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan

Both quantitative and qualitative results on this evaluation dataset indicate that our SmartEdit surpasses previous methods, paving the way for the practical application of complex instruction-based image editing.

170

Paper
Code

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

1 code implementation • 6 Dec 2023 • Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang

For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation for ensuring the generated first frame is equal to the given generated image.

Image Animation Video Generation

345

Paper
Code

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation • 5 Dec 2023 • Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

Yet succinct, our method is the first method to show the ability of video property editing from the pre-trained text-to-image model.

Video Editing Video Generation

Paper
Code

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

no code implementations • 4 Dec 2023 • Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou

To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model.

Denoising

Paper
Add Code

Sketch Video Synthesis

1 code implementation • 26 Nov 2023 • Yudian Zheng, Xiaodong Cun, Menghan Xia, Chi-Man Pun

Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos.

Video Editing

189

Paper
Code

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations • 30 Oct 2023 • Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Ranked #3 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation

4,208

Paper
Code

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

1 code implementation • 17 Oct 2023 • Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan

For video generation, various open-sourced models and public-available services have been developed to generate high-quality videos.

Benchmarking Language Modelling +4

Paper
Code

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation • 11 Oct 2023 • Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

454

Paper
Code

LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation

1 code implementation • ICCV 2023 • YiHao Zhi, Xiaodong Cun, Xuelin Chen, Xi Shen, Wen Guo, Shaoli Huang, Shenghua Gao

While previous methods are able to generate speech rhythm-synchronized gestures, the semantic context of the speech is generally lacking in the gesticulations.

Gesture Generation

Paper
Code

High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net

2 code implementations • ICCV 2023 • Zinuo Li, Xuhang Chen, Chi-Man Pun, Xiaodong Cun

We handle high-resolution document shadow removal directly via a larger-scale real-world dataset and a carefully designed frequency-aware network.

Document Shadow Removal Image Shadow Removal

191

Paper
Code

ToonTalker: Cross-Domain Face Reenactment

no code implementations • ICCV 2023 • Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, Yujiu Yang

Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.

Face Reenactment Talking Face Generation

Paper
Add Code

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

1 code implementation • 13 Jul 2023 • Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen

For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.

Retrieval Video Generation +2

241

Paper
Code

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation Video Generation

Paper
Add Code

Inserting Anybody in Diffusion Models via Celeb Basis

1 code implementation • NeurIPS 2023 • Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng

Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods.

241

Paper
Code

TaleCrafter: Interactive Story Visualization with Multiple Characters

1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

Accurate Story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images.

Story Visualization Text-to-Image Generation

245

Paper
Code

Explicit Visual Prompting for Universal Foreground Segmentations

2 code implementations • 29 May 2023 • Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

We take inspiration from the widely-used pre-training and then prompt tuning protocols in NLP and propose a new visual prompting model, named Explicit Visual Prompting (EVP).

Ranked #1 on Salient Object Detection on DUT-OMRON

Camouflaged Object Segmentation Defocus Blur Detection +5

169

Paper
Code

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

1 code implementation • 3 Apr 2023 • Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Ying Shan, Xiu Li, Qifeng Chen

Generating text-editable and pose-controllable character videos have an imperious demand in creating various digital human.

Text-to-Image Generation Text-to-Video Generation +1

1,051

Paper
Code

Explicit Visual Prompting for Low-Level Structure Segmentations

1 code implementation • CVPR 2023 • Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun

Different from the previous visual prompting which is typically a dataset-level implicit embedding, our key insight is to enforce the tunable parameters focusing on the explicit visual content from each individual image, i. e., the features from frozen patch embeddings and the input's high-frequency components.

Ranked #1 on Salient Object Detection on HKU-IS

Camouflaged Object Segmentation Defocus Blur Detection +5

169

Paper
Code

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

1 code implementation • ICCV 2023 • Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Attribute Text-to-Video Editing +2

1,062

Paper
Code

CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying

1 code implementation • 15 Mar 2023 • Weihuang Liu, Xiaodong Cun, Chi-Man Pun, Menghan Xia, Yong Zhang, Jue Wang

Thanks to the proposed structure, we only encode the high-resolution image in a relatively low resolution for larger reception field capturing.

Image Inpainting Vocal Bursts Intensity Prediction

Paper
Code

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

1 code implementation • CVPR 2023 • Youxin Pang, Yong Zhang, Weize Quan, Yanbo Fan, Xiaodong Cun, Ying Shan, Dong-Ming Yan

In this paper, we introduce a novel self-supervised disentanglement framework to decouple pose and expression without 3DMMs and paired data, which consists of a motion editing module, a pose generator, and an expression generator.

Disentanglement Talking Face Generation +1

408

Paper
Code

T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

1 code implementation • 15 Jan 2023 • Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen

Additionally, we conduct analyses on HumanML3D and observe that the dataset size is a limitation of our approach.

Ranked #11 on Motion Synthesis on HumanML3D

Motion Synthesis

540

Paper
Code

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

1 code implementation • CVPR 2023 • Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.

Ranked #4 on 3D Face Animation on BEAT2

3D Face Animation regression

484

Paper
Code

Generating Human Motion From Textual Descriptions With Discrete Representations

no code implementations • CVPR 2023 • Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Yong Zhang, Hongwei Zhao, Hongtao Lu, Xi Shen, Ying Shan

Additionally, we conduct analyses on HumanML3D and observe that the dataset size is a limitation of our approach.

Paper
Add Code

3D GAN Inversion with Facial Symmetry Prior

no code implementations • CVPR 2023 • Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang

It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred as 3D GAN inversion.

Image Reconstruction Neural Rendering

Paper
Add Code

ShaDocNet: Learning Spatial-Aware Tokens in Transformer for Document Shadow Removal

1 code implementation • 30 Nov 2022 • Xuhang Chen, Xiaodong Cun, Chi-Man Pun, Shuqiang Wang

Shadow removal improves the visual quality and legibility of digital copies of documents.

Document Shadow Removal Image Shadow Removal +1

Paper
Code

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

1 code implementation • 27 Nov 2022 • Kun Cheng, Xiaodong Cun, Yong Zhang, Menghan Xia, Fei Yin, Mingrui Zhu, Xuan Wang, Jue Wang, Nannan Wang

Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism.

Video Editing Video Generation

5,890

Paper
Code

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

1 code implementation • CVPR 2023 • Wenxuan Zhang, Xiaodong Cun, Xuan Wang, Yong Zhang, Xi Shen, Yu Guo, Ying Shan, Fei Wang

We present SadTalker, which generates 3D motion coefficients (head pose, expression) of the 3DMM from audio and implicitly modulates a novel 3D-aware face render for talking head generation.

Image Animation Talking Head Generation

10,923

Paper
Code

Learning Enriched Illuminants for Cross and Single Sensor Color Constancy

no code implementations • 21 Mar 2022 • Xiaodong Cun, Zhendong Wang, Chi-Man Pun, Jianzhuang Liu, Wengang Zhou, Xu Jia, Houqiang Li

Color constancy aims to restore the constant colors of a scene under different illuminants.

Color Constancy

Paper
Add Code

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

1 code implementation • 8 Mar 2022 • Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, Yujiu Yang

Our framework elevates the resolution of the synthesized talking face to 1024*1024 for the first time, even though the training dataset has a lower resolution.

Facial Editing Talking Face Generation +1

608

Paper
Code

Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization

2 code implementations • 13 Sep 2021 • Jingtang Liang, Xiaodong Cun, Chi-Man Pun, Jue Wang

To this end, we propose a novel spatial-separated curve rendering network(S$^2$CRNet) for efficient and high-resolution image harmonization for the first time.

Ranked #12 on Image Harmonization on iHarmony4

Image Harmonization Image-to-Image Translation +2

Paper
Code

Uformer: A General U-Shaped Transformer for Image Restoration

4 code implementations • CVPR 2022 • Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, Houqiang Li

Powered by these two designs, Uformer enjoys a high capability for capturing both local and global dependencies for image restoration.

Ranked #2 on Deblurring on RealBlur-R (trained on GoPro)

Deblurring Decoder +6

747

Paper
Code

Split then Refine: Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal

1 code implementation • 13 Dec 2020 • Xiaodong Cun, Chi-Man Pun

Simultaneously, to increase the robustness of watermark, attacking technique, such as watermark removal, also gets the attention from the community.

218

Paper
Code

Defocus Blur Detection via Depth Distillation

1 code implementation • ECCV 2020 • Xiaodong Cun, Chi-Man Pun

In detail, we learn the defocus blur from ground truth and the depth distilled from a well-trained depth estimation network at the same time.

Decoder Defocus Blur Detection +2

Paper
Code

Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN

5 code implementations • 20 Nov 2019 • Xiaodong Cun, Chi-Man Pun, Cheng Shi

With the help of novel masks or scenes, we enhance the current datasets using synthesized shadow images.

Ranked #3 on Shadow Removal on ISTD

2k Generative Adversarial Network +3

283

Paper
Code

Improving the Harmony of the Composite Image by Spatial-Separated Attention Module

1 code implementation • 15 Jul 2019 • Xiaodong Cun, Chi-Man Pun

Thus, we address the problem of Image Harmonization: Given a spliced image and the mask of the spliced region, we try to harmonize the "style" of the pasted region with the background (non-spliced region).

Ranked #5 on Image Harmonization on HAdobe5k(1024$\times$1024)

Image Harmonization

Paper
Code

Depth Assisted Full Resolution Network for Single Image-based View Synthesis

no code implementations • 17 Nov 2017 • Xiaodong Cun, Feng Xu, Chi-Man Pun, Hao Gao

In this paper, we focus on a more challenging and ill-posed problem that is to synthesize novel viewpoints from one single input image.

Depth Estimation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.