Search Results for author: Yujun Shen

Found 85 papers, 36 papers with code

Learning Temporally Consistent Video Depth from Video Diffusion Priors

no code implementations • 3 Jun 2024 • Jiahao Shao, Yuanbo Yang, HongYu Zhou, Youmin Zhang, Yujun Shen, Matteo Poggi, Yiyi Liao

This work addresses the challenge of video depth estimation, which expects not only per-frame accuracy but, more importantly, cross-frame consistency.

Depth Estimation Novel View Synthesis +1

Paper
Add Code

GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis

no code implementations • 30 May 2024 • Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, yinda zhang, Hujun Bao, Zhaopeng Cui

In this paper, we introduce GaussianPrediction, a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis in dynamic environments.

Decision Making Novel View Synthesis +1

Paper
Add Code

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

no code implementations • 26 Apr 2024 • Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

Instead of relying on extensive paired data, i. e., 3D meshes with material graphs and corresponding text descriptions, to train a material graph generative model, we propose to leverage the pre-trained 2D diffusion model as a bridge to connect the text and material graphs.

Paper
Add Code

InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

no code implementations • 17 Apr 2024 • Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

3D Gaussians have recently emerged as an efficient representation for novel view synthesis.

Depth Completion Novel View Synthesis

Paper
Add Code

Learning 3D-Aware GANs from Unposed Images with Template Feature Field

no code implementations • 8 Apr 2024 • Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

Collecting accurate camera poses of training images has been shown to well serve the learning of 3D-aware generative adversarial networks (GANs) yet can be quite expensive in practice.

Pose Estimation

Paper
Add Code

SpatialTracker: Tracking Any 2D Pixels in 3D Space

no code implementations • 5 Apr 2024 • Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou

Recovering dense and long-range pixel motion in videos is a challenging problem.

Paper
Add Code

DreamLIP: Language-Image Pre-training with Long Captions

1 code implementation • 25 Mar 2024 • Kecheng Zheng, Yifei Zhang, Wei Wu, Fan Lu, Shuailei Ma, Xin Jin, Wei Chen, Yujun Shen

Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner.

Contrastive Learning Language Modelling +4

Paper
Code

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

no code implementations • 25 Mar 2024 • Shilong Zhang, Lianghua Huang, Xi Chen, Yifei Zhang, Zhi-Fan Wu, Yutong Feng, Wei Wang, Yujun Shen, Yu Liu, Ping Luo

This work presents FlashFace, a practical tool with which users can easily personalize their own photos on the fly by providing one or a few reference face images and a text prompt.

Face Swapping Instruction Following +1

Paper
Add Code

GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

1 code implementation • 21 Mar 2024 • Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein

We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0. 1s.

3D Reconstruction Image to 3D +1

481

Paper
Code

Contextual AD Narration with Interleaved Multimodal Sequence

no code implementations • 19 Mar 2024 • Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, LiMin Wang

With video feature, text, character bank and context information as inputs, the generated ADs are able to correspond to the characters by name and provide reasonable, contextual descriptions to help audience understand the storyline of movie.

Paper
Add Code

Bridging 3D Gaussian and Mesh for Freeview Video Rendering

no code implementations • 18 Mar 2024 • Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao

To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes.

Novel View Synthesis

Paper
Add Code

Real-time 3D-aware Portrait Editing from a Single Image

no code implementations • 21 Feb 2024 • Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner.

Paper
Add Code

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation • 25 Dec 2023 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of training set (i. e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9. 67 to 8. 19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Ranked #7 on Text-to-Video Generation on MSR-VTT

Text-to-Image Generation Text-to-Video Generation +2

Paper
Code

TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification

1 code implementation • 21 Dec 2023 • Qinying Liu, Wei Wu, Kecheng Zheng, Zhan Tong, Jiawei Liu, Yu Liu, Wei Chen, Zilei Wang, Yujun Shen

The crux of learning vision-language models is to extract semantically aligned information from visual and linguistic data.

Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on COCO-Stuff-171

Attribute Open Vocabulary Semantic Segmentation +3

Paper
Code

SAM-guided Graph Cut for 3D Instance Segmentation

no code implementations • 13 Dec 2023 • Haoyu Guo, He Zhu, Sida Peng, Yuang Wang, Yujun Shen, Ruizhen Hu, Xiaowei Zhou

Experimental results on the ScanNet, ScanNet++ and KITTI-360 datasets demonstrate that our method achieves robust segmentation performance and can generalize across different types of scenes.

3D Instance Segmentation Graph Neural Network +2

Paper
Add Code

HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation

no code implementations • 12 Dec 2023 • Hongyu Liu, Xuan Wang, Ziyu Wan, Yujun Shen, Yibing Song, Jing Liao, Qifeng Chen

The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction.

Paper
Add Code

CCM: Adding Conditional Controls to Text-to-Image Consistency Models

no code implementations • 12 Dec 2023 • Jie Xiao, Kai Zhu, Han Zhang, Zhiheng Liu, Yujun Shen, Yu Liu, Xueyang Fu, Zheng-Jun Zha

Consistency Models (CMs) have showed a promise in creating visual content efficiently and with high quality.

Paper
Add Code

Learning Naturally Aggregated Appearance for Efficient 3D Editing

1 code implementation • 11 Dec 2023 • Ka Leong Cheng, Qiuyu Wang, Zifan Shi, Kecheng Zheng, Yinghao Xu, Hao Ouyang, Qifeng Chen, Yujun Shen

Neural radiance fields, which represent a 3D scene as a color field and a density field, have demonstrated great progress in novel view synthesis yet are unfavorable for editing due to the implicitness.

Novel View Synthesis

Paper
Code

GenDeF: Learning Generative Deformation Field for Video Generation

no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen

We offer a new perspective on approaching the task of video generation.

Disentanglement Video Editing +3

Paper
Add Code

LivePhoto: Real Image Animation with Text-guided Motion Control

no code implementations • 5 Dec 2023 • Xi Chen, Zhiheng Liu, Mengting Chen, Yutong Feng, Yu Liu, Yujun Shen, Hengshuang Zhao

In particular, considering the facts that (1) text can only describe motions roughly (e. g., regardless of the moving speed) and (2) text may include both content and motion descriptions, we introduce a motion intensity estimation module as well as a text re-weighting module to reduce the ambiguity of text-to-motion mapping.

Image Animation Text-to-Video Generation +1

Paper
Add Code

BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation

no code implementations • 4 Dec 2023 • Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang

Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis technique since 3D scenes usually hold complex spatial configurations and consist of a number of objects at varying scales.

Scene Generation

Paper
Add Code

SMaRt: Improving GANs with Score Matching Regularity

no code implementations • 30 Nov 2023 • Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-Jin Liu

In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets with positive Lebesgue measure of the generated data manifold lying out of the real data manifold.

valid

Paper
Add Code

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

no code implementations • 28 Nov 2023 • Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute Denoising +1

Paper
Add Code

4K4D: Real-Time 4D View Synthesis at 4K Resolution

no code implementations • 17 Oct 2023 • Zhen Xu, Sida Peng, Haotong Lin, Guangzhao He, Jiaming Sun, Yujun Shen, Hujun Bao, Xiaowei Zhou

Experiments show that our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU, which is 30x faster than previous methods and achieves the state-of-the-art rendering quality.

Paper
Add Code

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

no code implementations • 14 Oct 2023 • Mengfei Xia, Yujun Shen, Changsong Lei, Yu Zhou, Ran Yi, Deli Zhao, Wenping Wang, Yong-Jin Liu

By viewing the generation of diffusion models as a discretized integrating process, we argue that the quality drop is partly caused by applying an inaccurate integral direction to a timestep interval.

Denoising

Paper
Add Code

In-Domain GAN Inversion for Faithful Reconstruction and Editability

no code implementations • 25 Sep 2023 • Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen, Bolei Zhou

This work fills in this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.

Image Generation Image Reconstruction

Paper
Add Code

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

1 code implementation • 7 Sep 2023 • Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen

Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis.

Image Generation Philosophy +1

Paper
Code

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

1 code implementation • 15 Aug 2023 • Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen

We present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field aggregating the static contents in the entire video and a temporal deformation field recording the transformations from the canonical image (i. e., rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We advisedly introduce some regularizations into the optimization process, urging the canonical content field to inherit semantics (e. g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms for video processing, in the sense that one can apply an image algorithm to the canonical image and effortlessly propagate the outcomes to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF is able to lift image-to-image translation to video-to-video translation and lift keypoint detection to keypoint tracking without any training. More importantly, thanks to our lifting strategy that deploys the algorithms on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. Project page can be found at https://qiuyu96. github. io/CoDeF/.

Image-to-Image Translation Keypoint Detection +1

4,777

Paper
Code

Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models

no code implementations • ICCV 2023 • Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen

To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen.

Paper
Add Code

AnyDoor: Zero-shot Object-level Image Customization

2 code implementations • 18 Jul 2023 • Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way.

Object Virtual Try-on

6,292

Paper
Code

NEAT: Distilling 3D Wireframes from Neural Attraction Fields

1 code implementation • 14 Jul 2023 • Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu, Yujun Shen

Instead of leveraging matching-based solutions from 2D wireframes (or line segments) for 3D wireframe reconstruction as done in prior arts, we present NEAT, a rendering-distilling formulation using neural fields to represent 3D line segments with 2D observations, and bipartite matching for perceiving and distilling of a sparse set of 3D global junctions.

3D Wireframe Reconstruction Novel View Synthesis

Paper
Code

Eliminating Lipschitz Singularities in Diffusion Models

no code implementations • 20 Jun 2023 • Zhantao Yang, Ruili Feng, Han Zhang, Yujun Shen, Kai Zhu, Lianghua Huang, Yifei Zhang, Yu Liu, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which employ stochastic differential equations to sample images through integrals, have emerged as a dominant class of generative models.

Paper
Add Code

Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

no code implementations • 4 Jun 2023 • Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang

A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability.

Semantic Segmentation

Paper
Add Code

Balancing Logit Variation for Long-tailed Semantic Segmentation

1 code implementation • CVPR 2023 • Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.

Semantic Segmentation

Paper
Code

VideoComposer: Compositional Video Synthesis with Motion Controllability

4 code implementations • NeurIPS 2023 • Xiang Wang, Hangjie Yuan, Shiwei Zhang, Dayou Chen, Jiuniu Wang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

The pursuit of controllability as a higher standard of visual content creation has yielded remarkable progress in customizable image synthesis.

Ranked #5 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Image Generation Text-to-Video Generation

848

Paper
Code

Cones 2: Customizable Image Synthesis with Multiple Subjects

1 code implementation • 30 May 2023 • Zhiheng Liu, Yifei Zhang, Yujun Shen, Kecheng Zheng, Kai Zhu, Ruili Feng, Yu Liu, Deli Zhao, Jingren Zhou, Yang Cao

Synthesizing images with user-specified subjects has received growing attention due to its practical applications.

Image Generation

489

Paper
Code

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

no code implementations • 23 May 2023 • Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features.

Domain Generalization Semantic Segmentation

Paper
Add Code

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

1 code implementation • ICCV 2023 • Yulin Pan, Xiangteng He, Biao Gong, Yiliang Lv, Yujun Shen, Yuxin Peng, Deli Zhao

Video temporal grounding aims to pinpoint a video segment that matches the query description.

Paper
Code

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

1 code implementation • CVPR 2023 • Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution.

Ranked #7 on Video Generation on UCF-101

Code Generation Denoising +4

6,292

Paper
Code

ViM: Vision Middleware for Unified Downstream Transferring

no code implementations • ICCV 2023 • Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone.

Paper
Add Code

Composer: Creative and Controllable Image Synthesis with Composable Conditions

6 code implementations • 20 Feb 2023 • Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, Jingren Zhou

Recent large-scale generative models learned on big data are capable of synthesizing incredible images yet suffer from limited controllability.

Image Colorization Image-to-Image Translation +3

6,292

Paper
Code

UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training

no code implementations • 14 Feb 2023 • Biao Gong, Xiaoying Xie, Yutong Feng, Yiliang Lv, Yujun Shen, Deli Zhao

This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data.

Common Sense Reasoning

Paper
Add Code

Spatial Steerability of GANs via Self-Supervision from Discriminator

no code implementations • 20 Jan 2023 • Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou

In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations.

Image Generation Inductive Bias +1

Paper
Add Code

Learning 3D-aware Image Synthesis with Unknown Pose Distribution

no code implementations • CVPR 2023 • Zifan Shi, Yujun Shen, Yinghao Xu, Sida Peng, Yiyi Liao, Sheng Guo, Qifeng Chen, Dit-yan Yeung

Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.

3D-Aware Image Synthesis

Paper
Add Code

GH-Feat: Learning Versatile Generative Hierarchical Features from GANs

no code implementations • 12 Jan 2023 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou

In this work we investigate that such a generative feature learned from image synthesis exhibits great potentials in solving a wide range of computer vision tasks, including both generative ones and more importantly discriminative ones.

Face Verification Image Harmonization +3

Paper
Add Code

LinkGAN: Linking GAN Latents to Pixels for Controllable Image Synthesis

no code implementations • ICCV 2023 • Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen

This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.

Image Generation

Paper
Add Code

LipFormer: High-Fidelity and Generalizable Talking Face Generation With a Pre-Learned Facial Codebook

no code implementations • CVPR 2023 • Jiayu Wang, Kang Zhao, Shiwei Zhang, Yingya Zhang, Yujun Shen, Deli Zhao, Jingren Zhou

Generating a talking face video from the input audio sequence is a practical yet challenging task.

Talking Face Generation

Paper
Add Code

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

no code implementations • CVPR 2023 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.

3D-Aware Image Synthesis Object

Paper
Add Code

Towards Smooth Video Composition

1 code implementation • 14 Dec 2022 • Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Video generation requires synthesizing consistent and persistent frames with dynamic content over time.

Ranked #1 on Video Generation on YouTube Driving

Image Generation single-image-generation +2

Paper
Code

GLeaD: Improving GANs with A Generator-Leading Task

1 code implementation • CVPR 2023 • Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen

Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition.

domain classification Generative Adversarial Network +1

Paper
Code

Dimensionality-Varying Diffusion Process

no code implementations • CVPR 2023 • Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension.

Image Generation

Paper
Add Code

Neural Dependencies Emerging from Learning Massive Categories

no code implementations • CVPR 2023 • Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha

Through investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, which condition is easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates to only a few others.

Image Classification

Paper
Add Code

Deep Generative Models on 3D Representations: A Survey

1 code implementation • 27 Oct 2022 • Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, Yujun Shen

In this survey, we thoroughly review the ongoing developments of 3D generative models, including methods that employ 2D and 3D supervision.

3D-Aware Image Synthesis 3D Shape Generation

981

Paper
Code

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

no code implementations • 30 Sep 2022 • Zifan Shi, Yinghao Xu, Yujun Shen, Deli Zhao, Qifeng Chen, Dit-yan Yeung

We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough.

3D-Aware Image Synthesis domain classification +2

Paper
Add Code

Improving GANs with A Dynamic Discriminator

no code implementations • 20 Sep 2022 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Deli Zhao, Bo Dai, Bolei Zhou

Two capacity adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator.

3D-Aware Image Synthesis Data Augmentation

Paper
Add Code

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

1 code implementation • 15 Sep 2022 • Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu

Self-training has shown great potential in semi-supervised learning.

Pseudo Label Semi-Supervised Semantic Segmentation +1

Paper
Code

A Unified Model for Multi-class Anomaly Detection

1 code implementation • 8 Jun 2022 • Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, Xinyi Le

For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88. 1% to 96. 5%) and anomaly localization (from 89. 5% to 96. 8%).

Unsupervised Anomaly Detection

219

Paper
Code

Interpreting Class Conditional GANs with Channel Awareness

no code implementations • 21 Mar 2022 • Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen

To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.

Paper
Add Code

High-fidelity GAN Inversion with Padding Space

1 code implementation • 21 Mar 2022 • Qingyan Bai, Yinghao Xu, Jiapeng Zhu, Weihao Xia, Yujiu Yang, Yujun Shen

In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information.

Generative Adversarial Network Image Manipulation +1

Paper
Code

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

1 code implementation • CVPR 2022 • Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.

Ranked #3 on Semi-Supervised Semantic Segmentation on PASCAL VOC 2012 50%

Semi-Supervised Semantic Segmentation

417

Paper
Code

Region-Based Semantic Factorization in GANs

1 code implementation • 19 Feb 2022 • Jiapeng Zhu, Yujun Shen, Yinghao Xu, Deli Zhao, Qifeng Chen

Despite the rapid advancement of semantic discovery in the latent space of Generative Adversarial Networks (GANs), existing approaches either are limited to finding global attributes or rely on a number of segmentation masks to identify local attributes.

Paper
Code

3D-Aware Indoor Scene Synthesis with Depth Priors

no code implementations • 17 Feb 2022 • Zifan Shi, Yujun Shen, Jiapeng Zhu, Dit-yan Yeung, Qifeng Chen

In this way, the discriminator can take the spatial arrangement into account and advise the generator to learn an appropriate depth condition.

3D-Aware Image Synthesis Indoor Scene Synthesis

Paper
Add Code

3D-aware Image Synthesis via Learning Structural and Textural Representations

1 code implementation • CVPR 2022 • Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou

The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.

3D-Aware Image Synthesis Generative Adversarial Network

126

Paper
Code

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin

Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.

Action Recognition

Paper
Add Code

Improving GAN Equilibrium by Raising Spatial Awareness

1 code implementation • CVPR 2022 • Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou

We further propose to align the spatial awareness of G with the attention map induced from D. Through this way we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to the equilibrium, leading to a better synthesis performance.

Attribute Inductive Bias

157

Paper
Code

One-Shot Generative Domain Adaptation

no code implementations • ICCV 2023 • Ceyuan Yang, Yujun Shen, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou

We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characters from the reference.

Attribute Domain Adaptation +1

Paper
Add Code

Glancing at the Patch: Anomaly Localization With Global and Local Feature Comparison

no code implementations • CVPR 2021 • Shenzhi Wang, Liwei Wu, Lei Cui, Yujun Shen

More concretely, we employ a Local-Net and Global-Net to extract features from any individual patch and its surrounding respectively.

Anomaly Detection

Paper
Add Code

CompConv: A Compact Convolution Module for Efficient Feature Learning

no code implementations • 19 Jun 2021 • Chen Zhang, Yinghao Xu, Yujun Shen

Convolutional Neural Networks (CNNs) have achieved remarkable success in various computer vision tasks but rely on tremendous computational cost.

Paper
Add Code

Data-Efficient Instance Generation from Instance Discrimination

1 code implementation • NeurIPS 2021 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou

Meanwhile, the learned instance discrimination capability from the discriminator is in turn exploited to encourage the generator for diverse generation.

Ranked #6 on Image Generation on FFHQ 256 x 256 (FD metric)

2k Data Augmentation +1

100

Paper
Code

Low-Rank Subspaces in GANs

1 code implementation • NeurIPS 2021 • Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, ZhengJun Zha, Jingren Zhou, Qifeng Chen

Concretely, given an arbitrary image and a region of interest (e. g., eyes of face images), we manage to relate the latent space to the image region with the Jacobian matrix and then use low-rank factorization to discover steerable latent subspaces.

Attribute Generative Adversarial Network

123

Paper
Code

Decorating Your Own Bedroom: Locally Controlling Image Generation with Generative Adversarial Networks

no code implementations • 18 May 2021 • Chen Zhang, Yinghao Xu, Yujun Shen

Generative Adversarial Networks (GANs) have made great success in synthesizing high-quality images.

Image Generation

Paper
Add Code

Unsupervised Image Transformation Learning via Generative Adversarial Networks

no code implementations • 13 Mar 2021 • Kaiwen Zha, Yujun Shen, Bolei Zhou

In this work, we study the image transformation problem, which targets at learning the underlying transformations (e. g., the transition of seasons) from a collection of unlabeled images.

Image Generation valid

Paper
Add Code

Improving the Fairness of Deep Generative Models without Retraining

1 code implementation • 9 Dec 2020 • Shuhan Tan, Yujun Shen, Bolei Zhou

Generative Adversarial Networks (GANs) advance face synthesis through learning the underlying distribution of observed data.

Attribute Face Generation +3

Paper
Code

Generative Hierarchical Features from Synthesizing Images

1 code implementation • CVPR 2021 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou

Generative Adversarial Networks (GANs) have recently advanced image synthesis by learning the underlying distribution of the observed data.

Face Verification Image Classification +2

156

Paper
Code

Closed-Form Factorization of Latent Semantics in GANs

11 code implementations • CVPR 2021 • Yujun Shen, Bolei Zhou

A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.

Attribute Image Generation +1

2,682

Paper
Code

InterFaceGAN: Interpreting the Disentangled Face Representation Learned by GANs

2 code implementations • 18 May 2020 • Yujun Shen, Ceyuan Yang, Xiaoou Tang, Bolei Zhou

In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space.

Attribute Face Generation

1,474

Paper
Code

In-Domain GAN Inversion for Real Image Editing

2 code implementations • ECCV 2020 • Jiapeng Zhu, Yujun Shen, Deli Zhao, Bolei Zhou

A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code.

Image Reconstruction

459

Paper
Code

Residual Knowledge Distillation

no code implementations • 21 Feb 2020 • Mengya Gao, Yujun Shen, Quanquan Li, Chen Change Loy

Knowledge distillation (KD) is one of the most potent ways for model compression.

Knowledge Distillation Model Compression

Paper
Add Code

Image Processing Using Multi-Code GAN Prior

1 code implementation • CVPR 2020 • Jinjin Gu, Yujun Shen, Bolei Zhou

Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors.

Ranked #7 on Blind Face Restoration on CelebA-Test

Blind Face Restoration Colorization +6

288

Paper
Code

Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

2 code implementations • 21 Nov 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there lacks enough understanding on what generative models have learned inside the deep generative representations and how photo-realistic images are able to be composed of the layer-wise stochasticity introduced in recent GANs.

Image Generation

157

Paper
Code

Semantic Hierarchy Emerges in the Deep Generative Representations for Scene Synthesis

no code implementations • 25 Sep 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there lacks enough understanding on what networks have learned inside the deep generative representations and how photo-realistic images are able to be composed from random noises.

Image Generation

Paper
Add Code

Interpreting the Latent Space of GANs for Semantic Face Editing

4 code implementations • CVPR 2020 • Yujun Shen, Jinjin Gu, Xiaoou Tang, Bolei Zhou

In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs.

Attribute Disentanglement +2

1,474

Paper
Code

An Embarrassingly Simple Approach for Knowledge Distillation

1 code implementation • 5 Dec 2018 • Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang

Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.

Face Recognition Knowledge Distillation +3

Paper
Code

FaceFeat-GAN: a Two-Stage Approach for Identity-Preserving Face Synthesis

no code implementations • 4 Dec 2018 • Yujun Shen, Bolei Zhou, Ping Luo, Xiaoou Tang

In the second stage, they compete in the image domain to render photo-realistic images that contain high diversity but preserve identity.

Face Generation Vocal Bursts Valence Prediction

Paper
Add Code

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.