no code implementations • 21 May 2024 • Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel
During training, we use variable sequence length and batch size, sampling simultaneously from all buckets with a curriculum.
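As one way to picture this scheme, here is a minimal sketch of length-bucketed sampling with a curriculum. All names (`make_buckets`, `sample_batch`, the bucket boundaries, the warmup schedule) are hypothetical illustrations, not the paper's actual implementation; the sketch only assumes the idea stated above: documents are grouped into buckets by length, and each step samples a (sequence length, batch size) pair so the token budget per batch stays roughly constant.

```python
import random

def make_buckets(doc_lengths, boundaries):
    """Assign each document index to the smallest bucket boundary that fits it.
    Documents longer than the largest boundary are dropped in this sketch."""
    boundaries = sorted(boundaries)
    buckets = {b: [] for b in boundaries}
    for i, n in enumerate(doc_lengths):
        for b in boundaries:
            if n <= b:
                buckets[b].append(i)
                break
    return buckets

def sample_batch(buckets, tokens_per_batch, step, warmup=1000, rng=random):
    """Pick a bucket (shorter sequences favored early in training as a toy
    curriculum) and size the batch to keep tokens-per-batch roughly constant."""
    boundaries = sorted(buckets)
    # Curriculum: weights shift from short-biased toward uniform over warmup.
    t = min(1.0, step / warmup)
    weights = [(1 - t) * (1.0 / (i + 1)) + t for i in range(len(boundaries))]
    seq_len = rng.choices(boundaries, weights=weights, k=1)[0]
    batch_size = max(1, tokens_per_batch // seq_len)
    docs = rng.choices(buckets[seq_len], k=batch_size) if buckets[seq_len] else []
    return seq_len, batch_size, docs

buckets = make_buckets([120, 900, 3000, 60, 1500], boundaries=[256, 1024, 4096])
seq_len, batch_size, docs = sample_batch(buckets, tokens_per_batch=8192, step=0)
```

Note that `batch_size * seq_len` never exceeds the token budget, so every bucket choice produces batches of comparable cost.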
no code implementations • 30 Nov 2023 • Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel
While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world.
1 code implementation • 29 Nov 2023 • Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
We achieve state-of-the-art rendering quality at a rendering speed of 60 FPS while training ~100x faster than previous work.
1 code implementation • 23 Oct 2023 • Byeongjoo Ahn, Karren Yang, Brian Hamilton, Jonathan Sheaffer, Anurag Ranjan, Miguel Sarabia, Oncel Tuzel, Jen-Hao Rick Chang
Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene.
no code implementations • 4 Oct 2023 • Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao
Although fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in notoriously long training times and high computational costs.
no code implementations • 18 Sep 2023 • Hsuan Su, Ting-yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel
In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains.
Automatic Speech Recognition (ASR) +5
no code implementations • CVPR 2023 • Jen-Hao Rick Chang, Wei-Yu Chen, Anurag Ranjan, Kwang Moo Yi, Oncel Tuzel
Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray.
no code implementations • CVPR 2023 • Anurag Ranjan, Kwang Moo Yi, Jen-Hao Rick Chang, Oncel Tuzel
We propose FaceLit, a generative framework capable of generating a 3D face that can be rendered under various user-defined lighting conditions and viewpoints, learned purely from in-the-wild 2D images without any manual annotation.
no code implementations • 27 Mar 2023 • Karren Yang, Ting-yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel
Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases?
Automatic Speech Recognition (ASR) +3
no code implementations • 21 Oct 2021 • Ting-yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel
With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models.
Automatic Speech Recognition (ASR) +2
no code implementations • 13 Oct 2021 • Jen-Hao Rick Chang, Martin Bresler, Youssouf Chherawala, Adrien Delaye, Thomas Deselaers, Ryan Dixon, Oncel Tuzel
We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.
no code implementations • 8 Oct 2021 • Dmitrii Marin, Jen-Hao Rick Chang, Anurag Ranjan, Anish Prabhu, Mohammad Rastegari, Oncel Tuzel
Token Pooling is a simple and effective operator that can benefit many architectures.
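As a rough illustration of the idea, the sketch below reduces a sequence of token embeddings to a smaller set by clustering them and keeping the cluster means. This is a simplified k-means-style stand-in, not the paper's exact operator; the function name and parameters are hypothetical.

```python
import numpy as np

def token_pooling(tokens, k, iters=10, seed=0):
    """Simplified token pooling: cluster N token embeddings into k groups
    and replace each group with its mean, reducing N tokens to k tokens."""
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    centers = tokens[rng.choice(n, size=k, replace=False)]  # copy via fancy indexing
    for _ in range(iters):
        # Assign each token to its nearest center (squared Euclidean distance).
        dists = ((tokens[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # Recompute each center as its cluster mean (keep old center if empty).
        for j in range(k):
            members = tokens[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

tokens = np.random.default_rng(1).normal(size=(64, 16))
pooled = token_pooling(tokens, k=8)
```

The point of the sketch is the interface: the operator maps an (N, d) token matrix to a (k, d) one, so downstream attention layers run on fewer tokens.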
no code implementations • 6 Oct 2021 • Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel
However, in an unsupervised setting, typical training algorithms for controllable sequence generative models suffer from a training-inference mismatch: the same sample serves as both the content and style input during training, but unpaired samples are given at inference.
no code implementations • 2 Nov 2020 • Ting-yao Hu, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel
Our policy adapts the augmentation parameters based on the training loss of the data samples.
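One plausible form of such a policy is sketched below: each sample's training loss is mapped to an augmentation strength, applying stronger augmentation to easy (low-loss) samples and milder augmentation to hard ones. The function name, thresholds, and the exact mapping are hypothetical illustrations of the stated idea, not the paper's policy.

```python
import numpy as np

def adapt_augmentation(losses, low=0.5, high=2.0, min_s=0.1, max_s=1.0):
    """Hypothetical loss-adaptive policy: normalize each sample's loss into
    [0, 1] over a clamp range [low, high], then map it linearly to an
    augmentation strength in [min_s, max_s], with low loss -> max strength."""
    losses = np.asarray(losses, dtype=float)
    t = np.clip((losses - low) / (high - low), 0.0, 1.0)
    return max_s - t * (max_s - min_s)

strengths = adapt_augmentation([0.2, 1.0, 3.0])
```

The returned strengths could then scale any augmentation magnitude (e.g., mask widths or noise levels), so well-learned samples are perturbed aggressively while struggling samples are left closer to clean.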
Automatic Speech Recognition (ASR) +2
no code implementations • 2 May 2020 • Jen-Hao Rick Chang, Anat Levin, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan
Multifocal displays, one of the classic approaches to satisfy the accommodation cue, place virtual content at multiple focal planes, each at a different depth.
no code implementations • 27 May 2018 • Jen-Hao Rick Chang, B. V. K. Vijaya Kumar, Aswin C. Sankaranarayanan
We present a virtual reality display that is capable of generating a dense collection of depth/focal planes.
no code implementations • CVPR 2016 • Jen-Hao Rick Chang, Aswin C. Sankaranarayanan, B. V. K. Vijaya Kumar
Random features are an approach for kernel-based inference on large datasets.
no code implementations • CVPR 2015 • Jen-Hao Rick Chang, Yu-Chiang Frank Wang
In this paper, we propose the propagation filter as a novel image filtering operator, with the goal of smoothing over neighboring image pixels while preserving image context like edges or textural regions.
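A minimal 1-D sketch of this idea is shown below. The key property (in contrast to, e.g., a bilateral filter that compares only endpoint intensities) is that a neighbor's weight is the product of photometric affinities between *adjacent* pixels along the path to it, so weights collapse once the path crosses an edge. This is a simplified illustration under assumed parameters, not the paper's full 2-D formulation.

```python
import numpy as np

def propagation_filter_1d(signal, radius=3, sigma=0.1):
    """Simplified 1-D propagation filter: each neighbor's weight is the
    product of Gaussian affinities between consecutive pixels on the path
    from the center, so smoothing stops at strong intensity edges."""
    x = np.asarray(signal, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        wsum, vsum = 1.0, x[i]  # the center pixel gets weight 1
        for direction in (-1, 1):
            w = 1.0
            for step in range(1, radius + 1):
                j = i + direction * step
                if j < 0 or j >= len(x):
                    break
                prev = i + direction * (step - 1)
                # Multiply in the affinity between consecutive pixels on the path.
                w *= np.exp(-((x[j] - x[prev]) ** 2) / (2 * sigma ** 2))
                wsum += w
                vsum += w * x[j]
        out[i] = vsum / wsum
    return out

step_edge = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
smoothed = propagation_filter_1d(step_edge)
```

On the step edge above, pixels on each side average only with their own side: the affinity across the jump is essentially zero, so the edge survives filtering.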