no code implementations • 22 May 2024 • Tarun Kalluri, Jihyeon Lee, Kihyuk Sohn, Sahil Singla, Manmohan Chandraker, Joseph Xu, Jeremiah Liu
We present a simple and efficient method to leverage emerging text-to-image generative models in creating large-scale synthetic supervision for the task of damage assessment from aerial images.
no code implementations • 22 May 2024 • Divya Kothandaraman, Kihyuk Sohn, Ruben Villegas, Paul Voigtlaender, Dinesh Manocha, Mohammad Babaeizadeh
We present a method for multi-concept customization of pretrained text-to-video (T2V) models.
no code implementations • 22 Mar 2024 • Kyungmin Lee, Kihyuk Sohn, Jinwoo Shin
Recent progress in text-to-3D generation has been achieved through score distillation methods, which leverage pre-trained text-to-image (T2I) diffusion models by distilling through the diffusion model training objective.
no code implementations • 19 Feb 2024 • Kyungmin Lee, Sangkyung Kwak, Kihyuk Sohn, Jinwoo Shin
In particular, our method results in a superior Pareto frontier to the baselines.
no code implementations • 16 Feb 2024 • Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku
In this setting of knowledge acquisition and extraction, we find an intriguing fact: LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or end of the documents used for fine-tuning.
no code implementations • 3 Jan 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #4 on Text-to-Video Generation on MSR-VTT
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 1 Dec 2023 • KaiFeng Chen, Daniel Salz, Huiwen Chang, Kihyuk Sohn, Dilip Krishnan, Mojtaba Seyedhosseini
On K-Nearest-Neighbor image retrieval evaluation with ImageNet-1k, the same model outperforms the baseline by 1.32%.
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
no code implementations • 24 Aug 2023 • Ximeng Sun, Kihyuk Sohn, Kate Saenko, Clayton Mellina, Xiao Bian
How should the label budget (i.e., the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance?
no code implementations • 4 Jul 2023 • Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, Jinwoo Shin
Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities.
no code implementations • 1 Jun 2023 • Kihyuk Sohn, Albert Shaw, Yuan Hao, Han Zhang, Luisa Polania, Huiwen Chang, Lu Jiang, Irfan Essa
We study domain-adaptive image synthesis, the problem of teaching pretrained image generative models a new style or concept from as few as one image to synthesize novel images, to better understand compositional image synthesis.
3 code implementations • 1 Jun 2023 • Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
Pre-trained large text-to-image models synthesize impressive images with appropriate use of text prompts.
no code implementations • 4 May 2023 • Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister
In FormNetV2, we introduce a centralized multimodal graph contrastive learning strategy to unify self-supervised pre-training for all modalities in one loss.
1 code implementation • CVPR 2023 • Sihyun Yu, Kihyuk Sohn, Subin Kim, Jinwoo Shin
Specifically, PVDM is composed of two components: (a) an autoencoder that projects a given video as 2D-shaped latent vectors that factorize the complex cubic structure of video pixels and (b) a diffusion model architecture specialized for our new factorized latent space and the training/sampling procedure to synthesize videos of arbitrary length with a single model.
2 code implementations • CVPR 2023 • Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko, Irfan Essa
We show that intermediate self-attention maps of a masked generative transformer encode important structural information of the input image, such as scene layout and object shape, and we propose a novel sampling method based on this observation to enable structure-guided generation.
1 code implementation • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image.
Ranked #1 on Zero-shot Image Retrieval on ImageNet-R
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
no code implementations • 30 Nov 2022 • Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister
Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled.
1 code implementation • CVPR 2023 • Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens for autoregressive or non-autoregressive transformers.
no code implementations • CVPR 2023 • Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
1 code implementation • 27 May 2022 • Woojung Kim, Keondo Park, Kihyuk Sohn, Raphael Shu, Hyung-Sin Kim
Compared to an FSSL approach based on weight sharing, the prototype-based inter-client knowledge sharing significantly reduces both communication and computation costs, enabling more frequent knowledge sharing between more clients for better accuracy.
no code implementations • 10 Jan 2022 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister
Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.
2 code implementations • 21 Dec 2021 • Kihyuk Sohn, Jinsung Yoon, Chun-Liang Li, Chen-Yu Lee, Tomas Pfister
We define a distance function between images, each of which is represented as a bag of embeddings, by the Euclidean distance between weighted averaged embeddings.
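The distance function described above can be sketched directly. This is a minimal illustration, not the authors' exact implementation: an image is a "bag" of patch embeddings, and the image-level distance is the Euclidean distance between weighted averages of those embeddings (the optional weights, e.g. attention scores, are an assumption; uniform weights recover the plain mean).

```python
import numpy as np


def image_distance(bag_a: np.ndarray, bag_b: np.ndarray,
                   weights_a=None, weights_b=None) -> float:
    """Euclidean distance between weighted averaged embeddings.

    bag_a, bag_b: (n_patches, dim) arrays of per-patch embeddings.
    weights_a, weights_b: optional per-patch weights; uniform if None.
    """
    def weighted_mean(bag, w):
        if w is None:
            return bag.mean(axis=0)
        w = np.asarray(w, dtype=float)
        w = w / w.sum()  # normalize weights to sum to 1
        return (w[:, None] * bag).sum(axis=0)

    return float(np.linalg.norm(weighted_mean(bag_a, weights_a)
                                - weighted_mean(bag_b, weights_b)))
```

Representing images as bags of embeddings lets the distance ignore patch ordering while the weights can emphasize the most informative regions.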
no code implementations • 29 Sep 2021 • Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister
Such a requirement is impractical in situations where the data labeling efforts for minority or rare groups are significantly laborious or where the individuals comprising the dataset choose to conceal sensitive information.
no code implementations • 29 Sep 2021 • Justin Lazarow, Kihyuk Sohn, Chun-Liang Li, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister
While remarkable progress in imbalanced supervised learning has been made recently, less attention has been given to the setting of imbalanced semi-supervised learning (SSL), where not only is labeled data scarce but the underlying data distribution can also be severely imbalanced.
1 code implementation • NeurIPS 2021 • Sangwoo Mo, Hyunwoo Kang, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin
Contrastive self-supervised learning has shown impressive results in learning visual representations from unlabeled images by enforcing invariance against different data augmentations.
1 code implementation • NeurIPS 2021 • Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister
The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength -- at inference, the user can adjust it based on the desired operation point on accuracy vs. rule verification ratio.
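The adjustable rule strength described above can be illustrated with a minimal sketch. Note this is an assumption-laden simplification for illustration: a single scalar `alpha` blends a data-driven prediction with a rule-driven one, and the function name and linear blending form are not necessarily DeepCTRL's exact architecture.

```python
def blend_with_rule(data_pred: float, rule_pred: float, alpha: float) -> float:
    """Blend data- and rule-driven predictions.

    alpha in [0, 1] is the rule strength: 0 ignores the rule, 1 follows it
    exactly. Changing alpha at inference requires no retraining.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * data_pred + alpha * rule_pred
```

A user wanting stricter rule verification simply increases `alpha` at the desired operating point on the accuracy-vs-rule-verification trade-off.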
no code implementations • 11 Jun 2021 • Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Chen-Yu Lee, Tomas Pfister
We demonstrate our method on various unsupervised AD tasks with image and tabular data.
5 code implementations • ICLR 2022 • David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alex Kurakin
We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one.
2 code implementations • CVPR 2021 • Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, Tomas Pfister
We aim at constructing a high performance model for defect detection that detects unknown anomalous patterns of an image without anomalous data.
Ranked #56 on Anomaly Detection on MVTec AD
1 code implementation • CVPR 2021 • Chen Wei, Kihyuk Sohn, Clayton Mellina, Alan Yuille, Fan Yang
Semi-supervised learning on class-imbalanced data, although a realistic problem, has been understudied.
no code implementations • 24 Nov 2020 • Jihyeon Lee, Joseph Z. Xu, Kihyuk Sohn, Wenhan Lu, David Berthelot, Izzeddin Gur, Pranav Khaitan, Ke-Wei Huang, Kyriacos Koupparis, Bernhard Kowatsch
To respond to disasters such as earthquakes, wildfires, and armed conflicts, humanitarian organizations require accurate and timely data in the form of damage assessments, which indicate what buildings and population centers have been most affected.
1 code implementation • ICLR 2021 • Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
Ranked #7 on Anomaly Detection on One-class CIFAR-100
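The two-stage recipe above (learn self-supervised representations, then fit a shallow one-class classifier on them) can be sketched as follows. The kNN-distance scorer is one common choice of shallow one-class classifier and is an assumption for illustration, not necessarily the paper's exact classifier; features are assumed to come from an already-trained encoder.

```python
import numpy as np


def fit_knn_one_class(train_feats: np.ndarray, k: int = 2):
    """Stage 2: fit a one-class scorer on fixed (stage-1) representations.

    Returns a scoring function: the distance to the k-th nearest training
    feature. A larger score means more anomalous.
    """
    def score(x: np.ndarray) -> float:
        dists = np.linalg.norm(train_feats - x, axis=1)
        return float(np.sort(dists)[k - 1])
    return score
```

Because the scorer is decoupled from representation learning, stage 2 is cheap to refit when new one-class training data arrives.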
3 code implementations • ICLR 2021 • Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee
Contrastive representation learning has shown to be effective to learn representations from unlabeled data.
no code implementations • ECCV 2020 • Aruni RoyChowdhury, Xiang Yu, Kihyuk Sohn, Erik Learned-Miller, Manmohan Chandraker
While deep face recognition has benefited significantly from large-scale labeled data, current research is focused on leveraging unlabeled data to further boost performance, reducing the cost of human annotation.
7 code implementations • 10 May 2020 • Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister
Semi-supervised learning (SSL) has the potential to improve the predictive performance of machine learning models using unlabeled data.
Ranked #13 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
1 code implementation • ICLR 2020 • David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
We improve the recently proposed MixMatch semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.
no code implementations • CVPR 2020 • Yichun Shi, Xiang Yu, Kihyuk Sohn, Manmohan Chandraker, Anil K. Jain
Recognizing wild faces is extremely hard as they appear with all kinds of variations.
27 code implementations • NeurIPS 2020 • Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance.
no code implementations • 22 Nov 2019 • Taihong Xiao, Yi-Hsuan Tsai, Kihyuk Sohn, Manmohan Chandraker, Ming-Hsuan Yang
For instance, there could be a potential privacy risk of machine learning systems via the model inversion attack, whose goal is to reconstruct the input data from the latent representation of deep networks.
3 code implementations • 21 Nov 2019 • David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels.
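A minimal sketch of distribution alignment as described above, assuming softmax predictions and known (or running-average) marginals: each unlabeled prediction is scaled by the ratio of the label marginal to the model's marginal over unlabeled data, then renormalized. The function name and exact normalization are illustrative, not ReMixMatch's verbatim implementation.

```python
import numpy as np


def align_distribution(pred: np.ndarray,
                       model_marginal: np.ndarray,
                       label_marginal: np.ndarray) -> np.ndarray:
    """Align unlabeled predictions with the ground-truth label marginal.

    pred: (batch, classes) softmax outputs on unlabeled data.
    model_marginal: running estimate of the model's class marginal on
        unlabeled data. label_marginal: class marginal of labeled data.
    """
    scaled = pred * (label_marginal / model_marginal)
    return scaled / scaled.sum(axis=-1, keepdims=True)
```

Classes the model over-predicts relative to the label marginal get down-weighted, and under-predicted classes get boosted, pushing the marginal of pseudo-labels toward the true class distribution.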
no code implementations • 5 Jun 2019 • Shuyang Dai, Kihyuk Sohn, Yi-Hsuan Tsai, Lawrence Carin, Manmohan Chandraker
We tackle an unsupervised domain adaptation problem for which the domain discrepancy between labeled source and unlabeled target domains is large, due to many factors of inter- and intra-domain variation.
no code implementations • ICLR 2019 • Kihyuk Sohn, Wenling Shang, Xiang Yu, Manmohan Chandraker
Unsupervised domain adaptation is a promising avenue to enhance the performance of deep neural networks on a target domain, using labels only from a source domain.
no code implementations • ICLR 2019 • Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker
To this end, we propose to learn discriminative feature representations of patches based on label histograms in the source domain, through the construction of a disentangled space.
no code implementations • 16 Apr 2019 • Jong-Chyi Su, Yi-Hsuan Tsai, Kihyuk Sohn, Buyu Liu, Subhransu Maji, Manmohan Chandraker
Our approach, active adversarial domain adaptation (AADA), explores a duality between two related problems: adversarial domain alignment and importance sampling for adapting models across domains.
8 code implementations • ICCV 2019 • Yi-Hsuan Tsai, Kihyuk Sohn, Samuel Schulter, Manmohan Chandraker
Predicting structured outputs such as semantic segmentation relies on expensive per-pixel annotations to learn supervised models like convolutional neural networks.
Ranked #22 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
no code implementations • 27 Sep 2018 • Nicholas Rhinehart, Anqi Liu, Kihyuk Sohn, Paul Vernaza
We propose a novel approach to regularizing generative adversarial networks (GANs) leveraging learned structured Gibbs distributions.
no code implementations • 23 Mar 2018 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
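The center-based transfer idea above can be sketched in a few lines. This is an illustrative simplification under assumed names: it takes the intra-class variation of a regular subject (its features minus its class center) and re-centers that variation on an under-represented subject's center, enlarging the latter's feature space.

```python
import numpy as np


def transfer_variation(regular_feats: np.ndarray,
                       regular_center: np.ndarray,
                       target_center: np.ndarray) -> np.ndarray:
    """Re-center a regular subject's intra-class variation on an
    under-represented subject's center, producing augmented features."""
    return target_center + (regular_feats - regular_center)
```

The augmented features preserve the pairwise structure of the source variation while being centered on the under-represented identity.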
12 code implementations • CVPR 2018 • Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker
In this paper, we propose an adversarial learning method for domain adaptation in the context of semantic segmentation.
Ranked #3 on Domain Adaptation on Synscapes-to-Cityscapes
1 code implementation • CVPR 2019 • Luan Tran, Kihyuk Sohn, Xiang Yu, Xiaoming Liu, Manmohan Chandraker
Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels.
no code implementations • ICCV 2017 • Kihyuk Sohn, Sifei Liu, Guangyu Zhong, Xiang Yu, Ming-Hsuan Yang, Manmohan Chandraker
Despite rapid advances in face recognition, there remains a clear gap between the performance of still image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the domains and the difficulty of curating diverse large-scale video datasets.
no code implementations • 12 Jun 2017 • Wenling Shang, Kihyuk Sohn, Yuandong Tian
Despite recent successes in synthesizing faces and bedrooms, existing generative models struggle to capture more complex image types, potentially due to the oversimplification of their latent space constructions.
no code implementations • ICCV 2017 • Xi Yin, Xiang Yu, Kihyuk Sohn, Xiaoming Liu, Manmohan Chandraker
Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments.
no code implementations • ICCV 2017 • Xi Peng, Xiang Yu, Kihyuk Sohn, Dimitris Metaxas, Manmohan Chandraker
Finally, we propose a new feature reconstruction metric learning to explicitly disentangle identity and pose, by demanding alignment between the feature reconstructions through various combinations of identity and pose features, which are obtained from two images of the same subject.
1 code implementation • NeurIPS 2016 • Kihyuk Sohn
Deep metric learning has gained much popularity in recent years, following the success of deep learning.
2 code implementations • 16 Mar 2016 • Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee
Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision.
1 code implementation • 2 Dec 2015 • Xinchen Yan, Jimei Yang, Kihyuk Sohn, Honglak Lee
This paper investigates a novel problem of generating images from visual attributes.
1 code implementation • NeurIPS 2015 • Kihyuk Sohn, Honglak Lee, Xinchen Yan
The model is trained efficiently in the framework of stochastic gradient variational Bayes, and allows a fast prediction using stochastic feed-forward inference.
Ranked #1 on Structured Prediction on MNIST
no code implementations • CVPR 2015 • Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee
Object detection systems based on the deep convolutional neural network (CNN) have recently made groundbreaking advances on several object detection benchmarks.
no code implementations • NeurIPS 2014 • Kihyuk Sohn, Wenling Shang, Honglak Lee
Deep learning has been successfully applied to multimodal representation learning problems, with a common strategy being to learn joint representations shared across multiple modalities on top of layers of modality-specific networks.
no code implementations • CVPR 2013 • Andrew Kae, Kihyuk Sohn, Honglak Lee, Erik Learned-Miller
Although the CRF is a good baseline labeler, we show how an RBM can be added to the architecture to provide a global shape bias that complements the local modeling provided by the CRF.