no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #4 on Text-to-Video Generation on MSR-VTT
1 code implementation • 24 Nov 2023 • Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar
We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.
no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64
1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
The scaling of Transformers has driven breakthrough capabilities for language models.
Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet
no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross
Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.
1 code implementation • CVPR 2022 • Cristina Vasconcelos, Vighnesh Birodkar, Vincent Dumoulin
A common practice in transfer learning is to initialize the downstream model weights by pre-training on a data-abundant upstream task.
no code implementations • CVPR 2022 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
no code implementations • 7 May 2021 • Sara Beery, Arushi Agarwal, Elijah Cole, Vighnesh Birodkar
The challenge is to classify species and count individual animals across sequences in the test cameras.
3 code implementations • ICCV 2021 • Vighnesh Birodkar, Zhichao Lu, Siyang Li, Vivek Rathod, Jonathan Huang
Under this family, we study Mask R-CNN and discover that instead of its default strategy of training the mask-head with a combination of proposals and groundtruth boxes, training the mask-head with only groundtruth boxes dramatically improves its performance on novel classes.
no code implementations • 10 Jun 2019 • Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio
This operator can learn a strict super-set of what can be learned by average pooling or convolutions.
no code implementations • 2 Mar 2019 • Fausto Milletari, Vighnesh Birodkar, Michal Sofka
Point-of-care ultrasound (POCUS) is the use of ultrasound imaging in critical or emergency situations to support clinical decisions by healthcare professionals and first responders.
no code implementations • 29 Jan 2019 • Vighnesh Birodkar, Hossein Mobahi, Samy Bengio
Large datasets have been crucial to the recent success of deep learning models, which keep performing better as they are trained with more labelled data.
2 code implementations • NeurIPS 2017 • Emily Denton, Vighnesh Birodkar
We present a new model, DrNET, that learns disentangled image representations from video.
1 code implementation • 17 Sep 2016 • Marcelo Cicconet, Vighnesh Birodkar, Mads Lund, Michael Werman, Davi Geiger
We present a convolutional approach to reflection symmetry detection in 2D.