Search Results for author: Vighnesh Birodkar

Found 14 papers, 6 papers with code

VideoPoet: A Large Language Model for Zero-Shot Video Generation

no code implementations • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.

Ranked #4 on Text-to-Video Generation on MSR-VTT

Decoder • Language Modelling • +3


Text and Click inputs for unambiguous open vocabulary instance segmentation

1 code implementation • 24 Nov 2023 • Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar

We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment.

Instance Segmentation • Segmentation • +1


Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

no code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.

Ranked #2 on Video Prediction on Kinetics-600 12 frames, 64x64

Action Recognition • Image Generation • +4


Scaling Vision Transformers to 22 Billion Parameters

1 code implementation • 10 Feb 2023 • Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby

The scaling of Transformers has driven breakthrough capabilities for language models.

Ranked #1 on Zero-Shot Transfer Image Classification on ObjectNet

Action Classification • Fairness • +3



Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

no code implementations • 20 Dec 2022 • Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross

Detecting actions in untrimmed videos should not be limited to a small, closed set of classes.

Action Detection • Optical Flow Estimation


Proper Reuse of Image Classification Features Improves Object Detection

1 code implementation • CVPR 2022 • Cristina Vasconcelos, Vighnesh Birodkar, Vincent Dumoulin

A common practice in transfer learning is to initialize the downstream model weights by pre-training on a data-abundant upstream task.

Classification • Image Classification • +4



Less is More: Generating Grounded Navigation Instructions from Landmarks

no code implementations • CVPR 2022 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson

We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.

Decoder • Instruction Following • +1


The iWildCam 2021 Competition Dataset

no code implementations • 7 May 2021 • Sara Beery, Arushi Agarwal, Elijah Cole, Vighnesh Birodkar

The challenge is to classify species and count individual animals across sequences in the test cameras.

Object Detection


The surprising impact of mask-head architecture on novel class segmentation

3 code implementations • ICCV 2021 • Vighnesh Birodkar, Zhichao Lu, Siyang Li, Vivek Rathod, Jonathan Huang

Under this family, we study Mask R-CNN and discover that instead of its default strategy of training the mask-head with a combination of proposals and groundtruth boxes, training the mask-head with only groundtruth boxes dramatically improves its performance on novel classes.

Instance Segmentation • Segmentation • +1



A Closed-Form Learned Pooling for Deep Classification Networks

no code implementations • 10 Jun 2019 • Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio

This operator can learn a strict super-set of what can be learned by average pooling or convolutions.

Classification • Foveation • +2


Straight to the point: reinforcement learning for user guidance in ultrasound

no code implementations • 2 Mar 2019 • Fausto Milletari, Vighnesh Birodkar, Michal Sofka

Point-of-care ultrasound (POCUS) is the use of ultrasound imaging in critical or emergency situations to support clinical decisions by healthcare professionals and first responders.

Anatomy • reinforcement-learning • +1


Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

no code implementations • 29 Jan 2019 • Vighnesh Birodkar, Hossein Mobahi, Samy Bengio

Large datasets have been crucial to the recent success of deep learning models, which keep performing better as they are trained with more labelled data.

General Classification • Image Classification • +1


Unsupervised Learning of Disentangled Representations from Video

2 code implementations • NeurIPS 2017 • Emily Denton, Vighnesh Birodkar

We present a new model DrNET that learns disentangled image representations from video.



A convolutional approach to reflection symmetry

1 code implementation • 17 Sep 2016 • Marcelo Cicconet, Vighnesh Birodkar, Mads Lund, Michael Werman, Davi Geiger

We present a convolutional approach to reflection symmetry detection in 2D.

Symmetry Detection

