1 code implementation • 8 Apr 2024 • Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim
However, existing LLM-based large multimodal models (e. g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.
Ranked #1 on Video Classification on COIN
1 code implementation • 4 Dec 2023 • Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim
We present an approach to pose object recognition as next token prediction.
1 code implementation • 24 Nov 2023 • Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang
In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target.
no code implementations • 22 May 2023 • Wujian Peng, Zejia Weng, Hengduo Li, Zuxuan Wu
Exploring a substantial amount of unlabeled data, semi-supervised learning (SSL) boosts the recognition performance when only a limited number of labels are provided.
1 code implementation • 30 Sep 2022 • Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang
In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
no code implementations • CVPR 2022 • Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim
To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.
1 code implementation • 23 Nov 2021 • Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang
Video transformers have achieved impressive results on major video recognition benchmarks, which however suffer from high computational cost.
no code implementations • 1 Jun 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis
In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels.
Ranked #8 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang
Conventional video recognition pipelines typically fuse multimodal features for improved performance.
no code implementations • CVPR 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis
Then, only frames and convolutions that are selected by the selection network are used in the 3D model to generate predictions.
Ranked #11 on Action Recognition on ActivityNet
no code implementations • 22 Feb 2020 • Chen Zhu, Renkun Ni, Ping-Yeh Chiang, Hengduo Li, Furong Huang, Tom Goldstein
Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness.
1 code implementation • CVPR 2020 • Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis
State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects.
no code implementations • 25 Sep 2019 • Chen Zhu, Renkun Ni, Ping-Yeh Chiang, Hengduo Li, Furong Huang, Tom Goldstein
Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness.
1 code implementation • 15 May 2019 • Chen Zhu, W. Ronny Huang, Ali Shafahi, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein
Clean-label poisoning attacks inject innocuous looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data.
no code implementations • 11 Apr 2019 • Hengduo Li, Bharat Singh, Mahyar Najibi, Zuxuan Wu, Larry S. Davis
We analyze how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, Flowers-102 etc.
2 code implementations • CVPR 2018 • Bharat Singh, Hengduo Li, Abhishek Sharma, Larry S. Davis
Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization.
no code implementations • 3 Nov 2017 • Hengduo Li, Jun Liu, Guyue Zhang, Yuan Gao, Yirui Wu
In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to promote the human detection performance.