Search Results for author: Hengduo Li

Found 17 papers, 8 papers with code

MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

1 code implementation • 8 Apr 2024 • Bo He, Hengduo Li, Young Kyun Jang, Menglin Jia, Xuefei Cao, Ashish Shah, Abhinav Shrivastava, Ser-Nam Lim

However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoChat) can only take in a limited number of frames for short video understanding.

Ranked #1 on Video Classification on COIN

Question Answering • Video Captioning • +4

Object Recognition as Next Token Prediction

1 code implementation • 4 Dec 2023 • Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim

We present an approach to pose object recognition as next token prediction.

Decoder • Language Modelling • +2

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

1 code implementation • 24 Nov 2023 • Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target.

Meta-Learning • One-Shot Segmentation • +3

BMB: Balanced Memory Bank for Imbalanced Semi-supervised Learning

no code implementations • 22 May 2023 • Wujian Peng, Zejia Weng, Hengduo Li, Zuxuan Wu

Exploring a substantial amount of unlabeled data, semi-supervised learning (SSL) boosts the recognition performance when only a limited number of labels are provided.

Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors

1 code implementation • 30 Sep 2022 • Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang

In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.

3D Reconstruction • Object Reconstruction • +2

AdaViT: Adaptive Vision Transformers for Efficient Image Recognition

no code implementations • CVPR 2022 • Lingchen Meng, Hengduo Li, Bor-Chun Chen, Shiyi Lan, Zuxuan Wu, Yu-Gang Jiang, Ser-Nam Lim

To this end, we introduce AdaViT, an adaptive computation framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use throughout the backbone on a per-input basis, aiming to improve inference efficiency of vision transformers with a minimal drop of accuracy for image recognition.

Efficient Video Transformers with Spatial-Temporal Token Selection

1 code implementation • 23 Nov 2021 • Junke Wang, Xitong Yang, Hengduo Li, Li Liu, Zuxuan Wu, Yu-Gang Jiang

Video transformers have achieved impressive results on major video recognition benchmarks, but they suffer from high computational cost.

Video Recognition

Rethinking Pseudo Labels for Semi-Supervised Object Detection

no code implementations • 1 Jun 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

In this paper, we introduce certainty-aware pseudo labels tailored for object detection, which can effectively estimate the classification and localization quality of derived pseudo labels.

Ranked #8 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Classification • Image Classification • +4

HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition

no code implementations • 20 Apr 2021 • Zejia Weng, Zuxuan Wu, Hengduo Li, Jingjing Chen, Yu-Gang Jiang

Conventional video recognition pipelines typically fuse multimodal features for improved performance.

Video Recognition

2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition

no code implementations • CVPR 2021 • Hengduo Li, Zuxuan Wu, Abhinav Shrivastava, Larry S. Davis

Then, only frames and convolutions that are selected by the selection network are used in the 3D model to generate predictions.

Ranked #11 on Action Recognition on ActivityNet

Action Recognition • Policy Gradient Methods • +1

Improving the Tightness of Convex Relaxation Bounds for Training Certifiably Robust Classifiers

no code implementations • 22 Feb 2020 • Chen Zhu, Renkun Ni, Ping-Yeh Chiang, Hengduo Li, Furong Huang, Tom Goldstein

Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness.

Learning from Noisy Anchors for One-stage Object Detection

1 code implementation • CVPR 2020 • Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis

State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects.

Classification • General Classification • +3
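
The IoU-based split of anchors into positive and negative samples mentioned in the summary above can be sketched as follows. This is a minimal illustration only, not the paper's method: the function names `iou` and `label_anchors`, the `(x1, y1, x2, y2)` box layout, and the 0.5/0.4 thresholds are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(
    anchors: List[Box],
    gt_boxes: List[Box],
    pos_thresh: float = 0.5,  # hypothetical threshold
    neg_thresh: float = 0.4,  # hypothetical threshold
) -> List[int]:
    """Label each anchor 1 (positive), 0 (negative), or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        # Best overlap with any ground-truth box; 0.0 if there are none.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    anchors = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    print(label_anchors(anchors, gt))
```

Anchors whose best overlap falls between the two thresholds are marked as ignored in this sketch, which is one common convention for hard thresholding; other assignment schemes exist.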

Improved Training of Certifiably Robust Models

no code implementations • 25 Sep 2019 • Chen Zhu, Renkun Ni, Ping-Yeh Chiang, Hengduo Li, Furong Huang, Tom Goldstein

Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical (PGD) robustness.

Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

1 code implementation • 15 May 2019 • Chen Zhu, W. Ronny Huang, Ali Shafahi, Hengduo Li, Gavin Taylor, Christoph Studer, Tom Goldstein

Clean-label poisoning attacks inject innocuous-looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data.

Transfer Learning

An Analysis of Pre-Training on Object Detection

no code implementations • 11 Apr 2019 • Hengduo Li, Bharat Singh, Mahyar Najibi, Zuxuan Wu, Larry S. Davis

We analyze how well their features generalize to tasks like image classification, semantic segmentation and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, Flowers-102, etc.

Avg Classification +6

R-FCN-3000 at 30fps: Decoupling Detection and Classification

2 code implementations • CVPR 2018 • Bharat Singh, Hengduo Li, Abhishek Sharma, Larry S. Davis

Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization.

Classification • General Classification • +2

Multi-Glimpse LSTM with Color-Depth Feature Fusion for Human Detection

no code implementations • 3 Nov 2017 • Hengduo Li, Jun Liu, Guyue Zhang, Yuan Gao, Yirui Wu

In this paper, we propose a new Multi-Glimpse LSTM (MG-LSTM) network, in which multi-scale contextual information is sequentially integrated to promote the human detection performance.

Human Detection
