1 code implementation • 30 May 2024 • Feng Chen, Zhen Yang, Bohan Zhuang, Qi Wu
We present a novel task called online video editing, which is designed to edit \textbf{streaming} frames while maintaining temporal consistency.
no code implementations • 23 May 2024 • Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang
In terms of efficiency, ZipCache also showcases a $37. 3\%$ reduction in prefill-phase latency, a $56. 9\%$ reduction in decoding-phase latency, and a $19. 8\%$ reduction in GPU memory usage when evaluating LLaMA3-8B model with a input length of $4096$.
no code implementations • 23 May 2024 • Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference.
1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.
1 code implementation • 21 Mar 2024 • Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images.
Ranked #1 on Generalizable Novel View Synthesis on ACID
1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging.
1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.
1 code implementation • 12 Jan 2024 • Xinyu Wang, Bohan Zhuang, Qi Wu
This alignment process, which synchronizes a language model trained on textual data with encoders and decoders trained on multi-modal data, often necessitates extensive training of several projection layers in multiple stages.
1 code implementation • 29 Nov 2023 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang
In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.
1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang
By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.
1 code implementation • NeurIPS 2023 • Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan
Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images.
no code implementations • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen
Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.
2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.
1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
no code implementations • 14 Sep 2023 • Xinyu Wang, Bohan Zhuang, Qi Wu
To bridge this gap, we propose a novel approach, \methodname, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one.
1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.
no code implementations • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang
This is due to their utilization of unstructured pruning on LPMs, impeding the merging of LoRA weights, or their dependence on the gradients of pre-trained weights to guide pruning, which can impose significant memory overhead.
1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang
Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority ones to ease storage burden and optimization difficulty.
2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
As each model family consists of pretrained models with diverse scales (e. g., DeiT-Ti/S/B), it naturally arises a fundamental question of how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime.
no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.
no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
no code implementations • 14 Nov 2022 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang
To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.
no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang
During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.
no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang
The task of action detection aims at deducing both the action category and localization of the start and end moment for each action instance in a long, untrimmed video.
5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.
Ranked #280 on Image Classification on ImageNet
2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang
In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.
Ranked #21 on Semantic Segmentation on ADE20K
1 code implementation • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang
First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.
4 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang
However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.
Ranked #196 on Image Classification on CIFAR-100
3 code implementations • 23 Nov 2021 • Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, DaCheng Tao, Bohan Zhuang
Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers.
Ranked #18 on Efficient ViTs on ImageNet-1K (with DeiT-T)
3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang
While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.
no code implementations • 3 Aug 2021 • Jing Liu, Bohan Zhuang, Mingkui Tan, Xu Liu, Dinh Phung, Yuanqing Li, Jianfei Cai
More critically, EAS is able to find compact architectures within 0. 1 second for 50 deployment scenarios.
2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.
1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao
To address three main challenges in OSHP, i. e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).
2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.
Ranked #22 on Efficient ViTs on ImageNet-1K (with DeiT-T)
no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan
By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.
1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, realtime and energy-efficient image Super-Resolution (SR) inference methods.
no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen
Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.
1 code implementation • ICLR 2020 • Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong
To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated.
3 code implementations • ECCV 2020 • Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, JieZhang Cao, Chuangrun Liang, Mingkui Tan
More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method.
Ranked #2 on Data Free Quantization on CIFAR-100
no code implementations • 7 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond
Instantaneous and on demand accuracy-efficiency trade-off has been recently explored in the context of neural networks slimming.
no code implementations • 3 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond
In particular, for ResNet-18 on ImageNet, we prune 26. 12% of the model size with Binarized Neural Network quantization, achieving a top-1 classification accuracy of 47. 32% in a model of 2. 47 MB and 59. 30% with a 2-bit DoReFa-Net in 4. 36 MB.
1 code implementation • 4 Jan 2020 • Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan
In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid
Experiments on both classification, semantic segmentation and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.
no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin
The most current methods can be categorized as either: (i) hard parameter sharing where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing where all parameters are task-specific but they are jointly regularized.
no code implementations • 21 Aug 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo
Binarized convolutional neural networks (BCNNs) are widely used to improve memory and computation efficiency of deep convolutional neural networks (DCNNs) for mobile and AI chips based applications.
no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen
Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.
no code implementations • CVPR 2020 • Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid
In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradient through a low-precision network due to the non-differentiable quantization function.
no code implementations • CVPR 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically} for mobile devices with limited power capacity and computation resources.
1 code implementation • NeurIPS 2018 • Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
Channel pruning is one of the predominant approaches for deep model compression.
no code implementations • 8 Aug 2018 • Bohan Zhuang, Chunhua Shen, Ian Reid
In this paper, we propose to train a network with binary weights and low-bitwidth activations, designed especially for mobile devices with limited power consumption.
no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable length natural expression descriptions, from short phrases query to long multi-round dialogs.
2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.
1 code implementation • ICCV 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
The proposed method still builds one classifier for one interaction (as per type (ii) above), but the classifier built is adaptive to context via weights which are context dependent.
no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen
To our knowledge, this is the first time that a plant-related counting problem is considered using computer vision technologies under unconstrained field-based environment.
no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than the previous released datasets.
Human-Object Interaction Detection Relationship Detection +1
no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
Recognizing how objects interact with each other is a crucial task in visual recognition.
no code implementations • CVPR 2017 • Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid
Large-scale datasets have driven the rapid development of deep neural networks for visual recognition.
no code implementations • CVPR 2017 • Yao Li, Guosheng Lin, Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
In this work, we propose to model the relational information between people as a sequence prediction task.
no code implementations • 27 Jul 2016 • Bohan Zhuang, Lijun Wang, Huchuan Lu
In the discriminative model, we exploit the advances of deep learning architectures to learn generic features which are robust to both background clutters and foreground appearance variations.
no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid
To solve the first stage, we design a large-scale high-order binary codes inference algorithm to reduce the high-order objective to a standard binary quadratic problem such that graph cuts can be used to efficiently infer the binary code which serve as the label of each training datum.