1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases. Medusa-1: Medusa heads are fine-tuned directly on top of a frozen backbone LLM, enabling lossless inference acceleration.
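A minimal numpy sketch of the Medusa-1 idea, under assumed shapes and a simplified argmax proposal (the real system generates tree-structured candidates that the backbone then verifies): K extra heads read the frozen backbone's last hidden state and each proposes a token several steps ahead.

```python
import numpy as np

# Hypothetical sizes; the real heads sit on a transformer backbone.
vocab, d, K = 1000, 16, 3  # K speculative Medusa heads

rng = np.random.default_rng(0)
h = rng.standard_normal(d)  # last hidden state from the *frozen* backbone
heads = [rng.standard_normal((vocab, d)) for _ in range(K)]  # trainable heads

# Head k proposes a candidate for the token k+1 steps ahead; because the
# frozen backbone verifies every candidate before acceptance, the final
# output distribution is unchanged (hence "lossless" acceleration).
candidates = [int(np.argmax(W @ h)) for W in heads]
```

The backbone itself never changes under Medusa-1; only the small head matrices are trained.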
1 code implementation • NeurIPS 2023 • Zhengyang Geng, Ashwini Pokle, J. Zico Kolter
We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores while striking a critical balance between computational cost and image quality.
1 code implementation • 28 Oct 2023 • Zhengyang Geng, J. Zico Kolter
Deep Equilibrium (DEQ) Models, an emerging class of implicit models that map inputs to fixed points of neural networks, are of growing interest in the deep learning community.
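The fixed-point view can be made concrete with a small numpy sketch: instead of stacking layers, a DEQ solves $z^* = f(z^*, x)$ for a single layer $f$. This uses naive fixed-point iteration on a contractive toy layer; practical DEQ implementations use faster root-finding solvers (e.g. Broyden's method or Anderson acceleration).

```python
import numpy as np

def deq_forward(f, x, z0, tol=1e-6, max_iter=200):
    """Find z* with z* = f(z*, x) by plain fixed-point iteration."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))  # small scale keeps the map contractive
U = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

# A toy "layer": z = tanh(W z + U x + b)
f = lambda z, x: np.tanh(W @ z + U @ x + b)

x = rng.standard_normal(3)
z_star = deq_forward(f, x, np.zeros(4))
```

At the equilibrium, applying the layer again leaves `z_star` (approximately) unchanged, which is what makes the model "infinitely deep" at constant memory.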
1 code implementation • 23 Oct 2022 • Ashwini Pokle, Zhengyang Geng, Zico Kolter
In this paper, we look at diffusion models from a different perspective: that of a (deep) equilibrium (DEQ) fixed-point model.
1 code implementation • 13 Jul 2022 • Zekun Li, Zhengyang Geng, Zhao Kang, Wenyu Chen, Yibo Yang
To understand the instability in training, we examine the gradient flow through attention and observe gradient conflicts among the attention branches.
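As a hedged illustration (not the paper's exact diagnostic), "gradient conflict" between two branches is commonly read off the sign of the inner product of their gradients with respect to shared parameters: a negative inner product means the two updates pull the parameters in opposing directions.

```python
import numpy as np

# Hypothetical gradients of two attention branches w.r.t. shared weights.
g_branch_a = np.array([1.0, -0.5, 0.2])
g_branch_b = np.array([-0.8, 0.6, 0.1])

# Negative inner product => conflicting update directions.
conflict = float(g_branch_a @ g_branch_b) < 0
```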
1 code implementation • CVPR 2022 • Shaojie Bai, Zhengyang Geng, Yash Savani, J. Zico Kolter
Many recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms by encouraging iterative refinements toward a stable flow estimation.
Ranked #1 on Optical Flow Estimation on KITTI 2015 (train)
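The finite-step recurrent refinement that these flow models use can be sketched in a few lines (a toy, per-pixel version with a hypothetical update rule; real models compute the residual from learned correlation features): a fixed, hand-picked number of steps, each adding a correction to the current flow estimate. The DEQ view replaces this fixed unrolling with a direct solve for the stable point.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal(8)        # stand-in for correlation features
W = 0.1 * rng.standard_normal((2, 8))
flow = np.zeros(2)                    # 2-D flow vector for a single pixel

for _ in range(12):                   # finite, hand-picked step count
    delta = W @ feats - 0.5 * flow    # hypothetical residual update
    flow = flow + delta               # iterative refinement toward a stable flow
```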
1 code implementation • NeurIPS 2021 • Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin
This paper focuses on training implicit models of infinite layers.
no code implementations • NeurIPS 2021 • Yifei Wang, Zhengyang Geng, Feng Jiang, Chuming Li, Yisen Wang, Jiansheng Yang, Zhouchen Lin
Multi-view methods learn representations by aligning multiple views of the same image and their performance largely depends on the choice of data augmentation.
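A minimal sketch of the alignment objective behind many multi-view methods (an assumed, generic formulation, not this paper's specific method): embeddings of two augmented views of the same image are pulled together, here by penalizing low cosine similarity.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
z1 = rng.standard_normal(16)               # embedding of view 1
z2 = z1 + 0.05 * rng.standard_normal(16)   # embedding of a mild augmentation

alignment_loss = 1.0 - cosine(z1, z2)      # small when the views agree
```

Because performance hinges on how the views are generated, the choice of data augmentation effectively defines what the representation becomes invariant to.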
3 code implementations • ICLR 2021 • Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin
As an essential ingredient of modern deep learning, the attention mechanism, and self-attention in particular, plays a vital role in discovering global correlations.
Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 test
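The "global correlation" role of self-attention is easiest to see in a minimal single-head implementation (a generic textbook sketch, not this paper's attention variant): every position attends to every other position, so each output row mixes information from the whole input.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over an (n, d) sequence of token features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n) pairwise correlations
    # Row-wise softmax: each row is a distribution over all n positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # globally mixed outputs

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```

The (n, n) score matrix is what gives self-attention its global receptive field, and also its quadratic cost in sequence length.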