no code implementations • 26 May 2024 • Jiayi Yao, Hanchen Li, YuHan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
To speed up the prefill of long LLM inputs, one can pre-compute the KV cache of a text and reuse it whenever that context reappears as the prefix of another LLM input.
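As an illustration of that prefix-reuse pattern (a generic sketch, not the paper's own system), here is a minimal example using the Hugging Face transformers API; the model choice and texts are placeholders:

```python
# Minimal sketch of prefix KV-cache reuse with Hugging Face transformers.
# The model and strings are illustrative, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

shared_prefix = "A long shared document that many requests start with..."
prefix_ids = tok(shared_prefix, return_tensors="pt").input_ids

# Pre-compute the KV cache for the shared prefix once.
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

# A later request with the same prefix only needs to prefill its suffix.
suffix_ids = tok(" Question: summarize the document.", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(suffix_ids, past_key_values=prefix_cache, use_cache=True)
# out.logits covers only the suffix tokens; the prefix was never recomputed.
```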
1 code implementation • 24 Oct 2023 • Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang
Nonetheless, prompting LLMs with compiler source-code information remains a missing piece of research in compiler testing.
1 code implementation • 11 Oct 2023 • YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
Compared to recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5-4.3x and the total delay in fetching and processing contexts by 3.2-3.7x, with negligible impact on LLM response quality as measured by accuracy or perplexity.
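CacheGen's actual encoder is more elaborate than plain quantization, but a naive per-tensor int8 sketch (shapes illustrative) shows the basic idea of trading cache precision for size:

```python
# NOT CacheGen's codec: a naive per-tensor int8 quantization sketch, only to
# illustrate shrinking a KV cache by lowering numerical precision.
import torch

def quantize_kv(t: torch.Tensor):
    scale = t.abs().max() / 127.0
    q = torch.clamp((t / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Illustrative shape: (layers, key/value, heads, tokens, head_dim).
kv = torch.randn(12, 2, 16, 1024, 64)
q, s = quantize_kv(kv)
print(kv.element_size() / q.element_size())  # 4.0: float32 -> int8
```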
1 code implementation • 29 Oct 2022 • Jiayi Yao, Ping Li, Xiatao Kang, Yuzhe Wang
First, we train a sparse model with a group-lasso (GL) penalty and impose an angle-dissimilarity constraint on the channels and filters of the convolutional network to obtain a sparser structure.
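A hypothetical PyTorch rendering of those two regularizers (a group penalty over filters plus a pairwise angle-dissimilarity term); the exact grouping and weights in the paper may differ:

```python
# Hypothetical sketch of the two regularizers described above; the weighting
# factors and exact constraint form are assumptions, not from the paper.
import torch
import torch.nn.functional as F

def group_lasso(conv_weight: torch.Tensor) -> torch.Tensor:
    # conv_weight: (out_channels, in_channels, kH, kW); one group per filter.
    return conv_weight.flatten(1).norm(dim=1).sum()

def angle_dissimilarity(conv_weight: torch.Tensor) -> torch.Tensor:
    # Penalize pairwise cosine similarity so surviving filters point in
    # dissimilar directions.
    w = F.normalize(conv_weight.flatten(1), dim=1)
    cos = w @ w.t()
    n = cos.size(0)
    off_diag = cos - torch.eye(n, device=cos.device)
    return off_diag.abs().sum() / (n * (n - 1))

# Added to the task loss during training, e.g.:
# loss = ce_loss + 1e-4 * group_lasso(w) + 1e-3 * angle_dissimilarity(w)
```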
1 code implementation • 27 Sep 2022 • Xiatao Kang, Ping Li, Jiayi Yao, Chengxi Li
Pruning neural networks before training not only compresses the original models but also accelerates the training phase, which has substantial practical value.
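As a sketch of pruning before training, assuming a simple magnitude criterion at initialization (the paper's actual criterion may differ):

```python
# Hypothetical sketch of pruning at initialization: build binary masks before
# training and keep pruned weights at zero. Magnitude is one simple criterion;
# it is an assumption here, not necessarily the paper's.
import torch
import torch.nn as nn

def prune_at_init(model: nn.Module, sparsity: float = 0.9) -> dict:
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices/filters, skip biases
            k = max(1, int(p.numel() * sparsity))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()
            p.data.mul_(masks[name])  # zero out pruned weights
    return masks

# During training, re-apply the masks after each optimizer step so pruned
# weights stay at zero:
# for name, p in model.named_parameters():
#     if name in masks:
#         p.data.mul_(masks[name])
```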