Search Results for author: Ruslan Svirschevski

Found 3 papers, 3 papers with code

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

1 code implementation • 4 Jun 2024 • Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

We propose SpecExec (Speculative Execution), a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.

Quantization
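
The SpecExec entry above describes generating up to 20 draft tokens per target model iteration. The toy Python sketch below illustrates only the generic speculative decoding loop such methods build on (a cheap draft model proposes a block of tokens, the target model verifies them in a single pass); the function names, vocabulary, and acceptance rule are hypothetical stand-ins, not the paper's implementation.

import random

VOCAB = list(range(100))

def draft_model(prefix, k=8):
    # Hypothetical cheap draft model: proposes k candidate tokens.
    return [random.choice(VOCAB) for _ in range(k)]

def target_model_verify(prefix, candidates):
    # Hypothetical target model pass: accepts a prefix of the candidates
    # and appends one token of its own, all in one "iteration".
    accepted = []
    for tok in candidates:
        if random.random() < 0.7:  # stand-in for a real acceptance rule
            accepted.append(tok)
        else:
            break
    accepted.append(random.choice(VOCAB))  # token sampled by the target model
    return accepted

def generate(prompt, max_new_tokens=32):
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        candidates = draft_model(out)
        out.extend(target_model_verify(out, candidates))
    return out[:len(prompt) + max_new_tokens]

print(generate([1, 2, 3]))
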

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.
