Search Results for author: Ruslan Svirschevski

Found 3 papers, 3 papers with code

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

1 code implementation • 4 Jun 2024 • Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

We propose SpecExec (Speculative Execution), a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.

Quantization
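
The SpecExec entry above describes generating up to 20 draft tokens per target model iteration. The toy Python sketch below illustrates only the generic speculative decoding loop such methods build on (a cheap draft model proposes a block of tokens, the target model verifies them in a single pass); the function names, vocabulary, and acceptance rule are hypothetical stand-ins, not the paper's implementation.

import random

VOCAB = list(range(100))

def draft_model(prefix, k=8):
    # Hypothetical cheap draft model: proposes k candidate tokens.
    return [random.choice(VOCAB) for _ in range(k)]

def target_model_verify(prefix, candidates):
    # Hypothetical target model pass: accepts a prefix of the candidates
    # and appends one token of its own, all in one "iteration".
    accepted = []
    for tok in candidates:
        if random.random() < 0.7:  # stand-in for a real acceptance rule
            accepted.append(tok)
        else:
            break
    accepted.append(random.choice(VOCAB))  # token sampled by the target model
    return accepted

def generate(prompt, max_new_tokens=32):
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        candidates = draft_model(out)
        out.extend(target_model_verify(out, candidates))
    return out[:len(prompt) + max_new_tokens]

print(generate([1, 2, 3]))
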

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.
