Search Results for author: Sungsoo Ha

BASS: Batched Attention-optimized Speculative Sampling

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models.

We tested the proposed method with two clinical datasets that were both obtained during spine surgery.
