23 Apr 2024 • Chen Zhang, Zhuorui Liu, Dawei Song
The bottleneck mainly stems from the autoregressive nature of LLMs: tokens can only be generated sequentially during decoding, since each new token is conditioned on all previously generated ones.