1 code implementation • 29 Apr 2024 • Davis Wertheimer, Joshua Rosenkranz, Thomas Parnell, Sahil Suneja, Pavithra Ranganathan, Raghu Ganti, Mudhakar Srivatsa
This technical report describes the design and training of novel draft models for speculative decoding, intended to accelerate large language model inference in a production environment.