Speed and Conversational Large Language Models: Not All Is About Tokens per Second

23 Feb 2025  ·  Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi ·

The speed of open-weights large language models (LLMs) and its dependency on the task at hand, when run on GPUs, is studied to present a comparative analysis of the speed of the most popular open LLMs.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods