Search Results for author: Szymon Antoniak

Found 5 papers, 4 papers with code

Scaling Laws for Fine-Grained Mixture of Experts

1 code implementation • 12 Feb 2024 • Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur

Our findings not only show that MoE models consistently outperform dense Transformers but also highlight that the efficiency gap between dense and MoE models widens as we scale up the model size and training budget.

Mixtral of Experts

3 code implementations • 8 Jan 2024 • Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.

Ranked #9 on Question Answering on PIQA

Tasks: Code Generation, Common Sense Reasoning (+4 more)

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

1 code implementation • 8 Jan 2024 • Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Michał Krutul, Jakub Krajewski, Szymon Antoniak, Piotr Miłoś, Marek Cygan, Sebastian Jaszczur

State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers.

Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation

1 code implementation • 24 Oct 2023 • Szymon Antoniak, Sebastian Jaszczur, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Tomasz Odrzygóźdź, Marek Cygan

The operation of matching experts and tokens is discrete, which makes MoE models prone to issues like training instability and uneven expert utilization.

Tasks: Language Modelling, Large Language Model

Magnushammer: A Transformer-Based Approach to Premise Selection

no code implementations • 8 Mar 2023 • Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from 57.0% to 71.0% on the PISA benchmark using 4x fewer parameters.

Tasks: Automated Theorem Proving, Language Modelling (+1 more)