Search Results for author: Szymon Antoniak

Found 6 papers, 5 papers with code

Scaling Laws for Fine-Grained Mixture of Experts

1 code implementation • 12 Feb 2024 • Jakub Krajewski, Jan Ludziejewski, Kamil Adamczewski, Maciej Pióro, Michał Krutul, Szymon Antoniak, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Piotr Sankowski, Marek Cygan, Sebastian Jaszczur

Our findings not only show that MoE models consistently outperform dense Transformers but also highlight that the efficiency gap between dense and MoE models widens as we scale up the model size and training budget.
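The entry above describes fine-grained MoE only at the level of its findings; below is a minimal, hedged PyTorch sketch (not the authors' implementation) of what a top-k MoE layer with a granularity knob can look like. The class name, layer sizes, routing scheme, and the idea of scaling expert count and top-k with `granularity` are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of a fine-grained MoE layer:
# "granularity" shrinks each expert and multiplies the expert count and
# top-k proportionally, keeping compute per token roughly constant.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FineGrainedMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2, granularity=4):
        super().__init__()
        self.n_experts = n_experts * granularity          # more, smaller experts
        self.top_k = top_k * granularity                  # activate proportionally more of them
        d_expert = d_ff // granularity                    # each expert is `granularity` times smaller
        self.router = nn.Linear(d_model, self.n_experts, bias=False)
        self.w_in = nn.Parameter(torch.randn(self.n_experts, d_model, d_expert) * 0.02)
        self.w_out = nn.Parameter(torch.randn(self.n_experts, d_expert, d_model) * 0.02)

    def forward(self, x):                                 # x: (tokens, d_model)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # route each token to its top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            e = idx[:, k]                                 # chosen expert per token for slot k
            h = torch.einsum("td,tdf->tf", x, self.w_in[e])
            h = torch.einsum("tf,tfd->td", F.relu(h), self.w_out[e])
            out = out + weights[:, k : k + 1] * h
        return out
```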

Mixture of Tokens: Continuous MoE through Cross-Example Aggregation

1 code implementation • 24 Oct 2023 • Szymon Antoniak, Michał Krutul, Maciej Pióro, Jakub Krajewski, Jan Ludziejewski, Kamil Ciebiera, Krystian Król, Tomasz Odrzygóźdź, Marek Cygan, Sebastian Jaszczur

Our best models not only achieve a 3x increase in training speed over dense Transformer models in language pretraining but also match the performance of state-of-the-art MoE architectures.

Language Modelling, Large Language Model
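As a rough illustration of the cross-example aggregation idea in this paper, here is a hedged PyTorch sketch (my reading, not the released code): tokens at the same position across examples in a batch form a group, each expert processes a soft mixture of the group, and the expert output is redistributed back to the tokens with the same weights, so the layer stays fully differentiable. The grouping rule, controller, and expert sizes are assumptions made for the example.

```python
# A minimal sketch (not the released code) of a Mixture-of-Tokens-style layer:
# mix tokens across examples into one token per group and expert, process the
# mixture, then redistribute the result with the same mixing weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfTokens(nn.Module):
    def __init__(self, d_model=512, d_expert=256, n_experts=8):
        super().__init__()
        self.controller = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_expert), nn.ReLU(), nn.Linear(d_expert, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (batch, seq, d_model)
        g = x.transpose(0, 1)                        # (seq, batch, d_model): one group per position
        w = F.softmax(self.controller(g), dim=1)     # mixing weights over each group, per expert
        out = torch.zeros_like(g)
        for e, expert in enumerate(self.experts):
            we = w[..., e : e + 1]                   # (seq, batch, 1)
            mixed = (we * g).sum(dim=1, keepdim=True)  # one mixed token per group for this expert
            y = expert(mixed)                        # (seq, 1, d_model)
            out = out + we * y                       # redistribute with the same weights
        return out.transpose(0, 1)                   # back to (batch, seq, d_model)
```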

Magnushammer: A Transformer-Based Approach to Premise Selection

no code implementations • 8 Mar 2023 • Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu

By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4$x fewer parameters.

Automated Theorem Proving, Language Modelling +1
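For context on what transformer-based premise selection looks like in practice, the sketch below is a hedged illustration, not Magnushammer itself: embed the proof state and candidate premises with a text encoder and rank premises by similarity. The `bert-base-uncased` encoder, mean pooling, and the helper names are placeholder assumptions.

```python
# A minimal sketch (an assumed setup, not the paper's code) of retrieval-style
# premise selection: encode the proof state and candidate premises, then rank
# premises by cosine similarity to the state embedding.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder encoder
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (n, seq, d)
    return F.normalize(hidden.mean(dim=1), dim=-1)          # mean-pooled, unit-norm embeddings

def select_premises(proof_state, premises, k=3):
    state_emb = embed([proof_state])                         # (1, d)
    premise_emb = embed(premises)                            # (n, d)
    scores = (premise_emb @ state_emb.T).squeeze(-1)         # cosine similarity per premise
    top = scores.topk(min(k, len(premises)))
    return [premises[i] for i in top.indices.tolist()]
```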
