1 code implementation • 9 Nov 2023 • Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo
In this work, we conduct a comprehensive ablation study of possible training configurations for large language models.
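As a rough illustration of what such an ablation can look like, the sketch below enumerates a small hypothetical grid of parallelization and precision settings; the axes and values are assumptions for illustration, not the paper's actual search space.

```python
# A minimal sketch of an ablation grid over training configurations.
# The configuration axes (parallelism degrees, activation checkpointing,
# precision) are illustrative assumptions, not the paper's search space.
from itertools import product

tensor_parallel = [1, 2, 4]
pipeline_parallel = [1, 2]
activation_checkpointing = [False, True]
precision = ["bf16", "fp16"]

for tp, pp, ckpt, prec in product(
    tensor_parallel, pipeline_parallel, activation_checkpointing, precision
):
    config = {
        "tensor_parallel_size": tp,
        "pipeline_parallel_size": pp,
        "activation_checkpointing": ckpt,
        "precision": prec,
    }
    # In a real study, each config would be launched as a training run
    # and its throughput and loss curve recorded for comparison.
    print(config)
```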
no code implementations • 12 Oct 2023 • Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr
The recent success of Large Language Models (LLMs) has been driven predominantly by curating the composition of the training dataset, scaling model architectures and dataset sizes, and advancing pretraining objectives, leaving the influence of the tokenizer as a blind spot.
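One way to make tokenizer influence concrete is to compare tokenizer fertility (tokens produced per word) across tokenizers. The sketch below is a minimal illustration using the Hugging Face transformers library; the tokenizer names are examples and are not taken from the paper.

```python
# A minimal sketch of measuring tokenizer "fertility" (tokens per word),
# one common way to quantify tokenizer influence. The tokenizer names
# below are just examples; any tokenizers on the Hugging Face Hub work.
from transformers import AutoTokenizer

text = "Large language models are sensitive to how their input is tokenized."

for name in ["gpt2", "bert-base-multilingual-cased"]:
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok.tokenize(text))
    n_words = len(text.split())
    print(f"{name}: {n_tokens} tokens, fertility = {n_tokens / n_words:.2f}")
```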
1 code implementation • NeurIPS 2023 • Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach
The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users.
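That interface is simply a plain-language prompt. As a minimal sketch (using the diffusers library and an example checkpoint, not the paper's model), text-to-image generation looks like this:

```python
# A minimal sketch of the text-to-image interface via the diffusers
# library. The checkpoint name is an example, not the paper's model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# The "intuitive interface": a plain-language prompt in, an image out.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```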
1 code implementation • NeurIPS 2023 • Björn Deiseroth, Mayukh Deb, Samuel Weinbach, Manuel Brack, Patrick Schramowski, Kristian Kersting
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities.
no code implementations • 6 Dec 2022 • Samuel Weinbach, Marco Bellagente, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Björn Deiseroth, Koen Oostermeijer, Hannah Teufel, Andres Felipe Cruz-Salinas
We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text.
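As a toy sketch of the general idea, multimodal conditioning can be pictured as embedding images and text into a shared space and concatenating them into a single conditioning sequence for the diffusion model. The encoders and dimensions below are placeholders, not M-VADER's actual architecture.

```python
# A toy sketch of multimodal conditioning: embed images and text into a
# shared space and concatenate the sequences into one conditioning
# context. Encoders and dimensions are placeholders, not M-VADER's
# actual architecture.
import torch
import torch.nn as nn

d_model = 768
image_encoder = nn.Linear(512, d_model)       # stand-in for a vision encoder
text_encoder = nn.Embedding(32000, d_model)   # stand-in for a text encoder

image_features = torch.randn(1, 16, 512)      # 16 image patch features
text_ids = torch.randint(0, 32000, (1, 8))    # 8 text tokens

conditioning = torch.cat(
    [image_encoder(image_features), text_encoder(text_ids)], dim=1
)  # shape (1, 24, d_model): one sequence mixing both modalities
print(conditioning.shape)
```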
5 code implementations • BigScience (ACL) 2022 • Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Ranked #86 on Multi-task Language Understanding on MMLU
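The released weights are hosted under the EleutherAI/gpt-neox-20b repository on the Hugging Face Hub and can be loaded with the transformers library. The snippet below is a standard usage sketch, not code from the paper; note that the full 20B model requires on the order of 40 GB of memory in half precision.

```python
# Standard usage sketch for loading the released GPT-NeoX-20B weights
# with the transformers library (not code from the paper). The full
# model needs roughly 40 GB of memory in half precision.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("GPT-NeoX-20B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```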
1 code implementation • 9 Dec 2021 • Constantin Eichenberg, Sidney Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank
Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling.
no code implementations • 12 Nov 2020 • Jonas Andrulis, Ole Meyer, Grégory Schott, Samuel Weinbach, Volker Gruhn
For strategic problems, intelligent systems based on Deep Reinforcement Learning (DRL) have demonstrated an impressive ability to learn solutions that go far beyond human capabilities, especially in complex scenarios.