Search Results for author: Samuel Weinbach

Found 11 papers, 7 papers with code

u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

1 code implementation • 24 Jul 2024 • Charlie Blake, Constantin Eichenberg, Josef Dean, Lukas Balles, Luke Y. Prince, Björn Deiseroth, Andres Felipe Cruz-Salinas, Carlo Luschi, Samuel Weinbach, Douglas Orr

The Maximal Update Parametrization ($\mu$P) aims to make the optimal hyperparameters (HPs) of a model independent of its size, allowing them to be swept using a cheap proxy model rather than the full-size target model.
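The transfer idea can be illustrated with a minimal sketch (this is not the paper's u-$\mu$P implementation): a learning rate is swept on a narrow proxy model and then carried over to a wider target model by rescaling per-layer learning rates. The model sizes, layer names, and the simple base_width/width Adam rule below are illustrative assumptions, not details taken from the paper.

```python
# Illustrative muP-style hyperparameter transfer sketch (assumed setup):
# sweep the learning rate on a cheap, narrow proxy model, then reuse it
# for a wider target model by scaling width-dependent layers' LRs.
import torch
import torch.nn as nn


def make_mlp(width: int) -> nn.Sequential:
    """Tiny MLP whose hidden width differs between proxy and target."""
    return nn.Sequential(
        nn.Linear(16, width),     # input layer (fixed fan-in)
        nn.ReLU(),
        nn.Linear(width, width),  # hidden layer (width-dependent fan-in)
        nn.ReLU(),
        nn.Linear(width, 1),      # output layer (width-dependent fan-in)
    )


def mup_param_groups(model: nn.Sequential, base_width: int, width: int, base_lr: float):
    """Scale the LR of width-dependent layers by base_width / width (Adam-style rule)."""
    groups = []
    for module in model:
        if not isinstance(module, nn.Linear):
            continue
        # Layers whose fan-in grows with width get their LR scaled down;
        # the fixed-fan-in input layer keeps the proxy learning rate.
        lr = base_lr * (base_width / width) if module.in_features == width else base_lr
        groups.append({"params": module.parameters(), "lr": lr})
    return groups


BASE_WIDTH, TARGET_WIDTH = 64, 1024
best_proxy_lr = 3e-3  # hypothetical value found by sweeping on the proxy model

proxy = make_mlp(BASE_WIDTH)     # cheap model used for the HP sweep
target = make_mlp(TARGET_WIDTH)  # full-size model reusing the swept LR
optimizer = torch.optim.Adam(
    mup_param_groups(target, BASE_WIDTH, TARGET_WIDTH, best_proxy_lr)
)
```

The point of the sketch is only the shape of the workflow: hyperparameters are chosen once on the small model and reused on the large one via per-layer rescaling, rather than re-swept at full scale.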

T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

1 code implementation • 27 Jun 2024 • Björn Deiseroth, Manuel Brack, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses.

Cross-Lingual Transfer • Transfer Learning

Efficient Parallelization Layouts for Large-Scale Distributed Model Training

1 code implementation • 9 Nov 2023 • Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo

In this work, we conduct a comprehensive ablation study of possible training configurations for large language models.

Tokenizer Choice For LLM Training: Negligible or Crucial?

no code implementations • 12 Oct 2023 • Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr

The recent success of Large Language Models (LLMs) has been driven predominantly by curating the training dataset composition, scaling model architectures and dataset sizes, and advancing pretraining objectives, leaving tokenizer influence as a blind spot.

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

1 code implementation • NeurIPS 2023 • Björn Deiseroth, Mayukh Deb, Samuel Weinbach, Manuel Brack, Patrick Schramowski, Kristian Kersting

Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities.

Domain-Level Explainability -- A Challenge for Creating Trust in Superhuman AI Strategies

no code implementations • 12 Nov 2020 • Jonas Andrulis, Ole Meyer, Grégory Schott, Samuel Weinbach, Volker Gruhn

For strategic problems, intelligent systems based on Deep Reinforcement Learning (DRL) have demonstrated an impressive ability to learn advanced solutions that can go far beyond human capabilities, especially when dealing with complex scenarios.

Deep Reinforcement Learning • Explainable Artificial Intelligence (XAI)
