no code implementations • 13 Aug 2021 • Anastasia Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi
Identifying algorithms for computationally efficient unsupervised training of large language models is an important and active area of research.
no code implementations • 10 Jun 2021 • Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi
Attention-based language models have become a critical component in state-of-the-art natural language processing systems.
1 code implementation • NeurIPS 2020 • Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi
Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters.
no code implementations • 2 Jan 2018 • Frithjof Gressmann, Franz J. Király, Bilal Mateen, Harald Oberhauser
Predictive modelling and supervised learning are central to modern data science.