no code implementations • 2 Feb 2024 • David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun
Large language models have emerged as versatile tools but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets.
Ranked #1 on Language Modelling on The Pile (test perplexity)
no code implementations • 1 Nov 2023 • Mark Levy, Bruno Di Giorgi, Floris Weers, Angelos Katharopoulos, Tom Nickson
We demonstrate how conditional generation from diffusion models, with sampling-time guidance, can be used to tackle a variety of realistic tasks in the production of music in 44.1 kHz stereo audio.
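As a minimal sketch of sampling-time guidance in the classifier-free style (the `denoise` callable, its signature, and the guidance scale are illustrative placeholders, not the paper's model):

```python
def guided_noise_prediction(x_t, t, cond, denoise, guidance_scale=3.0):
    """Blend unconditional and conditional noise predictions before the
    usual DDPM/DDIM update, pushing the sample toward the condition."""
    eps_uncond = denoise(x_t, t, cond=None)  # unconditional prediction
    eps_cond = denoise(x_t, t, cond=cond)    # e.g. conditioned on other stems
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```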
no code implementations • CVPR 2023 • Floris Weers, Vaishaal Shankar, Angelos Katharopoulos, Yinfei Yang, Tom Gunter
Self-supervision and natural-language supervision have emerged as two exciting ways to train general-purpose image encoders that excel at a variety of downstream tasks.
1 code implementation • CVPR 2021 • Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, Sanja Fidler
The INN allows us to compute the inverse mapping of the homeomorphism, which, in turn, enables the efficient computation of both the implicit surface function of a primitive and its mesh, without any additional post-processing.
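A hedged sketch of the idea using a RealNVP-style affine coupling layer (an assumption about the INN's building block, not the paper's exact architecture); the inverse is analytic, so both directions reuse the same conditioning networks:

```python
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """Affine coupling: transform half of x conditioned on the other half."""
    x1, x2 = np.split(x, 2, axis=-1)
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return np.concatenate([x1, y2], axis=-1)

def coupling_inverse(y, scale_net, shift_net):
    """Exact analytic inverse of coupling_forward, reusing the same nets."""
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return np.concatenate([y1, x2], axis=-1)

# Round trip: inverse(forward(x)) == x for any choice of the two networks.
s = lambda h: np.tanh(h)   # toy stand-ins for small MLPs
t = lambda h: 0.5 * h
x = np.random.randn(8)
assert np.allclose(coupling_inverse(coupling_forward(x, s, t), s, t), x)
```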
1 code implementation • NeurIPS 2020 • Apoorv Vyas, Angelos Katharopoulos, François Fleuret
This results in a model with linear complexity in the sequence length for a fixed number of clusters (sketched below).
Automatic Speech Recognition (ASR) +1
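A rough sketch of the mechanism, assuming plain k-means over the queries (the paper's clustering scheme is more refined): attention is evaluated once per centroid and the result is broadcast to the cluster's members, giving O(C·N) cost for C clusters.

```python
import numpy as np

def clustered_attention(Q, K, V, n_clusters=32, iters=5):
    """Approximate softmax attention by grouping similar queries: one
    attention row is computed per cluster centroid and shared by members."""
    N, d = Q.shape
    # Crude k-means over the queries, for illustration only.
    centroids = Q[np.random.choice(N, n_clusters, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((Q[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            members = Q[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    # Softmax attention for the C centroids instead of all N queries.
    scores = centroids @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out_per_cluster = weights @ V
    return out_per_cluster[assign]  # broadcast back to each query
```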
5 code implementations • ICML 2020 • Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
Transformers achieve remarkable performance on several tasks, but due to their quadratic complexity with respect to the input length, they are prohibitively slow for very long sequences (a sketch of the linearized attention follows below).
Ranked #5 on Offline RL on D4RL
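A minimal sketch of the linearized attention the paper proposes (single head, non-causal; the paper also derives a causal, RNN-style recurrent form):

```python
import numpy as np

def elu_feature_map(x):
    """phi(x) = elu(x) + 1, the positive feature map used in the paper."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Replace softmax(Q K^T) V by phi(Q) (phi(K)^T V): by associativity
    the cost drops from O(N^2 d) to O(N d^2) in the sequence length N."""
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    KV = Kf.T @ V                    # (d, d_v): summed once over all positions
    Z = Qf @ Kf.sum(axis=0) + eps    # per-query normalizer
    return (Qf @ KV) / Z[:, None]
```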
2 code implementations • 3 May 2019 • Angelos Katharopoulos, François Fleuret
We show that sampling from the attention distribution yields an unbiased estimator of the full model with minimal variance, and we derive an unbiased estimator of the gradient that we use to train our model end-to-end with standard SGD.
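The core of the estimator, as a hedged sketch: draw patch indices from the attention distribution and average their features. Here `features` is assumed precomputed for clarity; the point of the method is that only the sampled patches actually need to be processed by the feature network.

```python
import numpy as np

def sampled_attention_features(attention, features, n_samples=8, rng=None):
    """Monte Carlo estimate of sum_i a_i * f_i: sample indices i ~ a and
    average f_i. The estimate is unbiased, so gradients through it are too."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(attention), size=n_samples, p=attention)
    return features[idx].mean(axis=0)
```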
2 code implementations • ICML 2018 • Angelos Katharopoulos, François Fleuret
Deep neural network training spends most of its computation on examples that are already handled correctly and could therefore be ignored.
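A generic sketch of the sampling step, under the assumption that some per-example importance score is available (the paper derives one from an upper bound on the per-sample gradient norm); the 1/(N·p_i) weights keep the resulting gradient estimator unbiased.

```python
import numpy as np

def importance_sampled_batch(scores, batch_size, rng=None):
    """Sample a batch with probability proportional to the importance scores
    and return the weights that make the reweighted SGD gradient unbiased."""
    rng = rng or np.random.default_rng()
    p = scores / scores.sum()
    idx = rng.choice(len(scores), size=batch_size, p=p)
    weights = 1.0 / (len(scores) * p[idx])
    return idx, weights
```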
no code implementations • 26 Jun 2017 • Angelos Katharopoulos, Despoina Paschalidou, Christos Diou, Anastasios Delopoulos
This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem).
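A purely illustrative sketch of one such aggregation function, assuming softmax-weighted pooling with a learnable weight vector `w`; the paper's family and its parameter-estimation method are more general, but the pattern is the same: the aggregator's parameters are fit by minimizing the downstream task cost.

```python
import numpy as np

def weighted_aggregate(local_features, w):
    """Pool a variable-size set of local features (n, d) into one d-vector
    with softmax weights; w would be trained through the task loss."""
    scores = local_features @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ local_features
```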
1 code implementation • 31 May 2017 • Angelos Katharopoulos, François Fleuret
Importance sampling has been successfully used to accelerate stochastic optimization in many convex problems.