1 code implementation • 2 Oct 2023 • Jean Kaddour, Qi Liu
The in-context learning ability of large language models (LLMs) enables them to generalize to novel downstream tasks with relatively few labeled examples.
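As a minimal illustration of in-context learning (not code from this paper), a few-shot prompt can be assembled by prepending labeled demonstrations to a query before sending it to a model; the task, review texts, and labels below are invented.

```python
# Minimal few-shot prompt construction for sentiment classification.
# The demonstrations and query are made up for illustration only.
demonstrations = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regretted buying a ticket.", "negative"),
]
query = "The plot dragged, but the acting saved it."

prompt = ""
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # feed this string to any LLM completion API
```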
2 code implementations • 9 Mar 2023 • Aengus Lynch, Gbètondji J-S Dovonon, Jean Kaddour, Ricardo Silva
The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data.
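A toy sketch of the phenomenon (not the paper's benchmark): one feature tracks the label in the training data but is independent of it at test time, so a classifier that leans on it degrades. All data here is synthetic.

```python
# Spurious-correlation toy example with an invented data-generating process.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
core = y + 0.5 * rng.normal(size=n)            # genuinely predictive feature
spurious_train = y + 0.1 * rng.normal(size=n)  # correlated with y only in training
X_train = np.column_stack([core, spurious_train])

clf = LogisticRegression().fit(X_train, y)

# At test time the spurious feature is pure noise.
y_test = rng.integers(0, 2, n)
core_test = y_test + 0.5 * rng.normal(size=n)
spurious_test = rng.normal(size=n)
X_test = np.column_stack([core_test, spurious_test])
print("test accuracy:", clf.score(X_test, y_test))
```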
1 code implementation • NeurIPS 2023 • Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
The computation necessary for training Transformer-based language models has skyrocketed in recent years.
2 code implementations • NeurIPS 2021 • Jean Kaddour, Yuchen Zhu, Qi Liu, Matt J. Kusner, Ricardo Silva
We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts).
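A hedged sketch of the target quantity, not the paper's estimator: for two structured treatments t and t', the CATE at x is E[Y(t) - Y(t') | X = x]. Below, the treatments are stand-in embedding vectors, the outcome model is a generic regressor, and the data-generating process is invented.

```python
# Toy CATE estimation for two "structured" treatments represented as vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d_x, d_t = 5000, 5, 3
X = rng.normal(size=(n, d_x))
T = rng.normal(size=(2, d_t))           # two hypothetical treatment embeddings
assign = rng.integers(0, 2, n)          # which treatment each unit received
Y = X[:, 0] * T[assign, 0] + T[assign].sum(axis=1) + 0.1 * rng.normal(size=n)

model = RandomForestRegressor().fit(np.column_stack([X, T[assign]]), Y)

x = X[:10]                              # query covariates
y_t0 = model.predict(np.column_stack([x, np.tile(T[0], (10, 1))]))
y_t1 = model.predict(np.column_stack([x, np.tile(T[1], (10, 1))]))
print("estimated CATE(x), t0 vs t1:", y_t0 - y_t1)
```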
1 code implementation • 1 Feb 2022 • Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers.
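For background (not the paper's experimental setup), one family of flat-minima methods is stochastic weight averaging (SWA): maintain a running average of the weights visited late in training. The model, data, and schedule below are placeholders.

```python
# Minimal SWA-style sketch with a placeholder model and random data.
import copy
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
swa_model = copy.deepcopy(model)
n_averaged = 0

for step in range(200):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step >= 100:  # start averaging once training has roughly settled
        n_averaged += 1
        with torch.no_grad():
            for p_avg, p in zip(swa_model.parameters(), model.parameters()):
                p_avg += (p - p_avg) / n_averaged

# swa_model now holds the averaged weights, which tend to lie in flatter regions.
```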
1 code implementation • 29 Sep 2022 • Jean Kaddour
Training vision or language models on large datasets can take days, if not weeks.
1 code implementation • NeurIPS 2020 • Jean Kaddour, Steindór Sæmundsson, Marc Peter Deisenroth
However, the standard meta-learning setting does not take into account the sequential nature that naturally arises when training a model from scratch in real life: how do we collect a set of training tasks in a data-efficient manner?
1 code implementation • 5 Jun 2023 • Sunny Sanyal, Atula Neerkaje, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi
We pre-trained nanoGPT-2 models of three sizes, small (125M), medium (335M), and large (770M), on the OpenWebText dataset, which comprises 9B tokens.
1 code implementation • 27 Jan 2023 • Valentina Zantedeschi, Luca Franceschi, Jean Kaddour, Matt J. Kusner, Vlad Niculae
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data.
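For background (not necessarily the formulation used in this paper), a widely used continuous characterization of acyclicity is the NOTEARS function h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W encodes an acyclic graph; a small check is sketched below.

```python
# NOTEARS-style acyclicity penalty on a 2-node graph (illustration only).
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    d = W.shape[0]
    return np.trace(expm(W * W)) - d  # elementwise square, then matrix exponential

dag = np.array([[0.0, 1.0], [0.0, 0.0]])      # single edge 0 -> 1
cycle = np.array([[0.0, 1.0], [1.0, 0.0]])    # 0 -> 1 -> 0

print(acyclicity(dag))    # ~0: acyclic
print(acyclicity(cycle))  # > 0: contains a cycle
```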
1 code implementation • NeurIPS 2023 • Hanchen Wang, Jean Kaddour, Shengchao Liu, Jian Tang, Joan Lasenby, Qi Liu
Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels.
1 code implementation • 18 Apr 2023 • Yuwei Yin, Jean Kaddour, Xiang Zhang, Yixin Nie, Zhenguang Liu, Lingpeng Kong, Qi Liu
Generative data augmentation (GDA) has been shown to produce more diverse and flexible data.
no code implementations • 30 Jun 2022 • Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM).
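A toy SCM, purely illustrative: X causes Y through a linear structural equation, and a hard intervention do(X = x0) is simulated by overriding X's equation; the coefficients and noise model are invented.

```python
# Two-variable structural causal model with a hard intervention.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    u_x, u_y = rng.normal(size=n), rng.normal(size=n)
    x = u_x if do_x is None else np.full(n, do_x)   # structural equation for X
    y = 2.0 * x + u_y                               # structural equation for Y
    return x, y

_, y_obs = sample(10_000)
_, y_do1 = sample(10_000, do_x=1.0)
print("E[Y] observational:", y_obs.mean())
print("E[Y | do(X=1)]:", y_do1.mean())  # roughly 2.0 under this toy SCM
```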
no code implementations • 17 Apr 2023 • Jean Kaddour
MiniPile is a 6GB subset of the deduplicated, 825GB Pile corpus.
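A hedged usage sketch: assuming MiniPile is hosted on the Hugging Face Hub under the id JeanKaddour/minipile (an assumption here, as is the "text" field name), it can be streamed with the datasets library without downloading the full 6GB.

```python
# Stream a few MiniPile documents; dataset id and field name are assumptions.
from datasets import load_dataset

ds = load_dataset("JeanKaddour/minipile", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example["text"][:200])
    if i == 2:
        break
```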
no code implementations • 19 Jul 2023 • Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
Due to the fast pace of the LLM field, it is difficult to identify the remaining challenges and already fruitful application areas.