no code implementations • 21 Dec 2024 • OpenAI, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, Andrey Mishchenko, Andy Applebaum, Angela Jiang, Ashvin Nair, Barret Zoph, Behrooz Ghorbani, Ben Rossen, Benjamin Sokolowsky, Boaz Barak, Bob McGrew, Borys Minaiev, Botao Hao, Bowen Baker, Brandon Houghton, Brandon McKinzie, Brydon Eastman, Camillo Lugaresi, Cary Bassin, Cary Hudson, Chak Ming Li, Charles de Bourcy, Chelsea Voss, Chen Shen, Chong Zhang, Chris Koch, Chris Orsinger, Christopher Hesse, Claudia Fischer, Clive Chan, Dan Roberts, Daniel Kappler, Daniel Levy, Daniel Selsam, David Dohan, David Farhi, David Mely, David Robinson, Dimitris Tsipras, Doug Li, Dragos Oprica, Eben Freeman, Eddie Zhang, Edmund Wong, Elizabeth Proehl, Enoch Cheung, Eric Mitchell, Eric Wallace, Erik Ritter, Evan Mays, Fan Wang, Felipe Petroski Such, Filippo Raso, Florencia Leoni, Foivos Tsimpourlas, Francis Song, Fred von Lohmann, Freddie Sulit, Geoff Salmon, Giambattista Parascandolo, Gildas Chabot, Grace Zhao, Greg Brockman, Guillaume Leclerc, Hadi Salman, Haiming Bao, Hao Sheng, Hart Andrin, Hessam Bagherinezhad, Hongyu Ren, Hunter Lightman, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ignasi Clavera Gilaberte, Ilge Akkaya, Ilya Kostrikov, Ilya Sutskever, Irina Kofman, Jakub Pachocki, James Lennon, Jason Wei, Jean Harb, Jerry Twore, Jiacheng Feng, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joaquin Quiñonero Candela, Joe Palermo, Joel Parish, Johannes Heidecke, John Hallman, John Rizzo, Jonathan Gordon, Jonathan Uesato, Jonathan Ward, Joost Huizinga, Julie Wang, Kai Chen, Kai Xiao, Karan Singhal, Karina Nguyen, Karl Cobbe, Katy Shi, Kayla Wood, Kendra Rimbach, Keren Gu-Lemberg, Kevin Liu, Kevin Lu, Kevin Stone, Kevin Yu, Lama Ahmad, Lauren Yang, Leo Liu, Leon Maksin, Leyton Ho, Liam Fedus, Lilian Weng, Linden Li, Lindsay McCallum, Lindsey Held, Lorenz Kuhn, Lukas Kondraciuk, Lukasz Kaiser, Luke Metz, Madelaine Boyd, Maja Trebacz, Manas Joglekar, Mark Chen, Marko Tintor, Mason Meyer, Matt Jones, Matt Kaufer, Max Schwarzer, Meghan Shah, Mehmet Yatbaz, Melody Guan, Mengyuan Xu, Mengyuan Yan, Mia Glaese, Mianna Chen, Michael Lampe, Michael Malek, Michele Wang, Michelle Fradin, Mike McClay, Mikhail Pavlov, Miles Wang, Mingxuan Wang, Mira Murati, Mo Bavarian, Mostafa Rohaninejad, Nat McAleese, Neil Chowdhury, Nick Ryder, Nikolas Tezak, Noam Brown, Ofir Nachum, Oleg Boiko, Oleg Murk, Olivia Watkins, Patrick Chao, Paul Ashbourne, Pavel Izmailov, Peter Zhokhov, Rachel Dias, Rahul Arora, Randall Lin, Rapha Gontijo Lopes, Raz Gaon, Reah Miyara, Reimar Leike, Renny Hwang, Rhythm Garg, Robin Brown, Roshan James, Rui Shu, Ryan Cheu, Ryan Greene, Saachi Jain, Sam Altman, Sam Toizer, Sam Toyer, Samuel Miserendino, Sandhini Agarwal, Santiago Hernandez, Sasha Baker, Scott McKinney, Scottie Yan, Shengjia Zhao, Shengli Hu, Shibani Santurkar, Shraman Ray Chaudhuri, Shuyuan Zhang, Siyuan Fu, Spencer Papay, Steph Lin, Suchir Balaji, Suvansh Sanjeev, Szymon Sidor, Tal Broda, Aidan Clark, Tao Wang, Taylor Gordon, Ted Sanders, Tejal Patwardhan, Thibault Sottiaux, Thomas Degry, Thomas Dimson, Tianhao Zheng, Timur Garipov, Tom Stasi, Trapit Bansal, Trevor Creech, Troy Peterson, Tyna Eloundou, Valerie Qi,
Vineet Kosaraju, Vinnie Monaco, Vitchyr Pong, Vlad Fomenko, Weiyi Zheng, Wenda Zhou, Wes McCabe, Wojciech Zaremba, Yann Dubois, Yinghai Lu, Yining Chen, Young Cha, Yu Bai, Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
1 code implementation • 2 Mar 2024 • Martin Marek, Brooks Paige, Pavel Izmailov
First, we introduce a "DirClip" prior that is practical to sample and nearly matches the performance of a cold posterior.
no code implementations • 14 Dec 2023 • Collin Burns, Pavel Izmailov, Jan Hendrik Kirchner, Bowen Baker, Leo Gao, Leopold Aschenbrenner, Yining Chen, Adrien Ecoffet, Manas Joglekar, Jan Leike, Ilya Sutskever, Jeff Wu
Widely used alignment techniques, such as reinforcement learning from human feedback (RLHF), rely on the ability of humans to supervise model behavior - for example, to evaluate whether a model faithfully followed instructions or generated safe outputs.
1 code implementation • 19 Jun 2023 • Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson
A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target.
5 code implementations • CVPR 2023 • Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetic
Vision Transformers convert images to sequences by slicing them into patches.
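As a concrete illustration of that patchify step, here is a minimal sketch (the 224-pixel image and 16-pixel patch sizes are illustrative, yielding 196 tokens of dimension 768):

```python
import torch

def patchify(images, patch_size):
    # Slice a batch of images into non-overlapping patches and
    # flatten each patch into a token vector.
    b, c, h, w = images.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "patch size must divide image size"
    x = images.reshape(b, c, h // p, p, w // p, p)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(b, (h // p) * (w // p), c * p * p)
    return x

print(patchify(torch.randn(1, 3, 224, 224), 16).shape)  # torch.Size([1, 196, 768])
```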
1 code implementation • 20 Oct 2022 • Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson
Deep classifiers are known to rely on spurious features -- patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds.
4 code implementations • 6 Apr 2022 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions.
Ranked #1 on Out-of-Distribution Generalization on UrbanCars
1 code implementation • 30 Mar 2022 • Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson
In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter.
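To make the role of the noise variance concrete, here is a minimal sketch of that Gaussian observation model (illustrative code, not from the paper):

```python
import math
import torch

def gaussian_log_likelihood(y, f, noise_var):
    # Log-likelihood of targets y under N(y | f, noise_var);
    # noise_var is the aleatoric (observation) noise level.
    return (-0.5 * math.log(2 * math.pi * noise_var)
            - 0.5 * (y - f) ** 2 / noise_var).sum()
```

A larger noise_var flattens the likelihood, attributing more of the residual to observation noise rather than to the function.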
1 code implementation • 23 Feb 2022 • Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
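For context, a conditional marginal likelihood scores how well the model predicts held-out data after conditioning on the rest; in illustrative notation (the split into $\mathcal{D}_{1:m}$ and $\mathcal{D}_{m+1:n}$ is an assumption, not necessarily the paper's exact formulation),

$$\log p(\mathcal{D}_{m+1:n} \mid \mathcal{D}_{1:m}) = \log p(\mathcal{D}_{1:n}) - \log p(\mathcal{D}_{1:m}),$$

where each term is an ordinary marginal likelihood $p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, d\theta$.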
1 code implementation • NeurIPS 2021 • Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data.
2 code implementations • NeurIPS 2021 • Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.
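A minimal sketch of the soft-target distillation loss this line of work builds on (the generic Hinton-style formulation, not necessarily the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Match the student's temperature-softened class probabilities
    # to the teacher's; the T^2 factor keeps the gradient scale comparable.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t
```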
3 code implementations • 29 Apr 2021 • Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson
The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex.
1 code implementation • 22 Oct 2020 • Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Invariances to translations have imbued convolutional neural networks with powerful generalization properties.
1 code implementation • NeurIPS 2020 • Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems.
2 code implementations • ICML 2020 • Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson
The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems.
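That equivariance can be checked directly; a small sketch (circular padding is assumed so the identity holds exactly under cyclic shifts):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 32, 32)

def shift(t):
    # cyclic translation by (5, -3) pixels
    return torch.roll(t, shifts=(5, -3), dims=(-2, -1))

# shifting then convolving equals convolving then shifting
print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))  # True
```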
1 code implementation • NeurIPS 2020 • Andrew Gordon Wilson, Pavel Izmailov
The key distinguishing property of a Bayesian approach is marginalization, rather than using a single setting of weights.
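Concretely, the marginalization in question is the Bayesian model average over the posterior on weights $w$, usually approximated by Monte Carlo:

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, w_s), \qquad w_s \sim p(w \mid \mathcal{D}).$$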
2 code implementations • ICML 2020 • Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson
Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood.
Semi-Supervised Image Classification • Semi-Supervised Text Classification
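The exact likelihood preserved by the flows entry above comes from the change-of-variables formula: for an invertible network $f$ mapping data $x$ to a latent $z = f(x)$ with base density $p_Z$,

$$\log p_X(x) = \log p_Z(f(x)) + \log \left|\det \frac{\partial f(x)}{\partial x}\right|.$$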
1 code implementation • 17 Jul 2019 • Pavel Izmailov, Wesley J. Maddox, Polina Kirichenko, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty.
9 code implementations • NeurIPS 2019 • Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson
We propose SWA-Gaussian (SWAG), a simple, scalable, and general-purpose approach for uncertainty representation and calibration in deep learning.
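A rough sketch of the diagonal variant of this idea (illustrative only; the full method as described also maintains a low-rank covariance component):

```python
import torch

snapshots = []  # flattened weight vectors collected along the SGD trajectory

def collect(model):
    snapshots.append(torch.cat([p.detach().reshape(-1) for p in model.parameters()]))

def swag_diag_sample():
    w = torch.stack(snapshots)          # (num_snapshots, num_params)
    mean = w.mean(dim=0)
    var = w.var(dim=0, unbiased=False)  # diagonal Gaussian fit to the iterates
    return mean + var.sqrt() * torch.randn_like(mean)
```

Sampled weight vectors are then loaded back into the network and their predictions averaged.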
2 code implementations • ICLR 2019 • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Presently, the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.
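A generic consistency term of the kind described looks like the following sketch (input-perturbation version only; the Gaussian noise and MSE choices are illustrative):

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, noise_std=0.1):
    # Penalize changes in the model's predictions under a small
    # random perturbation of the (possibly unlabeled) input.
    with torch.no_grad():
        target = F.softmax(model(x), dim=-1)
    logits = model(x + noise_std * torch.randn_like(x))
    return F.mse_loss(F.softmax(logits, dim=-1), target)
```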
17 code implementations • 14 Mar 2018 • Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.
Ranked #77 on Image Classification on CIFAR-100 (using extra training data)
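This entry corresponds to Stochastic Weight Averaging (SWA), which averages weights along the SGD trajectory rather than taking the final iterate. A minimal sketch using PyTorch's built-in SWA utilities (hyperparameters illustrative; model, loader, and loss_fn assumed defined elsewhere):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)               # running average of weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # constant LR during averaging

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= 75:                            # start averaging late in training
        swa_model.update_parameters(model)
        swa_scheduler.step()

update_bn(loader, swa_model)  # recompute BatchNorm statistics for the averaged model
```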
9 code implementations • NeurIPS 2018 • Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson
The loss functions of deep neural networks are complex and their geometric properties are not well understood.
2 code implementations • 5 Jan 2018 • Alexander Novikov, Pavel Izmailov, Valentin Khrulkov, Michael Figurnov, Ivan Oseledets
Tensor Train decomposition is used across many branches of machine learning.
Mathematical Software • Numerical Analysis
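For reference, the Tensor Train format writes a $d$-way tensor as a chain of 3-way cores $G_k$ of shape $(r_{k-1}, n_k, r_k)$ with $r_0 = r_d = 1$; a tiny illustrative sketch (shapes hypothetical):

```python
import numpy as np

# Cores for an implicit 4 x 5 x 6 tensor with TT ranks (1, 3, 2, 1).
cores = [np.random.randn(1, 4, 3), np.random.randn(3, 5, 2), np.random.randn(2, 6, 1)]

def tt_entry(cores, idx):
    # Entry (i_1, ..., i_d) is the matrix product G_1[:, i_1, :] ... G_d[:, i_d, :].
    v = np.ones((1, 1))
    for core, i in zip(cores, idx):
        v = v @ core[:, i, :]
    return v.item()

print(tt_entry(cores, (0, 2, 5)))
```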
1 code implementation • 19 Oct 2017 • Pavel Izmailov, Alexander Novikov, Dmitry Kropotov
We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models.
no code implementations • 18 Nov 2016 • Pavel Izmailov, Dmitry Kropotov
However, the new lower bound depends on $O(m^2)$ variational parameters, which makes optimization challenging for large $m$. In this work, we develop a new approach for training inducing-input GP models for classification problems.
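The $O(m^2)$ count presumably refers to the variational Gaussian over the $m$ inducing values,

$$q(\mathbf{u}) = \mathcal{N}(\boldsymbol{\mu}, \Sigma), \qquad \boldsymbol{\mu} \in \mathbb{R}^m,\ \Sigma \in \mathbb{R}^{m \times m},$$

whose covariance alone carries on the order of $m^2$ free parameters.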