no code implementations • 29 Sep 2023 • Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet
We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models.
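A minimal sketch of the core idea, not the paper's code: treat the sampler as a differentiable function of the model parameters and ascend the reward gradient through it. The one-step "sampler" and quadratic reward below are toy placeholders.

```python
import torch

# Toy stand-ins (illustrative, not the paper's code): a "sampler" mapping
# noise and parameters to a sample, and a differentiable reward function.
theta = torch.randn(16, requires_grad=True)       # model parameters
reward = lambda x: -(x - 1.0).pow(2).sum()        # toy reward, peaks at x == 1

def sample(noise, theta):
    # Stand-in for unrolling a full differentiable sampling chain.
    return noise + theta

opt = torch.optim.Adam([theta], lr=1e-2)
for step in range(200):
    x = sample(torch.randn(16), theta)   # differentiable w.r.t. theta
    loss = -reward(x)                    # maximize reward
    opt.zero_grad()
    loss.backward()                      # backprop through the sampler
    opt.step()
```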
no code implementations • 21 Apr 2023 • Paul Vicol, Zico Kolter, Kevin Swersky
We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single.
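For orientation, here is plain antithetic evolution strategies, the classical estimator this line of work builds on; ES-Single itself differs in how perturbations are held fixed across the unroll, which this sketch does not reproduce.

```python
import numpy as np

def es_grad(f, theta, sigma=0.1, n_pairs=64, seed=0):
    """Antithetic evolution-strategies estimate of the gradient of f."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_pairs):
        eps = rng.standard_normal(theta.shape)
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2 * sigma * n_pairs)

def unrolled_loss(theta, k=10):
    """Toy unrolled computation: k inner steps, then a scalar loss."""
    x = theta.copy()
    for _ in range(k):
        x = 0.9 * x                      # stand-in for an inner update
    return float(np.sum(x ** 2))

print(es_grad(unrolled_loss, np.ones(5)))
```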
1 code implementation • 1 Nov 2022 • Sadegh Mahdavi, Kevin Swersky, Thomas Kipf, Milad Hashemi, Christos Thrampoulidis, Renjie Liao
In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal is to learn an algorithm (e.g., sorting, breadth-first search, depth-first search) from input-output pairs using deep neural networks.
no code implementations • CVPR 2023 • Cristina Vasconcelos, Cengiz Oztireli, Mark Matthews, Milad Hashemi, Kevin Swersky, Andrea Tagliasacchi
Neural fields have rapidly been adopted for representing 3D signals, but their application to more classical 2D image-processing has been relatively limited.
no code implementations • 9 Aug 2022 • Binghong Chen, Daniel Tarlow, Kevin Swersky, Martin Maas, Pablo Heiber, Ashish Naik, Milad Hashemi, Parthasarathy Ranganathan
To automatically learn these hints from the dataset, we propose a novel discrete variational auto-encoder, where each discrete latent variable represents a different learned category of code-edit that increases performance.
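The paper's exact estimator isn't detailed here; one standard way to train such discrete latent categories is the Gumbel-softmax relaxation, sketched below purely as an illustration.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    """Differentiable sample from a categorical via the Gumbel-softmax trick."""
    u = torch.rand_like(logits).clamp_min(1e-9)
    g = -torch.log(-torch.log(u))                 # Gumbel(0, 1) noise
    return F.softmax((logits + g) / tau, dim=-1)

# e.g., 8 edits, each softly assigned to one of 4 latent edit categories
soft_codes = gumbel_softmax_sample(torch.randn(8, 4))
```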
1 code implementation • 7 Jul 2022 • Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge about the characteristics of those functions to deploy BO successfully.
1 code implementation • ICLR 2022 • Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine
An alternative paradigm is a "data-driven", offline approach that utilizes logged simulation data to architect hardware accelerators without requiring any further simulation.
4 code implementations • 16 Sep 2021 • Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani
Contrary to a common expectation that BO is suited to optimizing black-box functions, it actually requires domain knowledge about those functions to deploy BO successfully.
1 code implementation • 12 Feb 2021 • Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra
We are the first to take a unified perspective to jointly explain the oversmoothing and heterophily problems at the node level.
1 code implementation • 8 Feb 2021 • Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison
We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables.
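A minimal sketch of a gradient-informed flip proposal with a Metropolis-Hastings correction, on a toy quadratic binary model; the couplings, proposal temperature, and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
W = rng.standard_normal((D, D)) * 0.1
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                     # symmetric, zero-diagonal couplings
b = rng.standard_normal(D) * 0.1

def log_p(x):                                # unnormalized log-probability
    return b @ x + x @ W @ x

def grad_log_p(x):                           # gradient of the relaxed log_p
    return b + 2.0 * W @ x

def step(x):
    """One gradient-informed flip proposal plus Metropolis-Hastings correction."""
    d = -(2 * x - 1) * grad_log_p(x)         # estimated gain from flipping each bit
    q = np.exp(d / 2); q /= q.sum()
    i = rng.choice(D, p=q)
    y = x.copy(); y[i] = 1 - y[i]
    d_rev = -(2 * y - 1) * grad_log_p(y)
    q_rev = np.exp(d_rev / 2); q_rev /= q_rev.sum()
    accept = np.exp(log_p(y) - log_p(x)) * q_rev[i] / q[i]
    return y if rng.random() < min(1.0, accept) else x

x = rng.integers(0, 2, D).astype(float)
for _ in range(1000):
    x = step(x)
```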
no code implementations • 2 Feb 2021 • Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon
We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements).
no code implementations • 18 Dec 2020 • Francis Williams, Or Litany, Avneesh Sud, Kevin Swersky, Andrea Tagliasacchi
We introduce a technique for 3D human keypoint estimation that directly models the notion of spatial uncertainty of a keypoint.
1 code implementation • ICLR 2021 • Will Grathwohl, Jacob Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud
Energy-Based Models (EBMs) present a flexible and appealing way to represent uncertainty.
no code implementations • 5 Oct 2020 • Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin
The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning.
no code implementations • ICML 2020 • Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier
We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
1 code implementation • ICML 2020 • Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn
While directly applying Belady's algorithm is infeasible, since the future is unknown, we train a policy, conditioned only on past accesses, that accurately approximates Belady's even on diverse and complex access patterns; we call this approach Parrot.
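For reference, Belady's oracle itself is simple to state in code; it needs the future access sequence, which is exactly what a deployed policy cannot see.

```python
def belady_evict(cache, future_accesses):
    """Evict the cached line whose next use lies furthest in the future."""
    def next_use(line):
        try:
            return future_accesses.index(line)
        except ValueError:
            return float("inf")              # never reused: ideal victim
    return max(cache, key=next_use)

print(belady_evict(["A", "B", "C"], ["B", "A", "B"]))   # -> "C"
```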
8 code implementations • NeurIPS 2020 • Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.
Self-Supervised Image Classification • Semi-Supervised Image Classification
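A minimal sketch of the third step, distillation on unlabeled data; the temperature T and mean reduction are illustrative choices, not necessarily the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """Cross-entropy against the teacher's softened predictions."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return -(p_teacher * log_p_student).sum(-1).mean()

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10))
```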
1 code implementation • NeurIPS 2020 • Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi
A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms.
1 code implementation • 18 Feb 2020 • Micha Livne, Kevin Swersky, David J. Fleet
MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse.
Ranked #1 on Question Answering on YahooCQA (using extra training data)
no code implementations • ICLR 2020 • Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi
Turing complete computation and reasoning are often regarded as necessary precursors to general intelligence.
4 code implementations • ICLR 2020 • Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, Kevin Swersky
In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x|y).
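In code, these quantities can be read directly off a classifier's logits; a minimal sketch with random logits standing in for a trained network.

```python
import torch

logits = torch.randn(4, 10)                 # f(x)[y] from any classifier

log_p_y_given_x = logits.log_softmax(-1)    # standard class probabilities
log_p_x = logits.logsumexp(-1)              # log p(x), up to the constant log Z
log_p_x_given_y = logits                    # log p(x|y), up to per-class constants
```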
1 code implementation • 8 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet
Experiments show that MIM learns representations with high mutual information, consistent encoding and decoding distributions, effective latent clustering, and data log likelihood comparable to VAE, while avoiding posterior collapse.
no code implementations • 4 Oct 2019 • Micha Livne, Kevin Swersky, David J. Fleet
We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework.
no code implementations • ICLR 2020 • Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi
In this work, we propose a new approach to use GNNs to learn fused representations of general source code and its execution.
no code implementations • 6 Jun 2019 • Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel
We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes.
2 code implementations • 31 May 2019 • Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton
Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.
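A minimal sketch of such a selection step, assuming weight magnitude is the sparsity signal; gamma (targeted fraction) and alpha (drop rate) are illustrative defaults, not the paper's settings.

```python
import torch

def targeted_dropout(w, gamma=0.5, alpha=0.5):
    """Drop a random subset of the gamma fraction of smallest-magnitude weights."""
    k = max(1, int(gamma * w.numel()))
    thresh = w.abs().flatten().kthvalue(k).values   # magnitude cutoff
    targeted = w.abs() <= thresh                    # candidate set
    drop = targeted & (torch.rand_like(w) < alpha)  # drop each with prob. alpha
    return w * (~drop).float()
```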
1 code implementation • NeurIPS 2019 • Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky
We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.
no code implementations • 4 Apr 2019 • Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow
In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.
12 code implementations • ICLR 2020 • Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle
Few-shot classification refers to learning a classifier for new classes given only a few examples.
Ranked #7 on Few-Shot Image Classification on Meta-Dataset
1 code implementation • NIPS Workshop CDNNRIA 2018 • Aidan N. Gomez, Ivan Zhang, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton
Neural networks are extremely flexible models due to their large number of parameters, which is beneficial for learning, but also highly redundant.
no code implementations • ICML 2018 • Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan
In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance.
8 code implementations • ICLR 2018 • Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel
To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes.
no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly
This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.
no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.
42 code implementations • NeurIPS 2017 • Jake Snell, Kevin Swersky, Richard S. Zemel
We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class.
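A minimal sketch, assuming embeddings have already been produced by some encoder network: each prototype is a class mean, and queries are scored by negative squared distance.

```python
import numpy as np

def prototypical_predict(support, labels, queries, n_classes):
    """Prototype = class mean; classify queries by negative squared distance."""
    protos = np.stack([support[labels == c].mean(0) for c in range(n_classes)])
    logits = -((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return logits.argmax(-1)

# 2 classes, 3 support examples each, 4-dim embeddings (an encoder is assumed)
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0, 1, (3, 4)), rng.normal(5, 1, (3, 4))])
labels = np.array([0, 0, 0, 1, 1, 1])
print(prototypical_predict(support, labels, rng.normal(5, 1, (2, 4)), 2))  # [1 1]
```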
2 code implementations • 3 Nov 2015 • Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel
We investigate the problem of learning representations that are invariant to certain nuisance or sensitive factors of variation in the data while retaining as much of the remaining information as possible.
Ranked #4 on Sentiment Analysis on Multi-Domain Sentiment Dataset
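One tool this line of work uses to penalize dependence between the latent code and the sensitive factor is Maximum Mean Discrepancy; a minimal RBF-kernel MMD sketch, with the bandwidth gamma as an illustrative choice.

```python
import numpy as np

def mmd_rbf(x, y, gamma=1.0):
    """Biased RBF-kernel MMD^2 between two samples."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```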
no code implementations • ICCV 2015 • Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov
One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.
4 code implementations • 19 Feb 2015 • Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, Ryan P. Adams
Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations.
Ranked #155 on Image Classification on CIFAR-100 (using extra training data)
3 code implementations • 10 Feb 2015 • Yujia Li, Kevin Swersky, Richard Zemel
We consider the problem of learning deep generative models from data.
no code implementations • 17 Dec 2014 • Yujia Li, Kevin Swersky, Richard Zemel
Different forms of representation learning can be derived from alternative definitions of unwanted bias, e.g., bias to particular tasks, domains, or irrelevant underlying data dimensions.
no code implementations • 14 Sep 2014 • Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, Michael A. Osborne
In practical Bayesian optimization, we must often search over structures with differing numbers of parameters.
1 code implementation • 16 Jun 2014 • Kevin Swersky, Jasper Snoek, Ryan Prescott Adams
In this paper we develop a dynamic form of Bayesian optimization for machine learning models with the goal of rapidly finding good hyperparameter settings.
1 code implementation • 5 Feb 2014 • Jasper Snoek, Kevin Swersky, Richard S. Zemel, Ryan P. Adams
Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.
1 code implementation • NeurIPS 2013 • Kevin Swersky, Jasper Snoek, Ryan P. Adams
We demonstrate the utility of this new acquisition function by utilizing a small dataset in order to explore hyperparameter settings for a large dataset.
Ranked #94 on Image Classification on STL-10
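The paper's acquisition is a cost-aware, multi-task construction; for orientation, here is the standard single-task expected improvement it extends (minimization convention), given a GP posterior at candidate points.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Standard single-task EI (minimization), from a GP posterior mean/std."""
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```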
2 code implementations • ICML 2013 • Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork
We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly).
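The group-fairness criterion in parentheses is easy to check directly; a minimal sketch for two groups.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Gap in positive-classification rates between two groups."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(y_pred, group))   # 0.75 vs 0.25 -> 0.5
```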
no code implementations • NeurIPS 2012 • Kevin Swersky, Ilya Sutskever, Daniel Tarlow, Richard S. Zemel, Ruslan R. Salakhutdinov, Ryan P. Adams
The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features.
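For reference, the binary RBM's free energy gives its unnormalized density; a minimal sketch with random weights standing in for a trained model.

```python
import numpy as np

def rbm_free_energy(v, W, b_v, b_h):
    """F(v) = -b_v.v - sum_j softplus(v.W_j + b_h_j), with p(v) ∝ exp(-F(v))."""
    return -(v @ b_v) - np.sum(np.logaddexp(0.0, v @ W + b_h))

rng = np.random.default_rng(0)
W, b_v, b_h = rng.normal(size=(6, 4)), np.zeros(6), np.zeros(4)
v = rng.integers(0, 2, 6).astype(float)
print(rbm_free_energy(v, W, b_v, b_h))
```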