Search Results for author: Kevin Swersky

Found 39 papers, 22 papers with code

Data-Driven Offline Optimization For Architecting Hardware Accelerators

1 code implementation ICLR 2022 Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine

An alternative paradigm is to use a "data-driven", offline approach that utilizes logged simulation data to architect hardware accelerators, without needing any further simulation.

Computer Architecture and Systems

Pre-training helps Bayesian optimization too

2 code implementations 16 Sep 2021 Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

Contrary to a common belief that Bayesian optimization (BO) is suited to optimizing black-box functions, it actually requires domain knowledge about the characteristics of those functions to deploy BO successfully.

Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks

1 code implementation 12 Feb 2021 Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra

In our theoretical analysis, we show that the common causes of the heterophily and oversmoothing problems, namely the relative degree of a node and its heterophily level, trigger the node representations in consecutive layers to "move" closer to the original decision boundary, which increases the misclassification rate of node labels under certain constraints.

Node Classification

Oops I Took A Gradient: Scalable Sampling for Discrete Distributions

1 code implementation 8 Feb 2021 Will Grathwohl, Kevin Swersky, Milad Hashemi, David Duvenaud, Chris J. Maddison

We propose a general and scalable approximate sampling strategy for probabilistic models with discrete variables.
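
To make that concrete, here is a minimal sketch of a gradient-informed flip proposal in the spirit of the paper's Gibbs-with-Gradients sampler, for binary variables. The function names, the logit tempering, and the example model are our assumptions, not the paper's exact recipe.

```python
import torch

def gwg_step(x, log_prob):
    """One gradient-informed Metropolis-Hastings step for binary x in {0,1}^d.

    The gradient of log_prob gives a first-order estimate of how much flipping
    each bit would change the log-probability; bits are proposed for flipping
    in proportion to that estimate.
    """
    x = x.detach().requires_grad_(True)
    logp_x = log_prob(x)
    grad, = torch.autograd.grad(logp_x, x)
    delta = (grad * (1.0 - 2.0 * x)).detach()   # approx. logp(flip_i(x)) - logp(x)
    q_fwd = torch.distributions.Categorical(logits=delta / 2.0)
    i = q_fwd.sample()

    x_new = x.detach().clone()
    x_new[i] = 1.0 - x_new[i]

    # Reverse-proposal probabilities, evaluated at the flipped state.
    x_new = x_new.requires_grad_(True)
    logp_new = log_prob(x_new)
    grad_new, = torch.autograd.grad(logp_new, x_new)
    delta_new = (grad_new * (1.0 - 2.0 * x_new)).detach()
    q_rev = torch.distributions.Categorical(logits=delta_new / 2.0)

    # Metropolis-Hastings accept/reject keeps the chain exact.
    log_alpha = (logp_new - logp_x).detach() + q_rev.log_prob(i) - q_fwd.log_prob(i)
    return x_new.detach() if torch.rand(()).log() < log_alpha else x.detach()

# Example target: a small Ising-like model, logp(x) = x^T W x + b^T x.
d = 16
W = torch.randn(d, d) * 0.1; W = (W + W.T) / 2
b = torch.randn(d) * 0.1
x = torch.bernoulli(torch.full((d,), 0.5))
for _ in range(100):
    x = gwg_step(x, lambda v: v @ W @ v + b @ v)
```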

Apollo: Transferable Architecture Exploration

no code implementations 2 Feb 2021 Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon

We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective values (improvements of up to 25%).

Human 3D keypoints via spatial uncertainty modeling

no code implementations 18 Dec 2020 Francis Williams, Or Litany, Avneesh Sud, Kevin Swersky, Andrea Tagliasacchi

We introduce a technique for 3D human keypoint estimation that directly models the notion of spatial uncertainty of a keypoint.

Learned Hardware/Software Co-Design of Neural Accelerators

no code implementations 5 Oct 2020 Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning.

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach

no code implementations ICML 2020 Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.

Fairness · Recommendation Systems

An Imitation Learning Approach for Cache Replacement

1 code implementation ICML 2020 Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

While directly applying Belady's algorithm is infeasible, since the future is unknown, we train a policy conditioned only on past accesses that accurately approximates Belady's even on diverse and complex access patterns; we call this approach Parrot.
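
As a rough illustration of that imitation setup (not the paper's actual architecture), a policy can score the lines currently resident in the cache and be trained with cross-entropy against the eviction choice Belady's oracle makes offline on the recorded trace:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvictionPolicy(nn.Module):
    """Scores each resident cache line from features of its past accesses."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, line_histories):
        # line_histories: (num_lines, seq_len, feat_dim) per-line access features.
        out, _ = self.rnn(line_histories)
        return self.score(out[:, -1]).squeeze(-1)   # one eviction score per line

def imitation_loss(policy, line_histories, belady_choice):
    """Cross-entropy against the line Belady's algorithm would evict,
    computed offline from the full access trace."""
    logits = policy(line_histories)
    return F.cross_entropy(logits.unsqueeze(0), belady_choice.view(1))
```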

Imitation Learning

Big Self-Supervised Models are Strong Semi-Supervised Learners

8 code implementations NeurIPS 2020 Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton

The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.
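
The distillation step lends itself to a compact sketch: the fine-tuned (teacher) network produces temperature-softened labels on unlabeled images, and the student is trained to match them. The temperature value below is a placeholder, not the paper's exact setting.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, tau=1.0):
    """Cross-entropy from the teacher's softened class distribution to the
    student's, computed on unlabeled examples."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```

In the paper the student can share the teacher's architecture (self-distillation) or be a smaller network for deployment.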

Self-Supervised Image Classification · Semi-Supervised Image Classification

Neural Execution Engines: Learning to Execute Subroutines

1 code implementation NeurIPS 2020 Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

A significant effort has been made to train neural networks that replicate algorithmic reasoning, but they often fail to learn the abstract concepts underlying these algorithms.

Learning to Execute

SentenceMIM: A Latent Variable Language Model

1 code implementation 18 Feb 2020 Micha Livne, Kevin Swersky, David J. Fleet

MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse.

Ranked #1 on Question Answering on YahooCQA (using extra training data)

Language Modelling · Question Answering +1

Neural Execution Engines

no code implementations ICLR 2020 Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi

Turing complete computation and reasoning are often regarded as necessary precursors to general intelligence.

MIM: Mutual Information Machine

1 code implementation 8 Oct 2019 Micha Livne, Kevin Swersky, David J. Fleet

Experiments show that MIM learns representations with high mutual information, consistent encoding and decoding distributions, effective latent clustering, and data log likelihood comparable to VAE, while avoiding posterior collapse.

High Mutual Information in Representation Learning with Symmetric Variational Inference

no code implementations 4 Oct 2019 Micha Livne, Kevin Swersky, David J. Fleet

We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework.
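
As a very loose sketch of that symmetric objective (assuming distribution-returning encoder/decoder modules, and with `marginal_x` standing in for an approximation to the data marginal), the loss penalizes the average of the encoding-path and decoding-path joint log-likelihoods:

```python
import torch

def mim_loss(x, encoder, decoder, prior_z, marginal_x):
    """Symmetric MIM-style objective (sketch): average the joint
    log-likelihood under the encoding factorization q(z|x)q(x) and the
    decoding factorization p(x|z)p(z), on samples from the encoder."""
    qz = encoder(x)                  # a torch.distributions object over z
    z = qz.rsample()
    px = decoder(z)                  # a torch.distributions object over x
    log_enc = qz.log_prob(z) + marginal_x.log_prob(x)   # log q(z|x) q(x)
    log_dec = px.log_prob(x) + prior_z.log_prob(z)      # log p(x|z) p(z)
    return -0.5 * (log_enc + log_dec).mean()
```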

Representation Learning · Variational Inference

Learning Execution through Neural Code Fusion

no code implementations ICLR 2020 Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi

In this work, we propose a new approach to use GNNs to learn fused representations of general source code and its execution.

Transfer Learning

Flexibly Fair Representation Learning by Disentanglement

no code implementations 6 Jun 2019 Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa A. Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes.

Disentanglement · Fairness +1

Learning Sparse Networks Using Targeted Dropout

2 code implementations 31 May 2019 Aidan N. Gomez, Ivan Zhang, Siddhartha Rao Kamalakara, Divyam Madaan, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.
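
A hedged sketch of the weight-level variant: the lowest-magnitude fraction gamma of weights are marked as targets, and each target is independently zeroed with rate alpha (the gamma/alpha names follow the paper; the rest is illustrative):

```python
import torch

def targeted_weight_dropout(w, gamma=0.5, alpha=0.5, training=True):
    """Zero out low-magnitude weights: the bottom-gamma fraction by |w|
    are targeted, and each targeted weight is dropped with probability
    alpha (no rescaling is applied in this sketch)."""
    if not training:
        return w
    k = int(gamma * w.numel())
    if k == 0:
        return w
    threshold = w.abs().flatten().kthvalue(k).values
    targeted = w.abs() <= threshold                   # drop candidates
    drop = targeted & (torch.rand_like(w) < alpha)    # stochastic mask
    return w * (~drop).float()
```

Because low-magnitude weights are repeatedly targeted, the criterion is self-reinforcing: importance concentrates in the surviving weights, which makes post-hoc pruning cheap.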

Network Pruning · Neural Network Compression

Graph Normalizing Flows

1 code implementation NeurIPS 2019 Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky

We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.
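
Reversibility can be obtained with a coupling-style message-passing layer: split node features into two halves and update one half using messages computed from the other, so the update inverts in closed form. The affine form below is an assumption in the spirit of the paper, not its exact layer.

```python
import torch
import torch.nn as nn

class GraphCoupling(nn.Module):
    """Reversible GNN layer: h = [h1, h2]; only h2 is updated, from a
    message aggregated over h1, so the inverse can recompute the message.
    Requires an even feature dimension."""
    def __init__(self, dim):
        super().__init__()
        self.scale = nn.Linear(dim // 2, dim // 2)
        self.shift = nn.Linear(dim // 2, dim // 2)

    def message(self, h1, adj):
        # Mean aggregation of the untouched half over the graph.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return adj @ h1 / deg

    def forward(self, h, adj):
        h1, h2 = h.chunk(2, dim=-1)
        m = self.message(h1, adj)
        h2 = h2 * torch.exp(self.scale(m)) + self.shift(m)  # affine update
        return torch.cat([h1, h2], dim=-1)

    def inverse(self, h, adj):
        h1, h2 = h.chunk(2, dim=-1)
        m = self.message(h1, adj)          # recomputable: h1 was unchanged
        h2 = (h2 - self.shift(m)) * torch.exp(-self.scale(m))
        return torch.cat([h1, h2], dim=-1)
```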

Neural Networks for Modeling Source Code Edits

no code implementations 4 Apr 2019 Rui Zhao, David Bieber, Kevin Swersky, Daniel Tarlow

In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files.

Targeted Dropout

1 code implementation NIPS Workshop CDNNRIA 2018 Aidan N. Gomez, Ivan Zhang, Kevin Swersky, Yarin Gal, Geoffrey E. Hinton

Neural networks are extremely flexible models due to their large number of parameters, which is beneficial for learning but also makes them highly redundant.

Learning Memory Access Patterns

no code implementations ICML 2018 Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance.
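
One way the paper frames prefetching is as sequence prediction over address deltas rather than raw addresses, which keeps the output space manageable. A minimal sketch (vocabulary construction and model sizes are illustrative):

```python
import torch
import torch.nn as nn

class DeltaPrefetcher(nn.Module):
    """LSTM that predicts the next memory-address delta as a class label,
    where deltas are bucketed into a vocabulary of frequent values."""
    def __init__(self, num_deltas, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_deltas, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_deltas)

    def forward(self, delta_ids):
        # delta_ids: (batch, seq) indices into the delta vocabulary.
        out, _ = self.lstm(self.embed(delta_ids))
        return self.head(out)   # logits over the next delta at each step
```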

Meta-Learning for Semi-Supervised Few-Shot Classification

8 code implementations ICLR 2018 Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel

To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes.

Classification · General Classification +1

An online sequence-to-sequence model for noisy speech recognition

no code implementations 16 Jun 2017 Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition

Learning Hard Alignments with Variational Inference

no code implementations 16 May 2017 Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Hard Attention · Image Captioning +4

Prototypical Networks for Few-shot Learning

40 code implementations NeurIPS 2017 Jake Snell, Kevin Swersky, Richard S. Zemel

We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class.
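
The method is compact enough to state in a few lines: embed the support set, average per class to form prototypes, and classify queries by a softmax over negative squared distances. A minimal episode-loss sketch (the embedding network `f` is left abstract):

```python
import torch
import torch.nn.functional as F

def episode_loss(f, support_x, support_y, query_x, query_y, n_classes):
    """One few-shot episode: prototypes are class means of embedded
    support examples; queries are classified by softmax over negative
    squared Euclidean distances to the prototypes."""
    z_s = f(support_x)                                  # (n_support, d)
    z_q = f(query_x)                                    # (n_query, d)
    protos = torch.stack([z_s[support_y == c].mean(0)   # (n_classes, d)
                          for c in range(n_classes)])
    logits = -torch.cdist(z_q, protos) ** 2             # (n_query, n_classes)
    return F.cross_entropy(logits, query_y)
```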

Few-Shot Image Classification · General Classification +2

The Variational Fair Autoencoder

2 code implementations 3 Nov 2015 Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, Richard Zemel

We investigate the problem of learning representations that are invariant to certain nuisance or sensitive factors of variation in the data while retaining as much of the remaining information as possible.
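
In the paper, invariance to the sensitive factor s is further encouraged with a Maximum Mean Discrepancy penalty between the latent codes of the groups; a minimal sketch of that penalty (the RBF bandwidth is a placeholder):

```python
import torch

def rbf_mmd2(z0, z1, sigma=1.0):
    """Squared MMD with an RBF kernel between latent codes for s=0 and
    s=1; pushing this toward zero discourages the representation from
    encoding the sensitive factor."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(z0, z0).mean() + k(z1, z1).mean() - 2 * k(z0, z1).mean()
```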

General Classification · Sentiment Analysis

Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions

no code implementations ICCV 2015 Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov

One of the main challenges in Zero-Shot Learning of visual categories is gathering semantic attributes to accompany images.

Zero-Shot Learning

Generative Moment Matching Networks

3 code implementations 10 Feb 2015 Yujia Li, Kevin Swersky, Richard Zemel

We consider the problem of learning deep generative models from data.
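
Training in a generative moment matching network reduces to minimizing an MMD two-sample statistic between minibatches of generated and real samples, with no discriminator. A hedged sketch (the generator architecture and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

def rbf_mmd2(a, b, sigma=1.0):
    """Squared MMD with an RBF kernel between two minibatches."""
    k = lambda x, y: torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

gen = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(gen.parameters(), lr=1e-4)

def gmmn_step(real_batch):
    """One generator update: make a minibatch of samples indistinguishable
    from a minibatch of data under the MMD statistic."""
    fake = gen(torch.randn(real_batch.size(0), 32))
    loss = rbf_mmd2(fake, real_batch)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The "Two-sample testing" tag below reflects that this loss is exactly a kernel two-sample test statistic.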

Two-sample testing

Learning unbiased features

no code implementations 17 Dec 2014 Yujia Li, Kevin Swersky, Richard Zemel

Different forms of representation learning can be derived from alternative definitions of unwanted bias, e.g., bias toward particular tasks, domains, or irrelevant underlying data dimensions.

Domain Adaptation · Representation Learning +1

Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces

no code implementations 14 Sep 2014 Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, Michael A. Osborne

In practical Bayesian optimization, we must often search over structures with differing numbers of parameters.

Freeze-Thaw Bayesian Optimization

no code implementations 16 Jun 2014 Kevin Swersky, Jasper Snoek, Ryan Prescott Adams

In this paper we develop a dynamic form of Bayesian optimization for machine learning models with the goal of rapidly finding good hyperparameter settings.

Input Warping for Bayesian Optimization of Non-stationary Functions

1 code implementation 5 Feb 2014 Jasper Snoek, Kevin Swersky, Richard S. Zemel, Ryan P. Adams

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions.
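
The paper's mechanism is to pass each (normalized) input dimension through a bijective warping, e.g. a Beta CDF whose shape parameters are inferred, before applying a standard stationary kernel. A sketch with fixed shape parameters for illustration:

```python
import numpy as np
from scipy.stats import beta

def warp_inputs(X, a, b):
    """Warp each dimension of X (assumed scaled to [0,1]) through a
    Beta(a[d], b[d]) CDF; a stationary kernel on the warped inputs then
    behaves non-stationarily on the original inputs."""
    return np.stack([beta.cdf(X[:, d], a[d], b[d])
                     for d in range(X.shape[1])], axis=1)

def rbf_kernel(X1, X2, lengthscale=0.2):
    # Standard squared-exponential kernel on the (warped) inputs.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

# Example: kernel matrix on warped 2-D inputs.
X = np.random.rand(10, 2)
Xw = warp_inputs(X, a=[2.0, 0.5], b=[5.0, 0.5])
K = rbf_kernel(Xw, Xw)
```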

Gaussian Processes

Multi-Task Bayesian Optimization

1 code implementation NeurIPS 2013 Kevin Swersky, Jasper Snoek, Ryan P. Adams

We demonstrate the utility of this new acquisition function by using a small dataset to explore hyperparameter settings for a large dataset.

Gaussian Processes · Image Classification

Cardinality Restricted Boltzmann Machines

no code implementations NeurIPS 2012 Kevin Swersky, Ilya Sutskever, Daniel Tarlow, Richard S. Zemel, Ruslan R. Salakhutdinov, Ryan P. Adams

The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features.
