Search Results for author: Samy Bengio

Found 74 papers, 43 papers with code

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations 24 Oct 2023 Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.

Adaptivity and Modularity for Efficient Generalization Over Task Complexity

no code implementations 13 Oct 2023 Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).

Retrieval

Boolformer: Symbolic Regression of Logic Functions with Transformers

1 code implementation 21 Sep 2023 Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé

In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.

Binary Classification regression +1

Transformers learn through gradual rank increase

no code implementations NeurIPS 2023 Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind

Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.

Incremental Learning

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

1 code implementation 30 Jan 2023 Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk

This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization.

Out-of-Distribution Generalization

Continuous Soft Pseudo-Labeling in ASR

no code implementations 11 Nov 2022 Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio

Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.

speech-recognition Speech Recognition

Continuous Pseudo-Labeling from the Start

no code implementations 17 Oct 2022 Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko

Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization

2 code implementations 27 Jul 2021 Chiyuan Zhang, Maithra Raghu, Jon Kleinberg, Samy Bengio

In PVR, this is done by having one part of the task input act as a pointer, giving instructions on a different input location, which forms the output.

Memorization Retrieval

Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss

1 code implementation NeurIPS 2021 Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio

Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics.

Data Augmentation via Structured Adversarial Perturbations

no code implementations 5 Nov 2020 Calvin Luo, Hossein Mobahi, Samy Bengio

The advantage of adversarial augmentation is that it replaces sampling with the use of a single, calculated perturbation that maximally increases the loss.

Data Augmentation

Characterising Bias in Compressed Models

no code implementations 6 Oct 2020 Sara Hooker, Nyalleng Moorosi, Gregory Clark, Samy Bengio, Emily Denton

However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE).

Fairness Quantization

Auto Completion of User Interface Layout Design Using Transformer-Based Tree Decoders

no code implementations 14 Jan 2020 Yang Li, Julien Amelot, Xin Zhou, Samy Bengio, Si Si

While we focus on interface layout prediction, our model can be generally applicable for other layout prediction problems that involve tree structures and 2-dimensional placements.

Layout Design

Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks

no code implementations 25 Sep 2019 Yijie Guo, Jongwook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee

We propose a new method of learning a trajectory-conditioned policy to imitate diverse trajectories from the agent's own past experiences and show that such self-imitation helps avoid myopic behavior and increases the chance of finding a globally optimal solution for hard-exploration tasks, especially when there are misleading rewards.

Imitation Learning

Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

no code implementations NeurIPS 2020 Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee

Reinforcement learning with sparse rewards is challenging because an agent can rarely obtain non-zero rewards and hence, gradient-based optimization of parameterized policies can be incremental and slow.

Diversity Efficient Exploration +2

Parallel Scheduled Sampling

no code implementations 11 Jun 2019 Daniel Duckworth, Arvind Neelakantan, Ben Goodrich, Lukasz Kaiser, Samy Bengio

Experimentally, we find the proposed technique leads to equivalent or better performance on image generation, summarization, dialog generation, and translation compared to teacher-forced training.

Image Generation Response Generation

A Closed-Form Learned Pooling for Deep Classification Networks

no code implementations 10 Jun 2019 Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio

This operator can learn a strict super-set of what can be learned by average pooling or convolutions.

Classification Foveation +2

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

6 code implementations KDD 2019 Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh

Furthermore, Cluster-GCN allows us to train much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy---using a 5-layer Cluster-GCN, we achieve state-of-the-art test F1 score 99.36 on the PPI dataset, while the previous best result was 98.71 by [16].

Clustering Computational Efficiency +4

Neural Networks Trained on Natural Scenes Exhibit Gestalt Closure

1 code implementation 4 Mar 2019 Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer

The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.

Image Classification

Transfusion: Understanding Transfer Learning for Medical Imaging

2 code implementations NeurIPS 2019 Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio

Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse.

Image Classification Transfer Learning

Identity Crisis: Memorization and Generalization under Extreme Overparameterization

no code implementations ICLR 2020 Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer

We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task.

Memorization

Are All Layers Created Equal?

2 code implementations ICML Workshop Deep_Phenomen 2019 Chiyuan Zhang, Samy Bengio, Yoram Singer

Morally, layers of large deep neural networks can be categorized as either "robust" or "critical".

Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

no code implementations 29 Jan 2019 Vighnesh Birodkar, Hossein Mobahi, Samy Bengio

Large datasets have been crucial to the success of deep learning models in recent years, which keep performing better as they are trained with more labelled data.

General Classification Image Classification +1

Unsupervised speech representation learning using WaveNet autoencoders

5 code implementations 25 Jan 2019 Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms.

Acoustic Unit Discovery Decoder +3

You Look Twice: GaterNet for Dynamic Filter Selection in CNNs

no code implementations CVPR 2019 Zhourong Chen, Yang Li, Samy Bengio, Si Si

The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing.

Area Attention

1 code implementation ICLR 2019 Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences.

Image Captioning Machine Translation +1

Predicting the Generalization Gap in Deep Networks with Margin Distributions

2 code implementations ICLR 2019 Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio

In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap.

Insights on representational similarity in neural networks with canonical correlation

2 code implementations NeurIPS 2018 Ari S. Morcos, Maithra Raghu, Samy Bengio

Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training.

A Study on Overfitting in Deep Reinforcement Learning

1 code implementation 18 Apr 2018 Chiyuan Zhang, Oriol Vinyals, Remi Munos, Samy Bengio

We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.

Inductive Bias reinforcement-learning +1

Adversarial Attacks and Defences Competition

1 code implementation 31 Mar 2018 Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe

To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.

BIG-bench Machine Learning

Tensor2Tensor for Neural Machine Translation

14 code implementations WS 2018 Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

Machine Translation Translation

Fast Decoding in Sequence Models using Discrete Latent Variables

no code implementations ICML 2018 Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models.

Machine Translation Translation

Discrete Autoencoders for Sequence Models

2 code implementations ICLR 2018 Łukasz Kaiser, Samy Bengio

We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space.

Language Modelling Machine Translation +1

On Using Backpropagation for Speech Texture Generation and Voice Conversion

no code implementations 22 Dec 2017 Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio

Inspired by recent work on neural network image generation which relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source and target utterances.

Image Generation speech-recognition +4

Time-Dependent Representation for Neural Event Sequence Prediction

no code implementations ICLR 2018 Yang Li, Nan Du, Samy Bengio

Because neural sequence models such as RNNs are more amenable to handling token-like input, we propose two methods for time-dependent event representation, based on the intuition of how time is tokenized in everyday life and on previous work on embedding contextualization.

Sentence

Device Placement Optimization with Reinforcement Learning

1 code implementation ICML 2017 Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.

Language Modelling Machine Translation +3

N-gram Language Modeling using Recurrent Neural Network Estimation

no code implementations 31 Mar 2017 Ciprian Chelba, Mohammad Norouzi, Samy Bengio

Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the $n$-gram state when compared with feed-forward and vanilla RNN models.

Language Modelling Sentence

Sharp Minima Can Generalize For Deep Nets

no code implementations ICML 2017 Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice.

Revisiting Distributed Synchronous SGD

no code implementations 19 Feb 2017 Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.

Stochastic Optimization

Context-aware Captions from Context-agnostic Supervision

1 code implementation CVPR 2017 Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation).

Image Captioning Language Modelling

An Online Sequence-to-Sequence Model Using Partial Conditioning

1 code implementation NeurIPS 2016 Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

Neural Combinatorial Optimization with Reinforcement Learning

10 code implementations 29 Nov 2016 Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio

Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.

Combinatorial Optimization reinforcement-learning +2

Understanding deep learning requires rethinking generalization

7 code implementations 10 Nov 2016 Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance.

Image Classification

Adversarial Machine Learning at Scale

7 code implementations 4 Nov 2016 Alexey Kurakin, Ian Goodfellow, Samy Bengio

Adversarial examples are malicious inputs designed to fool machine learning models.

BIG-bench Machine Learning

Can Active Memory Replace Attention?

2 code implementations NeurIPS 2016 Łukasz Kaiser, Samy Bengio

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years.

Image Captioning Machine Translation +2

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

19 code implementations 21 Sep 2016 Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.

Image Captioning Sentence +1

Adversarial examples in the physical world

6 code implementations 8 Jul 2016 Alexey Kurakin, Ian Goodfellow, Samy Bengio

Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier.

BIG-bench Machine Learning

Density estimation using Real NVP

35 code implementations 27 May 2016 Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio

Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning.

Ranked #24 on Image Generation on ImageNet 32x32 (bpd metric)

BIG-bench Machine Learning Density Estimation +1

Revisiting Distributed Synchronous SGD

4 code implementations 4 Apr 2016 Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.

Stochastic Optimization

Generating Sentences from a Continuous Space

17 code implementations CONLL 2016 Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.

Language Modelling Sentence

Order Matters: Sequence to sequence for sets

9 code implementations 19 Nov 2015 Oriol Vinyals, Samy Bengio, Manjunath Kudlur

Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks.

A Neural Transducer

no code implementations 16 Nov 2015 Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio

However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.

End-to-End Text-Dependent Speaker Verification

3 code implementations 27 Sep 2015 Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer

In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time.

Text-Dependent Speaker Verification

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

9 code implementations NeurIPS 2015 Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.

Constituency Parsing Image Captioning +2

Zero-Shot Learning by Convex Combination of Semantic Embeddings

2 code implementations 19 Dec 2013 Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.

Multi-label zero-shot learning

Using Web Co-occurrence Statistics for Improving Image Categorization

no code implementations 19 Dec 2013 Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.

Common Sense Reasoning Image Categorization +1

Large-Scale Music Annotation and Retrieval: Learning to Rank in Joint Semantic Spaces

no code implementations 26 May 2011 Jason Weston, Samy Bengio, Philippe Hamel

Music prediction tasks range from predicting tags given a song or clip of audio, to predicting the name of the artist, to predicting related songs given a song, clip, artist name or tag.

Learning-To-Rank Multi-Task Learning +2

Label Embedding Trees for Large Multi-Class Tasks

no code implementations NeurIPS 2010 Samy Bengio, Jason Weston, David Grangier

Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible.

General Classification Multi-class Classification

An Online Algorithm for Large Scale Image Similarity Learning

no code implementations NeurIPS 2009 Gal Chechik, Uri Shalit, Varun Sharma, Samy Bengio

We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of non-zero features.
