1 code implementation • 10 Jun 2024 • Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi
Can Transformers predict new syllogisms by composing established ones?
no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
1 code implementation • 15 Oct 2023 • Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
We investigate the capabilities of transformer models on relational reasoning tasks.
no code implementations • 13 Oct 2023 • Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio
We investigate how the use of a mechanism for adaptive and modular computation in transformers facilitates the learning of tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph).
1 code implementation • 21 Sep 2023 • Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé
In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.
no code implementations • NeurIPS 2023 • Enric Boix-Adsera, Etai Littwin, Emmanuel Abbe, Samy Bengio, Joshua Susskind
Our experiments support the theory and also show that the phenomenon can occur in practice without the simplifying assumptions.
1 code implementation • 30 Jan 2023 • Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Kevin Rizk
This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization.
no code implementations • 11 Nov 2022 • Tatiana Likhomanenko, Ronan Collobert, Navdeep Jaitly, Samy Bengio
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition.
no code implementations • 17 Oct 2022 • Dan Berrebbi, Ronan Collobert, Samy Bengio, Navdeep Jaitly, Tatiana Likhomanenko
Nevertheless, these approaches still rely on bootstrapping the ST using an initial supervised learning phase where the model is trained on labeled data alone.
Automatic Speech Recognition (ASR)
1 code implementation • 26 May 2022 • Emmanuel Abbe, Samy Bengio, Elisabetta Cornacchia, Jon Kleinberg, Aryo Lotfi, Maithra Raghu, Chiyuan Zhang
More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks.
2 code implementations • 27 Jul 2021 • Chiyuan Zhang, Maithra Raghu, Jon Kleinberg, Samy Bengio
In PVR, this is done by having one part of the task input act as a pointer, giving instructions on a different input location, which forms the output.
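The pointer mechanism described above can be illustrated with a tiny data generator; this is a hedged sketch, not the paper's exact task family (the helper name and defaults are ours):

```python
import random

def make_pvr_example(num_values=10, vocab=10, rng=random):
    """Generate one Pointer Value Retrieval (PVR)-style example.

    The first token acts as a pointer into the remaining positions;
    the label is the value stored at the pointed-to position.
    """
    values = [rng.randrange(vocab) for _ in range(num_values)]
    pointer = rng.randrange(num_values)   # which position the output comes from
    x = [pointer] + values                # input: pointer token, then the values
    y = values[pointer]                   # output: the pointed-to value
    return x, y

x, y = make_pvr_example()
assert y == x[1 + x[0]]                   # the label is recoverable from the input
```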
1 code implementation • NeurIPS 2021 • Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio
Attentional mechanisms are order-invariant.
1 code implementation • NeurIPS 2021 • Michael L. Iuzzolino, Michael C. Mozer, Samy Bengio
Although deep feedforward neural networks share some characteristics with the primate visual system, a key distinction is their dynamics.
no code implementations • 14 Dec 2020 • Yiding Jiang, Pierre Foret, Scott Yak, Daniel M. Roy, Hossein Mobahi, Gintare Karolina Dziugaite, Samy Bengio, Suriya Gunasekar, Isabelle Guyon, Behnam Neyshabur
Understanding generalization is arguably one of the most important open questions in deep learning.
no code implementations • 5 Nov 2020 • Calvin Luo, Hossein Mobahi, Samy Bengio
The advantage of adversarial augmentation is that it replaces sampling with the use of a single, calculated perturbation that maximally increases the loss.
no code implementations • 6 Oct 2020 • Sara Hooker, Nyalleng Moorosi, Gregory Clark, Samy Bengio, Emily Denton
However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE).
no code implementations • 14 Jan 2020 • Yang Li, Julien Amelot, Xin Zhou, Samy Bengio, Si Si
While we focus on interface layout prediction, our model is generally applicable to other layout prediction problems that involve tree structures and 2-dimensional placements.
3 code implementations • ICLR 2020 • Yiding Jiang, Behnam Neyshabur, Hossein Mobahi, Dilip Krishnan, Samy Bengio
We present the first large scale study of generalization in deep networks.
no code implementations • 25 Sep 2019 • Yijie Guo, Jongwook Choi, Marcin Moczulski, Samy Bengio, Mohammad Norouzi, Honglak Lee
We propose a new method of learning a trajectory-conditioned policy to imitate diverse trajectories from the agent's own past experiences and show that such self-imitation helps avoid myopic behavior and increases the chance of finding a globally optimal solution for hard-exploration tasks, especially when there are misleading rewards.
2 code implementations • ICLR 2020 • Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals
We conclude with a discussion of the rapid learning vs feature reuse question for meta-learning algorithms more broadly.
no code implementations • NeurIPS 2020 • Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee
Reinforcement learning with sparse rewards is challenging because an agent can rarely obtain non-zero rewards and hence, gradient-based optimization of parameterized policies can be incremental and slow.
no code implementations • 11 Jun 2019 • Daniel Duckworth, Arvind Neelakantan, Ben Goodrich, Lukasz Kaiser, Samy Bengio
Experimentally, we find the proposed technique leads to equivalent or better performance on image generation, summarization, dialog generation, and translation compared to teacher-forced training.
no code implementations • 10 Jun 2019 • Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio
This operator can learn a strict superset of what can be learned by average pooling or convolutions.
6 code implementations • KDD 2019 • Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh
Furthermore, Cluster-GCN allows us to train much deeper GCNs without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16].
Ranked #1 on Node Classification on Amazon2M
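The batching idea behind Cluster-GCN can be sketched in a few lines; this is a simplified illustration (the paper partitions the graph with a clustering algorithm such as METIS, which we assume has already produced the node-to-cluster assignment):

```python
import numpy as np

def cluster_minibatch(adj, node_clusters, cluster_id):
    """One Cluster-GCN-style mini-batch: restrict the graph to the
    induced subgraph of a single precomputed cluster, so each GCN
    layer aggregates only within that block and the neighborhood
    expansion stays bounded."""
    nodes = np.flatnonzero(node_clusters == cluster_id)
    sub_adj = adj[np.ix_(nodes, nodes)]   # within-cluster adjacency block
    return nodes, sub_adj

# Toy graph: 4 nodes split into two clusters {0, 2} and {1, 3}.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
nodes, sub_adj = cluster_minibatch(adj, np.array([0, 1, 0, 1]), 0)
```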
1 code implementation • 4 Mar 2019 • Been Kim, Emily Reif, Martin Wattenberg, Samy Bengio, Michael C. Mozer
The Gestalt laws of perceptual organization, which describe how visual elements in an image are grouped and interpreted, have traditionally been thought of as innate despite their ecological validity.
2 code implementations • NeurIPS 2019 • Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio
Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse.
no code implementations • ICLR 2020 • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer
We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task.
2 code implementations • ICML Workshop Deep_Phenomen 2019 • Chiyuan Zhang, Samy Bengio, Yoram Singer
Morally, layers of large deep neural networks can be categorized as either "robust" or "critical".
no code implementations • 29 Jan 2019 • Vighnesh Birodkar, Hossein Mobahi, Samy Bengio
Large datasets have been crucial to the success of deep learning models in recent years, and these models keep performing better as they are trained on more labelled data.
5 code implementations • 25 Jan 2019 • Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord
We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms.
no code implementations • CVPR 2019 • Zhourong Chen, Yang Li, Samy Bengio, Si Si
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing.
1 code implementation • NeurIPS 2018 • Lajanugen Logeswaran, Honglak Lee, Samy Bengio
We propose an adversarial loss to enforce generated samples to be attribute compatible and realistic.
1 code implementation • ICLR 2019 • Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences.
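A minimal 1D version of this idea can be sketched as follows, assuming (as in the paper's simplest variant) that an area's key is the mean of its item keys and its value is the sum of its item values; the function name and shapes are ours:

```python
import numpy as np

def area_attention_1d(query, keys, values, max_area=3):
    """Sketch of 1D area attention: attend over all contiguous spans
    (areas) of up to `max_area` adjacent items, not just single items."""
    n, d = keys.shape
    area_keys, area_values = [], []
    for width in range(1, max_area + 1):
        for start in range(n - width + 1):
            area_keys.append(keys[start:start + width].mean(axis=0))   # mean key
            area_values.append(values[start:start + width].sum(axis=0))  # summed value
    area_keys = np.stack(area_keys)
    area_values = np.stack(area_values)
    scores = area_keys @ query / np.sqrt(d)   # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over all candidate areas
    return weights @ area_values
```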
2 code implementations • ICLR 2019 • Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio
In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap.
2 code implementations • NeurIPS 2018 • Ari S. Morcos, Maithra Raghu, Samy Bengio
Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training.
1 code implementation • 18 Apr 2018 • Chiyuan Zhang, Oriol Vinyals, Remi Munos, Samy Bengio
We conclude with a general discussion on overfitting in RL and a study of the generalization behaviors from the perspective of inductive bias.
1 code implementation • 31 Mar 2018 • Alexey Kurakin, Ian Goodfellow, Samy Bengio, Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu, Xiaolin Hu, Cihang Xie, Jian-Yu Wang, Zhishuai Zhang, Zhou Ren, Alan Yuille, Sangxia Huang, Yao Zhao, Yuzhe Zhao, Zhonglin Han, Junjiajia Long, Yerkebulan Berdibekov, Takuya Akiba, Seiya Tokui, Motoki Abe
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them.
14 code implementations • WS 2018 • Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit
Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.
2 code implementations • NeurIPS 2018 • Gamaleldin F. Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio
We present a formulation of deep learning that aims at producing a large margin classifier.
no code implementations • ICML 2018 • Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer
Finally, we evaluate our model end-to-end on the task of neural machine translation, where it is an order of magnitude faster at decoding than comparable autoregressive models.
2 code implementations • ICLR 2018 • Łukasz Kaiser, Samy Bengio
We propose to improve the representation in sequence models by augmenting current approaches with an autoencoder that is forced to compress the sequence through an intermediate discrete latent space.
no code implementations • 22 Dec 2017 • Jan Chorowski, Ron J. Weiss, Rif A. Saurous, Samy Bengio
Inspired by recent work on neural network image generation that relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source and target utterances.
no code implementations • ICLR 2018 • Yang Li, Nan Du, Samy Bengio
Because neural sequence models such as RNNs are more amenable to token-like input, we propose two methods for time-dependent event representation, based on intuitions about how time is tokenized in everyday life and on previous work on embedding contextualization.
1 code implementation • ICML 2017 • Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean
Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.
no code implementations • 31 Mar 2017 • Ciprian Chelba, Mohammad Norouzi, Samy Bengio
Experiments on a small corpus (UPenn Treebank, one million words of training data and 10k vocabulary) have found the LSTM cell with dropout to be the best model for encoding the $n$-gram state when compared with feed-forward and vanilla RNN models.
30 code implementations • 29 Mar 2017 • Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module.
Ranked #5 on Speech Synthesis on North American English
no code implementations • ICML 2017 • Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio
Despite their overwhelming capacity to overfit, deep learning architectures tend to generalize relatively well to unseen data, allowing them to be deployed in practice.
2 code implementations • 9 Mar 2017 • Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio
We present a large-scale life-long memory module for use in deep learning.
no code implementations • 19 Feb 2017 • Xinghao Pan, Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.
1 code implementation • CVPR 2017 • Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik
We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation).
1 code implementation • NeurIPS 2016 • Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
10 code implementations • 29 Nov 2016 • Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, Samy Bengio
Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes.
7 code implementations • 10 Nov 2016 • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance.
7 code implementations • 4 Nov 2016 • Alexey Kurakin, Ian Goodfellow, Samy Bengio
Adversarial examples are malicious inputs designed to fool machine learning models.
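The canonical single-step attack from this line of work, the fast gradient sign method (FGSM), can be sketched on a toy logistic model; the weights and epsilon below are arbitrary illustrative values:

```python
import numpy as np

def fgsm_perturb(x, grad_wrt_x, eps=0.1):
    """Fast gradient sign method (FGSM): a one-step adversarial
    perturbation that moves each input coordinate by eps in the
    direction that increases the loss."""
    return x + eps * np.sign(grad_wrt_x)

# Toy linear logistic model: for label y=1, the cross-entropy loss
# gradient with respect to x is (sigmoid(w.x) - 1) * w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
p = 1.0 / (1.0 + np.exp(-w @ x))
grad = (p - 1.0) * w                      # gradient of the loss w.r.t. x
x_adv = fgsm_perturb(x, grad, eps=0.1)
# The adversarial input has strictly higher loss than the original.
```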
2 code implementations • NeurIPS 2016 • Łukasz Kaiser, Samy Bengio
Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years.
Ranked #54 on Machine Translation on WMT2014 English-French
19 code implementations • 21 Sep 2016 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing.
no code implementations • NeurIPS 2016 • Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans
A key problem in structured output prediction is direct optimization of the task reward function that matters for test evaluation.
6 code implementations • 8 Jul 2016 • Alexey Kurakin, Ian Goodfellow, Samy Bengio
Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier.
35 code implementations • 27 May 2016 • Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio
Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning.
Ranked #24 on Image Generation on ImageNet 32x32 (bpd metric)
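The building block of this model (Real NVP), the affine coupling layer, is easy to sketch; the toy scale and shift functions below are hypothetical stand-ins for the learned networks:

```python
import numpy as np

def coupling_forward(x, s_fn, t_fn, d):
    """Affine coupling layer: the first d dimensions pass through
    unchanged and parameterize an invertible affine transform of the
    remaining dimensions."""
    x1, x2 = x[:d], x[d:]
    y2 = x2 * np.exp(s_fn(x1)) + t_fn(x1)
    return np.concatenate([x1, y2])

def coupling_inverse(y, s_fn, t_fn, d):
    """Exact inverse of coupling_forward."""
    y1, y2 = y[:d], y[d:]
    x2 = (y2 - t_fn(y1)) * np.exp(-s_fn(y1))
    return np.concatenate([y1, x2])

# Hypothetical scale/shift functions (fixed toys in place of networks).
s = lambda h: np.tanh(h)      # log-scale
t = lambda h: 0.5 * h         # shift

x = np.array([0.3, -1.2, 0.7, 2.0])
y = coupling_forward(x, s, t, d=2)
x_back = coupling_inverse(y, s, t, d=2)
assert np.allclose(x, x_back)   # the layer is exactly invertible
```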
4 code implementations • 4 Apr 2016 • Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.
17 code implementations • CONLL 2016 • Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.
9 code implementations • 19 Nov 2015 • Oriol Vinyals, Samy Bengio, Manjunath Kudlur
Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks.
no code implementations • 16 Nov 2015 • Navdeep Jaitly, David Sussillo, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, Samy Bengio
However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences.
3 code implementations • 27 Sep 2015 • Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer
In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time.
9 code implementations • NeurIPS 2015 • Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer
Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning.
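The core of the scheduled sampling approach is a decaying probability of feeding the ground-truth token instead of the model's own previous prediction; a sketch using an inverse-sigmoid decay of the kind the paper considers (the constant k is an illustrative choice):

```python
import math
import random

def inverse_sigmoid_schedule(step, k=100.0):
    """Probability of feeding the ground-truth token at a given training
    step, decayed with an inverse-sigmoid schedule:
    eps_step = k / (k + exp(step / k))."""
    return k / (k + math.exp(step / k))

def next_input(ground_truth_token, model_token, step, rng=random):
    """Scheduled sampling: with probability eps feed the ground truth,
    otherwise feed the model's own previous prediction."""
    eps = inverse_sigmoid_schedule(step)
    return ground_truth_token if rng.random() < eps else model_token
```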
no code implementations • CVPR 2015 • Vignesh Ramanathan, Cong-Cong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Charles Rosenberg, Li Fei-Fei
Human actions capture a wide variety of interactions between people and objects.
74 code implementations • CVPR 2015 • Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.
Ranked #3 on Image Retrieval with Multi-Modal Query on MIT-States
2 code implementations • 19 Dec 2013 • Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.
Ranked #8 on Multi-label zero-shot learning on Open Images V4
no code implementations • 19 Dec 2013 • Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer
Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.
no code implementations • NeurIPS 2013 • Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov
Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories.
Ranked #14 on Zero-Shot Action Recognition on Kinetics
no code implementations • 26 May 2011 • Jason Weston, Samy Bengio, Philippe Hamel
Music prediction tasks range from predicting tags given a song or audio clip, to predicting the artist's name, to predicting related songs given a song, clip, artist name or tag.
no code implementations • NeurIPS 2010 • Samy Bengio, Jason Weston, David Grangier
Multi-class classification becomes challenging at test time when the number of classes is very large and testing against every possible class can become computationally infeasible.
no code implementations • NeurIPS 2009 • Gal Chechik, Uri Shalit, Varun Sharma, Samy Bengio
We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of non-zero features.
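A single OASIS update can be sketched as a passive-aggressive step on a triplet hinge loss over a bilinear similarity; this is an illustrative reconstruction with our own variable names:

```python
import numpy as np

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive OASIS-style update on the bilinear
    similarity S_W(a, b) = a @ W @ b, given a triplet of a query p,
    a relevant item p_pos, and an irrelevant item p_neg."""
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)  # triplet hinge loss
    if loss > 0.0:
        V = np.outer(p, p_pos - p_neg)          # gradient of the hinge term
        tau = min(C, loss / (V * V).sum())      # aggressiveness-capped step size
        W = W + tau * V
    return W
```

Because the update touches only the outer product of the (typically sparse) feature vectors, each step stays cheap, which is what lets the method scale linearly.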
no code implementations • NeurIPS 2009 • Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Bag-of-words document representations are often used in text, image and video processing.