Search Results for author: Arthur Szlam

Found 62 papers, 27 papers with code

Fast Adaptation to New Environments via Policy-Dynamics Value Functions

no code implementations ICML 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.

Beyond Goldfish Memory: Long-Term Open-Domain Conversation

no code implementations 15 Jul 2021 Jing Xu, Arthur Szlam, Jason Weston

Despite recent improvements in open-domain dialogue models, state-of-the-art models are trained and evaluated on short conversations with little context.

Hash Layers For Large Sparse Models

no code implementations 8 Jun 2021 Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling
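The routing idea in the excerpt above can be sketched minimally: a fixed hash of the token id picks which expert feed-forward block processes each position, so no routing parameters need to be learned. The sketch below is a toy NumPy illustration under assumptions of my own (a `tok % K` hash and tiny random expert weights stand in for the paper's actual hash function and Transformer layers):

```python
import numpy as np

def hash_layer(token_ids, x, experts):
    """Route each position's hidden vector through one expert FFN,
    chosen by a fixed hash of the token id (no learned router)."""
    k = len(experts)
    out = np.empty_like(x)
    for i, tok in enumerate(token_ids):
        W1, W2 = experts[tok % k]       # fixed hash: token id mod K
        h = np.maximum(x[i] @ W1, 0.0)  # ReLU feed-forward expert
        out[i] = h @ W2
    return out

rng = np.random.default_rng(0)
d, d_ff, K = 8, 16, 4
experts = [(rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
           for _ in range(K)]
tokens = np.array([3, 6, 3, 11])        # positions 0 and 2 hash to the same expert
x = rng.normal(size=(4, d))
y = hash_layer(tokens, x, experts)
```

Because the hash is deterministic, every occurrence of a token id always hits the same parameters, which is what makes the layer trainable without a router.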

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation 13 May 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Language Modelling

droidlet: modular, heterogenous, multi-modal agents

1 code implementation 25 Jan 2021 Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.

Not All Memories are Created Equal: Learning to Expire

1 code implementation 1 Jan 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state-of-the-art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

Linguistic calibration through metacognition: aligning dialogue agent responses with expected correctness

no code implementations 30 Dec 2020 Sabrina J. Mielke, Arthur Szlam, Y-Lan Boureau, Emily Dinan

Open-domain dialogue agents have vastly improved, but still confidently hallucinate knowledge or express doubt when asked straightforward questions.

Few-shot Sequence Learning with Transformers

no code implementations 17 Dec 2020 Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning
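The on-the-fly optimization described in the excerpt above can be illustrated with a deliberately tiny stand-in: a frozen linear "model" whose only adaptable parameter is an additive task embedding, fitted by gradient descent on a handful of labeled examples. Everything here (the linear model, mean-pooled features, learning rate) is a hypothetical simplification of my own, not the paper's Transformer setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    ez = np.exp(z)
    return ez / ez.sum()

def nll(examples, labels, W, e):
    """Average cross-entropy of the frozen model with task embedding e."""
    return float(np.mean([-np.log(softmax((x + e) @ W)[y] + 1e-12)
                          for x, y in zip(examples, labels)]))

def fit_task_embedding(examples, labels, W, steps=500, lr=0.05):
    """Gradient descent on the task-token embedding only; W stays frozen."""
    e = np.zeros(W.shape[0])
    for _ in range(steps):
        grad = np.zeros_like(e)
        for x, y in zip(examples, labels):
            p = softmax((x + e) @ W)
            p[y] -= 1.0          # d(cross-entropy)/d(logits)
            grad += W @ p        # chain rule: logits are (x + e) @ W
        e -= lr * grad / len(examples)
    return e

rng = np.random.default_rng(1)
d, n_cls = 6, 3
W = rng.normal(size=(d, n_cls))              # frozen "model" weights
xs = [rng.normal(size=d) for _ in range(6)]  # few labeled examples
ys = [0, 1, 2, 0, 1, 2]
e = fit_task_embedding(xs, ys, W)
loss_before = nll(xs, ys, W, np.zeros(d))
loss_after = nll(xs, ys, W, e)
```

Only the single embedding vector is updated, mirroring the appeal of the approach: adapting to a new task touches a handful of parameters rather than the whole network.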

CURI: A Benchmark for Productive Concept Learning Under Uncertainty

1 code implementation 6 Oct 2020 Ramakrishna Vedantam, Arthur Szlam, Maximilian Nickel, Ari Morcos, Brenden Lake

Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head").

Meta-Learning · Systematic Generalization

Deploying Lifelong Open-Domain Dialogue Learning

no code implementations 18 Aug 2020 Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

As argued in de Vries et al. (2020), crowdsourced data lacks naturalness and relevance to real-world use cases, while the static-dataset paradigm does not allow a model to learn from its experience of using language (Silver et al., 2013).

Fast Adaptation via Policy-Dynamics Value Functions

1 code implementation 6 Jul 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

no code implementations 22 Jun 2020 Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.

Continual Learning

Residual Energy-Based Models for Text Generation

1 code implementation ICLR 2020 Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modelling · Machine Translation +2

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

no code implementations 10 Apr 2020 Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.

Residual Energy-Based Models for Text

no code implementations 6 Apr 2020 Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

Generating Interactive Worlds with Text

no code implementations 20 Nov 2019 Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston

We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.

Common Sense Reasoning

Why Build an Assistant in Minecraft?

1 code implementation 22 Jul 2019 Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

Minecraft · Natural Language Understanding

CraftAssist: A Framework for Dialogue-enabled Interactive Agents

3 code implementations 19 Jul 2019 Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam

This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions.


Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

no code implementations ICLR 2019 Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam

The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.

CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

no code implementations 17 Apr 2019 Yacine Jernite, Kavya Srinet, Jonathan Gray, Arthur Szlam

We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft.

Minecraft · Semantic Parsing

Learning to Speak and Act in a Fantasy Text Adventure Game

no code implementations IJCNLP 2019 Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

Planning with Arithmetic and Geometric Attributes

no code implementations 6 Sep 2018 David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.

Lightweight Adaptive Mixture of Neural and N-gram Language Models

no code implementations 20 Apr 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.

Language Modelling

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

1 code implementation ICML 2018 Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility.

Multi-agent Reinforcement Learning

Personalizing Dialogue Agents: I have a dog, do you have pets too?

9 code implementations ACL 2018 Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating.

Dialogue Generation

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

no code implementations ICLR 2018 Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment.

Grounded language learning

Optimizing the Latent Space of Generative Networks

5 code implementations ICML 2018 Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.

Low-shot learning with large-scale diffusion

1 code implementation CVPR 2018 Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time.

graph construction

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

no code implementations CVPR 2017 Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

3 code implementations ICLR 2018 Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.

Training Language Models Using Target-Propagation

1 code implementation 15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.

Automatic Rule Extraction from Long Short Term Memory Networks

no code implementations 8 Feb 2017 W. James Murdoch, Arthur Szlam

Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear.

Question Answering · Sentiment Analysis

Transformation-Based Models of Video Sequences

no code implementations 29 Jan 2017 Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

In this work we propose a simple unsupervised approach for next frame prediction in video.

Tracking the World State with Recurrent Entity Networks

3 code implementations 12 Dec 2016 Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun

The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting.

Question Answering

The Product Cut

1 code implementation NeurIPS 2016 Thomas Laurent, James Von Brecht, Xavier Bresson, Arthur Szlam

We introduce a theoretical and algorithmic framework for multi-way graph partitioning that relies on a multiplicative cut-based objective.

graph partitioning

Geometric deep learning: going beyond Euclidean data

no code implementations 24 Nov 2016 Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques.

Recurrent Orthogonal Networks and Long-Memory Tasks

1 code implementation 22 Feb 2016 Mikael Henaff, Arthur Szlam, Yann LeCun

Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research.

MazeBase: A Sandbox for Learning from Games

2 code implementations 23 Nov 2015 Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.


Convolutional networks and learning invariant to homogeneous multiplicative scalings

no code implementations 26 Jun 2015 Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.

Classification · General Classification

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

1 code implementation 18 Jun 2015 Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.

A mathematical motivation for complex-valued convolutional networks

no code implementations 11 Mar 2015 Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.

An Incremental Reseeding Strategy for Clustering

no code implementations 15 Jun 2014 Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James Von Brecht

In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning.

graph partitioning

Better Feature Tracking Through Subspace Constraints

no code implementations CVPR 2014 Bryan Poling, Gilad Lerman, Arthur Szlam

Our approach does not require direct modeling of the structure or the motion of the scene, and runs in real time on a single CPU core.

Spectral Networks and Locally Connected Networks on Graphs

4 code implementations 21 Dec 2013 Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun

Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain.


Unsupervised Feature Learning by Deep Sparse Coding

no code implementations 20 Dec 2013 Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, Yanjun Qi

In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks.

Object Recognition

Signal Recovery from Pooling Representations

no code implementations 16 Nov 2013 Joan Bruna, Arthur Szlam, Yann LeCun

In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers.

