Search Results for author: Arthur Szlam

Found 82 papers, 36 papers with code

Signal Recovery from Pooling Representations

no code implementations 16 Nov 2013 Joan Bruna, Arthur Szlam, Yann LeCun

In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers.

regression
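
The pooling operators whose Lipschitz bounds are analyzed above have a direct computational form. A minimal sketch in plain Python (hypothetical names, not from the paper's code):

```python
def lp_pool(window, p):
    """l_p pooling over a 1-D window: (sum |x_i|^p)^(1/p); p = inf gives max pooling."""
    if p == float("inf"):
        return max(abs(x) for x in window)
    return sum(abs(x) ** p for x in window) ** (1.0 / p)

def half_rectify(xs):
    """Half-rectification (ReLU), applied before pooling in the paper's second setting."""
    return [max(0.0, x) for x in xs]

window = [1.0, -2.0, 3.0]
print(lp_pool(window, 1))             # l_1 pooling: 6.0
print(lp_pool(window, float("inf")))  # l_inf pooling: 3.0
print(lp_pool(half_rectify(window), 2))
```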

Unsupervised Feature Learning by Deep Sparse Coding

no code implementations 20 Dec 2013 Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, Yanjun Qi

In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks.

Object Recognition

Spectral Networks and Locally Connected Networks on Graphs

4 code implementations 21 Dec 2013 Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun

Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain.

Clustering, Translation
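
A spectral convolution on a graph filters a signal in the eigenbasis of the graph Laplacian. A toy illustration on a two-node path graph, with the eigenvectors written out by hand (in practice the spectral multipliers would be learned; all names here are for illustration only):

```python
import math

# Two-node path graph: Laplacian L = [[1, -1], [-1, 1]].
# Its graph Fourier basis (eigenvectors) and eigenvalues (0 and 2) are known in closed form.
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]   # columns are eigenvectors
filt = [1.0, 0.5]       # spectral multipliers (fixed here; learned in a spectral network)

def spectral_filter(x):
    # Graph Fourier transform: x_hat = U^T x
    x_hat = [U[0][k] * x[0] + U[1][k] * x[1] for k in range(2)]
    # Multiply in the spectral domain, then invert: y = U (filt * x_hat)
    y_hat = [filt[k] * x_hat[k] for k in range(2)]
    return [sum(U[i][k] * y_hat[k] for k in range(2)) for i in range(2)]

print(spectral_filter([2.0, 0.0]))
```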

Better Feature Tracking Through Subspace Constraints

no code implementations CVPR 2014 Bryan Poling, Gilad Lerman, Arthur Szlam

Our approach does not require direct modeling of the structure or the motion of the scene, and runs in real time on a single CPU core.

An Incremental Reseeding Strategy for Clustering

no code implementations 15 Jun 2014 Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James Von Brecht

In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning.

Clustering, graph partitioning

A mathematical motivation for complex-valued convolutional networks

no code implementations 11 Mar 2015 Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

1 code implementation 18 Jun 2015 Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

In this paper we introduce a generative parametric model capable of producing high quality samples of natural images.

Convolutional networks and learning invariant to homogeneous multiplicative scalings

no code implementations 26 Jun 2015 Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation.

Classification, General Classification +1

MazeBase: A Sandbox for Learning from Games

2 code implementations 23 Nov 2015 Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning.

Negation, Reinforcement Learning (RL) +1

Recurrent Orthogonal Networks and Long-Memory Tasks

1 code implementation 22 Feb 2016 Mikael Henaff, Arthur Szlam, Yann LeCun

Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research.

Geometric deep learning: going beyond Euclidean data

no code implementations 24 Nov 2016 Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques.

The Product Cut

1 code implementation NeurIPS 2016 Thomas Laurent, James Von Brecht, Xavier Bresson, Arthur Szlam

We introduce a theoretical and algorithmic framework for multi-way graph partitioning that relies on a multiplicative cut-based objective.

graph partitioning

Tracking the World State with Recurrent Entity Networks

5 code implementations 12 Dec 2016 Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun

The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting.

Procedural Text Understanding, Question Answering

Transformation-Based Models of Video Sequences

no code implementations 29 Jan 2017 Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

In this work we propose a simple unsupervised approach for next frame prediction in video.

Automatic Rule Extraction from Long Short Term Memory Networks

no code implementations 8 Feb 2017 W. James Murdoch, Arthur Szlam

Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear.

Question Answering, Sentiment Analysis

Training Language Models Using Target-Propagation

1 code implementation 15 Feb 2017 Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps.
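
Truncated BPTT, the baseline this paper contrasts with, caps how far gradients flow back in time. A toy sketch on a scalar linear RNN with hand-derived gradients (hypothetical, for illustration only):

```python
def tbptt_grad(w, xs, k):
    """Gradient of h_T w.r.t. w for the scalar RNN h_t = w*h_{t-1} + x_t,
    back-propagating through at most the last k time steps (truncated BPTT)."""
    # Forward pass, storing hidden states h_0..h_T.
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    # Backward pass: dh_T/dh_t = w^(T-t), and the local term dh_t/dw = h_{t-1}.
    T = len(xs)
    grad = 0.0
    for t in range(T, max(T - k, 0), -1):
        grad += (w ** (T - t)) * hs[t - 1]
    return grad

xs = [1.0, 1.0, 1.0]
print(tbptt_grad(1.0, xs, 1))  # truncated to the last step: gradient is h_2 = 2.0
print(tbptt_grad(1.0, xs, 3))  # full BPTT over all three steps: 3.0
```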

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

3 code implementations ICLR 2018 Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

no code implementations CVPR 2017 Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks.

Optimizing the Latent Space of Generative Networks

6 code implementations ICML 2018 Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images.

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

no code implementations ICLR 2018 Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment.

Grounded language learning

Personalizing Dialogue Agents: I have a dog, do you have pets too?

15 code implementations ACL 2018 Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating.

Ranked #5 on Dialogue Generation on Persona-Chat (using extra training data)

Conversational Response Selection, Dialogue Generation +1

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

1 code implementation ICML 2018 Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility.

Multi-agent Reinforcement Learning, reinforcement-learning +1

Lightweight Adaptive Mixture of Neural and N-gram Language Models

no code implementations 20 Apr 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

It is often the case that the best performing language model is an ensemble of a neural language model with n-grams.

Language Modelling

Planning with Arithmetic and Geometric Attributes

no code implementations 6 Sep 2018 David Folqué, Sainbayar Sukhbaatar, Arthur Szlam, Joan Bruna

A desirable property of an intelligent agent is its ability to understand its environment to quickly generalize to novel tasks and compose simpler tasks into more complex ones.

GenEval: A Benchmark Suite for Evaluating Generative Models

no code implementations 27 Sep 2018 Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we aim at addressing this problem by introducing a new benchmark evaluation suite, dubbed \textit{GenEval}.

Learning to Speak and Act in a Fantasy Text Adventure Game

1 code implementation IJCNLP 2019 Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relate to agents that can talk and act successfully.

Retrieval

CraftAssist Instruction Parsing: Semantic Parsing for a Minecraft Assistant

no code implementations 17 Apr 2019 Yacine Jernite, Kavya Srinet, Jonathan Gray, Arthur Szlam

We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft.

Semantic Parsing

Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

no code implementations ICLR 2019 Kenneth Marino, Abhinav Gupta, Rob Fergus, Arthur Szlam

The high-level policy is trained using a sparse, task-dependent reward, and operates by choosing which of the low-level policies to run at any given time.

CraftAssist: A Framework for Dialogue-enabled Interactive Agents

3 code implementations 19 Jul 2019 Jonathan Gray, Kavya Srinet, Yacine Jernite, Haonan Yu, Zhuoyuan Chen, Demi Guo, Siddharth Goyal, C. Lawrence Zitnick, Arthur Szlam

This paper describes an implementation of a bot assistant in Minecraft, and the tools and platform allowing players to interact with the bot and to record those interactions.

Why Build an Assistant in Minecraft?

1 code implementation 22 Jul 2019 Arthur Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe Kiela, Haonan Yu, Zhuoyuan Chen, Siddharth Goyal, Demi Guo, Danielle Rothermel, C. Lawrence Zitnick, Jason Weston

In this document we describe a rationale for a research program aimed at building an open "assistant" in the game Minecraft, in order to make progress on the problems of natural language understanding and learning from dialogue.

Natural Language Understanding

Agent as Scientist: Learning to Verify Hypotheses

no code implementations 25 Sep 2019 Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta

In order to train the agents, we exploit the underlying structure in the majority of hypotheses -- they can be formulated as triplets (pre-condition, action sequence, post-condition).

Generating Interactive Worlds with Text

no code implementations 20 Nov 2019 Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston

We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.

BIG-bench Machine Learning, Common Sense Reasoning

Residual Energy-Based Models for Text

no code implementations 6 Apr 2020 Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

Current large-scale auto-regressive language models display impressive fluency and can generate convincing text.

Learning to Visually Navigate in Photorealistic Environments Without any Supervision

no code implementations 10 Apr 2020 Lina Mezghani, Sainbayar Sukhbaatar, Arthur Szlam, Armand Joulin, Piotr Bojanowski

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training.

Navigate, Position

Residual Energy-Based Models for Text Generation

1 code implementation ICLR 2020 Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level.

Language Modelling, Machine Translation +2

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

no code implementations 22 Jun 2020 Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff, Pratik Ringshia, Kurt Shuster, Eric Michael Smith, Arthur Szlam, Jack Urbanek, Mary Williamson

We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet.

Continual Learning

CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant

no code implementations ACL 2020 Kavya Srinet, Yacine Jernite, Jonathan Gray, Arthur Szlam

We propose a semantic parsing dataset focused on instruction-driven communication with an agent in the game Minecraft.

Semantic Parsing

Fast Adaptation via Policy-Dynamics Value Functions

1 code implementation 6 Jul 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.

Deploying Lifelong Open-Domain Dialogue Learning

no code implementations 18 Aug 2020 Kurt Shuster, Jack Urbanek, Emily Dinan, Arthur Szlam, Jason Weston

As argued in de Vries et al. (2020), crowdsourced data has the issues of lack of naturalness and relevance to real-world use cases, while the static dataset paradigm does not allow for a model to learn from its experiences of using language (Silver et al., 2013).

CURI: A Benchmark for Productive Concept Learning Under Uncertainty

1 code implementation 6 Oct 2020 Ramakrishna Vedantam, Arthur Szlam, Maximilian Nickel, Ari Morcos, Brenden Lake

Humans can learn and reason under substantial uncertainty in a space of infinitely many concepts, including structured relational concepts ("a scene with objects that have the same color") and ad-hoc categories defined through goals ("objects that could fall on one's head").

Meta-Learning, Systematic Generalization

Few-shot Sequence Learning with Transformers

no code implementations 17 Dec 2020 Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given few labeled examples.

Few-Shot Learning
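
The appended-task-token idea described above can be sketched on a toy frozen model, where only the token's embedding is optimized on a few labeled examples (the model form and all names here are hypothetical stand-ins for illustration):

```python
def fit_task_embedding(examples, model_weight, steps=200, lr=0.1):
    """Optimize only the scalar embedding e of an appended task token so that
    model(x, e) = model_weight * x + e fits a few labeled examples;
    the pretrained weight stays frozen."""
    e = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in examples:
            pred = model_weight * x + e
            grad += 2.0 * (pred - y) / len(examples)  # d/de of mean squared error
        e -= lr * grad
    return e

# A few labeled examples from a task that shifts outputs by +1.
examples = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
e = fit_task_embedding(examples, model_weight=1.0)
print(round(e, 3))  # converges toward the task offset 1.0
```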

Reducing conversational agents' overconfidence through linguistic calibration

no code implementations 30 Dec 2020 Sabrina J. Mielke, Arthur Szlam, Emily Dinan, Y-Lan Boureau

While improving neural dialogue agents' factual accuracy is the object of much research, another important aspect of communication, less studied in the setting of neural dialogue, is transparency about ignorance.

Not All Memories are Created Equal: Learning to Expire

1 code implementation 1 Jan 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason E Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve state of the art results on long-context language modeling, reinforcement learning, and algorithmic tasks.

Language Modelling

droidlet: modular, heterogenous, multi-modal agents

1 code implementation 25 Jan 2021 Anurag Pratik, Soumith Chintala, Kavya Srinet, Dhiraj Gandhi, Rebecca Qian, Yuxuan Sun, Ryan Drew, Sara Elkafrawy, Anoushka Tiwari, Tucker Hart, Mary Williamson, Abhinav Gupta, Arthur Szlam

In recent years, there have been significant advances in building end-to-end Machine Learning (ML) systems that learn at scale.

Not All Memories are Created Equal: Learning to Forget by Expiring

1 code implementation 13 May 2021 Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan

We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality.

Language Modelling

Hash Layers For Large Sparse Models

no code implementations NeurIPS 2021 Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston

We investigate the training of sparse layers that use different parameters for different inputs based on hashing in large Transformer models.

Language Modelling
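
Hash layers replace learned routing with a fixed hash of each token's id, so every token is deterministically assigned to one expert. A minimal sketch (the hash function and the stand-in expert blocks are hypothetical, for illustration only):

```python
def route(token_ids, num_experts):
    """Assign each token to an expert by a fixed hash of its id (no learned gating)."""
    return [hash_fn(t) % num_experts for t in token_ids]

def hash_fn(t):
    # Any fixed hash works; a simple multiplicative hash keeps this deterministic.
    return (t * 2654435761) % (2 ** 32)

# Stand-in "expert" feed-forward blocks: expert b just adds b to its input.
experts = [lambda x, b=b: x + b for b in range(4)]

def hash_layer(token_ids, values):
    assignments = route(token_ids, len(experts))
    return [experts[e](v) for e, v in zip(assignments, values)]

print(route([0, 1, 2, 3], 4))  # → [0, 1, 2, 3]
```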

Beyond Goldfish Memory: Long-Term Open-Domain Conversation

no code implementations ACL 2022 Jing Xu, Arthur Szlam, Jason Weston

Despite recent improvements in open-domain dialogue models, state of the art models are trained and evaluated on short conversations with little context.

Retrieval

Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity

no code implementations Findings (NAACL) 2022 Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston

State-of-the-art dialogue models still often stumble with regards to factual accuracy and self-contradiction.

Can I see an Example? Active Learning the Long Tail of Attributes and Relations

no code implementations 11 Mar 2022 Tyler L. Hayes, Maximilian Nickel, Christopher Kanan, Ludovic Denoyer, Arthur Szlam

Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.

Active Learning

Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion

1 code implementation 24 Mar 2022 Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston

We show that, when using SeeKeR as a dialogue model, it outperforms the state-of-the-art model BlenderBot 2 (Chen et al., 2021) on open-domain knowledge-grounded conversations for the same number of parameters, in terms of consistency, knowledge and per-turn engagingness.

Language Modelling, Retrieval

Many Episode Learning in a Modular Embodied Agent via End-to-End Interaction

no code implementations 19 Apr 2022 Yuxuan Sun, Ethan Carlson, Rebecca Qian, Kavya Srinet, Arthur Szlam

In this work we give a case study of an embodied machine-learning (ML) powered agent that improves itself via interactions with crowd-workers.

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

2 code implementations 5 Aug 2022 Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks.

Continual Learning

CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

2 code implementations 11 Oct 2022 Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam

We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization.

Segmentation, Semantic Segmentation +1

Infusing Commonsense World Models with Graph Knowledge

no code implementations 13 Jan 2023 Alexander Gurung, Mojtaba Komeili, Arthur Szlam, Jason Weston, Jack Urbanek

While language models have become more capable of producing compelling language, we find there are still gaps in maintaining consistency, especially when describing events in a dynamically changing world.

Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models

no code implementations 26 Apr 2023 Jimmy Wei, Kurt Shuster, Arthur Szlam, Jason Weston, Jack Urbanek, Mojtaba Komeili

We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting.

A Data Source for Reasoning Embodied Agents

1 code implementation 14 Sep 2023 Jack Lanchantin, Sainbayar Sukhbaatar, Gabriel Synnaeve, Yuxuan Sun, Kavya Srinet, Arthur Szlam

In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent.

DiLoCo: Distributed Low-Communication Training of Language Models

no code implementations 14 Nov 2023 Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected.

Distributed Optimization

Asynchronous Local-SGD Training for Language Modeling

1 code implementation 17 Jan 2024 Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication.

Distributed Optimization, Language Modelling
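
The synchronous form of this scheme (Local-SGD / federated averaging) can be sketched on toy 1-D quadratic losses: each worker takes several local SGD steps between communications, then the replicas are averaged (all names hypothetical; the paper's contribution is the asynchronous variant):

```python
def local_sgd(worker_data, w0=0.0, rounds=50, local_steps=10, lr=0.05):
    """Local-SGD on toy losses: worker i minimizes (w - target_i)^2 locally
    for `local_steps` updates, then parameters are averaged across workers."""
    w = w0
    for _ in range(rounds):
        local_ws = []
        for target in worker_data:
            lw = w
            for _ in range(local_steps):
                lw -= lr * 2.0 * (lw - target)  # local SGD step on this worker's loss
            local_ws.append(lw)
        w = sum(local_ws) / len(local_ws)       # communication: average the replicas
    return w

# Two workers with different local optima; the consensus point is their average, 2.0.
print(round(local_sgd([1.0, 3.0]), 3))
```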

Fast Adaptation to New Environments via Policy-Dynamics Value Functions

no code implementations ICML 2020 Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

An ensemble of conventional RL policies is used to gather experience on training environments, from which embeddings of both policies and environments can be learned.
