Search Results for author: Caglar Gulcehre

Found 52 papers, 27 papers with code

Simple Hierarchical Planning with Diffusion

no code implementations5 Jan 2024 Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

no code implementations21 Jul 2023 Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.

Computational Efficiency Position
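The model class in question is easy to sketch. Below is a minimal numpy toy of a diagonal complex-valued linear recurrence followed by a position-wise MLP; the dimensions, the eigenvalue parameterization, and all weights are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# x_t = lam * x_{t-1} + B u_t (diagonal, complex), then a position-wise MLP
# on the real projection of C x_t. |lam| < 1 keeps the recurrence stable.
d_state, d_in, d_out, T = 16, 4, 4, 32
modulus = rng.uniform(0.5, 0.99, d_state)
phase = rng.uniform(0.0, 2.0 * np.pi, d_state)
lam = modulus * np.exp(1j * phase)                  # complex eigenvalues
B = rng.normal(size=(d_state, d_in)) + 0j
C = rng.normal(size=(d_out, d_state)) + 0j
W1, W2 = rng.normal(size=(d_out, d_out)), rng.normal(size=(d_out, d_out))

u = rng.normal(size=(T, d_in))
x = np.zeros(d_state, dtype=complex)
outputs = []
for t in range(T):
    x = lam * x + B @ u[t]                          # linear recurrence
    h = (C @ x).real                                # real projection
    outputs.append(W2 @ np.maximum(W1 @ h, 0.0))    # position-wise MLP (ReLU)
print(np.stack(outputs).shape)                      # (32, 4)
```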

An Empirical Study of Implicit Regularization in Deep Offline RL

no code implementations5 Jul 2022 Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet

Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics, and we find that bootstrapping alone is insufficient to explain the collapse of the effective rank.

Offline RL
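The effective rank referenced above can be computed directly from the singular values of a feature matrix. A minimal sketch, assuming the common "smallest k capturing (1 - delta) of the spectral mass" definition; the threshold delta = 0.01 is an illustrative choice.

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k whose top-k singular values capture (1 - delta) of the
    total spectral mass, one common notion of effective rank."""
    s = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(s) / np.sum(s)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 64))  # rank <= 8
print(effective_rank(low_rank))                     # small (at most 8)
print(effective_rank(rng.normal(size=(256, 64))))   # large (close to full rank)
```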

Active Offline Policy Selection

1 code implementation NeurIPS 2021 Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.

Bayesian Optimization Off-policy evaluation
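The core loop is small enough to sketch: treat noisy OPE scores as a prior over policy values, then spend a limited online-evaluation budget where an acquisition rule says it is most informative. This toy uses independent Gaussian posteriors and a UCB rule, a simplification of the paper's Bayesian optimization machinery; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy active policy selection: start from noisy OPE scores, then spend a
# small online-evaluation budget on the most promising/uncertain policies.
true_values = rng.uniform(0.0, 1.0, size=10)        # unknown policy values
mean = true_values + rng.normal(0.0, 0.2, size=10)  # OPE prior (biased, noisy)
var = np.full(10, 0.2 ** 2)
obs_noise = 0.1 ** 2

for step in range(30):
    i = int(np.argmax(mean + 2.0 * np.sqrt(var)))   # UCB acquisition
    episode_return = true_values[i] + rng.normal(0.0, np.sqrt(obs_noise))
    # Gaussian posterior update for policy i
    precision = 1.0 / var[i] + 1.0 / obs_noise
    mean[i] = (mean[i] / var[i] + episode_return / obs_noise) / precision
    var[i] = 1.0 / precision

print("selected:", int(np.argmax(mean)), "best:", int(np.argmax(true_values)))
```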

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

regression Reinforcement Learning (RL)
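The classical instrument the paper builds on is two-stage least squares. A minimal linear toy, not the paper's deep-network variant, showing why instrumenting matters when a confounder biases ordinary regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Toy IV setup: treatment x is confounded by u; instrument z affects x only.
z = rng.normal(size=n)
u = rng.normal(size=n)                            # unobserved confounder
x = z + u + rng.normal(0.0, 0.1, n)
y = 2.0 * x - 1.5 * u + rng.normal(0.0, 0.1, n)   # true causal effect: 2.0

# Ordinary least squares is biased by the confounder.
ols = np.sum(x * y) / np.sum(x * x)

# Two-stage least squares: regress x on z, then y on the fitted x.
x_hat = z * (np.sum(z * x) / np.sum(z * z))       # stage 1
tsls = np.sum(x_hat * y) / np.sum(x_hat * x_hat)  # stage 2

print(f"OLS: {ols:.2f}  2SLS: {tsls:.2f}")        # OLS biased, 2SLS near 2.0
```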

Regularized Behavior Value Estimation

no code implementations17 Mar 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning.

Offline RL

Addressing Extrapolation Error in Deep Offline Reinforcement Learning

no code implementations1 Jan 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded*, thereby crippling learning.

Offline RL reinforcement-learning +1
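The failure mode described in the last two entries can be reproduced in a few lines: when the learned critic has errors on actions absent from the dataset, the max in the bootstrapped target systematically picks up positive noise, and repeated bootstrapping amplifies it. A deliberately minimal toy; the noise scale and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy: all actions have true value 0, but the approximator makes errors on
# actions missing from the dataset; the max in the bootstrapped target picks
# up positive noise, and bootstrapping compounds it over training.
n_actions, gamma, lr = 10, 0.99, 0.5
q = np.zeros(n_actions)

for step in range(201):
    approx_error = rng.normal(0.0, 1.0, n_actions)   # error on unseen actions
    target = 0.0 + gamma * np.max(q + approx_error)  # r + gamma * max_a Q(s', a)
    q += lr * (target - q)
    if step % 50 == 0:
        print(f"step {step:3d}  mean Q: {q.mean():7.2f}")  # climbs far above 0
```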

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

1 code implementation NeurIPS 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Offline RL reinforcement-learning +1

Offline Learning from Demonstrations and Unlabeled Experience

no code implementations27 Nov 2020 Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.

Continuous Control Imitation Learning
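For reference, behavior cloning itself is plain supervised learning on the demonstrations. A self-contained numpy sketch with a linear-softmax policy; the data and model are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Behavior cloning: supervised learning of expert actions from logged states,
# here a linear-softmax policy trained by gradient descent on cross-entropy.
n, d, n_actions = 1000, 8, 4
states = rng.normal(size=(n, d))
expert_w = rng.normal(size=(d, n_actions))
actions = np.argmax(states @ expert_w, axis=1)      # "expert demonstrations"

w = np.zeros((d, n_actions))
for _ in range(500):
    logits = states @ w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), actions] -= 1.0                 # d(cross-entropy)/d(logits)
    w -= 0.1 * states.T @ p / n

accuracy = np.mean(np.argmax(states @ w, axis=1) == actions)
print(f"BC accuracy on demonstrations: {accuracy:.2f}")
```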

Hyperparameter Selection for Offline Reinforcement Learning

no code implementations17 Jul 2020 Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

Therefore, in this work, we focus on *offline hyperparameter selection*, i.e., methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data.

Offline RL reinforcement-learning +1

Critic Regularized Regression

5 code implementations NeurIPS 2020 Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL regression +1
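The essence of Critic Regularized Regression is behavior cloning with advantage-based weights on the logged actions. A sketch of just the weighting step, following the paper's exponential and binary variants; the clipping constant and inputs are illustrative:

```python
import numpy as np

# CRR in essence: behavior cloning where each logged action is weighted by
# its estimated advantage under a learned critic.
def crr_weights(q_sa, q_all, policy_probs, beta=1.0, mode="exp"):
    """q_sa: critic value of the logged action; q_all: critic values of all
    actions; policy_probs: current policy over actions."""
    v = np.sum(policy_probs * q_all, axis=1)      # V(s) = E_{a~pi} Q(s, a)
    advantage = q_sa - v
    if mode == "exp":
        return np.minimum(np.exp(advantage / beta), 20.0)  # clipped exp weights
    return (advantage > 0).astype(float)          # binary: keep only good actions

q_all = np.array([[1.0, 0.0, -1.0], [0.2, 0.5, 0.3]])
pi = np.full((2, 3), 1.0 / 3.0)
print(crr_weights(np.array([1.0, 0.2]), q_all, pi))
```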

Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

no code implementations25 Jun 2020 Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019.

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +3

Stabilizing Transformers for Reinforcement Learning

5 code implementations ICML 2020 Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting.

General Reinforcement Learning Language Modelling +4
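The paper's central architectural change is small: swap each residual connection for a GRU-style gate whose update-gate bias starts positive, so each block is initially close to an identity map. A per-element numpy sketch (the full model uses weight matrices; per-dimension scalars here keep it short):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Replace the transformer residual y = x + f(x) with a GRU-style gate.
def gated_residual(x, y, p, gate_bias=2.0):
    """x: skip input; y: sublayer (attention/MLP) output; p: gate params."""
    r = sigmoid(p["wr"] * y + p["ur"] * x)                 # reset gate
    z = sigmoid(p["wz"] * y + p["uz"] * x - gate_bias)     # update gate
    h = np.tanh(p["wg"] * y + p["ug"] * (r * x))
    return (1.0 - z) * x + z * h                           # biased toward x

dim = 5
params = {k: rng.normal(size=dim) for k in ["wr", "ur", "wz", "uz", "wg", "ug"]}
x, y = rng.normal(size=dim), rng.normal(size=dim)
print(gated_residual(x, y, params) - x)                    # stays close to x
```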

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.
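R2D3's key knob is the demo ratio: the probability that a training sequence is drawn from the demonstration buffer rather than the agent's replay buffer. A minimal sketch of that sampling step; the buffer contents are placeholders and the exact ratio is task-dependent in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each training batch mixes sequences from a demonstration buffer and the
# agent's replay buffer according to a fixed (small but non-zero) demo ratio.
demo_buffer = [f"demo_{i}" for i in range(100)]
agent_buffer = [f"agent_{i}" for i in range(10000)]
demo_ratio = 1.0 / 256.0

def sample_batch(batch_size=8):
    from_demo = rng.random(batch_size) < demo_ratio
    return [rng.choice(demo_buffer) if d else rng.choice(agent_buffer)
            for d in from_demo]

print(sample_batch())
```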

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

counterfactual Counterfactual Reasoning +2

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

counterfactual Counterfactual Reasoning +3
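The influence reward itself reduces to a KL divergence: compare the other agent's policy given the influencer's actual action against the counterfactual marginal with that action averaged out. A discrete-action numpy sketch with illustrative probabilities:

```python
import numpy as np

# Causal influence (simplified): how much does agent A's chosen action change
# agent B's policy, versus B's marginal policy with A's action averaged out?
def influence_reward(p_b_given_a, p_a):
    """p_b_given_a[i]: B's action distribution if A takes action i;
    p_a: A's policy. Returns the influence of each possible A action."""
    marginal = p_a @ p_b_given_a          # counterfactual marginal over B
    return np.sum(p_b_given_a * np.log(p_b_given_a / marginal), axis=1)

p_b_given_a = np.array([[0.9, 0.1],       # B reacts strongly to A's action 0
                        [0.1, 0.9]])      # ...and to A's action 1
p_a = np.array([0.5, 0.5])
print(influence_reward(p_b_given_a, p_a)) # high influence for both actions
```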

Hyperbolic Attention Networks

no code implementations ICLR 2019 Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure.

Machine Translation Question Answering +2
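The core operation is scoring query/key pairs by distance in hyperbolic space rather than by dot products. A sketch using the hyperboloid model; the lifting map and the softmax over negative distances are one simple instantiation, not necessarily the paper's exact matching function:

```python
import numpy as np

def lift_to_hyperboloid(v):
    """Map R^d onto the hyperboloid {x : -x0^2 + |x_rest|^2 = -1, x0 > 0}."""
    x0 = np.sqrt(1.0 + np.sum(v * v, axis=-1, keepdims=True))
    return np.concatenate([x0, v], axis=-1)

def hyperbolic_distance(x, y):
    lorentz = -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)
    return np.arccosh(np.clip(-lorentz, 1.0, None))

rng = np.random.default_rng(0)
queries = lift_to_hyperboloid(rng.normal(size=(2, 4)))
keys = lift_to_hyperboloid(rng.normal(size=(5, 4)))
dist = hyperbolic_distance(queries[:, None, :], keys[None, :, :])
attn = np.exp(-dist) / np.exp(-dist).sum(axis=1, keepdims=True)  # softmax(-d)
print(attn.round(3))
```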

Memory Augmented Neural Networks for Natural Language Processing

no code implementations EMNLP 2017 Caglar Gulcehre, Sarath Chandar

We will present a unified architecture for Memory Augmented Neural Networks (MANN) and discuss the ways in which one can address the external memory and hence read from and write to it.

Language Modelling Question Answering +1
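The canonical content-based read shared by most MANN designs fits in a few lines: score each memory slot against a key by cosine similarity, sharpen, normalize into a soft address, and read the weighted mixture of slots. The sharpness constant here is illustrative:

```python
import numpy as np

def content_read(memory, key, sharpness=10.0):
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key)
    similarity = memory @ key / np.maximum(norms, 1e-8)  # cosine similarity
    weights = np.exp(sharpness * similarity)
    weights /= weights.sum()                             # soft address
    return weights @ memory, weights

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))                        # 8 slots, width 16
read_vector, address = content_read(memory, memory[3] + 0.1)
print(address.round(2))                                  # peaked at slot 3
```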

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation13 Jun 2017 Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Machine Translation Translation

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation8 Jun 2017 Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbi (Accuracy (trained on 1k) metric)

Denoising Question Answering
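A minimal sketch of the combination: a GRU-style cell whose candidate transition uses an orthogonal recurrent matrix (norm-preserving, so gradients through it neither vanish nor explode), while the gates supply the forgetting. Weight shapes and initialization are illustrative, and the orthogonal matrix is fixed rather than learned here:

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_x = 8, 4
W, _ = np.linalg.qr(rng.normal(size=(d_h, d_h)))   # orthogonal recurrence
V = rng.normal(size=(d_h, d_x))
Wr, Vr = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x))
Wz, Vz = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def goru_step(h, x):
    r = sigmoid(Wr @ h + Vr @ x)                   # reset gate
    z = sigmoid(Wz @ h + Vz @ x)                   # update (forget) gate
    candidate = np.tanh(W @ (r * h) + V @ x)       # orthogonal transition
    return z * h + (1.0 - z) * candidate

h = np.zeros(d_h)
for x in rng.normal(size=(10, d_x)):
    h = goru_step(h, x)
print(h.round(3))
```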

A Robust Adaptive Stochastic Gradient Method for Deep Learning

1 code implementation2 Mar 2017 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

Information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first-order gradients.

Memory Augmented Neural Networks with Wormhole Connections

no code implementations30 Jan 2017 Caglar Gulcehre, Sarath Chandar, Yoshua Bengio

We use discrete addressing for read/write operations, which helps to substantially reduce the vanishing gradient problem with very long sequences.

Mollifying Networks

no code implementations17 Aug 2016 Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function; e.g., it can involve pathological landscapes such as saddle surfaces that are difficult to escape for algorithms based on simple gradient descent.

Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

no code implementations30 Jun 2016 Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio

We investigate the mechanisms and effects of learning to read and write into a memory through experiments on the Facebook bAbI tasks, using both a feedforward and a GRU controller.

Natural Language Inference Question Answering

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

Pointing the Unknown Words

no code implementations ACL 2016 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio

At each time-step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work from psychological evidence that humans naturally tend to point towards objects in the context or the environment when the name of an object is not known. We observe improvements on two tasks with our proposed model: neural machine translation on the Europarl English-to-French parallel corpora, and text summarization on the Gigaword dataset.

Machine Translation Sentence +2
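Mechanically, the pointer softmax splits probability mass between a shortlist softmax and a location (attention) softmax via a learned switch. A sketch of one decoding step with illustrative logits:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Pointer-softmax step (simplified): a gate decides, per timestep, whether to
# emit a shortlist word or to point at (copy) a source position.
def pointer_softmax(vocab_logits, attention_logits, switch_logit):
    g = 1.0 / (1.0 + np.exp(-switch_logit))   # P(use shortlist vocabulary)
    p_vocab = softmax(vocab_logits)
    p_copy = softmax(attention_logits)        # location softmax over source
    return g * p_vocab, (1.0 - g) * p_copy

p_vocab, p_copy = pointer_softmax(np.array([2.0, 0.5, 0.1]),
                                  np.array([0.2, 3.0]), switch_logit=-1.0)
print(p_vocab.round(3), p_copy.round(3))      # mass split across both modes
```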

Noisy Activation Functions

1 code implementation1 Mar 2016 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).
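A simplified rendering of the idea: saturate hard, but inject noise whose scale grows with how deep a unit is into the saturated regime, so some learning signal survives saturation. This is a reduction of the paper's scheme, which shapes the noise more carefully; the constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_hard_sigmoid(x, noise_scale=0.5, training=True):
    linear = 0.25 * x + 0.5
    h = np.clip(linear, 0.0, 1.0)              # hard-saturating nonlinearity
    if not training:
        return h
    saturation = linear - h                    # zero inside the linear region
    noise = rng.normal(size=x.shape) * np.abs(saturation) * noise_scale
    return h + noise

x = np.array([-8.0, -1.0, 0.0, 1.0, 8.0])
print(noisy_hard_sigmoid(x).round(3))          # noisy only at the saturated ends
```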

Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

4 code implementations CONLL 2016 Ramesh Nallapati, Bo-Wen Zhou, Cicero Nogueira dos santos, Caglar Gulcehre, Bing Xiang

In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora.

Abstractive Text Summarization Sentence +2

Policy Distillation

1 code implementation19 Nov 2015 Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.

reinforcement-learning Reinforcement Learning (RL)
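The distillation target is the teacher's softened Q-value distribution, matched by the student under a KL loss; a low temperature keeps the teacher's targets sharp. A single-state sketch with illustrative Q-values:

```python
import numpy as np

def softmax(v, tau=1.0):
    e = np.exp(v / tau - np.max(v / tau))
    return e / e.sum()

# KL between the teacher DQN's softened Q-distribution and the student's, so
# a small student matches the teacher's action ranking, not just its argmax.
def distillation_loss(teacher_q, student_q, tau=0.01):
    p_teacher = softmax(teacher_q, tau)        # sharp teacher targets
    p_student = softmax(student_q)
    return np.sum(p_teacher * np.log(p_teacher / p_student))

teacher_q = np.array([1.0, 1.2, 0.4])
print(distillation_loss(teacher_q, np.array([1.0, 1.2, 0.4])))  # near 0
print(distillation_loss(teacher_q, np.array([0.4, 1.2, 1.0])))  # larger
```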

On Using Monolingual Corpora in Neural Machine Translation

no code implementations11 Mar 2015 Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

Machine Translation Translation

Gated Feedback Recurrent Neural Networks

no code implementations9 Feb 2015 Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

In this work, we propose a novel recurrent neural network (RNN) architecture.

Language Modelling

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

no code implementations23 Dec 2014 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

The convergence of SGD depends on the careful choice of the learning rate and the amount of noise in the stochastic estimates of the gradients.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
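The paper's saddle-free remedy can be demonstrated on the textbook saddle f(x, y) = x^2 - y^2: rescale the gradient by the inverse of |H| (absolute eigenvalues), so negative-curvature directions repel rather than attract the iterate. A two-dimensional sketch:

```python
import numpy as np

# At a saddle, plain gradient descent crawls and Newton's method is attracted
# to the saddle; rescaling by |H|^{-1} repels the iterate from it.
def grad(p):
    return np.array([2.0 * p[0], -2.0 * p[1]])   # gradient of x^2 - y^2

H = np.array([[2.0, 0.0], [0.0, -2.0]])
eigvals, eigvecs = np.linalg.eigh(H)
H_abs_inv = eigvecs @ np.diag(1.0 / np.abs(eigvals)) @ eigvecs.T

p = np.array([1e-3, 1e-3])                       # start near the saddle at 0
for _ in range(10):
    p = p - H_abs_inv @ grad(p)                  # saddle-free Newton step
print(p)                                         # x minimized, y escapes
```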

How to Construct Deep Recurrent Neural Networks

no code implementations20 Dec 2013 Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).

Language Modelling

Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

no code implementations7 Nov 2013 Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio

In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.

Object Recognition
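The $L_p$ unit is a learned-norm pool: p = 1 averages magnitudes, p = 2 gives an RMS-style pool, and p going to infinity recovers max pooling. A sketch with the centering term omitted for brevity and a fixed rather than learned p:

```python
import numpy as np

def lp_unit(x, p):
    """Pool the inputs through an L_p norm with exponent p."""
    return np.mean(np.abs(x) ** p) ** (1.0 / p)

x = np.array([0.5, -1.0, 3.0, 0.1])
for p in [1.0, 2.0, 10.0, 100.0]:
    print(f"p = {p:5.1f}  ->  {lp_unit(x, p):.3f}")  # approaches max|x| = 3.0
```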
