no code implementations • 7 Apr 2025 • Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre
Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years.
1 code implementation • 2 Apr 2025 • Adrien Schurger-Foy, Rafal Dariusz Kocielnik, Caglar Gulcehre, R. Michael Alvarez
The detrimental effects of toxicity in competitive online video games are widely acknowledged, prompting publishers to monitor player chat conversations.
1 code implementation • 14 Feb 2025 • Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Caglar Gulcehre, Michael Gastpar, Ashok Vardhan Makkuva
To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing.
no code implementations • 4 Feb 2025 • Daniil Karzanov, Rubén Garzón, Mikhail Terekhov, Caglar Gulcehre, Thomas Raffinot, Marcin Detyniecki
This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO).
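To make the learning objective concrete, here is a minimal sketch of the clipped PPO surrogate loss in PyTorch; the paper's portfolio-specific states, actions, and rewards are not shown, and all names are illustrative.

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped PPO surrogate (to be minimized). new_logp/old_logp are
    log-probabilities of the taken actions under the current and
    behavior policies; advantage holds estimated advantages."""
    ratio = torch.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()

# toy usage with random tensors standing in for a rollout batch
new_lp, old_lp, adv = torch.randn(64), torch.randn(64), torch.randn(64)
print(ppo_clip_loss(new_lp, old_lp, adv))
```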
no code implementations • 12 Nov 2024 • Kyoungmin Kim, Kijae Hong, Caglar Gulcehre, Anastasia Ailamaki
LLMs are increasingly used inside database systems and in database applications for better complexity management and decision-making, where LLM inference incurs significant GPU costs.
1 code implementation • 28 Oct 2024 • Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre
We investigated the possibility of using SAEs to learn interpretable features for few-step text-to-image diffusion models such as SDXL Turbo.
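For orientation, a minimal sparse autoencoder of the kind commonly used for this sort of feature discovery; the layer sizes, ReLU encoder, and L1 penalty below are generic placeholder choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder whose hidden activations are encouraged
    to be sparse, yielding (ideally) interpretable features."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(z), z

sae = SparseAutoencoder(d_model=16, d_hidden=64)
acts = torch.randn(8, 16)  # stand-in for diffusion-model activations
recon, z = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * z.abs().mean()
print(float(loss))
```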
1 code implementation • 28 Oct 2024 • Justin Deschenaux, Caglar Gulcehre
Moreover, we demonstrate the efficacy of our approach for diffusion language models with up to 860M parameters.
1 code implementation • 24 Oct 2024 • Shivam Adarsh, Kumar Shridhar, Caglar Gulcehre, Nicholas Monath, Mrinmaya Sachan
While LLMs can accurately solve reasoning tasks through a variety of strategies, even without fine-tuning, smaller models are not expressive enough to fit the LLM's distribution over all strategies when distilled, and tend to prioritize one strategy over the others.
1 code implementation • 11 Sep 2024 • Denis Tarasov, Anja Surina, Caglar Gulcehre
Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities.
no code implementations • 23 Jul 2024 • Mikhail Terekhov, Caglar Gulcehre
Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices.
1 code implementation • 13 Jul 2024 • Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre
State-of-the-art LLMs often rely on scale with high computational costs, which has sparked a research agenda to reduce parameter counts and costs without significantly impacting performance.
no code implementations • 12 Jul 2024 • Federico Arangath Joseph, Kilian Konstantin Haefeli, Noah Liniger, Caglar Gulcehre
Specifically, we find an explicit weight construction for continuous SSMs and provide an asymptotic error bound on the derivative approximation.
1 code implementation • 9 Jul 2024 • Tim R. Davidson, Viacheslav Surkov, Veniamin Veselovsky, Giuseppe Russo, Robert West, Caglar Gulcehre
Instead, our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin.
1 code implementation • 24 Jun 2024 • Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre
State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive.
1 code implementation • 20 Jun 2024 • Dominik Stammbach, Philine Widmer, Eunjung Cho, Caglar Gulcehre, Elliott Ash
Models aligned with this data generate the political viewpoints of Swiss parties more accurately than commercial models such as ChatGPT.
no code implementations • 17 Jun 2024 • Justin Deschenaux, Caglar Gulcehre
Modern autoregressive large language models (LLMs) achieve outstanding performance on NLP benchmarks and are deployed in the real world.
1 code implementation • 10 Jun 2024 • Chang Chen, Junyeob Baek, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals.
no code implementations • 7 May 2024 • Akhil Arora, Lars Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West
Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks.
1 code implementation • 1 May 2024 • Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre
For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss.
3 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
no code implementations • 5 Jan 2024 • Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
no code implementations • 17 Aug 2023 • Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
Reinforcement learning from human feedback (RLHF) can improve the quality of large language model (LLM) outputs by aligning them with human preferences.
1 code implementation • 7 Aug 2023 • Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
StarCraft II is one of the most challenging simulated reinforcement learning environments: it is partially observable, stochastic, and multi-agent, and mastering it requires strategic planning over long time horizons with real-time low-level execution.
no code implementations • 21 Jul 2023 • Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith
Deep neural networks based on linear RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches for sequence modeling.
11 code implementations • 11 Mar 2023 • Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train.
no code implementations • 5 Jul 2022 • Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet
Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics, and we find that bootstrapping alone is insufficient to explain the collapse of the effective rank.
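One common way to measure effective rank is via the entropy of the normalized singular values of a feature matrix (Roy & Vetterli, 2007); the paper's exact estimator may differ, so treat this as an illustrative sketch.

```python
import torch

def effective_rank(features, eps=1e-12):
    """exp(entropy of the normalized singular-value distribution)."""
    s = torch.linalg.svdvals(features)
    p = s / (s.sum() + eps)
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy)

phi = torch.randn(512, 256)  # e.g. a batch of penultimate-layer features
print(float(effective_rank(phi)))
```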
1 code implementation • NeurIPS 2021 • Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas
We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.
1 code implementation • 21 May 2021 • Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet
By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.
no code implementations • 17 Mar 2021 • Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas
Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning.
no code implementations • 1 Jan 2021 • Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas
These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to *grow unbounded*, thereby crippling learning.
1 code implementation • NeurIPS 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
no code implementations • 27 Nov 2020 • Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed
Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.
no code implementations • 17 Jul 2020 • Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas
Therefore, in this work, we focus on offline hyperparameter selection, i.e., methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data.
5 code implementations • NeurIPS 2020 • Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas
Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.
no code implementations • 25 Jun 2020 • Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli
Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019.
2 code implementations • 24 Jun 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
5 code implementations • 1 Jun 2020 • Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas
These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research.
1 code implementation • ICML 2020 • Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, Razvan Pascanu
Gating mechanisms are widely used in neural network models, where they allow gradients to backpropagate more easily through depth or time.
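A minimal sketch of the generic gating pattern this sentence describes: a sigmoid gate interpolating between a transformed signal and an identity path, which keeps a direct gradient route through depth or time. The paper proposes refinements beyond this basic form.

```python
import torch
import torch.nn as nn

class GatedResidual(nn.Module):
    """h' = g * f(h) + (1 - g) * h, with g = sigmoid(W_g h)."""
    def __init__(self, d):
        super().__init__()
        self.f = nn.Linear(d, d)
        self.gate = nn.Linear(d, d)

    def forward(self, h):
        g = torch.sigmoid(self.gate(h))
        return g * torch.tanh(self.f(h)) + (1 - g) * h

print(GatedResidual(32)(torch.randn(4, 32)).shape)
```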
5 code implementations • ICML 2020 • Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting.
1 code implementation • ICLR 2020 • Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team
This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.
no code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.
3 code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.
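A toy calculation of the influence idea: reward agent A in proportion to the KL divergence between agent B's policy conditioned on A's actual action and B's counterfactual marginal policy (A's action averaged out). The distributions below are hand-made placeholders.

```python
import numpy as np

def influence_reward(p_b_given_a, p_a):
    """KL( p(b | a) || sum_a' p(a') p(b | a') ) for each action a of A."""
    marginal = p_a @ p_b_given_a  # B's policy with A's action averaged out
    return np.array([np.sum(p_b_given_a[a] *
                            np.log(p_b_given_a[a] / marginal))
                     for a in range(len(p_a))])

p_b_given_a = np.array([[0.9, 0.1], [0.2, 0.8]])  # rows: A's action
p_a = np.array([0.5, 0.5])                        # A's policy
print(influence_reward(p_b_given_a, p_a))  # influence per action of A
```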
no code implementations • ICLR 2019 • Yutian Chen, Yannis Assael, Brendan Shillingford, David Budden, Scott Reed, Heiga Zen, Quan Wang, Luis C. Cobo, Andrew Trask, Ben Laurie, Caglar Gulcehre, Aäron van den Oord, Oriol Vinyals, Nando de Freitas
Instead, the aim is to produce a network that requires little data at deployment time to rapidly adapt to new speakers.
31 code implementations • 4 Jun 2018 • Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, Caglar Gulcehre, Francis Song, Andrew Ballard, Justin Gilmer, George Dahl, Ashish Vaswani, Kelsey Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matt Botvinick, Oriol Vinyals, Yujia Li, Razvan Pascanu
As a companion to this paper, we have released an open-source software library for building graph networks, with demonstrations of how to use them in practice.
no code implementations • ICLR 2019 • Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas
We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure.
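A sketch of the core primitive, assuming the Poincare-ball model of hyperbolic space: attention weights that fall off with hyperbolic rather than Euclidean distance. The paper also uses other constructions (e.g., the hyperboloid model), so this is only illustrative.

```python
import torch
import torch.nn.functional as F

def poincare_distance(x, y, eps=1e-6):
    """Geodesic distance between points inside the unit (Poincare) ball."""
    sq = ((x - y) ** 2).sum(-1)
    nx = (x ** 2).sum(-1).clamp(max=1 - eps)
    ny = (y ** 2).sum(-1).clamp(max=1 - eps)
    return torch.acosh(1 + 2 * sq / ((1 - nx) * (1 - ny)) + eps)

# queries and keys projected inside the ball (norm 0.5)
q = F.normalize(torch.randn(5, 8), dim=-1) * 0.5
k = F.normalize(torch.randn(5, 8), dim=-1) * 0.5
w = torch.softmax(-poincare_distance(q.unsqueeze(1), k.unsqueeze(0)), dim=-1)
print(w.shape)  # (5, 5) attention matrix
```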
1 code implementation • NeurIPS 2017 • Francis Dutil, Caglar Gulcehre, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into sequence-to-sequence models using attention.
no code implementations • EMNLP 2017 • Caglar Gulcehre, Sarath Chandar
We will present a unified architecture for Memory Augmented Neural Networks (MANNs) and discuss the ways in which one can address the external memory and hence read from and write to it.
no code implementations • WS 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into an encoder-decoder architecture with attention.
1 code implementation • 13 Jun 2017 • Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio
We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.
1 code implementation • 8 Jun 2017 • Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio
We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.
Ranked #7 on Question Answering on bAbI (Accuracy (trained on 1k) metric)
4 code implementations • WS 2017 • Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler
We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers.
1 code implementation • 2 Mar 2017 • Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio
Information about the element-wise curvature of the loss function is estimated from local statistics of the stochastic first-order gradients.
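A generic sketch of the idea: track running first and second moments of the stochastic gradients and use their element-wise variance as a cheap curvature proxy. The paper's estimator is more elaborate; names and constants here are illustrative.

```python
import torch

def gradient_variance(grads, beta=0.9):
    """Exponential moving estimates of E[g] and E[g^2]; their difference
    is an element-wise variance, usable as a curvature proxy."""
    mean = torch.zeros_like(grads[0])
    second = torch.zeros_like(grads[0])
    for g in grads:
        mean = beta * mean + (1 - beta) * g
        second = beta * second + (1 - beta) * g * g
    return second - mean ** 2

gs = [torch.randn(10) for _ in range(100)]  # stand-in minibatch gradients
print(gradient_variance(gs))
```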
no code implementations • 30 Jan 2017 • Caglar Gulcehre, Sarath Chandar, Yoshua Bengio
We use discrete addressing for read/write operations, which helps to substantially reduce the vanishing-gradient problem on very long sequences.
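At inference time, discrete addressing can be as simple as an argmax read over memory slots, as in the sketch below; training through such hard choices needs stochastic gradient estimators (e.g., REINFORCE or Gumbel-softmax), which are omitted here.

```python
import torch

def discrete_read(memory, query):
    """Read exactly one slot (the best-scoring one) instead of a soft
    average over all slots."""
    scores = memory @ query      # (num_slots,)
    idx = torch.argmax(scores)   # discrete address
    return memory[idx], idx

mem = torch.randn(128, 32)       # 128 slots of width 32
value, slot = discrete_read(mem, torch.randn(32))
print(int(slot), value.shape)
```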
no code implementations • 17 Aug 2016 • Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio
The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function; e.g., it can involve pathological landscapes such as saddle surfaces that are difficult to escape for algorithms based on simple gradient descent.
no code implementations • 30 Jun 2016 • Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio
We investigate the mechanisms and effects of learning to read and write into a memory through experiments on the Facebook bAbI tasks, using both a feedforward and a GRU controller.
Ranked #5 on Question Answering on bAbI
1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang
Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.
no code implementations • ACL 2016 • Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio
At each time step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work with psychological evidence that humans naturally tend to point toward objects in the context or the environment when the name of an object is not known. We observe improvements on two tasks: neural machine translation on the Europarl English-to-French parallel corpora, and text summarization on the Gigaword dataset using our proposed model.
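A minimal sketch of such a switching mechanism: a sigmoid gate, computed from the decoder context, mixes a shortlist-vocabulary softmax with a pointer (copy) softmax over source positions. Layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class PointerSwitch(nn.Module):
    def __init__(self, d_ctx, vocab_size):
        super().__init__()
        self.switch = nn.Linear(d_ctx, 1)   # p(use vocabulary | context)
        self.vocab = nn.Linear(d_ctx, vocab_size)

    def forward(self, ctx, copy_scores):
        p = torch.sigmoid(self.switch(ctx))
        vocab_dist = torch.softmax(self.vocab(ctx), -1)
        copy_dist = torch.softmax(copy_scores, -1)  # over source tokens
        return p * vocab_dist, (1 - p) * copy_dist  # one joint distribution

m = PointerSwitch(d_ctx=64, vocab_size=100)
v, c = m(torch.randn(2, 64), torch.randn(2, 7))
print(v.shape, c.shape)  # (2, 100) and (2, 7)
```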
1 code implementation • ACL 2016 • Iulian Vlad Serban, Alberto García-Durán, Caglar Gulcehre, Sungjin Ahn, Sarath Chandar, Aaron Courville, Yoshua Bengio
Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances.
1 code implementation • 1 Mar 2016 • Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio
Common nonlinear activation functions used in neural networks can cause training difficulties due to their saturation behavior, which may hide dependencies that are not visible to vanilla SGD (which uses only first-order gradients).
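A simplified sketch of the remedy: inject noise whose scale grows where the nonlinearity saturates (for tanh, where tanh(x)^2 is close to 1), so some gradient signal survives in the flat regions. The paper's formulation, with hard-saturating functions and learned noise scales, is richer than this.

```python
import torch

def noisy_tanh(x, c=0.25, training=True):
    """tanh plus noise scaled by tanh(x)^2, i.e. by how saturated the
    unit is (the local derivative is 1 - tanh(x)^2)."""
    h = torch.tanh(x)
    if training:
        h = h + c * h.detach() ** 2 * torch.randn_like(x)
    return h

print(noisy_tanh(torch.linspace(-4.0, 4.0, 9)))
```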
4 code implementations • CONLL 2016 • Ramesh Nallapati, Bo-Wen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang
In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora.
Ranked #10 on Text Summarization on DUC 2004 Task 1
1 code implementation • 19 Nov 2015 • Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.
no code implementations • 11 Mar 2015 • Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio
Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.
no code implementations • 5 Mar 2015 • Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio
The task in the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood-style movies.
no code implementations • 9 Feb 2015 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
In this work, we propose a novel recurrent neural network (RNN) architecture.
no code implementations • 23 Dec 2014 • Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio
The convergence of SGD depends on a careful choice of learning rate and on the amount of noise in the stochastic estimates of the gradients.
14 code implementations • 11 Dec 2014 • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
In this paper we compare different types of recurrent units in recurrent neural networks (RNNs).
Ranked #10 on Music Modeling on JSB Chorales
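To make the compared gating equations concrete, here is a single GRU step (update gate z, reset gate r, candidate state), written out with explicit weight matrices:

```python
import torch

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step; h is the previous hidden state."""
    z = torch.sigmoid(x @ Wz + h @ Uz)           # update gate
    r = torch.sigmoid(x @ Wr + h @ Ur)           # reset gate
    h_tilde = torch.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde

d_in, d_h = 8, 16
params = [torch.randn(d_in, d_h) if i % 2 == 0 else torch.randn(d_h, d_h)
          for i in range(6)]
print(gru_cell(torch.randn(1, d_in), torch.zeros(1, d_h), *params).shape)
```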
4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.
42 code implementations • 3 Jun 2014 • Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
In this paper, we propose a novel neural network model called the RNN Encoder-Decoder, which consists of two recurrent neural networks (RNNs).
Ranked #48 on Machine Translation on WMT2014 English-French
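A minimal sketch of the encoder-decoder pattern, with plain GRUs and no attention: the encoder compresses the source into a fixed-length vector that initializes the decoder. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, vocab, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, src, tgt):
        _, ctx = self.encoder(self.emb(src))       # fixed-length summary
        dec_out, _ = self.decoder(self.emb(tgt), ctx)
        return self.out(dec_out)                   # next-token logits

model = EncoderDecoder(vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # (2, 5, 1000)
```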
no code implementations • 20 Dec 2013 • Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).
no code implementations • 7 Nov 2013 • Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio
In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.
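A fixed-p sketch of an L_p unit, which generalizes pooling: p = 1 averages magnitudes, while large p approaches max pooling. The paper learns p per unit and centers the inputs, which this sketch omits.

```python
import torch

def lp_unit(x, p):
    """(mean_i |x_i|^p)^(1/p) over the last dimension."""
    return x.abs().pow(p).mean(-1).pow(1.0 / p)

x = torch.randn(4, 8)
for p in (1.0, 2.0, 16.0):
    print(p, lp_unit(x, p))
```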