Search Results for author: Caglar Gulcehre

Found 71 papers, 39 papers with code

Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

no code implementations7 Apr 2025 Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre

Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years.

Combinatorial Optimization reinforcement-learning +2

Context-Aware Toxicity Detection in Multiplayer Games: Integrating Domain-Adaptive Pretraining and Match Metadata

1 code implementation2 Apr 2025 Adrien Schurger-Foy, Rafal Dariusz Kocielnik, Caglar Gulcehre, R. Michael Alvarez

The detrimental effects of toxicity in competitive online video games are widely acknowledged, prompting publishers to monitor player chat conversations.

Dota 2

From Markov to Laplace: How Mamba In-Context Learns Markov Chains

1 code implementation14 Feb 2025 Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Caglar Gulcehre, Michael Gastpar, Ashok Vardhan Makkuva

To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing.

In-Context Learning Language Modeling +2
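
The "optimal Laplacian smoothing" mentioned in the excerpt refers to the classic add-constant estimator of Markov transition probabilities. A minimal sketch of that estimator for a first-order chain, with an illustrative smoothing constant (not the paper's construction):

```python
# Hedged sketch: add-constant ("Laplacian") smoothing of Markov-chain
# transition probabilities. The smoothing constant beta is an illustrative
# choice, not a value taken from the paper.
import numpy as np

def laplacian_smoothed_transitions(sequence, vocab_size, beta=1.0):
    """Estimate P(x_{t+1} = j | x_t = i) with add-beta smoothing."""
    counts = np.zeros((vocab_size, vocab_size))
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        counts[prev, nxt] += 1.0
    smoothed = counts + beta  # beta pseudo-counts for every transition
    return smoothed / smoothed.sum(axis=1, keepdims=True)

probs = laplacian_smoothed_transitions([0, 1, 1, 0, 1], vocab_size=2)
print(probs)  # rows sum to 1; unseen transitions still receive non-zero mass
```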

Optimizing LLM Inference for Database Systems: Cost-Aware Scheduling for Concurrent Requests

no code implementations12 Nov 2024 Kyoungmin Kim, Kijae Hong, Caglar Gulcehre, Anastasia Ailamaki

LLMs are increasingly used inside database systems and in database applications for better complexity management and decision-making, but LLM inference incurs significant GPU cost.

Decision Making Management +1

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

1 code implementation28 Oct 2024 Viacheslav Surkov, Chris Wendler, Mikhail Terekhov, Justin Deschenaux, Robert West, Caglar Gulcehre

We investigated the possibility of using SAEs to learn interpretable features for few-step text-to-image diffusion models, such as SDXL Turbo.

Denoising

Beyond Autoregression: Fast LLMs via Self-Distillation Through Time

1 code implementation28 Oct 2024 Justin Deschenaux, Caglar Gulcehre

Moreover, we demonstrate the efficacy of our approach for diffusion language models with up to 860M parameters.

Automated Theorem Proving Code Generation +2

SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning

1 code implementation24 Oct 2024 Shivam Adarsh, Kumar Shridhar, Caglar Gulcehre, Nicholas Monath, Mrinmaya Sachan

While LLMs can accurately solve reasoning tasks through a variety of strategies, even without fine-tuning, smaller models are not expressive enough to fit the LLM's distribution over all strategies when distilled, and tend to prioritize one strategy over the others.

Knowledge Distillation Mathematical Reasoning
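
For context, the distillation setup this line of work builds on trains the smaller model to match the larger model's output distribution. A minimal sketch of a standard KL-based distillation loss (temperature and scaling are illustrative; the paper's self-guided, iterative procedure builds on top of this):

```python
# Hedged sketch of a standard knowledge-distillation objective: the student
# matches the teacher's softened output distribution. The temperature is an
# illustrative hyperparameter, not one taken from the paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

loss = distillation_loss(torch.randn(4, 32000), torch.randn(4, 32000))
```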

The Role of Deep Learning Regularizations on Actors in Offline RL

1 code implementation11 Sep 2024 Denis Tarasov, Anja Surina, Caglar Gulcehre

Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities.

D4RL Offline RL +1
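
As an illustration of the regularizers named above, here is a minimal sketch of an actor network with dropout and layer normalization, and weight decay applied through the optimizer; layer sizes, rates, and the weight-decay strength are placeholders rather than the paper's configuration:

```python
# Hedged sketch: dropout, layer normalization, and weight decay applied to a
# simple actor network. All sizes and coefficients are illustrative.
import torch
import torch.nn as nn

actor = nn.Sequential(
    nn.Linear(17, 256),   # e.g. a MuJoCo-sized observation
    nn.LayerNorm(256),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(256, 256),
    nn.LayerNorm(256),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(256, 6),    # action dimension
    nn.Tanh(),
)

# Weight decay enters through the optimizer rather than the network itself.
optimizer = torch.optim.AdamW(actor.parameters(), lr=3e-4, weight_decay=1e-4)
```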

In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning

no code implementations23 Jul 2024 Mikhail Terekhov, Caglar Gulcehre

Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices.

Multi-Objective Reinforcement Learning Q-Learning

Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

1 code implementation13 Jul 2024 Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

State-of-the-art LLMs often rely on scale with high computational costs, which has sparked a research agenda to reduce parameter counts and costs without significantly impacting performance.

HiPPO-Prophecy: State-Space Models can Provably Learn Dynamical Systems in Context

no code implementations12 Jul 2024 Federico Arangath Joseph, Kilian Konstantin Haefeli, Noah Liniger, Caglar Gulcehre

Specifically, we find an explicit weight construction for continuous SSMs and provide an asymptotic error bound on the derivative approximation.

In-Context Learning State Space Models

Self-Recognition in Language Models

1 code implementation9 Jul 2024 Tim R. Davidson, Viacheslav Surkov, Veniamin Veselovsky, Giuseppe Russo, Robert West, Caglar Gulcehre

Instead, our results suggest that given a set of alternatives, LMs seek to pick the "best" answer, regardless of its origin.

Multiple-choice

Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers

1 code implementation24 Jun 2024 Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive.

Aligning Large Language Models with Diverse Political Viewpoints

1 code implementation20 Jun 2024 Dominik Stammbach, Philine Widmer, Eunjung Cho, Caglar Gulcehre, Elliott Ash

Models aligned with this data can generate more accurate political viewpoints from Swiss parties, compared to commercial models such as ChatGPT.

Promises, Outlooks and Challenges of Diffusion Language Modeling

no code implementations17 Jun 2024 Justin Deschenaux, Caglar Gulcehre

Modern autoregressive large language models (LLMs) have achieved outstanding performance on NLP benchmarks and are deployed in the real world.

ARC HellaSwag +3

Fleet of Agents: Coordinated Problem Solving with Large Language Models using Genetic Particle Filtering

no code implementations7 May 2024 Akhil Arora, Lars Klein, Nearchos Potamitis, Roland Aydin, Caglar Gulcehre, Robert West

Large language models (LLMs) have significantly evolved, moving from simple output generation to complex reasoning and from stand-alone usage to being embedded into broader frameworks.

Navigate

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

1 code implementation1 May 2024 Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre

For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss.

MuJoCo Reinforcement Learning (RL)

Simple Hierarchical Planning with Diffusion

no code implementations5 Jan 2024 Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn

Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.

An Empirical Study of Implicit Regularization in Deep Offline RL

no code implementations5 Jul 2022 Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet

Also, we empirically identify three phases of learning that explain the impact of implicit regularization on the learning dynamics, and find that bootstrapping alone is insufficient to explain the collapse of the effective rank.

Offline RL
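
The "effective rank" whose collapse is discussed above is typically computed from the singular-value spectrum of a batch of features. A minimal sketch of one common thresholded definition (the paper's exact variant may differ):

```python
# Hedged sketch: effective rank of a feature matrix as the number of top
# singular values needed to capture a (1 - delta) fraction of the spectrum's
# mass. The threshold delta is illustrative.
import numpy as np

def effective_rank(features, delta=0.01):
    """Smallest k such that the top-k singular values hold (1 - delta) of the total mass."""
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / singular_values.sum()
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

phi = np.random.randn(512, 256)  # batch of penultimate-layer features
print(effective_rank(phi))       # high (close to full rank) for random features
```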

Active Offline Policy Selection

1 code implementation NeurIPS 2021 Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas

We use multiple benchmarks, including real-world robotics, with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.

Bayesian Optimization Off-policy evaluation

On Instrumental Variable Regression for Deep Offline Policy Evaluation

1 code implementation21 May 2021 Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques.

regression Reinforcement Learning (RL)

Regularized Behavior Value Estimation

no code implementations17 Mar 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

Due to bootstrapping, these errors get amplified during training and can lead to divergence, thereby crippling learning.

Offline RL

Addressing Extrapolation Error in Deep Offline Reinforcement Learning

no code implementations1 Jan 2021 Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, Nando de Freitas

These errors can be compounded by bootstrapping when the function approximator overestimates, leading the value function to grow unbounded, thereby crippling learning.

Offline RL reinforcement-learning +2
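
The bootstrapping referred to above is the standard TD target, which reuses the learned Q-function's own value of the next state; if that value is overestimated, the error is folded back into the regression target. A minimal sketch (shapes and discount are illustrative):

```python
# Hedged sketch of a bootstrapped TD target: the learned Q-function's own
# (possibly overestimated) value of the next state enters the target.
import torch

def td_target(reward, next_q_values, done, gamma=0.99):
    bootstrap = next_q_values.max(dim=-1).values  # greedy value of the next state
    return reward + gamma * (1.0 - done) * bootstrap

target = td_target(torch.tensor([1.0]), torch.randn(1, 4), torch.tensor([0.0]))
```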

RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning

1 code implementation NeurIPS 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Offline RL reinforcement-learning +2

Offline Learning from Demonstrations and Unlabeled Experience

no code implementations27 Nov 2020 Konrad Zolna, Alexander Novikov, Ksenia Konyushkova, Caglar Gulcehre, Ziyu Wang, Yusuf Aytar, Misha Denil, Nando de Freitas, Scott Reed

Behavior cloning (BC) is often practical for robot learning because it allows a policy to be trained offline without rewards, by supervised learning on expert demonstrations.

continuous-control Continuous Control +1
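
Behavior cloning as described above is plain supervised regression from logged states to expert actions. A minimal sketch, with placeholder network sizes and synthetic data:

```python
# Hedged sketch of behavior cloning: supervised regression from states to
# expert actions, trained offline without rewards. Sizes and data are dummies.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(11, 256), nn.ReLU(), nn.Linear(256, 3))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Logged expert demonstrations: (state, action) pairs only, no rewards needed.
states = torch.randn(1024, 11)
expert_actions = torch.randn(1024, 3)

for _ in range(100):
    loss = nn.functional.mse_loss(policy(states), expert_actions)  # imitate the expert
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```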

Hyperparameter Selection for Offline Reinforcement Learning

no code implementations17 Jul 2020 Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

Therefore, in this work, we focus on offline hyperparameter selection, i.e., methods for choosing the best policy from a set of many policies trained using different hyperparameters, given only logged data.

Offline RL reinforcement-learning +2

Critic Regularized Regression

5 code implementations NeurIPS 2020 Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL regression +1

Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

no code implementations25 Jun 2020 Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019.

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +4

Stabilizing Transformers for Reinforcement Learning

5 code implementations ICML 2020 Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

Harnessing the transformer's ability to process long time horizons of information could provide a similar performance boost in partially observable reinforcement learning (RL) domains, but the large-scale transformers used in NLP have yet to be successfully applied to the RL setting.

General Reinforcement Learning Language Modeling +6

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

counterfactual Counterfactual Reasoning +3

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

counterfactual Counterfactual Reasoning +4

Hyperbolic Attention Networks

no code implementations ICLR 2019 Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure.

Machine Translation Question Answering +2

Memory Augmented Neural Networks for Natural Language Processing

no code implementations EMNLP 2017 Caglar Gulcehre, Sarath Chandar

We will present a unified architecture for Memory Augmented Neural Networks (MANN) and discuss the ways in which one can address the external memory and hence read from and write to it.

AI Agent Language Modeling +3

Plan, Attend, Generate: Character-level Neural Machine Translation with Planning in the Decoder

1 code implementation13 Jun 2017 Caglar Gulcehre, Francis Dutil, Adam Trischler, Yoshua Bengio

We investigate the integration of a planning mechanism into an encoder-decoder architecture with an explicit alignment for character-level machine translation.

Decoder Machine Translation +1

Gated Orthogonal Recurrent Units: On Learning to Forget

1 code implementation8 Jun 2017 Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

We present a novel recurrent neural network (RNN) based model that combines the remembering ability of unitary RNNs with the ability of gated RNNs to effectively forget redundant/irrelevant information in its memory.

Ranked #7 on Question Answering on bAbI (accuracy, trained on 1k)

Denoising Question Answering

A Robust Adaptive Stochastic Gradient Method for Deep Learning

1 code implementation2 Mar 2017 Caglar Gulcehre, Jose Sotelo, Marcin Moczulski, Yoshua Bengio

The information about the element-wise curvature of the loss function is estimated from the local statistics of the stochastic first order gradients.

Deep Learning
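
One way to estimate element-wise curvature from first-order information, in the spirit of the excerpt above, is a secant (finite-difference) approximation over consecutive gradients. This is only a generic illustration under that assumption, not the update rule proposed in the paper:

```python
# Hedged, generic sketch: element-wise secant estimate of curvature from two
# consecutive stochastic gradients. Illustrative only; not the paper's method.
import numpy as np

def secant_curvature(prev_params, params, prev_grad, grad, eps=1e-8):
    """Change in gradient divided by change in parameters, element-wise."""
    delta_params = params - prev_params
    delta_grad = grad - prev_grad
    denom = np.where(np.abs(delta_params) < eps, eps, delta_params)  # avoid division by ~0
    return delta_grad / denom
```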

Memory Augmented Neural Networks with Wormhole Connections

no code implementations30 Jan 2017 Caglar Gulcehre, Sarath Chandar, Yoshua Bengio

We use discrete addressing for read/write operations, which helps substantially reduce the vanishing gradient problem with very long sequences.

Mollifying Networks

no code implementations17 Aug 2016 Caglar Gulcehre, Marcin Moczulski, Francesco Visin, Yoshua Bengio

The optimization of deep neural networks can be more challenging than traditional convex optimization problems due to the highly non-convex nature of the loss function, e.g., it can involve pathological landscapes such as saddle surfaces that can be difficult to escape for algorithms based on simple gradient descent.

Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes

no code implementations30 Jun 2016 Caglar Gulcehre, Sarath Chandar, Kyunghyun Cho, Yoshua Bengio

We investigate the mechanisms and effects of learning to read from and write to a memory through experiments on the Facebook bAbI tasks, using both feedforward and GRU controllers.

Natural Language Inference Question Answering

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

Pointing the Unknown Words

no code implementations ACL 2016 Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bo-Wen Zhou, Yoshua Bengio

At each time-step, the decision of which softmax layer to use is made adaptively by an MLP conditioned on the context. We motivate our work with psychological evidence that humans naturally tend to point towards objects in the context or the environment when the name of an object is not known. We observe improvements on two tasks: neural machine translation on the Europarl English-to-French parallel corpus and text summarization on the Gigaword dataset, using our proposed model.

Machine Translation Sentence +2
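
A minimal sketch of the switching mechanism described in the excerpt: a small gating network, conditioned on the decoder context, mixes a vocabulary softmax with a copy (pointer) distribution over source positions. Dimensions and module names are illustrative, not the paper's implementation:

```python
# Hedged sketch: a context-conditioned gate mixing a vocabulary softmax with a
# pointer/copy distribution. All names and sizes are illustrative placeholders.
import torch
import torch.nn as nn

class PointerSoftmaxSwitch(nn.Module):
    def __init__(self, hidden_dim, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)
        self.switch = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, context, copy_attention):
        # copy_attention: (batch, src_len), already a distribution over source words
        p_vocab = torch.softmax(self.vocab_proj(context), dim=-1)
        gate = self.switch(context)  # probability of generating from the vocabulary
        # The full output distribution is the concatenation of these two parts.
        return gate * p_vocab, (1.0 - gate) * copy_attention

layer = PointerSoftmaxSwitch(hidden_dim=128, vocab_size=10000)
gen_probs, copy_probs = layer(torch.randn(4, 128), torch.softmax(torch.randn(4, 20), dim=-1))
```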

Noisy Activation Functions

1 code implementation1 Mar 2016 Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio

Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only).

Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

4 code implementations CONLL 2016 Ramesh Nallapati, Bo-Wen Zhou, Cicero Nogueira dos Santos, Caglar Gulcehre, Bing Xiang

In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora.

Abstractive Text Summarization Decoder +3

Policy Distillation

1 code implementation19 Nov 2015 Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.

Deep Reinforcement Learning reinforcement-learning +1

On Using Monolingual Corpora in Neural Machine Translation

no code implementations11 Mar 2015 Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Recent work on end-to-end neural network-based architectures for machine translation has shown promising results for En-Fr and En-De translation.

de-en Machine Translation +1

ADASECANT: Robust Adaptive Secant Method for Stochastic Gradient

no code implementations23 Dec 2014 Caglar Gulcehre, Marcin Moczulski, Yoshua Bengio

The convergence of SGD depends on the careful choice of the learning rate and the amount of noise in stochastic estimates of the gradients.

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum.

How to Construct Deep Recurrent Neural Networks

no code implementations20 Dec 2013 Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996).

Language Modeling Language Modelling

Learned-Norm Pooling for Deep Feedforward and Recurrent Neural Networks

no code implementations7 Nov 2013 Caglar Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio

In this paper we propose and investigate a novel nonlinear unit, called $L_p$ unit, for deep neural networks.

Object Recognition
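
A minimal sketch of an $L_p$-style pooling unit as described above: the output is a p-norm-like average over a group of inputs, with p treated as a learnable parameter. The paper's exact parameterization (e.g. centering terms) may differ; this only illustrates the basic nonlinearity:

```python
# Hedged sketch of an L_p pooling unit with a learnable exponent p > 1.
import torch
import torch.nn as nn

class LpUnit(nn.Module):
    def __init__(self):
        super().__init__()
        # Parameterize p > 1 via softplus so the norm stays well defined.
        self.log_p = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        # x: (batch, group_size) -- the inputs pooled by this unit
        p = 1.0 + nn.functional.softplus(self.log_p)
        return x.abs().pow(p).mean(dim=-1).pow(1.0 / p)

unit = LpUnit()
print(unit(torch.randn(8, 4)).shape)  # -> torch.Size([8])
```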
