Search Results for author: Yuhuai Wu

Found 50 papers, 28 papers with code

OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning

no code implementations ICML 2020 Alexander Vezhnevets, Yuhuai Wu, Maria Eckstein, Rémi Leblond, Joel Z. Leibo

This paper investigates generalisation in multi-agent games, where the generality of the agent can be evaluated by playing against opponents it hasn't seen during training.

Multi-agent Reinforcement Learning · reinforcement-learning +1

REFACTOR: Learning to Extract Theorems from Proofs

1 code implementation 26 Feb 2024 Jin Peng Zhou, Yuhuai Wu, Qiyang Li, Roger Grosse

With newly extracted theorems, we show that the existing proofs in the MetaMath database can be refactored.

Automated Theorem Proving

Length Generalization in Arithmetic Transformers

no code implementations 27 Jun 2023 Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.

Position
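The relative-position mechanism this entry credits for length generalization can be illustrated with a minimal sketch (an illustrative toy under generic assumptions, not the paper's implementation): attention logits receive a learned bias indexed only by the clipped offset $i - j$, so the same parameters apply to sequences longer than those seen in training.

```python
import numpy as np

def attention_with_relative_bias(q, k, v, rel_bias, max_offset):
    """Single-head attention whose logits get a bias that depends only on
    the token offset (i - j), not on absolute positions.

    q, k, v  : (seq_len, d) arrays
    rel_bias : (2 * max_offset + 1,) learned bias table, one entry per clipped offset
    """
    seq_len, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    offsets = np.clip(offsets, -max_offset, max_offset) + max_offset
    logits = logits + rel_bias[offsets]            # bias indexed by relative offset only
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Because the bias table depends on the offset alone, it can be reused on
# sequences longer than any seen in training, which is the property the entry
# associates with length generalization on addition.
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(20, 16))
out = attention_with_relative_bias(q, k, v, rel_bias=np.zeros(17), max_offset=8)
print(out.shape)  # (20, 16)
```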

PaLM 2 Technical Report

1 code implementation17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Code Generation · Common Sense Reasoning +6

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

no code implementations 18 Nov 2022 Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, Zico Kolter, Roger Grosse

Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances.

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

3 code implementations 21 Oct 2022 Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample

In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems.

Ranked #3 on Automated Theorem Proving on miniF2F-valid (Pass@100 metric)

Automated Theorem Proving · Language Modelling
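A hedged sketch of the control flow this entry describes; the helper callables (`draft_informal_proof`, `informal_to_formal_sketch`, `close_gap_with_prover`) are hypothetical stand-ins for an LLM and an automated prover, not the authors' API.

```python
from typing import Callable, Iterable, List, Optional

def draft_sketch_prove(
    statement: str,
    draft_informal_proof: Callable[[str], str],                      # e.g. sample an informal proof from an LLM
    informal_to_formal_sketch: Callable[[str, str], Iterable[str]],  # returns the sketch's open sub-goals
    close_gap_with_prover: Callable[[str], Optional[str]],           # off-the-shelf automated prover
) -> Optional[List[str]]:
    """Draft an informal proof, map it to a formal sketch whose gaps are easier
    sub-problems, then ask an automated prover to close each gap."""
    informal_proof = draft_informal_proof(statement)
    open_gaps = informal_to_formal_sketch(statement, informal_proof)
    closed = []
    for gap in open_gaps:
        proof = close_gap_with_prover(gap)
        if proof is None:                 # a single unclosed gap invalidates the sketch
            return None
        closed.append(proof)
    return closed

# Toy usage with trivial stubs, just to show the plumbing.
print(draft_sketch_prove(
    "n + 0 = n",
    draft_informal_proof=lambda s: "by induction on n",
    informal_to_formal_sketch=lambda s, p: [f"base case of {s}", f"inductive step of {s}"],
    close_gap_with_prover=lambda g: f"<proof of {g}>",
))
```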

Exploring Length Generalization in Large Language Models

no code implementations 11 Jul 2022 Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare.

Automated Theorem Proving · In-Context Learning +1

Insights into Pre-training via Simpler Synthetic Tasks

1 code implementation 21 Jun 2022 Yuhuai Wu, Felix Li, Percy Liang

Second, to our surprise, we find that pre-training on a simple and generic synthetic task defined by the Set function achieves $65\%$ of the benefits, almost matching LIME.

Autoformalization with Large Language Models

no code implementations 25 May 2022 Yuhuai Wu, Albert Q. Jiang, Wenda Li, Markus N. Rabe, Charles Staats, Mateja Jamnik, Christian Szegedy

Autoformalization is the process of automatically translating from natural language mathematics to formal specifications and proofs.

 Ranked #1 on Automated Theorem Proving on miniF2F-test (using extra training data)

Automated Theorem Proving · Program Synthesis

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

no code implementations 22 May 2022 Albert Q. Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, Mateja Jamnik

Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own.

Automated Theorem Proving

STaR: Bootstrapping Reasoning With Reasoning

1 code implementation 28 Mar 2022 Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA.

Common Sense Reasoning · Language Modelling +1

Memorizing Transformers

3 code implementations ICLR 2022 Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights.

Language Modelling · Math

Block-Recurrent Transformers

3 code implementations 11 Mar 2022 DeLesley Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

The recurrent cell is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens.

Language Modelling
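A much-simplified sketch of the mechanism this entry describes (single head, no gating, no learned projections; an illustration under those assumptions, not the paper's layer): each block of tokens self-attends and cross-attends to a small set of state vectors, and the states are updated from the block before being carried forward.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(queries, keys, values):
    """Plain single-head scaled dot-product attention."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values

def block_recurrent_pass(tokens, state, block_size):
    """Process a long sequence block by block, carrying a fixed-size state.

    tokens : (seq_len, d) token embeddings
    state  : (num_state, d) recurrent state vectors
    """
    outputs = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]
        # Tokens self-attend and cross-attend to the current state.
        block = block + attend(block, block, block) + attend(block, state, state)
        # The state cross-attends to the block and is carried to the next step
        # (the "recurrent" part of the computation).
        state = state + attend(state, block, block)
        outputs.append(block)
    return np.concatenate(outputs, axis=0), state

rng = np.random.default_rng(0)
out, final_state = block_recurrent_pass(rng.normal(size=(64, 32)),
                                        rng.normal(size=(8, 32)),
                                        block_size=16)
print(out.shape, final_state.shape)  # (64, 32) (8, 32)
```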

Learning to Give Checkable Answers with Prover-Verifier Games

no code implementations 27 Aug 2021 Cem Anil, Guodong Zhang, Yuhuai Wu, Roger Grosse

We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover.

Subgoal Search For Complex Reasoning Tasks

1 code implementation NeurIPS 2021 Konrad Czechowski, Tomasz Odrzygóźdź, Marek Zbysiński, Michał Zawalski, Krzysztof Olejnik, Yuhuai Wu, Łukasz Kuciński, Piotr Miłoś

In this paper, we implement kSubS using a transformer-based subgoal module coupled with the classical best-first search framework.

Rubik's Cube
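The coupling this entry names (a learned subgoal generator inside classical best-first search) can be sketched as below; `propose_subgoals` and `value` stand in for the paper's transformer-based subgoal module and value function, and the whole skeleton is illustrative rather than the authors' code.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Optional, Tuple

def subgoal_best_first_search(
    start,
    is_solved: Callable[[object], bool],
    propose_subgoals: Callable[[object], Iterable[object]],  # learned generator (stand-in)
    value: Callable[[object], float],                        # higher = judged closer to the goal
    budget: int = 10_000,
) -> Optional[List[object]]:
    """Best-first search whose successors are generated subgoals rather than
    primitive actions."""
    counter = itertools.count()          # heap tie-breaker so states are never compared
    frontier: List[Tuple[float, int, object, List[object]]] = [
        (-value(start), next(counter), start, [start])
    ]
    seen = {repr(start)}
    for _ in range(budget):
        if not frontier:
            break
        _, _, state, path = heapq.heappop(frontier)
        if is_solved(state):
            return path
        for subgoal in propose_subgoals(state):
            if repr(subgoal) not in seen:
                seen.add(repr(subgoal))
                heapq.heappush(frontier, (-value(subgoal), next(counter), subgoal, path + [subgoal]))
    return None

# Toy usage: states are integers, the goal is reaching 10, subgoals jump a few steps ahead.
print(subgoal_best_first_search(
    0,
    is_solved=lambda s: s >= 10,
    propose_subgoals=lambda s: [s + 2, s + 3],
    value=lambda s: float(s),
))
```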

On the Opportunities and Risks of Foundation Models

2 code implementations16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Nonlinear Invariant Risk Minimization: A Causal Approach

no code implementations 24 Feb 2021 Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf

Finally, in the discussion, we further explore the aforementioned assumption and propose a more general hypothesis, called the Agnostic Hypothesis: there exists a set of hidden causal factors affecting both inputs and outcomes.

BIG-bench Machine Learning · Representation Learning

Proof Artifact Co-training for Theorem Proving with Language Models

4 code implementations ICLR 2022 Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas Polu

Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built.

Automated Theorem Proving · Imitation Learning +1

LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

1 code implementation 15 Jan 2021 Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy

While designing inductive bias in neural architectures has been widely studied, we hypothesize that transformer networks are flexible enough to learn inductive bias from suitable generic tasks.

Inductive Bias · Mathematical Reasoning

Invariant Causal Representation Learning

no code implementations 1 Jan 2021 Chaochao Lu, Yuhuai Wu, José Miguel Hernández-Lobato, Bernhard Schölkopf

As an alternative, we propose Invariant Causal Representation Learning (ICRL), a learning paradigm that enables out-of-distribution generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers).

Out-of-Distribution Generalization · Representation Learning

The Scattering Compositional Learner: Discovering Objects, Attributes, Relationships in Analogical Reasoning

3 code implementations 8 Jul 2020 Yuhuai Wu, Honghua Dong, Roger Grosse, Jimmy Ba

In this work, we focus on an analogical reasoning task that contains rich compositional structures, Raven's Progressive Matrices (RPM).

Zero-shot Generalization

Learning Branching Heuristics for Propositional Model Counting

no code implementations 7 Jul 2020 Pashootan Vaezipoor, Gil Lederman, Yuhuai Wu, Chris J. Maddison, Roger Grosse, Sanjit A. Seshia, Fahiem Bacchus

In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

1 code implementation ICLR 2021 Yuhuai Wu, Albert Qiaochu Jiang, Jimmy Ba, Roger Grosse

In learning-assisted theorem proving, one of the most critical challenges is to generalize to theorems unlike those seen at training time.

Automated Theorem Proving

IsarStep: a Benchmark for High-level Mathematical Reasoning

2 code implementations ICLR 2021 Wenda Li, Lei Yu, Yuhuai Wu, Lawrence C. Paulson

In this paper, we present a benchmark for high-level mathematical reasoning and study the reasoning capabilities of neural sequence-to-sequence models.

Mathematical Proofs · Mathematical Reasoning +1

Options as responses: Grounding behavioural hierarchies in multi-agent RL

no code implementations 4 Jun 2019 Alexander Sasha Vezhnevets, Yuhuai Wu, Remi Leblond, Joel Z. Leibo

This paper investigates generalisation in multi-agent games, where the generality of the agent can be evaluated by playing against opponents it hasn't seen during training.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL)

Concurrent Meta Reinforcement Learning

1 code implementation 7 Mar 2019 Emilio Parisotto, Soham Ghosh, Sai Bhargav Yalamanchi, Varsha Chinnaobireddy, Yuhuai Wu, Ruslan Salakhutdinov

In this multi-agent setting, a set of parallel agents is executed in the same environment, and these "rollout" agents are given the means to communicate with each other.

Efficient Exploration · Meta-Learning +4

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

no code implementations 12 Feb 2019 Harris Chan, Yuhuai Wu, Jamie Kiros, Sanja Fidler, Jimmy Ba

We first analyze the differences among goal representations, and show that ACTRCE can efficiently solve difficult reinforcement learning problems in challenging 3D navigation tasks, whereas HER with non-language goal representation failed to learn.

Multi-Goal Reinforcement Learning · reinforcement-learning +1

The Importance of Sampling in Meta-Reinforcement Learning

no code implementations NeurIPS 2018 Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

Results are presented on a new environment we call "Krazy World": a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning.

Meta Reinforcement Learning · reinforcement-learning +1

Understanding Short-Horizon Bias in Stochastic Meta-Optimization

1 code implementation ICLR 2018 Yuhuai Wu, Mengye Ren, Renjie Liao, Roger Grosse

Careful tuning of the learning rate, or even schedules thereof, can be crucial to effective neural net training.

An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients

no code implementations 17 Jan 2018 Jiaming Song, Yuhuai Wu

In this technical report, we consider an approach that combines the PPO objective and K-FAC natural gradient optimization, which we call PPOKFAC.

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

8 code implementations NeurIPS 2017 Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

Atari Games · Continuous Control +2
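For context on the curvature approximation this entry mentions, the generic Kronecker-factored (K-FAC) form of a fully connected layer's Fisher block is a standard identity, stated here in general terms rather than as this paper's specific implementation:

```latex
% Per-layer Fisher block approximated as a Kronecker product of two small matrices:
% A is the covariance of the layer's input activations, S the covariance of the
% gradients with respect to its pre-activations.
\[
F_\ell \;\approx\; A_{\ell-1} \otimes S_\ell,
\qquad
A_{\ell-1} = \mathbb{E}\!\left[ a_{\ell-1}\, a_{\ell-1}^{\top} \right],
\qquad
S_\ell = \mathbb{E}\!\left[ g_\ell\, g_\ell^{\top} \right].
\]
% Inverting the two small factors instead of the full Fisher makes the
% approximate natural-gradient step cheap:
\[
\Delta W_\ell \;\propto\; S_\ell^{-1} \, \bigl(\nabla_{W_\ell} J\bigr) \, A_{\ell-1}^{-1}.
\]
```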

Sticking the Landing: Simple, Lower-Variance Gradient Estimators for Variational Inference

1 code implementation NeurIPS 2017 Geoffrey Roeder, Yuhuai Wu, David Duvenaud

We propose a simple and general variant of the standard reparameterized gradient estimator for the variational evidence lower bound.

Variational Inference
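The variant this entry refers to is commonly summarized as: keep the reparameterization (path) term of the ELBO gradient and drop the score-function term, whose expectation is zero but whose variance is not. A minimal PyTorch-style sketch of that idea under standard Gaussian assumptions (an illustration, not the authors' code):

```python
import math
import torch

def gaussian_log_prob(z, mu, log_sigma):
    # log N(z; mu, diag(exp(log_sigma))^2), summed over the last dimension
    return (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
            - log_sigma
            - 0.5 * math.log(2 * math.pi)).sum(-1)

def elbo_sticking_the_landing(x, mu, log_sigma, log_joint):
    """Single-sample ELBO whose gradient w.r.t. (mu, log_sigma) keeps only the
    reparameterization term; log_joint(x, z) is a user-supplied log p(x, z)."""
    eps = torch.randn_like(mu)
    z = mu + log_sigma.exp() * eps                     # reparameterized sample
    # Detaching the variational parameters inside log q removes the
    # score-function term from the gradient without changing the ELBO value.
    log_q = gaussian_log_prob(z, mu.detach(), log_sigma.detach())
    return log_joint(x, z) - log_q

# Toy usage: standard-normal prior and a unit-variance Gaussian likelihood.
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
x = torch.tensor([0.5, -1.0])
log_joint = lambda x, z: (gaussian_log_prob(z, torch.zeros_like(z), torch.zeros_like(z))
                          + gaussian_log_prob(x, z, torch.zeros_like(z)))
elbo_sticking_the_landing(x, mu, log_sigma, log_joint).backward()
print(mu.grad, log_sigma.grad)
```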

On the Quantitative Analysis of Decoder-Based Generative Models

2 code implementations 14 Nov 2016 Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse

The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities.

On Multiplicative Integration with Recurrent Neural Networks

no code implementations NeurIPS 2016 Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov

We introduce a general and simple structural design called Multiplicative Integration (MI) to improve recurrent neural networks (RNNs).

Language Modelling
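The design this entry summarizes replaces the additive combination of input and hidden contributions in a vanilla RNN step with a gated elementwise product. A toy numpy sketch following the general MI form as commonly presented (parameter names here are illustrative):

```python
import numpy as np

def vanilla_rnn_step(x, h, W, U, b):
    # Standard additive integration: phi(Wx + Uh + b)
    return np.tanh(W @ x + U @ h + b)

def mi_rnn_step(x, h, W, U, b, alpha, beta1, beta2):
    # Multiplicative Integration: the two information flows interact through an
    # elementwise (Hadamard) product rather than a plain sum.
    wx, uh = W @ x, U @ h
    return np.tanh(alpha * wx * uh + beta1 * wx + beta2 * uh + b)

rng = np.random.default_rng(0)
d_in, d_h = 8, 16
x, h = rng.normal(size=d_in), rng.normal(size=d_h) * 0.1
W, U = rng.normal(size=(d_h, d_in)) * 0.1, rng.normal(size=(d_h, d_h)) * 0.1
b = np.zeros(d_h)
print(vanilla_rnn_step(x, h, W, U, b).shape,
      mi_rnn_step(x, h, W, U, b, np.ones(d_h), np.ones(d_h), np.ones(d_h)).shape)
```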

Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations

no code implementations NeurIPS 2016 Behnam Neyshabur, Yuhuai Wu, Ruslan Salakhutdinov, Nathan Srebro

We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations.

STDP as presynaptic activity times rate of change of postsynaptic activity

no code implementations 19 Sep 2015 Yoshua Bengio, Thomas Mesnard, Asja Fischer, Saizheng Zhang, Yuhuai Wu

We introduce a weight update formula that is expressed only in terms of firing rates and their derivatives and that results in changes consistent with those associated with spike-timing dependent plasticity (STDP) rules and biological observations, even though the explicit timing of spikes is not needed.
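The rule in the title amounts to a one-line update; the sketch below is just that formula in code, with the learning rate and the finite-difference derivative as illustrative choices.

```python
import numpy as np

def stdp_like_update(W, pre_rate, post_rate, post_rate_prev, dt=1.0, lr=0.01):
    """Weight change proportional to presynaptic activity times the rate of
    change of postsynaptic activity:  dW[i, j] ~ pre_rate[j] * d(post_rate[i]) / dt."""
    d_post = (post_rate - post_rate_prev) / dt      # finite-difference rate of change
    return W + lr * np.outer(d_post, pre_rate)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.1
W_new = stdp_like_update(W, pre_rate=rng.random(3),
                         post_rate=rng.random(4), post_rate_prev=rng.random(4))
print(W_new.shape)  # (4, 3)
```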
