Search Results for author: David Wu

Found 15 papers, 8 papers with code

The Virtues of Pessimism in Inverse Reinforcement Learning

no code implementations • 4 Feb 2024 • David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury

Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations.

Offline RL, reinforcement-learning, +1

Accelerating Inverse Reinforcement Learning with Expert Bootstrapping

no code implementations • 4 Feb 2024 • David Wu, Sanjiban Choudhury

Existing inverse reinforcement learning methods (e.g., MaxEntIRL, $f$-IRL) search over candidate reward functions and solve a reinforcement learning problem in the inner loop.

Imitation Learning, reinforcement-learning
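
The snippet above describes the nested structure shared by these methods: an outer search over reward functions, with a full reinforcement learning problem solved in the inner loop under the current reward. The following is a minimal sketch of that bi-level loop for a linear, feature-matching reward; featurize and solve_rl are hypothetical placeholders (a trajectory featurizer and an RL solver), and none of this is taken from the paper's implementation.

    import numpy as np

    def irl_outer_loop(expert_trajs, featurize, solve_rl, n_iters=50, lr=0.1):
        """Sketch of the bi-level structure of MaxEntIRL-style methods:
        an outer loop over reward parameters, with an entire RL problem
        solved in the inner loop. featurize and solve_rl are assumed
        placeholders, not part of the paper."""
        dim = featurize(expert_trajs[0]).shape[0]
        theta = np.zeros(dim)                    # linear reward weights
        mu_expert = np.mean([featurize(t) for t in expert_trajs], axis=0)

        for _ in range(n_iters):
            # Inner loop: solve a full RL problem under the current reward.
            reward_fn = lambda traj: featurize(traj) @ theta
            learner_trajs = solve_rl(reward_fn)

            # Outer loop: nudge the reward so expert trajectories score
            # higher than the learner's (feature-matching gradient step).
            mu_learner = np.mean([featurize(t) for t in learner_trajs], axis=0)
            theta += lr * (mu_expert - mu_learner)

        return theta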

CryptOpt: Automatic Optimization of Straightline Code

1 code implementation • 31 May 2023 • Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom

Manual engineering of high-performance implementations typically consumes many resources and requires in-depth knowledge of the hardware.

Robust Risk-Aware Option Hedging

no code implementations • 27 Mar 2023 • David Wu, Sebastian Jaimungal

The objectives of option hedging/trading extend beyond mere protection against downside risks, with a desire to seek gains also driving agents' strategies.

Reinforcement Learning (RL)
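
As an illustration of the risk-aware objectives the snippet alludes to, the sketch below scores a distribution of hedging P&L by its mean minus a Conditional Value-at-Risk penalty. The functional form, the alpha level, and the risk_weight trade-off are assumptions made for illustration, not the robust risk-aware criterion developed in the paper.

    import numpy as np

    def mean_cvar_objective(pnl, alpha=0.9, risk_weight=0.5):
        """Generic risk-aware score for a hedging strategy: reward expected
        gains while penalising the Conditional Value-at-Risk (CVaR) of
        losses. Illustrative only; not the paper's robust criterion."""
        pnl = np.asarray(pnl, dtype=float)
        losses = -pnl                            # losses are negated P&L
        var = np.quantile(losses, alpha)         # Value-at-Risk at level alpha
        cvar = losses[losses >= var].mean()      # average loss beyond the VaR
        return pnl.mean() - risk_weight * cvar   # seek gains, limit tail risk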

Improving Chess Commentaries by Combining Language Models with Symbolic Reasoning Engines

no code implementations • 15 Dec 2022 • Andrew Lee, David Wu, Emily Dinan, Mike Lewis

Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning.

Language Modelling

CryptOpt: Verified Compilation with Randomized Program Search for Cryptographic Primitives (full version)

1 code implementation • 19 Nov 2022 • Joel Kuepper, Andres Erbsen, Jason Gross, Owen Conoly, Chuyue Sun, Samuel Tian, David Wu, Adam Chlipala, Chitchanok Chuengsatiansup, Daniel Genkin, Markus Wagner, Yuval Yarom

Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language.

Benchmarking, C++ code

Self-Explaining Deviations for Coordination

no code implementations • 13 Jul 2022 • Hengyuan Hu, Samuel Sokota, David Wu, Anton Bakhtin, Andrei Lupu, Brandon Cui, Jakob N. Foerster

Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world.

$AIR^2$ for Interaction Prediction

1 code implementation • 16 Nov 2021 • David Wu, Yunnan Wu

The 2021 Waymo Interaction Prediction Challenge introduced a problem of predicting the future trajectories and confidences of two interacting agents jointly.

motion prediction
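
The joint formulation described in the snippet means each predicted mode covers both interacting agents and carries a single shared confidence, rather than scoring each agent independently. The shapes and names below are illustrative assumptions about such an output, not the paper's architecture.

    import numpy as np

    # Illustrative joint-prediction output: K joint modes, each holding the
    # future (x, y) trajectories of BOTH interacting agents over T timesteps.
    K, T = 6, 80
    joint_trajs = np.random.randn(K, 2, T, 2)   # [mode, agent, timestep, xy]
    logits = np.random.randn(K)                 # one score per joint mode

    # A single softmax over joint modes gives one confidence per pair of
    # trajectories, so the two agents' futures are predicted jointly.
    confidences = np.exp(logits - logits.max())
    confidences /= confidences.sum()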

QK Iteration: A Self-Supervised Representation Learning Algorithm for Image Similarity

no code implementations • 15 Nov 2021 • David Wu, Yunnan Wu

Previous work in contrastive self-supervised learning has identified the importance of being able to optimize representations while "pushing" against a large number of negative examples.

Copy Detection, Image Retrieval, +2
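
The principle the snippet refers to is commonly written as an InfoNCE-style loss, in which a query embedding is contrasted against one positive and a large bank of negatives. The sketch below shows that generic loss; it is not the QK Iteration algorithm itself, and the function and argument names are assumptions.

    import numpy as np

    def info_nce_loss(query, positive, negatives, temperature=0.1):
        """Minimal InfoNCE-style contrastive loss: pull the query toward its
        positive embedding while "pushing" against a bank of negatives.
        Illustrates the general principle only, not QK Iteration."""
        def normalize(x):
            return x / np.linalg.norm(x, axis=-1, keepdims=True)

        q, p, n = normalize(query), normalize(positive), normalize(negatives)
        pos_logit = q @ p / temperature          # similarity to the positive
        neg_logits = n @ q / temperature         # one logit per negative
        logits = np.concatenate([[pos_logit], neg_logits])
        # Cross-entropy with the positive treated as the correct class.
        return -pos_logit + np.log(np.exp(logits).sum())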

No-Press Diplomacy from Scratch

1 code implementation • NeurIPS 2021 • Anton Bakhtin, David Wu, Adam Lerer, Noam Brown

Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.

Starcraft

Likelihood-based estimation and prediction for a measles outbreak in Samoa

2 code implementations • 30 Mar 2021 • David Wu, Helen Petousis-Harris, Janine Paynter, Vinod Suresh, Oliver J. Maclaren

Stochastic models can help with misspecification but are even more expensive to simulate and perform inference with.

Uncertainty Quantification
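
As context for the likelihood-based approach named in the title, here is a generic sketch: a deterministic SIR model integrated with scipy's ODE solver, a Poisson observation model on weekly case counts, and maximum-likelihood fitting. The model structure, observation process, parameter bounds, and starting values are illustrative assumptions, not the model actually fitted to the Samoa outbreak.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import minimize
    from scipy.stats import poisson

    def fit_sir_by_likelihood(weekly_cases, population):
        """Maximum-likelihood fit of an illustrative SIR model to weekly
        case counts via a Poisson observation model. Not the paper's model."""
        weeks = np.arange(len(weekly_cases) + 1)

        def expected_incidence(params):
            beta, gamma, i0 = params
            def rhs(t, y):
                s, i, r = y
                return [-beta * s * i / population,
                        beta * s * i / population - gamma * i,
                        gamma * i]
            sol = solve_ivp(rhs, (0, weeks[-1]),
                            [population - i0, i0, 0.0], t_eval=weeks)
            s = sol.y[0]
            return np.maximum(s[:-1] - s[1:], 1e-9)   # new infections per week

        def neg_log_lik(params):
            return -poisson.logpmf(weekly_cases, expected_incidence(params)).sum()

        return minimize(neg_log_lik, x0=[1.5, 1.0, 1.0],
                        bounds=[(0.1, 10), (0.1, 10), (0.1, 100)],
                        method="L-BFGS-B")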

Off-Belief Learning

5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time.

Maximum a Posteriori Inference of Random Dot Product Graphs via Conic Programming

no code implementations • 6 Jan 2021 • David Wu, David R. Palmer, Daryl R. Deford

We present a convex cone program to infer the latent probability matrix of a random dot product graph (RDPG).

Bayesian Inference
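
A random dot product graph has P = X X^T, so the latent probability matrix is positive semidefinite with entries in [0, 1]. The cvxpy sketch below maximises the Bernoulli log-likelihood of an observed adjacency matrix over that set; it is a generic convex formulation for illustration, not necessarily the exact cone program or MAP objective derived in the paper.

    import cvxpy as cp
    import numpy as np

    def infer_rdpg_prob_matrix(adjacency):
        """Infer a latent probability matrix P for an RDPG by maximising the
        Bernoulli log-likelihood over PSD matrices with entries in (0, 1).
        Illustrative formulation only; not necessarily the paper's program."""
        A = np.asarray(adjacency, dtype=float)
        n = A.shape[0]
        P = cp.Variable((n, n), PSD=True)       # RDPG structure: P = X X^T is PSD

        log_lik = cp.sum(cp.multiply(A, cp.log(P)) +
                         cp.multiply(1 - A, cp.log(1 - P)))
        problem = cp.Problem(cp.Maximize(log_lik),
                             [P >= 1e-6, P <= 1 - 1e-6])
        problem.solve(solver=cp.SCS)
        return P.value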
