Search Results for author: Ziang Song

Found 5 papers, 1 paper with code

Reward Collapse in Aligning Large Language Models

1 code implementation • 28 May 2023 • Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su

This insight allows us to derive closed-form expressions for the reward distribution associated with a set of utility functions in an asymptotic regime.

Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent

no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu

A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).

Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games

no code implementations • 15 May 2022 • Ziang Song, Song Mei, Yu Bai

We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player.
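A rough numerical sketch of how this iteration bound scales, using made-up values for $X_i$, $A_i$, $K$, and $\varepsilon$ (illustrative assumptions only, not figures from the paper):

```python
# Illustrative only: plug hypothetical values into the K-EFCE iteration
# bound max_i X_i * A_i**K / eps**2 (log factors in the tilde-O are ignored).
X_i, A_i, K, eps = 100, 4, 2, 0.1   # assumed values, not from the paper
iterations = X_i * A_i**K / eps**2
print(f"on the order of {iterations:,.0f} iterations (up to log factors)")
# -> on the order of 160,000 iterations (up to log factors)
```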

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

no code implementations • Findings (NAACL) 2022 • Yukun Feng, Feng Li, Ziang Song, Boyuan Zheng, Philipp Koehn

We conduct experiments on three popular datasets for document-level machine translation, and our model has an average improvement of 0.91 s-BLEU over the sentence-level baseline.

Document Level Machine Translation • Machine Translation • +2

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

no code implementations • ICLR 2022 • Ziang Song, Song Mei, Yu Bai

First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes.

Multi-agent Reinforcement Learning
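For a sense of scale, here is a hypothetical evaluation of the two episode bounds quoted above; the values of $H$, $S$, $\max_i A_i$, and $\epsilon$ are illustrative assumptions, not figures from the paper:

```python
# Illustrative only: hypothetical values plugged into the CCE and CE episode
# bounds H**5 * S * max_i A_i / eps**2 and H**6 * S * (max_i A_i)**2 / eps**2
# (log factors hidden by the tilde-O are ignored).
H, S, A_max, eps = 10, 20, 5, 0.1   # assumed values, not from the paper
episodes_cce = H**5 * S * A_max / eps**2
episodes_ce = H**6 * S * A_max**2 / eps**2
print(f"CCE: ~{episodes_cce:.2e} episodes, CE: ~{episodes_ce:.2e} episodes")
# -> CCE: ~1.00e+09 episodes, CE: ~5.00e+10 episodes
```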
