1 code implementation • 27 Mar 2025 • Shuaiyu Zhang, Xun Lin, Rongxiang Zhang, Yu Bai, Yong Xu, Tao Tan, Xunbin Zheng, Zitong Yu
Furthermore, we introduce a survival prediction benchmark designed to resolve scenarios with missing modalities, mirroring real-world clinical conditions.
no code implementations • 24 Mar 2025 • Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan
Vision-Language Models (VLMs) demand substantial computational resources during inference, largely due to the large number of visual input tokens needed to represent visual information.
no code implementations • 16 Mar 2025 • Haoqi Yuan, Yu Bai, Yuhui Fu, Bohan Zhou, Yicheng Feng, Xinrun Xu, Yi Zhan, Börje F. Karlsson, Zongqing Lu
The Connector enhances the FM's embodied capabilities by translating language-based plans into actionable skill commands and dynamically coordinating locomotion and manipulation to improve task success.
no code implementations • 3 Mar 2025 • Harshita Mhaske, Aniket Mandhare, Jidong Huang, Yu Bai
Designing robots capable of traversing uneven terrain and overcoming physical obstacles has been a longstanding challenge in the field of robotics.
1 code implementation • 21 Feb 2025 • Hongjie Zhu, Zeyu Zhang, Guansong Pang, Xu Wang, Shimin Wen, Yu Bai, Daji Ergu, Ying Cai, Yang Zhao
This alignment of activation responses with semantic information strengthens the propagation and decoupling of target features, enabling the generated embeddings to more accurately represent target features in high-level semantic space.
Weakly-Supervised Semantic Segmentation
no code implementations • 21 Dec 2024 • OpenAI: Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich, Andrey Mishchenko, Andy Applebaum, Angela Jiang, Ashvin Nair, Barret Zoph, Behrooz Ghorbani, Ben Rossen, Benjamin Sokolowsky, Boaz Barak, Bob McGrew, Borys Minaiev, Botao Hao, Bowen Baker, Brandon Houghton, Brandon McKinzie, Brydon Eastman, Camillo Lugaresi, Cary Bassin, Cary Hudson, Chak Ming Li, Charles de Bourcy, Chelsea Voss, Chen Shen, Chong Zhang, Chris Koch, Chris Orsinger, Christopher Hesse, Claudia Fischer, Clive Chan, Dan Roberts, Daniel Kappler, Daniel Levy, Daniel Selsam, David Dohan, David Farhi, David Mely, David Robinson, Dimitris Tsipras, Doug Li, Dragos Oprica, Eben Freeman, Eddie Zhang, Edmund Wong, Elizabeth Proehl, Enoch Cheung, Eric Mitchell, Eric Wallace, Erik Ritter, Evan Mays, Fan Wang, Felipe Petroski Such, Filippo Raso, Florencia Leoni, Foivos Tsimpourlas, Francis Song, Fred von Lohmann, Freddie Sulit, Geoff Salmon, Giambattista Parascandolo, Gildas Chabot, Grace Zhao, Greg Brockman, Guillaume Leclerc, Hadi Salman, Haiming Bao, Hao Sheng, Hart Andrin, Hessam Bagherinezhad, Hongyu Ren, Hunter Lightman, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ignasi Clavera Gilaberte, Ilge Akkaya, Ilya Kostrikov, Ilya Sutskever, Irina Kofman, Jakub Pachocki, James Lennon, Jason Wei, Jean Harb, Jerry Twore, Jiacheng Feng, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joaquin Quiñonero Candela, Joe Palermo, Joel Parish, Johannes Heidecke, John Hallman, John Rizzo, Jonathan Gordon, Jonathan Uesato, Jonathan Ward, Joost Huizinga, Julie Wang, Kai Chen, Kai Xiao, Karan Singhal, Karina Nguyen, Karl Cobbe, Katy Shi, Kayla Wood, Kendra Rimbach, Keren Gu-Lemberg, Kevin Liu, Kevin Lu, Kevin Stone, Kevin Yu, Lama Ahmad, Lauren Yang, Leo Liu, Leon Maksin, Leyton Ho, Liam Fedus, Lilian Weng, Linden Li, Lindsay McCallum, Lindsey Held, Lorenz Kuhn, Lukas Kondraciuk, Lukasz Kaiser, Luke Metz, Madelaine Boyd, Maja Trebacz, Manas Joglekar, Mark Chen, Marko Tintor, Mason Meyer, Matt Jones, Matt Kaufer, Max Schwarzer, Meghan Shah, Mehmet Yatbaz, Melody Guan, Mengyuan Xu, Mengyuan Yan, Mia Glaese, Mianna Chen, Michael Lampe, Michael Malek, Michele Wang, Michelle Fradin, Mike McClay, Mikhail Pavlov, Miles Wang, Mingxuan Wang, Mira Murati, Mo Bavarian, Mostafa Rohaninejad, Nat McAleese, Neil Chowdhury, Nick Ryder, Nikolas Tezak, Noam Brown, Ofir Nachum, Oleg Boiko, Oleg Murk, Olivia Watkins, Patrick Chao, Paul Ashbourne, Pavel Izmailov, Peter Zhokhov, Rachel Dias, Rahul Arora, Randall Lin, Rapha Gontijo Lopes, Raz Gaon, Reah Miyara, Reimar Leike, Renny Hwang, Rhythm Garg, Robin Brown, Roshan James, Rui Shu, Ryan Cheu, Ryan Greene, Saachi Jain, Sam Altman, Sam Toizer, Sam Toyer, Samuel Miserendino, Sandhini Agarwal, Santiago Hernandez, Sasha Baker, Scott McKinney, Scottie Yan, Shengjia Zhao, Shengli Hu, Shibani Santurkar, Shraman Ray Chaudhuri, Shuyuan Zhang, Siyuan Fu, Spencer Papay, Steph Lin, Suchir Balaji, Suvansh Sanjeev, Szymon Sidor, Tal Broda, Aidan Clark, Tao Wang, Taylor Gordon, Ted Sanders, Tejal Patwardhan, Thibault Sottiaux, Thomas Degry, Thomas Dimson, Tianhao Zheng, Timur Garipov, Tom Stasi, Trapit Bansal, Trevor Creech, Troy Peterson, Tyna Eloundou, Valerie Qi, 
Vineet Kosaraju, Vinnie Monaco, Vitchyr Pong, Vlad Fomenko, Weiyi Zheng, Wenda Zhou, Wes McCabe, Wojciech Zaremba, Yann Dubois, Yinghai Lu, Yining Chen, Young Cha, Yu Bai, Yuchen He, Yuchen Zhang, Yunyun Wang, Zheng Shao, Zhuohan Li
The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought.
2 code implementations • 9 Dec 2024 • Cunshi Wang, Xinjie Hu, Yu Zhang, Xunhao Chen, Pengliang Du, Yiming Mao, Rui Wang, Yuyang Li, Ying Wu, Hang Yang, Yansong Li, Beichuan Wang, Haiyang Mu, Zheng Wang, Jianfeng Tian, Liang Ge, Yongna Mao, Shengming Li, Xiaomeng Lu, Jinhang Zou, Yang Huang, Ningchen Sun, Jie Zheng, Min He, Yu Bai, Junjie Jin, Hong Wu, Chaohui Shang, Jifeng Liu
Additionally, the integration of AI agents within the system provides online accessibility, saving astronomers' time and encouraging greater participation from amateur astronomers in the NGSS project.
no code implementations • 21 Nov 2024 • Yu Bai, Boxuan Xie, Ruifan Zhu, Zheng Chang, Riku Jantti
Backscatter communication (BC) has emerged as a promising energy-efficient solution for future wireless sensor networks (WSNs).
1 code implementation • 17 Oct 2024 • Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
Next, we extend our analysis to pretrained LLMs, including Llama and OLMo, showing that many attention heads exhibit a similar active-dormant mechanism as in the BB task, and that the mutual reinforcement mechanism also governs the emergence of extreme-token phenomena during LLM pretraining.
no code implementations • 2 Aug 2024 • Mingshuo Liu, Shiyi Luo, Kevin Han, Bo Yuan, Ronald F. DeMara, Yu Bai
The fast development of object detection techniques has attracted attention to developing efficient Deep Neural Networks (DNNs).
no code implementations • 30 Jul 2024 • Chaofeng Guan, Yaohui Zhu, Yu Bai, Lingyun Wang
Such category descriptions capture the characteristics of the aspect categories, guiding the construction of discriminative category prototypes.
1 code implementation • 24 Jul 2024 • Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, HaoNing Wu, J. H. Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin
Moreover, we explored the ability of LVLMs to perceive image sequences within the context of our multi-image association task.
no code implementations • 21 Jun 2024 • Yu Bai, Yukai Miao, Li Chen, Dawei Wang, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai
To address this task, we propose Pistis-RAG, a new RAG framework designed with a content-centric approach to better align LLMs with human preferences.
1 code implementation • 17 Jun 2024 • Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung
Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences.
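As a rough sketch of the general idea (generic background, not the method proposed in this paper), one simple eviction policy drops the cached entries that have received the least cumulative attention; the scoring rule and `keep_ratio` below are illustrative assumptions:

```python
import torch

def evict_kv_cache(keys, values, attn_weights, keep_ratio=0.5):
    """Drop the cached key/value entries with the lowest cumulative
    attention mass. A minimal sketch; real eviction policies are
    more involved (e.g., tracking heavy hitters across steps).

    keys, values: (seq_len, d) cached states for one head
    attn_weights: (num_queries, seq_len) attention received so far
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    scores = attn_weights.sum(dim=0)  # cumulative attention per cached position
    keep_idx = scores.topk(n_keep).indices.sort().values  # preserve temporal order
    return keys[keep_idx], values[keep_idx]

# toy usage
k, v = torch.randn(128, 64), torch.randn(128, 64)
w = torch.rand(16, 128)
k_small, v_small = evict_kv_cache(k, v, w, keep_ratio=0.25)
print(k_small.shape)  # torch.Size([32, 64])
```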
no code implementations • 17 Jun 2024 • Heyan Huang, Yinghao Li, Huashan Sun, Yu Bai, Yang Gao
Recent studies have demonstrated that In-Context Learning (ICL), through the use of specific demonstrations, can align Large Language Models (LLMs) with human preferences, a capability known as In-Context Alignment (ICA), indicating that models can comprehend human instructions without requiring parameter adjustments.
no code implementations • 27 May 2024 • Tengbo Wang, Yu Bai
In this paper, we propose Boundary-Assisted Instance Segmentation (BAISeg), a novel weakly-supervised instance segmentation (WSIS) paradigm that realizes instance segmentation with only pixel-level annotations.
1 code implementation • 19 Apr 2024 • Yukang Wei, Yu Bai
Temperature plays a pivotal role in moderating label softness in the realm of knowledge distillation (KD).
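For background, temperature $T$ controls label softness by rescaling logits before the softmax; the sketch below is the standard distillation loss, not this paper's temperature schedule:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Standard knowledge-distillation loss: KL divergence between
    temperature-softened teacher and student distributions, scaled
    by T^2 to keep gradient magnitudes comparable across T."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# higher T -> softer labels, exposing more inter-class structure
logits = torch.tensor([[10.0, 5.0, 1.0]])
for T in (1.0, 4.0, 16.0):
    print(T, F.softmax(logits / T, dim=-1))
```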
1 code implementation • 16 Apr 2024 • Yu-Yang Li, Yu Bai, Cunshi Wang, Mengwei Qu, Ziteng Lu, Roberto Soria, Jifeng Liu
Light curves serve as a valuable source of information on stellar formation and evolution.
1 code implementation • 8 Apr 2024 • Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei
LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks.
no code implementations • 8 Feb 2024 • Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese
Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines.
no code implementations • 20 Jan 2024 • Yu Bai, Heyan Huang, Cesare Spinoso-Di Piano, Marc-Antoine Rondeau, Sanxing Chen, Yang Gao, Jackie Chi Kit Cheung
In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing.
1 code implementation • CVPR 2024 • Shuai Shao, Yu Bai, Yan Wang, BaoDi Liu, Yicong Zhou
Open-World Few-Shot Learning (OFSL) is a critical field of research concentrating on the precise identification of target samples in environments with scarce data and unreliable labels, and thus possesses substantial practical significance.
no code implementations • 29 Nov 2023 • Lei Zhao, Mengdi Wang, Yu Bai
Inverse Reinforcement Learning (IRL) -- the problem of learning reward functions from demonstrations of an \emph{expert policy} -- plays a critical role in developing intelligent systems.
no code implementations • 16 Oct 2023 • Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting.
1 code implementation • 12 Oct 2023 • Licong Lin, Yu Bai, Song Mei
This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
no code implementations • 11 Sep 2023 • Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li
Nowadays, the versatile capabilities of Pre-trained Large Language Models (LLMs) have attracted much attention from the industry.
no code implementations • 6 Jul 2023 • Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai
This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case.
no code implementations • 29 Jun 2023 • Simone Wills, Yu Bai, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik
Voicebots have provided a new avenue for supporting the development of language skills, particularly within the context of second language learning.
no code implementations • 7 Jun 2023 • Yu Bai, Cristian Tejedor-Garcia, Ferdy Hubers, Catia Cucchiarini, Helmer Strik
We found that ASR has potential at this stage of the reading process, as the results suggested that pupils made progress in reading accuracy and fluency by using the software.
Automatic Speech Recognition (ASR)
no code implementations • 2 Jun 2023 • Minshuo Chen, Jie Meng, Yu Bai, Yinyu Ye, H. Vincent Poor, Mengdi Wang
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in the delayed and missing observation settings.
2 code implementations • 15 Feb 2023 • Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Yu Bai
We prove that our methods achieve near-optimal strongly adaptive regret for all interval lengths simultaneously, and approximately valid coverage.
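For orientation (the paper's strongly adaptive method is more sophisticated than this), the basic online conformal update adjusts the miscoverage level after each observed error; the quantile rule and `gamma` below are illustrative assumptions:

```python
def adaptive_conformal(scores, alpha=0.1, gamma=0.01):
    """Online miscoverage tracking (an ACI-style sketch).
    scores[t] is the conformity score of the true label at time t;
    we cover whenever scores[t] falls below the running
    (1 - alpha_t) quantile of past scores. Returns coverage indicators."""
    alpha_t, history, covered = alpha, [], []
    for s in scores:
        if history:
            q_level = min(max(1.0 - alpha_t, 0.0), 1.0)
            q = sorted(history)[int(q_level * (len(history) - 1))]
            err = 1.0 if s > q else 0.0       # 1 = miscovered
            covered.append(1.0 - err)
            alpha_t += gamma * (alpha - err)  # misses shrink alpha, widening sets
        history.append(s)
    return covered
```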
no code implementations • 13 Feb 2023 • Yuanhao Wang, Qinghua Liu, Yu Bai, Chi Jin
A unique challenge in Multi-Agent Reinforcement Learning (MARL) is the curse of multiagency, where the description length of the game as well as the complexity of many existing learning algorithms scale exponentially with the number of agents.
no code implementations • 6 Feb 2023 • Yuheng Zhang, Yu Bai, Nan Jiang
We study offline multi-agent reinforcement learning (RL) in Markov games, where the goal is to learn an approximate equilibrium -- such as Nash equilibrium and (Coarse) Correlated Equilibrium -- from an offline dataset pre-collected from the game.
Multi-agent Reinforcement Learning
Reinforcement Learning (RL)
no code implementations • 2 Feb 2023 • Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai
However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds.
no code implementations • 23 Oct 2022 • Prafulla Kumar Choubey, Yu Bai, Chien-Sheng Wu, Wenhao Liu, Nazneen Rajani
Pre-trained language models (PLMs) have been shown effective for zero-shot (0shot) text classification.
no code implementations • 20 Oct 2022 • Yuanhao Wang, Dingwen Kong, Yu Bai, Chi Jin
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE) whose sample complexities are polynomial in all problem parameters including the number of players.
no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.
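One canonical coverage condition, stated here for reference in standard notation (the paper studies sharper variants), is finiteness of the concentrability coefficient:

```latex
C^{\pi} \;=\; \max_{s,a} \, \frac{d^{\pi}(s,a)}{\mu(s,a)}
```

where $d^{\pi}$ is the occupancy measure of policy $\pi$ and $\mu$ is the data logging distribution; single-policy concentrability asks that $C^{\pi^\star} < \infty$ for some optimal policy $\pi^\star$.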
no code implementations • 29 Sep 2022 • Fan Chen, Yu Bai, Song Mei
Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions.
no code implementations • 23 Sep 2022 • Fan Chen, Song Mei, Yu Bai
We make progress on this question by developing a unified algorithm framework for a large class of learning goals, building on the Decision-Estimation Coefficient (DEC) framework.
1 code implementation • 8 Jun 2022 • Eshaan Nichani, Yu Bai, Jason D. Lee
Next, we show that a wide two-layer neural network can jointly use the NTK and QuadNTK to fit target functions consisting of a dense low-degree term and a sparse high-degree term -- something neither the NTK nor the QuadNTK can do on their own.
no code implementations • 6 Jun 2022 • Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai
Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an $\widetilde{\mathcal{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and a similar algorithm with a slightly modified value-update rule achieves a faster $\widetilde{\mathcal{O}}(T^{-1})$ convergence rate.
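Schematically, the OFTRL update at each state takes the following generic form (standard notation, not the paper's exact variant; its two results differ in the value-update rule):

```latex
x_{t+1} \;=\; \arg\min_{x \in \Delta(A)} \;
  \eta \Big\langle x,\; \sum_{s=1}^{t} g_s + m_{t+1} \Big\rangle + R(x),
\qquad m_{t+1} = g_t,
```

with $g_s$ the observed loss vectors, $m_{t+1}$ the optimistic prediction of the next loss, and $R$ a strongly convex regularizer such as negative entropy.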
no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).
no code implementations • 15 May 2022 • Ziang Song, Song Mei, Yu Bai
We then design an uncoupled no-regret algorithm that finds an $\varepsilon$-approximate $K$-EFCE within $\widetilde{\mathcal{O}}(\max_{i}X_iA_i^{K}/\varepsilon^2)$ iterations in the full feedback setting, where $X_i$ and $A_i$ are the number of information sets and actions for the $i$-th player.
no code implementations • COLING 2022 • Xiaochen Liu, Yang Gao, Yu Bai, Jiawei Li, Yinan Hu, Heyan Huang, Boxing Chen
Few-shot abstractive summarization has become a challenging task in natural language generation.
1 code implementation • 30 Mar 2022 • Jiaao Zhan, Qian Chen, Boxing Chen, Wen Wang, Yu Bai, Yang Gao
We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input.
1 code implementation • ICLR 2022 • Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly over existing approaches in several applications such as prediction intervals with improved length, minimum-volume prediction sets for multi-output regression, and label prediction sets for image classification.
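As background on what "valid prediction sets" means here (standard split conformal prediction, not the paper's learned, efficiency-optimized sets):

```python
import numpy as np

def split_conformal_sets(cal_scores, test_scores, alpha=0.1):
    """Split conformal prediction sketch. cal_scores: nonconformity
    scores of the true labels on a calibration set, shape (n,).
    test_scores: scores for every candidate label at each test point,
    shape (m, num_labels). Returns a boolean (m, num_labels) array
    marking which labels enter each prediction set."""
    n = len(cal_scores)
    # finite-sample-corrected empirical quantile of calibration scores
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    q_hat = np.sort(cal_scores)[k - 1]
    return test_scores <= q_hat

cal = np.random.rand(500)
test = np.random.rand(10, 5)
print(split_conformal_sets(cal, test).mean())  # avg fraction of labels kept
```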
no code implementations • 13 Feb 2022 • Hao Wang, Yu Bai, Guangmin Sun, Jie Liu
Powerful recognition algorithms are widely deployed on the Internet and in critical medical systems, which poses a serious threat to personal privacy.
no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.
no code implementations • 3 Jan 2022 • Michael Curry, Alexander Trott, Soham Phade, Yu Bai, Stephan Zheng
We validate that the learned solutions are $\epsilon$-meta-equilibria through best-response analyses, show that they align with economic intuitions, and show that our approach can learn a spectrum of qualitatively distinct $\epsilon$-meta-equilibria in open RBC models.
Deep Reinforcement Learning
Multi-agent Reinforcement Learning
1 code implementation • 15 Oct 2021 • Yu Bai, Heyan Huang, Kai Fan, Yang Gao, Yiming Zhu, Jiaao Zhan, Zewen Chi, Boxing Chen
By introducing the compression rate, i.e., the information ratio between the source and the target text, we regard the MT task as a special CLS task with a compression rate of 100%.
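On a plain reading of that definition (our notation and ratio direction are assumptions, not necessarily the paper's), the compression rate of a source text $x$ and target text $y$ is:

```latex
\mathrm{CR}(x, y) \;=\; \frac{\mathrm{len}(y)}{\mathrm{len}(x)} \times 100\%
```

so standard MT, which roughly preserves length and information, sits at $\mathrm{CR} \approx 100\%$, while summarization occupies lower rates.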
no code implementations • ICLR 2022 • Ziang Song, Song Mei, Yu Bai
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes.
no code implementations • 23 Sep 2021 • Zewen Chi, Heyan Huang, Luyang Liu, Yu Bai, Xian-Ling Mao
The success of pretrained cross-lingual language models relies on two essential abilities, i.e., generalization ability for learning downstream tasks in a source language, and cross-lingual transferability for transferring the task knowledge to other languages.
no code implementations • NeurIPS 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input.
no code implementations • NeurIPS 2021 • Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai
This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL.
1 code implementation • ACL 2021 • Yu Bai, Yang Gao, Heyan Huang
Employing one unified decoder to generate the sequential concatenation of monolingual and cross-lingual summaries, MCLAS makes the monolingual summarization task a prerequisite of the cross-lingual summarization (CLS) task.
Abstractive Text Summarization
Cross-Lingual Abstractive Summarization
no code implementations • 30 Mar 2021 • Bo Dong, Hao Liu, Yu Bai, Jinbiao Lin, Zhuoran Xu, Xinyu Xu, Qi Kong
Predicting the future trajectories of surrounding obstacles is a crucial task for autonomous vehicles to achieve a high degree of road safety.
no code implementations • 8 Mar 2021 • Zitong Yang, Yu Bai, Song Mei
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to $\infty$), uniform convergence over the interpolators still gives a non-trivial bound of the test error of interpolating solutions.
no code implementations • NeurIPS 2021 • Yu Bai, Chi Jin, Huan Wang, Caiming Xiong
Real world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum.
no code implementations • 22 Feb 2021 • Rachel Luo, Aadyot Bhatnagar, Yu Bai, Shengjia Zhao, Huan Wang, Caiming Xiong, Silvio Savarese, Stefano Ermon, Edward Schmerling, Marco Pavone
In this work, we propose the local calibration error (LCE) to span the gap between average and individual reliability.
no code implementations • 15 Feb 2021 • Yu Bai, Song Mei, Huan Wang, Caiming Xiong
Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident.
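Miscalibration of the top predicted probability is commonly quantified by the expected calibration error (ECE); a minimal binned estimator, included here as background rather than as this paper's method, is:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Binned ECE: average |accuracy - confidence| over equal-width
    confidence bins, weighted by bin population."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

conf = np.random.rand(1000)
corr = (np.random.rand(1000) < conf * 0.8)  # a deliberately over-confident model
print(expected_calibration_error(conf, corr.astype(float)))
```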
no code implementations • NeurIPS 2021 • Ming Yin, Yu Bai, Yu-Xiang Wang
Our main result shows that OPDVR provably identifies an $\epsilon$-optimal policy with $\widetilde{O}(H^2/d_m\epsilon^2)$ episodes of offline data in the finite-horizon stationary transition setting, where $H$ is the horizon length and $d_m$ is the minimal marginal state-action distribution induced by the behavior policy.
no code implementations • 1 Jan 2021 • Yu Bai, Tengyu Ma, Huan Wang, Caiming Xiong
In this paper, we propose Neural Rank Preserving Transforms (NRPT), a new post-calibration method that adjusts the output probabilities of a trained classifier using a calibrator of higher capacity, while maintaining its prediction accuracy.
no code implementations • 12 Oct 2020 • Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong
A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.
no code implementations • 4 Oct 2020 • Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
However, for multi-agent reinforcement learning in Markov games, the current best known sample complexity for model-based algorithms is rather suboptimal and compares unfavorably against recent model-free approaches.
Model-based Reinforcement Learning
Multi-agent Reinforcement Learning
no code implementations • 7 Jul 2020 • Ming Yin, Yu Bai, Yu-Xiang Wang
The problem of Offline Policy Evaluation (OPE) in Reinforcement Learning (RL) is a critical step towards applying RL in real-life applications.
no code implementations • NeurIPS 2020 • Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher
When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.
no code implementations • NeurIPS 2020 • Yu Bai, Chi Jin, Tiancheng Yu
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games.
no code implementations • ICML 2020 • Yu Bai, Chi Jin
We introduce a self-play algorithm---Value Iteration with Upper/Lower Confidence Bound (VI-ULCB)---and show that it achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game, where the regret is measured by the agent's performance against a \emph{fully adversarial} opponent who can exploit the agent's strategy at \emph{any} step.
no code implementations • 10 Feb 2020 • Yu Bai, Ben Krause, Huan Wang, Caiming Xiong, Richard Socher
We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width.
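Concretely, order-$k$ Taylorized training replaces the network $f_\theta$ with its $k$-th order Taylor expansion in the parameters around the initialization $\theta_0$ (our rendering of the standard expansion, not a formula quoted from the paper):

```latex
f^{(k)}_{\theta}(x) \;=\; \sum_{j=0}^{k} \frac{1}{j!}\,
  \nabla^{j}_{\theta} f_{\theta_0}(x)\big[(\theta - \theta_0)^{\otimes j}\big]
```

The $k=1$ case recovers the linearized (NTK-style) model, and larger $k$ interpolates toward full nonlinear training.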
no code implementations • 21 Oct 2019 • Ke Zhan, Shimiao Jiang, Yu Bai, Yi Li, Xu Liu, Zhuoran Xu
The eltwise layer is a commonly used structure in multi-branch deep learning networks.
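For readers unfamiliar with the term: an eltwise layer merges same-shaped branch outputs element-wise, as in this minimal sketch (operation set modeled on Caffe's Eltwise layer):

```python
import torch

def eltwise(tensors, op="sum"):
    """Element-wise combination of same-shaped branch outputs."""
    out = tensors[0]
    for t in tensors[1:]:
        if op == "sum":
            out = out + t
        elif op == "prod":
            out = out * t
        elif op == "max":
            out = torch.maximum(out, t)
    return out

a, b = torch.randn(2, 8), torch.randn(2, 8)
print(eltwise([a, b], op="sum").shape)  # torch.Size([2, 8])
```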
no code implementations • ICLR 2020 • Yu Bai, Jason D. Lee
Recent theoretical work has established connections between over-parametrized neural networks and linearized models governed by the Neural Tangent Kernel (NTK).
no code implementations • NeurIPS 2019 • Yu Bai, Tengyang Xie, Nan Jiang, Yu-Xiang Wang
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change their exploration policy as infrequently as possible during regret minimization.
no code implementations • 1 Mar 2019 • Yu Bai, John Duchi, Song Mei
We study a family of (potentially non-convex) constrained optimization problems with convex composite structure.
1 code implementation • ICLR 2019 • Yu Bai, Qijia Jiang, Ju Sun
This paper concerns dictionary learning, i.e., sparse coding, a fundamental representation learning problem.
1 code implementation • ICLR 2019 • Yu Bai, Yu-Xiang Wang, Edo Liberty
To make deep neural networks feasible in resource-constrained environments (such as mobile devices), it is beneficial to quantize models by using low-precision weights.
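As generic background on low-precision weights (a plain uniform quantizer, not necessarily the scheme this paper trains against):

```python
import torch

def quantize_weights(w, num_bits=8):
    """Uniform symmetric weight quantization sketch: map float
    weights onto evenly spaced integer levels, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    w_int = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_int * scale  # dequantized low-precision weights

w = torch.randn(4, 4)
print((w - quantize_weights(w, num_bits=4)).abs().max())  # quantization error
```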
no code implementations • ICLR 2019 • Yu Bai, Tengyu Ma, Andrej Risteski
Our preliminary experiments show that on synthetic datasets the test IPM is well correlated with KL divergence or the Wasserstein distance, indicating that the lack of diversity in GANs may be caused by the sub-optimality in optimization instead of statistical inefficiency.
no code implementations • 29 Aug 2017 • Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yi-Peng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan
As the size of DNNs continues to grow, it is critical to improve the energy efficiency and performance while maintaining accuracy.
no code implementations • 10 Jul 2017 • Yu Bai, Sally Goldman, Li Zhang
TAPAS is a novel adaptive sampling method for the softmax model.
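TAPAS belongs to the sampled-softmax family, which approximates the full softmax normalizer over a small candidate set; the uniform-sampling sketch below is illustrative only (all names and parameters here are ours), since TAPAS's contribution is precisely an adaptive sampler:

```python
import numpy as np

def sampled_softmax_loss(logits_fn, target, num_classes, num_samples=32, rng=None):
    """Sampled-softmax sketch: estimate the cross-entropy using the
    target class plus a small set of sampled negatives, instead of
    computing all num_classes logits. Uniform negatives here."""
    rng = rng or np.random.default_rng()
    negatives = rng.choice(num_classes, size=num_samples, replace=False)
    candidates = np.concatenate(([target], negatives[negatives != target]))
    z = logits_fn(candidates)                                # candidate logits only
    log_norm = z.max() + np.log(np.exp(z - z.max()).sum())   # stable logsumexp
    return -(z[0] - log_norm)

# toy usage: a linear softmax layer with 100k classes, touched sparsely
W, h = np.random.randn(100_000, 16), np.random.randn(16)
print(sampled_softmax_loss(lambda idx: W[idx] @ h, target=42, num_classes=100_000))
```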
no code implementations • 22 Jul 2016 • Song Mei, Yu Bai, Andrea Montanari
We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors).