1 code implementation • 9 Jun 2025 • Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar
In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout.
no code implementations • 2 Jun 2025 • Yifei Zhou, Sergey Levine, Jason Weston, Xian Li, Sainbayar Sukhbaatar
In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that are generated by itself.
1 code implementation • 21 Apr 2025 • Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
Scaling inference-time computation has substantially improved the reasoning capabilities of language models.
1 code implementation • 19 Mar 2025 • Yifei Zhou, Song Jiang, Yuandong Tian, Jason Weston, Sergey Levine, Sainbayar Sukhbaatar, Xian Li
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks.
1 code implementation • 13 Feb 2025 • Hao Bai, Yifei Zhou, Li Erran Li, Sergey Levine, Aviral Kumar
To make the VLM features amenable for representing the Q-function, we need to employ an initial phase of fine-tuning to amplify coverage over actionable information needed for the value function.
no code implementations • 17 Dec 2024 • Yifei Zhou, Qianlan Yang, Kaixiang Lin, Min Bai, Xiong Zhou, Yu-Xiong Wang, Sergey Levine, Erran Li
We validate PAE on challenging vision-based web navigation, using both real-world and self-hosted websites from WebVoyager and WebArena. To the best of our knowledge, this work represents the first effective learning system to apply autonomous task proposal with RL for agents that generalizes to real-world human-annotated benchmarks with SOTA performance.
no code implementations • 21 Sep 2024 • Grace Tang, Swetha Rajkumar, Yifei Zhou, Homer Rich Walke, Sergey Levine, Kuan Fang
Building generalist robotic systems involves effectively endowing robots with the capabilities to handle novel objects in an open-world setting.
1 code implementation • 14 Jun 2024 • Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents through fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL.
1 code implementation • 10 Jun 2024 • Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang
To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system.
no code implementations • 16 May 2024 • Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine
Finally, our framework uses these task rewards to fine-tune the entire VLM with RL.
1 code implementation • 9 Apr 2024 • Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr
We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.
2 code implementations • 29 Feb 2024 • Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization), while accommodating multiple turns, long horizons, and delayed rewards effectively.
1 code implementation • 14 Nov 2023 • Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun
In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.
2 code implementations • 22 Feb 2023 • Yifei Zhou, Juntao Ren, Fengyu Li, Ramin Zabih, Ser-Nam Lim
Advances in the field of vision-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations.
1 code implementation • ICCV 2023 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim
In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.
1 code implementation • 8 Nov 2022 • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim
In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.
1 code implementation • 13 Oct 2022 • Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
1 code implementation • 10 Oct 2022 • Yujie Zhang, Qi Yang, Yifei Zhou, Xiaozhong Xu, Le Yang, Yiling Xu
The goal of objective point cloud quality assessment (PCQA) research is to develop quantitative metrics that measure point cloud quality in a perceptually consistent manner.
1 code implementation • 5 Oct 2022 • Yifei Zhou, Renyu Li, Hayden Housen, Ser-Nam Lim
Paraphrase Identification is a fundamental task in Natural Language Processing.
no code implementations • Findings (NAACL) 2022 • Yifei Zhou, Yansong Feng
Recent works show that discourse analysis benefits from modeling intra- and inter-sentential levels separately, where proper representations for text units of different granularities are desired to capture both the meaning of text units and their relations to the context.
no code implementations • 21 Apr 2022 • Shrenik Zinage, Suyash Jadhav, Yifei Zhou, Ilias Bilionis, Peter Meckl
The objective of this paper is to develop a model that predicts the transient and steady-state behavior of the turbine using the Koopman operator, which can be helpful for control design and analysis.