Search Results for author: Yifei Zhou

Found 21 papers, 15 papers with code

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

1 code implementation • 9 Jun 2025 • Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar

In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon, enabling rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout.

Reinforcement Learning (RL)
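
As a hedged illustration of what scaling the interaction horizon means (a toy random-walk environment and a random stand-in policy, not the paper's agent or benchmarks), the sketch below shows how a larger per-rollout step budget changes what an agent can reach within a single episode:

```python
import random

# Toy sketch (an assumption, not the paper's implementation): a larger
# interaction budget lets an agent keep exploring and backtracking within a
# single rollout instead of committing to one short trajectory.
class ToyEnv:
    """Walk on a line; the goal cell is only found by exploring far enough."""
    def __init__(self, size=12, goal=10):
        self.size, self.goal, self.pos = size, goal, 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action in {-1, +1}
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.pos, self.pos == self.goal

def rollout(env, horizon, seed=0):
    rng = random.Random(seed)
    env.reset()
    for t in range(horizon):           # horizon = test-time interaction budget
        action = rng.choice([-1, 1])   # stand-in for a policy that may backtrack
        _, done = env.step(action)
        if done:
            return f"reached goal after {t + 1} steps"
    return "goal not reached within the budget"

env = ToyEnv()
print("horizon 10: ", rollout(env, horizon=10))
print("horizon 500:", rollout(env, horizon=500))
```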

Self-Challenging Language Model Agents

no code implementations • 2 Jun 2025 • Yifei Zhou, Sergey Levine, Jason Weston, Xian Li, Sainbayar Sukhbaatar

In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that are generated by itself.

Language Modeling • Language Modelling • +1
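
As a loose toy analogy of the self-challenging idea (the arithmetic task generator and noisy solver below are invented stand-ins, not the paper's task format or agent), an agent proposes verifiable tasks for itself, attempts them, and keeps only the self-verified successes as training data:

```python
import random

# Toy analogy of a self-challenging loop (not the paper's implementation):
# propose a verifiable task, attempt it, check the attempt with the task's
# own verifier, and keep verified (task, solution) pairs for training.
def propose_task(rng):
    a, b = rng.randint(1, 99), rng.randint(1, 99)
    return {"prompt": f"{a} + {b} = ?", "check": lambda ans, a=a, b=b: ans == a + b}

def attempt(task, rng):
    # Stand-in for the agent's own (imperfect) solver.
    a, b = [int(x) for x in task["prompt"].split(" = ")[0].split(" + ")]
    return a + b if rng.random() < 0.7 else a + b + rng.randint(1, 5)

rng = random.Random(0)
training_set = []
for _ in range(100):
    task = propose_task(rng)
    answer = attempt(task, rng)
    if task["check"](answer):               # only keep self-verified successes
        training_set.append((task["prompt"], answer))

print(f"kept {len(training_set)} verified self-generated examples")
```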

Learning Adaptive Parallel Reasoning with Language Models

1 code implementation • 21 Apr 2025 • Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr

Scaling inference-time computation has substantially improved the reasoning capabilities of language models.


Digi-Q: Learning Q-Value Functions for Training Device-Control Agents

1 code implementation • 13 Feb 2025 • Hao Bai, Yifei Zhou, Li Erran Li, Sergey Levine, Aviral Kumar

To make the VLM features amenable to representing the Q-function, we employ an initial fine-tuning phase that amplifies coverage of the actionable information needed for the value function.

Q-Learning • Reinforcement Learning (RL)
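
A minimal sketch of training a Q-head on frozen features with TD-style regression (plain NumPy on synthetic data; the VLM backbone, fine-tuning phase, and device-control environment from the paper are replaced by stand-ins):

```python
import numpy as np

# Sketch (assumption, not Digi-Q's actual code): given frozen state-action
# features phi(s, a), fit a linear Q-head by repeated TD regression:
#   target = r + gamma * max_a' Q(s', a')
rng = np.random.default_rng(0)
n, d, n_actions, gamma = 500, 16, 4, 0.9

phi = rng.normal(size=(n, n_actions, d))        # frozen features for each (s, a)
phi_next = rng.normal(size=(n, n_actions, d))   # features at the next state
rewards = rng.normal(size=n)
actions = rng.integers(0, n_actions, size=n)    # behavior actions in the offline data
w = np.zeros(d)

for _ in range(50):
    q_next = phi_next @ w                       # Q-values at s', shape (n, n_actions)
    targets = rewards + gamma * q_next.max(axis=1)
    x = phi[np.arange(n), actions]              # features of the taken actions
    # Least-squares TD regression step for the linear Q-head.
    w, *_ = np.linalg.lstsq(x, targets, rcond=None)

print("learned Q-head weights (first 4):", np.round(w[:4], 3))
```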

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

no code implementations • 17 Dec 2024 • Yifei Zhou, Qianlan Yang, Kaixiang Lin, Min Bai, Xiong Zhou, Yu-Xiong Wang, Sergey Levine, Erran Li

We validate PAE on challenging vision-based web navigation, using both real-world and self-hosted websites from WebVoyager and WebArena. To the best of our knowledge, this work represents the first effective learning system that applies autonomous task proposal with RL to agents and generalizes to real-world human-annotated benchmarks with SOTA performance.

KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data

no code implementations • 21 Sep 2024 • Grace Tang, Swetha Rajkumar, Yifei Zhou, Homer Rich Walke, Sergey Levine, Kuan Fang

Building generalist robotic systems involves effectively endowing robots with the capabilities to handle novel objects in an open-world setting.

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

1 code implementation • 14 Jun 2024 • Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar

This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device-control agents by fine-tuning a pre-trained VLM in two stages: offline RL to initialize the model, followed by offline-to-online RL.

Offline RL

Aligning Large Language Models with Representation Editing: A Control Perspective

1 code implementation • 10 Jun 2024 • Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system.
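
A minimal sketch of this kind of intervention (NumPy with a random hidden state and a hypothetical steering direction; the paper's control-theoretic formulation and learned control signals are not reproduced here): a control signal is added to the model's hidden state at generation time to push it toward the desired objective.

```python
import numpy as np

# Sketch (assumption): treat the LM's hidden state h_t as the state of a
# dynamical system and add an external control signal u before decoding.
rng = np.random.default_rng(0)
d = 8

h = rng.normal(size=d)              # hidden state at some layer / time step
u = rng.normal(size=d)              # control direction for the target objective
u /= np.linalg.norm(u)

alpha = 0.5                         # control strength (hyperparameter)
h_controlled = h + alpha * u        # edited representation fed to the next layer

print("shift toward objective:", float(h_controlled @ u - h @ u))  # equals alpha
```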

Autonomous Evaluation and Refinement of Digital Agents

1 code implementation • 9 Apr 2024 • Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control.
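
One simple way such an evaluator can be plugged in, shown as a hedged sketch (the scoring function and candidate trajectories are stand-ins, not the paper's evaluator): sample several candidate trajectories and let the evaluator rerank them at inference time or filter them into a refinement set.

```python
import random

# Sketch (assumption, not the paper's evaluator): use an automatic evaluator
# to rerank sampled agent trajectories (inference time) or to filter them
# into a fine-tuning set (refinement).
def evaluator(trajectory):
    # Stand-in for a domain-general evaluator returning a success score in [0, 1].
    return sum(trajectory) / len(trajectory)

def sample_trajectory(rng, length=5):
    # Stand-in for rolling out the agent once; each step is "good" with prob 0.5.
    return [rng.random() < 0.5 for _ in range(length)]

rng = random.Random(0)
candidates = [sample_trajectory(rng) for _ in range(8)]

best = max(candidates, key=evaluator)                   # best-of-n reranking
keep = [t for t in candidates if evaluator(t) >= 0.8]   # filtering for refinement
print("best score:", evaluator(best), "| kept for fine-tuning:", len(keep))
```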

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

2 code implementations • 29 Feb 2024 • Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

In this paper, we develop a framework for building multi-turn RL algorithms for fine-tuning LLMs that preserves the flexibility of existing single-turn RL methods for LLMs (e.g., proximal policy optimization) while effectively accommodating multiple turns, long horizons, and delayed rewards.

Language Modeling • Language Modelling • +1
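
A minimal sketch of the hierarchical idea (NumPy on a synthetic conversation; the paper's utterance-level critic, token-level policy, and PPO machinery are not reproduced): a turn-level value estimate handles credit assignment over the long horizon, and the per-turn advantages it yields are what a standard single-turn RL update would then consume.

```python
import numpy as np

# Sketch (assumption): turn-level credit assignment for a multi-turn episode.
# A turn-level value function V(turn) converts a sparse, delayed episode
# reward into per-turn advantages, which a single-turn RL method (e.g. PPO)
# could then use to update the token-level policy within each turn.
gamma = 0.99
rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # delayed reward at the final turn
values = np.array([0.2, 0.3, 0.5, 0.7, 0.9])    # stand-in turn-level value estimates
values_next = np.append(values[1:], 0.0)        # V of the next turn (0 after the end)

# One-step TD targets and advantages at the *turn* level.
td_targets = rewards + gamma * values_next
advantages = td_targets - values

for t, adv in enumerate(advantages):
    print(f"turn {t}: advantage {adv:+.3f}")     # fed to a single-turn policy update
```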

Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

1 code implementation • 14 Nov 2023 • Yifei Zhou, Ayush Sekhari, Yuda Song, Wen Sun

In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.

Offline RL

Test-Time Distribution Normalization for Contrastively Learned Vision-language Models

2 code implementations • 22 Feb 2023 • Yifei Zhou, Juntao Ren, Fengyu Li, Ramin Zabih, Ser-Nam Lim

Advances in the field of vision-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations.

Contrastive Learning
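
A minimal NumPy sketch of the dot-product scoring this line refers to, plus a mean-centering variant in the spirit of test-time distribution normalization (the embeddings are random stand-ins for CLIP-style features, and the centering rule is illustrative rather than the paper's exact formula):

```python
import numpy as np

# Sketch: zero-shot scoring by dot product between (L2-normalized) image and
# text embeddings, plus a hedged "distribution normalization" variant that
# centers both sides with means estimated from a batch of test-time samples.
# NOTE: random stand-in embeddings; the centering rule is illustrative only.
rng = np.random.default_rng(0)
d, n_imgs, n_texts = 32, 16, 5

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

images = l2norm(rng.normal(size=(n_imgs, d)))
texts = l2norm(rng.normal(size=(n_texts, d)))
query_image = images[0]

# Plain dot-product scoring of all texts against one image.
scores_dot = texts @ query_image

# Distribution-normalized variant: subtract part of each modality's test-time mean.
mu_img, mu_txt = images.mean(axis=0), texts.mean(axis=0)
scores_dn = (texts - 0.5 * mu_txt) @ (query_image - 0.5 * mu_img)

print("plain ranking:     ", np.argsort(-scores_dot))
print("normalized ranking:", np.argsort(-scores_dn))
```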

BT^2: Backward-compatible Training with Basis Transformation

1 code implementation • ICCV 2023 (arXiv, 8 Nov 2022) • Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling.

Retrieval
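
A minimal sketch of the compatibility goal (NumPy; the transformation is fit by plain least squares on paired embeddings, which is only an illustrative stand-in for the paper's learned basis transformation): new-model query embeddings are mapped through a transformation so they can be scored directly against old-model gallery embeddings, without re-embedding (backfilling) the gallery.

```python
import numpy as np

# Sketch (assumption): learn a linear map B so that new-model embeddings,
# projected through B, can be compared directly with old-model embeddings.
rng = np.random.default_rng(0)
n_pairs, d_old, d_new = 200, 32, 64

W = rng.normal(size=(d_old, d_new)) * 0.1
old_emb = rng.normal(size=(n_pairs, d_old))                      # old gallery embeddings
new_emb = old_emb @ W + 0.01 * rng.normal(size=(n_pairs, d_new)) # synthetic "new model" embeddings

# Fit B so that new_emb @ B ~= old_emb (stand-in for the learned transformation).
B, *_ = np.linalg.lstsq(new_emb, old_emb, rcond=None)

query_new = new_emb[:5] @ B                     # new queries mapped into the old space
scores = query_new @ old_emb.T                  # compared directly against the old gallery
print("top-1 matches for 5 queries:", scores.argmax(axis=1))   # ideally [0 1 2 3 4]
```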

Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

1 code implementation • 13 Oct 2022 • Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun

We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.

Montezuma's Revenge • Q-Learning
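
A minimal sketch of the hybrid setting on a toy chain MDP (an illustration of mixing offline and online data during Q-learning, not the paper's Hy-Q algorithm): every update consumes one transition from the fixed offline dataset and one freshly collected online transition.

```python
import random

# Sketch (assumption): tabular Q-learning on a small chain MDP where each
# update mixes an offline transition with a fresh online one -- the hybrid
# RL setting of an offline dataset plus online interaction.
N_STATES, GAMMA, ALPHA = 6, 0.9, 0.2
rng = random.Random(0)

def step(s, a):                      # actions: 0 = left, 1 = right
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, 1.0 if s2 == N_STATES - 1 else 0.0

# A fixed offline dataset collected by a random behavior policy.
offline = []
for _ in range(500):
    s, a = rng.randrange(N_STATES), rng.randrange(2)
    s2, r = step(s, a)
    offline.append((s, a, r, s2))

Q = [[0.0, 0.0] for _ in range(N_STATES)]
s = 0
for _ in range(2000):
    # Online interaction with an epsilon-greedy policy.
    a = rng.randrange(2) if rng.random() < 0.2 else max((0, 1), key=lambda x: Q[s][x])
    s2, r = step(s, a)
    online = (s, a, r, s2)
    s = 0 if r > 0 else s2
    # Hybrid update: one offline and one online transition per step.
    for (ss, aa, rr, ss2) in (rng.choice(offline), online):
        Q[ss][aa] += ALPHA * (rr + GAMMA * max(Q[ss2]) - Q[ss][aa])

print("greedy action per state:", [max((0, 1), key=lambda a: Q[st][a]) for st in range(N_STATES)])
```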

TCDM: Transformational Complexity Based Distortion Metric for Perceptual Point Cloud Quality Assessment

1 code implementation • 10 Oct 2022 • Yujie Zhang, Qi Yang, Yifei Zhou, Xiaozhong Xu, Le Yang, Yiling Xu

The goal of objective point cloud quality assessment (PCQA) research is to develop quantitative metrics that measure point cloud quality in a perceptually consistent manner.

Point Cloud Quality Assessment

Improve Discourse Dependency Parsing with Contextualized Representations

no code implementations • Findings (NAACL) 2022 • Yifei Zhou, Yansong Feng

Recent works show that discourse analysis benefits from modeling intra- and inter-sentential levels separately, where proper representations for text units of different granularities are desired to capture both the meaning of text units and their relations to the context.

Articles • Dependency Parsing

Data Driven Modeling of Turbocharger Turbine using Koopman Operator

no code implementations • 21 Apr 2022 • Shrenik Zinage, Suyash Jadhav, Yifei Zhou, Ilias Bilionis, Peter Meckl

The objective of this paper is to develop a model that predicts the transient and steady-state behavior of the turbine using the Koopman operator, which can be helpful for control design and analysis.
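
A minimal sketch of the general Koopman-style workflow via extended dynamic mode decomposition (EDMD) on a toy nonlinear system (synthetic data and a hand-picked dictionary of observables; the turbocharger model, inputs, and observables in the paper are different): lift the state through nonlinear observables, fit a linear operator by least squares, then predict transients by iterating the linear map.

```python
import numpy as np

# Sketch (assumption): EDMD identification of a Koopman operator for the toy
# nonlinear system x_{k+1} = 0.9 x_k - 0.1 x_k^2, with dictionary [1, x, x^2].
def dynamics(x):
    return 0.9 * x - 0.1 * x**2

def lift(x):                                  # dictionary of observables
    return np.stack([np.ones_like(x), x, x**2], axis=0)

# Collect snapshot pairs (x_k, x_{k+1}).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)
y = dynamics(x)

Psi_x, Psi_y = lift(x), lift(y)               # 3 x N lifted snapshots
K = Psi_y @ np.linalg.pinv(Psi_x)             # least-squares Koopman operator (3 x 3)

# Predict a transient by iterating the *linear* lifted dynamics.
z = lift(np.array([0.8]))
preds = []
for _ in range(5):
    z = K @ z
    preds.append(float(z[1, 0]))              # the 'x' coordinate of the lifted state

truth, xt = [], 0.8
for _ in range(5):
    xt = dynamics(xt)
    truth.append(xt)
print("Koopman prediction:", np.round(preds, 4))
print("true trajectory:   ", np.round(truth, 4))
```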
