Search Results for author: Shaofei Cai

Found 16 papers, 8 papers with code

Open-World Skill Discovery from Unsegmented Demonstrations

no code implementations • 11 Mar 2025 • Jingwen Deng, ZiHao Wang, Shaofei Cai, Anji Liu, Yitao Liang

Unlike existing methods that rely on sequence sampling or human labeling, we have developed a self-supervised learning-based approach to segment these long videos into a series of semantic-aware and skill-consistent segments.

Boundary Detection, Event Segmentation +5

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

no code implementations • 4 Mar 2025 • Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang

We aim to develop a goal specification method that is semantically clear, spatially sensitive, and intuitive for human users to guide agent interactions in embodied environments.

Minecraft, Spatial Reasoning

MineStudio: A Streamlined Package for Minecraft AI Agent Development

1 code implementation • 24 Dec 2024 • Shaofei Cai, Zhancun Mu, Kaichen He, Bowei Zhang, Xinyue Zheng, Anji Liu, Yitao Liang

Minecraft has emerged as a valuable testbed for embodied intelligence and sequential decision-making research, yet the development and validation of novel agents remains hindered by significant engineering challenges.

AI Agent, Decision Making +2

GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents

no code implementations • 7 Dec 2024 • Shaofei Cai, Bowei Zhang, ZiHao Wang, Haowei Lin, Xiaojian Ma, Anji Liu, Yitao Liang

Developing agents that can follow multimodal instructions remains a fundamental challenge in robotics and AI.

Instruction Following

Optimizing Latent Goal by Learning from Trajectory Preference

no code implementations • 3 Dec 2024 • Guangyu Zhao, Kewei Lian, Haowei Lin, Haobo Fu, Qiang Fu, Shaofei Cai, ZiHao Wang, Yitao Liang

Then we use preference learning to fine-tune the initial goal latent representation with the categorized trajectories while keeping the policy backbone frozen.

Continual Learning, Instruction Following +1

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

1 code implementation • 23 Oct 2024 • Shaofei Cai, ZiHao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang

Using this approach, we train ROCKET-1, a low-level policy that predicts actions based on concatenated visual observations and segmentation masks, supported by real-time object tracking from SAM-2.

Decision Making, Minecraft +3

OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

no code implementations • 27 Jun 2024 • ZiHao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

First, we introduce a self-supervised approach to learn a behavior encoder that produces discretized tokens for behavior trajectories $\tau = \{o_0, a_0, \dots\}$ and an imitation learning policy decoder conditioned on these tokens.

Decoder, Imitation Learning +3

JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

1 code implementation • 10 Nov 2023 • ZiHao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents.

Minecraft

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

no code implementations • 12 Oct 2023 • Shaofei Cai, Bowei Zhang, ZiHao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations.

Decoder, Instruction Following +1

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

2 code implementations • CVPR 2023 • Shaofei Cai, ZiHao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment for developing human-level multi-task agents.

Diversity, Minecraft +2

Automatic Relation-aware Graph Network Proliferation

1 code implementation • CVPR 2022 • Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang

However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.

Graph Classification, Graph Learning +5

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

1 code implementation • 22 Sep 2021 • Bingchuan Li, Shaofei Cai, Wei Liu, Peng Zhang, Qian He, Miao Hua, Zili Yi

To address these limitations, we design a Dynamic Style Manipulation Network (DyStyle) whose structure and parameters vary by input samples, to perform nonlinear and adaptive manipulation of latent codes for flexible and precise attribute control.

Attribute, Contrastive Learning

Edge-featured Graph Neural Architecture Search

no code implementations • 3 Sep 2021 • Shaofei Cai, Liang Li, Xinzhe Han, Zheng-Jun Zha, Qingming Huang

Recently, researchers have studied neural architecture search (NAS) to reduce the dependence on human expertise and explore better GNN architectures, but they overemphasize entity features and ignore the latent relation information concealed in the edges.

Neural Architecture Search

Rethinking Graph Neural Architecture Search from Message-passing

1 code implementation • CVPR 2021 • Shaofei Cai, Liang Li, Jincan Deng, Beichen Zhang, Zheng-Jun Zha, Li Su, Qingming Huang

Inspired by the strong searching capability of neural architecture search (NAS) in CNNs, this paper proposes Graph Neural Architecture Search (GNAS) with a newly designed search space.

Feature Selection, Neural Architecture Search
