no code implementations • 11 Mar 2025 • Jingwen Deng, ZiHao Wang, Shaofei Cai, Anji Liu, Yitao Liang
Unlike existing methods that rely on sequence sampling or human labeling, we have developed a self-supervised learning-based approach to segment these long videos into a series of semantic-aware and skill-consistent segments.
no code implementations • 4 Mar 2025 • Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang
We aim to develop a goal specification method that is semantically clear, spatially sensitive, and intuitive for human users to guide agent interactions in embodied environments.
1 code implementation • 24 Dec 2024 • Shaofei Cai, Zhancun Mu, Kaichen He, Bowei Zhang, Xinyue Zheng, Anji Liu, Yitao Liang
Minecraft has emerged as a valuable testbed for embodied intelligence and sequential decision-making research, yet the development and validation of novel agents remain hindered by significant engineering challenges.
no code implementations • 7 Dec 2024 • Shaofei Cai, Bowei Zhang, ZiHao Wang, Haowei Lin, Xiaojian Ma, Anji Liu, Yitao Liang
Developing agents that can follow multimodal instructions remains a fundamental challenge in robotics and AI.
no code implementations • 3 Dec 2024 • Guangyu Zhao, Kewei Lian, Haowei Lin, Haobo Fu, Qiang Fu, Shaofei Cai, ZiHao Wang, Yitao Liang
Then we use preference learning to fine-tune the initial goal latent representation with the categorized trajectories while keeping the policy backbone frozen.
1 code implementation • 23 Oct 2024 • Shaofei Cai, ZiHao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang
Using this approach, we train ROCKET-1, a low-level policy that predicts actions based on concatenated visual observations and segmentation masks, supported by real-time object tracking from SAM-2.
no code implementations • 27 Jun 2024 • ZiHao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang
First, we introduce a self-supervised approach to learn a behavior encoder that produces discretized tokens for behavior trajectories $\tau = \{o_0, a_0, \dots\}$ and an imitation learning policy decoder conditioned on these tokens.
1 code implementation • 10 Nov 2023 • ZiHao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents.
no code implementations • 12 Oct 2023 • Shaofei Cai, Bowei Zhang, ZiHao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations.
1 code implementation • 3 Feb 2023 • ZiHao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, Yitao Liang
We investigate the challenge of task planning for multi-task embodied agents in open-world environments.
2 code implementations • CVPR 2023 • Shaofei Cai, ZiHao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
We study the problem of learning goal-conditioned policies in Minecraft, a popular, widely accessible yet challenging open-ended environment for developing human-level multi-task agents.
1 code implementation • CVPR 2022 • Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang
However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.
Ranked #2 on Link Prediction on TSP/HCP Benchmark set
no code implementations • 2 Apr 2022 • Zhenhuan Liu, Jincan Deng, Liang Li, Shaofei Cai, Qianqian Xu, Shuhui Wang, Qingming Huang
Conditional image generation is an active research topic that includes text2image and image translation.
Conditional Image Generation
Generative Adversarial Network
1 code implementation • 22 Sep 2021 • Bingchuan Li, Shaofei Cai, Wei Liu, Peng Zhang, Qian He, Miao Hua, Zili Yi
To address these limitations, we design a Dynamic Style Manipulation Network (DyStyle) whose structure and parameters vary by input samples, to perform nonlinear and adaptive manipulation of latent codes for flexible and precise attribute control.
no code implementations • 3 Sep 2021 • Shaofei Cai, Liang Li, Xinzhe Han, Zheng-Jun Zha, Qingming Huang
Recently, researchers have studied neural architecture search (NAS) to reduce the dependence on human expertise and explore better GNN architectures, but these methods over-emphasize entity features and ignore latent relation information concealed in the edges.
1 code implementation • CVPR 2021 • Shaofei Cai, Liang Li, Jincan Deng, Beichen Zhang, Zheng-Jun Zha, Li Su, Qingming Huang
Inspired by the strong search capability of neural architecture search (NAS) for CNNs, this paper proposes Graph Neural Architecture Search (GNAS) with a newly designed search space.