Search Results for author: Xueyan Zou

Found 18 papers, 11 papers with code

LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models

no code implementations • 4 Feb 2025 • Yuto Kojima, Jiarui Xu, Xueyan Zou, Xiaolong Wang

The rapid advancements in vision-language models (VLMs), such as CLIP, have intensified the need to address distribution shifts between training and testing datasets.
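The listing carries no code, but the title points at a simple recipe: keep the pretrained VLM weights frozen and adapt only low-rank adapters on each unlabeled test batch. Below is a minimal sketch of that idea, assuming a CLIP-like feature dimension and entropy minimization as the test-time objective; the objective and all names here are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of low-rank test-time training (not the authors' code).
# The pretrained linear layer stays frozen; only the low-rank adapters A and B
# are updated on each test batch, here via entropy minimization (an assumption).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * F.linear(F.linear(x, self.A), self.B)

def test_time_step(model, x, optimizer):
    """One adaptation step: minimize the entropy of the model's own predictions."""
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()

# Toy usage: a stand-in classifier head adapted on one unlabeled test batch.
head = LoRALinear(nn.Linear(512, 10), rank=4)
opt = torch.optim.AdamW([head.A, head.B], lr=1e-3)
x = torch.randn(32, 512)               # stand-in for test-image features
print(test_time_step(head, x, opt))
```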

Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation

no code implementations • 30 Jan 2025 • Yuelei Li, Ge Yan, Annabella Macaluso, Mazeyu Ji, Xueyan Zou, Xiaolong Wang

To align high-level and low-level control for robot actions, language embeddings representing the high-level policy attend jointly to the 3D feature field within the 3D transformer for seamless integration (a pattern sketched below).

Memorization • Scene Understanding • +1
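The snippet above describes a cross-attention pattern: language tokens as queries, the 3D feature field as keys and values. A minimal sketch follows, with all shapes and module choices being illustrative assumptions rather than the paper's actual architecture:

```python
# Illustrative sketch only: language embeddings cross-attend to a 3D feature
# field (one feature vector per 3D point). Dimensions are made up for the
# example; the paper's real transformer may differ.
import torch
import torch.nn as nn

dim = 256
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)

lang_tokens = torch.randn(1, 16, dim)     # high-level policy / instruction tokens
field_feats = torch.randn(1, 4096, dim)   # features sampled from the 3D feature field

# Queries come from language; keys/values come from the 3D field, so the
# fused tokens carry both the command and the scene geometry.
fused, _ = attn(query=lang_tokens, key=field_feats, value=field_feats)
print(fused.shape)  # torch.Size([1, 16, 256]) -> input to the low-level policy
```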

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

no code implementations • 5 Dec 2024 • An-Chieh Cheng, Yandong Ji, Zhaojing Yang, Zaitian Gongye, Xueyan Zou, Jan Kautz, Erdem Biyik, Hongxu Yin, Sifei Liu, Xiaolong Wang

This paper proposes to solve Vision-and-Language Navigation with legged robots, which not only gives humans a flexible way to issue commands but also lets the robot navigate through more challenging and cluttered scenes.

Navigate • Vision and Language Navigation

WildLMa: Long Horizon Loco-Manipulation in the Wild

no code implementations • 22 Nov 2024 • Ri-Zhao Qiu, Yuchen Song, Xuanbin Peng, Sai Aneesh Suryadevara, Ge Yang, Minghuan Liu, Mazeyu Ji, Chengzhe Jia, Ruihan Yang, Xueyan Zou, Xiaolong Wang

"In-the-wild" mobile manipulation aims to deploy robots in diverse real-world environments, which requires the robot to (1) have skills that generalize across object configurations; (2) be capable of long-horizon task execution in diverse environments; and (3) perform complex manipulation beyond pick-and-place.

Imitation Learning

GraspSplats: Efficient Manipulation with 3D Feature Splatting

no code implementations • 3 Sep 2024 • Mazeyu Ji, Ri-Zhao Qiu, Xueyan Zou, Xiaolong Wang

With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings.

Feature Splatting • NeRF

PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases

no code implementations • 11 Jun 2024 • Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng

Recent findings demonstrate that on-policy data is the key to successful preference learning: the preference data is collected with the same policy LM that is being trained (a loop sketched below).

Code Generation • HumanEval • +1
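To make the on-policy loop concrete: sample candidate programs from the current policy, execute them against synthetic test cases, and pair passing with failing samples as preference data. The sketch below illustrates this under heavy assumptions; the helper names, the test format, and the trivial `add` task are all made up for the example.

```python
# Hypothetical sketch of on-policy preference-pair collection: sample programs
# from the *current* policy, execute them against synthetic tests, and pair
# passing vs. failing samples as (chosen, rejected). Not the authors' code.
import random

def sample_from_policy(prompt: str, n: int) -> list[str]:
    # Stand-in for policy_lm.generate(prompt); returns candidate programs.
    good = "def add(a, b):\n    return a + b"
    bad = "def add(a, b):\n    return a - b"
    return [random.choice([good, bad]) for _ in range(n)]

def passes_tests(program: str, tests: list[tuple]) -> bool:
    scope: dict = {}
    try:
        exec(program, scope)     # execute untrusted code only in a sandbox!
        return all(scope["add"](*args) == out for args, out in tests)
    except Exception:
        return False

def collect_pairs(prompt: str, tests: list[tuple], n: int = 8) -> list[dict]:
    candidates = sample_from_policy(prompt, n)
    passed = [c for c in candidates if passes_tests(c, tests)]
    failed = [c for c in candidates if not passes_tests(c, tests)]
    # Each (pass, fail) pair becomes one preference example.
    return [{"prompt": prompt, "chosen": p, "rejected": f}
            for p, f in zip(passed, failed)]

synthetic_tests = [((1, 2), 3), ((0, 0), 0)]   # synthetic I/O test cases
print(len(collect_pairs("Write add(a, b).", synthetic_tests)))
```

Each resulting (chosen, rejected) pair can then feed a preference-learning objective such as DPO; because the candidates come from the policy being trained, the data stays on-policy.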

Interfacing Foundation Models' Embeddings

1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Junyi Wei, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity.

Decoder • Image Segmentation • +3

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 • Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created grounded visual chat (GVC) data that allows grounding and chat capabilities to be combined.

Decoder

Visual In-Context Prompting

3 code implementations • CVPR 2024 • Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Decoder • Segmentation • +1

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

3 code implementations • 17 Oct 2023 • Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.

Interactive Segmentation • Referring Expression • +4
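The mechanism itself is easy to sketch: partition the image into regions, draw a numeric mark on each region, and let the LMM refer to regions by number. A toy illustration with PIL, where the segmentation masks are mocked up (the paper obtains them from segmentation models such as SAM or SEEM):

```python
# Toy sketch of Set-of-Mark prompting: overlay numeric marks at region
# centers, then ask the LMM about "region 2". Masks are mocked here.
import numpy as np
from PIL import Image, ImageDraw

def mark_regions(image: Image.Image, masks: list[np.ndarray]) -> Image.Image:
    marked = image.copy()
    draw = ImageDraw.Draw(marked)
    for i, mask in enumerate(masks, start=1):
        ys, xs = np.nonzero(mask)
        cx, cy = int(xs.mean()), int(ys.mean())   # region centroid
        draw.ellipse((cx - 10, cy - 10, cx + 10, cy + 10), fill="white")
        draw.text((cx - 4, cy - 6), str(i), fill="black")
    return marked

img = Image.new("RGB", (128, 128), "gray")
m1 = np.zeros((128, 128), bool); m1[10:50, 10:50] = True    # mock region 1
m2 = np.zeros((128, 128), bool); m2[70:120, 60:110] = True  # mock region 2
marked = mark_regions(img, [m1, m2])
# 'marked' is then sent to the LMM with a prompt like
# "Which numbered region contains the mug?" -> the model answers by mark id.
```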

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation • 10 Jul 2023 • Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.

Image Segmentation • Segmentation • +1

Segment Everything Everywhere All at Once

3 code implementations • NeurIPS 2023 • Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Decoder • Image Segmentation • +5

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations • ICCV 2023 • Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #1 on Instance Segmentation on Cityscapes val (using extra training data)

Instance Segmentation • Panoptic Segmentation • +2

End-to-End Instance Edge Detection

no code implementations • 6 Apr 2022 • Xueyan Zou, Haotian Liu, Yong Jae Lee

We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.

Decoder • Edge Detection • +6

Progressive Temporal Feature Alignment Network for Video Inpainting

1 code implementation • CVPR 2021 • Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee

To fill the missing regions plausibly, it is necessary to find correspondences in neighbouring frames to faithfully hallucinate the unknown content; one common realization, flow-based feature warping, is sketched below.

Optical Flow Estimation • Video Inpainting
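As promised above, a minimal sketch of flow-based feature alignment: warp a neighbouring frame's feature map toward the current frame with optical flow, then composite the warped content into the hole. Everything here (shapes, the zero placeholder flow, the single-neighbour setup) is an illustrative assumption, not the paper's actual module:

```python
# Illustrative sketch: align a neighbouring frame's feature map to the
# current frame by backward warping with optical flow, then composite the
# warped content into the masked (hole) region. Flow is a dummy tensor here.
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """feat: (B,C,H,W); flow: (B,2,H,W) in pixels, current -> neighbour."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow  # sample coords
    # Normalize to [-1, 1] as grid_sample expects, then reorder to (B,H,W,2).
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((grid_x, grid_y), dim=-1),
                         align_corners=True)

cur = torch.randn(1, 64, 32, 32)        # current-frame features (with a hole)
nbr = torch.randn(1, 64, 32, 32)        # neighbouring-frame features
flow = torch.zeros(1, 2, 32, 32)        # placeholder optical flow
hole = torch.zeros(1, 1, 32, 32); hole[..., 8:16, 8:16] = 1.0

aligned = warp(nbr, flow)
filled = cur * (1 - hole) + aligned * hole   # fill only the unknown region
print(filled.shape)
```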

Delving Deeper into Anti-aliasing in ConvNets

2 code implementations • 21 Aug 2020 • Xueyan Zou, Fanyi Xiao, Zhiding Yu, Yong Jae Lee

Aliasing is the phenomenon in which high-frequency signals degenerate into completely different ones after sampling; a numeric example follows the tags below.

Instance Segmentation • Segmentation • +1
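A concrete instance of that definition: a 9 Hz sine sampled at 10 Hz produces exactly the samples of a sign-flipped 1 Hz sine, i.e. the high frequency folds into a low one. This is the failure mode that low-pass (blur) filtering before subsampling is meant to suppress in CNN downsampling layers.

```python
# Numeric illustration of aliasing: sampling a 9 Hz sine at 10 Hz yields the
# same samples as a 1 Hz sine (up to sign), so the two signals become
# indistinguishable after sampling.
import numpy as np

t = np.arange(10) / 10.0                 # 10 Hz sampling grid over one second
hi = np.sin(2 * np.pi * 9 * t)           # 9 Hz signal
lo = np.sin(2 * np.pi * 1 * t)           # 1 Hz signal
print(np.allclose(hi, -lo, atol=1e-9))   # True: the two are aliases
```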
