1 code implementation • 27 Feb 2025 • Chuofan Ma, Yi Jiang, Junfeng Wu, Jihan Yang, Xin Yu, Zehuan Yuan, Bingyue Peng, Xiaojuan Qi
To bridge this gap, we introduce UniTok, a discrete visual tokenizer that encodes fine-grained details for generation while also capturing high-level semantics for understanding.
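The abstract does not spell out UniTok's internals; as a rough illustration only, the discretization step at the heart of any VQ-style discrete visual tokenizer maps continuous features to codebook indices (function name and shapes are assumptions, not the paper's exact design):

```python
import numpy as np

def vq_quantize(features, codebook):
    """Nearest-neighbour vector quantization: map each feature row to the
    index of the closest codebook entry (the discrete token)."""
    # squared L2 distances between every feature and every codebook entry
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d.argmin(axis=1)
    return tokens, codebook[tokens]
```

The returned indices form the discrete token sequence consumed by a generative model, while the quantized vectors stand in for the original features downstream.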
no code implementations • 28 Jan 2025 • Tianzhe Chu, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V. Le, Sergey Levine, Yi Ma
We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains.
1 code implementation • 18 Dec 2024 • Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie
Humans possess the visual-spatial intelligence to remember spaces from sequential visual observations.
1 code implementation • 24 Jun 2024 • Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Ziteng Wang, Rob Fergus, Yann LeCun, Saining Xie
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
1 code implementation • 21 Mar 2024 • Weipeng Deng, Jihan Yang, Runyu Ding, Jiahui Liu, Yijiang Li, Xiaojuan Qi, Edith Ngai
To test the language understandability of 3D-VL models, we first propose a language robustness task that systematically assesses 3D-VL models across various tasks, benchmarking their performance on different language style variants.
1 code implementation • 5 Feb 2024 • Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created.
no code implementations • 1 Aug 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes.
Ranked #3 on 3D Open-Vocabulary Instance Segmentation on S3DIS
1 code implementation • CVPR 2024 • Jihan Yang, Runyu Ding, Weipeng Deng, Zhe Wang, Xiaojuan Qi
We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories.
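RegionPLC's exact objective is not reproduced here, but point-language contrastive learning of this kind is typically built on a symmetric InfoNCE loss between matched region point features and caption embeddings; a minimal NumPy sketch under that assumption (function name, shapes, and temperature are illustrative):

```python
import numpy as np

def region_language_contrastive_loss(point_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss: row i of point_feats and text_feats (N, D)
    is a matched region-caption pair; all other rows are negatives."""
    p = point_feats / np.linalg.norm(point_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = p @ t.T / temperature          # (N, N) similarity matrix

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))            # target is the diagonal

    # average over both matching directions: points-to-text and text-to-points
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss pulls each region's point feature toward its paired caption embedding and away from the captions of other regions.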
1 code implementation • CVPR 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.
Ranked #2 on 3D Open-Vocabulary Instance Segmentation on S3DIS
1 code implementation • 30 May 2022 • Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi
Then, we build a benchmark to assess existing KD methods developed in the 2D domain for 3D object detection upon six well-constructed teacher-student pairs.
1 code implementation • 4 Apr 2022 • Runyu Ding, Jihan Yang, Li Jiang, Xiaojuan Qi
Deep learning approaches have achieved prominent success in 3D semantic segmentation.
1 code implementation • CVPR 2022 • Ruifei He, Shuyang Sun, Jihan Yang, Song Bai, Xiaojuan Qi
Large-scale pre-training has been proven to be crucial for various computer vision tasks.
no code implementations • 15 Aug 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi
These specific designs enable the detector to be trained on carefully refined, pseudo-labeled target data with denoised training signals, effectively facilitating the adaptation of an object detector to a target domain without requiring annotations.
1 code implementation • ICCV 2021 • Ruifei He, Jihan Yang, Xiaojuan Qi
In this paper, we present a simple and yet effective Distribution Alignment and Random Sampling (DARS) method to produce unbiased pseudo labels that match the true class distribution estimated from the labeled data.
1 code implementation • CVPR 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi
Then, the detector is iteratively improved on the target domain by alternately performing two steps: pseudo-label updating with a quality-aware triplet memory bank, and model training with curriculum data augmentation.
1 code implementation • 28 Aug 2020 • Shaoshuai Shi, Chaoxu Guo, Jihan Yang, Hongsheng Li
In this technical report, we present the top-performing LiDAR-only solutions for three tracks of the Waymo Open Dataset Challenges 2020: 3D detection, 3D tracking, and domain adaptation.
no code implementations • 18 Dec 2019 • Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin
In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature-space adversarial perturbations.
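For a linear softmax classifier the cross-entropy gradient with respect to the input feature has a closed form, so a pointwise FGSM-style feature-space perturbation can be sketched directly (the linear-classifier assumption and all names are illustrative, not the paper's exact formulation):

```python
import numpy as np

def fgsm_feature_perturb(x, y, W, eps=0.1):
    """FGSM-style perturbation of feature vector x (D,) against a linear
    softmax classifier W (C, D); y is the true class index."""
    logits = W @ x
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    onehot = np.zeros_like(probs)
    onehot[y] = 1.0
    grad = W.T @ (probs - onehot)           # d(cross-entropy)/dx, closed form
    return x + eps * np.sign(grad)          # ascend the loss: adversarial feature
```

Training then minimizes the classification loss on both the clean and the perturbed features, pushing the classifier to be invariant to small pointwise shifts in feature space.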
3 code implementations • ICCV 2019 • Ruijia Xu, Guanbin Li, Jihan Yang, Liang Lin
Domain adaptation enables the learner to safely generalize into novel environments by mitigating domain shifts across distributions.
Ranked #8 on Domain Adaptation on ImageCLEF-DA