no code implementations • 18 Oct 2024 • Yuxiang Lu, Shengcao Cao, Yu-Xiong Wang
Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks.
no code implementations • 10 Oct 2024 • Shengcao Cao, Liang-Yan Gui, Yu-Xiong Wang
Contrary to the common practice that fine-tunes LMMs with additional grounding supervision, we find that the grounding ability can in fact emerge in LMMs trained without explicit grounding supervision.
no code implementations • 18 Apr 2024 • Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang
Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.
no code implementations • CVPR 2024 • Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang
The limited scale of current 3D shape datasets hinders the advancements in 3D shape understanding, and motivates multi-modal learning approaches which transfer learned knowledge from data-abundant 2D image and language modalities to 3D shapes.
Ranked #1 on Zero-shot 3D Point Cloud Classification on ScanObjectNN (Pretrained on ShapeNet) (using extra training data)
1 code implementation • NeurIPS 2023 • Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects.
no code implementations • 25 Sep 2023 • Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
Large Multimodal Models (LMMs) are built across modalities, and misalignment between the two modalities can result in "hallucination": generating textual outputs that are not grounded in the multimodal information in context.
no code implementations • 17 Aug 2023 • Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui
To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt.
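The idea of bridging a large teacher-student gap with intermediate teachers can be sketched in a few lines. The following toy example (plain NumPy, illustrative names and logits only — not the paper's implementation) distills a student's soft predictions against a sequence of teachers ordered from student-like to most accurate, computing the usual temperature-scaled KL distillation loss at each stage:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q): the standard soft-target distillation loss.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def progressive_distillation_losses(student_logits, teacher_sequence, T=2.0):
    """Distill against a sequence of teachers, ordered from most
    student-like to most accurate; returns one KL loss per stage."""
    p_student = softmax(student_logits, T)
    return [kl_div(softmax(t, T), p_student) for t in teacher_sequence]

# Toy logits: three teachers bridging toward a complex final teacher.
student = np.array([1.0, 0.5, -0.5])
teachers = [np.array([0.55 * 2, 0.2 * 2, -0.2 * 2]),  # closest to the student
            np.array([1.0, 0.1, -0.5]),               # intermediate
            np.array([1.5, 0.0, -1.0])]               # most accurate teacher
losses = progressive_distillation_losses(student, teachers, T=1.0)
```

Here each intermediate teacher yields a smaller matching loss than the final teacher, which is the motivation for adapting gradually rather than jumping straight to the most complex model.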
1 code implementation • CVPR 2023 • Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang
Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain).
no code implementations • 13 May 2021 • Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani
To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.
no code implementations • 7 Mar 2021 • Shengcao Cao, Xiaofang Wang, Kris Kitani
Using a sampling-based search algorithm and parallel computing, our method finds an architecture that outperforms DARTS while reducing wall-clock search time by 80%.
1 code implementation • ICCV 2021 • Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani
DETR is a recently proposed Transformer-based method that views object detection as a set prediction problem and achieves state-of-the-art performance, but demands extra-long training time to converge.
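At the core of the set-prediction view is a minimum-cost bipartite matching between predictions and ground-truth objects. The sketch below illustrates that matching step with a brute-force search over permutations and made-up pairwise costs (DETR itself uses the Hungarian algorithm and a combined class/box cost; this is only a tiny illustration):

```python
from itertools import permutations

def min_cost_matching(cost):
    """Brute-force bipartite matching on a square cost matrix.
    Returns (assignment, total_cost), where assignment[i] is the
    ground-truth index matched to prediction i. Fine for tiny N;
    DETR solves the same problem with the Hungarian algorithm."""
    n = len(cost)
    best, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best, best_cost = perm, c
    return list(best), best_cost

# Illustrative pairwise costs (e.g. class + box terms) between
# 3 predictions (rows) and 3 ground-truth objects (columns).
cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.6, 0.8, 0.3]]
assignment, total = min_cost_matching(cost)
```

Once each prediction is paired with its ground-truth object (or with "no object"), the per-pair losses are summed, which is what makes the set formulation end-to-end without hand-designed components like NMS.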
2 code implementations • ICLR 2019 • Shengcao Cao, Xiaofang Wang, Kris M. Kitani
We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.