Search Results for author: Shengcao Cao

Found 12 papers, 4 papers with code

Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision

no code implementations · 10 Oct 2024 · Shengcao Cao, Liang-Yan Gui, Yu-Xiong Wang

Contrary to the common practice of fine-tuning LMMs with additional grounding supervision, we find that grounding ability can in fact emerge in LMMs trained without explicit grounding supervision.

Question Answering, Visual Question Answering

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

no code implementations · 18 Apr 2024 · Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks.

Segmentation

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

no code implementations · CVPR 2024 · Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

The limited scale of current 3D shape datasets hinders advances in 3D shape understanding and motivates multi-modal learning approaches that transfer knowledge from the data-abundant 2D image and language modalities to 3D shapes.

3D Shape Representation, Representation Learning +3

HASSOD: Hierarchical Adaptive Self-Supervised Object Detection

1 code implementation · NeurIPS 2023 · Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects.

Object, object-detection +2

Aligning Large Multimodal Models with Factually Augmented RLHF

no code implementations · 25 Sep 2023 · Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, Yikang Shen, Chuang Gan, Liang-Yan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell

Large Multimodal Models (LMMs) are built across modalities, and misalignment between the two modalities can result in "hallucination", generating textual outputs that are not grounded by the multimodal information in context.

Hallucination, Image Captioning +1

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

no code implementations · 17 Aug 2023 · Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yu-Xiong Wang, Liang-Yan Gui

To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt.

Edge-computing, Instance Segmentation +5
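
As a rough illustration of the sequence-of-teachers idea summarized in this entry, the sketch below distills a student against several teachers in turn so it adapts gradually rather than matching the strongest teacher immediately. The toy linear models, temperature, loss weighting, and step counts are assumptions for illustration only, not the paper's training recipe or detection losses.

```python
# Hedged sketch of progressive multi-teacher distillation with toy classifiers.
# All hyperparameters and model sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    # Soft-target loss (KL between softened distributions) plus hard-label loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Teachers ordered from closest-to-student to most accurate (stand-in models).
teachers = [nn.Linear(32, 10), nn.Linear(32, 10), nn.Linear(32, 10)]
student = nn.Linear(32, 10)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(64, 32)          # dummy inputs
y = torch.randint(0, 10, (64,))  # dummy labels

for teacher in teachers:         # one distillation stage per teacher in the sequence
    teacher.eval()
    for _ in range(10):          # a few steps per stage (toy schedule)
        with torch.no_grad():
            t_logits = teacher(x)
        loss = kd_loss(student(x), t_logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```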

Contrastive Mean Teacher for Domain Adaptive Object Detectors

1 code implementation · CVPR 2023 · Shengcao Cao, Dhiraj Joshi, Liang-Yan Gui, Yu-Xiong Wang

Object detectors often suffer from the domain gap between training (source domain) and real-world applications (target domain).

Contrastive Learning, Object +4
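
The entry names a mean-teacher design combined with contrastive learning; the sketch below shows only those two generic ingredients, an exponential-moving-average (EMA) teacher and an InfoNCE-style loss between matched student/teacher features. The EMA rate, the stand-in feature extractor, and the loss form are assumptions for illustration and are not the paper's actual objective or detection pipeline.

```python
# Hedged sketch: EMA teacher update plus a batch-wise InfoNCE contrastive loss.
# Stand-in modules and hyperparameters; not the paper's exact formulation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(128, 64)        # stand-in feature extractor
teacher = copy.deepcopy(student)    # teacher starts as a copy of the student
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(student_model, teacher_model, momentum=0.999):
    # teacher <- momentum * teacher + (1 - momentum) * student
    with torch.no_grad():
        for t, s in zip(teacher_model.parameters(), student_model.parameters()):
            t.mul_(momentum).add_(s, alpha=1 - momentum)

def info_nce(student_feats, teacher_feats, temperature=0.07):
    # Same-index pairs are positives; all other pairs in the batch are negatives.
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.T / temperature
    targets = torch.arange(s.size(0))
    return F.cross_entropy(logits, targets)

x_target = torch.randn(32, 128)     # unlabeled target-domain batch (dummy data)
loss = info_nce(student(x_target), teacher(x_target))
loss.backward()
ema_update(student, teacher)
```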

Neighborhood-Aware Neural Architecture Search

no code implementations · 13 May 2021 · Xiaofang Wang, Shengcao Cao, Mengtian Li, Kris M. Kitani

To facilitate the application to gradient-based algorithms, we also propose a differentiable representation for the neighborhood of architectures.

Neural Architecture Search

Efficient Model Performance Estimation via Feature Histories

no code implementations · 7 Mar 2021 · Shengcao Cao, Xiaofang Wang, Kris Kitani

Using a sampling-based search algorithm and parallel computing, our method finds an architecture better than the one found by DARTS, with an 80% reduction in wall-clock search time.

Image Classification, Neural Architecture Search

Rethinking Transformer-based Set Prediction for Object Detection

1 code implementation · ICCV 2021 · Zhiqing Sun, Shengcao Cao, Yiming Yang, Kris Kitani

DETR is a recently proposed Transformer-based method that views object detection as a set prediction problem and achieves state-of-the-art performance, but it demands extra-long training time to converge.

Object, object-detection +1
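
The set-prediction view mentioned in this entry rests on matching predictions to ground-truth objects one-to-one via bipartite assignment, as popularized by DETR. The sketch below illustrates that matching step only; the cost terms, their weights, and the toy data are assumptions, not this paper's formulation.

```python
# Hedged sketch of bipartite (Hungarian) matching between predictions and
# ground-truth objects, the core of the set-prediction view of detection.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes,
                      cls_weight=1.0, box_weight=5.0):
    """Return (pred_idx, gt_idx) minimizing a classification + L1 box cost."""
    # Classification cost: negative predicted probability of the ground-truth class.
    cls_cost = -pred_probs[:, gt_labels]                      # (num_pred, num_gt)
    # Box cost: pairwise L1 distance between predicted and ground-truth boxes.
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    cost = cls_weight * cls_cost + box_weight * box_cost
    return linear_sum_assignment(cost)                        # Hungarian matching

# Toy example: 5 predictions (3 classes, cx/cy/w/h boxes) matched to 2 objects.
rng = np.random.default_rng(0)
pred_probs = rng.dirichlet(np.ones(3), size=5)
pred_boxes = rng.random((5, 4))
gt_labels = np.array([0, 2])
gt_boxes = rng.random((2, 4))
pred_idx, gt_idx = match_predictions(pred_probs, pred_boxes, gt_labels, gt_boxes)
print(list(zip(pred_idx.tolist(), gt_idx.tolist())))
```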

Learnable Embedding Space for Efficient Neural Architecture Compression

2 code implementations · ICLR 2019 · Shengcao Cao, Xiaofang Wang, Kris M. Kitani

We also demonstrate that the learned embedding space can be transferred to new settings for architecture search, such as a larger teacher network or a teacher network in a different architecture family, without any training.

Bayesian Optimization, Neural Architecture Search +1
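
The tags above point to Bayesian optimization over an embedding space of architectures; the sketch below shows that generic loop, a Gaussian process predicting a score from embeddings and an expected-improvement acquisition choosing the next candidate. The random embeddings and synthetic scoring function are stand-ins; in particular, the paper's central idea of learning the embedding space is not reproduced here.

```python
# Hedged sketch of Bayesian optimization over stand-in architecture embeddings.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best_so_far, xi=0.01):
    # Standard EI acquisition for maximization.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_so_far - xi) / sigma
    return (mu - best_so_far - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def true_score(e):
    # Synthetic "accuracy": higher for embeddings near the origin (stand-in only).
    return -np.linalg.norm(e, axis=-1)

rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 16))   # stand-in architecture embeddings

# Start from a few evaluated architectures, then iterate: fit GP, pick by EI, evaluate.
evaluated = list(range(5))
for _ in range(10):
    gp = GaussianProcessRegressor().fit(candidates[evaluated],
                                        true_score(candidates[evaluated]))
    mu, sigma = gp.predict(candidates, return_std=True)
    ei = expected_improvement(mu, sigma,
                              best_so_far=true_score(candidates[evaluated]).max())
    ei[evaluated] = -np.inf               # do not re-evaluate known architectures
    evaluated.append(int(np.argmax(ei)))

print("best stand-in score:", true_score(candidates[evaluated]).max())
```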
