Search Results for author: Yuhang Cao

Found 17 papers, 11 papers with code

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models

1 code implementation 22 Feb 2024 Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang

We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.

Hallucination

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

no code implementations 28 Sep 2023 Xiang Lyu, Yuhang Cao, Qing Wang, JingJing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts.

Action Detection Activity Detection +3

DiaCorrect: Error Correction Back-end For Speaker Diarization

1 code implementation 15 Sep 2023 Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.

Automatic Speech Recognition speaker-diarization +3

V3Det: Vast Vocabulary Visual Detection Dataset

no code implementations ICCV 2023 Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin

2) Hierarchical Category Organization: The vast vocabulary of V3Det is organized by a hierarchical category tree which annotates the inclusion relationship among categories, encouraging the exploration of category relationships in vast and open vocabulary object detection.

Chatbot Object +2

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection

no code implementations 6 May 2022 Yuhang Cao, Jiaqi Wang, Yiqi Lin, Dahua Lin

The offline mining mechanism leverages a self-supervised discriminative model to collaboratively mine implicit novel instances with a trained FSOD network.

Few-Shot Object Detection object-detection

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

no code implementations 10 Feb 2022 Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.

Action Detection Activity Detection +2

Few-Shot Object Detection via Association and DIscrimination

1 code implementation NeurIPS 2021 Yuhang Cao, Jiaqi Wang, Ying Jin, Tong Wu, Kai Chen, Ziwei Liu, Dahua Lin

1) In the association step, in contrast to implicitly leveraging multiple base classes, we construct a compact novel class feature space via explicitly imitating a specific base class feature space.

Few-Shot Object Detection Object +3

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection

no code implementations 21 May 2021 Shijie Fang, Yuhang Cao, Xinjiang Wang, Kai Chen, Dahua Lin, Wayne Zhang

The performance of object detection, to a great extent, depends on the availability of large annotated datasets.

object-detection Object Detection +2

Feature Pyramid Grids

1 code implementation 7 Apr 2020 Kai Chen, Yuhang Cao, Chen Change Loy, Dahua Lin, Christoph Feichtenhofer

Feature pyramid networks have been widely adopted in the object detection literature to improve feature representations for better handling of variations in scale.

Neural Architecture Search object-detection +2

Side-Aware Boundary Localization for More Precise Object Detection

3 code implementations ECCV 2020 Jiaqi Wang, Wenwei Zhang, Yuhang Cao, Kai Chen, Jiangmiao Pang, Tao Gong, Jianping Shi, Chen Change Loy, Dahua Lin

To tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket.

Object object-detection +2

Prime Sample Attention in Object Detection

1 code implementation CVPR 2020 Yuhang Cao, Kai Chen, Chen Change Loy, Dahua Lin

Our experiments demonstrate that it is often more effective to focus on prime samples than hard samples when training a detector.

Object object-detection +1
