Search Results for author: Ying Cheng

Found 9 papers, 3 papers with code

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-Visual Violence Detection

1 code implementation • 12 Jul 2022 • Jiashuo Yu, Jinyu Liu, Ying Cheng, Rui Feng, Yuejie Zhang

In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate its negative impact on weakly-supervised audio-visual learning.

Anomaly Detection In Surveillance Videos, audio-visual learning, +1

Learning Music-Dance Representations through Explicit-Implicit Rhythm Synchronization

no code implementations • 7 Jul 2022 • Jiashuo Yu, Junfu Pu, Ying Cheng, Rui Feng, Ying Shan

Although audio-visual representation has been proved to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninvestigated.

Contrastive Learning, Representation Learning, +2

Domain Adaptive Cascade R-CNN for MItosis DOmain Generalization (MIDOG) Challenge

no code implementations • 1 Sep 2021 • Xi Long, Ying Cheng, Xiao Mu, Lian Liu, Jingxin Liu

We present a summary of the domain adaptive cascade R-CNN method for mitosis detection of digital histopathology images.

Data Augmentation, Domain Generalization, +1

MPN: Multimodal Parallel Network for Audio-Visual Event Localization

no code implementations • 7 Apr 2021 • Jiashuo Yu, Ying Cheng, Rui Feng

The localization subnetwork consists of Multimodal Bottleneck Attention Module (MBAM), which is designed to extract fine-grained segment-level contents.

audio-visual event localization, General Classification

Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning

no code implementations • 13 Aug 2020 • Ying Cheng, Ruize Wang, Zhihao Pan, Rui Feng, Yuejie Zhang

When watching videos, the occurrence of a visual event is often accompanied by an audio event, e.g., the voice of lip motion, the music of playing instruments.

Action Recognition, Audio-Visual Synchronization, +1
