Search Results for author: Yufei Zha

Found 12 papers, 3 papers with code

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

no code implementations • 2 Mar 2024 • Junwen Xiong, Peng Zhang, Tao You, Chuanyue Li, Wei Huang, Yufei Zha

Audio-visual saliency prediction can draw support from diverse modality complements, but further performance enhancement is still challenged by customized architectures as well as task-specific loss functions.

Denoising Saliency Prediction

UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection

no code implementations • 15 Sep 2023 • Junwen Xiong, Peng Zhang, Chuanyue Li, Wei Huang, Yufei Zha, Tao You

While many approaches have crafted task-specific training paradigms for either video saliency prediction or video salient object detection, little attention has been devoted to devising a generalized saliency modeling framework that seamlessly bridges these two distinct tasks.

object-detection Saliency Prediction +3

Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization

1 code implementation • 9 Aug 2023 • Tianyu Liu, Peng Zhang, Wei Huang, Yufei Zha, Tao You, Yanning Zhang

By decoupling the gradients of visual and audio modalities, the discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also enables the audio modality to be aligned with the visual modality consistently.

Contrastive Learning

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

no code implementations • 8 Jul 2023 • Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, Yufei Zha

DeepFake-based digital facial forgery is threatening public media security, especially when lip manipulation is used in talking face generation, which further increases the difficulty of fake video detection.

Face Detection Face Swapping +2

CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective

no code implementations • 11 Mar 2023 • Junwen Xiong, Ganglai Wang, Peng Zhang, Wei Huang, Yufei Zha, Guangtao Zhai

Incorporating the audio stream enables Video Saliency Prediction (VSP) to imitate the selective attention mechanism of the human brain.

Saliency Prediction Video Saliency Prediction

CASP-Net: Rethinking Video Saliency Prediction From an Audio-Visual Consistency Perceptual Perspective

no code implementations • CVPR 2023 • Junwen Xiong, Ganglai Wang, Peng Zhang, Wei Huang, Yufei Zha, Guangtao Zhai

Incorporating the audio stream enables Video Saliency Prediction (VSP) to imitate the selective attention mechanism of the human brain.

Saliency Prediction Video Saliency Prediction

An Audio-Visual Attention Based Multimodal Network for Fake Talking Face Videos Detection

no code implementations • 10 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

DeepFake-based digital facial forgery is threatening public media security, especially when lip manipulation is used in talking face generation, which further increases the difficulty of fake video detection.

Decision Making Face Detection +2

Attention-Based Lip Audio-Visual Synthesis for Talking Face Generation in the Wild

no code implementations • 8 Mar 2022 • Ganglai Wang, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Rather than focusing on unimportant regions of the face image, the proposed AttnWav2Lip model pays more attention to reconstructing the lip region.

Talking Face Generation

Audio-visual speech separation based on joint feature representation with cross-modal attention

no code implementations • 5 Mar 2022 • Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang

Multi-modal speech separation has exhibited a specific advantage in isolating the target character in noisy multi-talker environments.

Optical Flow Estimation Speech Separation

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

1 code implementation • 4 Mar 2022 • Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding.

Multi-Task Learning Speech Enhancement

Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking

1 code implementation • 31 Jul 2021 • Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang

To solve this problem, we propose to distill representations of the TIR modality from the RGB modality with Cross-Modal Distillation (CMD) on a large amount of unlabeled paired RGB-TIR data.

Transfer Learning

Push for Quantization: Deep Fisher Hashing

no code implementations • 31 Aug 2019 • Yunqiang Li, Wenjie Pei, Yufei Zha, Jan van Gemert

In this paper we push for quantization: We optimize maximum class separability in the binary space.

Quantization Semantic Similarity +1
