Search Results for author: JianHua Tao

Found 50 papers, 16 papers with code

Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

no code implementations • 22 Mar 2024 • Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, JianHua Tao

Multimodal fusion is a significant method for most multimodal tasks.

Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning

no code implementations • 18 Feb 2024 • Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, JianHua Tao

To address data scarcity, this paper proposes a new data collection pipeline.

Deception Detection

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

1 code implementation • 19 Jan 2024 • Cunhang Fan, Yujie Chen, Jun Xue, Yonghui Kong, JianHua Tao, Zhao Lv

This paper proposes a progressive distillation method based on masked generation features for the KGC task, aiming to significantly reduce the complexity of pre-trained models.

Knowledge Graph Completion Language Modelling +1

HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

1 code implementation • 11 Jan 2024 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao

Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines.

Ranked #3 on Dynamic Facial Expression Recognition on MAFW

Contrastive Learning Dynamic Facial Expression Recognition +3

SVFAP: Self-supervised Video Facial Affect Perceiver

1 code implementation • 31 Dec 2023 • Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, JianHua Tao

Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction.

Ranked #3 on Dynamic Facial Expression Recognition on FERV39k

Dynamic Facial Expression Recognition Emotion Recognition +2

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

1 code implementation • 15 Dec 2023 • Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, JianHua Tao

The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.

Continual Learning DeepFake Detection +3

GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition

1 code implementation • 7 Dec 2023 • Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, JianHua Tao

To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition.

Facial Emotion Recognition Micro Expression Recognition +3

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

no code implementations • 7 Sep 2023 • Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, JianHua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu

Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals.

EEG

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

1 code implementation • 7 Aug 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Chuyuan Zhang

The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.

Continual Learning Speech Emotion Recognition

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

1 code implementation • 5 Jul 2023 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao

Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines.

Ranked #2 on Dynamic Facial Expression Recognition on FERV39k

Dynamic Facial Expression Recognition Facial Expression Recognition

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection

no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu

Self-supervised speech models are a rapidly developing research topic in fake audio detection.

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

no code implementations • 9 Jun 2023 • Haogeng Liu, Tao Wang, Jie Cao, Ran He, JianHua Tao

When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher-quality samples from random noise with fewer iterations.

Denoising Speech Synthesis

Adaptive Fake Audio Detection with Low-Rank Model Squeezing

no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenlong Wang, Le Xu, Ruibo Fu

During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis

no code implementations • 3 May 2023 • Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, JianHua Tao, Jianqing Sun, Jiaen Liang

However, it is still a challenge to comprehensively model the conversation, and a majority of conversational TTS systems only focus on extracting global information and omit local prosody features, which contain important fine-grained information like keywords and emphasis.

Speech Synthesis Text-To-Speech Synthesis

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

3 code implementations • 18 Apr 2023 • Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao

The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.

Multi-Label Learning Multimodal Emotion Recognition

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

no code implementations • 10 Jan 2023 • Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao

Text-to-speech (TTS) and voice conversion (VC) are two different tasks that both aim to generate high-quality speech from different input modalities.

Quantization Voice Conversion

Emotion Selectable End-to-End Text-based Speech Editing

no code implementations • 20 Dec 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen, Chu Yuan Zhang

To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech.

Data Augmentation

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection

1 code implementation • 11 Nov 2022 • Jiangyan Yi, Chenglong Wang, JianHua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu

Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.

Speech Enhancement

IRNet: Iterative Refinement Network for Noisy Partial Label Learning

1 code implementation • 9 Nov 2022 • Zheng Lian, Mingyu Xu, Lan Chen, Licai Sun, Bin Liu, JianHua Tao

In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may not exist in the candidate set.

Data Augmentation Partial Label Learning +1

An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

no code implementations • 6 Oct 2022 • Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, JianHua Tao

Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research.

Speech Synthesis Text-To-Speech Synthesis

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

no code implementations • 21 Aug 2022 • Xinrui Yan, Jiangyan Yi, Chenglong Wang, JianHua Tao, Junzuo Zhou, Hao Gu, Ruibo Fu

The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation.

Face Swapping Speech Synthesis

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

no code implementations • 20 Aug 2022 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

Many effective attempts have been made for fake audio detection.

Fully Automated End-to-End Fake Audio Detection

no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure.

Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis

1 code implementation • 16 Aug 2022 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao

With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.

Multimodal Sentiment Analysis Representation Learning

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

no code implementations • 2 Aug 2022 • Jun Xue, Cunhang Fan, Zhao Lv, JianHua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately.

DeepFake Detection Face Swapping

Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge

no code implementations • 23 Jul 2022 • Haiyang Sun, Zheng Lian, Bin Liu, JianHua Tao, Licai Sun, Cong Cai

In this paper, we propose a solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.

Multi-Task Learning Vocal Bursts Valence Prediction

Adaptive Pseudo-Siamese Policy Network for Temporal Knowledge Prediction

no code implementations • 26 Apr 2022 • Pengpeng Shao, Tong Liu, Feihu Che, Dawei Zhang, JianHua Tao

Specifically, we design the policy network in our model as a pseudo-siamese policy network that consists of two sub-policy networks.

Knowledge Graphs Link Prediction +1

EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition

no code implementations • 25 Mar 2022 • Haiyang Sun, Zheng Lian, Bin Liu, Ying Li, Licai Sun, Cong Cai, JianHua Tao, Meng Wang, Yuan Cheng

Speech emotion recognition (SER) is an important research topic in human-computer interaction.

Neural Architecture Search Vocal Bursts Valence Prediction

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

no code implementations • 5 Mar 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of speech.

GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation

1 code implementation • 4 Mar 2022 • Zheng Lian, Lan Chen, Licai Sun, Bin Liu, JianHua Tao

To this end, we propose a novel framework for incomplete multimodal learning in conversations, called "Graph Complete Network (GCNet)", filling the gap of existing works.

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing

1 code implementation • 21 Feb 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen

It can solve unnatural prosody in the edited region and synthesize the speech corresponding to the unseen words in the transcript.

Few-Shot Learning Sentence

MixKG: Mixing for harder negative samples in knowledge graph

no code implementations • 19 Feb 2022 • Feihu Che, Guohua Yang, Pengpeng Shao, Dawei Zhang, JianHua Tao

The representations of entities and relations are learned via contrasting the positive and negative triplets.

Knowledge Graph Embedding Knowledge Graphs

ADD 2022: the First Audio Deep Synthesis Detection Challenge

no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Audio deepfake detection is an emerging topic, which was included in ASVspoof 2021.

Audio Generation DeepFake Detection +1

Singing-Tacotron: Global duration control attention and dynamic filter for End-to-end singing voice synthesis

no code implementations • 16 Feb 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen

Firstly, we propose a global duration control attention mechanism for the SVS model.

Singing Voice Synthesis

Reducing language context confusion for end-to-end code-switching automatic speech recognition

no code implementations • 28 Jan 2022 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, JianHua Tao, Yu Ting Yeung, Liqun Deng

We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Knowledge graph enhanced recommender system

no code implementations • 17 Dec 2021 • Zepeng Huai, JianHua Tao, Feihu Che, Guohua Yang, Dawei Zhang

This is attributed to the rich attribute information contained in KG to improve item and user representations as side information.

Attribute Knowledge Graphs +1

Multi-Level Graph Contrastive Learning

no code implementations • 6 Jul 2021 • Pengpeng Shao, Tong Liu, Dawei Zhang, JianHua Tao, Feihu Che, Guohua Yang

In this paper, we propose a Multi-Level Graph Contrastive Learning (MLGCL) framework for learning robust representation of graph data by contrasting space views of graphs.

Contrastive Learning Graph Representation Learning +3

Continual Learning for Fake Audio Detection

no code implementations • 15 Apr 2021 • Haoxin Ma, Jiangyan Yi, JianHua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang

However, fine-tuning leads to performance degradation on previous data.

Continual Learning Knowledge Distillation +1

Half-Truth: A Partially Fake Audio Detection Dataset

1 code implementation • 8 Apr 2021 • Jiangyan Yi, Ye Bai, JianHua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu

Therefore, this paper develops such a dataset for half-truth audio detection (HAD).

Speech Synthesis

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

no code implementations • 7 Apr 2021 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

It takes a lot of computation and time to predict the blank tokens, but only the non-blank tokens will appear in the final output sequence.

Position speech-recognition +1

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

1 code implementation • 4 Apr 2021 • Zhengkun Tian, Jiangyan Yi, JianHua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model.

speech-recognition Speech Recognition +1

Fast End-to-End Speech Recognition via Non-Autoregressive Models and Cross-Modal Knowledge Transferring from BERT

no code implementations • 15 Feb 2021 • Ye Bai, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang

Based on this idea, we propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).

Language Modelling Position +3

Tucker decomposition-based Temporal Knowledge Graph Completion

1 code implementation • 16 Nov 2020 • Pengpeng Shao, Guohua Yang, Dawei Zhang, JianHua Tao, Feihu Che, Tong Liu

Developing models for temporal knowledge graph completion is an increasingly important task.

Link Prediction Temporal Knowledge Graph Completion +1

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

no code implementations • 11 Nov 2020 • Cunhang Fan, Bin Liu, JianHua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.

Speech Enhancement

Self-supervised Graph Representation Learning via Bootstrapping

no code implementations • 10 Nov 2020 • Feihu Che, Guohua Yang, Dawei Zhang, JianHua Tao, Pengpeng Shao, Tong Liu

In addition, we summarize three kinds of augmentation methods for graph-structured data and apply them to the DGB.

Graph Representation Learning

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

no code implementations • 9 Nov 2020 • Cunhang Fan, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

Joint training frameworks for speech enhancement and recognition have achieved good performance in robust end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

no code implementations • 28 Oct 2020 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model.

Language Modelling speech-recognition +1

Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition

no code implementations • Interspeech 2020 • Zheng Lian, JianHua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li

Emotion recognition remains a complex task due to speaker variations and low-resource training samples.

Ranked #1 on Speech Emotion Recognition on IEMOCAP (using extra training data)

Multimodal Emotion Recognition Speech Emotion Recognition

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

no code implementations • 28 Oct 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, JianHua Tao, Zhengqi Wen

In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Deep imitator: Handwriting calligraphy imitation via deep attention networks

no code implementations • Pattern Recognition 2020 • Bocheng Zhao, JianHua Tao, Minghao Yang, Zhengkun Tian, Cunhang Fan, Ye Bai

Calligraphy imitation (CI) from a handful of target handwriting samples is such a challenging task that most of the existing writing style analysis or handwriting generation methods do not exhibit satisfactory performance.

Deep Attention Handwriting generation
