no code implementations • 18 Mar 2025 • Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, JianHua Tao, Tao Yu
To stimulate the reconstruction of immersive volumetric videos, we introduce ImViD, a multi-view, multi-modal dataset featuring complete space-oriented data capture and various indoor/outdoor scenarios.
no code implementations • 4 Feb 2025 • Jinyang Wu, Mingkuan Feng, Shuai Zhang, Ruihan Jin, Feihu Che, Zengqi Wen, JianHua Tao
Multimodal large language models (MLLMs) exhibit impressive capabilities but still face challenges in complex visual reasoning.
no code implementations • 29 Jan 2025 • Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, JianHua Tao, Feihu Che
Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs.
no code implementations • 12 Jan 2025 • Kaiying Yan, Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, JianHua Tao, Xuefei Liu, Guanjun Li
Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information.
1 code implementation • 16 Dec 2024 • Yujie Chen, Jiangyan Yi, Cunhang Fan, JianHua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang
To address this issue, we propose a continual learning method named Region-Based Optimization (RegO) for audio deepfake detection.
no code implementations • 2 Dec 2024 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Yujie Chen, Hao Gu, Guanjun Li, Junzuo Zhou, Yong Ren, Tao Xu
To address the issues, we propose a novel framework for open set model attribution of deepfake audio with rejection threshold adaptation (ReTA).
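For readers unfamiliar with rejection-based open-set attribution, the sketch below shows the generic decision rule the sentence above alludes to: a prediction is kept only if its confidence clears a threshold, otherwise the sample is flagged as coming from an unknown generation tool. The threshold-adaptation rule that gives ReTA its name is paper-specific and not reproduced here; the function and variable names are illustrative.

```python
# Hedged sketch: threshold-based rejection for open-set attribution.
import numpy as np

def attribute_or_reject(logits, threshold):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    conf, pred = probs.max(), int(probs.argmax())
    # Keep the attribution only when the model is confident enough;
    # otherwise reject the sample as an unknown generation tool.
    return pred if conf >= threshold else -1   # -1 denotes "unknown source"
```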
no code implementations • 27 Nov 2024 • Jinyang Wu, Mingkuan Feng, Shuai Zhang, Feihu Che, Zengqi Wen, JianHua Tao
In-context Learning (ICL) enables large language models (LLMs) to tackle downstream tasks through sophisticated prompting and high-quality demonstrations.
no code implementations • 24 Nov 2024 • Haojie Zhang, Zhihao Liang, Ruibo Fu, Zhengqi Wen, Xuefei Liu, Chenxing Li, JianHua Tao, Yaling Liang
Then we propose a suitable solution according to the modality differences of image, audio, and video generation.
1 code implementation • 15 Oct 2024 • Sheng Yan, Cunhang Fan, Hongyu Zhang, Xiaoke Yang, JianHua Tao, Zhao Lv
To address these issues, this paper proposes a dual attention refinement network with spatiotemporal construction for AAD, named DARNet, which consists of the spatiotemporal construction module, dual attention refinement module, and feature fusion & classifier module.
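A minimal structural sketch of such a three-part pipeline is given below, assuming hypothetical layer sizes and EEG input shapes; the actual DARNet modules differ in their internal design.

```python
# Hedged sketch of a spatiotemporal-construction + dual-attention + fusion pipeline.
import torch
import torch.nn as nn

class DARNetSketch(nn.Module):
    def __init__(self, channels=64, hidden=32, n_classes=2):
        super().__init__()
        # Spatiotemporal construction: mix EEG channels, then model time.
        self.spatial = nn.Conv1d(channels, hidden, kernel_size=1)
        self.temporal = nn.Conv1d(hidden, hidden, kernel_size=7, padding=3)
        # Dual attention refinement: two stacked self-attention branches.
        self.attn_a = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.attn_b = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        # Feature fusion & classifier.
        self.classifier = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, n_classes))

    def forward(self, eeg):                      # eeg: (batch, channels, time)
        x = self.temporal(self.spatial(eeg))     # (batch, hidden, time)
        x = x.transpose(1, 2)                    # (batch, time, hidden)
        a, _ = self.attn_a(x, x, x)              # first refinement pass
        b, _ = self.attn_b(a, a, a)              # second refinement pass
        fused = torch.cat([a.mean(dim=1), b.mean(dim=1)], dim=-1)
        return self.classifier(fused)            # attended-speaker logits
```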
no code implementations • 18 Sep 2024 • Xin Qi, Ruibo Fu, Zhengqi Wen, Tao Wang, Chunyu Qiang, JianHua Tao, Chenxing Li, Yi Lu, Shuchen Shi, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Xuefei Liu, Guanjun Li
In recent years, speech diffusion models have advanced rapidly.
no code implementations • 14 Sep 2024 • Chenxu Xiong, Ruibo Fu, Shuchen Shi, Zhengqi Wen, JianHua Tao, Tao Wang, Chenxing Li, Chunyu Qiang, Yuankun Xie, Xin Qi, Guanjun Li, Zizheng Yang
Additionally, the Sound Event Reference Style Transfer Dataset (SERST) is introduced for the proposed target style audio generation task, enabling dual-prompt audio generation using both text and audio references.
no code implementations • 24 Aug 2024 • Jinyang Wu, Feihu Che, Chuyuan Zhang, JianHua Tao, Shuai Zhang, Pengpeng Shao
Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs).
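As a reminder of the pattern being referred to, here is a minimal retrieve-then-generate loop; `retriever.search` and `llm.generate` are placeholder interfaces, not a real API.

```python
# Hedged sketch of the generic RAG pattern (retrieve supporting passages, then
# condition generation on them to reduce hallucination).
def rag_answer(question, retriever, llm, k=5):
    passages = retriever.search(question, top_k=k)        # fetch supporting evidence
    context = "\n\n".join(p.text for p in passages)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)                           # grounded generation
```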
1 code implementation • 20 Aug 2024 • Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, JianHua Tao, Guanjun Li, Long Ye
Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs.
no code implementations • 11 Aug 2024 • Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, JianHua Tao
For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the speech modality.
Automatic Speech Recognition (ASR) • +5
no code implementations • 9 Aug 2024 • Jiangyan Yi, Chu Yuan Zhang, JianHua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou
The growing prominence of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from fraud and other malicious activities, prompting the need for greater attention and research in this area.
1 code implementation • 17 Jul 2024 • Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, JianHua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li
We believe that MDPE will become a valuable resource for promoting research in the field of affective computing.
no code implementations • 11 Jul 2024 • Siding Zeng, Jiangyan Yi, JianHua Tao, Yujie Chen, Shan Liang, Yong Ren, Xiaohui Zhang
However, they overlook the characteristics of the target domain that are absent in the source domain.
no code implementations • 7 Jul 2024 • Ruibo Fu, Xin Qi, Zhengqi Wen, JianHua Tao, Tao Wang, Chunyu Qiang, Zhiyong Wang, Yi Lu, Xiaopeng Wang, Shuchen Shi, Yukun Liu, Xuefei Liu, Shuai Zhang
The results indicate that the ASRRL method significantly outperforms traditional fine-tuning approaches, achieving higher speaker similarity and better overall speech quality with limited reference speeches.
no code implementations • 2 Jul 2024 • Ruihan Jin, Ruibo Fu, Zhengqi Wen, Shuai Zhang, Yukun Liu, JianHua Tao
To support the research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN).
1 code implementation • 15 Jun 2024 • Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, JianHua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li
Despite advancements in AIGC technologies for text and image generation, Foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation.
no code implementations • 12 Jun 2024 • Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, JianHua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi
To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform.
no code implementations • 5 Jun 2024 • Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonan Cheng, Long Ye, JianHua Tao
For effective OOD detection, we first explore current post-hoc OOD methods and propose NSD, a novel OOD approach in identifying novel deepfake algorithms through the similarity consideration of both feature and logits scores.
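The sketch below illustrates the general idea of combining a feature-space similarity score with a logit-based confidence score for OOD detection; the weighting and the exact scores used by NSD are assumptions here, not the paper's formulation.

```python
# Hedged sketch: OOD scoring from both features and logits.
import numpy as np

def ood_score(feature, logits, class_prototypes, logit_weight=0.1):
    """Lower score -> more likely a novel (unseen) deepfake algorithm."""
    # Feature part: cosine similarity to the nearest known-class prototype.
    f = feature / (np.linalg.norm(feature) + 1e-8)
    protos = class_prototypes / (np.linalg.norm(class_prototypes, axis=1,
                                                keepdims=True) + 1e-8)
    feat_sim = np.max(protos @ f)
    # Logit part: maximum logit as a confidence proxy.
    logit_conf = np.max(logits)
    return feat_sim + logit_weight * logit_conf  # illustrative combination

# Samples whose score falls below a validation-chosen threshold are flagged
# as produced by a novel deepfake algorithm.
```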
no code implementations • 9 May 2024 • Jinyang Wu, Feihu Che, Xinxin Zheng, Shuai Zhang, Ruihan Jin, Shuai Nie, Pengpeng Shao, JianHua Tao
Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents.
1 code implementation • 8 May 2024 • Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, JianHua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods.
2 code implementations • 26 Apr 2024 • Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels.
no code implementations • 24 Apr 2024 • Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, JianHua Tao
Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks.
no code implementations • 22 Mar 2024 • Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, JianHua Tao
Multimodal fusion is a significant method for most multimodal tasks.
no code implementations • 18 Feb 2024 • Kang Chen, Zheng Lian, Haiyang Sun, Rui Liu, Jiangyan Yi, Bin Liu, JianHua Tao
Deception detection has attracted increasing attention due to its importance in real-world scenarios.
1 code implementation • 19 Jan 2024 • Cunhang Fan, Yujie Chen, Jun Xue, Yonghui Kong, JianHua Tao, Zhao Lv
This paper proposes a progressive distillation method based on masked generation features for KGC task, aiming to significantly reduce the complexity of pre-trained models.
1 code implementation • 11 Jan 2024 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao
Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines.
Ranked #7 on Dynamic Facial Expression Recognition on MAFW
Contrastive Learning • Dynamic Facial Expression Recognition • +3
1 code implementation • 31 Dec 2023 • Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, JianHua Tao
Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction.
Ranked #6 on Dynamic Facial Expression Recognition on FERV39k
Dynamic Facial Expression Recognition • Emotion Recognition • +2
1 code implementation • 15 Dec 2023 • Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, JianHua Tao
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms.
1 code implementation • 7 Dec 2023 • Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, JianHua Tao
To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition.
no code implementations • 7 Sep 2023 • Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, JianHua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu
Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to model their graph structure, which also extracts crucial features related to auditory spatial attention.
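A minimal sketch of a graph convolution over EEG channels with a learnable ("dynamical") adjacency matrix is shown below; channel counts, feature sizes, and the normalization choice are assumptions.

```python
# Hedged sketch: graph convolution over EEG channels with a learned adjacency.
import torch
import torch.nn as nn

class DynamicalGraphConv(nn.Module):
    def __init__(self, n_channels=64, in_feats=128, out_feats=64):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(n_channels))   # learned channel graph
        self.lin = nn.Linear(in_feats, out_feats)

    def forward(self, x):                    # x: (batch, n_channels, in_feats)
        A = torch.softmax(self.adj, dim=-1)  # row-normalized adjacency
        x = torch.matmul(A, x)               # propagate features across channels
        return torch.relu(self.lin(x))       # then transform per channel
```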
1 code implementation • 7 Aug 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Chuyuan Zhang
The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.
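For context, the orthogonal weight modification idea mentioned above projects new-task gradients onto the subspace orthogonal to inputs seen on earlier tasks; a schematic recursive-projection sketch follows, with the constant `alpha`, the shapes, and the update details purely illustrative.

```python
# Schematic of the orthogonal weight modification (OWM) idea (not the paper's code).
import numpy as np

class OWMProjector:
    def __init__(self, dim, alpha=1e-3):
        self.alpha = alpha
        self.P = np.eye(dim)            # running projector onto the "free" subspace

    def update(self, a):
        # a: mean layer input from an already-learned task, shape (dim, 1)
        Pa = self.P @ a
        self.P -= (Pa @ Pa.T) / (self.alpha + float(a.T @ Pa))

    def project(self, grad):
        # grad: backprop gradient w.r.t. this layer's weight, shape (out_dim, dim)
        return grad @ self.P.T          # keep only updates orthogonal to old inputs
```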
1 code implementation • 5 Jul 2023 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao
Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines.
Ranked #4 on Dynamic Facial Expression Recognition on FERV39k
Dynamic Facial Expression Recognition • Facial Expression Recognition
no code implementations • 9 Jun 2023 • Haogeng Liu, Tao Wang, Jie Cao, Ran He, JianHua Tao
When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher-quality samples from random noise with fewer iterations.
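The intuition can be made concrete with a few-step Euler sampler along a learned (nearly straight) path from noise to data; `v(x, t)` below is a hypothetical velocity/direction network standing in for the paper's model.

```python
# Hedged sketch: few-step sampling along an (approximately) straight path.
import torch

@torch.no_grad()
def sample(v, shape, steps=4, device="cpu"):
    x = torch.randn(shape, device=device)          # start from random noise
    ts = torch.linspace(0.0, 1.0, steps + 1, device=device)
    for i in range(steps):
        dt = ts[i + 1] - ts[i]
        x = x + v(x, ts[i]) * dt                   # Euler step along the learned path
    return x

# The straighter the learned path, the fewer line segments (steps) are needed
# before sample quality stops degrading.
```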
no code implementations • 9 Jun 2023 • Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, JianHua Tao, Le Xu, Ruibo Fu
Self-supervised speech models are a rapidly developing research topic in fake audio detection.
no code implementations • 8 Jun 2023 • Xiaohui Zhang, Jiangyan Yi, JianHua Tao, Chenglong Wang, Le Xu, Ruibo Fu
During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output.
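Combining adaptation matrices with an existing model at inference time typically amounts to folding a low-rank product into the frozen weight once, so prediction costs no more than the base model. A hedged sketch under that LoRA-style assumption (the names `W0`, `A`, `B` are illustrative, not the paper's notation):

```python
# Hedged sketch: merging low-rank adaptation matrices into a frozen weight.
import numpy as np

def merged_forward(x, W0, A, B, scale=1.0):
    # W0: (d_out, d_in) frozen weight; A: (d_out, r), B: (r, d_in) adaptation pair.
    W = W0 + scale * (A @ B)     # fold the adaptation into the base weight once
    return x @ W.T               # inference then costs the same as the base model
```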
no code implementations • 3 May 2023 • Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, JianHua Tao, Jianqing Sun, Jiaen Liang
However, it is still a challenge to comprehensively model the conversation, and a majority of conversational TTS systems only focus on extracting global information and omit local prosody features, which contain important fine-grained information like keywords and emphasis.
3 code implementations • 18 Apr 2023 • Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
no code implementations • 10 Jan 2023 • Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, JianHua Tao
Text-to-speech (TTS) and voice conversion (VC) are two different tasks, both aiming at generating high-quality speech from different input modalities.
no code implementations • 20 Dec 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen, Chu Yuan Zhang
To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech.
2 code implementations • 11 Nov 2022 • Jiangyan Yi, Chenglong Wang, JianHua Tao, Chu Yuan Zhang, Cunhang Fan, Zhengkun Tian, Haoxin Ma, Ruibo Fu
Benchmark results for scene fake audio detection on the SceneFake dataset are reported in this paper.
1 code implementation • 9 Nov 2022 • Zheng Lian, Mingyu Xu, Lan Chen, Licai Sun, Bin Liu, JianHua Tao
In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may not exist in the candidate set.
no code implementations • 6 Oct 2022 • Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, JianHua Tao
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research.
no code implementations • 21 Aug 2022 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Jie Chen
To address the challenges of attribution of continuously emerging unknown audio generation tools in the real world, we propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA).
no code implementations • 20 Aug 2022 • Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu
Existing fake audio detection systems often rely on expert experience to design acoustic features or to manually tune the hyperparameters of the network structure.
no code implementations • 20 Aug 2022 • Xinrui Yan, Jiangyan Yi, JianHua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu
Many effective attempts have been made for fake audio detection.
1 code implementation • 16 Aug 2022 • Licai Sun, Zheng Lian, Bin Liu, JianHua Tao
With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
no code implementations • 2 Aug 2022 • Jun Xue, Cunhang Fan, Zhao Lv, JianHua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao
Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately.
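A minimal sketch of what such an input pipeline could look like, complex STFT features split into disjoint subbands, is given below; the FFT size, hop length, and subband boundary are assumptions, not the paper's settings.

```python
# Hedged sketch: real/imaginary spectrogram features with a disjoint subband split.
import torch

def complex_subband_features(wave, n_fft=512, hop=160, split_bin=64):
    # wave: (batch, samples)
    spec = torch.stft(wave, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    feats = torch.stack([spec.real, spec.imag], dim=1)   # (batch, 2, freq, time)
    low_band = feats[:, :, :split_bin]    # modeled by one branch
    high_band = feats[:, :, split_bin:]   # modeled separately by another branch
    return low_band, high_band
```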
no code implementations • 23 Jul 2022 • Haiyang Sun, Zheng Lian, Bin Liu, JianHua Tao, Licai Sun, Cong Cai
In this paper, we propose the solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
no code implementations • 26 Apr 2022 • Pengpeng Shao, Tong Liu, Feihu Che, Dawei Zhang, JianHua Tao
Specifically, we design the policy network in our model as a pseudo-siamese policy network that consists of two sub-policy networks.
no code implementations • 25 Mar 2022 • Haiyang Sun, Zheng Lian, Bin Liu, Ying Li, Licai Sun, Cong Cai, JianHua Tao, Meng Wang, Yuan Cheng
Speech emotion recognition (SER) is an important research topic in human-computer interaction.
no code implementations • 5 Mar 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen
We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of speech.
1 code implementation • 4 Mar 2022 • Zheng Lian, Lan Chen, Licai Sun, Bin Liu, JianHua Tao
To this end, we propose a novel framework for incomplete multimodal learning in conversations, called "Graph Complete Network (GCNet)", filling the gap of existing works.
3 code implementations • 21 Feb 2022 • Tao Wang, Jiangyan Yi, Ruibo Fu, JianHua Tao, Zhengqi Wen
It can resolve unnatural prosody in the edited region and synthesize speech corresponding to unseen words in the transcript.
no code implementations • 19 Feb 2022 • Feihu Che, Guohua Yang, Pengpeng Shao, Dawei Zhang, JianHua Tao
The representations of entities and relations are learned via contrasting the positive and negative triplets.
no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Xiaohui Zhang, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu
Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021.
no code implementations • 16 Feb 2022 • Tao Wang, Ruibo Fu, Jiangyan Yi, JianHua Tao, Zhengqi Wen
Firstly, we propose a global duration control attention mechanism for the SVS model.
no code implementations • 28 Jan 2022 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, JianHua Tao, Yu Ting Yeung, Liqun Deng
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory.
Automatic Speech Recognition (ASR) • +2
no code implementations • 17 Dec 2021 • Zepeng Huai, JianHua Tao, Feihu Che, Guohua Yang, Dawei Zhang
This is attributed to the rich attribute information contained in the KG, which serves as side information to improve item and user representations.
no code implementations • 6 Jul 2021 • Pengpeng Shao, Tong Liu, Dawei Zhang, JianHua Tao, Feihu Che, Guohua Yang
In this paper, we propose a Multi-Level Graph Contrastive Learning (MLGCL) framework for learning robust representation of graph data by contrasting space views of graphs.
no code implementations • 15 Apr 2021 • Haoxin Ma, Jiangyan Yi, JianHua Tao, Ye Bai, Zhengkun Tian, Chenglong Wang
However, fine-tuning leads to performance degradation on previous data.
1 code implementation • 8 Apr 2021 • Jiangyan Yi, Ye Bai, JianHua Tao, Haoxin Ma, Zhengkun Tian, Chenglong Wang, Tao Wang, Ruibo Fu
Therefore, this paper develops such a dataset for half-truth audio detection (HAD).
no code implementations • 7 Apr 2021 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen
It takes a lot of computation and time to predict the blank tokens, but only the non-blank tokens will appear in the final output sequence.
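To see why blank predictions are wasted effort, recall that greedy CTC decoding collapses repeats and drops every blank before producing the output; a small illustration:

```python
# Greedy CTC collapse: repeated frame labels are merged and blanks are discarded,
# so compute spent on blank frames never reaches the final output sequence.
def ctc_greedy_collapse(frame_ids, blank_id=0):
    out, prev = [], None
    for t in frame_ids:                 # per-frame argmax token ids
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

print(ctc_greedy_collapse([0, 7, 7, 0, 0, 3, 3, 0]))  # -> [7, 3]
```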
1 code implementation • 4 Apr 2021 • Zhengkun Tian, Jiangyan Yi, JianHua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu
To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model.
no code implementations • 15 Feb 2021 • Ye Bai, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
Based on this idea, we propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
1 code implementation • 16 Nov 2020 • Pengpeng Shao, Guohua Yang, Dawei Zhang, JianHua Tao, Feihu Che, Tong Liu
Developing models for temporal knowledge graph completion is an increasingly important task.
no code implementations • 11 Nov 2020 • Cunhang Fan, Bin Liu, JianHua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
no code implementations • 10 Nov 2020 • Feihu Che, Guohua Yang, Dawei Zhang, JianHua Tao, Pengpeng Shao, Tong Liu
In addition, we summarize three kinds of augmentation methods for graph-structured data and apply them to the DGB.
no code implementations • 9 Nov 2020 • Cunhang Fan, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen
Joint training frameworks for speech enhancement and recognition have achieved good performance for robust end-to-end automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) • +2
no code implementations • 28 Oct 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, JianHua Tao, Zhengqi Wen
In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage.
Automatic Speech Recognition (ASR) • +1
no code implementations • Interspeech 2020 • Zheng Lian, JianHua Tao, Bin Liu, Jian Huang, Zhanlei Yang, Rongjun Li
Emotion recognition remains a complex task due to speaker variations and low-resource training samples.
Ranked #2 on Multimodal Emotion Recognition on IEMOCAP-4 (Accuracy metric)
no code implementations • 28 Oct 2020 • Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen
Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model.
no code implementations • Pattern Recognition 2020 • Bocheng Zhao, JianHua Tao, Minghao Yang, Zhengkun Tian, Cunhang Fan, Ye Bai
Calligraphy imitation (CI) from a handful of target handwriting samples is a challenging task on which most existing writing style analysis and handwriting generation methods fail to achieve satisfactory performance.