This paper presents our system for the Multi-Task Learning (MTL) Challenge in the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder.
In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, with a total of 9,082 turns and 24,449 utterances.
Multimodal emotion recognition research is hindered by the limited scale and diversity of labelled corpora, owing to high annotation cost and label ambiguity.
Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora.
However, in real-world applications, we often encounter the problem of missing modalities, and it is uncertain which modalities will be missing.
Emotion recognition in conversation (ERC) is a crucial component in affective dialogue systems, which helps the system understand users' emotions and generate empathetic responses.
2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.
Obtaining training data for multi-document summarization (MDS) is time consuming and resource-intensive, so recent neural models can only be trained for limited domains.