2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.
Ranked #1 on Image Retrieval on RUC-CAS-WenLan
3 code implementations • 18 Apr 2023 • Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
1 code implementation • ACL 2022 • Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li
In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.
Cultural Vocal Bursts Intensity Prediction • Emotion Recognition
1 code implementation • ACL 2021 • Jingwen Hu, Yuchen Liu, Jinming Zhao, Qin Jin
Emotion recognition in conversation (ERC) is a crucial component in affective dialogue systems, which helps the system understand users' emotions and generate empathetic responses.
1 code implementation • ACL 2021 • Jinming Zhao, Ruichen Li, Qin Jin
However, in real-world applications, we often encounter the problem of missing modalities, and it is uncertain which modalities will be missing.
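The missing-modality setting described above can be illustrated with a minimal late-fusion sketch: average the feature vectors of whichever modalities are present, and zero-fill when everything is missing. The `fuse` function and the 4-dimensional toy features are illustrative assumptions, not the paper's method.

```python
def fuse(features, dim=4):
    """Late fusion over available modalities.

    `features` maps modality names to feature vectors (lists of floats),
    with None marking a missing modality. Present vectors are averaged
    element-wise; if every modality is missing, a zero vector is returned.
    """
    present = [v for v in features.values() if v is not None]
    if not present:
        return [0.0] * dim                      # all modalities missing
    n = len(present)
    return [sum(col) / n for col in zip(*present)]

# Full input vs. the audio modality missing at test time
full = fuse({"text": [1, 0, 0, 0], "audio": [0, 1, 0, 0]})
partial = fuse({"text": [1, 0, 0, 0], "audio": None})
```

Because the fusion degrades gracefully to whichever modalities remain, the same model can be evaluated under arbitrary missing-modality patterns without retraining the toy fusion step.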
1 code implementation • 17 Jul 2020 • Jinming Zhao, Ming Liu, Longxiang Gao, Yuan Jin, Lan Du, He Zhao, He Zhang, Gholamreza Haffari
Obtaining training data for multi-document summarization (MDS) is time consuming and resource-intensive, so recent neural models can only be trained for limited domains.
1 code implementation • 27 Oct 2022 • Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao, Haizhou Li
Multimodal emotion recognition leverages complementary information across modalities to gain performance.
1 code implementation • 17 Oct 2022 • Tongtong Wu, Guitao Wang, Jinming Zhao, Zhaoran Liu, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
We explore speech relation extraction via two approaches: a pipeline approach that applies text-based extraction to the output of a pretrained ASR module, and an end-to-end approach via a newly proposed encoder-decoder model, which we call SpeechRE.
Automatic Speech Recognition (ASR) +3
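The pipeline approach from the entry above can be sketched in a few lines: transcribe the audio, then run a text-based relation extractor on the transcript. Both components here are stubs for illustration only — `asr_transcribe` simply returns a stored transcript and `extract_relations` is a toy pattern matcher, standing in for a pretrained ASR module and a trained RE model.

```python
import re

def asr_transcribe(audio):
    """Stub standing in for a pretrained ASR module; a real system
    would decode the audio waveform here."""
    return audio["gold_transcript"]

def extract_relations(text):
    """Toy text-based relation extractor: matches 'X works for Y'
    and emits (subject, relation, object) triples."""
    pattern = re.compile(r"(\w+) works for (\w+)")
    return [(subj, "works_for", obj) for subj, obj in pattern.findall(text)]

def pipeline_speech_re(audio):
    """Pipeline speech relation extraction: ASR, then text-based RE."""
    return extract_relations(asr_transcribe(audio))

triples = pipeline_speech_re({"gold_transcript": "Alice works for Acme"})
# triples == [("Alice", "works_for", "Acme")]
```

One known weakness of such pipelines, which motivates the end-to-end alternative, is that ASR transcription errors propagate directly into the extraction step.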
1 code implementation • 19 Jul 2022 • Tenggan Zhang, Chuanhe Liu, Xiaolong Liu, Yuchen Liu, Liyu Meng, Lei Sun, Wenqiang Jiang, Fengyuan Zhang, Jinming Zhao, Qin Jin
This paper presents our system for the Multi-Task Learning (MTL) Challenge in the 4th Affective Behavior Analysis in-the-wild (ABAW) competition.
1 code implementation • 15 Oct 2022 • Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi
Training end-to-end speech translation (ST) systems requires sufficiently large-scale data, which is unavailable for most language pairs and domains.
1 code implementation • EMNLP 2021 • Jinming Zhao, Philip Arthur, Gholamreza Haffari, Trevor Cohn, Ehsan Shareghi
Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora.
1 code implementation • 3 Jul 2022 • Jinming Zhao, Hao Yang, Ehsan Shareghi, Gholamreza Haffari
End-to-end speech-to-text translation models are often initialized with a pre-trained speech encoder and a pre-trained text decoder.
1 code implementation • 28 May 2023 • Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi
Pre-trained speech encoders have been central to pushing state-of-the-art results across various speech understanding and generation tasks.
1 code implementation • 24 Oct 2022 • Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi
Pre-trained speech Transformers have facilitated great success across various speech processing tasks.
no code implementations • 27 Oct 2021 • Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li
Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity.
no code implementations • COLING 2022 • Yuchen Liu, Jinming Zhao, Jingwen Hu, Ruichen Li, Qin Jin
Emotion Recognition in Conversation (ERC) has attracted increasing attention in the affective computing research field.
no code implementations • 16 Oct 2022 • Jinming Zhao, Hao Yang, Gholamreza Haffari, Ehsan Shareghi
Pre-trained speech Transformers in speech translation (ST) have facilitated state-of-the-art (SotA) results; yet, using such encoders is computationally expensive.
no code implementations • 23 Apr 2023 • Jinming Zhao, Yuka Ko, Kosuke Doi, Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
Research has been limited due to the lack of a large-scale training corpus.
no code implementations • 13 Sep 2023 • Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
1 code implementation • 27 Jan 2024 • Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
While text-based event extraction has been an active research area and has seen successful application in many domains, extracting semantic events from speech directly is an under-explored problem.
no code implementations • 20 Apr 2024 • Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari
To address the challenges of catastrophic forgetting and effective disentanglement, we propose a novel method, 'Double Mixture.'