no code implementations • CCL 2021 • Jimin Zhang, Kerekadeer Zao, Yunfei Shen, Shanwumaier Ai, Liejun Wang
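“Most open-source Chinese speech recognition datasets currently target the general domain, and there is no open-source speech recognition corpus for the news domain. This paper therefore builds CHNEWSASR, a Chinese speech recognition dataset for the news domain, and validates it with the RNN, Transformer, and Conformer models of the ESPNET-0.9.6 framework. Experiments show that the best model reaches a CER of 4.8% and an SER of 39.4% on this corpus. Because Xinwen Lianbo news anchors speak relatively fast, the average transcript length in the dataset is 28 characters, twice that of the Aishell1 dataset. In addition, training objectives in prior work are usually defined at the character or word level and lack an explicit sentence-level relation. This paper therefore proposes a sentence-level consistency module that is combined with the Conformer model to directly reduce the representation gap between the source speech and the target text. On the open-source Aishell1 dataset it reduces CER by 0.4% and SER by 2%; on CHNEWSASR it reduces CER by 0.9% and SER by 3%. The results show that the method effectively improves speech recognition quality without increasing the number of model parameters.”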
no code implementations • 12 Apr 2025 • Yubing Cao, Yinfeng Yu, Yongming Li, Liejun Wang
This paper presents AMNet, an Acoustic Model Network designed to improve the performance of Mandarin speech synthesis by incorporating phrase structure annotation and local convolution modules.
no code implementations • 7 Apr 2025 • Xuechun Shao, Yinfeng Yu, Liejun Wang
We introduce a novel model called Label Signal-Guided Multimodal Emotion Recognition (LSGMER) to overcome this limitation.
no code implementations • 27 Mar 2025 • Alimjan Mattursun, Liejun Wang, Yinfeng Yu, Chunyang Ma
Speech self-supervised learning (SSL) has made great progress in various speech processing tasks, but there is still room for improvement in speech enhancement (SE).
no code implementations • 7 Jan 2025 • Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao
Multimodal Sentiment Analysis (MSA) integrates diverse modalities (text, audio, and video) to comprehensively analyze and understand individuals' emotional states.
1 code implementation • 8 Nov 2024 • Jiaren Peng, Hongda Sun, Wenzhong Yang, Fuyuan Wei, Liang He, Liejun Wang
The first method introduces the Co and Structure Event Argument Extraction model (CsEAE) based on Small Language Models (SLMs).
1 code implementation • 10 Sep 2024 • Hui-Yue Yang, Hui Chen, Lihao Liu, Zijia Lin, Kai Chen, Liejun Wang, Jungong Han, Guiguang Ding
By incorporating the RASFormer block, our RAS method achieves superior contextual awareness capabilities, leading to remarkable performance.
Multi-class Anomaly Detection
Unsupervised Anomaly Detection
no code implementations • 13 Aug 2024 • Yubing Cao, Yongming Li, Liejun Wang, Yinfeng Yu
Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained.
no code implementations • 13 Aug 2024 • Tao Zheng, Liejun Wang, Yinfeng Yu
Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research.
1 code implementation • 13 Aug 2024 • Alimjan Mattursun, Liejun Wang, Yinfeng Yu
BSS-CFFMA comprises a multi-scale cross-domain feature fusion (MSCFF) block and a residual hybrid multi-attention (RHMA) block.
no code implementations • 17 Jul 2024 • Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao
In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions.
no code implementations • Sci Rep 14, 15013 2024 • Shiwei Liu, Wenwen Yue, Zhiqing Guo, Liejun Wang
In addition, we propose an efficient CNN (EC) module to enhance the ability of the model and extract the local detail information in medical images.
no code implementations • 21 Apr 2024 • Xinxin Jiao, Liejun Wang, Yinfeng Yu
This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio.
3 code implementations • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi
In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.
no code implementations • 4 Oct 2022 • Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang
Audio-visual embodied navigation, a hot research topic, aims to train a robot to reach an audio target using egocentric visual input (from the sensors mounted on the robot) and audio input (emitted from the target).
no code implementations • 16 Oct 2020 • Xinyu Huang, Lijun He, Xing Chen, Liejun Wang, Fan Li
In this paper, we propose a joint task type and vehicle speed-aware task offloading and resource allocation strategy to decrease the vehicle's energy cost for executing tasks and increase the revenue of the vehicle for processing tasks within the delay constraint.