Search Results for author: Liejun Wang

Found 16 papers, 4 papers with code

End-to-End Speech Recognition in News Field based on Conformer

no code implementations CCL 2021 Jimin Zhang, Kerekadeer Zao, Yunfei Shen, Shanwumaier Ai, Liejun Wang

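“Most open-source Chinese speech recognition datasets currently target the general domain, and an open-source speech recognition corpus for the news domain is lacking. This paper therefore builds CHNEWSASR, a Chinese speech recognition dataset for the news domain, and validates it with the RNN, Transformer, and Conformer models of the ESPNET-0.9.6 framework. Experiments show that the best model reaches a CER of 4.8% and an SER of 39.4% on the corpus. Because Xinwen Lianbo news anchors speak relatively fast, the average transcript length in the dataset is 28 characters, twice that of the Aishell1 dataset. In addition, training objectives in previous work are usually defined at the character or word level and lack an explicit sentence-level relationship. This paper therefore proposes a sentence-level consistency module, combined with the Conformer model, that directly reduces the discrepancy between the representations of the source speech and the target text. On the open-source Aishell1 dataset it lowers CER by 0.4% and SER by 2%; on the CHNEWSASR dataset it lowers CER by 0.9% and SER by 3%. The results show that the method effectively improves speech recognition quality without increasing the number of model parameters.”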

Speech Recognition

AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis

no code implementations 12 Apr 2025 Yubing Cao, Yinfeng Yu, Yongming Li, Liejun Wang

This paper presents AMNet, an Acoustic Model Network designed to improve the performance of Mandarin speech synthesis by incorporating phrase structure annotation and local convolution modules.

Speech Synthesis

Leveraging Label Potential for Enhanced Multimodal Emotion Recognition

no code implementations 7 Apr 2025 Xuechun Shao, Yinfeng Yu, Liejun Wang

We introduce a novel model called Label Signal-Guided Multimodal Emotion Recognition (LSGMER) to overcome this limitation.

Multimodal Emotion Recognition

Magnitude-Phase Dual-Path Speech Enhancement Network based on Self-Supervised Embedding and Perceptual Contrast Stretch Boosting

no code implementations 27 Mar 2025 Alimjan Mattursun, Liejun Wang, Yinfeng Yu, Chunyang Ma

Speech self-supervised learning (SSL) has made great progress in various speech processing tasks, but there is still room for improvement in speech enhancement (SE).

Self-Supervised Learning Speech Enhancement

Modality-Invariant Bidirectional Temporal Representation Distillation Network for Missing Multimodal Sentiment Analysis

no code implementations 7 Jan 2025 Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

Multimodal Sentiment Analysis (MSA) integrates diverse modalities (text, audio, and video) to comprehensively analyze and understand individuals' emotional states.

Multimodal Sentiment Analysis Representation Learning

One Small and One Large for Document-level Event Argument Extraction

1 code implementation 8 Nov 2024 Jiaren Peng, Hongda Sun, Wenzhong Yang, Fuyuan Wei, Liang He, Liejun Wang

The first method introduces the Co and Structure Event Argument Extraction model (CsEAE) based on Small Language Models (SLMs).

Event Argument Extraction

VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

no code implementations 13 Aug 2024 Yubing Cao, Yongming Li, Liejun Wang, Yinfeng Yu

Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained.

Speech Synthesis

Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

no code implementations 13 Aug 2024 Tao Zheng, Liejun Wang, Yinfeng Yu

Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research.

Self-Supervised Learning Speech Enhancement

PCQ: Emotion Recognition in Speech via Progressive Channel Querying

no code implementations 17 Jul 2024 Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions.

Speech Emotion Recognition

Multi-branch CNN and grouping cascade attention for medical image classification

no code implementations Sci Rep 14, 15013 2024 Shiwei Liu, Wenwen Yue, Zhiqing Guo, Liejun Wang

In addition, we propose an efficient CNN (EC) module to enhance the ability of the model and extract the local detail information in medical images.

Image Classification Medical Image Analysis

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

no code implementations 21 Apr 2024 Xinxin Jiao, Liejun Wang, Yinfeng Yu

This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio.

Speech Emotion Recognition

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

3 code implementations 16 Apr 2024 Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

Pay Self-Attention to Audio-Visual Navigation

no code implementations 4 Oct 2022 Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang

Audio-visual embodied navigation, a hot research topic, aims to train a robot to reach an audio target using egocentric visual input (from sensors mounted on the robot) and audio input (emitted from the target).

Visual Navigation

Revenue and Energy Efficiency-Driven Delay Constrained Computing Task Offloading and Resource Allocation in a Vehicular Edge Computing Network: A Deep Reinforcement Learning Approach

no code implementations 16 Oct 2020 Xinyu Huang, Lijun He, Xing Chen, Liejun Wang, Fan Li

In this paper, we propose a joint task type and vehicle speed-aware task offloading and resource allocation strategy to decrease the vehicle's energy cost for executing tasks and increase the revenue of the vehicle for processing tasks within the delay constraint.

Deep Reinforcement Learning Edge-computing
