Search Results for author: Li-Rong Dai

Found 33 papers, 8 papers with code

A Tree-Structured Decoder for Image-to-Markup Generation

1 code implementation ICML 2020 Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Li-Rong Dai

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup generation.

Math

Adaptive Confidence Multi-View Hashing for Multimedia Retrieval

1 code implementation 12 Dec 2023 Jian Zhu, Yu Cui, Zhangmin Huang, Xingyu Li, Lei Liu, Lingfang Zeng, Li-Rong Dai

Furthermore, an adaptive confidence multi-view network is employed to measure the confidence of each view and then fuse multi-view features through a weighted summation.

Retrieval
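The confidence-weighted fusion described above can be sketched in a few lines of NumPy (a toy illustration; the scalar confidence scores here are hypothetical stand-ins for the outputs of the learned confidence network):

```python
import numpy as np

def fuse_views(views, conf_scores):
    """Fuse per-view feature vectors by confidence-weighted summation.

    views: list of (d,) feature arrays, one per view (e.g. image, text).
    conf_scores: raw confidence scores, one per view -- illustrative
    stand-ins for the adaptive confidence network's outputs.
    """
    w = np.exp(conf_scores - np.max(conf_scores))  # softmax-normalize
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, views))

image_feat = np.array([1.0, 0.0])
text_feat = np.array([0.0, 1.0])
# Higher confidence on the image view pulls the fused feature toward it.
fused = fuse_views([image_feat, text_feat], np.array([2.0, 0.0]))
```

Because the weights are softmax-normalized, the fused feature stays on the same scale as the individual views regardless of how many views are combined.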

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations 21 May 2023 Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability, resulting in better recognition performance.

Automatic Speech Recognition speech-recognition +1

Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection

no code implementations 20 May 2023 Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin

In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD).

Contrastive Learning Representation Learning

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

no code implementations 7 Mar 2023 Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu

In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED.

Audio Tagging Event Detection +1

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

1 code implementation 27 Oct 2022 Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai

Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

no code implementations 26 May 2022 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

no code implementations 5 Apr 2022 Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang

Unpaired data has been shown to be beneficial for low-resource automatic speech recognition (ASR), where it can be involved in the design of hybrid models with multi-task training or language-model-dependent pre-training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

no code implementations 15 Feb 2022 Zi-Qiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai

The proposed approach explores both the complementarity of audio-visual modalities and long-term context dependency using a transformer-based fusion module and a flexible masking strategy.

Audio-Visual Speech Recognition Lipreading +4

Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals

no code implementations 22 Jan 2022 Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai

By training the network on each type of acoustic signal, we build individual models for the three tasks; their parameters are averaged to obtain an average model, which is then used as the initialization for the BiLSTM model training of each task.
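The parameter-averaging initialization described above can be sketched as follows (a minimal NumPy illustration, assuming checkpoints are simplified to plain name-to-array dicts):

```python
import numpy as np

def average_models(state_dicts):
    """Average the parameters of several task-specific models.

    state_dicts: list of name -> array dicts, a simplified stand-in
    for real model checkpoints; all must share the same keys/shapes.
    """
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Toy per-task models for the breathing / cough / speech signals.
breath = {"w": np.array([1.0, 2.0])}
cough  = {"w": np.array([3.0, 4.0])}
speech = {"w": np.array([5.0, 6.0])}

# The averaged parameters then initialize each task's BiLSTM training.
init = average_models([breath, cough, speech])
```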

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition

no code implementations 15 Mar 2021 Zi-Qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Li-Rong Dai

In this paper, we propose a weakly supervised multilingual representation learning framework, called cross-lingual self-training (XLST).

Data Augmentation Representation Learning +2

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

no code implementations 28 Dec 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin

In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.

Lip Reading

Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement

no code implementations 21 Sep 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee

We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).

Speech Enhancement

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

no code implementations 3 Sep 2020 Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai

In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

1 code implementation 25 Jun 2019 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion is achieved by preserving the linguistic representations of source utterances while replacing the speaker representations with the target ones.

Audio and Speech Processing Sound
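One way to picture the conversion step described above, under the loose assumption that the encoders and decoder can be reduced to toy functions, is the following sketch (all module names and shapes here are illustrative, not the paper's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned modules.
def linguistic_encoder(frames):
    return frames[:, :2]                 # frame-level content representation

def speaker_encoder(frames):
    return frames.mean(axis=0)           # utterance-level speaker embedding

def decoder(linguistic, speaker):
    return linguistic + speaker[:2]      # converted acoustic features

src = rng.normal(size=(5, 4))            # source-speaker utterance frames
tgt = rng.normal(size=(7, 4))            # target-speaker utterance frames

# Keep the source's linguistic representation, swap in the target speaker.
converted = decoder(linguistic_encoder(src), speaker_encoder(tgt))
```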

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

no code implementations 21 Jun 2019 Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).

Singing Voice Synthesis

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

no code implementations 28 Mar 2019 Lanhua You, Wu Guo, Li-Rong Dai, Jun Du

In this paper, gating mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification.

Text-Independent Speaker Verification

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

no code implementations 18 Jul 2018 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.

Acoustic Modelling Speech Synthesis
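The core forward-attention recursion constrains the alignment at each decoder step to either stay at the same encoder position or advance by one. A simplified NumPy sketch of one step (the transition-agent extension from the paper is omitted):

```python
import numpy as np

def forward_attention_step(alpha_prev, y_t):
    """One step of the forward-attention recursion.

    alpha_prev: (N,) normalized forward variables from decoder step t-1.
    y_t: (N,) attention probabilities over encoder positions at step t.
    The alignment can only stay put or move forward by one position.
    """
    shifted = np.concatenate(([0.0], alpha_prev[:-1]))   # alpha_{t-1}(n-1)
    alpha = (alpha_prev + shifted) * y_t
    return alpha / alpha.sum()

# Start fully aligned to position 0, then feed a flat attention distribution:
# the probability mass can only spread to positions 0 and 1.
alpha0 = np.array([1.0, 0.0, 0.0, 0.0])
alpha1 = forward_attention_step(alpha0, np.full(4, 0.25))
```

The monotonic constraint is what makes this attractive for TTS: alignments cannot jump backward or skip ahead, which stabilizes synthesis of long utterances.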

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

1 code implementation 4 Mar 2018 Shiliang Zhang, Ming Lei, Zhijie Yan, Li-Rong Dai

In a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN achieves a more than 20% relative improvement over the LFR-trained BLSTM.

Language Modelling speech-recognition +1

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition

no code implementations 22 Jan 2018 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

The RNN decoder aims at generating the caption by detecting radicals and spatial structures through an attention model.

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

2 code implementations 5 Jan 2018 Jianshu Zhang, Jun Du, Li-Rong Dai

Handwritten mathematical expression recognition is a challenging problem due to the complicated two-dimensional structures, ambiguous handwriting input and variant scales of handwritten math symbols.

Math

A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition

1 code implementation 4 Dec 2017 Jianshu Zhang, Jun Du, Li-Rong Dai

In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER).

Radical analysis network for zero-shot learning in printed Chinese character recognition

no code implementations 3 Nov 2017 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

Chinese characters form a huge set of categories, more than 20,000, and the number is still increasing as novel characters continue to be created.

Zero-Shot Learning

Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

no code implementations 21 Mar 2017 Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee

We propose a multi-objective framework that learns both the primary target, the clean log-power spectra (LPS) features used directly to construct the enhanced speech signals, and secondary targets not directly related to the intended speech enhancement (SE) task.

Sound

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering

no code implementations 14 Mar 2017 Junbei Zhang, Xiaodan Zhu, Qian Chen, Li-Rong Dai, Si Wei, Hui Jiang

The last several years have seen intensive interest in exploring neural-network-based models for machine comprehension (MC) and question answering (QA).

Question Answering Reading Comprehension

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

no code implementations 28 Dec 2015 Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback.

Language Modelling speech-recognition +3
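The FSMN memory block amounts to a learned tapped-delay line over hidden activations: each output frame is a weighted sum of the current and previous N hidden frames, with no recurrent connection. A scalar-coefficient sketch in NumPy (the tap values here are illustrative, not learned):

```python
import numpy as np

def fsmn_memory(h, taps):
    """Scalar FSMN memory block.

    h: (T, d) hidden activations for a sequence of T frames.
    taps: (N+1,) coefficients a_0..a_N weighting the current frame
    and the N preceding frames.
    """
    T, _ = h.shape
    out = np.zeros_like(h)
    for t in range(T):
        for i, a in enumerate(taps):
            if t - i >= 0:               # frames before the sequence start
                out[t] += a * h[t - i]   # contribute nothing
    return out

h = np.ones((4, 3))
mem = fsmn_memory(h, taps=np.array([0.5, 0.25, 0.25]))
```

Because the memory is a finite feedforward filter rather than a recurrent loop, training needs no backpropagation through time, which is the efficiency argument the FSMN papers make.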

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

no code implementations 9 Oct 2015 ShiLiang Zhang, Hui Jiang, Si Wei, Li-Rong Dai

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback.

Language Modelling

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

1 code implementation 6 May 2015 Shiliang Zhang, Hui Jiang, MingBin Xu, JunFeng Hou, Li-Rong Dai

In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation.
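The FOFE recurrence z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word, can be sketched directly (a minimal NumPy illustration with a toy vocabulary):

```python
import numpy as np

def fofe_encode(word_ids, vocab_size, alpha=0.5):
    """Fixed-size ordinally-forgetting encoding of a word sequence.

    Each step decays the running code by the forgetting factor alpha
    and adds the one-hot vector of the current word, so any-length
    sequence maps to a single vocab_size-dimensional vector.
    """
    z = np.zeros(vocab_size)
    for w in word_ids:
        z = alpha * z
        z[w] += 1.0
    return z

# Repeated words accumulate at different decay scales, so word order
# is preserved in the magnitudes.
code = fofe_encode([2, 0, 2], vocab_size=4, alpha=0.5)
```

With alpha <= 0.5 the decay scales cannot alias, which is the intuition behind the paper's claim that the encoding is (almost) unique per sequence.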
