Search Results for author: Li-Rong Dai

Found 33 papers, 8 papers with code

A Tree-Structured Decoder for Image-to-Markup Generation

1 code implementation ICML 2020 Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Li-Rong Dai

Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup generation.

Math

Adaptive Confidence Multi-View Hashing for Multimedia Retrieval

1 code implementation 12 Dec 2023 Jian Zhu, Yu Cui, Zhangmin Huang, Xingyu Li, Lei Liu, Lingfang Zeng, Li-Rong Dai

Furthermore, an adaptive confidence multi-view network is employed to measure the confidence of each view and then fuse multi-view features through a weighted summation.

Retrieval
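The confidence-weighted fusion described above can be sketched in a few lines of NumPy (a toy illustration; the scalar confidence scores here are hypothetical stand-ins for the outputs of the learned confidence network):

```python
import numpy as np

def fuse_views(views, conf_scores):
    """Fuse per-view feature vectors by confidence-weighted summation.

    views: list of (d,) feature arrays, one per view (e.g. image, text).
    conf_scores: raw confidence scores, one per view -- illustrative
    stand-ins for the adaptive confidence network's outputs.
    """
    w = np.exp(conf_scores - np.max(conf_scores))  # softmax-normalize
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, views))

image_feat = np.array([1.0, 0.0])
text_feat = np.array([0.0, 1.0])
# Higher confidence on the image view pulls the fused feature toward it.
fused = fuse_views([image_feat, text_feat], np.array([2.0, 0.0]))
```

Because the weights are softmax-normalized, the fused feature stays on the same scale as the individual views regardless of how many views are combined.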

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations 21 May 2023 Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability, resulting in better recognition performance.

Automatic Speech Recognition speech-recognition +1

Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection

no code implementations 20 May 2023 Xiao-Min Zeng, Yan Song, Zhu Zhuo, Yu Zhou, Yu-Hong Li, Hui Xue, Li-Rong Dai, Ian McLoughlin

In this paper, we propose a joint generative and contrastive representation learning method (GeCo) for anomalous sound detection (ASD).

Contrastive Learning Representation Learning

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

no code implementations 7 Mar 2023 Kang Li, Yan Song, Li-Rong Dai, Ian McLoughlin, Xin Fang, Lin Liu

In this paper, we propose an effective sound event detection (SED) method based on the audio spectrogram transformer (AST) model, pretrained on the large-scale AudioSet for audio tagging (AT) task, termed AST-SED.

Audio Tagging Event Detection +1

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

1 code implementation 27 Oct 2022 Qiu-Shi Zhu, Long Zhou, Jie Zhang, Shu-Jie Liu, Yu-Chen Hu, Li-Rong Dai

Self-supervised pre-training methods based on contrastive learning or regression tasks can utilize more unlabeled data to improve the performance of automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

no code implementations 26 May 2022 Qiu-Shi Zhu, Jie Zhang, Zi-Qiang Zhang, Li-Rong Dai

Speech enhancement (SE) is usually required as a front end to improve the speech quality in noisy environments, while the enhanced speech might not be optimal for automatic speech recognition (ASR) systems due to speech distortion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

no code implementations 5 Apr 2022 Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang

Unpaired data has been shown to be beneficial for low-resource automatic speech recognition (ASR), where it can be involved in the design of hybrid models with multi-task training or language-model-dependent pre-training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

no code implementations 15 Feb 2022 Zi-Qiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Li-Rong Dai

The proposed approach explores both the complementarity of audio-visual modalities and long-term context dependency using a transformer-based fusion module and a flexible masking strategy.

Audio-Visual Speech Recognition Lipreading +4

Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals

no code implementations 22 Jan 2022 Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai

By training the network on each type of acoustic signal, we build individual models for the three tasks; their parameters are averaged to obtain an average model, which is then used as the initialization for the BiLSTM model training of each task.
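The parameter-averaging initialization described above can be sketched as follows (a minimal NumPy illustration, assuming checkpoints are simplified to plain name-to-array dicts):

```python
import numpy as np

def average_models(state_dicts):
    """Average the parameters of several task-specific models.

    state_dicts: list of name -> array dicts, a simplified stand-in
    for real model checkpoints; all must share the same keys/shapes.
    """
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Toy per-task models for the breathing / cough / speech signals.
breath = {"w": np.array([1.0, 2.0])}
cough  = {"w": np.array([3.0, 4.0])}
speech = {"w": np.array([5.0, 6.0])}

# The averaged parameters then initialize each task's BiLSTM training.
init = average_models([breath, cough, speech])
```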

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition

no code implementations 15 Mar 2021 Zi-Qiang Zhang, Yan Song, Ming-Hui Wu, Xin Fang, Li-Rong Dai

In this paper, we propose a weakly supervised multilingual representation learning framework, called cross-lingual self-training (XLST).

Data Augmentation Representation Learning +2

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

no code implementations 28 Dec 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin

In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.

Lip Reading

Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement

no code implementations 21 Sep 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee

We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).

Speech Enhancement

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

no code implementations 3 Sep 2020 Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai

In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

1 code implementation 25 Jun 2019 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion is achieved by preserving the linguistic representations of source utterances while replacing the speaker representations with the target ones.

Audio and Speech Processing Sound
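One way to picture the conversion step described above, under the loose assumption that the encoders and decoder can be reduced to toy functions, is the following sketch (all module names and shapes here are illustrative, not the paper's actual networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the learned modules.
def linguistic_encoder(frames):
    return frames[:, :2]                 # frame-level content representation

def speaker_encoder(frames):
    return frames.mean(axis=0)           # utterance-level speaker embedding

def decoder(linguistic, speaker):
    return linguistic + speaker[:2]      # converted acoustic features

src = rng.normal(size=(5, 4))            # source-speaker utterance frames
tgt = rng.normal(size=(7, 4))            # target-speaker utterance frames

# Keep the source's linguistic representation, swap in the target speaker.
converted = decoder(linguistic_encoder(src), speaker_encoder(tgt))
```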

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

no code implementations 21 Jun 2019 Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).

Singing Voice Synthesis

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

no code implementations 28 Mar 2019 Lanhua You, Wu Guo, Li-Rong Dai, Jun Du

In this paper, gating mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification.

Text-Independent Speaker Verification

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

no code implementations 18 Jul 2018 Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.

Acoustic Modelling Speech Synthesis
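The core forward-attention recursion constrains the alignment at each decoder step to either stay at the same encoder position or advance by one. A simplified NumPy sketch of one step (the transition-agent extension from the paper is omitted):

```python
import numpy as np

def forward_attention_step(alpha_prev, y_t):
    """One step of the forward-attention recursion.

    alpha_prev: (N,) normalized forward variables from decoder step t-1.
    y_t: (N,) attention probabilities over encoder positions at step t.
    The alignment can only stay put or move forward by one position.
    """
    shifted = np.concatenate(([0.0], alpha_prev[:-1]))   # alpha_{t-1}(n-1)
    alpha = (alpha_prev + shifted) * y_t
    return alpha / alpha.sum()

# Start fully aligned to position 0, then feed a flat attention distribution:
# the probability mass can only spread to positions 0 and 1.
alpha0 = np.array([1.0, 0.0, 0.0, 0.0])
alpha1 = forward_attention_step(alpha0, np.full(4, 0.25))
```

The monotonic constraint is what makes this attractive for TTS: alignments cannot jump backward or skip ahead, which stabilizes synthesis of long utterances.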

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

1 code implementation 4 Mar 2018 Shiliang Zhang, Ming Lei, Zhijie Yan, Li-Rong Dai

In a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN achieves a more than 20% relative improvement over the LFR-trained BLSTM.

Language Modelling speech-recognition +1

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition

no code implementations 22 Jan 2018 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

The RNN decoder aims at generating the caption by detecting radicals and spatial structures through an attention model.

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

2 code implementations 5 Jan 2018 Jianshu Zhang, Jun Du, Li-Rong Dai

Handwritten mathematical expression recognition is a challenging problem due to the complicated two-dimensional structures, ambiguous handwriting input and variant scales of handwritten math symbols.

Math

A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition

1 code implementation 4 Dec 2017 Jianshu Zhang, Jun Du, Li-Rong Dai

In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER).

Radical analysis network for zero-shot learning in printed Chinese character recognition

no code implementations 3 Nov 2017 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

Chinese characters form a huge set of categories, more than 20,000, and the number is still increasing as novel characters continue to be created.

Zero-Shot Learning

Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

no code implementations 21 Mar 2017 Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee

We propose a multi-objective framework that learns both the primary target, the clean log-power spectra (LPS) features used directly to construct the enhanced speech signals, and secondary targets not directly related to the intended speech enhancement (SE) task.

Sound

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering

no code implementations 14 Mar 2017 Junbei Zhang, Xiaodan Zhu, Qian Chen, Li-Rong Dai, Si Wei, Hui Jiang

The last several years have seen intensive interest in exploring neural-network-based models for machine comprehension (MC) and question answering (QA).

Question Answering Reading Comprehension

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

no code implementations 28 Dec 2015 Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback.

Language Modelling speech-recognition +3
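The FSMN memory block amounts to a learned tapped-delay line over hidden activations: each output frame is a weighted sum of the current and previous N hidden frames, with no recurrent connection. A scalar-coefficient sketch in NumPy (the tap values here are illustrative, not learned):

```python
import numpy as np

def fsmn_memory(h, taps):
    """Scalar FSMN memory block.

    h: (T, d) hidden activations for a sequence of T frames.
    taps: (N+1,) coefficients a_0..a_N weighting the current frame
    and the N preceding frames.
    """
    T, _ = h.shape
    out = np.zeros_like(h)
    for t in range(T):
        for i, a in enumerate(taps):
            if t - i >= 0:               # frames before the sequence start
                out[t] += a * h[t - i]   # contribute nothing
    return out

h = np.ones((4, 3))
mem = fsmn_memory(h, taps=np.array([0.5, 0.25, 0.25]))
```

Because the memory is a finite feedforward filter rather than a recurrent loop, training needs no backpropagation through time, which is the efficiency argument the FSMN papers make.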

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

no code implementations 9 Oct 2015 ShiLiang Zhang, Hui Jiang, Si Wei, Li-Rong Dai

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback.

Language Modelling

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

1 code implementation 6 May 2015 Shiliang Zhang, Hui Jiang, MingBin Xu, JunFeng Hou, Li-Rong Dai

In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation.
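The FOFE recurrence z_t = alpha * z_(t-1) + e_t, where e_t is the one-hot vector of the t-th word, can be sketched directly (a minimal NumPy illustration with a toy vocabulary):

```python
import numpy as np

def fofe_encode(word_ids, vocab_size, alpha=0.5):
    """Fixed-size ordinally-forgetting encoding of a word sequence.

    Each step decays the running code by the forgetting factor alpha
    and adds the one-hot vector of the current word, so any-length
    sequence maps to a single vocab_size-dimensional vector.
    """
    z = np.zeros(vocab_size)
    for w in word_ids:
        z = alpha * z
        z[w] += 1.0
    return z

# Repeated words accumulate at different decay scales, so word order
# is preserved in the magnitudes.
code = fofe_encode([2, 0, 2], vocab_size=4, alpha=0.5)
```

With alpha <= 0.5 the decay scales cannot alias, which is the intuition behind the paper's claim that the encoding is (almost) unique per sequence.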
