Search Results for author: Jun Du

Found 86 papers, 25 papers with code

Joint Optimization of Communication Enhancement and Location Privacy Protection in RIS-Assisted Underwater Communication System

no code implementations30 Nov 2024 Ziqi Chen, Jun Du, Chunxiao Jiang, Zhu Han

However, in open underwater environments, the location of the source node is highly susceptible to being obtained by eavesdropping nodes through correlation analysis, leading to the issue of location privacy in underwater communication systems, which has been overlooked by many studies.

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

no code implementations23 Nov 2024 Haotian Wang, Yuzhe Weng, Yueyan Li, Zilu Guo, Jun Du, Shutong Niu, Jiefeng Ma, Shan He, Xiaoyan Wu, Qiming Hu, Bing Yin, Cong Liu, Qingfeng Liu

Diffusion models have revolutionized the field of talking head generation, yet still face challenges in expressiveness, controllability, and stability in long-time generation.

Talking Head Generation

MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation

no code implementations21 Nov 2024 Hengyi Hong, Qing Wang, Jun Du, Ruoyu Wei, Mingqi Cai, Xin Fang

We propose a novel output representation that combines the DOA with distance of sound sources by calculating the real Cartesian coordinates to address the newly introduced source distance estimation (SDE) task in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge.

Data Augmentation Sound Event Localization and Detection

DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions

no code implementations11 Nov 2024 Shu-Tong Niu, Jun Du, Ruo-Yu Wang, Gao-Bin Yang, Tian Gao, Jia Pan, Yu Hu

First, we sequentially integrate the NSD and SS modules within a joint training framework, enabling the separation module to leverage speaker time boundaries from the diarization module effectively.

speaker-diarization Speaker Diarization +4

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

1 code implementation19 Oct 2024 Yuzhe Weng, Haotian Wang, Tian Gao, Kewei Li, Shutong Niu, Jun Du

In multimodal sentiment analysis, collecting text data is often more challenging than video or audio due to higher annotation costs and inconsistent automatic speech recognition (ASR) quality.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

1 code implementation17 Oct 2024 Hanbo Cheng, Limin Lin, Chenyu Liu, Pengcheng Xia, Pengfei Hu, Jiefeng Ma, Jun Du, Jia Pan

To address these challenges, we present DAWN (Dynamic frame Avatar With Non-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences.

Talking Head Generation Video Generation

The USTC-NERCSLIP Systems for the CHiME-8 MMCSG Challenge

no code implementations8 Oct 2024 Ya Jiang, Hongbo Lan, Jun Du, Qing Wang, Shutong Niu

In the two-person conversation scenario with one wearing smart glasses, transcribing and displaying the speaker's content in real-time is an intriguing application, providing a priori information for subsequent tasks such as translation and comprehension.

speech-recognition Speech Recognition

See then Tell: Enhancing Key Information Extraction with Vision Grounding

no code implementations29 Sep 2024 Shuhang Liu, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Qing Wang, Jianshu Zhang, Chenyu Liu

Positioned at the outset of the answer text, the <see> token allows the model to first see--observing the regions of the image related to the input question--and then tell--providing articulated textual responses.

Image to text Key Information Extraction +4

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings

no code implementations25 Sep 2024 Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan

This paper proposes a three-stage modular system to enhance single-channel neural speaker diarization systems and recognition performance by utilizing spatial cues from multi-channel speech to provide more accurate initialization for each stage of neural speaker diarization (NSD) decoding: (1) Overlap detection and continuous speech separation (CSS) on multi-channel speech are used to obtain cleaner single speaker speech segments for clustering, followed by the first NSD decoding pass.

Clustering speaker-diarization +2

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

no code implementations9 Sep 2024 Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, BinBin Zhang, Bin Jia

The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images

no code implementations24 Aug 2024 Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi

In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance.

NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

no code implementations16 Jul 2024 Chenyu Liu, Jia Pan, Jinshui Hu, BaoCai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu

Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.

Decoder document understanding +1

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

no code implementations13 Jun 2024 Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding.

Relation Prediction

A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

1 code implementation27 May 2024 Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee

In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

1 code implementation24 May 2024 Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions.

Diversity Music Generation +1

SEMv3: A Fast and Robust Approach to Table Separation Line Detection

1 code implementation20 May 2024 Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du

The "split-and-merge" paradigm is a pivotal approach to parsing table structure, where table separation line detection is crucial.

Line Detection

Multitask frame-level learning for few-shot sound event detection

no code implementations17 Mar 2024 Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang

This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples.

Data Augmentation Event Detection +1

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

1 code implementation CVPR 2024 Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee

In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.

Audio-Visual Speech Recognition Knowledge Distillation +2

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

1 code implementation17 Sep 2023 Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.

speaker-diarization Speaker Diarization

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

no code implementations15 Sep 2023 Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.

Audio-Visual Speech Recognition speech-recognition +2

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

no code implementations28 Aug 2023 Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.

speaker-diarization Speaker Diarization +2

Count, Decode and Fetch: A New Approach to Handwritten Chinese Character Error Correction

no code implementations30 Jul 2023 Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Jun Du, Jianshu Zhang

This poses a challenge when dealing with an unseen misspelled character, as the decoder may generate an IDS sequence that matches a seen character instead.

Decoder Transfer Learning

Semi-supervised multi-channel speaker diarization with cross-channel attention

no code implementations17 Jul 2023 Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios.

speaker-diarization Speaker Diarization

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures

1 code implementation24 Mar 2023 Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu

Moreover, we proposed an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem.

Decoder

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

1 code implementation6 Dec 2022 Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Jun Du, Jiajia Wu

Next, to parse the hierarchical relationship between the heading entities, a tree-structured decoder is designed.

Decoder document understanding +2

Gradient and Channel Aware Dynamic Scheduling for Over-the-Air Computation in Federated Edge Learning Systems

no code implementations1 Dec 2022 Jun Du, Bingqing Jiang, Chunxiao Jiang, Yuanming Shi, Zhu Han

To further improve the efficiency of wireless data aggregation and model learning, over-the-air computation (AirComp) is emerging as a promising solution by using the superposition characteristics of wireless channels.

Federated Learning Privacy Preserving +1

Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function

no code implementations26 Oct 2022 Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee

In this paper, we propose a deep learning based multi-speaker direction of arrival (DOA) estimation method with audio and visual signals by using a permutation-free loss function.

Active Speaker Detection Sound Source Localization

Vision-Language Adaptive Mutual Decoder for OOV-STR

no code implementations2 Sep 2022 Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, LiRong Dai

However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poorly on OOV settings.

Decoder Language Modelling +2

Convergence Theory of Generalized Distributed Subgradient Method with Random Quantization

no code implementations22 Jul 2022 Zhaoyue Xia, Jun Du, Yong Ren

Compared with perfect data, quantization poses fundamental challenges on loss of data accuracy, which further impacts the convergence of the algorithms.

Distributed Optimization Quantization

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

no code implementations17 Feb 2022 Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.

Network Pruning

The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge

no code implementations10 Feb 2022 Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee

We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.

Action Detection Activity Detection +2

Underwater Differential Game: Finite-Time Target Hunting Task with Communication Delay

no code implementations1 Feb 2022 Wei Wei, Jingjing Wang, Jun Du, Zhengru Fang, Chunxiao Jiang, Yong Ren

Simulations show that underwater disturbances have a large impact on the system considering communication delay.

Deep Reinforcement Learning

SDN-based Resource Allocation in Edge and Cloud Computing Systems: An Evolutionary Stackelberg Differential Game Approach

no code implementations26 Sep 2021 Jun Du, Chunxiao Jiang, Abderrahim Benslimane, Song Guo, Yong Ren

Based on this dynamic access model, a Stackelberg differential game based cloud computing resource sharing mechanism is proposed to facilitate the resource trading between the cloud computing service provider (CCP) and different edge computing service providers (ECPs).

Cloud Computing Edge-computing +1

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

no code implementations7 Aug 2021 Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection Activity Detection +3

Split, embed and merge: An accurate table structure recognizer

no code implementations12 Jul 2021 Zhenrong Zhang, Jianshu Zhang, Jun Du

However, due to the complexity and diversity in their structure and style, it is very difficult to parse the tabular data into the structured format which machines can understand easily, especially for complex tables.

Table Recognition

Separation Guided Speaker Diarization in Realistic Mismatched Conditions

no code implementations6 Jul 2021 Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee

We propose a separation guided speaker diarization (SGSD) approach by fully utilizing a complementarity of speech separation and speaker clustering.

Clustering speaker-diarization +2

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention

no code implementations28 Dec 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin

In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.

Lip Reading

The Third DIHARD Diarization Challenge

3 code implementations2 Dec 2020 Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.

speaker-diarization Speaker Diarization +1

Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

no code implementations8 Nov 2020 Koen Oostermeijer, Qing Wang, Jun Du

One of the strengths of traditional convolutional neural networks (CNNs) is their inherent translational invariance.

Speech Enhancement

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation3 Nov 2020 Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Acoustic Scene Classification Classification +4

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement

no code implementations25 Oct 2020 Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan

We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming at improving quality of enhanced speech in unseen noise conditions.

regression Speech Enhancement

Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement

no code implementations21 Sep 2020 Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee

We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).

Speech Enhancement

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression

no code implementations12 Aug 2020 Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.

regression Speech Enhancement

Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network Based Vector-to-Vector Regression

no code implementations4 Aug 2020 Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.

Learning Theory regression +2

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

no code implementations31 Jul 2020 Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee

In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.

Acoustic Scene Classification Classification +5

Stroke Constrained Attention Network for Online Handwritten Mathematical Expression Recognition

no code implementations20 Feb 2020 Jia-Ming Wang, Jun Du, Jianshu Zhang

For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and a CNN encoder to extract pixel-level features from input images in offline mode, then uses stroke-constrained information to convert them into online and offline stroke-level features.

Decoder

Joint Architecture and Knowledge Distillation in CNN for Chinese Text Recognition

no code implementations17 Dec 2019 Zi-Rui Wang, Jun Du

Finally, the knowledge distillation with multiple losses is adopted to improve performance of the compact CNN.

Handwritten Chinese Text Recognition Knowledge Distillation

Challenge of Spatial Cognition for Deep Learning

no code implementations30 Jul 2019 Xi Zhang, Xiaolin Wu, Jun Du

Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning.

Deep Learning Spatial Reasoning

Adaptive Period Embedding for Representing Oriented Objects in Aerial Images

no code implementations22 Jun 2019 Yixing Zhu, Xueqing Wu, Jun Du

While almost all previous object detectors for aerial images directly regress the angle of objects, they use complex rules to calculate the angle, and their performance is limited by the rule design.

Ranked #41 on Object Detection In Aerial Images on DOTA (using extra training data)

Object object-detection +1

The Second DIHARD Diarization Challenge: Dataset, task, and baselines

1 code implementation18 Jun 2019 Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Mark Liberman

This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain.

Action Detection Activity Detection +5

Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification

no code implementations28 Mar 2019 Lanhua You, Wu Guo, LiRong Dai, Jun Du

The x-vector based deep neural network (DNN) embedding systems have demonstrated effectiveness for text-independent speaker verification.

Multi-Task Learning Text-Independent Speaker Verification

Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification

no code implementations28 Mar 2019 Lanhua You, Wu Guo, Li-Rong Dai, Jun Du

In this paper, gating mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification.

Text-Independent Speaker Verification

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition

no code implementations15 Jan 2019 Yuanyuan Zhang, Zi-Rui Wang, Jun Du

Although there is no consensus on a definition, human emotional states usually can be apperceived by auditory and visual systems.

Video Emotion Recognition

Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition

1 code implementation24 Dec 2018 Zi-Rui Wang, Jun Du, Jia-Ming Wang

Recently, the hybrid convolutional neural network hidden Markov model (CNN-HMM) has been introduced for offline handwritten Chinese text recognition (HCTR) and has achieved state-of-the-art performance.

Handwritten Chinese Text Recognition Language Modelling

TextMountain: Accurate Scene Text Detection via Instance Segmentation

no code implementations30 Nov 2018 Yixing Zhu, Jun Du

In inference stage, each pixel at the mountain foot needs to search the path to the mountaintop and this process can be efficiently completed in parallel, yielding the efficiency of our method compared with others.

Binary Classification Instance Segmentation +3

Parsimonious HMMs for Offline Handwritten Chinese Text Recognition

no code implementations13 Aug 2018 Wenchao Wang, Jun Du, Zi-Rui Wang

Recently, hidden Markov models (HMMs) have achieved promising results for offline handwritten Chinese text recognition.

Handwritten Chinese Text Recognition

DenseRAN for Offline Handwritten Chinese Character Recognition

no code implementations13 Aug 2018 Wenchao Wang, Jianshu Zhang, Jun Du, Zi-Rui Wang, Yixing Zhu

Recently, great success has been achieved in offline handwritten Chinese character recognition by using deep learning methods.

Decoder Offline Handwritten Chinese Character Recognition

Attention Based Fully Convolutional Network for Speech Emotion Recognition

1 code implementation5 Jun 2018 Yuanyuan Zhang, Jun Du, Zi-Rui Wang, Jianshu Zhang

In this paper, we present a novel attention based fully convolutional network for speech emotion recognition.

Speech Emotion Recognition Transfer Learning

Sliding Line Point Regression for Shape Robust Scene Text Detection

1 code implementation30 Jan 2018 Yixing Zhu, Jun Du

Specifically, we first generate the smallest rectangular box including the text with region proposal network (RPN), then isometrically regress the points on the edge of text by using the vertically and horizontally sliding lines.

Curved Text Detection object-detection +4

Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition

no code implementations22 Jan 2018 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

The RNN decoder aims at generating the caption by detecting radicals and spatial structures through an attention model.

Decoder

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

2 code implementations5 Jan 2018 Jianshu Zhang, Jun Du, Li-Rong Dai

Handwritten mathematical expression recognition is a challenging problem due to the complicated two-dimensional structures, ambiguous handwriting input and variant scales of handwritten math symbols.

Decoder Handwritten Mathematical Expression Recognition +1

A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition

1 code implementation4 Dec 2017 Jianshu Zhang, Jun Du, Li-Rong Dai

In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER).

Decoder

Radical analysis network for zero-shot learning in printed Chinese character recognition

no code implementations3 Nov 2017 Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai

Chinese characters have a huge set of character categories, more than 20,000, and the number is still increasing as more and more novel characters continue to be created.

Decoder Zero-Shot Learning

Watch, attend and parse: An end-to-end neural network based approach to handwritten mathematical expression recognition

1 code implementation Pattern Recognition 2017 Jianshu Zhang, Jun Du, Shiliang Zhang, Dan Liu, Yulong Hu, Jinshui Hu, Si Wei, LiRong Dai

We employ a convolutional neural network encoder that takes HME images as input as the watcher and employ a recurrent neural network decoder equipped with an attention mechanism as the parser to generate LaTeX sequences.

Decoder Handwritten Mathematical Expression Recognition

Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

no code implementations21 Mar 2017 Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee

We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals.

Sound
