1 code implementation • ICML 2020 • Jianshu Zhang, Jun Du, Yongxin Yang, Yi-Zhe Song, Si Wei, Li-Rong Dai
Recent encoder-decoder approaches typically employ string decoders to convert images into serialized strings for image-to-markup.
no code implementations • 30 Nov 2024 • Ziqi Chen, Jun Du, Chunxiao Jiang, Zhu Han
However, in open underwater environments, eavesdropping nodes can readily obtain the location of the source node through correlation analysis. This raises the issue of location privacy in underwater communication systems, which many studies have overlooked.
no code implementations • 23 Nov 2024 • Haotian Wang, Yuzhe Weng, Yueyan Li, Zilu Guo, Jun Du, Shutong Niu, Jiefeng Ma, Shan He, Xiaoyan Wu, Qiming Hu, Bing Yin, Cong Liu, Qingfeng Liu
Diffusion models have revolutionized the field of talking head generation, yet still face challenges in expressiveness, controllability, and stability in long-duration generation.
no code implementations • 21 Nov 2024 • Hengyi Hong, Qing Wang, Jun Du, Ruoyu Wei, Mingqi Cai, Xin Fang
We propose a novel output representation that combines the direction of arrival (DOA) and the distance of sound sources by computing their Cartesian coordinates, to address the source distance estimation (SDE) task newly introduced in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge.
no code implementations • 11 Nov 2024 • Shu-Tong Niu, Jun Du, Ruo-Yu Wang, Gao-Bin Yang, Tian Gao, Jia Pan, Yu Hu
First, we sequentially integrate the neural speaker diarization (NSD) and speech separation (SS) modules within a joint training framework, enabling the separation module to effectively leverage speaker time boundaries from the diarization module.
1 code implementation • 19 Oct 2024 • Yuzhe Weng, Haotian Wang, Tian Gao, Kewei Li, Shutong Niu, Jun Du
In multimodal sentiment analysis, collecting text data is often more challenging than video or audio due to higher annotation costs and inconsistent automatic speech recognition (ASR) quality.
Automatic Speech Recognition (ASR)
1 code implementation • 17 Oct 2024 • Hanbo Cheng, Limin Lin, Chenyu Liu, Pengcheng Xia, Pengfei Hu, Jiefeng Ma, Jun Du, Jia Pan
To address these challenges, we present DAWN (Dynamic frame Avatar With Non-autoregressive diffusion), a framework that enables all-at-once generation of dynamic-length video sequences.
no code implementations • 8 Oct 2024 • Ya Jiang, Hongbo Lan, Jun Du, Qing Wang, Shutong Niu
In a two-person conversation where one participant wears smart glasses, transcribing and displaying the speaker's speech in real time is an intriguing application that provides prior information for subsequent tasks such as translation and comprehension.
no code implementations • 29 Sep 2024 • Shuhang Liu, Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Qing Wang, Jianshu Zhang, Chenyu Liu
Positioned at the start of the answer text, the <see> token allows the model to first see (observe the image regions related to the input question) and then tell (provide an articulated textual response).
no code implementations • 25 Sep 2024 • Ruoyu Wang, Shutong Niu, Gaobin Yang, Jun Du, Shuangqing Qian, Tian Gao, Jia Pan
This paper proposes a three-stage modular system to enhance single-channel neural speaker diarization systems and recognition performance by utilizing spatial cues from multi-channel speech to provide more accurate initialization for each stage of neural speaker diarization (NSD) decoding: (1) Overlap detection and continuous speech separation (CSS) on multi-channel speech are used to obtain cleaner single speaker speech segments for clustering, followed by the first NSD decoding pass.
no code implementations • 18 Sep 2024 • Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang
In recent years, visually-rich document understanding has attracted increasing attention.
no code implementations • 9 Sep 2024 • Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, Lezhi Wang, Lei Xie, Hui Bu, Jiaming Zhou, Yong Qin, Jun Du, Ming Li, BinBin Zhang, Bin Jia
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin.
Automatic Speech Recognition (ASR)
no code implementations • 3 Sep 2024 • Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao
This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge.
no code implementations • 24 Aug 2024 • Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi
In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance.
no code implementations • 16 Jul 2024 • Chenyu Liu, Jia Pan, Jinshui Hu, BaoCai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu
Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
no code implementations • 21 Jun 2024 • Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee
Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in sound event localization and detection (SELD) performance.
no code implementations • 18 Jun 2024 • Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao
Effectively implementing quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology.
no code implementations • 13 Jun 2024 • Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang
Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding.
no code implementations • 11 Jun 2024 • Rong Gong, Hongfei Xue, Lezhi Wang, Xin Xu, Qisheng Li, Lei Xie, Hui Bu, Shaomei Wu, Jiaming Zhou, Yong Qin, BinBin Zhang, Jun Du, Jia Bin, Ming Li
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech.
Automatic Speech Recognition (ASR)
1 code implementation • 27 May 2024 • Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui
In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
1 code implementation • 24 May 2024 • Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions.
Ranked #1 on Music Generation on Song Describer Dataset
1 code implementation • 20 May 2024 • Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du
The "split-and-merge" paradigm is a pivotal approach to table structure parsing, in which detecting table separation lines is crucial.
no code implementations • 17 Mar 2024 • Liang Zou, Genwei Yan, Ruoyu Wang, Jun Du, Meng Lei, Tian Gao, Xin Fang
This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples.
1 code implementation • CVPR 2024 • Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee
In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.
no code implementations • 31 Dec 2023 • Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du
The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR.
no code implementations • 17 Sep 2023 • Zilu Guo, Jun Du, Chin-Hui Lee
The starting state is noisy speech and the ending state is clean speech.
1 code implementation • 17 Sep 2023 • Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee
We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into improving the accuracy of back-end speech recognition systems through AVTSE in challenging, real-world acoustic environments.
no code implementations • 11 Sep 2023 • Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng
Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion.
no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee
This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.
1 code implementation • 14 Aug 2023 • Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee
In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) within a pre-training and fine-tuning framework.
Audio-Visual Speech Recognition, Automatic Speech Recognition
no code implementations • 30 Jul 2023 • Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Jun Du, Jianshu Zhang
This poses a challenge when dealing with an unseen misspelled character, as the decoder may generate an IDS sequence that matches a seen character instead.
no code implementations • 17 Jul 2023 • Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee
Most neural speaker diarization systems rely on large amounts of manually labeled training data, which are hard to collect in real-world scenarios.
1 code implementation • 14 Jun 2023 • Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang
The goal of this study is to implement diffusion models for speech enhancement (SE).
1 code implementation • 24 Mar 2023 • Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu
Moreover, we propose an encoder-decoder-based hierarchical document structure parsing system (DSPS) to tackle this problem.
1 code implementation • 8 Mar 2023 • Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Huihui Zhu, BaoCai Yin, Bing Yin, Cong Liu
Table structure recognition is an indispensable element for enabling machines to comprehend tables.
1 code implementation • 6 Dec 2022 • Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Jun Du, Jiajia Wu
Next, to parse the hierarchical relationship between the heading entities, a tree-structured decoder is designed.
no code implementations • 1 Dec 2022 • Jun Du, Bingqing Jiang, Chunxiao Jiang, Yuanming Shi, Zhu Han
To further improve the efficiency of wireless data aggregation and model learning, over-the-air computation (AirComp) is emerging as a promising solution that exploits the superposition property of wireless channels.
no code implementations • 26 Oct 2022 • Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee
In this paper, we propose a deep-learning-based multi-speaker direction of arrival (DOA) estimation method that uses audio and visual signals and a permutation-free loss function.
no code implementations • 2 Sep 2022 • Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, LiRong Dai
However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and state-of-the-art (SOTA) recognition models usually perform poorly in OOV settings.
no code implementations • 22 Jul 2022 • Zhaoyue Xia, Jun Du, Yong Ren
Compared with perfect data, quantization poses fundamental challenges through the loss of data accuracy, which in turn affects the convergence of the algorithms.
no code implementations • Proceedings of the AAAI Conference on Artificial Intelligence 2022 • Changjie Wu, Jun Du, Yunqing Li, Jianshu Zhang, Chen Yang, Bo Ren, Yiqing Hu
However, previous tree decoders converted tree-structured labels into a fixed, ordered sequence, which could not fully exploit the diverse expressions of tree labels.
1 code implementation • 25 Mar 2022 • Zhenrong Zhang, Jiefeng Ma, Jun Du, Licheng Wang, Jianshu Zhang
Its main task is to automatically read, understand, and analyze documents.
no code implementations • 7 Mar 2022 • Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
In this paper, we propose two techniques, namely joint modeling and data augmentation, to improve system performance for audio-visual scene classification (AVSC).
no code implementations • 17 Feb 2022 • Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee
Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.
no code implementations • 10 Feb 2022 • Maokui He, Xiang Lv, Weilin Zhou, JingJing Yin, Xiaoqi Zhang, Yuxuan Wang, Shutong Niu, Yuhang Cao, Heng Lu, Jun Du, Chin-Hui Lee
We propose two improvements to target-speaker voice activity detection (TS-VAD), the core component in our proposed speaker diarization system that was submitted to the 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) challenge.
no code implementations • 1 Feb 2022 • Wei Wei, Jingjing Wang, Jun Du, Zhengru Fang, Chunxiao Jiang, Yong Ren
Simulations show that underwater disturbances have a large impact on the system when communication delay is taken into account.
no code implementations • 26 Sep 2021 • Jun Du, Chunxiao Jiang, Abderrahim Benslimane, Song Guo, Yong Ren
Based on this dynamic access model, a Stackelberg differential game based cloud computing resource sharing mechanism is proposed to facilitate the resource trading between the cloud computing service provider (CCP) and different edge computing service providers (ECPs).
no code implementations • 7 Aug 2021 • Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.
no code implementations • 12 Jul 2021 • Zhenrong Zhang, Jianshu Zhang, Jun Du
However, due to the complexity and diversity of their structure and style, it is very difficult to parse tabular data into a structured format that machines can easily understand, especially for complex tables.
Ranked #9 on Table Recognition on PubTabNet
no code implementations • 6 Jul 2021 • Shu-Tong Niu, Jun Du, Lei Sun, Chin-Hui Lee
We propose a separation guided speaker diarization (SGSD) approach that fully exploits the complementarity between speech separation and speaker clustering.
no code implementations • 3 Jul 2021 • Hao Yen, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Qing Wang, Yuyang Wang, Xianjun Xia, Yuanjun Zhao, Yuzhong Wu, Yannan Wang, Jun Du, Chin-Hui Lee
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
no code implementations • 19 Mar 2021 • Yuxuan Wang, Maokui He, Shutong Niu, Lei Sun, Tian Gao, Xin Fang, Jia Pan, Jun Du, Chin-Hui Lee
This system description presents our submission to the Third DIHARD Speech Diarization Challenge.
no code implementations • 28 Dec 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin
In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.
no code implementations • 27 Dec 2020 • Hengshun Zhou, Debin Meng, Yuanyuan Zhang, Xiaojiang Peng, Jun Du, Kai Wang, Yu Qiao
Audio-video emotion recognition aims to classify a given video into basic emotions.
Facial Expression Recognition (FER), Video Emotion Recognition
3 code implementations • 2 Dec 2020 • Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman
DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.
no code implementations • 8 Nov 2020 • Koen Oostermeijer, Qing Wang, Jun Du
One of the strengths of traditional convolutional neural networks (CNNs) is their inherent translational invariance.
no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.
Automatic Speech Recognition (ASR)
1 code implementation • 3 Nov 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.
Ranked #1 on Acoustic Scene Classification on TAU Urban Acoustic Scenes 2019 (using extra training data)
no code implementations • 25 Oct 2020 • Yu-Xuan Wang, Jun Du, Li Chai, Chin-Hui Lee, Jia Pan
We propose a novel noise-aware memory-attention network (NAMAN) for regression-based speech enhancement, aiming to improve the quality of enhanced speech in unseen noise conditions.
no code implementations • 21 Sep 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee
We first extract visual embeddings from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).
no code implementations • 12 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
In this paper, we exploit the properties of mean absolute error (MAE) as a loss function for the deep neural network (DNN) based vector-to-vector regression.
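For reference, the mean absolute error between a predicted feature vector and an expected feature vector of the same dimension is the average of their element-wise absolute differences. The minimal Python sketch below computes that quantity for plain lists of floats; the helper name `mae_loss` is hypothetical and the code is an illustration of the standard definition, not the DNN training setup used in the paper.

```python
def mae_loss(predicted, expected):
    """Mean absolute error between two equal-length feature vectors.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(predicted) != len(expected) or not predicted:
        raise ValueError("vectors must be non-empty and of equal length")
    # Average the element-wise absolute differences.
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)


# Differences are 0.5, 0.0, and 1.0, so the MAE is 1.5 / 3 = 0.5.
print(mae_loss([0.5, 1.0, -2.0], [0.0, 1.0, -1.0]))  # prints 0.5
```

Unlike squared error, each residual contributes linearly to this loss, so large deviations are not amplified.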
no code implementations • 4 Aug 2020 • Jun Qi, Jun Du, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee
In this paper, we show that, in vector-to-vector regression utilizing deep neural networks (DNNs), a generalized loss of mean absolute error (MAE) between the predicted and expected feature vectors is upper bounded by the sum of an approximation error, an estimation error, and an optimization error.
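The bound stated above can be written schematically as follows. The notation is ours, not the paper's: $\hat{f}$ stands for the trained DNN regressor, $\mathcal{L}_{\mathrm{MAE}}$ for the generalized MAE loss between predicted and expected feature vectors, and the three $\epsilon$ terms for the approximation, estimation, and optimization errors named in the text.

```latex
\mathcal{L}_{\mathrm{MAE}}(\hat{f}) \;\le\; \epsilon_{\mathrm{approx}} + \epsilon_{\mathrm{est}} + \epsilon_{\mathrm{opt}}
```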
no code implementations • 31 Jul 2020 • Hu Hu, Sabato Marco Siniscalchi, Yannan Wang, Xue Bai, Jun Du, Chin-Hui Lee
In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification.
1 code implementation • 16 Jul 2020 • Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee
On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500 KB.
no code implementations • 20 Feb 2020 • Jia-Ming Wang, Jun Du, Jianshu Zhang
For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and a CNN encoder to extract pixel-level features from input images in offline mode, and then uses stroke-constrained information to convert them into online and offline stroke-level features.
no code implementations • 17 Dec 2019 • Zi-Rui Wang, Jun Du
Finally, knowledge distillation with multiple losses is adopted to improve the performance of the compact CNN.
1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak
This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.
Audio and Speech Processing, Sound
no code implementations • 30 Jul 2019 • Xi Zhang, Xiaolin Wu, Jun Du
Given the success of the deep convolutional neural networks (DCNNs) in applications of visual recognition and classification, it would be tantalizing to test if DCNNs can also learn spatial concepts, such as straightness, convexity, left/right, front/back, relative size, aspect ratio, polygons, etc., from varied visual examples of these concepts that are simple and yet vital for spatial reasoning.
no code implementations • 22 Jun 2019 • Yixing Zhu, Xueqing Wu, Jun Du
While almost all previous object detectors for aerial images directly regress the angle of objects, they use complex rules to calculate the angle, and their performance is limited by the rule design.
Ranked #41 on Object Detection In Aerial Images on DOTA (using extra training data)
1 code implementation • 18 Jun 2019 • Neville Ryant, Kenneth Church, Christopher Cieri, Alejandrina Cristia, Jun Du, Sriram Ganapathy, Mark Liberman
This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain.
no code implementations • 28 Mar 2019 • Lanhua You, Wu Guo, LiRong Dai, Jun Du
X-vector-based deep neural network (DNN) embedding systems have proven effective for text-independent speaker verification.
no code implementations • 28 Mar 2019 • Lanhua You, Wu Guo, Li-Rong Dai, Jun Du
In this paper, gating mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification.
no code implementations • 15 Jan 2019 • Yuanyuan Zhang, Zi-Rui Wang, Jun Du
Although there is no consensus on its definition, human emotional states can usually be perceived through the auditory and visual systems.
1 code implementation • 24 Dec 2018 • Zi-Rui Wang, Jun Du, Jia-Ming Wang
Recently, the hybrid convolutional neural network hidden Markov model (CNN-HMM) has been introduced for offline handwritten Chinese text recognition (HCTR) and has achieved state-of-the-art performance.
no code implementations • 30 Nov 2018 • Yixing Zhu, Jun Du
In the inference stage, each pixel at the mountain foot searches for a path to the mountaintop; this search can be completed efficiently in parallel, which makes our method more efficient than others.
no code implementations • 13 Aug 2018 • Wenchao Wang, Jun Du, Zi-Rui Wang
Recently, hidden Markov models (HMMs) have achieved promising results for offline handwritten Chinese text recognition.
no code implementations • 13 Aug 2018 • Wenchao Wang, Jianshu Zhang, Jun Du, Zi-Rui Wang, Yixing Zhu
Recently, great success has been achieved in offline handwritten Chinese character recognition by using deep learning methods.
1 code implementation • 5 Jun 2018 • Yuanyuan Zhang, Jun Du, Zi-Rui Wang, Jianshu Zhang
In this paper, we present a novel attention based fully convolutional network for speech emotion recognition.
1 code implementation • 30 Jan 2018 • Yixing Zhu, Jun Du
Specifically, we first generate the smallest rectangular box enclosing the text with a region proposal network (RPN), and then isometrically regress points on the text edge using vertical and horizontal sliding lines.
Ranked #17 on Scene Text Detection on SCUT-CTW1500
no code implementations • 22 Jan 2018 • Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai
The RNN decoder aims at generating the caption by detecting radicals and spatial structures through an attention model.
2 code implementations • 5 Jan 2018 • Jianshu Zhang, Jun Du, Li-Rong Dai
Handwritten mathematical expression recognition is a challenging problem due to the complicated two-dimensional structures, ambiguous handwriting input, and varying scales of handwritten math symbols.
1 code implementation • 4 Dec 2017 • Jianshu Zhang, Jun Du, Li-Rong Dai
In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER).
no code implementations • 3 Nov 2017 • Jianshu Zhang, Yixing Zhu, Jun Du, Li-Rong Dai
Chinese characters have a huge set of character categories, more than 20,000, and the number is still increasing as novel characters continue to be created.
1 code implementation • Pattern Recognition 2017 • Jianshu Zhang, Jun Du, Shiliang Zhang, Dan Liu, Yulong Hu, Jinshui Hu, Si Wei, LiRong Dai
We employ a convolutional neural network encoder, which takes HME images as input, as the watcher, and a recurrent neural network decoder equipped with an attention mechanism as the parser to generate LaTeX sequences.
no code implementations • 21 Mar 2017 • Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee
We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals.
Sound