1 code implementation • 17 Aug 2022 • Goutham Rajendran, Wei Zou
Therefore, the models we develop for various tasks should be robust to such noisy data, a need that has led to the thriving field of robust machine learning.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • 1 Aug 2022 • Hu Su, Yonghao He, Rui Jiang, Jiabin Zhang, Wei Zou, Bin Fan
The dynamic smooth label is assigned to supervise the classification branch.
no code implementations • 19 Apr 2022 • Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li
In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).
no code implementations • 19 Apr 2022 • Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li
This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge.
no code implementations • 8 Mar 2022 • Snehasish Roy Chowdhury, Ramesh Arumugam, Wei Zou, V. K. Chandrasekar, D. V. Senthilkumar
Nevertheless, at the local scale, the spread of the inhomogeneous steady states increases up to a critical value of the limiting factor, favoring the metacommunity persistence, and then starts decreasing for further decrease in the limiting factor with varying local interaction.
1 code implementation • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
no code implementations • 26 Apr 2021 • Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, Shuaijiang Zhao, Xiaoning Lei, Wei Zou, Xiangang Li
Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • 27 Oct 2020 • Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li
Self-supervised visual pretraining has shown significant progress recently.
no code implementations • 21 Oct 2020 • Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
Audio Visual Scene-aware Dialog (AVSD) is the task of generating responses in a discussion about a given video.
no code implementations • 19 Oct 2020 • Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.
Audio and Speech Processing
1 code implementation • 20 May 2020 • Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li
In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of the speaking style of the pre-training data, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks.
1 code implementation • ACL 2020 • Wei Zou, Shu-Jian Huang, Jun Xie, Xin-yu Dai, Jia-Jun Chen
Neural machine translation systems tend to fail on less decent inputs despite their significant efficacy, which may significantly harm the credibility of these systems. Fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance.
no code implementations • 23 Oct 2019 • Wubo Li, Wei Zou, Xiangang Li
Multimodal approaches provide more promising performance than unimodal ones in most tasks.
1 code implementation • 22 Oct 2019 • Dongwei Jiang, Xiaoning Lei, Wubo Li, Ne Luo, Yuxuan Hu, Wei Zou, Xiangang Li
Speech recognition technologies are gaining enormous popularity in various industrial applications.
no code implementations • 22 Oct 2019 • Ruixiong Zhang, Wei Zou, Xiangang Li
To use acoustic event information to improve performance on ASC tasks, we present a cross-task pre-training mechanism that transfers acoustic event information from a pre-trained AED model to ASC tasks.
no code implementations • 24 Sep 2019 • Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang
In the fields of navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera's view is an important problem.
1 code implementation • 19 Sep 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma
This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.
no code implementations • 13 Sep 2019 • Zheng Zhu, Hongxuan Ma, Wei Zou
Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.
no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang
In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).
no code implementations • 24 Aug 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu
Firstly, the relationship between the camera pose estimation error and bias values of map points is derived based on the optimized function in VSLAM.
no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang
Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is used to infer the occlusion state and thereby make better use of Re-ID features.
2 code implementations • 2 Aug 2019 • Kun Han, Junwen Chen, HUI ZHANG, Haiyang Xu, Yiping Peng, Yun Wang, Ning Ding, Hui Deng, Yonghu Gao, Tingwei Guo, Yi Zhang, Yahao He, Baochang Ma, Yu-Long Zhou, Kangli Zhang, Chao Liu, Ying Lyu, Chenxi Wang, Cheng Gong, Yunbo Wang, Wei Zou, Hui Song, Xiangang Li
In this paper we present DELTA, a deep learning based language technology platform.
Ranked #3 on Text Classification on Yahoo! Answers
no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du
On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject, respectively.
Ranked #1 on Action Recognition on UTD-MHAD
no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu
Real-time motion detection in non-stationary scenes is a difficult task due to dynamic backgrounds, changing foreground appearance, and limited computational resources.
Motion Detection
Motion Detection In Non-Stationary Scenes
no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu
The foreground mask obtained by moving object detection is unshaped and cannot be directly used in most subsequent processes.
no code implementations • 31 Oct 2018 • Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li
Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a major issue.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
no code implementations • 13 Jul 2018 • Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu
Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic backgrounds, changing foreground appearance, and limited computational resources.
no code implementations • 10 May 2018 • Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li
We find that all types of modeling units achieve similar character error rates (CER) in the CTC model, and that the Chinese character attention model performs better than the syllable attention model.
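For readers unfamiliar with the metric mentioned above, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between hypothesis and reference, divided by the reference length. The function names `edit_distance` and `cer` are illustrative assumptions and are not taken from the paper, whose exact scoring setup is not described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of four reference characters.
example = cer("abcd", "abxd")
```

Because the distance is computed over characters, the same function works for Chinese character sequences passed as Python strings.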
1 code implementation • 11 Nov 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu
From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.
no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang
Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.
no code implementations • CVPR 2018 • Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan
Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.
1 code implementation • 12 Sep 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu
In two-stream style methods for action recognition, the two streams' predictions are always fused by a weighted averaging scheme.
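As a rough illustration of the weighted averaging baseline mentioned above, the sketch below fuses per-class scores from two streams. The function name `fuse_scores`, the stream names, the weights, and the example scores are all assumptions chosen for illustration; none of them come from the paper.

```python
def fuse_scores(spatial, temporal, w_spatial=1.0, w_temporal=1.5):
    """Weighted average of per-class scores from two streams.

    The weights are hypothetical; in practice they are tuned on held-out data.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same set of classes")
    total = w_spatial + w_temporal
    return [(w_spatial * s + w_temporal * t) / total
            for s, t in zip(spatial, temporal)]


# Made-up class probabilities for three action classes.
spatial_scores = [0.6, 0.3, 0.1]
temporal_scores = [0.2, 0.7, 0.1]

fused = fuse_scores(spatial_scores, temporal_scores)
# The predicted class is the index with the highest fused score.
predicted = max(range(len(fused)), key=fused.__getitem__)
```

With these numbers the streams disagree on the top class, and the heavier temporal weight decides the fused prediction, which is the kind of fixed trade-off such a scheme hard-codes.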