Search Results for author: Wei Zou

Found 32 papers, 11 papers with code

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

1 code implementation · 17 Aug 2022 · Goutham Rajendran, Wei Zou

Therefore, the models we develop for various tasks should be robust to such kinds of noisy data, which led to the thriving field of robust machine learning.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

Time Domain Adversarial Voice Conversion for ADD 2022

no code implementations · 19 Apr 2022 · Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).

Voice Conversion

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

no code implementations · 19 Apr 2022 · Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge.

Voice Conversion

Role of limiting dispersal on metacommunity stability and persistence

no code implementations · 8 Mar 2022 · Snehasish Roy Chowdhury, Ramesh Arumugam, Wei Zou, V. K. Chandrasekar, D. V. Senthilkumar

Nevertheless, at the local scale, the spread of the inhomogeneous steady states increases up to a critical value of the limiting factor, favoring the metacommunity persistence, and then starts decreasing for further decrease in the limiting factor with varying local interaction.

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

1 code implementation · 13 Jun 2021 · Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

speech-recognition · Speech Recognition

DiDiSpeech: A Large Scale Mandarin Speech Corpus

no code implementations · 19 Oct 2020 · Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.

Audio and Speech Processing

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

1 code implementation · 20 May 2020 · Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li

In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks.

speech-recognition · Speech Recognition · +2

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

1 code implementation · ACL 2020 · Wei Zou, Shu-Jian Huang, Jun Xie, Xin-yu Dai, Jia-Jun Chen

Neural machine translation systems tend to fail on less decent inputs despite their significant efficacy, which may significantly harm the credibility of these systems; fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance.

Machine Translation · Translation

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

no code implementations · 23 Oct 2019 · Wubo Li, Wei Zou, Xiangang Li

Multimodality provides more promising performance than unimodality in most tasks.

Cross-task pre-training for on-device acoustic scene classification

no code implementations · 22 Oct 2019 · Ruixiong Zhang, Wei Zou, Xiangang Li

To utilize the acoustic event information to improve the performance of ASC tasks, we present the cross-task pre-training mechanism which utilizes acoustic event information from the pre-trained AED model for ASC tasks.

Acoustic Scene Classification · Classification · +3

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

no code implementations · 24 Sep 2019 · Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang

In the fields of navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera's view is an important problem.

EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras

1 code implementation · 19 Sep 2019 · Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma

This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.

Pose Estimation

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

no code implementations · 13 Sep 2019 · Zheng Zhu, Hongxuan Ma, Wei Zou

Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.

Optical Flow Estimation · Visual Tracking

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations · 26 Aug 2019 · Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Visual Object Tracking · Vocal Bursts Intensity Prediction

Camera Pose Correction in SLAM Based on Bias Values of Map Points

no code implementations · 24 Aug 2019 · Zhaobing Kang, Wei Zou, Zheng Zhu

Firstly, the relationship between the camera pose estimation error and bias values of map points is derived based on the optimized function in VSLAM.

feature selection · Pose Estimation

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations · 15 Aug 2019 · Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of Re-ID feature.

Human Detection · Multi-Person Pose Estimation · +3

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations · 14 Dec 2018 · Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively.

Action Recognition · Multimodal Activity Recognition · +3

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

no code implementations · 18 Nov 2018 · Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic backgrounds, changing foreground appearance, and limited computational resources.

Motion Detection · Motion Detection In Non-Stationary Scenes · +1

Optical Flow Based Online Moving Foreground Analysis

no code implementations · 18 Nov 2018 · Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

The foreground mask obtained by moving object detection is unshaped and cannot be directly used in most subsequent processes.

Moving Object Detection · object-detection · +1

Towards End-to-End Code-Switching Speech Recognition

no code implementations · 31 Oct 2018 · Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li

Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a big issue.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

no code implementations · 13 Jul 2018 · Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic backgrounds, changing foreground appearance, and limited computational resources.

Moving Object Detection · object-detection · +1

A comparable study of modeling units for end-to-end Mandarin speech recognition

no code implementations · 10 May 2018 · Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li

We find that all types of modeling units achieve similar character error rates (CER) in the CTC model, and that the Chinese character attention model performs better than the syllable attention model.

speech-recognition · Speech Recognition

End-to-end Video-level Representation Learning for Action Recognition

1 code implementation · 11 Nov 2017 · Jiagang Zhu, Wei Zou, Zheng Zhu

From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.

Action Recognition · Optical Flow Estimation · +2

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations · 10 Nov 2017 · Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

no code implementations · CVPR 2018 · Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan

Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.

Optical Flow Estimation

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

1 code implementation · 12 Sep 2017 · Jiagang Zhu, Wei Zou, Zheng Zhu

For two-stream style methods in action recognition, the two streams' predictions are always fused by a weighted averaging scheme.

Action Classification · Action Recognition · +3
