1 code implementation • 10 Oct 2018 • Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang
In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN).
Ranked #1 on Speaker Diarization on Hub5'00 CallHome
no code implementations • 24 Jul 2017 • Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition.
no code implementations • ICLR 2018 • Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon
This provides us a very general way to interpolate between generative and discriminative extremes through different choices of priors.
no code implementations • 11 May 2017 • Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition.
15 code implementations • 5 May 2017 • Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
no code implementations • ICML 2017 • Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh
These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed.
no code implementations • 31 Mar 2016 • Zhenyao Zhu, Jesse H. Engel, Awni Hannun
Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand.
35 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.
no code implementations • NeurIPS 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, human not only can recognize face identity, but can also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang
In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.
no code implementations • 26 Jun 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, human not only can recognize face identity, but can also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
no code implementations • 14 Apr 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.