Search Results for author: Zhenyao Zhu

Found 12 papers, 3 papers with code

Fully Supervised Speaker Diarization

1 code implementation 10 Oct 2018 Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang

In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN).

Clustering Speaker Diarization +1

Exploring Neural Transducers for End-to-End Speech Recognition

no code implementations 24 Jul 2017 Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition.

Language Modelling Speech Recognition +1

Principled Hybrids of Generative and Discriminative Domain Adaptation

no code implementations ICLR 2018 Han Zhao, Zhenyao Zhu, Junjie Hu, Adam Coates, Geoff Gordon

This provides us with a very general way to interpolate between generative and discriminative extremes through different choices of priors.

Domain Adaptation

Reducing Bias in Production Speech Models

no code implementations 11 May 2017 Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu

Replacing hand-engineered pipelines with end-to-end deep learning systems has enabled strong results in applications like speech and object recognition.

Object Recognition

Deep Speaker: an End-to-End Neural Speaker Embedding System

15 code implementations 5 May 2017 Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu

We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.

Clustering Speaker Identification +1

Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

no code implementations ICML 2017 Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh

These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed.

Computational Efficiency Speech Recognition +2

Learning Multiscale Features Directly From Waveforms

no code implementations 31 Mar 2016 Zhenyao Zhu, Jesse H. Engel, Awni Hannun

Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand.

Speech Recognition +1

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations

no code implementations NeurIPS 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Recognition

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations 11 Sep 2014 Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Object object-detection +1

Deep Learning Multi-View Representation for Face Recognition

no code implementations 26 Jun 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Recognition

Recover Canonical-View Faces in the Wild with Deep Neural Networks

no code implementations 14 Apr 2014 Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.

Face Reconstruction Face Verification
