Search Results for author: Joel Shor

Found 13 papers, 5 papers with code

TRILLsson: Distilled Universal Paralinguistic Speech Representations

no code implementations · 1 Mar 2022 · Joel Shor, Subhashini Venugopalan

Our largest distilled model is less than 15% of the size of the original model (314MB vs. 2.2GB), achieves over 96% of the accuracy on 6 of 7 tasks, and is trained on 6.5% of the data.

Emotion Recognition · Knowledge Distillation
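
As an illustration of the distillation setup the abstract describes, the sketch below regresses a small student embedding model onto a frozen teacher's embeddings. It is a minimal sketch, not the paper's code: the Keras stand-in models, the 1024-d embedding size, and the plain L2 distillation loss are all assumptions.

```python
import tensorflow as tf

def distillation_loss(student_emb, teacher_emb):
    # Regress the student's embedding onto the frozen teacher's embedding (plain L2).
    return tf.reduce_mean(tf.math.squared_difference(student_emb, teacher_emb))

# Hypothetical stand-ins: `teacher` for the large frozen model, `student` for the small one.
teacher = tf.keras.Sequential([tf.keras.layers.Dense(1024)])
student = tf.keras.Sequential([tf.keras.layers.Dense(1024)])
teacher.trainable = False

optimizer = tf.keras.optimizers.Adam(1e-4)

def train_step(audio_features):
    with tf.GradientTape() as tape:
        loss = distillation_loss(student(audio_features), teacher(audio_features))
    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss

loss = train_step(tf.random.normal([8, 64]))   # dummy batch of input features
```

In the paper, the teacher is the large Conformer-based paralinguistic model from the entry below; here both networks are placeholders so the loop runs end to end.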

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

no code implementations · 9 Oct 2021 · Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech.

Towards Learning a Universal Non-Semantic Representation of Speech

1 code implementation · 25 Feb 2020 · Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, Yinnon Haviv

The ultimate goal of transfer learning is to reduce labeled data requirements by exploiting a pre-existing embedding model trained for different datasets or tasks.

Transfer Learning
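
A minimal sketch of the transfer-learning recipe described above: a frozen, pre-existing speech embedding feeds a small classifier trained on a limited labeled set. The `embed` stand-in, the 512-d embedding size, and the logistic-regression head are hypothetical, not the released code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(waveforms):
    """Stand-in for a frozen pre-trained speech embedding model."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(waveforms), 512))   # placeholder 512-d embeddings

# Tiny labeled downstream dataset: 32 one-second waveforms with binary labels.
train_waves, train_labels = [np.zeros(16000)] * 32, np.arange(32) % 2
clf = LogisticRegression(max_iter=1000).fit(embed(train_waves), train_labels)

test_waves = [np.zeros(16000)] * 4
print(clf.predict(embed(test_waves)))   # downstream predictions from frozen features
```

In practice the frozen model would be the TRILL embedding released with the paper rather than this random-projection placeholder.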

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

1 code implementation · ICML 2018 · RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous

We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody.

Expressive Speech Synthesis
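
A rough sketch of the idea described above, under my own simplifications rather than the paper's exact architecture: a reference mel spectrogram is compressed by a small conv/GRU reference encoder into a single prosody embedding, which is broadcast and concatenated onto the text-encoder states that condition the Tacotron decoder.

```python
import tensorflow as tf

class ReferenceEncoder(tf.keras.layers.Layer):
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.convs = [tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")
                      for _ in range(3)]
        self.rnn = tf.keras.layers.GRU(embedding_dim)

    def call(self, ref_mel):                      # [batch, frames, mel_bins]
        x = ref_mel[..., tf.newaxis]              # add a channel dimension
        for conv in self.convs:
            x = conv(x)
        b, t = tf.shape(x)[0], tf.shape(x)[1]
        x = tf.reshape(x, [b, t, -1])             # flatten freq x channels per frame
        return self.rnn(x)                        # [batch, embedding_dim] prosody embedding

ref_encoder = ReferenceEncoder()
ref_mel = tf.random.normal([2, 200, 80])          # a reference utterance's mel spectrogram
text_enc = tf.random.normal([2, 50, 256])         # stand-in for Tacotron text-encoder outputs
prosody = ref_encoder(ref_mel)                    # one prosody embedding per reference
conditioned = tf.concat(
    [text_enc, tf.tile(prosody[:, tf.newaxis, :], [1, 50, 1])], axis=-1)
```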

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

10 code implementations · ICML 2018 · Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous

In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system.

Speech Synthesis · Style Transfer · +1
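
A sketch of how a global-style-token layer can be wired up, assuming hypothetical dimensions and a single-head dot-product attention: a reference embedding attends over a small bank of learned token embeddings, and the attention-weighted sum becomes the style embedding that conditions synthesis. This is an illustration, not the released implementation.

```python
import tensorflow as tf

class GSTLayer(tf.keras.layers.Layer):
    def __init__(self, num_tokens=10, token_dim=256):
        super().__init__()
        # The bank of "global style tokens", learned jointly with the rest of the model.
        self.tokens = self.add_weight(
            name="style_tokens", shape=[num_tokens, token_dim], initializer="random_normal")
        self.attention = tf.keras.layers.Attention(use_scale=True)

    def call(self, reference_embedding):                       # [batch, token_dim]
        query = reference_embedding[:, tf.newaxis, :]          # [batch, 1, token_dim]
        keys = tf.tile(self.tokens[tf.newaxis], [tf.shape(query)[0], 1, 1])
        style = self.attention([query, keys])                  # attention-weighted token sum
        return tf.squeeze(style, axis=1)                       # [batch, token_dim]

gst = GSTLayer()
reference = tf.random.normal([4, 256])    # e.g. a reference-encoder output per utterance
style_embedding = gst(reference)          # [4, 256]; conditions the synthesizer
```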

Spatially adaptive image compression using a tiled deep network

no code implementations · 7 Feb 2018 · David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, Saurabh Singh

Deep neural networks represent a powerful class of function approximators that can learn to compress and reconstruct images.

Image Compression

Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

no code implementations · 18 May 2017 · Michele Covell, Nick Johnston, David Minnen, Sung Jin Hwang, Joel Shor, Saurabh Singh, Damien Vincent, George Toderici

Our method introduces multi-pass training to combine the goals of high-quality reconstruction both in areas around stop-code masking and in highly detailed areas.

Image Compression

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

no code implementations · CVPR 2018 · Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jin Hwang, Joel Shor, George Toderici

We propose a method for lossy image compression based on recurrent, convolutional neural networks that outperforms BPG (4:2:0), WebP, JPEG2000, and JPEG as measured by MS-SSIM.

Image Compression · MS-SSIM · +1
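
MS-SSIM, the metric the comparison above is reported in, can be computed directly in TensorFlow; the snippet below scores two hypothetical reconstructions against an original 8-bit RGB image (the random images are placeholders for real codec outputs).

```python
import tensorflow as tf

original = tf.random.uniform([1, 256, 256, 3], maxval=255.0)   # stand-in for a test image
recon_a = tf.clip_by_value(original + tf.random.normal(original.shape, stddev=4.0), 0, 255)
recon_b = tf.clip_by_value(original + tf.random.normal(original.shape, stddev=8.0), 0, 255)

score_a = tf.image.ssim_multiscale(original, recon_a, max_val=255.0)
score_b = tf.image.ssim_multiscale(original, recon_b, max_val=255.0)
print(float(score_a[0]), float(score_b[0]))   # higher MS-SSIM = closer to the original
```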

Full Resolution Image Compression with Recurrent Neural Networks

4 code implementations · CVPR 2017 · George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell

As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.

Image Compression
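
A toy sketch of the progressive, residual-coding idea behind this line of work, with untrained single-layer stand-ins for the paper's recurrent encoder and decoder: each pass binarizes a code for the current residual and adds a decoded update to the reconstruction, so more iterations correspond to more bits spent on the image. With trained weights, the reconstruction error would shrink as iterations accumulate.

```python
import tensorflow as tf

# Placeholder encoder/decoder; the paper uses recurrent convolutional networks.
encoder = tf.keras.layers.Conv2D(8, 3, strides=2, padding="same", activation="tanh")
decoder = tf.keras.layers.Conv2DTranspose(3, 3, strides=2, padding="same")

def binarize(codes):
    # Hard sign quantization; the paper uses a stochastic binarizer during training.
    return tf.sign(codes)

image = tf.random.uniform([1, 64, 64, 3])
reconstruction = tf.zeros_like(image)
for step in range(8):                       # each pass spends more bits on the residual
    residual = image - reconstruction
    bits = binarize(encoder(residual))      # this iteration's contribution to the bitstream
    reconstruction += decoder(bits)         # additive refinement of the reconstruction
    # Per-pass reconstruction error (would decrease with trained weights).
    print(step, float(tf.reduce_mean(tf.abs(image - reconstruction))))
```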
