Large neural networks pretrained on web-scale corpora are central to modern machine learning.
We propose a framework that combines structured pruning with transfer learning to reduce the need for task-specific data.
1 code implementation • 29 Jan 2022 • Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov, Benoit Steiner, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks.
These experiments show that STC can recover most of the performance of the supervised baseline when up to 70% of the labels are missing.
We benchmark our parallel algorithm on the composition of random graphs and the composition of graphs commonly used in speech recognition.
To foster adoption of secure MPC in machine learning, we present CrypTen: a software framework that exposes popular secure MPC primitives via abstractions that are common in modern machine-learning frameworks, such as tensor computations, automatic differentiation, and modular neural networks.
Given the remarkable changes in the state of speech recognition over the previous decade, what can we expect over the coming decade?
We perform experiments on both the Wilds benchmark, which captures distribution shift in the real world, and datasets in the DomainBed benchmark, which focuses more on synthetic-to-real transfer.
Machine-learning systems such as self-driving cars or virtual assistants are composed of a large number of machine-learning models that recognize image content, transcribe speech, analyze natural language, infer preferences, rank options, etc.
We show that in private, forward influence functions provide an appealing trade-off between high quality appraisal and required computation, in spite of label noise, class imbalance, and missing data.
We introduce a framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time.
We study training a single acoustic model for multiple languages with the aim of improving automatic speech recognition (ASR) performance on low-resource languages and, overall, simplifying deployment of ASR systems that support diverse languages.
In particular, IPL fine-tunes an existing model at each iteration using both labeled data and a subset of unlabeled data.
Ranked #10 on Speech Recognition on LibriSpeech test-other (using extra training data)
For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability.
Ranked #42 on Speech Recognition on LibriSpeech test-other
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC).
Secure multiparty computations enable the distribution of so-called shares of sensitive data to multiple parties such that the multiple parties can effectively process the data while being unable to glean much information about the data (at least not without collusion among all parties to put back together all the shares).
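The idea of splitting a value into shares can be illustrated with additive secret sharing, one common construction. The sketch below is only an illustration under stated assumptions: the ring size `MODULUS` and the helper names `share`, `reconstruct`, and `add_shared` are hypothetical, and it is not taken from any particular paper or library listed here.

```python
import secrets

# Hypothetical ring size; all arithmetic on shares is done modulo this value.
MODULUS = 2**64


def share(value, n_parties):
    """Split an integer into n_parties additive shares modulo MODULUS."""
    # The first n-1 shares are uniformly random; the last one is chosen
    # so that all shares sum to the secret modulo MODULUS.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recover the secret; this needs every share."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


if __name__ == "__main__":
    x_shares = share(42, 3)
    y_shares = share(100, 3)
    print(reconstruct(add_shared(x_shares, y_shares)))
```

In this construction any strict subset of the shares is uniformly distributed and so reveals nothing about the secret, which matches the point that the data can be recovered only if all parties pool their shares.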
The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors.
This paper considers a learning setting in which multiple parties aim to train a contextual bandit together in a private way: the parties aim to maximize the total reward but do not want to share any of the relevant information they possess with the other parties.
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters.
Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models.
This paper introduces wav2letter++, the fastest open-source deep learning speech recognition framework.
We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection.
Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand.
35 code implementations • 8 Dec 2015 • Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, Zhenyao Zhu
We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages.
We present a state-of-the-art speech recognition system developed using end-to-end deep learning.