Search Results for author: Boris Ginsburg

Found 63 papers, 27 papers with code

RULER: What's the Real Context Size of Your Long-Context Language Models?

1 code implementation • 9 Apr 2024 • Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg

Despite achieving nearly perfect accuracy in the vanilla NIAH test, all models exhibit large performance drops as the context length increases.

Long-Context Understanding

Paper
Code

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

no code implementations • 4 Apr 2024 • Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

This paper proposes Transducers with Pronunciation-aware Embeddings (PET).

Automatic Speech Recognition speech-recognition +1

Paper

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

1 code implementation • 27 Dec 2023 • Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

We also showed that training a model with multiple latencies can achieve better accuracy than single latency models while it enables us to support multiple latencies with a single model.

Automatic Speech Recognition speech-recognition +1

10,062

Paper
Code

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

This capability offers a tailored training environment for developing neural models suited for speaker diarization and voice activity detection.

Action Detection Activity Detection +3

Paper

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition speaker-diarization +3

Paper

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

no code implementations • 14 Oct 2023 • Paarth Neekhara, Shehzeen Hussain, Rafael Valle, Boris Ginsburg, Rishabh Ranjan, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley

In this work, instead of explicitly disentangling attributes with loss terms, we present a framework to train a controllable voice conversion model on entangled speech representations derived from self-supervised learning and speaker verification models.

Self-Supervised Learning Speaker Verification +2

Paper

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

1 code implementation • 13 Oct 2023 • Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg

We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

10,062

Paper
Code

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models

no code implementations • 4 Oct 2023 • Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

A Chat About Boring Problems: Studying GPT-based text normalization

no code implementations • 23 Sep 2023 • Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg

Through this new framework, we can identify strengths and weaknesses of GPT-based TN, opening opportunities for future work.

Prompt Engineering

Paper

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, aka audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in audio domain.

Language Modelling Quantization +4

Paper

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

no code implementations • 18 Sep 2023 • Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

This paper presents an overview and evaluation of some of the end-to-end ASR models on long-form audios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio

2 code implementations • 9 Aug 2023 • Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg

We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR).

Automatic Speech Recognition speech-recognition +1

10,062

Paper
Code

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

no code implementations • 13 Jul 2023 • He Huang, Jagadeesh Balam, Boris Ginsburg

We study speech intent classification and slot filling (SICSF) by proposing to use an encoder pretrained on speech recognition (ASR) to initialize an end-to-end (E2E) Conformer-Transformer model, which achieves the new state-of-the-art results on the SLURP dataset, with 90.14% intent accuracy and 82.27% SLURP-F1.

intent-classification Intent Classification +7

Paper

Confidence-based Ensembles of End-to-End Speech Recognition Models

no code implementations • 27 Jun 2023 • Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg

Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.

Language Identification Model Selection +2

Paper

Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

1 code implementation • 14 Jun 2023 • Kunal Dhawan, Dima Rekesh, Boris Ginsburg

Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

10,062

Paper
Code

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

1 code implementation • 4 Jun 2023 • Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition (ASR) quality given user vocabulary.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

10,062

Paper
Code

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

no code implementations • 8 May 2023 • Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks.

Ranked #1 on Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition speech-recognition +3

Paper

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

1 code implementation • 13 Apr 2023 • Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers.

Ranked #1 on Speech Recognition on facebook/multilingual_librispeech german

Intent Classification Intent Classification and Slot Filling +3

10,062

Paper
Code

Powerful and Extensible WFST Framework for RNN-Transducer Losses

no code implementations • 18 Mar 2023 • Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss.

Paper

VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

no code implementations • 14 Mar 2023 • Rohan Badlani, Akshit Arora, Subhankar Ghosh, Rafael Valle, Kevin J. Shih, João Felipe Santos, Boris Ginsburg, Bryan Catanzaro

We introduce VANI, a very lightweight multi-lingual accent controllable speech synthesis system.

Disentanglement Speech Synthesis

Paper

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

no code implementations • 27 Feb 2023 • Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper

ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations

no code implementations • 16 Feb 2023 • Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg

In this work, we propose a zero-shot voice conversion method using speech representations trained with self-supervised learning.

Self-Supervised Learning Speaker Verification +1

Paper

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition

no code implementations • 16 Dec 2022 • Aleksandr Laptev, Boris Ginsburg

This paper presents a class of new fast non-trainable entropy-based confidence estimation methods for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

no code implementations • 9 Nov 2022 • Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg

In this paper, we extend previous self-supervised approaches for language identification by experimenting with Conformer based architecture in a multilingual pre-training paradigm.

Language Identification Spoken language identification

Paper

Multi-blank Transducers for Speech Recognition

1 code implementation • 4 Nov 2022 • Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

10,062

Paper
Code

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

1 code implementation • 1 Nov 2022 • Cheng-Ping Hsieh, Subhankar Ghosh, Boris Ginsburg

In the proposed approach, a few small adapter modules are added to the original network.

Speech Synthesis

10,062

Paper
Code

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

no code implementations • 27 Oct 2022 • Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture.

Language Identification Spoken language identification

Paper

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

no code implementations • 6 Oct 2022 • Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Automatic speech recognition models are often adapted to improve their accuracy in a new domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

no code implementations • 29 Jul 2022 • Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

The model is trained on the Google Text Normalization dataset and achieves state-of-the-art sentence accuracy on both English and Russian test sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

3 code implementations • 9 Jun 2022 • Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.

Ranked #5 on Speech Synthesis on LibriTTS

Audio Generation Audio Synthesis +4

1,073

Paper
Code

Multi-scale Speaker Diarization with Dynamic Scale Weighting

no code implementations • 30 Mar 2022 • Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.

speaker-diarization Speaker Diarization

Paper

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization

1 code implementation • 29 Mar 2022 • Evelina Bakhturina, Yang Zhang, Boris Ginsburg

First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one -- similar to shallow fusion for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

10,062

Paper
Code
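
The two-stage idea in the summary above (generate every normalization candidate, then let a language model choose) can be illustrated with a minimal sketch. This is not the paper's implementation: the candidate list is hard-coded in place of a WFST, and a toy add-one-smoothed bigram model stands in for the neural language model. The names `bigram_scorer` and `pick_best`, the corpus, and the candidates are all hypothetical.

```python
import math
from collections import Counter


def bigram_scorer(corpus):
    """Build a toy add-one-smoothed bigram scorer (stand-in for a neural LM)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def score(sentence):
        # Sum of smoothed log-probabilities over consecutive token pairs.
        tokens = ["<s>"] + sentence.split()
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

    return score


def pick_best(candidates, score):
    """Rerank step: return the candidate the language model scores highest."""
    return max(candidates, key=score)


# Hypothetical training text and candidate verbalizations of "it costs $5".
corpus = ["it costs five dollars", "she paid five dollars", "call me at five"]
candidates = [
    "it costs five dollars",
    "it costs dollar five",
    "it costs five dollar sign",
]
score = bigram_scorer(corpus)
best = pick_best(candidates, score)
print(best)
```

Summed log-probabilities favor shorter candidates, so a real system would likely need length normalization or a stronger model; the sketch only shows the generate-then-rerank control flow.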

Adapting TTS models For New Speakers using Transfer Learning

no code implementations • 12 Oct 2021 • Paarth Neekhara, Jason Li, Boris Ginsburg

We address this challenge by proposing transfer-learning guidelines for adapting high quality single-speaker TTS models for a new speaker, using only a few minutes of speech data.

Transfer Learning Voice Cloning

Paper

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

2 code implementations • 8 Oct 2021 • Nithin Rao Koluguri, Taejin Park, Boris Ginsburg

In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations.

Ranked #1 on Speaker Diarization on CALLHOME-109

speaker-diarization Speaker Diarization +1

10,062

Paper
Code

Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings

1 code implementation • 7 Oct 2021 • Oktai Tatanov, Stanislav Beliaev, Boris Ginsburg

This paper describes Mixer-TTS, a non-autoregressive model for mel-spectrogram generation.

Language Modelling Speech Synthesis

Paper
Code

CTC Variations Through New WFST Topologies

no code implementations • 6 Oct 2021 • Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

A Unified Transformer-based Framework for Duplex Text Normalization

no code implementations • 23 Aug 2021 • Tuan Manh Lai, Yang Zhang, Evelina Bakhturina, Boris Ginsburg, Heng Ji

In addition, we also create a cleaned dataset from the Spoken Wikipedia Corpora for German and report the performance of our systems on the dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper

CarneliNet: Neural Mixture Model for Automatic Speech Recognition

no code implementations • 22 Jul 2021 • Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg

The basic idea is to introduce a parallel mixture of shallow networks instead of a very deep network.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services

no code implementations • 17 May 2021 • Yang Zhang, Vahid Noroozi, Evelina Bakhturina, Boris Ginsburg

In this paper, we propose SGD-QA, a simple and extensible model for schema-guided dialogue state tracking based on a question answering approach.

Dialogue State Tracking Goal-Oriented Dialogue Systems +1

Paper

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction

1 code implementation • 16 Apr 2021 • Stanislav Beliaev, Boris Ginsburg

We propose TalkNet, a non-autoregressive convolutional neural model for speech synthesis with explicit pitch and duration prediction.

Speech Synthesis

Paper
Code

A Toolbox for Construction and Analysis of Speech Datasets

1 code implementation • 11 Apr 2021 • Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

292

Paper
Code

NeMo Inverse Text Normalization: From Development To Production

1 code implementation • 11 Apr 2021 • Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

5,577

Paper
Code

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

Ranked #3 on Speech Recognition on SPGISpeech

speech-recognition Speech Recognition

7,877

Paper
Code

Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition

no code implementations • 5 Apr 2021 • Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg

We propose Citrinet - a new end-to-end convolutional Connectionist Temporal Classification (CTC) based automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper

Hi-Fi Multi-Speaker English TTS Dataset

no code implementations • 3 Apr 2021 • Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang

This paper introduces a new multi-speaker English dataset for training text-to-speech models.

Paper

On regularization of gradient descent, layer imbalance and flat minima

no code implementations • 18 Jul 2020 • Boris Ginsburg

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution.

Data Augmentation

Paper

Training Deep Networks with Stochastic Gradient Normalized by Layerwise Adaptive Second Moments

no code implementations • ICLR 2020 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

General Classification Image Classification +5

Paper

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

no code implementations • 23 Oct 2019 • Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

15 code implementations • 22 Oct 2019 • Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang

We propose a new end-to-end neural acoustic model for automatic speech recognition.

Ranked #33 on Speech Recognition on LibriSpeech test-clean

Speech Recognition Audio and Speech Processing

10,062

Paper
Code

NeMo: a toolkit for building AI applications using Neural Modules

1 code implementation • 14 Sep 2019 • Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen

NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition.

Ranked #1 on Speech Recognition on Common Voice Spanish (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

10,062

Paper
Code

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

3 code implementations • 27 May 2019 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

General Classification speech-recognition +2

1,535

Paper
Code

Jasper: An End-to-End Convolutional Neural Acoustic Model

10 code implementations • 5 Apr 2019 • Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.

Ranked #3 on Speech Recognition on Hub5'00 SwitchBoard

Language Modelling Speech Recognition

2,917

Paper
Code

Training Neural Speech Recognition Systems with Synthetic Speech Augmentation

no code implementations • 2 Nov 2018 • Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin

Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper

OpenSeq2Seq: Extensible Toolkit for Distributed and Mixed Precision Training of Sequence-to-Sequence Models

no code implementations • WS 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius

We present OpenSeq2Seq - an open-source toolkit for training sequence-to-sequence models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

3 code implementations • 25 May 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius

We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

1,535

Paper
Code

Large Batch Training of Convolutional Networks with Layer-wise Adaptive Rate Scaling

no code implementations • ICLR 2018 • Boris Ginsburg, Igor Gitman, Yang You

Using LARS, we scaled AlexNet and ResNet-50 to a batch size of 16K.

16k

Paper

Training Deep AutoEncoders for Recommender Systems

no code implementations • ICLR 2018 • Oleksii Kuchaiev, Boris Ginsburg

Our model is based on deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training.

Recommendation Systems

Paper

Mixed Precision Training

8 code implementations • ICLR 2018 • Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

Using this approach, we can reduce the memory consumption of deep learning models by nearly 2x.

8,368

Paper
Code

Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification

no code implementations • 24 Sep 2017 • Igor Gitman, Boris Ginsburg

However, it is not clear if these algorithms could replace BN in practical, large-scale applications.

General Classification Image Classification

Paper

Large Batch Training of Convolutional Networks

12 code implementations • 13 Aug 2017 • Yang You, Igor Gitman, Boris Ginsburg

Using LARS, we scaled Alexnet up to a batch size of 8K, and Resnet-50 to a batch size of 32K without loss in accuracy.

3,229

Paper
Code

Training Deep AutoEncoders for Collaborative Filtering

10 code implementations • 5 Aug 2017 • Oleksii Kuchaiev, Boris Ginsburg

Our model is based on deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training.

Collaborative Filtering Recommendation Systems

4,099

Paper
Code

Factorization tricks for LSTM networks

2 code implementations • 31 Mar 2017 • Oleksii Kuchaiev, Boris Ginsburg

We present two simple ways of reducing the number of parameters and accelerating the training of large Long Short-Term Memory (LSTM) networks: the first one is "matrix factorization by design" of LSTM matrix into the product of two smaller matrices, and the second one is partitioning of LSTM matrix, its inputs and states into the independent groups.

Ranked #20 on Language Modelling on One Billion Word

Language Modelling

156

Paper
Code
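
To make the two parameter-reduction ideas in the summary above concrete, here is a small counting sketch. It assumes the LSTM weight is a single matrix mapping the concatenated input and state (size 2n) to the four gate pre-activations (size 4n), and it ignores biases. The hidden size, rank, and group count are illustrative values, not figures from the paper, and the function names are hypothetical.

```python
def dense_params(in_dim, out_dim):
    """Weights in one full in_dim x out_dim matrix."""
    return in_dim * out_dim


def factorized_params(in_dim, out_dim, rank):
    """Weights when the matrix is the product of (in_dim x rank) and (rank x out_dim)."""
    return in_dim * rank + rank * out_dim


def grouped_params(in_dim, out_dim, groups):
    """Weights when inputs and outputs are split into independent groups."""
    if in_dim % groups or out_dim % groups:
        raise ValueError("dimensions must be divisible by the number of groups")
    return groups * (in_dim // groups) * (out_dim // groups)


n = 1024                         # illustrative hidden size
in_dim, out_dim = 2 * n, 4 * n   # assumed layout: [input; state] -> 4 gates

print(dense_params(in_dim, out_dim))                 # 8388608
print(factorized_params(in_dim, out_dim, rank=256))  # 1572864
print(grouped_params(in_dim, out_dim, groups=4))     # 2097152
```

Under these assumptions, factorization shrinks the count whenever the rank is below in_dim * out_dim / (in_dim + out_dim), and grouping divides it by the number of groups.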

SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques

1 code implementation • NeurIPS 2016 • Elad Richardson, Rom Herskovitz, Boris Ginsburg, Michael Zibulevsky

SEBOOST applies a secondary optimization process in the subspace spanned by the last steps and descent directions.

Stochastic Optimization

Paper
Code
