Search Results for author: Dmitriy Serdyuk

Found 20 papers, 12 papers with code

Audio-visual fine-tuning of audio-only ASR models

no code implementations • 14 Dec 2023 • Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data.

Automatic Speech Recognition Self-Supervised Learning +2


On Robustness to Missing Video for Audiovisual Speech Recognition

no code implementations • 13 Dec 2023 • Oscar Chang, Otavio Braga, Hank Liao, Dmitriy Serdyuk, Olivier Siohan

Multi-modal models need to be robust: missing video frames should not degrade the performance of an audiovisual model to be worse than that of a single-modality audio-only model.

speech-recognition Speech Recognition


Conformers are All You Need for Visual Speech Recognition

no code implementations • 17 Feb 2023 • Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, Olivier Siohan

We achieve a new state-of-the-art of 12.8% WER for visual speech recognition on the TED LRS3 dataset, which rivals the performance of audio-only models from just four years ago.

speech-recognition Visual Speech Recognition


Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video

no code implementations • 25 Jan 2022 • Dmitriy Serdyuk, Otavio Braga, Olivier Siohan

We achieve the state of the art performance of the audio-visual recognition on the LRS3-TED after fine-tuning our model (1.6% WER).

Audio-Visual Speech Recognition Automatic Speech Recognition +4


Audio-Visual Speech Recognition is Worth 32$\times$32$\times$8 Voxels

no code implementations • 20 Sep 2021 • Dmitriy Serdyuk, Otavio Braga, Olivier Siohan

In this work, we propose to replace the 3D convolutional visual front-end with a video transformer front-end.

Audio-Visual Speech Recognition Automatic Speech Recognition +5


Accounting for Variance in Machine Learning Benchmarks

no code implementations • 1 Mar 2021 • Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices.

Benchmarking BIG-bench Machine Learning +1


Multi-Class Few Shot Learning Task and Controllable Environment

no code implementations • 24 Mar 2019 • Dmitriy Serdyuk, Negar Rostamzadeh, Pedro Oliveira Pinheiro, Boris Oreshkin, Yoshua Bengio

In this paper, we address the task of classifying multiple objects by seeing only a few samples from each category.

Classification Few-Shot Learning


Unsupervised adversarial domain adaptation for acoustic scene classification

1 code implementation • 17 Aug 2018 • Shayan Gharib, Konstantinos Drossos, Emre Çakır, Dmitriy Serdyuk, Tuomas Virtanen

A general problem in the acoustic scene classification task is the mismatch in conditions between training and testing data, which significantly reduces the classification accuracy of the developed methods.

Acoustic Scene Classification Classification +3


Twin Regularization for online speech recognition

2 code implementations • 15 Apr 2018 • Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Online speech recognition is crucial for developing natural human-machine interfaces.

speech-recognition Speech Recognition

2,351


Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations

1 code implementation • ICLR 2019 • Alex Lamb, Jonathan Binas, Anirudh Goyal, Dmitriy Serdyuk, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio

Deep networks have achieved impressive results across a variety of important tasks.


Towards end-to-end spoken language understanding

1 code implementation • 23 Feb 2018 • Dmitriy Serdyuk, Yongqiang Wang, Christian Fuegen, Anuj Kumar, Baiyang Liu, Yoshua Bengio

A spoken language understanding system is traditionally designed as a pipeline of several components.

Natural Language Understanding Spoken Language Understanding


MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

2 code implementations • 1 Feb 2018 • Konstantinos Drossos, Stylianos Ioannis Mimilakis, Dmitriy Serdyuk, Gerald Schuller, Tuomas Virtanen, Yoshua Bengio

Current state of the art (SOTA) results in monaural singing voice separation are obtained with deep learning based methods.

Sound Audio and Speech Processing

111


Twin Networks: Matching the Future for Sequence Generation

2 code implementations • ICLR 2018 • Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio

We propose a simple technique for encouraging generative RNNs to plan ahead.

Caption Generation speech-recognition +1

2,351


Deep Complex Networks

9 code implementations • ICLR 2018 • Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Ranked #3 on Music Transcription on MusicNet

Image Classification Music Transcription +1

700


Invariant Representations for Noisy Speech Recognition

no code implementations • 27 Nov 2016 • Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4


Theano: A Python framework for fast computation of mathematical expressions

1 code implementation • 9 May 2016 • The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.

BIG-bench Machine Learning Clustering +2

9,852


Task Loss Estimation for Sequence Prediction

1 code implementation • 19 Nov 2015 • Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.

Language Modelling speech-recognition +1

260


End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation • 18 Aug 2015 • Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling Language Modelling +2

260


Attention-Based Models for Speech Recognition

14 code implementations • NeurIPS 2015 • Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.

Ranked #17 on Speech Recognition on TIMIT

Machine Translation Speech Recognition +1

1,157


Blocks and Fuel: Frameworks for deep learning

5 code implementations • 1 Jun 2015 • Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio

We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel.

BIG-bench Machine Learning

1,161

