Search Results for author: Siddharth Dalmia

Found 34 papers, 7 papers with code

CMU’s IWSLT 2022 Dialect Speech Translation System

no code implementations IWSLT (ACL) 2022 Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe

We use additional paired Modern Standard Arabic data (MSA) to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems.

Knowledge Distillation, Machine Translation

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

no code implementations 11 Nov 2022 Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word.

Automatic Speech Recognition, Speech Recognition

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

1 code implementation 27 Oct 2022 Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation.

Named Entity Recognition

CTC Alignments Improve Autoregressive Translation

no code implementations 11 Oct 2022 Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment.

Automatic Speech Recognition, Speech Recognition

Two-Pass Low Latency End-to-End Spoken Language Understanding

no code implementations 14 Jul 2022 Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve competitive performance to pipeline-based approaches.

Speech Recognition

LegoNN: Building Modular Encoder-Decoder Models

no code implementations 7 Jun 2022 Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed

We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language generation LegoNN decoder module from a German-English (De-En) MT task can be reused with no fine-tuning for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks to match or beat the respective baseline models.

Machine Translation, Speech Recognition

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations 29 Nov 2021 Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

Speech Recognition

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations 29 Nov 2021 Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

no code implementations 27 Sep 2021 Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe

We propose Fast-MD, a fast MD model that generates HI by non-autoregressive (NAR) decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.

Automatic Speech Recognition, Machine Translation

Differentiable Allophone Graphs for Language-Universal Speech Recognition

1 code implementation 24 Jul 2021 Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that capture rich pronunciation variations, and re-evaluate the allophone mappings of seen languages.

Speech Recognition

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

no code implementations NAACL 2021 Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe

In this work, we present an end-to-end framework that exploits compositionality to learn searchable hidden representations at intermediate stages of a sequence model using decomposed sub-tasks.

Speech Recognition

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

2 code implementations EACL 2021 Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system.

Question Answering

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations 26 Feb 2020 Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models

no code implementations 9 Nov 2019 Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer

Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.

Multilingual Speech Recognition with Corpus Relatedness Sampling

no code implementations 2 Aug 2019 Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze

For example, the target corpus might benefit more from a corpus in the same domain or a corpus from a close language.

Speech Recognition

Cross-Attention End-to-End ASR for Two-Party Conversations

no code implementations 24 Jul 2019 Suyoun Kim, Siddharth Dalmia, Florian Metze

We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information.

Speech Recognition

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

no code implementations ACL 2019 Suyoun Kim, Siddharth Dalmia, Florian Metze

We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings.

Sentence Embeddings, Speech Recognition

The ARIEL-CMU Systems for LoReHLT18

no code implementations 24 Feb 2019 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation, Translation

Phoneme Level Language Models for Sequence Based Low Resource ASR

no code implementations 20 Feb 2019 Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze

Building multilingual and crosslingual models helps bring different languages together in a language-universal space.

Language Modelling

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

no code implementations 28 Jul 2018 Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black

We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language.

Sequence-based Multi-lingual Low Resource Speech Recognition

no code implementations 21 Feb 2018 Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.

Speech Recognition

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations 20 Sep 2016 Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and run on the unlabeled audio downloaded from YouTube.

Event Detection
