Search Results for author: Siddharth Dalmia

Found 38 papers, 11 papers with code

CMU’s IWSLT 2022 Dialect Speech Translation System

no code implementations • IWSLT (ACL) 2022 • Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe

We use additional paired Modern Standard Arabic data (MSA) to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems.

Knowledge Distillation Machine Translation +3

Paper
Add Code

Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation

no code implementations • NAACL (AmericasNLP) 2021 • Jiatong Shi, Jonathan D. Amith, Xuankai Chang, Siddharth Dalmia, Brian Yan, Shinji Watanabe

Documentation of endangered languages (ELs) has become increasingly urgent as thousands of languages are on the verge of disappearing by the end of the 21st century.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems

no code implementations • 2 Apr 2024 • Frank Palma Gomez, Ramon Sanabria, Yun-Hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego

Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages.

Machine Translation Retrieval +1

Paper
Add Code

LLM Augmented LLMs: Expanding Capabilities through Composition

1 code implementation • 4 Jan 2024 • Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains.

Arithmetic Reasoning Code Generation

131

Paper
Code

Multimodal Modeling For Spoken Language Identification

no code implementations • 19 Sep 2023 • Shikhar Bharadwaj, Min Ma, Shikhar Vashishth, Ankur Bapna, Sriram Ganapathy, Vera Axelrod, Siddharth Dalmia, Wei Han, Yu Zhang, Daan van Esch, Sandy Ritchie, Partha Talukdar, Jason Riesa

Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.

Language Identification Spoken language identification

Paper
Add Code

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

1 code implementation • 10 Apr 2023 • Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.

Benchmarking Simultaneous Speech-to-Text Translation +2

7,851

Paper
Code

Align, Write, Re-order: Explainable End-to-End Speech Translation via Operation Sequence Generation

no code implementations • 11 Nov 2022 • Motoi Omachi, Brian Yan, Siddharth Dalmia, Yuya Fujita, Shinji Watanabe

To solve this problem, we would like to simultaneously generate automatic speech recognition (ASR) and ST predictions such that each source language word is explicitly mapped to a target language word.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding

no code implementations • 10 Nov 2022 • Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe

Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Add Code

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

1 code implementation • 27 Oct 2022 • Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation.

named-entity-recognition Named Entity Recognition +2

7,851

Paper
Code

CTC Alignments Improve Autoregressive Translation

no code implementations • 11 Oct 2022 • Brian Yan, Siddharth Dalmia, Yosuke Higuchi, Graham Neubig, Florian Metze, Alan W Black, Shinji Watanabe

Connectionist Temporal Classification (CTC) is a widely used approach for automatic speech recognition (ASR) that performs conditionally independent monotonic alignment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Two-Pass Low Latency End-to-End Spoken Language Understanding

no code implementations • 14 Jul 2022 • Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve competitive performance to pipeline-based approaches.

speech-recognition Speech Recognition +2

Paper
Add Code

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

4 code implementations • 6 Jul 2022 • Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

Conformer has proven to be effective in many speech processing tasks.

speech-recognition Speech Recognition +1

7,851

Paper
Code

LegoNN: Building Modular Encoder-Decoder Models

no code implementations • 7 Jun 2022 • Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed

We describe LegoNN, a procedure for building encoder-decoder architectures in a way so that its parts can be applied to other tasks without the need for any fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

1 code implementation • 25 May 2022 • Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna

We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Code

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations • 29 Nov 2021 • Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding

7,851

Paper
Code

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

speech-recognition Speech Recognition

Paper
Add Code

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

1 code implementation • 27 Sep 2021 • Hirofumi Inaguma, Siddharth Dalmia, Brian Yan, Shinji Watanabe

We propose Fast-MD, a fast MD model that generates HI by non-autoregressive (NAR) decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

9,768

Paper
Code

Differentiable Allophone Graphs for Language-Universal Speech Recognition

1 code implementation • 24 Jul 2021 • Brian Yan, Siddharth Dalmia, David R. Mortensen, Florian Metze, Shinji Watanabe

These phone-based systems with learned allophone graphs can be used by linguists to document new languages, build phone-based lexicons that capture rich pronunciation variations, and re-evaluate the allophone mappings of seen language.

speech-recognition Speech Recognition

Paper
Code

ESPnet-ST IWSLT 2021 Offline Speech Translation System

no code implementations • ACL (IWSLT) 2021 • Hirofumi Inaguma, Brian Yan, Siddharth Dalmia, Pengcheng Guo, Jiatong Shi, Kevin Duh, Shinji Watanabe

This year we made various efforts on training data, architecture, and audio segmentation.

Knowledge Distillation speech-recognition +2

Paper
Add Code

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding

no code implementations • 29 Jun 2021 • Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black

Our splits identify performance gaps up to 10% between end-to-end systems that were within 1% of each other on the original test sets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks

no code implementations • NAACL 2021 • Siddharth Dalmia, Brian Yan, Vikas Raunak, Florian Metze, Shinji Watanabe

In this work, we present an end-to-end framework that exploits compositionality to learn searchable hidden representations at intermediate stages of a sequence model using decomposed sub-tasks.

speech-recognition Speech Recognition +1

Paper
Add Code

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

2 code implementations • EACL 2021 • Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard Hovy, Alan W Black

When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system.

Question Answering

Paper
Code

Transformer-Transducers for Code-Switched Speech Recognition

no code implementations • 30 Nov 2020 • Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

We live in a world where 60% of the population can speak two or more languages fluently.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

On Long-Tailed Phenomena in Neural Machine Translation

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze

State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge.

Conditional Text Generation Machine Translation +5

Paper
Code

Universal Phone Recognition with a Multilingual Allophone System

1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. black, Florian Metze

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.

speech-recognition Speech Recognition

503

Paper
Code

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

Paper
Add Code

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models

no code implementations • 9 Nov 2019 • Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer

Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.

Paper
Add Code

SANTLR: Speech Annotation Toolkit for Low Resource Languages

no code implementations • 2 Aug 2019 • Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. black, Florian Metze

In this work, we present SANTLR: Speech Annotation Toolkit for Low Resource Languages.

speech-recognition Speech Recognition

Paper
Add Code

Multilingual Speech Recognition with Corpus Relatedness Sampling

no code implementations • 2 Aug 2019 • Xinjian Li, Siddharth Dalmia, Alan W. black, Florian Metze

For example, the target corpus might benefit more from a corpus in the same domain or a corpus from a close language.

speech-recognition Speech Recognition

Paper
Add Code

Cross-Attention End-to-End ASR for Two-Party Conversations

no code implementations • 24 Jul 2019 • Suyoun Kim, Siddharth Dalmia, Florian Metze

We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information.

speech-recognition Speech Recognition +1

Paper
Add Code

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

no code implementations • ACL 2019 • Suyoun Kim, Siddharth Dalmia, Florian Metze

We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings.

Sentence Sentence Embeddings +2

Paper
Add Code

The ARIEL-CMU Systems for LoReHLT18

no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation Translation

Paper
Add Code

Phoneme Level Language Models for Sequence Based Low Resource ASR

no code implementations • 20 Feb 2019 • Siddharth Dalmia, Xinjian Li, Alan W. black, Florian Metze

Building multilingual and crosslingual models help bring different languages together in a language universal space.

Language Modelling

Paper
Add Code

Zero-shot Learning for Speech Recognition with Universal Phonetic Model

no code implementations • 27 Sep 2018 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Florian Metze, Alan W Black

Our model is able to recognize unseen phonemes in the target language, if only a small text corpus is available.

speech-recognition Speech Recognition +1

Paper
Add Code

Domain Robust Feature Extraction for Rapid Low Resource ASR Development

no code implementations • 28 Jul 2018 • Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. black

We demonstrate the effectiveness of using a pre-trained English recognizer, which is robust to such mismatched conditions, as a domain normalizing feature extractor on a low resource language.

Paper
Add Code

Epitran: Precision G2P for Many Languages

no code implementations • LREC 2018 • David R. Mortensen, Siddharth Dalmia, Patrick Littell

Entity Linking

Paper
Add Code

Sequence-based Multi-lingual Low Resource Speech Recognition

no code implementations • 21 Feb 2018 • Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. black

Techniques for multi-lingual and cross-lingual speech recognition can help in low resource scenarios, to bootstrap systems and enable analysis of new languages and domains.

speech-recognition Speech Recognition

Paper
Add Code

An Approach for Self-Training Audio Event Detectors Using Web Data

no code implementations • 20 Sep 2016 • Benjamin Elizalde, Ankit Shah, Siddharth Dalmia, Min Hun Lee, Rohan Badlani, Anurag Kumar, Bhiksha Raj, Ian Lane

The audio event detectors are trained on the labeled audio and ran on the unlabeled audio downloaded from YouTube.

Event Detection

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.