Search Results for author: Siddhant Arora

Found 37 papers, 8 papers with code

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting

no code implementations18 Jun 2024 Yosuke Kashiwagi, Hayato Futami, Emiru Tsunoo, Siddhant Arora, Shinji Watanabe

End-to-end multilingual speech recognition models handle multiple languages through a single model, often incorporating language identification to automatically detect the language of incoming speech.

Decoder Language Identification +2

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

no code implementations14 Jun 2024 Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

To answer this, we perform an extensive evaluation of multiple supervised and self-supervised SFMs using several evaluation protocols: (i) frozen SFMs with a lightweight prediction head, (ii) frozen SFMs with a complex prediction head, and (iii) fine-tuned SFMs with a lightweight prediction head.

Benchmarking Speech Recognition +2

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

no code implementations15 Dec 2023 Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words with unusual pronunciations.

Speech Recognition

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

no code implementations4 Oct 2023 Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.

Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Automatic Speech Recognition (ASR) +3

Semi-Autoregressive Streaming ASR With Label Context

no code implementations19 Sep 2023 Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury

Non-autoregressive (NAR) modeling has gained significant interest in speech processing because these models achieve dramatically lower inference time than autoregressive (AR) models while maintaining good transcription accuracy.

Automatic Speech Recognition (ASR) +2

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation18 Sep 2023 Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

no code implementations16 Sep 2023 Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Because the decoder architecture is the same as that of an autoregressive LM, it is simple to enhance the model by leveraging external text data through LM training.

Automatic Speech Recognition (ASR) +4

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

no code implementations24 Jul 2023 Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Although frame-based models such as CTC and transducers are well suited to streaming automatic speech recognition, their decoding uses no future knowledge, which can lead to incorrect pruning.

Automatic Speech Recognition Decoder +2

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

no code implementations20 Jul 2023 Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

There has been increased interest in integrating pretrained automatic speech recognition (ASR) models and language models (LMs) into the SLU framework.

Speech Recognition +1

BASS: Block-wise Adaptation for Speech Summarization

no code implementations17 Jul 2023 Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

2 code implementations18 May 2023 Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU).

Automatic Speech Recognition (ASR) +2

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

no code implementations1 May 2023 Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe

Most human interactions occur in the form of spoken conversations where the semantic meaning of a given utterance depends on the context.

Spoken Language Understanding

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

no code implementations20 Dec 2022 Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

Dialog Act Classification Question Answering +4

Streaming Joint Speech Recognition and Disfluency Detection

1 code implementation16 Nov 2022 Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe

In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection, which work in a streaming manner.

Decoder Language Modelling +2

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models

1 code implementation27 Oct 2022 Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe

End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation.

Named Entity Recognition +2

Two-Pass Low Latency End-to-End Spoken Language Understanding

no code implementations14 Jul 2022 Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe

End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve performance competitive with pipeline-based approaches.

Speech Recognition +2

Creation and Analysis of an International Corpus of Privacy Laws

no code implementations28 Jun 2022 Sonu Gupta, Ellen Poplavska, Nora O'Toole, Siddhant Arora, Thomas Norton, Norman Sadeh, Shomir Wilson

To examine the status and evolution of this patchwork, we introduce the Government Privacy Instructions Corpus, or GPI Corpus, of 1,043 privacy laws, regulations, and guidelines, covering 182 jurisdictions.

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

no code implementations19 Apr 2022 Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

Although Transformers have achieved success in several speech processing tasks, such as spoken language understanding (SLU) and speech translation (ST), achieving online processing while maintaining competitive performance is still essential for real-world interaction.

Automatic Speech Recognition (ASR) +3

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

1 code implementation17 Dec 2021 Siddhant Arora, Danish Pruthi, Norman Sadeh, William W. Cohen, Zachary C. Lipton, Graham Neubig

Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.

Deception Detection

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations29 Nov 2021 Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding

BERT Meets Relational DB: Contextual Representations of Relational Databases

no code implementations30 Apr 2021 Siddhant Arora, Vinayak Gupta, Garima Gaur, Srikanta Bedathur

In this paper, we address the problem of learning low-dimensional representations of entities in relational databases consisting of multiple tables.

Representation Learning

A Survey on Graph Neural Networks for Knowledge Graph Completion

no code implementations24 Jul 2020 Siddhant Arora

Knowledge Graphs are becoming increasingly popular for a variety of downstream tasks, such as Question Answering and Information Retrieval.

Information Retrieval Knowledge Base Completion +2

IterefinE: Iterative KG Refinement Embeddings using Symbolic Knowledge

no code implementations AKBC 2020 Siddhant Arora, Srikanta Bedathur, Maya Ramanath, Deepak Sharma

Knowledge Graphs (KGs) extracted from text sources are often noisy and lead to poor performance in downstream applications such as KG-based question answering. While much recent activity focuses on addressing the sparsity of KGs by using embeddings to infer new facts, the problem of cleaning up noise in KGs through the KG refinement task is less actively studied.

Knowledge Graphs

On Embeddings in Relational Databases

no code implementations13 May 2020 Siddhant Arora, Srikanta Bedathur

We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
