no code implementations • LREC 2022 • Siddhant Arora, Henry Hosseini, Christine Utz, Vinayshekhar Bannihatti Kumar, Tristan Dhellemmes, Abhilasha Ravichander, Peter Story, Jasmine Mangat, Rex Chen, Martin Degeling, Thomas Norton, Thomas Hupperich, Shomir Wilson, Norman Sadeh
Over the past decade, researchers have started to explore the use of NLP to develop tools aimed at helping the public, vendors, and regulators analyze disclosures made in privacy policies.
1 code implementation • 25 Feb 2024 • Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro
We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text.
no code implementations • 30 Jan 2024 • Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe
In this work, we aim to improve the performance and efficiency of OWSM without extra training data.
no code implementations • 15 Dec 2023 • Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe
While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words of unusual pronunciations.
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 25 Sep 2023 • Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
Pre-training speech models on large volumes of data has achieved remarkable success.
no code implementations • 19 Sep 2023 • Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury
Non-autoregressive (NAR) modeling has gained significant interest in speech processing since these models achieve dramatically lower inference time than autoregressive (AR) models while also achieving good transcription accuracy.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee
To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.
no code implementations • 16 Sep 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
Because the decoder architecture is the same as an autoregressive LM, it is simple to enhance the model by leveraging external text data with LM training.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 24 Jul 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe
Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning.
no code implementations • 20 Jul 2023 • Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework.
no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
End-to-end speech summarization has been shown to improve performance over cascade baselines.
no code implementations • 2 Jun 2023 • Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
We reduce the model size by applying tensor decomposition to the Conformer and E-Branchformer architectures used in our E2E SLU models.
2 code implementations • 18 May 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 2 May 2023 • Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe
In the track, we adopt a pipeline approach of ASR and NLU.
no code implementations • 2 May 2023 • Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Recently there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), like semantic parsing.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 1 May 2023 • Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe
Most human interactions occur in the form of spoken conversations where the semantic meaning of a given utterance depends on the context.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
1 code implementation • 16 Nov 2022 • Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe
In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection, which work in a streaming manner.
no code implementations • 10 Nov 2022 • Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive and time-consuming.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +6
no code implementations • 29 Oct 2022 • Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe
This paper presents BERT-CTC, a novel formulation of end-to-end speech recognition that adapts BERT for connectionist temporal classification (CTC).
1 code implementation • 27 Oct 2022 • Siddhant Arora, Siddharth Dalmia, Brian Yan, Florian Metze, Alan W Black, Shinji Watanabe
End-to-end spoken language understanding (SLU) systems are gaining popularity over cascaded approaches due to their simplicity and ability to avoid error propagation.
no code implementations • 14 Jul 2022 • Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe
End-to-end (E2E) models are becoming increasingly popular for spoken language understanding (SLU) systems and are beginning to achieve competitive performance to pipeline-based approaches.
no code implementations • 28 Jun 2022 • Sonu Gupta, Ellen Poplavska, Nora O'Toole, Siddhant Arora, Thomas Norton, Norman Sadeh, Shomir Wilson
To examine the status and evolution of this patchwork, we introduce the Government Privacy Instructions Corpus, or GPI Corpus, of 1, 043 privacy laws, regulations, and guidelines, covering 182 jurisdictions.
no code implementations • 19 Apr 2022 • Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora
Although Transformers have gained success in several speech processing tasks like spoken language understanding (SLU) and speech translation (ST), achieving online processing while keeping competitive performance is still essential for real-world interaction.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 17 Dec 2021 • Siddhant Arora, Danish Pruthi, Norman Sadeh, William W. Cohen, Zachary C. Lipton, Graham Neubig
Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
2 code implementations • 29 Nov 2021 • Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe
However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.
no code implementations • 29 Jun 2021 • Siddhant Arora, Alissa Ostapenko, Vijay Viswanathan, Siddharth Dalmia, Florian Metze, Shinji Watanabe, Alan W Black
Our splits identify performance gaps up to 10% between end-to-end systems that were within 1% of each other on the original test sets.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 30 Apr 2021 • Siddhant Arora, Vinayak Gupta, Garima Gaur, Srikanta Bedathur
In this paper, we address the problem of learning low dimension representation of entities on relational databases consisting of multiple tables.
no code implementations • 24 Jul 2020 • Siddhant Arora
Knowledge Graphs are increasingly becoming popular for a variety of downstream tasks like Question Answering and Information Retrieval.
no code implementations • AKBC 2020 • Siddhant Arora, Srikanta Bedathur, Maya Ramanath, Deepak Sharma
Knowledge Graphs (KGs) extracted from text sources are often noisy and lead to poor performance in downstream application tasks such as KG-based question answering. While much of the recent activity is focused on addressing the sparsity of KGs by using embeddings for inferring new facts, the issue of cleaning up of noise in KGs through KG refinement task is not as actively studied.
no code implementations • 13 May 2020 • Siddhant Arora, Srikanta Bedathur
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
no code implementations • 11 Apr 2019 • Siddhant Arora, Andrew Yates
We consider algorithm selection in the context of ad-hoc information retrieval.