Search Results for author: Shalini Ghosh

Found 32 papers, 3 papers with code

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

no code implementations 28 Mar 2024 Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh

Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

no code implementations 5 Jan 2024 Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.

In-Context Learning intent-classification +6

Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition

1 code implementation 4 Jan 2024 David M. Chan, Shalini Ghosh, Hitesh Tulsiani, Ariya Rastrow, Björn Hoffmeister

We demonstrate that our CLC family of approaches can improve the performance of ASR models on OD3, a new public large-scale semi-synthetic meta-dataset of audio task-oriented dialogues, by up to 19.2%.

Attribute Automatic Speech Recognition +4

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

no code implementations 23 Dec 2023 Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko

Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.

Attribute Language Modelling +4

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations 22 Dec 2023 Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

JAB: Joint Adversarial Prompting and Belief Augmentation

no code implementations 16 Nov 2023 Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Jwala Dhamala, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

With the recent surge of language models in different applications, attention to safety and robustness of these models has gained significant importance.

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

no code implementations 27 Sep 2023 Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.

Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)

In-Context Learning speech-recognition +1

FLIRT: Feedback Loop In-context Red Teaming

no code implementations 8 Aug 2023 Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation.

In-Context Learning Response Generation

Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data

no code implementations 4 Apr 2023 Vladislav Lialin, Stephen Rawls, David Chan, Shalini Ghosh, Anna Rumshisky, Wael Hamza

Currently popular video-text data mining approach via automatic speech recognition (ASR) used in HowTo100M provides low-quality captions that often do not refer to the video content.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations 6 Jan 2023 David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation speech-recognition +1

Disentangled Action Recognition with Knowledge Bases

no code implementations NAACL 2022 Zhekun Luo, Shalini Ghosh, Devin Guillory, Keizo Kato, Trevor Darrell, Huijuan Xu

In this paper, we aim to improve the generalization ability of the compositional action recognition model to novel verbs or novel nouns that are unseen during training time, by leveraging the power of knowledge graphs.

Action Recognition Knowledge Graphs

Content-Context Factorized Representations for Automated Speech Recognition

no code implementations 19 May 2022 David M. Chan, Shalini Ghosh

Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames.

speech-recognition Speech Recognition

Unified Modeling of Multi-Domain Multi-Device ASR Systems

no code implementations 13 May 2022 Soumyajit Mitra, Swayambhu Nath Ray, Bharat Padi, Arunasish Sen, Raghavendra Bilgi, Harish Arsikere, Shalini Ghosh, Ajay Srinivasamurthy, Sri Garimella

Modern Automatic Speech Recognition (ASR) systems often use a portfolio of domain-specific models in order to get high accuracy for distinct user utterance types across different devices.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Modal Pre-Training for Automated Speech Recognition

no code implementations 12 Oct 2021 David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance.

Language Modelling Masked Language Modeling +3

Hierarchical Class-Based Curriculum Loss

no code implementations 5 Jun 2020 Palash Goyal, Shalini Ghosh

We theoretically show that the proposed loss function is a tighter bound of 0-1 loss compared to any other loss satisfying the hierarchical constraints.

Cross-modal Learning for Multi-modal Video Categorization

no code implementations 7 Mar 2020 Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Multi-modal machine learning (ML) models can process data in multiple modalities (e.g., video, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding, activity recognition).

Activity Recognition object-detection +2

Exploiting Temporal Coherence for Multi-modal Video Categorization

no code implementations 7 Feb 2020 Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding).

object-detection Object Detection +1

RILOD: Near Real-Time Incremental Learning for Object Detection at the Edge

no code implementations 26 Mar 2019 Dawei Li, Serafettin Tasci, Shalini Ghosh, Jingwen Zhu, Junting Zhang, Larry Heck

The key component of RILOD is a novel incremental learning algorithm that trains end-to-end for one-stage deep object detection models only using training data of new object classes.

Incremental Learning Object +2

Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML

no code implementations 20 Mar 2019 Jie Zhang, Junting Zhang, Shalini Ghosh, Dawei Li, Jingwen Zhu, Heming Zhang, Yalin Wang

Lifelong learning, the problem of continual learning where tasks arrive in sequence, has been lately attracting more attention in the computer vision community.

AutoML Continual Learning

Class-incremental Learning via Deep Model Consolidation

2 code implementations 19 Mar 2019 Junting Zhang, Jie Zhang, Shalini Ghosh, Dawei Li, Serafettin Tasci, Larry Heck, Heming Zhang, C.-C. Jay Kuo

The idea is to first train a separate model only for the new classes, and then combine the two individual models trained on data of two distinct set of classes (old classes and new classes) via a novel double distillation training objective.

Class Incremental Learning Image Classification +3

Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation

no code implementations 26 Feb 2019 Heming Zhang, Shalini Ghosh, Larry Heck, Stephen Walsh, Junting Zhang, Jie Zhang, C.-C. Jay Kuo

The key challenge of generative Visual Dialogue (VD) systems is to respond to human queries with informative answers in natural and contiguous conversation flow.

Visual Dialog

Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention

no code implementations 15 Feb 2019 Shalini Ghosh, Giedrius Burachas, Arijit Ray, Avi Ziskind

In this paper, we present a novel approach for the task of eXplainable Question Answering (XQA), i.e., generating natural language (NL) explanations for the Visual Question Answering (VQA) problem.

Explanation Generation Language Modelling +2

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

no code implementations ICCV 2019 Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh

Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.

Image Captioning Question Answering +2

MICIK: MIning Cross-Layer Inherent Similarity Knowledge for Deep Model Compression

no code implementations 3 Feb 2019 Jie Zhang, Xiaolong Wang, Dawei Li, Shalini Ghosh, Abhishek Kolagunda, Yalin Wang

State-of-the-art deep model compression methods exploit the low-rank approximation and sparsity pruning to remove redundant parameters from a learned hidden layer.

Knowledge Distillation Model Compression

Data Masking with Privacy Guarantees

no code implementations 8 Jan 2019 Anh T. Pham, Shalini Ghosh, Vinod Yegneswaran

In particular, we propose a method of masking the private data with privacy guarantee while ensuring that a classifier trained on the masked data is similar to the classifier trained on the original data, to maintain usability.

Time Series Deinterleaving of DNS Traffic

no code implementations 16 Jul 2018 Amir Asiaee, Hardik Goel, Shalini Ghosh, Vinod Yegneswaran, Arindam Banerjee

Stream deinterleaving is an important problem with various applications in the cybersecurity domain.

BIG-bench Machine Learning Time Series +1

Trusted Neural Networks for Safety-Constrained Autonomous Control

no code implementations 18 May 2018 Shalini Ghosh, Amaury Mercier, Dheeraj Pichapati, Susmit Jha, Vinod Yegneswaran, Patrick Lincoln

Experiments using our first approach of a multi-headed TNN model, on a dataset generated by a customized version of TORCS, show that (1) adding safety constraints to a neural network model results in increased performance and safety, and (2) the improvement increases with increasing importance of the safety constraints.

Self-Driving Cars

A Unified Framework for Domain Adaptation using Metric Learning on Manifolds

1 code implementation 28 Apr 2018 Sridhar Mahadevan, Bamdev Mishra, Shalini Ghosh

We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds.

Domain Adaptation Metric Learning +1

Contextual LSTM (CLSTM) models for Large scale NLP tasks

no code implementations 19 Feb 2016 Shalini Ghosh, Oriol Vinyals, Brian Strope, Scott Roy, Tom Dean, Larry Heck

We evaluate CLSTM on three specific NLP tasks: word prediction, next sentence selection, and sentence topic prediction.

Paraphrase Generation Question Answering +2

Virus Detection in Multiplexed Nanowire Arrays using Hidden Semi-Markov models

no code implementations 16 Jul 2014 Shalini Ghosh, Patrick Lincoln, Christian Petersen, Alfonso Valdes

In this paper, we address the problem of real-time detection of viruses docking to nanowires, especially when multiple viruses dock to the same nano-wire.

ARSENAL: Automatic Requirements Specification Extraction from Natural Language

no code implementations 13 Mar 2014 Shalini Ghosh, Daniel Elenius, Wenchao Li, Patrick Lincoln, Natarajan Shankar, Wilfried Steiner

Requirements are informal and semi-formal descriptions of the expected behavior of a complex system from the viewpoints of its stakeholders (customers, users, operators, designers, and engineers).
