Search Results for author: Subhashini Venugopalan

Found 32 papers, 14 papers with code

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

1 code implementation 12 Jul 2024 Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan

Leveraging the breadth of expertise and ability of multimodal large language models (MLLMs) to understand figures, we employ automatic and manual curation to create the dataset.

Question Answering, Visual Question Answering (VQA)

Quantum Many-Body Physics Calculations with Large Language Models

no code implementations 5 Mar 2024 Haining Pan, Nayantara Mudur, Will Taranto, Maria Tikhanovskaya, Subhashini Venugopalan, Yasaman Bahri, Michael P. Brenner, Eun-Ah Kim

We evaluate GPT-4's performance in executing the calculation for 15 research papers from the past decade, demonstrating that, with correction of intermediate steps, it can correctly derive the final Hartree-Fock Hamiltonian in 13 cases and makes minor errors in 2 cases.

Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion

no code implementations 21 Dec 2023 Katrin Tomanek, Shanqing Cai, Subhashini Venugopalan

Abbreviation expansion is a strategy used to speed up communication by limiting the amount of typing and using a language model to suggest expansions.

Language Modelling, Retrieval

Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

no code implementations 3 Dec 2023 Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan, Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M. Gleason, Philip Q. Nelson, Michael P. Brenner

A pilot study with 19 non-AAC participants typing on a mobile device by hand demonstrated gains in motor savings in line with the offline simulation, while introducing relatively small effects on overall typing speed.

Speech Intelligibility Classifiers from 550k Disordered Speech Samples

no code implementations 13 Mar 2023 Subhashini Venugopalan, Jimmy Tobin, Samuel J. Yang, Katie Seaver, Richard J. N. Cave, Pan-Pan Jiang, Neil Zeghidour, Rus Heywood, Jordan Green, Michael P. Brenner

We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale.

Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings

no code implementations 10 Mar 2023 Joel Shor, Ruyue Agnes Bi, Subhashini Venugopalan, Steven Ibara, Roman Goldenberg, Ehud Rivlin

We demonstrate that this metric more closely aligns with clinician preferences on medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.), sometimes by wide margins.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +1

Is Attention All That NeRF Needs?

1 code implementation 27 Jul 2022 Mukund Varma T, Peihao Wang, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages.

Generalizable Novel View Synthesis, Inductive Bias +1

Context-Aware Abbreviation Expansion Using Large Language Models

no code implementations NAACL 2022 Shanqing Cai, Subhashini Venugopalan, Katrin Tomanek, Ajit Narayanan, Meredith Ringel Morris, Michael P. Brenner

Motivated by the need for accelerating text entry in augmentative and alternative communication (AAC) for people with severe motor impairments, we propose a paradigm in which phrases are abbreviated aggressively as primarily word-initial letters.

TRILLsson: Distilled Universal Paralinguistic Speech Representations

no code implementations 1 Mar 2022 Joel Shor, Subhashini Venugopalan

Our largest distilled model is less than 15% the size of the original model (314MB vs 2.2GB), achieves over 96% the accuracy on 6 of 7 tasks, and is trained on 6.5% the data.

Emotion Recognition, Knowledge Distillation

Using a Cross-Task Grid of Linear Probes to Interpret CNN Model Predictions On Retinal Images

no code implementations 23 Jul 2021 Katy Blumer, Subhashini Venugopalan, Michael P. Brenner, Jon Kleinberg

We find that some target tasks are easily predicted irrespective of the source task, and that some other target tasks are more accurately predicted from correlated source tasks than from embeddings trained on the same task.


Guided Integrated Gradients: An Adaptive Path Method for Removing Noise

1 code implementation CVPR 2021 Andrei Kapishnikov, Subhashini Venugopalan, Besim Avci, Ben Wedin, Michael Terry, Tolga Bolukbasi

To minimize the effect of this source of noise, we propose adapting the attribution path itself -- conditioning the path not just on the image but also on the model being explained.

Scaling Symbolic Methods using Gradients for Neural Model Explanation

2 code implementations ICLR 2021 Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for model explanation.

Attribution in Scale and Space

1 code implementation CVPR 2020 Shawn Xu, Subhashini Venugopalan, Mukund Sundararajan

Third, it eliminates the need for a 'baseline' parameter for Integrated Gradients [31] for perception tasks.

Attribute, Object Recognition

Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions

no code implementations30 Aug 2016 Ronghang Hu, Marcus Rohrbach, Subhashini Venugopalan, Trevor Darrell

Image segmentation from referring expressions is a joint vision and language modeling task, where the input is an image and a textual expression describing a particular region in the image; and the goal is to localize and segment the specific image region based on the given expression.

Image Captioning, Image Segmentation +3

Captioning Images with Diverse Objects

1 code implementation CVPR 2017 Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, Kate Saenko

We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets.

Object, Object Recognition

Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text

3 code implementations EMNLP 2016 Subhashini Venugopalan, Lisa Anne Hendricks, Raymond Mooney, Kate Saenko

This paper investigates how linguistic knowledge mined from large text corpora can aid the generation of natural language descriptions of videos.

Descriptive, Language Modelling +1

Sequence to Sequence - Video to Text

no code implementations ICCV 2015 Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Caption Generation, Language Modelling +1

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

1 code implementation CVPR 2016 Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell

Current deep caption models can only describe objects contained in paired image-sentence corpora, despite the fact that they are pre-trained with large object recognition datasets, namely ImageNet.

Image Captioning, Novel Concepts +3

A Multi-scale Multiple Instance Video Description Network

no code implementations 21 May 2015 Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko

Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (AlexNet, GoogLeNet) to extract a visual representation of the input video.

Image Segmentation, Multiple Instance Learning +3

Sequence to Sequence -- Video to Text

4 code implementations 3 May 2015 Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Our LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames to a sequence of words in order to generate a description of the event in the video clip.

Caption Generation, Language Modelling +1

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

7 code implementations CVPR 2015 Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

Retrieval, Video Recognition
