Search Results for author: Arushi Goel

Found 15 papers, 5 papers with code

Audio Dialogues: Dialogues dataset for audio and music understanding

no code implementations 11 Apr 2024 Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Existing datasets for audio understanding primarily focus on single-turn interactions (i.e., audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue.

Audio captioning Audio Question Answering +3

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

no code implementations 2 Feb 2024 Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Few-Shot Learning In-Context Learning +2

Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter

1 code implementation 9 Nov 2023 Georgios Tziafas, Yucheng Xu, Arushi Goel, Mohammadreza Kasaei, Zhibin Li, Hamidreza Kasaei

To address these limitations, we develop a challenging benchmark based on cluttered indoor scenes from the OCID dataset, for which we generate referring expressions and connect them with 4-DoF grasp poses.

Object Visual Grounding

Semi-supervised multimodal coreference resolution in image narrations

1 code implementation 20 Oct 2023 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image.

coreference-resolution Descriptive

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

1 code implementation ICCV 2023 Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset.

Question Answering Retrieval +1

Who are you referring to? Coreference resolution in image narrations

no code implementations ICCV 2023 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Coreference resolution aims to identify words and phrases which refer to the same entity in a text, a core task in natural language processing.

coreference-resolution

WiCV 2022: The Tenth Women In Computer Vision Workshop

no code implementations 24 Aug 2022 Doris Antensteiner, Silvia Bucci, Arushi Goel, Marah Halawa, Niveditha Kalavakonda, Tejaswi Kasarla, Miaomiao Liu, Nermin Samet, Ivaxi Sheth

In this paper, we present the details of the Women in Computer Vision Workshop - WiCV 2022, organized alongside the hybrid CVPR 2022 in New Orleans, Louisiana.

WiCV 2021: The Eighth Women In Computer Vision Workshop

no code implementations 11 Mar 2022 Arushi Goel, Niveditha Kalavakonda, Nour Karessli, Tejaswi Kasarla, Kathryn Leonard, Boyi Li, Nermin Samet, Ghada Zamzmi

In this paper, we present the details of the Women in Computer Vision Workshop - WiCV 2021, organized alongside the virtual CVPR 2021.

PARS: Pseudo-Label Aware Robust Sample Selection for Learning with Noisy Labels

no code implementations 26 Jan 2022 Arushi Goel, Yunlong Jiao, Jordan Massiah

In this paper, we propose PARS: Pseudo-Label Aware Robust Sample Selection, a hybrid approach that combines the best from all three worlds in a joint-training framework to achieve robustness to noisy labels.

Learning with noisy labels Pseudo Label

Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation

no code implementations CVPR 2022 Arushi Goel, Basura Fernando, Frank Keller, Hakan Bilen

Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding.

Graph Generation Informativeness +2

Injecting Prior Knowledge into Image Caption Generation

no code implementations 22 Nov 2019 Arushi Goel, Basura Fernando, Thanh-Son Nguyen, Hakan Bilen

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the visual and textual signals and the correlations between them.

Caption Generation Image Captioning

Cross-Domain Image Classification through Neural-Style Transfer Data Augmentation

1 code implementation 12 Oct 2019 Yijie Xu, Arushi Goel

In particular, the lack of sufficient amounts of domain-specific data can reduce the accuracy of a classifier.

Classification Data Augmentation +3

An End-to-End Network for Generating Social Relationship Graphs

no code implementations CVPR 2019 Arushi Goel, Keng Teck Ma, Cheston Tan

Inferring the social context in a given visual scene not only involves recognizing objects, but also demands a more in-depth understanding of the relationships and attributes of the people involved.

Attribute Graph Generation +1

A Multimodal LSTM for Predicting Listener Empathic Responses Over Time

1 code implementation 12 Dec 2018 Zhi-Xuan Tan, Arushi Goel, Thanh-Son Nguyen, Desmond C. Ong

People naturally understand the emotions of, and often also empathize with, those around them.
