Search Results for author: Babak Damavandi

Found 8 papers, 4 papers with code

SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

no code implementations • 7 Mar 2024 • Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) it encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; and (2) it features QA pairs that require extensive knowledge for accurate responses.

Question Answering · Retrieval · +1

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

no code implementations • 27 Sep 2023 • Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e., text, image, video, audio, and IMU motion sensor data) and generates textual responses.

Language Modelling · Video Question Answering
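The AnyMAL abstract above describes aligning multiple frozen modality encoders to a single language model. Here is a minimal sketch of that general idea, assuming a learned projection that turns encoder features into soft prompt tokens for a frozen LLM; the class name, dimensions, and token count are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the "any-modality" idea described above: each modality
# encoder's output is mapped into the token-embedding space of a frozen
# language model, so non-text inputs become soft prompt tokens. All names
# and sizes here are illustrative, not from the AnyMAL codebase.
import torch
import torch.nn as nn

class ModalityAdapter(nn.Module):
    """Projects a frozen modality encoder's output into the LLM embedding space."""
    def __init__(self, enc_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens = num_tokens
        # One linear projection producing `num_tokens` soft tokens per input.
        self.proj = nn.Linear(enc_dim, llm_dim * num_tokens)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, enc_dim) -> (batch, num_tokens, llm_dim)
        b = enc_out.size(0)
        return self.proj(enc_out).view(b, self.num_tokens, -1)

# Usage: prepend the projected modality tokens to the embedded text prompt,
# then run the (frozen) LLM on the concatenated sequence.
adapter = ModalityAdapter(enc_dim=1024, llm_dim=4096)
image_feat = torch.randn(2, 1024)    # e.g. pooled output of a frozen image encoder
text_emb = torch.randn(2, 16, 4096)  # embedded text prompt tokens
llm_input = torch.cat([adapter(image_feat), text_emb], dim=1)  # (2, 24, 4096)
```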

Navigating Connected Memories with a Task-oriented Dialog System

1 code implementation • 15 Nov 2022 • Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi

Recent years have seen a steady increase in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections.

Retrieval

Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation

no code implementations • 8 Nov 2022 • Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard

We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection.

Benchmarking · Retrieval

IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

1 code implementation • 26 Oct 2022 • Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text, by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP).

Activity Recognition · Contrastive Learning · +1
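The IMU2CLIP snippet above describes projecting IMU recordings into CLIP's joint video-text embedding space. A minimal sketch of the underlying contrastive objective follows, assuming a CLIP-style symmetric InfoNCE loss between IMU embeddings and frozen CLIP embeddings of the time-aligned video; the function name, dimensions, and temperature are illustrative, not taken from the paper's code.

```python
# Sketch of the contrastive alignment described above: an IMU encoder is
# trained so its outputs match the CLIP embeddings of the time-aligned
# video/text, using a symmetric InfoNCE loss as in CLIP. The shapes and
# temperature here are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F

def clip_style_loss(imu_emb: torch.Tensor, clip_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over cosine-similarity logits.

    imu_emb, clip_emb: (batch, dim); row i of each comes from the same clip.
    """
    imu_emb = F.normalize(imu_emb, dim=-1)
    clip_emb = F.normalize(clip_emb, dim=-1)
    logits = imu_emb @ clip_emb.t() / temperature   # (batch, batch)
    targets = torch.arange(logits.size(0))          # positives on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example: a batch of 32 IMU-window embeddings vs. the frozen CLIP
# embeddings of the corresponding video segments (same dimensionality).
loss = clip_style_loss(torch.randn(32, 512), torch.randn(32, 512))
```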

Connecting What to Say With Where to Look by Modeling Human Attention Traces

1 code implementation • CVPR 2021 • Zihang Meng, Licheng Yu, Ning Zhang, Tamara Berg, Babak Damavandi, Vikas Singh, Amy Bearman

Learning the grounding of each word is challenging, due to noise in the human-provided traces and the presence of words that cannot be meaningfully visually grounded.

Image Captioning · Visual Grounding

SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

1 code implementation • EMNLP 2021 • Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi

Next-generation task-oriented dialog systems need to understand conversational context together with their perceived surroundings to effectively help users in real-world multimodal environments.

Language Modelling

NN-grams: Unifying neural network and n-gram language models for Speech Recognition

no code implementations • 23 Jun 2016 • Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

The model is trained using noise contrastive estimation (NCE), an approach that transforms the estimation problem of neural networks into one of binary classification between data samples and noise samples.

Binary Classification · Language Modelling · +3
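The NN-grams snippet above explains that NCE turns language-model estimation into binary classification between data samples and noise samples. Below is a minimal, generic sketch of that objective, assuming k noise words per true word drawn from a known noise distribution q; the function and argument names are hypothetical, and this is not the NN-grams implementation.

```python
# Generic NCE objective as described above: the model's unnormalized
# log-score s(w) is trained to separate true next-words (label 1) from
# k words sampled from a noise distribution q (label 0), via the
# posterior logit s(w) - log(k * q(w)).
import math
import torch
import torch.nn.functional as F

def nce_loss(data_logscore: torch.Tensor,   # (batch,)   s(w) for true words
             noise_logscore: torch.Tensor,  # (batch, k) s(w) for noise words
             data_logq: torch.Tensor,       # (batch,)   log q(w) of true words
             noise_logq: torch.Tensor,      # (batch, k) log q(w) of noise words
             k: int) -> torch.Tensor:
    log_k = math.log(k)
    data_logits = data_logscore - data_logq - log_k
    noise_logits = noise_logscore - noise_logq - log_k
    # True words should be classified as "data", noise samples as "noise";
    # the noise term is scaled by k so it sums over the k samples per word.
    return (F.binary_cross_entropy_with_logits(
                data_logits, torch.ones_like(data_logits))
            + k * F.binary_cross_entropy_with_logits(
                noise_logits, torch.zeros_like(noise_logits)))

# Shape-only example with k = 5 noise samples per position:
b, k = 16, 5
loss = nce_loss(torch.randn(b), torch.randn(b, k),
                torch.full((b,), -8.0), torch.full((b, k), -8.0), k)
```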
