1 code implementation • SIGDIAL (ACL) 2021 • Satwik Kottur, Chinnadhurai Sankar, Zhou Yu, Alborz Geramifard
Real-world conversational agents must effectively handle long conversations that span multiple contexts.
no code implementations • SIGDIAL (ACL) 2021 • Satwik Kottur, Paul Crook, Seungwhan Moon, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard
There is a growing interest in virtual assistants with multimodal capabilities, e. g., inferring the context of a conversation through scene understanding.
1 code implementation • 23 May 2023 • Hyundong Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Raghavi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar
Dialogue systems are frequently updated to accommodate new services, but naively updating them by continually training with data for new services in diminishing performance on previously learnt services.
1 code implementation • CVPR 2023 • Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yasher Mehdad, Mohit Bansal
Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel moment segmentation and step captioning tasks.
1 code implementation • 15 Nov 2022 • Seungwhan Moon, Satwik Kottur, Alborz Geramifard, Babak Damavandi
Recent years have seen an increasing trend in the volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections.
no code implementations • 8 Nov 2022 • Satwik Kottur, Seungwhan Moon, Aram H. Markosyan, Hardik Shah, Babak Damavandi, Alborz Geramifard
We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection.
no code implementations • NAACL 2022 • Kun Qian, Ahmad Beirami, Satwik Kottur, Shahin Shayandeh, Paul Crook, Alborz Geramifard, Zhou Yu, Chinnadhurai Sankar
We find that training on our augmented dialog data improves the model's ability to deal with ambiguous scenarios, without sacrificing performance on unmodified turns.
1 code implementation • 21 Oct 2021 • Tianjian Huang, Shaunak Halbe, Chinnadhurai Sankar, Pooyan Amini, Satwik Kottur, Alborz Geramifard, Meisam Razaviyayn, Ahmad Beirami
Our experiments show that DAIR consistently outperforms ERM and DA-ERM with little marginal computational cost and sets new state-of-the-art results in several benchmarks involving covariant data augmentation.
Ranked #1 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.2
Multi-domain Dialogue State Tracking Visual Question Answering
8 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
1 code implementation • EMNLP 2021 • Satwik Kottur, Seungwhan Moon, Alborz Geramifard, Babak Damavandi
Next generation task-oriented dialog systems need to understand conversational contexts with their perceived surroundings, to effectively help users in the real-world multimodal environment.
1 code implementation • ACL 2021 • Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur
A video-grounded dialogue system is required to understand both dialogue, which contains semantic dependencies from turn to turn, and video, which contains visual cues of spatial and temporal scene variations.
no code implementations • 12 Nov 2020 • Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, Carla Gordon, Seyed Hossein Alavi, David Traum, Maxine Eskenazi, Ahmad Beirami, Eunjoon, Cho, Paul A. Crook, Ankita De, Alborz Geramifard, Satwik Kottur, Seungwhan Moon, Shivani Poddar, Rajen Subba
Interactive evaluation of dialog, and 4.
2 code implementations • COLING 2020 • Seungwhan Moon, Satwik Kottur, Paul A. Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard
Next generation virtual assistants are envisioned to handle multimodal inputs (e. g., vision, memories of previous interactions, in addition to the user's utterances), and perform multimodal actions (e. g., displaying a route in addition to generating the system's utterance).
1 code implementation • 4 Mar 2020 • Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur
Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task.
1 code implementation • NAACL 2019 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.
1 code implementation • ECCV 2018 • Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Visual dialog entails answering a series of questions grounded in an image, using dialog history as context.
Ranked #1 on Common Sense Reasoning on Visual Dialog v0.9
1 code implementation • EMNLP 2017 • Satwik Kottur, Jos{\'e} Moura, Stefan Lee, Dhruv Batra
A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, learned without any human supervision!
no code implementations • ICML 2017 • Manzil Zaheer, Satwik Kottur, Amr Ahmed, José Moura, Alex Smola
In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space.
3 code implementations • 26 Jun 2017 • Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, all learned without any human supervision!
7 code implementations • ICCV 2017 • Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.
6 code implementations • NeurIPS 2017 • Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola
Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong.
11 code implementations • CVPR 2017 • Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.
Ranked #15 on Visual Dialog on VisDial v0.9 val
1 code implementation • CVPR 2016 • Satwik Kottur, Ramakrishna Vedantam, José M. F. Moura, Devi Parikh
While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world.