1 code implementation • 1 Dec 2023 • Hammad A. Ayyubi, Tianqi Liu, Arsha Nagrani, Xudong Lin, Mingda Zhang, Anurag Arnab, Feng Han, Yukun Zhu, Jialu Liu, Shih-Fu Chang
We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task.
1 code implementation • 27 May 2023 • Hammad A. Ayyubi, Rahul Lokesh, Alireza Zareian, Bo Wu, Shih-Fu Chang
The difficulty is progressively increased with each new phase by adding one more concept per caption.
2 code implementations • 24 May 2023 • Haoxuan You, Rui Sun, Zhecan Wang, Long Chen, Gengyu Wang, Hammad A. Ayyubi, Kai-Wei Chang, Shih-Fu Chang
Specifically, IdealGPT utilizes an LLM to generate sub-questions, a VLM to provide corresponding sub-answers, and another LLM to reason to achieve the final answer.
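The three-role flow described in this snippet can be sketched roughly as below. This is a minimal illustration, not IdealGPT's actual code: the function `answer_by_decomposition` and the three callables `ask_llm`, `answer_vlm`, and `reason_llm` are hypothetical stand-ins for the sub-question LLM, the VLM, and the reasoning LLM, and the sketch assumes a single pass.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
#   ask_llm(question)             -> list of sub-questions
#   answer_vlm(image, sub_q)      -> sub-answer grounded in the image
#   reason_llm(question, pairs)   -> final answer from (sub-question, sub-answer) pairs
SubQA = Tuple[str, str]


def answer_by_decomposition(
    question: str,
    image: str,
    ask_llm: Callable[[str], List[str]],
    answer_vlm: Callable[[str, str], str],
    reason_llm: Callable[[str, List[SubQA]], str],
) -> str:
    # Step 1: an LLM breaks the main question into sub-questions.
    sub_questions = ask_llm(question)
    # Step 2: a VLM answers each sub-question against the image.
    evidence = [(sq, answer_vlm(image, sq)) for sq in sub_questions]
    # Step 3: another LLM reasons over the collected pairs to produce the final answer.
    return reason_llm(question, evidence)
```

Passing the models in as plain callables keeps the sketch independent of any particular LLM or VLM API.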
no code implementations • 14 Jun 2022 • Hammad A. Ayyubi, Christopher Thomas, Lovish Chum, Rahul Lokesh, Long Chen, Yulei Niu, Xudong Lin, Xuande Feng, Jaywon Koo, Sounak Ray, Shih-Fu Chang
To support research on this task, we introduce the Multimodal Hierarchical Events (MultiHiEve) dataset.
no code implementations • 4 Apr 2020 • Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell
Despite recent advances in Visual Question Answering (VQA), it remains a challenge to determine how much success can be attributed to sound reasoning and comprehension ability. We seek to investigate this question by proposing a new task of rationale generation.
no code implementations • ICLR Workshop DeepDiffEq 2019 • Hammad A. Ayyubi, Yi Yao, Ajay Divakaran
Neural Ordinary Differential Equations (NODEs) have proven to be a powerful modeling tool for approximating (interpolation) and forecasting (extrapolation) irregularly sampled time series data.
no code implementations • 21 Oct 2019 • Hammad A. Ayyubi
Generative Adversarial Networks (GANs) have been used extensively and quite successfully for unsupervised learning.
no code implementations • 21 Oct 2019 • Hammad A. Ayyubi, Md. Mehrab Tanjim, David J. Kriegman
The task of Visual Commonsense Reasoning is extremely challenging: the model must not only answer a question about an image but also learn to reason.