no code implementations • 20 Nov 2024 • Shantanu Jaiswal, Debaditya Roy, Basura Fernando, Cheston Tan
Complex visual reasoning and question answering (VQA) is a challenging task that requires compositional multi-step processing and higher-level reasoning capabilities beyond the immediate recognition and localization of objects and events.
no code implementations • 27 Aug 2024 • Aishik Nagar, Shantanu Jaiswal, Cheston Tan
We focus on two novel aspects of zero-shot visual reasoning: i) evaluating the impact of conveying scene information to the underlying large language model (LLM) of the vision-language model (VLM), either as visual embeddings or as purely textual scene descriptions, and ii) comparing the effectiveness of chain-of-thought prompting with standard prompting for zero-shot visual reasoning.
no code implementations • 15 Jun 2023 • Ishaan Singh Rawal, Alexander Matyasko, Shantanu Jaiswal, Basura Fernando, Cheston Tan
Consistent with the findings of QUAG, we find that most of the models achieve near-trivial performance on CLAVI.
no code implementations • 30 Nov 2022 • Shantanu Jaiswal, Liu Yan, Dongkyu Choi, Kenneth Kwok
Our resulting knowledge representation framework can encode a wider variety of world knowledge and represent beliefs flexibly using grounded concepts as well as free-text phrases.
1 code implementation • 26 Nov 2021 • Shantanu Jaiswal, Basura Fernando, Cheston Tan
Attention modules for Convolutional Neural Networks (CNNs) are an effective method to enhance performance on multiple computer-vision tasks.
6 code implementations • 17 Jul 2017 • Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, Shantanu Jaiswal
Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs.
Ranked #1 on Malware Detection on Android Malware Dataset