no code implementations • 30 Nov 2023 • Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran
To provide a more thorough evaluation of the capabilities of long video retrieval systems, we propose a pipeline that leverages state-of-the-art large language models to carefully generate a diverse set of synthetic captions for long videos.
no code implementations • 16 Nov 2023 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences.
no code implementations • 16 Oct 2023 • Anirudh Som, Karan Sikka, Helen Gent, Ajay Divakaran, Andreas Kathol, Dimitra Vergyri
Paraphrasing of offensive content is a better alternative to content removal and helps improve civility in a communication environment.
no code implementations • 8 Sep 2023 • Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, Alvaro Velasquez
We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments.
1 code implementation • 8 Sep 2023 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.
no code implementations • ICCV 2023 • Indranil Sur, Karan Sikka, Matthew Walmer, Kaushik Koneripalli, Anirban Roy, Xiao Lin, Ajay Divakaran, Susmit Jha
We present a Multimodal Backdoor Defense technique, TIJO (Trigger Inversion using Joint Optimization).
1 code implementation • 19 Feb 2023 • Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani
Content moderation is the process of flagging content based on pre-defined platform rules.
1 code implementation • CVPR 2022 • Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha
This is challenging for the attacker as the detector can distort or ignore the visual trigger entirely, which leads to models where backdoors are over-reliant on the language trigger.
no code implementations • 22 Oct 2021 • Pritish Sahu, Karan Sikka, Ajay Divakaran
We also observe a drop in performance across all the models when testing on RecipeQA and the proposed Meta-RecipeQA (e.g., 83.6% versus 67.1% for HTRN), which shows that the proposed dataset is relatively less biased.
no code implementations • 20 Apr 2021 • Pritish Sahu, Karan Sikka, Ajay Divakaran
We then evaluate M3C using a textual cloze-style question-answering task and highlight an inherent bias in the question-answer generation method from [35] that enables a naive baseline to cheat by learning from only the answer choices.
no code implementations • 29 Mar 2021 • Panagiota Kiourti, Wenchao Li, Anirban Roy, Karan Sikka, Susmit Jha
Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways.
no code implementations • 3 Dec 2020 • Karan Sikka, Indranil Sur, Susmit Jha, Anirban Roy, Ajay Divakaran
We target the problem of detecting Trojans or backdoors in DNNs.
no code implementations • 21 Nov 2020 • Karan Sikka, Jihua Huang, Andrew Silberfarb, Prateeth Nayak, Luke Rohrer, Pritish Sahu, John Byrnes, Ajay Divakaran, Richard Rohwer
We improve zero-shot learning (ZSL) by incorporating common-sense knowledge in DNNs.
1 code implementation • 12 Sep 2020 • Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering a 143 km² area) of RGB and aerial LIDAR depth images.
no code implementations • 16 Mar 2020 • Karan Sikka, Andrew Silberfarb, John Byrnes, Indranil Sur, Ed Chow, Ajay Divakaran, Richard Rohwer
We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data.
no code implementations • IJCNLP 2019 • Arijit Ray, Karan Sikka, Ajay Divakaran, Stefan Lee, Giedrius Burachas
For instance, if a model answers "red" to "What color is the balloon?
1 code implementation • 14 Jul 2019 • Parneet Kaur, Karan Sikka, Weijun Wang, Serge Belongie, Ajay Divakaran
Food classification is a challenging problem due to the large number of categories, high visual similarity between different foods, as well as the lack of datasets for training state-of-the-art deep models.
no code implementations • 17 May 2019 • Karan Sikka, Lucas Van Bramer, Ajay Divakaran
We also show that the user embeddings learned within our joint multimodal embedding model are better at predicting user interests compared to those learned with unimodal content on Instagram data.
1 code implementation • IJCNLP 2019 • Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran
Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image.
no code implementations • ICCV 2019 • Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran
We propose a novel end-to-end model that uses caption-to-image retrieval as a 'downstream' task to guide the process of phrase localization.
no code implementations • 8 Dec 2018 • Zachary Seymour, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
Furthermore, we present an extensive study demonstrating the contribution of each component of our model, showing 8-15% and 4% improvement from adding semantic information and our proposed attention module, respectively.
no code implementations • 4 Jul 2018 • Karuna Ahuja, Karan Sikka, Anirban Roy, Ajay Divakaran
We show that our model outperforms other baselines on the benchmark Ad dataset and also show qualitative results to highlight the advantages of using multihop co-attention.
no code implementations • ECCV 2018 • Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, Ajay Divakaran
We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training.
no code implementations • 23 Dec 2017 • Parneet Kaur, Karan Sikka, Ajay Divakaran
Food classification from images is a fine-grained classification problem.
no code implementations • CVPR 2017 • Amlan Kar, Nishant Rai, Karan Sikka, Gaurav Sharma
We propose a novel method for temporally pooling frames in a video for the task of human action recognition.
no code implementations • 8 Aug 2016 • Karan Sikka, Gaurav Sharma
We study the problem of video classification for facial analysis and human action recognition.
no code implementations • CVPR 2016 • Karan Sikka, Gaurav Sharma, Marian Bartlett
We study the problem of facial analysis in videos.
no code implementations • 17 Dec 2015 • Mohsen Malmir, Karan Sikka, Deborah Forster, Ian Fasel, Javier R. Movellan, Garrison W. Cottrell
The results of experiments suggest that the proposed model equipped with Dirichlet state encoding is superior in performance, and selects images that lead to better training and higher accuracy of label prediction at test time.
no code implementations • 24 Oct 2013 • Sahil Sikka, Karan Sikka, M. K. Bhuyan, Yuji Iwahori
In recent years, Printed Circuit Boards (PCBs) have become the backbone of a large number of consumer electronic devices, leading to a surge in their production.