Search Results for author: Karan Sikka

Found 29 papers, 6 papers with code

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

no code implementations30 Nov 2023 Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

To provide a more thorough evaluation of the capabilities of long video retrieval systems, we propose a pipeline that leverages state-of-the-art large language models to carefully generate a diverse set of synthetic captions for long videos.

Benchmarking Retrieval +2

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

no code implementations16 Nov 2023 Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences.

Language Modelling

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

no code implementations16 Oct 2023 Anirudh Som, Karan Sikka, Helen Gent, Ajay Divakaran, Andreas Kathol, Dimitra Vergyri

Paraphrasing of offensive content is a better alternative to content removal and helps improve civility in a communication environment.

In-Context Learning

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

1 code implementation8 Sep 2023 Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.

Visual Reasoning

SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

no code implementations8 Sep 2023 Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, Alvaro Velasquez

We evaluate SayNav on multi-object navigation (MultiON) task, that requires the agent to utilize a massive amount of human knowledge to efficiently search multiple different objects in an unknown environment.

Common Sense Reasoning Navigate

Dual-Key Multimodal Backdoors for Visual Question Answering

1 code implementation CVPR 2022 Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha

This is challenging for the attacker as the detector can distort or ignore the visual trigger entirely, which leads to models where backdoors are over-reliant on the language trigger.

Question Answering Visual Question Answering

Challenges in Procedural Multimodal Machine Comprehension:A Novel Way To Benchmark

no code implementations22 Oct 2021 Pritish Sahu, Karan Sikka, Ajay Divakaran

We also observe a drop in performance across all the models when testing on RecipeQA and proposed Meta-RecipeQA (e. g. 83. 6% versus 67. 1% for HTRN), which shows that the proposed dataset is relatively less biased.

Answer Generation Machine Reading Comprehension +2

Towards Solving Multimodal Comprehension

no code implementations20 Apr 2021 Pritish Sahu, Karan Sikka, Ajay Divakaran

We then evaluate M3C using a textual cloze style question-answering task and highlight an inherent bias in the question answer generation method from [35] that enables a naive baseline to cheat by learning from only answer choices.

16k Answer Generation +3

MISA: Online Defense of Trojaned Models using Misattributions

no code implementations29 Mar 2021 Panagiota Kiourti, Wenchao Li, Anirban Roy, Karan Sikka, Susmit Jha

Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways.

Traffic Sign Recognition

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

1 code implementation12 Sep 2020 Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of RGB and aerial LIDAR depth images.

Visual Localization

Deep Adaptive Semantic Logic (DASL): Compiling Declarative Knowledge into Deep Neural Networks

no code implementations16 Mar 2020 Karan Sikka, Andrew Silberfarb, John Byrnes, Indranil Sur, Ed Chow, Ajay Divakaran, Richard Rohwer

We introduce Deep Adaptive Semantic Logic (DASL), a novel framework for automating the generation of deep neural networks that incorporates user-provided formal knowledge to improve learning from data.

Image Classification Relationship Detection +1

FoodX-251: A Dataset for Fine-grained Food Classification

1 code implementation14 Jul 2019 Parneet Kaur, Karan Sikka, Weijun Wang, Serge Belongie, Ajay Divakaran

Food classification is a challenging problem due to the large number of categories, high visual similarity between different foods, as well as the lack of datasets for training state-of-the-art deep models.

Classification Fine-Grained Visual Categorization +1

Deep Unified Multimodal Embeddings for Understanding both Content and Users in Social Media Networks

no code implementations17 May 2019 Karan Sikka, Lucas Van Bramer, Ajay Divakaran

We also show that the user embeddings learned within our joint multimodal embedding model are better at predicting user interests compared to those learned with unimodal content on Instagram data.

Cross-Modal Retrieval Retrieval

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

1 code implementation IJCNLP 2019 Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran

Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image.

Intent Detection

Semantically-Aware Attentive Neural Embeddings for Image-based Visual Localization

no code implementations8 Dec 2018 Zachary Seymour, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Furthermore, we present an extensive study demonstrating the contribution of each component of our model, showing $8$--$15\%$ and $4\%$ improvement from adding semantic information and our proposed attention module.

Deep Attention Image-Based Localization +1

Understanding Visual Ads by Aligning Symbols and Objects using Co-Attention

no code implementations4 Jul 2018 Karuna Ahuja, Karan Sikka, Anirban Roy, Ajay Divakaran

We show that our model outperforms other baselines on the benchmark Ad dataset and also show qualitative results to highlight the advantages of using multihop co-attention.

Zero-Shot Object Detection

no code implementations ECCV 2018 Ankan Bansal, Karan Sikka, Gaurav Sharma, Rama Chellappa, Ajay Divakaran

We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes which are not observed during training.

Object object-detection +2

Discriminatively Trained Latent Ordinal Model for Video Classification

no code implementations8 Aug 2016 Karan Sikka, Gaurav Sharma

We study the problem of video classification for facial analysis and human action recognition.

Action Recognition Classification +5

Deep Active Object Recognition by Joint Label and Action Prediction

no code implementations17 Dec 2015 Mohsen Malmir, Karan Sikka, Deborah Forster, Ian Fasel, Javier R. Movellan, Garrison W. Cottrell

The results of experiments suggest that the proposed model equipped with Dirichlet state encoding is superior in performance, and selects images that lead to better training and higher accuracy of label prediction at test time.

Object Object Recognition

Pseudo vs. True Defect Classification in Printed Circuits Boards using Wavelet Features

no code implementations24 Oct 2013 Sahil Sikka, Karan Sikka, M. K. Bhuyan, Yuji Iwahori

In recent years, Printed Circuit Boards (PCB) have become the backbone of a large number of consumer electronic devices leading to a surge in their production.

General Classification

Cannot find the paper you are looking for? You can Submit a new open access paper.