1 code implementation • ACL 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
We find that the original Who’s Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.
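The two heuristics described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function names and the `(x, y, w, h)` box format are assumptions.

```python
# Hypothetical sketch of the two dataset biases described above.
# Boxes are (x, y, w, h) tuples; names are in sentence order.

def largest_box_heuristic(names, boxes):
    """Link the first name in the caption to the largest bounding box."""
    largest = max(range(len(boxes)), key=lambda i: boxes[i][2] * boxes[i][3])
    return {names[0]: largest}

def left_to_right_heuristic(names, boxes):
    """Link names, in sentence order, to boxes sorted left to right."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    return dict(zip(names, order))

names = ["Alice", "Bob"]
boxes = [(300, 50, 80, 200), (40, 60, 120, 260)]
print(largest_box_heuristic(names, boxes))   # {'Alice': 1}
print(left_to_right_heuristic(names, boxes)) # {'Alice': 1, 'Bob': 0}
```

On a biased sample, either heuristic alone links names to boxes correctly without looking at the image content at all, which is why such samples inflate benchmark scores.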
1 code implementation • 12 Apr 2024 • Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.
no code implementations • 11 Apr 2024 • Sourajit Saha, Tejas Gokhale
Downsampling operators break the shift invariance of convolutional neural networks (CNNs), and this affects the robustness of features learned by CNNs even under small pixel-level shifts.
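A minimal one-dimensional illustration of the effect, with stride-2 subsampling standing in for a CNN downsampling layer (an assumption for illustration; real CNNs downsample 2-D feature maps):

```python
# Stride-2 subsampling, the core of strided pooling/convolution.
def downsample(x, stride=2):
    """Keep every `stride`-th sample."""
    return x[::stride]

signal = [0, 0, 9, 0, 0, 0, 9, 0]
shifted = [0] + signal[:-1]          # the same signal shifted right by one pixel

print(downsample(signal))   # [0, 9, 0, 9]
print(downsample(shifted))  # [0, 0, 0, 0] -- the peaks vanish after a 1-pixel shift
```

A one-pixel shift of the input completely changes the downsampled output, so any feature computed after such an operator cannot be shift invariant.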
1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
1 code implementation • ICCV 2023 • Sheng Cheng, Tejas Gokhale, Yezhou Yang
Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.
1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a.
1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral
We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.
1 code implementation • 30 Mar 2023 • Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang
In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
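The core check behind such a metric can be sketched as follows. This is a hypothetical simplification, not the paper's actual VISOR implementation: it tests whether detected bounding boxes in a generated image realize the relation stated in the prompt, using box centroids.

```python
# Hypothetical spatial-relation check; boxes are (x, y, w, h),
# with image coordinates growing rightward and downward.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def relation_correct(box_a, box_b, relation):
    """Return True if box_a stands in `relation` to box_b."""
    (xa, ya), (xb, yb) = centroid(box_a), centroid(box_b)
    return {
        "left of": xa < xb,
        "right of": xa > xb,
        "above": ya < yb,
        "below": ya > yb,
    }[relation]

# prompt: "a dog to the left of a cat"
dog, cat = (10, 40, 50, 50), (120, 45, 60, 55)
print(relation_correct(dog, cat, "left of"))  # True
```

Averaging such binary checks over many prompts and generations gives a score of how often the model respects the stated spatial relationship.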
1 code implementation • 7 Nov 2022 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
Videos often capture objects, their visible properties, their motion, and the interactions between different objects.
Ranked #1 on Counterfactual Planning on CRIPP-VQA
1 code implementation • 15 Jun 2022 • Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang
To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies.
no code implementations • Findings (ACL) 2022 • Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral
However, the effect of data modification on adversarial robustness remains unclear.
no code implementations • 19 Jan 2022 • Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral
We show that BM25 and our method can complement each other, and a simple hybrid model leads to further gains in the large corpus setting.
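One simple hybrid of this kind can be sketched as a linear interpolation of per-document scores from a sparse retriever (BM25) and a dense retriever. The weighting scheme, `alpha`, and the score dictionaries below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: combine sparse and dense retrieval scores per document.
def hybrid_scores(bm25, dense, alpha=0.5):
    """Linearly interpolate two per-document score dictionaries."""
    docs = set(bm25) | set(dense)
    return {d: alpha * bm25.get(d, 0.0) + (1 - alpha) * dense.get(d, 0.0)
            for d in docs}

bm25 = {"doc1": 12.0, "doc2": 3.0}    # lexical-match scores
dense = {"doc2": 0.9, "doc3": 0.7}    # embedding-similarity scores
ranked = sorted(hybrid_scores(bm25, dense).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # 'doc1'
```

Because the two retrievers make different errors, documents missed by one can still be surfaced by the other, which is where the complementary gains come from. In practice the two score scales would need normalization before interpolation.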
1 code implementation • Findings (ACL) 2022 • Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral
Transformer-based models achieve impressive performance on numerous Natural Language Inference (NLI) benchmarks when trained on respective training datasets.
1 code implementation • Findings (ACL) 2022 • Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.
no code implementations • ICCV 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction of pairwise relative locations of objects as a classification as well as a regression task.
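The dual formulation can be sketched as follows: the same object pair yields both a discrete relation label (classification target) and a continuous centroid offset (regression target). The label set and tie-breaking rule below are illustrative assumptions.

```python
# Illustrative targets for pairwise relative-location prediction.
# Boxes are (x, y, w, h); y grows downward in image coordinates.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def relative_location(box_a, box_b):
    """Return (classification label, regression offset) for b relative to a."""
    (xa, ya), (xb, yb) = centroid(box_a), centroid(box_b)
    dx, dy = xb - xa, yb - ya
    if abs(dx) >= abs(dy):
        label = "right" if dx > 0 else "left"
    else:
        label = "below" if dy > 0 else "above"
    return label, (dx, dy)

print(relative_location((0, 0, 10, 10), (40, 4, 10, 10)))  # ('right', (40.0, 4.0))
```

The classification view tests coarse geometric understanding, while the regression view tests whether the model can localize the offset precisely.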
no code implementations • NAACL 2021 • Pratyay Banerjee, Tejas Gokhale, Chitta Baral
Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods.
no code implementations • Findings (ACL) 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.
3 code implementations • 3 Dec 2020 • Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang
While this deviation may not be exactly known, its broad characterization is specified a priori in terms of attributes.
2 code implementations • EMNLP 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization on benchmarks such as the VQA-CP challenge.
no code implementations • 18 Apr 2020 • Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan
The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image.
2 code implementations • EMNLP 2020 • Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene.
no code implementations • ECCV 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
We propose our Lens of Logic (LOL) model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation.
no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.