Search Results for author: Tejas Gokhale

Found 25 papers, 16 papers with code

To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo

1 code implementation • ACL 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We find that the original Who’s Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image (both shortcuts are sketched below).

Benchmarking • Person-centric Visual Grounding +1
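
To make the shortcut concrete, here is a minimal Python sketch of the two heuristics named in the snippet above; the box format and function names are illustrative assumptions, not code from the paper.

```python
# Hypothetical sketch of the two shortcut heuristics described above.
# Boxes are (x1, y1, x2, y2) tuples; names appear in caption order.

def largest_box_heuristic(names, boxes):
    """Link the first name in the caption to the largest person box."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    return {names[0]: max(range(len(boxes)), key=areas.__getitem__)}

def left_to_right_heuristic(names, boxes):
    """Link the i-th name to the i-th box when boxes are sorted left to right."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    return {name: order[i] for i, name in enumerate(names[:len(boxes)])}

print(largest_box_heuristic(["Alice", "Bob"], [(0, 0, 10, 10), (0, 0, 50, 80)]))
print(left_to_right_heuristic(["Alice", "Bob"], [(40, 0, 60, 80), (0, 0, 20, 80)]))
```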

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

1 code implementation • 12 Apr 2024 • Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang

Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.

Monocular Depth Estimation

Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling

no code implementations • 11 Apr 2024 • Sourajit Saha, Tejas Gokhale

Downsampling operators break the shift invariance of convolutional neural networks (CNNs), which hurts the robustness of the features CNNs learn even under small pixel-level shifts; a minimal demonstration follows below.

Image Classification • Semantic Segmentation +1
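
As a minimal demonstration of the problem stated above (not the paper's translation invariant polyphase sampling), naive stride-2 subsampling, the core of strided pooling and convolution, maps a one-pixel shift of the same signal to completely different features.

```python
import numpy as np

x = np.array([0, 9, 0, 9, 0, 9, 0, 9], dtype=float)
shifted = np.roll(x, 1)  # the same signal, shifted by one pixel

def downsample(v, stride=2):
    return v[::stride]  # naive strided subsampling

print(downsample(x))        # [0. 0. 0. 0.]
print(downsample(shifted))  # [9. 9. 9. 9.] -> shift invariance is broken
```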

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images that faithfully follow the spatial relationships specified in the text prompt.

Adversarial Bayesian Augmentation for Single-Source Domain Generalization

1 code implementation • ICCV 2023 • Sheng Cheng, Tejas Gokhale, Yezhou Yang

Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.

Data Augmentation Domain Generalization

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

To quantify the ability of T2I models to learn and synthesize novel visual concepts (a.k.a. …)

Concept Alignment

End-to-end Knowledge Retrieval with Multi-modal Queries

1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral

We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.

Benchmarking • Cross-Modal Retrieval +2
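
For intuition only, here is a hedged sketch of ranking knowledge entries against a multi-modal query by fusing the two query embeddings; the additive fusion, cosine ranking, and random vectors are assumptions for illustration, not the paper's model.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(img_emb, txt_emb, doc_embs, top_k=3):
    query = img_emb + txt_emb  # simplest possible fusion: sum the embeddings
    scores = [cosine(query, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=scores.__getitem__, reverse=True)[:top_k]

rng = np.random.default_rng(0)
docs = [rng.normal(size=64) for _ in range(10)]
print(retrieve(rng.normal(size=64), rng.normal(size=64), docs))
```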

Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling

1 code implementation • 30 Mar 2023 • Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang

In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.

Continual Learning • Data Poisoning +1
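
A toy reading of the attack (my interpretation of the snippet, not the authors' code): "moles" are clean images of one class that a reference model already finds most confusable with a target class, and the poisoned sampler simply overrepresents them in training batches.

```python
import numpy as np

def pick_moles(probs_b, labels, class_a, num_moles):
    """probs_b: the reference model's softmax probability for class B."""
    idx_a = np.where(labels == class_a)[0]       # clean class-A samples
    ranked = idx_a[np.argsort(-probs_b[idx_a])]  # most B-like first
    return ranked[:num_moles]                    # these get oversampled

labels = np.array([0, 0, 0, 1, 1])
probs_b = np.array([0.9, 0.2, 0.7, 0.8, 0.6])
print(pick_moles(probs_b, labels, class_a=0, num_moles=2))  # -> [0 2]
```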

Benchmarking Spatial Relationships in Text-to-Image Generation

1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.

Benchmarking • Text-to-Image Generation
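
A simplified, assumed form of such a check, testing a stated relation against detected object centroids (the released VISOR implementation may differ in its details):

```python
def centroid(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def relation_holds(box_a, box_b, relation):
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,   # image y-axis grows downward
        "below": ay > by,
    }[relation]

# "a dog to the left of a bicycle", with boxes from an object detector
print(relation_holds((10, 40, 60, 90), (120, 30, 200, 110), "left of"))  # True
```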

Improving Diversity with Adversarially Learned Transformations for Domain Generalization

1 code implementation • 15 Jun 2022 • Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang

In single-source domain generalization, maximizing the diversity of synthesized domains has emerged as one of the most effective strategies.

Domain Generalization
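
The general recipe can be sketched in PyTorch as below; this is a hedged illustration of adversarially learned augmentation, not the paper's ALT architecture.

```python
import torch
import torch.nn as nn

def adversarial_diversity_step(aug, model, x, y, opt_aug, opt_model, loss_fn):
    # 1) update the augmenter to *increase* task loss (harder, more diverse domains)
    opt_aug.zero_grad()
    (-loss_fn(model(aug(x)), y)).backward()
    opt_aug.step()
    # 2) update the task model on the freshly transformed batch
    opt_model.zero_grad()
    loss = loss_fn(model(aug(x).detach()), y)
    loss.backward()
    opt_model.step()
    return loss.item()

# Toy usage: a linear "augmenter" and classifier on random data.
aug, model = nn.Linear(8, 8), nn.Linear(8, 3)
x, y = torch.randn(16, 8), torch.randint(0, 3, (16,))
print(adversarial_diversity_step(
    aug, model, x, y,
    torch.optim.SGD(aug.parameters(), lr=0.1),
    torch.optim.SGD(model.parameters(), lr=0.1),
    nn.CrossEntropyLoss(),
))
```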

Improving Biomedical Information Retrieval with Neural Retrievers

no code implementations • 19 Jan 2022 • Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral

We show that BM25 and our method can complement each other, and a simple hybrid model leads to further gains in the large corpus setting.

Biomedical Information Retrieval • Information Retrieval +4
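
Such a hybrid can be as simple as interpolating normalized scores from the two retrievers; the min-max normalization and the weight `alpha` below are illustrative assumptions, not the paper's exact formulation.

```python
def hybrid_scores(bm25, dense, alpha=0.5):
    """Interpolate sparse (BM25) and dense retriever scores per document."""
    def norm(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo + 1e-9) for s in scores]
    return [alpha * b + (1 - alpha) * d for b, d in zip(norm(bm25), norm(dense))]

print(hybrid_scores([12.1, 7.4, 9.8], [0.62, 0.71, 0.40]))
```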

Unsupervised Natural Language Inference Using PHL Triplet Generation

1 code implementation • Findings (ACL) 2022 • Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral

Transformer-based models achieve impressive performance on numerous Natural Language Inference (NLI) benchmarks when trained on respective training datasets.

Natural Language Inference • Sentence
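
For intuition, a cartoon of generating (Premise, Hypothesis, Label) triplets from unlabeled sentences; these three rules are invented examples, far cruder than the paper's generation procedures.

```python
def make_triplets(sentence):
    """Yield toy (premise, hypothesis, label) triplets from one raw sentence."""
    yield (sentence, sentence, "entailment")  # identity is trivially entailed
    yield (sentence, "It is not true that " + sentence[0].lower() + sentence[1:],
           "contradiction")                   # negation wrapper
    yield (sentence, sentence.rstrip(".") + " near a red car.",
           "neutral")                         # added, unverifiable detail

for triplet in make_triplets("A man is riding a horse on the beach."):
    print(triplet)
```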

Semantically Distributed Robust Optimization for Vision-and-Language Inference

1 code implementation • Findings (ACL) 2022 • Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.

Data Augmentation • Natural Language Inference +2
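
One common way to train against such phenomena, sketched here as a hedged illustration rather than the paper's exact SDRO objective, is to optimize the worst-case loss over a sentence's linguistic variants.

```python
import torch
import torch.nn as nn

def worst_case_loss(model, image, sentence_variants, label, loss_fn):
    """Loss of the hardest variant (paraphrase, negation, synonym swap, ...)."""
    losses = torch.stack([loss_fn(model(image, s), label)
                          for s in sentence_variants])
    return losses.max()  # gradients flow only through the worst-case variant

# Tiny stub to exercise it: this "model" ignores the sentence; real V&L models
# encode both modalities.
net = nn.Linear(4, 2)
model = lambda image, sentence: net(image)
image, label = torch.randn(3, 4), torch.tensor([0, 1, 0])
variants = ["a cat sits", "a cat is sitting", "no cat sits"]
print(worst_case_loss(model, image, variants, label, nn.CrossEntropyLoss()))
```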

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

no code implementations • ICCV 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

In this work, we evaluate the faithfulness of V&L models to such geometric understanding by formulating the prediction of pairwise relative locations of objects as both a classification and a regression task.

Question Answering • Visual Question Answering +1
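
A sketch of how the two supervision targets mentioned above could be derived from object boxes (my construction, not the paper's exact scheme): a discrete relation class for classification plus a continuous offset for regression.

```python
def relative_targets(box_a, box_b):
    """Pairwise relative location of B w.r.t. A, from (x1, y1, x2, y2) boxes."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    dx, dy = bx - ax, by - ay
    horiz = "right" if dx > 0 else "left"
    vert = "below" if dy > 0 else "above"   # image y-axis grows downward
    return {"class": f"{horiz}-{vert}", "offset": (dx, dy)}

print(relative_targets((0, 0, 10, 10), (30, 40, 50, 60)))
# {'class': 'right-below', 'offset': (35.0, 45.0)}
```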

Self-Supervised Test-Time Learning for Reading Comprehension

no code implementations • NAACL 2021 • Pratyay Banerjee, Tejas Gokhale, Chitta Baral

Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods.

Question Answering • Reading Comprehension
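
A toy version of procedural question-answer generation from a passage; this single cloze rule illustrates the idea but is not the paper's generator.

```python
def cloze_qa_pairs(passage):
    """Blank out the last word of each long-enough sentence to form a QA pair."""
    pairs = []
    for sent in passage.split(". "):
        words = sent.split()
        if len(words) < 5:
            continue
        answer = words[-1].rstrip(".")
        question = " ".join(words[:-1]) + " ___ ?"
        pairs.append((question, answer))
    return pairs

print(cloze_qa_pairs("Marie Curie was born in Warsaw. She won the Nobel Prize twice."))
```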

WeaQA: Weak Supervision via Captions for Visual Question Answering

no code implementations • Findings (ACL) 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.

Question Answering • Visual Question Answering
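
As an illustration of caption-derived weak supervision, here is one invented template rule that turns a caption into a question-answer pair; real pipelines rely on parsers and many such templates.

```python
def caption_to_qa(caption):
    """Turn a counting caption into a 'How many ...?' question (toy rule)."""
    words = caption.lower().split()
    numbers = {"one", "two", "three", "four", "five"}
    for i, w in enumerate(words[:-1]):
        if w in numbers:
            rest = " ".join(words[i + 1:])
            return (f"How many {rest}?", w)
    return None

print(caption_to_qa("Two dogs playing in the park"))
# ('How many dogs playing in the park?', 'two')
```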

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

3 code implementations • 3 Dec 2020 • Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

While this deviation may not be exactly known, its broad characterization is specified a priori in terms of attributes.

Attribute

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

2 code implementations • EMNLP 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar yet semantically distinct mutations of the input in order to improve OOD generalization on benchmarks such as the VQA-CP challenge.

Out-of-Distribution Generalization • Question Answering +1

Halluci-Net: Scene Completion by Exploiting Object Co-occurrence Relationships

no code implementations • 18 Apr 2020 • Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan

The generated dense labelmap can then be used as input to state-of-the-art image synthesis techniques such as pix2pixHD to obtain the final image.

Image Generation • Semantic Segmentation

VQA-LOL: Visual Question Answering under the Lens of Logic

no code implementations • ECCV 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation (a toy version of this consistency check is sketched below).

Negation • Question Answering +2
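
A toy version of the consistency check mentioned above (not the actual Fréchet-Compatibility Loss, which operates on answer distributions): the answer to a composed question should match the logical combination of the component answers.

```python
def composed_consistency(ans_q1, ans_q2, ans_composed, op):
    """Check yes/no answers against the inferred logical operation."""
    expected = {"and": ans_q1 and ans_q2, "or": ans_q1 or ans_q2}[op]
    return ans_composed == expected

# Q1: "Is there a cat?" -> yes; Q2: "Is the cat black?" -> no
# Composed: "Is there a cat and is it black?" -> model answered no
print(composed_consistency(True, False, False, "and"))  # True: consistent
```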

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.
