1 code implementation • ACL 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
We find that the original Who’s Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.
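The two heuristics described above can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function names and the `(x, y, w, h)` box format are assumptions.

```python
# Hypothetical sketch of the two dataset biases described above.
# Boxes are (x, y, w, h) tuples; names are in sentence order.

def largest_box_heuristic(names, boxes):
    """Link the first name in the caption to the largest bounding box."""
    largest = max(range(len(boxes)), key=lambda i: boxes[i][2] * boxes[i][3])
    return {names[0]: largest}

def left_to_right_heuristic(names, boxes):
    """Link names, in sentence order, to boxes sorted left to right."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    return dict(zip(names, order))

names = ["Alice", "Bob"]
boxes = [(300, 50, 80, 200), (40, 60, 120, 260)]
print(largest_box_heuristic(names, boxes))   # {'Alice': 1}
print(left_to_right_heuristic(names, boxes)) # {'Alice': 1, 'Bob': 0}
```

On a biased sample, either heuristic alone links names to boxes correctly without looking at the image content at all, which is why such samples inflate benchmark scores.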
1 code implementation • 12 Apr 2024 • Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.
no code implementations • 11 Apr 2024 • Sourajit Saha, Tejas Gokhale
Downsampling operators break the shift invariance of convolutional neural networks (CNNs), and this affects the robustness of features learned by CNNs even under small pixel-level shifts.
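A minimal one-dimensional illustration of the effect, with stride-2 subsampling standing in for a CNN downsampling layer (an assumption for illustration; real CNNs downsample 2-D feature maps):

```python
# Stride-2 subsampling, the core of strided pooling/convolution.
def downsample(x, stride=2):
    """Keep every `stride`-th sample."""
    return x[::stride]

signal = [0, 0, 9, 0, 0, 0, 9, 0]
shifted = [0] + signal[:-1]          # the same signal shifted right by one pixel

print(downsample(signal))   # [0, 9, 0, 9]
print(downsample(shifted))  # [0, 0, 0, 0] -- the peaks vanish after a 1-pixel shift
```

A one-pixel shift of the input completely changes the downsampled output, so any feature computed after such an operator cannot be shift invariant.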
1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
1 code implementation • ICCV 2023 • Sheng Cheng, Tejas Gokhale, Yezhou Yang
Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.
1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a.
1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral
We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.
1 code implementation • 30 Mar 2023 • Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang
In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
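The core check behind such a metric can be sketched as follows. This is a hypothetical simplification, not the paper's actual VISOR implementation: it tests whether detected bounding boxes in a generated image realize the relation stated in the prompt, using box centroids.

```python
# Hypothetical spatial-relation check; boxes are (x, y, w, h),
# with image coordinates growing rightward and downward.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def relation_correct(box_a, box_b, relation):
    """Return True if box_a stands in `relation` to box_b."""
    (xa, ya), (xb, yb) = centroid(box_a), centroid(box_b)
    return {
        "left of": xa < xb,
        "right of": xa > xb,
        "above": ya < yb,
        "below": ya > yb,
    }[relation]

# prompt: "a dog to the left of a cat"
dog, cat = (10, 40, 50, 50), (120, 45, 60, 55)
print(relation_correct(dog, cat, "left of"))  # True
```

Averaging such binary checks over many prompts and generations gives a score of how often the model respects the stated spatial relationship.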
1 code implementation • 7 Nov 2022 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
Videos often capture objects, their visible properties, their motion, and the interactions between different objects.
Ranked #1 on Counterfactual Planning on CRIPP-VQA
1 code implementation • 15 Jun 2022 • Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang
To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies.
no code implementations • Findings (ACL) 2022 • Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral
However, the effect of data modification on adversarial robustness remains unclear.
no code implementations • 19 Jan 2022 • Man Luo, Arindam Mitra, Tejas Gokhale, Chitta Baral
We show that BM25 and our method can complement each other, and a simple hybrid model leads to further gains in the large corpus setting.
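One simple hybrid of this kind can be sketched as a linear interpolation of per-document scores from a sparse retriever (BM25) and a dense retriever. The weighting scheme, `alpha`, and the score dictionaries below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: combine sparse and dense retrieval scores per document.
def hybrid_scores(bm25, dense, alpha=0.5):
    """Linearly interpolate two per-document score dictionaries."""
    docs = set(bm25) | set(dense)
    return {d: alpha * bm25.get(d, 0.0) + (1 - alpha) * dense.get(d, 0.0)
            for d in docs}

bm25 = {"doc1": 12.0, "doc2": 3.0}    # lexical-match scores
dense = {"doc2": 0.9, "doc3": 0.7}    # embedding-similarity scores
ranked = sorted(hybrid_scores(bm25, dense).items(), key=lambda kv: -kv[1])
print(ranked[0][0])  # 'doc1'
```

Because the two retrievers make different errors, documents missed by one can still be surfaced by the other, which is where the complementary gains come from. In practice the two score scales would need normalization before interpolation.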
1 code implementation • Findings (ACL) 2022 • Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral
Transformer-based models achieve impressive performance on numerous Natural Language Inference (NLI) benchmarks when trained on respective training datasets.
1 code implementation • Findings (ACL) 2022 • Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.
no code implementations • ICCV 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction of pairwise relative locations of objects as a classification as well as a regression task.
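The dual formulation can be sketched as follows: the same object pair yields both a discrete relation label (classification target) and a continuous centroid offset (regression target). The label set and tie-breaking rule below are illustrative assumptions.

```python
# Illustrative targets for pairwise relative-location prediction.
# Boxes are (x, y, w, h); y grows downward in image coordinates.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def relative_location(box_a, box_b):
    """Return (classification label, regression offset) for b relative to a."""
    (xa, ya), (xb, yb) = centroid(box_a), centroid(box_b)
    dx, dy = xb - xa, yb - ya
    if abs(dx) >= abs(dy):
        label = "right" if dx > 0 else "left"
    else:
        label = "below" if dy > 0 else "above"
    return label, (dx, dy)

print(relative_location((0, 0, 10, 10), (40, 4, 10, 10)))  # ('right', (40.0, 4.0))
```

The classification view tests coarse geometric understanding, while the regression view tests whether the model can localize the offset precisely.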
no code implementations • NAACL 2021 • Pratyay Banerjee, Tejas Gokhale, Chitta Baral
Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods.
no code implementations • Findings (ACL) 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.
3 code implementations • 3 Dec 2020 • Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang
While this deviation may not be exactly known, its broad characterization is specified a priori in terms of attributes.
2 code implementations • EMNLP 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization on benchmarks such as the VQA-CP challenge.
no code implementations • 18 Apr 2020 • Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan
The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image.
2 code implementations • EMNLP 2020 • Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene.
no code implementations • ECCV 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
We propose our Lens of Logic (LOL) model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation.
no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.