Search Results for author: Ranjay Krishna

Found 72 papers, 33 papers with code

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

1 code implementation • 9 Jul 2024 Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James Glass

We find that a linear classifier based on these lookback ratio features is as effective as a richer detector that utilizes the entire hidden states of an LLM or a text-based entailment model.
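The snippet above describes a linear probe over per-head "lookback ratio" features. As a rough illustrative sketch only (the function names, toy weights, and decision rule are mine, not from the paper), the ratio for one head is the attention mass placed on context tokens divided by the total mass on context plus newly generated tokens, and a linear classifier then scores a span from those ratios:

```python
def lookback_ratio(ctx_attention, gen_attention):
    """Fraction of a head's attention mass that falls on context
    tokens, out of context + newly generated tokens combined."""
    total = ctx_attention + gen_attention
    return ctx_attention / total if total > 0 else 0.0

def linear_detector(ratios, weights, bias):
    """Toy linear probe over per-head lookback ratios: a weighted
    sum plus bias; a positive score flags the span (illustrative
    thresholding, not the paper's trained classifier)."""
    score = sum(w * r for w, r in zip(weights, ratios)) + bias
    return score > 0.0
```

In the paper's setting the weights would be fit on labeled hallucinated/grounded spans; here they are placeholders.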

Hallucination

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

no code implementations • 9 Jul 2024 Yu-Guan Hsieh, Cheng-Yu Hsieh, Shih-Ying Yeh, Louis Béthune, Hadi Pouransari, Pavan Kumar Anasosalu Vasu, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Marco Cuturi

The nodes in GBC are created using, in a first stage, object detection and dense captioning tools nested recursively to uncover and describe entity nodes, further linked together in a second stage by highlighting, using new types of nodes, compositions and relations among entities.

Dense Captioning • object-detection • +1

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

no code implementations • 27 Jun 2024 Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

Large-scale endeavors like RT-1 and widespread community efforts such as Open-X-Embodiment have contributed to growing the scale of robot demonstration data.

Diversity

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

no code implementations • 23 Jun 2024 Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input.

RAG

Task Me Anything

1 code implementation • 17 Jun 2024 Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their specific use case.

2k • Attribute • +3

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

no code implementations • 15 Jun 2024 Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

In spite of the recent adoption of vision language models (VLMs) to control robot behavior, VLMs struggle to precisely articulate robot actions using language.

Language Modelling • Robot Navigation • +2

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

no code implementations • 13 Jun 2024 Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Ranjay Krishna

In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad.

Math • object-detection • +2

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

1 code implementation • 28 May 2024 Ethan Shen, Alan Fan, Sarah M. Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model.
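The snippet describes feeding a superposition of the $k$ drafts' most recent token embeddings into the next decoding step. A minimal sketch of that superposition step, assuming embeddings are plain vectors and the mixing weights are given (both names and the weighting scheme are my illustration, not the paper's exact formulation):

```python
def superpose(draft_embeddings, weights):
    """Weighted superposition of the k drafts' most recent token
    embeddings; the combined vector would be fed as input to the
    next decoding step of the language model."""
    dim = len(draft_embeddings[0])
    out = [0.0] * dim
    for w, emb in zip(weights, draft_embeddings):
        for i, x in enumerate(emb):
            out[i] += w * x
    return out
```

For example, two 2-dimensional draft embeddings mixed with equal weights yield their elementwise average.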

Code Completion • Language Modelling

Multilingual Diversity Improves Vision-Language Representations

no code implementations • 27 May 2024 Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.

Diversity • Text Retrieval

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

1 code implementation • 5 May 2024 Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process.

Specificity • Text-to-Image Generation

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

1 code implementation • 24 Apr 2024 Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each).

Inductive Bias • Representation Learning

BLINK: Multimodal Large Language Models Can See but Not Perceive

no code implementations • 18 Apr 2024 Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.

Depth Estimation • Multiple-choice • +1

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

no code implementations • 21 Mar 2024 Xiang Fan, Anand Bhattad, Ranjay Krishna

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits.

Video Editing

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

1 code implementation • 17 Mar 2024 Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution).

4k

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

no code implementations CVPR 2024 Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.

Image Classification • Question Answering • +2

Offline Training of Language Model Agents with Functions as Learnable Weights

2 code implementations • 17 Feb 2024 Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu

Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions.

Language Modelling

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

1 code implementation • 13 Feb 2024 Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox

To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions.

Robot Manipulation Generalization

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

1 code implementation CVPR 2024 Mehmet Saygin Seyfioglu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, Ranjay Krishna, Linda Shapiro

Training multimodal models for histopathology requires instruction tuning datasets, which currently contain information for individual image patches, without a spatial grounding of the concepts within each patch and without a wider view of the WSI.

Image Captioning • Visual Question Answering (VQA) • +1

Lasagna: Layered Score Distillation for Disentangled Object Relighting

1 code implementation • 30 Nov 2023 Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko

Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit.

Colorization • Object • +1

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations • 29 Nov 2023 Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering • Text-to-Image Generation • +1

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

no code implementations • 7 Nov 2023 Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.

Object • Object Recognition

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations • 27 Oct 2023 Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering • Question Generation • +3

Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception

no code implementations • 22 Oct 2023 Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna

Computer vision often treats human perception as homogeneous: an implicit assumption that visual stimuli are perceived similarly by everyone.

Diversity • Graph Embedding

EcoAssistant: Using LLM Assistant More Affordably and Accurately

1 code implementation • 3 Oct 2023 Jieyu Zhang, Ranjay Krishna, Ahmed H. Awadallah, Chi Wang

Today, users ask Large language models (LLMs) as assistants to answer queries that require external knowledge; they ask about the weather in a specific city, about stock prices, and even about where specific locations are within their neighborhood.

MIMIC: Masked Image Modeling with Image Correspondences

1 code implementation • 27 Jun 2023 Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Patrick Howe, Sharan Ranjit S, Anand Bhattad, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna

We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%), and surface normals estimation on Taskonomy (2.05%).

Depth Estimation • Pose Estimation • +3

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

1 code implementation NeurIPS 2023 Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, Ranjay Krishna

In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models have permeated the machine learning ecosystem.

AR2-D2: Training a Robot Without a Robot

no code implementations • 23 Jun 2023 Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot.

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

1 code implementation • 3 May 2023 Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of available data on a benchmark, whereas standard finetuning the same T5 model struggles to match even by using 100% of the dataset.

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation ICCV 2023 Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).
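The metric described above scores a generated image by how many automatically generated question-answer pairs a VQA model answers correctly. A minimal sketch of that scoring step, assuming the QA pairs and a VQA callable are supplied (the function name and exact-match comparison are my simplifications; the paper's pipeline generates the questions with an LLM):

```python
def tifa_score(qa_pairs, vqa_answer):
    """Fraction of (question, expected_answer) pairs for which the
    VQA model's answer matches the expected answer; higher means
    the image is more faithful to its text prompt."""
    correct = sum(1 for question, expected in qa_pairs
                  if vqa_answer(question) == expected)
    return correct / len(qa_pairs)

# Hypothetical usage with a stubbed-out VQA model:
stub_vqa = {"is there a dog?": "yes", "how many cats?": "3"}
pairs = [("is there a dog?", "yes"), ("how many cats?", "2")]
score = tifa_score(pairs, lambda q: stub_vqa[q])  # 1 of 2 correct
```

In practice `vqa_answer` would wrap a real visual question answering model conditioned on the generated image.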

4k • Language Modelling • +4

Explanations Can Reduce Overreliance on AI Systems During Decision-Making

no code implementations • 13 Dec 2022 Helena Vasconcelos, Matthew Jörke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael Bernstein, Ranjay Krishna

Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, when people agree with an AI, even when it is incorrect.

Decision Making

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

1 code implementation • 9 Oct 2022 Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna

These results identify tasks where expectation alignment is a more useful strategy than curiosity-driven exploration for multi-agent coordination, enabling agents to do zero-shot coordination.

Multi-agent Reinforcement Learning

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

no code implementations • 12 Apr 2022 Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

Prior benchmarks have analyzed models' answers to questions about videos in order to measure visual compositional reasoning.

Question Answering

Visual Intelligence through Human Interaction

no code implementations • 12 Nov 2021 Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein

Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content.

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

1 code implementation ACL 2021 Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition.

Active Learning • Object Recognition • +3

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

1 code implementation EMNLP (WNUT) 2020 Rachel Gardner, Maya Varma, Clare Zhu, Ranjay Krishna

Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data.

Multi-Task Learning • valid

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

no code implementations • 5 Aug 2020 Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth.

AI Agent

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

2 code implementations • 15 Dec 2019 Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Next, by decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.

Few-Shot action recognition • Few Shot Action Recognition • +1

Deep Bayesian Active Learning for Multiple Correct Outputs

no code implementations • 2 Dec 2019 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

The assumption that these tasks always have exactly one correct answer has resulted in the creation of numerous uncertainty-based measurements, such as entropy and least confidence, which operate over a model's outputs.

Active Learning • Answer Generation • +4

Scene Graph Prediction with Limited Labels

1 code implementation ICCV 2019 Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei

All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each.

Knowledge Base Completion • Question Answering • +2

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

no code implementations NeurIPS 2019 Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein

We construct Human eYe Perceptual Evaluation (HYPE) a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time.

Image Generation • Unconditional Image Generation

HYPE: Human-eYe Perceptual Evaluation of Generative Models

no code implementations ICLR Workshop DeepGenStruct 2019 Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Durim Morina, Michael S. Bernstein

The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost.

Image Generation • Unconditional Image Generation

Information Maximizing Visual Question Generation

no code implementations CVPR 2019 Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We build a model that maximizes mutual information between the image, the expected answer and the generated question.

Clustering • Question Generation • +1

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations • 11 Aug 2018 Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

Referring Relationships

2 code implementations CVPR 2018 Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei

We formulate the cyclic condition between the entities in a relationship by modelling predicates that connect the entities as shifts in attention from one entity to another.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

3 code implementations CVPR 2017 Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

Dense Captioning • Descriptive • +3

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

no code implementations • 15 Sep 2016 Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets.

Visual Relationship Detection with Language Priors

no code implementations • 31 Jul 2016 Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.

Content-Based Image Retrieval • Relationship Detection • +3

Embracing Error to Enable Rapid Crowdsourcing

no code implementations • 14 Feb 2016 Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A. Shamma, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data.

General Classification • Sentiment Analysis • +2

Image Retrieval Using Scene Graphs

no code implementations CVPR 2015 Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei

We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.

Image Retrieval • Object Localization • +1
