Search Results for author: Ranjay Krishna

Found 57 papers, 26 papers with code

Image Retrieval Using Scene Graphs

no code implementations CVPR 2015 Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei

We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.

Image Retrieval Object Localization +1
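The retrieval idea above can be sketched minimally: represent a scene graph as (subject, predicate, object) triples and rank images by how many query triples their graphs contain. This is an illustrative toy, not the paper's actual model; the triple format and scoring rule are assumptions.

```python
# Toy sketch: a scene graph as (subject, predicate, object) triples,
# and retrieval by counting how many query triples an image's graph satisfies.
def match_score(query_triples, image_triples):
    """Number of query triples grounded in the image's scene graph."""
    image_set = set(image_triples)
    return sum(1 for t in query_triples if t in image_set)

# Hypothetical image database of pre-extracted scene graphs.
images = {
    "img1": [("man", "riding", "horse"), ("horse", "on", "grass")],
    "img2": [("man", "holding", "cup")],
}
query = [("man", "riding", "horse")]
best = max(images, key=lambda k: match_score(query, images[k]))
```

A real system would score partial and soft matches with learned potentials rather than exact set membership.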

Embracing Error to Enable Rapid Crowdsourcing

no code implementations14 Feb 2016 Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A. Shamma, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data.

General Classification Sentiment Analysis +2

Visual Relationship Detection with Language Priors

no code implementations31 Jul 2016 Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.

Content-Based Image Retrieval Relationship Detection +3
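The language-prior idea can be illustrated with a small sketch: re-weight a visual detector's score by how close a candidate relationship embeds to relationships seen in training. This is a simplified illustration under assumed toy vectors, not the paper's model; `language_prior` and `relationship_score` are hypothetical names.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense word-embedding vectors.
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def language_prior(candidate_vec, seen_relationship_vecs):
    # Prior is high when the candidate embeds near some training relationship.
    return max(cosine(candidate_vec, v) for v in seen_relationship_vecs)

def relationship_score(visual_score, candidate_vec, seen_relationship_vecs):
    # Visual likelihood re-weighted by the language prior.
    return visual_score * language_prior(candidate_vec, seen_relationship_vecs)
```

With real embeddings, an unseen relationship like "elephant riding bike" inherits plausibility from semantically nearby training relationships such as "person riding bike".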

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

no code implementations15 Sep 2016 Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets.

A Hierarchical Approach for Generating Descriptive Image Paragraphs

3 code implementations CVPR 2017 Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

Dense Captioning Descriptive +3

Referring Relationships

2 code implementations CVPR 2018 Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei

We formulate the cyclic condition between the entities in a relationship by modelling predicates that connect the entities as shifts in attention from one entity to another.

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations11 Aug 2018 Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

HYPE: Human-eYe Perceptual Evaluation of Generative Models

no code implementations ICLR Workshop DeepGenStruct 2019 Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Durim Morina, Michael S. Bernstein

The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost.

Image Generation Unconditional Image Generation

Information Maximizing Visual Question Generation

no code implementations CVPR 2019 Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We build a model that maximizes mutual information between the image, the expected answer and the generated question.

Clustering Question Generation +1
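The objective above hinges on mutual information; a minimal sketch is estimating I(Q; A) from empirical co-occurrence counts of question types and answers, then preferring high-MI questions. This is textbook plug-in estimation for illustration, not the paper's neural estimator.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(Q; A) in nats from (question, answer) samples."""
    n = len(pairs)
    pq, pa, pqa = Counter(), Counter(), Counter()
    for q, a in pairs:
        pq[q] += 1
        pa[a] += 1
        pqa[(q, a)] += 1
    mi = 0.0
    for (q, a), c in pqa.items():
        p_qa = c / n
        mi += p_qa * math.log(p_qa / ((pq[q] / n) * (pa[a] / n)))
    return mi
```

A question type that deterministically pins down the answer attains MI equal to the answer entropy, while an uninformative one scores near zero.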

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

no code implementations NeurIPS 2019 Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein

We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time.

Image Generation Unconditional Image Generation

Scene Graph Prediction with Limited Labels

1 code implementation ICCV 2019 Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei

All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each.

Knowledge Base Completion Question Answering +2

Deep Bayesian Active Learning for Multiple Correct Outputs

no code implementations2 Dec 2019 Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

The assumption that these tasks always have exactly one correct answer has resulted in the creation of numerous uncertainty-based measurements, such as entropy and least confidence, which operate over a model's outputs.

Active Learning Answer Generation +4

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

1 code implementation15 Dec 2019 Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Next, by decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.

Few-Shot Action Recognition +1

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

no code implementations5 Aug 2020 Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth.

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

1 code implementation EMNLP (WNUT) 2020 Rachel Gardner, Maya Varma, Clare Zhu, Ranjay Krishna

Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data.

Multi-Task Learning

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

1 code implementation ACL 2021 Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition.

Active Learning Object Recognition +3

On the Opportunities and Risks of Foundation Models

2 code implementations16 Aug 2021 Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Visual Intelligence through Human Interaction

no code implementations12 Nov 2021 Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein

Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content.

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

no code implementations12 Apr 2022 Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

Prior benchmarks have analyzed models' answers to questions about videos in order to measure visual compositional reasoning.

Question Answering

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

1 code implementation9 Oct 2022 Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna

These results identify tasks where expectation alignment is a more useful strategy than curiosity-driven exploration for multi-agent coordination, enabling agents to do zero-shot coordination.

Multi-agent Reinforcement Learning
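The expectation-alignment reward can be sketched in one line: an agent is intrinsically rewarded for acting in line with what its teammates predicted it would do. The squared-error form and the function name `elign_reward` are assumptions for illustration, not the paper's exact formulation.

```python
def elign_reward(action, predicted_actions):
    """Intrinsic reward: negative squared error between the agent's action
    and the mean of its teammates' predictions of that action."""
    mean_pred = sum(predicted_actions) / len(predicted_actions)
    return -(action - mean_pred) ** 2
```

Acting exactly as teammates expect yields zero penalty; surprising them yields an increasingly negative intrinsic reward, pushing agents toward mutually predictable, coordinated behavior.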

Explanations Can Reduce Overreliance on AI Systems During Decision-Making

no code implementations13 Dec 2022 Helena Vasconcelos, Matthew Jörke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael Bernstein, Ranjay Krishna

Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, when people agree with an AI, even when it is incorrect.

Decision Making

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation ICCV 2023 Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).

Language Modelling Object Counting +3
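The TIFA-style scoring loop can be sketched simply: question-answer pairs are derived from the text prompt, a VQA model answers each question on the generated image, and faithfulness is the accuracy over those questions. The stand-in VQA model below is purely illustrative.

```python
def tifa_score(qa_pairs, vqa_answer):
    """Fraction of text-derived questions the VQA model answers correctly
    on the generated image."""
    correct = sum(1 for question, ref in qa_pairs if vqa_answer(question) == ref)
    return correct / len(qa_pairs)

# Stand-in VQA model for illustration only (a dict lookup, not a real model).
fake_vqa = {"Is there a dog?": "yes", "What color is the dog?": "brown"}.get
qa = [("Is there a dog?", "yes"), ("What color is the dog?", "black")]
score = tifa_score(qa, fake_vqa)
```

Here the image contains a dog but of the wrong color, so the score is 0.5, localizing exactly which element of the prompt the generation missed.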

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

1 code implementation3 May 2023 Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

Third, we reduce both the model size and the amount of data required to outperform LLMs; our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of the available data on a benchmark, whereas standard finetuning of the same T5 model struggles to match it even when using 100% of the dataset.
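The distillation setup trains the student on two targets at once, the label and the LLM-extracted rationale. A minimal sketch of such a multi-task objective is below; the weighting scheme and the name `lam` are assumptions for illustration, not the paper's exact loss.

```python
def distill_loss(label_loss, rationale_loss, lam=0.5):
    """Multi-task objective: convex combination of the label-prediction loss
    and the rationale-generation loss, weighted by `lam`."""
    return (1 - lam) * label_loss + lam * rationale_loss
```

Setting `lam=0` recovers standard finetuning on labels alone; any positive `lam` forces the student to also reproduce the reasoning steps, which is the source of the sample-efficiency gains reported above.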

AR2-D2: Training a Robot Without a Robot

no code implementations23 Jun 2023 Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot.

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

1 code implementation NeurIPS 2023 Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, Ranjay Krishna

In the last year alone, a surge of new benchmarks to measure compositional understanding of vision-language models has permeated the machine learning ecosystem.

MIMIC: Masked Image Modeling with Image Correspondences

1 code implementation27 Jun 2023 Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna

We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%) and surface normal estimation on Taskonomy (2.05%).

Depth Estimation Pose Estimation +3

EcoAssistant: Using LLM Assistant More Affordably and Accurately

1 code implementation3 Oct 2023 Jieyu Zhang, Ranjay Krishna, Ahmed H. Awadallah, Chi Wang

Today, users ask large language models (LLMs) as assistants to answer queries that require external knowledge; they ask about the weather in a specific city, about stock prices, and even about where specific locations are within their neighborhood.

Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception

no code implementations22 Oct 2023 Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna

Computer vision often treats human perception as homogeneous: an implicit assumption that visual stimuli are perceived similarly by everyone.

Graph Embedding

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations27 Oct 2023 Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering Question Generation +3

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

no code implementations7 Nov 2023 Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Inspired by selective attention in humans, the process through which people filter their perception based on their experiences, knowledge, and the task at hand, we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.

Object Object Recognition

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations29 Nov 2023 Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.

Question Answering Text-to-Image Generation +1
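The selection step described above can be sketched as picking, among candidate generations, the one that maximizes a combination of the two VLM scores. The multiplicative combination rule and the function names here are assumptions for illustration, not the paper's exact procedure.

```python
def select_best(candidates, align_score, aesthetic_score):
    """Pick the candidate generation maximizing text-alignment times
    aesthetic quality, each judged by a (stand-in) VLM scorer."""
    return max(candidates, key=lambda c: align_score(c) * aesthetic_score(c))

# Stand-in scorers for illustration only (dict lookups, not real VLMs).
align = {"cand_a": 0.9, "cand_b": 0.5}.get
aesthetic = {"cand_a": 0.5, "cand_b": 0.8}.get
winner = select_best(["cand_a", "cand_b"], align, aesthetic)
```

In this toy run the well-aligned candidate wins (0.45 vs. 0.40) even though the other is judged more aesthetic, mirroring how alignment feedback steers the selection.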

Lasagna: Layered Score Distillation for Disentangled Object Relighting

1 code implementation30 Nov 2023 Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko

Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit.

Colorization Object +1

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

no code implementations5 Dec 2023 Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman

We propose Visual Program Distillation (VPD), an instruction tuning framework that produces a vision-language model (VLM) capable of solving complex visual tasks with a single forward pass.

Language Modelling Large Language Model +3

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

1 code implementation7 Dec 2023 Mehmet Saygin Seyfioglu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, Ranjay Krishna, Linda Shapiro

Current visual instruction datasets, generated through large language models, focus on creating question/answer pairs for individual image patches, which may lack diagnostic capacity on their own in histopathology, further complicated by the absence of spatial grounding in histopathology image captions.

Image Captioning Visual Question Answering (VQA) +1

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

1 code implementation13 Feb 2024 Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox

To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions.

Robot Manipulation Generalization

Training Language Model Agents without Modifying Language Models

no code implementations17 Feb 2024 Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu

Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions.

Language Modelling

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

no code implementations5 Mar 2024 Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.

Image Classification Question Answering +2

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

1 code implementation17 Mar 2024 Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution).

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

no code implementations21 Mar 2024 Xiang Fan, Anand Bhattad, Ranjay Krishna

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits.

Video Editing
