Search Results for author: Ranjay Krishna

Found 63 papers, 30 papers with code

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

1 code implementation • 28 May 2024 • Ethan Shen, Alan Fan, Sarah M Pratt, Jae Sung Park, Matthew Wallingford, Sham M. Kakade, Ari Holtzman, Ranjay Krishna, Ali Farhadi, Aditya Kusupati

We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model.
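
A minimal sketch of the superposition step described above. It assumes a toy embedding table stored as Python lists, and it assumes the per-draft weights are normalized to sum to 1. The name superpose_embeddings and its arguments are hypothetical and illustrative; this is not the paper's code. In a real model, the returned vector would stand in for the usual single-token input embedding at the next decoding step.

```python
from typing import Dict, List, Sequence


def superpose_embeddings(
    embedding_table: Dict[str, List[float]],
    last_tokens: Sequence[str],
    weights: Sequence[float],
) -> List[float]:
    """Return a weighted sum of the embeddings of each draft's most recent token.

    Hypothetical helper: normalizing the weights to sum to 1 is an assumption
    of this sketch, not a detail taken from the paper.
    """
    if len(last_tokens) != len(weights):
        raise ValueError("need exactly one weight per draft")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    dim = len(embedding_table[last_tokens[0]])
    mixed = [0.0] * dim
    # Accumulate each draft's embedding, scaled by its normalized weight.
    for token, w in zip(last_tokens, weights):
        vec = embedding_table[token]
        for i in range(dim):
            mixed[i] += (w / total) * vec[i]
    return mixed
```

For example, with the table {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}, calling superpose_embeddings(table, ["cat", "dog"], [3, 1]) returns [0.75, 0.25]. A single forward pass then sees a blend of both drafts' latest tokens.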

Multilingual Diversity Improves Vision-Language Representations

no code implementations • 27 May 2024 • Thao Nguyen, Matthew Wallingford, Sebastin Santy, Wei-Chiu Ma, Sewoong Oh, Ludwig Schmidt, Pang Wei Koh, Ranjay Krishna

By translating all multilingual image-text pairs from a raw web crawl to English and re-filtering them, we increase the prevalence of (translated) multilingual data in the resulting training set.

ImageInWords: Unlocking Hyper-Detailed Image Descriptions

1 code implementation • 5 May 2024 • Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

To address these issues, we introduce ImageInWords (IIW), a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions and a new dataset resulting from this process.

Specificity Text-to-Image Generation

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

1 code implementation • 24 Apr 2024 • Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each).

Inductive Bias Representation Learning

BLINK: Multimodal Large Language Models Can See but Not Perceive

no code implementations • 18 Apr 2024 • Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

We introduce Blink, a new benchmark for multimodal large language models (multimodal LLMs) that focuses on core visual perception abilities not found in other evaluations.

Depth Estimation Multiple-choice

Iterated Learning Improves Compositionality in Large Vision-Language Models

no code implementations • 2 Apr 2024 • Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna

A fundamental characteristic common to both human vision and natural language is their compositional nature.

Contrastive Learning

Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

no code implementations • 21 Mar 2024 • Xiang Fan, Anand Bhattad, Ranjay Krishna

We introduce Videoshop, a training-free video editing algorithm for localized semantic edits.

Video Editing

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

1 code implementation • 17 Mar 2024 • Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution).
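
The size of the evaluation grid described above can be counted with a short sketch. It assumes a full factorial design in which every model is paired with every planning strategy, plan format, and feedback type. The paper may instead combine feedback types or add a no-feedback baseline, and the model names here are hypothetical placeholders.

```python
from itertools import product

# Hypothetical placeholders: the six evaluated LLMs are not named in this listing.
models = [f"llm_{i}" for i in range(1, 7)]
planning_strategies = ["multi-step", "step-by-step"]
plan_formats = ["JSON", "code"]
feedback_types = ["parsing", "verification", "execution"]

# Assumed full factorial design: one configuration per combination.
grid = list(product(models, planning_strategies, plan_formats, feedback_types))
print(len(grid))  # 6 * 2 * 2 * 3 = 72
```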

Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use

no code implementations • 5 Mar 2024 • Imad Eddine Toubal, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna, Ariel Fuxman, Tom Duerig

Our framework leverages recent advances in foundation models, both large language models and vision-language models, to carve out the concept space through conversation and by automatically labeling training data points.

Image Classification Question Answering

Offline Training of Language Model Agents with Functions as Learnable Weights

1 code implementation • 17 Feb 2024 • Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu

Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions.

Language Modelling

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

1 code implementation • 13 Feb 2024 • Wilbert Pumacay, Ishika Singh, Jiafei Duan, Ranjay Krishna, Jesse Thomason, Dieter Fox

To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions.

Ranked #1 on Robot Manipulation Generalization on The COLOSSEUM

Robot Manipulation Generalization

Scaling Up LLM Reviews for Google Ads Content Moderation

no code implementations • 7 Feb 2024 • Wei Qiao, Tushar Dogra, Otilia Stretcu, Yu-Han Lyu, Tiantian Fang, Dongjin Kwon, Chun-Ta Lu, Enming Luo, YuAn Wang, Chih-Chun Chia, Ariel Fuxman, Fangzhou Wang, Ranjay Krishna, Mehmet Tek

This study proposes a method for scaling up LLM reviews for content moderation in Google Ads.

Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows

1 code implementation • 18 Dec 2023 • Madeleine Grunde-McLaughlin, Michelle S. Lam, Ranjay Krishna, Daniel S. Weld, Jeffrey Heer

The design space covers a designer's objectives and the tactics used to build workflows.

Holodeck: Language Guided Generation of 3D Embodied AI Environments

1 code implementation • 14 Dec 2023 • Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark

3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope.

Common Sense Reasoning Language Modelling

Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos

1 code implementation • 7 Dec 2023 • Mehmet Saygin Seyfioglu, Wisdom O. Ikezogwo, Fatemeh Ghezloo, Ranjay Krishna, Linda Shapiro

Training multimodal models for histopathology requires instruction-tuning datasets, which currently contain information only for individual image patches, without spatial grounding of the concepts within each patch and without a wider view of the whole-slide image (WSI).

Image Captioning Visual Question Answering (VQA)

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

no code implementations • 5 Dec 2023 • Yushi Hu, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman

We propose Visual Program Distillation (VPD), an instruction tuning framework that produces a vision-language model (VLM) capable of solving complex visual tasks with a single forward pass.

Ranked #1 on Meme Classification on Hateful Memes

Language Modelling Large Language Model

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

no code implementations • 5 Dec 2023 • Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents.

Benchmarking Image Augmentation

Lasagna: Layered Score Distillation for Disentangled Object Relighting

1 code implementation • 30 Nov 2023 • Dina Bashkirova, Arijit Ray, Rupayan Mallick, Sarah Adel Bargal, Jianming Zhang, Ranjay Krishna, Kate Saenko

Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities: existing methods struggle to keep other aspects of the image (colors, shapes, and textures) consistent after the edit.

Colorization Object

DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback

no code implementations • 29 Nov 2023 • Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian

Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality.
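
A minimal sketch of this two-judge selection. It assumes each judge returns a score in [0, 1] and applies a simple rule: discard candidates below an alignment threshold, then keep the most aesthetic one. The function name, the threshold, and the combination rule are illustrative assumptions, not DreamSync's exact procedure.

```python
from typing import Callable, Optional, Sequence


def select_best(
    candidates: Sequence[str],
    alignment_score: Callable[[str], float],
    aesthetic_score: Callable[[str], float],
    min_alignment: float = 0.9,
) -> Optional[str]:
    """Hypothetical selector: filter by text alignment, then rank by aesthetics.

    Returns None when no candidate meets the (assumed) alignment threshold.
    """
    # Step 1: keep only generations the VQA-style judge considers faithful.
    aligned = [c for c in candidates if alignment_score(c) >= min_alignment]
    if not aligned:
        return None
    # Step 2: among faithful generations, pick the highest aesthetic score.
    return max(aligned, key=aesthetic_score)
```

For example, with alignment scores {img_a: 0.95, img_b: 0.6, img_c: 1.0} and aesthetic scores {img_a: 0.7, img_b: 0.9, img_c: 0.5}, img_b is filtered out despite its higher aesthetic score, and img_a is returned. Filtering before ranking keeps aesthetics from overriding faithfulness to the prompt.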

Question Answering Text-to-Image Generation

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

no code implementations • 7 Nov 2023 • Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.

Object Object Recognition

Improving Interpersonal Communication by Simulating Audiences with Language Models

1 code implementation • 1 Nov 2023 • Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna

How do we communicate with others to achieve our goals?

Language Modelling Large Language Model

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

no code implementations • 27 Oct 2023 • Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above.

Question Answering Question Generation

Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception

no code implementations • 22 Oct 2023 • Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna

Computer vision often treats human perception as homogeneous: an implicit assumption that visual stimuli are perceived similarly by everyone.

Graph Embedding

EcoAssistant: Using LLM Assistant More Affordably and Accurately

1 code implementation • 3 Oct 2023 • Jieyu Zhang, Ranjay Krishna, Ahmed H. Awadallah, Chi Wang

Today, users ask large language models (LLMs), acting as assistants, to answer queries that require external knowledge: they ask about the weather in a specific city, about stock prices, and even about where specific locations are in their neighborhood.

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

no code implementations • 1 Aug 2023 • Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

Today, large language models (LLMs) are taught to use new tools by being given a few demonstrations of the tool's usage.

Image Generation

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

1 code implementation • NeurIPS 2023 • Yue Yu, Yuchen Zhuang, Jieyu Zhang, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, Chao Zhang

Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks.

Attribute Language Modelling

MIMIC: Masked Image Modeling with Image Correspondences

1 code implementation • 27 Jun 2023 • Kalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Patrick Howe, Sharan Ranjit S, Anand Bhattad, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna

We train multiple models with different masked image modeling objectives to showcase the following findings: Representations trained on our automatically generated MIMIC-3M outperform those learned from expensive crowdsourced datasets (ImageNet-1K) and those learned from synthetic environments (MULTIVIEW-HABITAT) on two dense geometric tasks: depth estimation on NYUv2 (1.7%) and surface normals estimation on Taskonomy (2.05%).

Depth Estimation Pose Estimation

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

1 code implementation • NeurIPS 2023 • Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, Ranjay Krishna

In the last year alone, a surge of new benchmarks measuring the compositional understanding of vision-language models has permeated the machine learning ecosystem.

AR2-D2: Training a Robot Without a Robot

no code implementations • 23 Jun 2023 • Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot.

Quilt-1M: One Million Image-Text Pairs for Histopathology

1 code implementation • NeurIPS 2023 • Wisdom Oluchi Ikezogwo, Mehmet Saygin Seyfioglu, Fatemeh Ghezloo, Dylan Stefan Chan Geva, Fatwir Sheikh Mohammed, Pavan Kumar Anand, Ranjay Krishna, Linda Shapiro

From YouTube, we curate QUILT: a large-scale vision-language dataset consisting of 802,144 image and text pairs.

Automatic Speech Recognition Cross-Modal Retrieval

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

1 code implementation • 3 May 2023 • Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

Third, we reduce both the model size and the amount of data required to outperform LLMs: our finetuned 770M T5 model outperforms the few-shot prompted 540B PaLM model using only 80% of the available data on a benchmark, whereas standard finetuning of the same T5 model struggles to match it even when using 100% of the dataset.

DataComp: In search of the next generation of multimodal datasets

1 code implementation • NeurIPS 2023 • Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt

Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms.

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

1 code implementation • ICCV 2023 • Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A Smith

We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).
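
The core of a VQA-based faithfulness score can be sketched as an accuracy: ask a VQA model questions derived from the text input and count how many of its answers match the answers the text implies. This sketch assumes the question-answer pairs are already given, uses exact string matching after lowercasing, and treats vqa as a hypothetical callable. The full metric also involves generating the questions from the text, which is omitted here.

```python
from typing import Callable, Sequence, Tuple


def faithfulness_score(
    image: object,
    qa_pairs: Sequence[Tuple[str, str]],
    vqa: Callable[[object, str], str],
) -> float:
    """Fraction of (question, expected_answer) pairs the VQA model answers correctly.

    vqa is a hypothetical stand-in for any visual question answering model.
    """
    if not qa_pairs:
        raise ValueError("need at least one question")
    correct = sum(
        1
        for question, expected in qa_pairs
        if vqa(image, question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)
```

For example, if a toy VQA model answers "dog" and "brown" for an image, then the expected pairs ("What animal is shown?", "dog"), ("What color is the animal?", "black"), and ("Is there a ball?", "yes") yield a score of 1/3.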

4k Language Modelling

VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]

no code implementations • 7 Mar 2023 • Maureen Daum, Enhao Zhang, Dong He, Stephen Mussmann, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska

We introduce VOCALExplore, a system designed to support users in building domain-specific models over video datasets.

feature selection

Agile Modeling: From Concept to Classifier in Minutes

no code implementations • ICCV 2023 • Otilia Stretcu, Edward Vendrow, Kenji Hata, Krishnamurthy Viswanathan, Vittorio Ferrari, Sasan Tavakkol, Wenlei Zhou, Aditya Avinash, Enming Luo, Neil Gordon Alldrin, Mohammadhossein Bateni, Gabriel Berger, Andrew Bunner, Chun-Ta Lu, Javier A Rey, Giulia Desalvo, Ranjay Krishna, Ariel Fuxman

In response, we introduce the problem of Agile Modeling: the process of turning any subjective visual concept into a computer vision model through real-time user-in-the-loop interactions.

Image Classification

Explanations Can Reduce Overreliance on AI Systems During Decision-Making

no code implementations • 13 Dec 2022 • Helena Vasconcelos, Matthew Jörke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael Bernstein, Ranjay Krishna

Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, in which people agree with an AI even when it is incorrect.

Decision Making

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

1 code implementation • CVPR 2023 • Zixian Ma, Jerry Hong, Mustafa Omer Gul, Mona Gandhi, Irena Gao, Ranjay Krishna

To measure systematicity, CREPE consists of a test dataset containing over 370K image-text pairs and three different seen-unseen splits.

Ranked #1 on Image Retrieval on CREPE (Compositional REPresentation Evaluation)

Image Retrieval Negation

ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward

1 code implementation • 9 Oct 2022 • Zixian Ma, Rose Wang, Li Fei-Fei, Michael Bernstein, Ranjay Krishna

These results identify tasks where expectation alignment is a more useful strategy than curiosity-driven exploration for multi-agent coordination, enabling agents to do zero-shot coordination.

Multi-agent Reinforcement Learning

Measuring Compositional Consistency for Video Question Answering

no code implementations • CVPR 2022 • Mona Gandhi, Mustafa Omer Gul, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

Recent video question answering benchmarks indicate that state-of-the-art models struggle to answer compositional questions.

Question Answering Video Question Answering

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

no code implementations • 12 Apr 2022 • Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

Prior benchmarks have analyzed models' answers to questions about videos in order to measure visual compositional reasoning.

Question Answering

Visual Intelligence through Human Interaction

no code implementations • 12 Nov 2021 • Ranjay Krishna, Mitchell Gordon, Li Fei-Fei, Michael Bernstein

Over the last decade, Computer Vision, the branch of Artificial Intelligence aimed at understanding the visual world, has evolved from simply recognizing objects in images to describing pictures, answering questions about images, helping robots maneuver around physical spaces, and even generating novel visual content.

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, aditi raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

1 code implementation • ACL 2021 • Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition.

Active Learning Object Recognition

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

no code implementations • CVPR 2021 • Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

AGQA contains 192M unbalanced question-answer pairs for 9.6K videos.

Question Answering Video Question Answering

Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning

1 code implementation • EMNLP (WNUT) 2020 • Rachel Gardner, Maya Varma, Clare Zhu, Ranjay Krishna

Datasets extracted from social networks and online forums are often prone to the pitfalls of natural language, namely the presence of unstructured and noisy data.

Multi-Task Learning valid

Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

no code implementations • 5 Aug 2020 • Pranav Khadpe, Ranjay Krishna, Li Fei-Fei, Jeffrey Hancock, Michael Bernstein

In a third study, we assess effects of metaphor choices on potential users' desire to try out the system and find that users are drawn to systems that project higher competence and warmth.

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

2 code implementations • 15 Dec 2019 • Jingwei Ji, Ranjay Krishna, Li Fei-Fei, Juan Carlos Niebles

Next, by decomposing and learning the temporal changes in visual relationships that result in an action, we demonstrate the utility of a hierarchical event decomposition by enabling few-shot action recognition, achieving 42.7% mAP using as few as 10 examples.

Few-Shot Action Recognition

Deep Bayesian Active Learning for Multiple Correct Outputs

no code implementations • 2 Dec 2019 • Khaled Jedoui, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

The assumption that these tasks always have exactly one correct answer has resulted in the creation of numerous uncertainty-based measurements, such as entropy and least confidence, which operate over a model's outputs.
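
For reference, the two uncertainty measures named above are usually defined over a model's predictive distribution p: entropy H(p) = -sum_i p_i log p_i, and least confidence 1 - max_i p_i. A minimal sketch, assuming the distribution is given as a list of probabilities that sums to 1:

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a predictive distribution; higher means more uncertain."""
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log(p) for p in probs if p > 0)


def least_confidence(probs: Sequence[float]) -> float:
    """One minus the probability of the top prediction; higher means more uncertain."""
    return 1.0 - max(probs)
```

Both scores peak when probability mass is spread across outputs. A model that splits its mass evenly between two equally correct answers ([0.5, 0.5]) therefore looks maximally uncertain under both measures, which illustrates why the single-correct-answer assumption matters.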

Active Learning Answer Generation

Learning Predicates as Functions to Enable Few-shot Scene Graph Prediction

no code implementations • 12 Jun 2019 • Apoorva Dornadula, Austin Narcomey, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We introduce the first scene graph prediction model that supports few-shot learning of predicates.

Few-Shot Learning Graph Generation

Scene Graph Prediction with Limited Labels

1 code implementation • ICCV 2019 • Vincent S. Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Re, Li Fei-Fei

All scene graph models to date are limited to training on a small set of visual relationships that have thousands of training labels each.

Ranked #1 on Scene Graph Detection on VRD

Knowledge Base Completion Question Answering

HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

no code implementations • NeurIPS 2019 • Sharon Zhou, Mitchell L. Gordon, Ranjay Krishna, Austin Narcomey, Li Fei-Fei, Michael S. Bernstein

We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time.

Image Generation Unconditional Image Generation

HYPE: Human-eYe Perceptual Evaluation of Generative Models

no code implementations • ICLR Workshop DeepGenStruct 2019 • Sharon Zhou, Mitchell Gordon, Ranjay Krishna, Austin Narcomey, Durim Morina, Michael S. Bernstein

The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost.
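
The error-rate measurement can be sketched as follows. The sketch assumes each human judgment is recorded as a pair (is_real, judged_real) and that errors are pooled over fake and real images. The pooling is an assumption of the sketch, and the paper's full protocol (image sampling, aggregation across evaluators) is not reproduced here.

```python
from typing import Iterable, Tuple


def human_error_rate(judgments: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of images a human judged incorrectly.

    Each judgment is (is_real, judged_real); an error is a fake image judged
    real or a real image judged fake.
    """
    items = list(judgments)
    if not items:
        raise ValueError("need at least one judgment")
    errors = sum(1 for is_real, judged_real in items if is_real != judged_real)
    return errors / len(items)
```

For example, one correct and one incorrect judgment on real images, plus one incorrect and one correct judgment on fake images, give an error rate of 0.5.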

Image Generation Unconditional Image Generation

Information Maximizing Visual Question Generation

no code implementations • CVPR 2019 • Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We build a model that maximizes mutual information between the image, the expected answer and the generated question.

Clustering Question Generation

The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary

no code implementations • 11 Aug 2018 • Bernard Ghanem, Juan Carlos Niebles, Cees Snoek, Fabian Caba Heilbron, Humam Alwassel, Victor Escorcia, Ranjay Krishna, Shyamal Buch, Cuong Duc Dao

The guest tasks focused on complementary aspects of the activity recognition problem at large scale and involved three challenging and recently compiled datasets: the Kinetics-600 dataset from Google DeepMind, the AVA dataset from Berkeley and Google, and the Moments in Time dataset from MIT and IBM Research.

Activity Recognition

Referring Relationships

2 code implementations • CVPR 2018 • Ranjay Krishna, Ines Chami, Michael Bernstein, Li Fei-Fei

We formulate the cyclic condition between the entities in a relationship by modelling predicates that connect the entities as shifts in attention from one entity to another.

Dense-Captioning Events in Videos

4 code implementations • ICCV 2017 • Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles

We also introduce ActivityNet Captions, a large-scale benchmark for dense-captioning events.

Dense Captioning Retrieval

A Hierarchical Approach for Generating Descriptive Image Paragraphs

3 code implementations • CVPR 2017 • Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail.

Ranked #7 on Image Paragraph Captioning on Image Paragraph Captioning

Dense Captioning Descriptive

A Glimpse Far into the Future: Understanding Long-term Crowd Worker Quality

no code implementations • 15 Sep 2016 • Kenji Hata, Ranjay Krishna, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing is increasingly critical to the creation of extremely large datasets.

Visual Relationship Detection with Language Priors

no code implementations • 31 Jul 2016 • Cewu Lu, Ranjay Krishna, Michael Bernstein, Li Fei-Fei

We improve on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.

Ranked #2 on Scene Graph Generation on VRD

Content-Based Image Retrieval Relationship Detection

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

1 code implementation • 23 Feb 2016 • Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li

Despite progress in perceptual tasks such as image classification, computers still perform poorly on cognitive tasks such as image description and question answering.

Image Classification Question Answering

Embracing Error to Enable Rapid Crowdsourcing

no code implementations • 14 Feb 2016 • Ranjay Krishna, Kenji Hata, Stephanie Chen, Joshua Kravitz, David A. Shamma, Li Fei-Fei, Michael S. Bernstein

Microtask crowdsourcing has enabled dataset advances in social science and machine learning, but existing crowdsourcing schemes are too expensive to scale up with the expanding volume of data.

General Classification Sentiment Analysis

Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval

no code implementations • WS 2015 • Sebastian Schuster, Ranjay Krishna, Angel Chang, Li Fei-Fei, Christopher D. Manning

Image Retrieval Retrieval

Image Retrieval Using Scene Graphs

no code implementations • CVPR 2015 • Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David Shamma, Michael Bernstein, Li Fei-Fei

We introduce a novel dataset of 5,000 human-generated scene graphs grounded to images and use this dataset to evaluate our method for image retrieval.

Image Retrieval Object Localization