Search Results for author: Ali Farhadi

Found 104 papers, 57 papers with code

Object Manipulation via Visual Target Localization

no code implementations 15 Mar 2022 Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them.

Object Detection

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

2 code implementations 10 Mar 2022 Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin.

Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

Domain Generalization Image Classification +1

The Introspective Agent: Interdependence of Strategy, Physiology, and Sensing for Embodied Agents

1 code implementation 2 Jan 2022 Sarah Pratt, Luca Weihs, Ali Farhadi

While traditional embodied agents manipulate an environment to best achieve a goal, we argue for an introspective agent, which considers its own abilities in the context of its environment.

Forward Compatible Training for Large-Scale Embedding Retrieval Systems

no code implementations 6 Dec 2021 Vivek Ramanujan, Pavan Kumar Anasosalu Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari

To avoid the cost of backfilling, BCT modifies training of the new model to make its representations compatible with those of the old model.

Representation Learning

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

1 code implementation 8 Oct 2021 Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

Our models require no retraining, thus our subspace of models can be deployed entirely on-device to allow adaptive network compression at inference time.

Quantization

Robust fine-tuning of zero-shot models

1 code implementation 4 Sep 2021 Mitchell Wortsman, Gabriel Ilharco, Jong Wook Kim, Mike Li, Simon Kornblith, Rebecca Roelofs, Raphael Gontijo Lopes, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt

Compared to standard fine-tuning, WiSE-FT provides large accuracy improvements under distribution shift, while preserving high accuracy on the target distribution.

Transfer Learning

LanguageRefer: Spatial-Language Model for 3D Visual Grounding

no code implementations 7 Jul 2021 Junha Roh, Karthik Desingh, Ali Farhadi, Dieter Fox

Specifically, given a reconstructed 3D scene in the form of point clouds with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates.

Language Modelling Visual Grounding

MERLOT: Multimodal Neural Script Knowledge Models

1 code implementation NeurIPS 2021 Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi

As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.

Visual Commonsense Reasoning

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

1 code implementation NeurIPS 2021 Aditya Kusupati, Matthew Wallingford, Vivek Ramanujan, Raghav Somani, Jae Sung Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi

We further quantitatively measure the quality of our codes by applying them to efficient image retrieval as well as out-of-distribution (OOD) detection problems.

Image Retrieval OOD Detection

PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

no code implementations ACL 2021 Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language.

Language Modelling

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation CVPR 2021 Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Visual Navigation

Learning Neural Network Subspaces

1 code implementation 20 Feb 2021 Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance.

Learning Visual Representation from Human Interactions

no code implementations ICLR 2021 Kiana Ehsani, Daniel Gordon, Thomas Hai Dang Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

Learning Flexible Visual Representations via Interactive Gameplay

no code implementations ICLR 2021 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making and socialization.

Decision Making Representation Learning

Layer-Wise Data-Free CNN Compression

no code implementations 18 Nov 2020 Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari

In the high-computation regime, we show how to combine our method with generative methods to improve upon state-of-the-art performance for several networks.

Quantization

What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions

1 code implementation 16 Oct 2020 Kiana Ehsani, Daniel Gordon, Thomas Nguyen, Roozbeh Mottaghi, Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision.

Action Recognition Depth Estimation +2

FLUID: A Unified Evaluation Framework for Flexible Sequential Data

1 code implementation 6 Jul 2020 Matthew Wallingford, Aditya Kusupati, Keivan Alizadeh-Vahid, Aaron Walsman, Aniruddha Kembhavi, Ali Farhadi

To foster research towards the goal of general ML methods, we introduce a new unified evaluation framework - FLUID (Flexible Sequential Data).

Continual Learning Representation Learning +1

Supermasks in Superposition

1 code implementation NeurIPS 2020 Mitchell Wortsman, Vivek Ramanujan, Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski, Ali Farhadi

We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting.

Probing Contextual Language Models for Common Ground with Visual Representations

no code implementations NAACL 2021 Gabriel Ilharco, Rowan Zellers, Ali Farhadi, Hannaneh Hajishirzi

The success of large-scale contextual language models has attracted great interest in probing what is encoded in their representations.

Representation Learning

VisualCOMET: Reasoning about the Dynamic Context of a Still Image

no code implementations ECCV 2020 Jae Sung Park, Chandra Bhagavatula, Roozbeh Mottaghi, Ali Farhadi, Yejin Choi

In addition, we provide person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text.

Frame Visual Commonsense Reasoning

RoboTHOR: An Open Simulation-to-Real Embodied AI Platform

1 code implementation CVPR 2020 Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi

We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems.

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects

1 code implementation CVPR 2020 Kiana Ehsani, Shubham Tulsiani, Saurabh Gupta, Ali Farhadi, Abhinav Gupta

Our quantitative and qualitative results show that (a) we can predict meaningful forces from videos whose effects lead to accurate imitation of the motions observed, (b) by jointly optimizing for contact point and force prediction, we can improve the performance on both tasks in comparison to independent training, and (c) we can learn a representation from this model that generalizes to novel objects using few shot examples.

Human-Object Interaction Detection

Grounded Situation Recognition

1 code implementation ECCV 2020 Sarah Pratt, Mark Yatskar, Luca Weihs, Ali Farhadi, Aniruddha Kembhavi

We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g., agent, tool), and bounding-box groundings of entities.

Image Retrieval

Watching the World Go By: Representation Learning from Unlabeled Videos

1 code implementation 18 Mar 2020 Daniel Gordon, Kiana Ehsani, Dieter Fox, Ali Farhadi

Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.

Data Augmentation Representation Learning

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping

3 code implementations 15 Feb 2020 Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah Smith

We publicly release all of our experimental data, including training and validation scores for 2,100 trials, to encourage further analysis of training dynamics during fine-tuning.

Pretrained Language Models

Learning Generalizable Visual Representations via Interactive Gameplay

no code implementations 17 Dec 2019 Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization.

Decision Making Representation Learning

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation CVPR 2020 Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

Conditional Driving from Natural Language Instructions

no code implementations 16 Oct 2019 Junha Roh, Chris Paxton, Andrzej Pronobis, Ali Farhadi, Dieter Fox

Widespread adoption of self-driving cars will depend not only on their safety but largely on their ability to interact with human users.

Imitation Learning Self-Driving Cars

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

1 code implementation ACL 2019 Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query.

Open-Domain Question Answering

Butterfly Transform: An Efficient FFT Based Neural Architecture Design

1 code implementation CVPR 2020 Keivan Alizadeh Vahid, Anish Prabhu, Ali Farhadi, Mohammad Rastegari

By replacing pointwise convolutions with BFT, we reduce the computational complexity of these layers from O(n^2) to O(n log n) with respect to the number of channels.

Neural Architecture Search

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

1 code implementation CVPR 2019 Kenneth Marino, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi

In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.

Object Detection Question Answering +3

Defending Against Neural Fake News

4 code implementations NeurIPS 2019 Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi

We find that the best current discriminators can classify neural fake news from real, human-written news with 73% accuracy, assuming access to a moderate level of training data.

Fake News Detection Text Generation

HellaSwag: Can a Machine Really Finish Your Sentence?

1 code implementation ACL 2019 Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, Yejin Choi

In this paper, we show that commonsense inference still proves difficult for even state-of-the-art models, by presenting HellaSwag, a new challenge dataset.

Natural Language Inference

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

1 code implementation CVPR 2019 Yao-Hung Hubert Tsai, Santosh Divvala, Louis-Philippe Morency, Ruslan Salakhutdinov, Ali Farhadi

Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts.

What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning

no code implementations 6 Jan 2019 Daniel Gordon, Dieter Fox, Ali Farhadi

In this work we propose Hierarchical Planning and Reinforcement Learning (HIP-RL), a method for merging the benefits and capabilities of Symbolic Planning with the learning abilities of Deep Reinforcement Learning.

Question Answering reinforcement-learning

ELASTIC: Improving CNNs with Dynamic Scaling Policies

1 code implementation CVPR 2019 Huiyu Wang, Aniruddha Kembhavi, Ali Farhadi, Alan Yuille, Mohammad Rastegari

We formulate the scaling policy as a non-linear function inside the network's structure that (a) is learned from data, (b) is instance specific, (c) does not add extra computation, and (d) can be applied on any network architecture.

General Classification Multi-Label Classification +1

From Recognition to Cognition: Visual Commonsense Reasoning

4 code implementations CVPR 2019 Rowan Zellers, Yonatan Bisk, Ali Farhadi, Yejin Choi

While this task is easy for humans, it is tremendously difficult for today's vision systems, requiring higher-order cognition and commonsense reasoning about the world.

Multiple-choice Multiple Choice Question Answering (MCQA) +1

Visual Semantic Navigation using Scene Priors

1 code implementation ICLR 2019 Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi

Do we use the semantic/functional priors we have built over years to efficiently search and navigate?

reinforcement-learning

PhotoShape: Photorealistic Materials for Large-Scale Shape Collections

1 code implementation 26 Sep 2018 Keunhong Park, Konstantinos Rematas, Ali Farhadi, Steven M. Seitz

Existing online 3D shape repositories contain thousands of 3D models but lack photorealistic appearance.

Label Refinery: Improving ImageNet Classification through Label Progression

4 code implementations 7 May 2018 Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, Ali Farhadi

Among the three main components (data, labels, and models) of any supervised learning system, data and models have been the main subjects of active research.

Classification General Classification

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

no code implementations 25 Apr 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

In this paper we describe the egocentric aspect of the dataset and present annotations for Charades-Ego with 68,536 activity instances in 68.8 hours of first and third-person video, making it one of the largest and most diverse egocentric datasets available.

General Classification Video Classification +1

Actor and Observer: Joint Modeling of First and Third-Person Videos

1 code implementation CVPR 2018 Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, Karteek Alahari

Several theories in cognitive neuroscience suggest that when people interact with the world, or simulate interactions, they do so from a first-person egocentric perspective, and seamlessly transfer knowledge between third-person (observer) and first-person (actor).

Action Recognition

Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

1 code implementation EMNLP 2018 Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder.

Question Answering Reading Comprehension

Imagine This! Scripts to Compositions to Videos

3 code implementations ECCV 2018 Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.

DOCK: Detecting Objects by transferring Common-sense Knowledge

no code implementations ECCV 2018 Krishna Kumar Singh, Santosh Divvala, Ali Farhadi, Yong Jae Lee

We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories.

Common Sense Reasoning Semantic Similarity +2

Neural Speed Reading via Skim-RNN

1 code implementation ICLR 2018 Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens.

AJILE Movement Prediction: Multimodal Deep Learning for Natural Human Neural Recordings and Video

no code implementations 13 Sep 2017 Nancy Xin Ru Wang, Ali Farhadi, Rajesh Rao, Bingni Brunton

This paper describes our approach to detect and to predict natural human arm movements in the future, a key challenge in brain computer interfacing that has never before been attempted.

Future prediction Multimodal Deep Learning

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

no code implementations CVPR 2017 Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, Hannaneh Hajishirzi

Our analysis shows that a significant portion of questions require complex parsing of the text and the diagrams and reasoning, indicating that our dataset is more complex compared to previous machine comprehension and visual question answering datasets.

Question Answering Reading Comprehension +1

Visual Semantic Planning using Deep Successor Representations

no code implementations ICCV 2017 Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world.

Imitation Learning reinforcement-learning

Re3: Real-Time Recurrent Regression Networks for Visual Tracking of Generic Objects

10 code implementations 17 May 2017 Daniel Gordon, Ali Farhadi, Dieter Fox

Robust object tracking requires knowledge and understanding of the object being tracked: its appearance, its motion, and how it changes over time.

Object Tracking Visual Tracking

SeGAN: Segmenting and Generating the Invisible

1 code implementation CVPR 2018 Kiana Ehsani, Roozbeh Mottaghi, Ali Farhadi

Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction and manipulation.

Depth Estimation Scene Understanding

See the Glass Half Full: Reasoning about Liquid Containers, their Volume and Content

no code implementations ICCV 2017 Roozbeh Mottaghi, Connor Schenck, Dieter Fox, Ali Farhadi

Doing so requires estimating the volume of the cup, approximating the amount of water in the pitcher, and predicting the behavior of water when we tilt the pitcher.

Commonly Uncommon: Semantic Sparsity in Situation Recognition

2 code implementations CVPR 2017 Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi

Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set.

Structured Prediction

LCNN: Lookup-based Convolutional Neural Network

no code implementations CVPR 2017 Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

We introduce LCNN, a lookup-based convolutional neural network that encodes convolutions by few lookups to a dictionary that is trained to cover the space of weights in CNNs.

Few-Shot Learning

Bidirectional Attention Flow for Machine Comprehension

23 code implementations 5 Nov 2016 Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi

Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query.

Cloze Test Open-Domain Question Answering +1

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning

2 code implementations 16 Sep 2016 Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi

To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.

3D Reconstruction Feature Engineering +2

Much Ado About Time: Exhaustive Annotation of Temporal Data

no code implementations 25 Jul 2016 Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta

We conclude that the optimal strategy is to ask as many questions as possible in a HIT (up to 52 binary questions after watching a 30-second video clip in our experiments).

Situation Recognition: Visual Semantic Role Labeling for Image Understanding

no code implementations CVPR 2016 Mark Yatskar, Luke Zettlemoyer, Ali Farhadi

This paper introduces situation recognition, the problem of producing a concise summary of the situation an image depicts including: (1) the main activity (e.g., clipping), (2) the participating actors, objects, substances, and locations (e.g., man, shears, sheep, wool, and field) and most importantly (3) the roles these participants play in the activity (e.g., the man is clipping, the shears are his tool, the wool is being clipped from the sheep, and the clipping is in a field).

Activity Recognition Semantic Role Labeling +1

A Task-Oriented Approach for Cost-Sensitive Recognition

no code implementations CVPR 2016 Roozbeh Mottaghi, Hannaneh Hajishirzi, Ali Farhadi

With the recent progress in visual recognition, we have already started to see a surge of vision related real-world applications.

Scene Understanding

Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

3 code implementations 13 Apr 2016 Junyuan Xie, Ross Girshick, Ali Farhadi

As 3D movie viewing becomes mainstream and the Virtual Reality (VR) market emerges, the demand for 3D content is growing rapidly.

Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding

no code implementations 6 Apr 2016 Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta

Each video is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacted objects.

Action Recognition

A Diagram Is Worth A Dozen Images

1 code implementation 24 Mar 2016 Aniruddha Kembhavi, Mike Salvato, Eric Kolve, Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi

We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering.

Visual Question Answering

"What happens if..." Learning to Predict the Effect of Forces in Images

no code implementations 17 Mar 2016 Roozbeh Mottaghi, Mohammad Rastegari, Abhinav Gupta, Ali Farhadi

To build a dataset of forces in scenes, we reconstructed all images in the SUN RGB-D dataset in a physics simulator to estimate the physical movements of objects caused by external forces applied to them.

Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects

no code implementations 2 Feb 2016 Hessam Bagherinezhad, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi

In this paper, we introduce a method to automatically infer object sizes, leveraging visual and textual information from the web.

Visual Reasoning

Toward a Taxonomy and Computational Models of Abnormalities in Images

no code implementations 4 Dec 2015 Babak Saleh, Ahmed Elgammal, Jacob Feldman, Ali Farhadi

In this paper we study various types of atypicalities in images in a more comprehensive way than has been done before.

Actions ~ Transformations

1 code implementation CVPR 2016 Xiaolong Wang, Ali Farhadi, Abhinav Gupta

In this paper, we propose a novel representation for actions by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).

Action Recognition

Generating Notifications for Missing Actions: Don't Forget to Turn the Lights Off!

no code implementations ICCV 2015 Bilge Soran, Ali Farhadi, Linda Shapiro

The overall prediction accuracy is 46.2% when only 10 frames of an action are seen (2/3 of a sec).

Unsupervised Deep Embedding for Clustering Analysis

16 code implementations 19 Nov 2015 Junyuan Xie, Ross Girshick, Ali Farhadi

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms.

Image Clustering Unsupervised Image Classification

Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images

no code implementations 12 Nov 2015 Roozbeh Mottaghi, Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

Direct and explicit estimation of the forces and the motion of objects from a single image is extremely challenging.

Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing

no code implementations ICCV 2015 Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi

Next, we show that the association of high-quality segmentations to textual phrases aids in richer semantic understanding and reasoning of these textual phrases.

Natural Language Understanding Object Recognition +2

Discriminative and Consistent Similarities in Instance-Level Multiple Instance Learning

no code implementations CVPR 2015 Mohammad Rastegari, Hannaneh Hajishirzi, Ali Farhadi

In this paper we present a bottom-up method to instance-level Multiple Instance Learning (MIL) that learns to discover positive instances with globally constrained reasoning about local pairwise similarities.

Multiple Instance Learning Text Categorization

Abnormal Object Recognition: A Comprehensive Study

no code implementations 9 Nov 2014 Babak Saleh, Ali Farhadi, Ahmed Elgammal

When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting.

Anomaly Detection Object Recognition

Predicting Failures of Vision Systems

no code implementations CVPR 2014 Peng Zhang, Jiuling Wang, Ali Farhadi, Martial Hebert, Devi Parikh

We show that a surprisingly straightforward and general approach, that we call ALERT, can predict the likely accuracy (or failure) of a variety of computer vision systems – semantic segmentation, vanishing point and camera parameter estimation, and image memorability prediction – on individual input images.

Semantic Segmentation Zero-Shot Learning

Incorporating Scene Context and Object Layout into Appearance Modeling

no code implementations CVPR 2014 Hamid Izadinia, Fereshteh Sadeghi, Ali Farhadi

In this paper, we propose a method to learn scene structures that can encode three main interlacing components of a scene: the scene category, the context-specific appearance of objects, and their layout.

Scene Understanding

Learning Everything about Anything: Webly-Supervised Visual Concept Learning

no code implementations CVPR 2014 Santosh K. Divvala, Ali Farhadi, Carlos Guestrin

How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models?

Adding Unlabeled Samples to Categories by Learned Attributes

no code implementations CVPR 2013 Jonghyun Choi, Mohammad Rastegari, Ali Farhadi, Larry S. Davis

We propose a method to expand the visual coverage of training sets that consist of a small number of labeled examples using learned attributes.

Object-Centric Anomaly Detection by Attribute-Based Reasoning

no code implementations CVPR 2013 Babak Saleh, Ali Farhadi, Ahmed Elgammal

When describing images, humans tend not to talk about the obvious, but rather mention what they find interesting.

Anomaly Detection Image Categorization
