Search Results for author: Yezhou Yang

Found 85 papers, 33 papers with code

To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo

1 code implementation · ACL 2022 · Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We find that the original Who’s Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.

Benchmarking, Person-centric Visual Grounding
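The two shortcuts named in this snippet can be written as a trivial heuristic baseline, which shows why such samples need no real grounding. This is a hypothetical sketch: the functions `largest_box_for_first_name` and `left_to_right_link` and the `(x1, y1, x2, y2)` box format are assumptions for illustration, not part of the Who's Waldo dataset or the paper's code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format

def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def largest_box_for_first_name(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Shortcut 1: link the first-mentioned name to the largest box."""
    if not names or not boxes:
        return {}
    best = max(range(len(boxes)), key=lambda i: area(boxes[i]))
    return {names[0]: best}

def left_to_right_link(names: List[str], boxes: List[Box]) -> Dict[str, int]:
    """Shortcut 2: link names, in sentence order, to boxes sorted by left edge."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    return {name: idx for name, idx in zip(names, order)}
```

A model that beats these baselines only on the samples where both shortcuts fail is the kind of evidence a debiased split is meant to surface.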

Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding

1 code implementation · 1 Sep 2023 · Joshua Feinglass, Yezhou Yang

Object proposal generation serves as a standard pre-processing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.).

Graph Generation, Image Captioning, +4

Adversarial Bayesian Augmentation for Single-Source Domain Generalization

1 code implementation · ICCV 2023 · Sheng Cheng, Tejas Gokhale, Yezhou Yang

Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.

Data Augmentation, Domain Generalization

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

1 code implementation · 7 Jun 2023 · Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision.

Concept Alignment

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

no code implementations · 7 Jun 2023 · Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang

We rigorously scrutinize our method's secrecy under two distinct scenarios: one where a malicious user attempts to detect the fingerprint, and another where a user possesses a comprehensive understanding of our method.


End-to-end Knowledge Retrieval with Multi-modal Queries

1 code implementation · 1 Jun 2023 · Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral

We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.

Benchmarking, Cross-Modal Retrieval, +2

CAROM Air -- Vehicle Localization and Traffic Scene Reconstruction from Aerial Videos

no code implementations · 31 May 2023 · Duo Lu, Eric Eaton, Matt Weg, Wei Wang, Steven Como, Jeffrey Wishart, Hongbin Yu, Yezhou Yang

Road traffic scene reconstruction from videos is highly desirable for road safety regulators, city planners, researchers, and autonomous driving technology developers.

Autonomous Driving

Attributing Image Generative Models using Latent Fingerprints

1 code implementation · 17 Apr 2023 · GuangYu Nie, Changhoon Kim, Yezhou Yang, Yi Ren

This paper investigates the use of latent semantic dimensions as fingerprints, which lets us analyze how design variables, including the choice of fingerprinting dimensions, strength, and capacity, affect the accuracy-quality tradeoff.

Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling

1 code implementation · 30 Mar 2023 · Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang

In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.

Continual Learning, Data Poisoning, +1

Benchmarking Spatial Relationships in Text-to-Image Generation

1 code implementation · 20 Dec 2022 · Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

We investigate the ability of text-to-image (T2I) models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
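One simple way to score a generated image against a prompt such as "a dog to the left of a cat" is to detect both objects and compare the centroids of their bounding boxes. The sketch below illustrates that idea only; it is not the VISOR implementation. The functions `relation_holds` and `spatial_accuracy`, the `(x1, y1, x2, y2)` box format, and the treatment of missing detections as failures are all assumptions, and the object detector that would supply the boxes is omitted.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2); y grows downward

def centroid(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def relation_holds(box_a: Optional[Box], box_b: Optional[Box], relation: str) -> bool:
    """True if object A stands in `relation` to object B, judged by box centroids.

    A missing detection (None) counts as a failure: the relation cannot be
    verified if either object was not generated.
    """
    if box_a is None or box_b is None:
        return False
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    checks = {
        "left of": ax < bx,
        "right of": ax > bx,
        "above": ay < by,  # image coordinates: smaller y means higher up
        "below": ay > by,
    }
    if relation not in checks:
        raise ValueError(f"unsupported relation: {relation}")
    return checks[relation]

def spatial_accuracy(samples: Iterable[Tuple[Optional[Box], Optional[Box], str]]) -> float:
    """Fraction of (box_a, box_b, relation) samples whose relation holds."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(relation_holds(a, b, r) for a, b, r in samples) / len(samples)
```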


Reasoning about Actions over Visual and Linguistic Modalities: A Survey

no code implementations · 15 Jul 2022 · Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral

'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals.

Common Sense Reasoning

Improving Diversity with Adversarially Learned Transformations for Domain Generalization

1 code implementation · 15 Jun 2022 · Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang

In single-source domain generalization, maximizing the diversity of synthesized domains has emerged as one of the most effective strategies.

Domain Generalization

SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks

no code implementations · 27 Apr 2022 · Sheng Cheng, Yi Ren, Yezhou Yang

Following cognitive studies, this paper investigates a graph representation for sketches in which information about strokes, i.e., the parts of a sketch, is encoded on the vertices and inter-stroke information is encoded on the edges.
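The vertex/edge split described in this snippet can be pictured as a small data structure. In this hypothetical sketch, each stroke (a polyline of at least one point) becomes a vertex carrying simple per-stroke features, and every pair of strokes gets an edge carrying one inter-stroke feature, the distance between stroke centroids. The class `StrokeGraph`, the function `build_stroke_graph`, and all feature choices are assumptions for illustration, not the features used by SSR-GNNs.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class StrokeGraph:
    # vertex id -> per-stroke features
    vertices: Dict[int, Dict[str, float]] = field(default_factory=dict)
    # (i, j) with i < j -> inter-stroke features
    edges: Dict[Tuple[int, int], Dict[str, float]] = field(default_factory=dict)

def _centroid(stroke: List[Point]) -> Point:
    xs, ys = zip(*stroke)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def _length(stroke: List[Point]) -> float:
    # Total polyline length: sum of distances between consecutive points.
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))

def build_stroke_graph(strokes: List[List[Point]]) -> StrokeGraph:
    g = StrokeGraph()
    cents = [_centroid(s) for s in strokes]
    for i, s in enumerate(strokes):
        g.vertices[i] = {"length": _length(s), "num_points": float(len(s)),
                         "cx": cents[i][0], "cy": cents[i][1]}
    # Fully connected: one undirected edge per stroke pair.
    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            g.edges[(i, j)] = {"centroid_dist": math.dist(cents[i], cents[j])}
    return g
```

A graph neural network would then consume the vertex and edge feature tensors derived from such a structure; that step is omitted here.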

To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

1 code implementation · 30 Mar 2022 · Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.

Benchmarking, Person-centric Visual Grounding

Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation · CVPR 2022 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we aim for a better-performing detector-free image captioning model and propose a pure vision-transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.

Image Captioning

Semantically Distributed Robust Optimization for Vision-and-Language Inference

1 code implementation · Findings (ACL) 2022 · Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.

Data Augmentation, Natural Language Inference, +1

Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns

1 code implementation · 16 Sep 2021 · Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, Yi Ren

This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality and effectiveness of the existing ones.

Autonomous Driving

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

no code implementations · ICCV 2021 · Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

In this work, we evaluate the faithfulness of V&L models to such geometric understanding by formulating the prediction of pairwise relative locations of objects as both a classification and a regression task.

Question Answering, Visual Question Answering, +1

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations · ICCV 2021 · Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning, Knowledge Distillation, +2
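For reference, the classic soft-target knowledge distillation objective (Hinton et al.) mixes a temperature-softened KL term between teacher and student with ordinary cross-entropy on the label. The plain-Python sketch below shows that generic teacher-student loss for a single example; it is not necessarily the loss used in this paper, and the function name `kd_loss`, the temperature `T`, and the weight `alpha` are illustrative assumptions.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, T: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps soft-target gradients on a comparable scale as T changes.
    return alpha * T * T * kl + (1 - alpha) * ce
```

With `alpha=0` the loss reduces to plain cross-entropy; with `alpha=1` the student only imitates the teacher's softened distribution.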

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph

1 code implementation · CVPR 2021 · Xin Ye, Yezhou Yang

We present a novel two-layer hierarchical reinforcement learning approach equipped with a Goals Relational Graph (GRG) for tackling the partially observable goal-driven task, such as goal-driven visual navigation.

Hierarchical Reinforcement Learning, Reinforcement Learning (RL), +1

WeaQA: Weak Supervision via Captions for Visual Question Answering

no code implementations · Findings (ACL) 2021 · Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.

Question Answering, Visual Question Answering

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

3 code implementations · 3 Dec 2020 · Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes.


Decentralized Attribution of Generative Models

no code implementations · ICLR 2021 · Changhoon Kim, Yi Ren, Yezhou Yang

Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement.

Efficient Robotic Object Search via HIEM: Hierarchical Policy Learning with Intrinsic-Extrinsic Modeling

no code implementations · 16 Oct 2020 · Xin Ye, Yezhou Yang

Although its success in enabling autonomous robot behaviors makes deep reinforcement learning a promising approach for the robotic object search task, the approach suffers severely from the task's inherently sparse reward setting.

Efficient Exploration, reinforcement-learning, +1

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

2 code implementations · EMNLP 2020 · Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar yet semantically distinct mutations of the input to improve out-of-distribution (OOD) generalization, for example on the VQA-CP challenge.

Out-of-Distribution Generalization, Question Answering, +2

Low to High Dimensional Modality Hallucination using Aggregated Fields of View

1 code implementation · 13 Jul 2020 · Kausic Gunasekar, Qiang Qiu, Yezhou Yang

While hallucinating data from a modality with richer information, e.g., RGB to depth, has been researched extensively, we investigate the more challenging low-to-high modality hallucination with interesting use cases in robotics and autonomous systems.

Vocal Bursts Intensity Prediction

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

no code implementations · 21 Jun 2020 · Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

Referring attention is the mechanism we design to act as a scoring function for temporally grounding the given queries over frames.

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

2 code implementations · ECCV 2020 · Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

Person search by natural language aims to retrieve, from a large-scale image pool, the specific person who matches the given textual descriptions.

Contrastive Learning, Person Search, +1

Learning hierarchical behavior and motion planning for autonomous driving

1 code implementation · 8 May 2020 · Jingke Wang, Yue Wang, Dongkun Zhang, Yezhou Yang, Rong Xiong

To improve tactical decision-making in learning-based driving solutions, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model behavior within the learning-based solution.

Autonomous Driving, Decision Making, +2

memeBot: Towards Automatic Image Meme Generation

no code implementations · 30 Apr 2020 · Aadhavan Sadasivam, Kausic Gunasekar, Hasan Davulcu, Yezhou Yang

For a given input sentence, an image meme is generated by combining a meme template image and a text caption where the meme template image is selected from a set of popular candidates using a selection module, and the meme caption is generated by an encoder-decoder model.

Meme Captioning, Meme Classification

Enabling Incremental Knowledge Transfer for Object Detection at the Edge

no code implementations · 13 Apr 2020 · Mohammad Farhadi Bajestani, Mehdi Ghasemi, Sarma Vrudhula, Yezhou Yang

However, only limited knowledge of the observed environment is needed at inference time, and this knowledge can be learned using a shallow neural network (SHNN).

object-detection, Object Detection, +1

From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)

no code implementations · 26 Feb 2020 · Xin Ye, Yezhou Yang

The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning community, especially given the recently reported success of learning-based methods.

BIG-bench Machine Learning, Visual Navigation

VQA-LOL: Visual Question Answering under the Lens of Logic

no code implementations · ECCV 2020 · Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation.

Question Answering, Visual Question Answering
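The Fréchet inequalities bound the probability of a conjunction or disjunction given only the marginals of its parts: max(0, p_a + p_b − 1) ≤ P(A ∧ B) ≤ min(p_a, p_b), and max(p_a, p_b) ≤ P(A ∨ B) ≤ min(1, p_a + p_b). The sketch below turns those bounds into a hinge-style penalty on a predicted probability for the composed question. It only illustrates what "Fréchet compatibility" refers to; the exact form of the paper's loss is not given in this snippet, and the functions `frechet_bounds` and `frechet_penalty` are hypothetical.

```python
from typing import Tuple

def frechet_bounds(p_a: float, p_b: float, op: str) -> Tuple[float, float]:
    """Fréchet bounds on P(A op B) given marginals P(A) = p_a and P(B) = p_b."""
    if op == "and":
        return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    if op == "or":
        return max(p_a, p_b), min(1.0, p_a + p_b)
    raise ValueError(f"unsupported operator: {op}")

def frechet_penalty(p_a: float, p_b: float, p_composed: float, op: str) -> float:
    """Zero when p_composed lies within the bounds; grows linearly outside them."""
    lo, hi = frechet_bounds(p_a, p_b, op)
    return max(0.0, lo - p_composed) + max(0.0, p_composed - hi)
```

For example, if the model is 70% and 60% confident in two component answers, any conjunction probability between 0.3 and 0.6 incurs no penalty under this sketch.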

ROS-HPL: Robotic Object Search with Hierarchical Policy Learning and Intrinsic-Extrinsic Modeling

no code implementations · 25 Sep 2019 · Xin Ye, Shibin Zheng, Yezhou Yang

Despite significant progress in Robotic Object Search (ROS) in recent years with deep-reinforcement-learning-based approaches, the sparsity of the reward setting and the lack of interpretability of previous ROS approaches leave much to be desired.

A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA

no code implementations · 5 Sep 2019 · Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang

On the other hand, for a large share of recognition challenges, a system can classify images correctly using simple models, or so-called shallow networks.

Decision Making

Integrating Knowledge and Reasoning in Image Understanding

no code implementations · 24 Jun 2019 · Somak Aditya, Yezhou Yang, Chitta Baral

Deep-learning-based, data-driven approaches have been successfully applied in various image understanding applications, ranging from object recognition and semantic segmentation to visual question answering.

Object Recognition, Question Answering, +2

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

no code implementations · 28 May 2019 · Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

The ability to identify changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.

Fluorescence Image Histology Pattern Transformation using Image Style Transfer

no code implementations · 15 May 2019 · Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Xiaochun Zhao, Leandro Borba Moreira, Sirin Gandhi, Claudio Cavallo, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

To improve the diagnostic quality of confocal laser endomicroscopy (CLE), we used a micrograph of an H&E slide from a glioma tumor biopsy and image style transfer, a neural network method for integrating the content and style of two images.

Style Transfer, Test
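In the widely used Gatys-style formulation of neural style transfer, "content" is compared directly on a CNN layer's feature maps, while "style" is compared through the Gram matrix of those feature maps (channel-by-channel correlations). The NumPy sketch below computes both losses for feature arrays of shape (C, H, W). The pretrained feature extractor is omitted, and the normalization and the weights in `total_loss` are assumptions, not the settings used in this paper.

```python
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """(C, H, W) feature maps -> (C, C) channel correlations, normalized by H*W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(gen: np.ndarray, content: np.ndarray) -> float:
    # Position-sensitive: compares feature maps element by element.
    return float(np.mean((gen - content) ** 2))

def style_loss(gen: np.ndarray, style: np.ndarray) -> float:
    # Position-insensitive: compares only channel correlation statistics.
    return float(np.mean((gram_matrix(gen) - gram_matrix(style)) ** 2))

def total_loss(gen: np.ndarray, content: np.ndarray, style: np.ndarray,
               content_weight: float = 1.0, style_weight: float = 1e3) -> float:
    # Hypothetical weights; in practice they are tuned per image pair.
    return (content_weight * content_loss(gen, content)
            + style_weight * style_loss(gen, style))
```

Because the Gram matrix sums over spatial positions, spatially shuffling a feature map leaves its style loss unchanged while its content loss grows, which is why it serves as a texture or style descriptor rather than a layout descriptor.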

Active Adversarial Evader Tracking with a Probabilistic Pursuer under the Pursuit-Evasion Game Framework

no code implementations · 19 Apr 2019 · Varun Chandra Jammula, Anshul Rai, Yezhou Yang

To validate the efficiency of the framework, we conduct several experiments in simulation using Gazebo and evaluate the success rate of tracking an evader in various environments with different pursuer-to-evader speed ratios.

Modularized Textual Grounding for Counterfactual Resilience

1 code implementation · CVPR 2019 · Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

Natural Language Visual Grounding, Phrase Grounding, +2

TKD: Temporal Knowledge Distillation for Active Perception

no code implementations · 4 Mar 2019 · Mohammad Farhadi, Yezhou Yang

Deep neural network-based methods have been shown to achieve outstanding performance on object detection and classification tasks.

Knowledge Distillation, object-detection, +2

Image Decomposition and Classification through a Generative Model

no code implementations · 9 Feb 2019 · Houpu Yao, Malcolm Regan, Yezhou Yang, Yi Ren

We demonstrate in this paper that a generative model can be designed to perform classification tasks under challenging settings, including adversarial attacks and input distribution shifts.

Classification, General Classification, +1

Augmenting Model Robustness with Transformation-Invariant Attacks

no code implementations · 31 Jan 2019 · Houpu Yao, Zhe Wang, GuangYu Nie, Yassine Mazboudi, Yezhou Yang, Yi Ren

The vulnerability of neural networks under adversarial attacks has raised serious concerns and motivated extensive research.

Image Cropping, Translation

How Shall I Drive? Interaction Modeling and Motion Planning towards Empathetic and Socially-Graceful Driving

no code implementations · 28 Jan 2019 · Yi Ren, Steven Elliott, Yiwei Wang, Yezhou Yang, Wenlong Zhang

While the intelligence of autonomous vehicles (AVs) has advanced significantly in recent years, accidents involving AVs suggest that these autonomous systems lack graceful driving behavior when interacting with human drivers.

Robotics, Computer Science and Game Theory

Spatial Knowledge Distillation to aid Visual Reasoning

no code implementations · 10 Dec 2018 · Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral

We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering.

Knowledge Distillation, Question Answering, +4

GAPLE: Generalizable Approaching Policy LEarning for Robotic Object Searching in Indoor Environment

no code implementations · 21 Sep 2018 · Xin Ye, Zhe Lin, Joon-Young Lee, Jianming Zhang, Shibin Zheng, Yezhou Yang

We study the problem of learning a generalizable action policy for an intelligent agent to actively approach an object of interest in an indoor environment solely from its visual inputs.

Semantic Segmentation, Visual Navigation

Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots

no code implementations · 30 Jul 2018 · Xin Ye, Zhe Lin, Haoxiang Li, Shibin Zheng, Yezhou Yang

We study the problem of learning a navigation policy for a robot to actively search for an object of interest in an indoor environment solely from its visual inputs.

Object Recognition, Visual Navigation

Interpretable Partitioned Embedding for Customized Fashion Outfit Composition

no code implementations · 13 Jun 2018 · Zunlei Feng, Zhenyun Yu, Yezhou Yang, Yongcheng Jing, Junxiao Jiang, Mingli Song

In the supervised attributes module, multiple attribute labels are used to ensure that different parts of the overall embedding correspond to different attributes.

Weakly Supervised Attention Learning for Textual Phrases Grounding

no code implementations · 1 May 2018 · Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang

Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.

Prospects for Theranostics in Neurosurgical Imaging: Empowering Confocal Laser Endomicroscopy Diagnostics via Deep Learning

no code implementations · 26 Apr 2018 · Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul

We present an overview of deep learning models for automatically detecting diagnostic CLE images, and discuss how various training regimes and ensemble modeling affect the power of deep learning predictive models.

Weakly-Supervised Learning-Based Feature Localization in Confocal Laser Endomicroscopy Glioma Images

no code implementations · 25 Apr 2018 · Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Claudio Cavallo, Xiaochun Zhao, Sirin Gandhi, Leandro Borba Moreira, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

To overcome this problem, we propose a Weakly-Supervised Learning (WSL)-based model for feature localization that trains on image-level annotations and then localizes occurrences of a class of interest in the test image.

Decision Making, Image Segmentation, +5

Transductive Unbiased Embedding for Zero-Shot Learning

no code implementations · CVPR 2018 · Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, Mingli Song

Most existing Zero-Shot Learning (ZSL) methods suffer from a strong bias problem, in which instances of unseen (target) classes tend to be categorized as one of the seen (source) classes.

Transductive Learning, Zero-Shot Learning

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

no code implementations · 23 Mar 2018 · Somak Aditya, Yezhou Yang, Chitta Baral

Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image.

Question Answering, Visual Question Answering

Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields

1 code implementation · ECCV 2018 · Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, DaCheng Tao, Mingli Song

In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control.

Style Transfer

DeepSIC: Deep Semantic Image Compression

no code implementations · 29 Jan 2018 · Sihui Luo, Yezhou Yang, Mingli Song

The same practice also enables the compressed code to carry the image's semantic information during storage and transmission.

Benchmarking, Image Compression, +1

TripletGAN: Training Generative Model with Triplet Loss

no code implementations · 14 Nov 2017 · Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song

As an effective metric learning method, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (ReID), leading to many state-of-the-art results.

Face Recognition, General Classification, +1
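The standard triplet loss pulls an anchor embedding toward a positive example of the same identity and pushes it away from a negative example by at least a margin: L = max(0, d(a, p) − d(a, n) + margin). The plain-Python sketch below uses squared Euclidean distance and a default margin of 1.0, both assumptions; how TripletGAN incorporates this loss into GAN training is not shown.

```python
from typing import Sequence

def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(0, d(a, p) - d(a, n) + margin) with squared Euclidean distance."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only "hard" or "semi-hard" triplets produce gradients.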

Convolutional Neural Networks: Ensemble Modeling, Fine-Tuning and Unsupervised Semantic Localization for Intraoperative CLE Images

no code implementations · 10 Sep 2017 · Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

Because manually examining thousands of nondiagnostic images during surgery would be impractical, there is an opportunity for a model to select diagnostic images for review by pathologists or surgeons.


On the Importance of Consistency in Training Deep Neural Networks

no code implementations · 2 Aug 2017 · Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

We conclude this paper with the construction of a novel contractive neural network.

Neural Style Transfer: A Review

8 code implementations · 11 May 2017 · Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, Mingli Song

We first propose a taxonomy of current algorithms in the field of NST.

Style Transfer

Fast Task-Specific Target Detection via Graph Based Constraints Representation and Checking

no code implementations · 14 Nov 2016 · Went Luan, Yezhou Yang, Cornelia Fermuller, John S. Baras

In this work, we present a fast target detection framework for real-world robotics applications.

Prediction of Manipulation Actions

no code implementations · 3 Oct 2016 · Cornelia Fermüller, Fang Wang, Yezhou Yang, Konstantinos Zampogiannis, Yi Zhang, Francisco Barranco, Michael Pfeiffer

In psychophysical experiments, we evaluated human observers' skills in predicting actions from video sequences of different length, depicting the hand movement in the preparation and execution of actions before and after contact with the object.

Co-active Learning to Adapt Humanoid Movement for Manipulation

no code implementations · 12 Sep 2016 · Ren Mao, John S. Baras, Yezhou Yang, Cornelia Fermuller

It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints.

Active Learning

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

1 code implementation · 9 May 2016 · Chengxi Ye, Chen Zhao, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

LightNet is a lightweight, versatile and purely Matlab-based deep learning framework.

What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots

no code implementations · 29 Jan 2016 · Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

For robots that have the capability to interact with the physical environment through their end effectors, understanding the surrounding scenes is not merely a task of image classification or object recognition.

Image Classification, Object Recognition, +1

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

no code implementations · 10 Dec 2015 · Yezhou Yang, Yi Li, Cornelia Fermuller, Yiannis Aloimonos

In this paper we consider the problem of continuously discovering image contents by actively asking image based questions and subsequently answering the questions being asked.

Question Answering, Question Generation, +2

From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge

no code implementations · 10 Nov 2015 · Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, Yiannis Aloimonos

Specifically, commonsense reasoning is applied on (a) detections obtained from existing perception methods on given images, (b) a "commonsense" knowledge base constructed using natural language processing of image annotations and (c) lexical ontological knowledge from resources such as WordNet.

image-sentence alignment

Detection of Manipulation Action Consequences (MAC)

no code implementations · CVPR 2013 · Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions.

Action Recognition, Temporal Action Localization
