Search Results for author: Yezhou Yang

Found 96 papers, 36 papers with code

To Find Waldo You Need Contextual Cues: Debiasing Who’s Waldo

1 code implementation • ACL 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We find that the original Who’s Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.

Benchmarking Person-centric Visual Grounding +1

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

1 code implementation • 12 Apr 2024 • Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang

Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.

Monocular Depth Estimation

'Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning

no code implementations • 12 Apr 2024 • Joshua Feinglass, Jayaraman J. Thiagarajan, Rushil Anirudh, T. S. Jayram, Yezhou Yang

Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.

Attribute Generalized Zero-Shot Learning

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

eTraM: Event-based Traffic Monitoring Dataset

no code implementations • 29 Mar 2024 • Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

Event cameras, with their high temporal and dynamic range and minimal memory usage, have found applications in various fields.

Learning Decomposable and Debiased Representations via Attribute-Centric Information Bottlenecks

no code implementations • 21 Mar 2024 • Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan Turaga, Theodore P. Pavlic

Biased attributes, spuriously correlated with target labels in a dataset, can problematically lead to neural networks that learn improper shortcuts for classifications and limit their capabilities for out-of-distribution (OOD) generalization.

Attribute Representation Learning

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

no code implementations • 17 Mar 2024 • Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set.

Translation

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

no code implementations • 7 Feb 2024 • Maitreya Patel, Sangmin Jung, Chitta Baral, Yezhou Yang

While LDMs offer distinct advantages, P-T2I methods' reliance on the latent space of these diffusion models significantly escalates resource demands, leading to inconsistent results and necessitating numerous iterations for a single desired image.

Concept Alignment Philosophy

Segment Anything Model Can Not Segment Anything: Assessing AI Foundation Model's Generalizability in Permafrost Mapping

no code implementations • 16 Jan 2024 • Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna Liljedahl, Chandi Witharana, Yili Yang, Brendan M. Rogers, Samantha T. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis

To evaluate the performance of large AI vision models, especially Meta's Segment Anything Model (SAM), we implemented different instance segmentation pipelines that minimize the changes to SAM to leverage its power as a foundation model.

Instance Segmentation Semantic Segmentation

Open-TI: Open Traffic Intelligence with Augmented Language Model

1 code implementation • 30 Dec 2023 • Longchao Da, Kuanru Liou, Tiejin Chen, Xuesong Zhou, Xiangyong Luo, Yezhou Yang, Hua Wei

Transportation has greatly benefited the development of cities over the course of modern civilization.

Language Modelling

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

no code implementations • 7 Dec 2023 • Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang

The T2I prior model alone adds a billion parameters compared to the Latent Diffusion Models, which increases the computational and high-quality data requirements.

Contrastive Learning

SKoPe3D: A Synthetic Dataset for Vehicle Keypoint Perception in 3D from Traffic Monitoring Cameras

no code implementations • 4 Sep 2023 • Himanshu Pahadia, Duo Lu, Bharatesh Chakravarthi, Yezhou Yang

Intelligent transportation systems (ITS) have revolutionized modern road infrastructure, providing essential functionalities such as traffic monitoring, road safety assessment, congestion reduction, and law enforcement.

Keypoint Detection Transfer Learning +1

Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding

1 code implementation • 1 Sep 2023 • Joshua Feinglass, Yezhou Yang

Object proposal generation serves as a standard pre-processing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.).

Graph Generation Image Captioning +5

Adversarial Bayesian Augmentation for Single-Source Domain Generalization

1 code implementation • ICCV 2023 • Sheng Cheng, Tejas Gokhale, Yezhou Yang

Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.

Data Augmentation Domain Generalization

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models

no code implementations • 7 Jun 2023 • Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang

The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation.

Misinformation

ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a.

Concept Alignment

End-to-end Knowledge Retrieval with Multi-modal Queries

1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral

We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.

Benchmarking Cross-Modal Retrieval +2

CAROM Air -- Vehicle Localization and Traffic Scene Reconstruction from Aerial Videos

no code implementations • 31 May 2023 • Duo Lu, Eric Eaton, Matt Weg, Wei Wang, Steven Como, Jeffrey Wishart, Hongbin Yu, Yezhou Yang

Road traffic scene reconstruction from videos has long been desired by road safety regulators, city planners, researchers, and autonomous driving technology developers.

Autonomous Driving

Attributing Image Generative Models using Latent Fingerprints

1 code implementation • 17 Apr 2023 • GuangYu Nie, Changhoon Kim, Yezhou Yang, Yi Ren

This paper investigates the use of latent semantic dimensions as fingerprints, from which we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff.

Attribute

Mole Recruitment: Poisoning of Image Classifiers via Selective Batch Sampling

1 code implementation • 30 Mar 2023 • Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang

In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.

Continual Learning Data Poisoning +1

Benchmarking Spatial Relationships in Text-to-Image Generation

1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang

We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.

Benchmarking Text-to-Image Generation

Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

1 code implementation • 7 Dec 2022 • Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

'Actions' play a vital role in how humans interact with the world.

Graph Question Answering Question Answering

Learning Action-Effect Dynamics from Pairs of Scene-graphs

no code implementations • 7 Dec 2022 • Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

'Actions' play a vital role in how humans interact with the world.

CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

1 code implementation • 7 Nov 2022 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang

Videos often capture objects, their visible properties, their motion, and the interactions between different objects.

Ranked #1 on Counterfactual Planning on CRIPP-VQA

Add - PO Add - PQ +12

Reasoning about Actions over Visual and Linguistic Modalities: A Survey

no code implementations • 15 Jul 2022 • Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral

'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals.

Common Sense Reasoning

Formalizing and Evaluating Requirements of Perception Systems for Automated Vehicles using Spatio-Temporal Perception Logic

1 code implementation • 29 Jun 2022 • Mohammad Hekmatnejad, Bardh Hoxha, Jyotirmoy V. Deshmukh, Yezhou Yang, Georgios Fainekos

Automated vehicles (AV) heavily depend on robust perception systems.

Improving Diversity with Adversarially Learned Transformations for Domain Generalization

1 code implementation • 15 Jun 2022 • Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang

To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies.

Domain Generalization

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

1 code implementation • 28 Apr 2022 • Arnav Chakravarthy, Zhiyuan Fang, Yezhou Yang

In videos that contain actions performed unintentionally, agents do not achieve their desired goals.

Action Understanding Video Captioning

SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks

no code implementations • 27 Apr 2022 • Sheng Cheng, Yi Ren, Yezhou Yang

This paper follows cognitive studies to investigate a graph representation for sketches, where the information of strokes, i.e., parts of a sketch, is encoded on vertices and inter-stroke information on edges.

To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

1 code implementation • 30 Mar 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.

Benchmarking Person-centric Visual Grounding +1

Injecting Semantic Concepts into End-to-End Image Captioning

1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed as ViTCAP, in which grid representations are used without extracting the regional features.

Caption Generation Image Captioning

Semantically Distributed Robust Optimization for Vision-and-Language Inference

1 code implementation • Findings (ACL) 2022 • Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.

Data Augmentation Natural Language Inference +2

Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns

1 code implementation • 16 Sep 2021 • Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, Yi Ren

This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality and effectiveness of the existing ones.

Autonomous Driving

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

no code implementations • ICCV 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction of pair-wise relative locations of objects as a classification as well as a regression task.

Question Answering Visual Question Answering +1

SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis

1 code implementation • ACL 2021 • Joshua Feinglass, Yezhou Yang

The open-ended nature of visual captioning makes it a challenging area for evaluation.

Image Captioning

CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

1 code implementation • NAACL 2021 • Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral

Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video.

Question Answering Visual Question Answering

CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images

1 code implementation • 13 Apr 2021 • Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral

Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video.

Question Answering Visual Question Answering

Compressing Visual-linguistic Model via Knowledge Distillation

no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu

In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.

Image Captioning Knowledge Distillation +2

Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph

1 code implementation • CVPR 2021 • Xin Ye, Yezhou Yang

We present a novel two-layer hierarchical reinforcement learning approach equipped with a Goals Relational Graph (GRG) for tackling the partially observable goal-driven task, such as goal-driven visual navigation.

Hierarchical Reinforcement Learning Reinforcement Learning (RL) +1

SEED: Self-supervised Distillation For Visual Representation

1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu

This paper is concerned with self-supervised learning for small models.

Knowledge Distillation Self-Supervised Learning +1

WeaQA: Weak Supervision via Captions for Visual Question Answering

no code implementations • Findings (ACL) 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.

Question Answering Visual Question Answering

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

3 code implementations • 3 Dec 2020 • Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang

While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes.

Attribute

Decentralized Attribution of Generative Models

no code implementations • ICLR 2021 • Changhoon Kim, Yi Ren, Yezhou Yang

Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement.

Efficient Robotic Object Search via HIEM: Hierarchical Policy Learning with Intrinsic-Extrinsic Modeling

no code implementations • 16 Oct 2020 • Xin Ye, Yezhou Yang

Although its significant success at enabling robots with autonomous behaviors makes deep reinforcement learning a promising approach for the robotic object search task, the approach suffers severely from the naturally sparse reward setting of the task.

Efficient Exploration Object +2

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

2 code implementations • EMNLP 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge.

Out-of-Distribution Generalization Question Answering +1

Low to High Dimensional Modality Hallucination using Aggregated Fields of View

1 code implementation • 13 Jul 2020 • Kausic Gunasekar, Qiang Qiu, Yezhou Yang

While hallucinating data from a modality with richer information, e.g., RGB to depth, has been researched extensively, we investigate the more challenging low-to-high modality hallucination with interesting use cases in robotics and autonomous systems.

Hallucination Vocal Bursts Intensity Prediction

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang

The referring attention is our designed mechanism acting as a scoring function for grounding the given queries over frames temporally.

Resisting Crowd Occlusion and Hard Negatives for Pedestrian Detection in the Wild

no code implementations • 15 May 2020 • Zhe Wang, Jun Wang, Yezhou Yang

Pedestrian detection has been heavily studied in the last decade due to its wide application.

object-detection Object Detection +1

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

2 code implementations • ECCV 2020 • Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang

Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.

Ranked #18 on Text based Person Retrieval on CUHK-PEDES

Attribute Contrastive Learning +2

Learning hierarchical behavior and motion planning for autonomous driving

1 code implementation • 8 May 2020 • Jingke Wang, Yue Wang, Dongkun Zhang, Yezhou Yang, Rong Xiong

To improve tactical decision-making in learning-based driving solutions, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model behavior in learning-based solutions.

Autonomous Driving Decision Making +2

Visuo-Linguistic Question Answering (VLQA) Challenge

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral

Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems.

Question Answering Reading Comprehension +1

memeBot: Towards Automatic Image Meme Generation

no code implementations • 30 Apr 2020 • Aadhavan Sadasivam, Kausic Gunasekar, Hasan Davulcu, Yezhou Yang

For a given input sentence, an image meme is generated by combining a meme template image and a text caption where the meme template image is selected from a set of popular candidates using a selection module, and the meme caption is generated by an encoder-decoder model.

Meme Captioning Meme Classification +1

Enabling Incremental Knowledge Transfer for Object Detection at the Edge

no code implementations • 13 Apr 2020 • Mohammad Farhadi Bajestani, Mehdi Ghasemi, Sarma Vrudhula, Yezhou Yang

However, we need only limited knowledge of the observed environment at inference time, which can be learned using a shallow neural network (SHNN).

Object object-detection +2

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

2 code implementations • EMNLP 2020 • Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene.

Question Answering Video Captioning +1

From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)

no code implementations • 26 Feb 2020 • Xin Ye, Yezhou Yang

The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities, especially with the recently reported success of learning-based methods.

BIG-bench Machine Learning Visual Navigation

VQA-LOL: Visual Question Answering under the Lens of Logic

no code implementations • ECCV 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

We propose our Lens of Logic (LOL) model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation.

Negation Question Answering +2

Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning

no code implementations • 21 Oct 2019 • Jiaying Lu, Xin Ye, Yi Ren, Yezhou Yang

Multiple-choice VQA has drawn increasing attention from researchers and end-users recently.

Data Augmentation Decision Making +5

ROS-HPL: Robotic Object Search with Hierarchical Policy Learning and Intrinsic-Extrinsic Modeling

no code implementations • 25 Sep 2019 • Xin Ye, Shibin Zheng, Yezhou Yang

Despite significant progress in Robotic Object Search (ROS) over the recent years with deep reinforcement learning based approaches, the sparsity issue in reward setting as well as the lack of interpretability of the previous ROS approaches leave much to be desired.

Object

A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA

no code implementations • 5 Sep 2019 • Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang

On the other hand, for a large chunk of recognition challenges, a system can classify images correctly using simple models or so-called shallow networks.

Decision Making

Integrating Knowledge and Reasoning in Image Understanding

no code implementations • 24 Jun 2019 • Somak Aditya, Yezhou Yang, Chitta Baral

Deep learning-based data-driven approaches have been successfully applied in various image understanding applications, ranging from object recognition and semantic segmentation to visual question answering.

Object Recognition Question Answering +2

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs

no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.

Fluorescence Image Histology Pattern Transformation using Image Style Transfer

no code implementations • 15 May 2019 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Xiaochun Zhao, Leandro Borba Moreira, Sirin Gandhi, Claudio Cavallo, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

To improve the diagnostic quality of CLE, we used a micrograph of an H&E slide from a glioma tumor biopsy and image style transfer, a neural network method for integrating the content and style of two images.

Style Transfer

Active Adversarial Evader Tracking with a Probabilistic Pursuer under the Pursuit-Evasion Game Framework

no code implementations • 19 Apr 2019 • Varun Chandra Jammula, Anshul Rai, Yezhou Yang

To validate the efficiency of the framework, we conduct several experiments in simulation using Gazebo and evaluate the success rate of tracking an evader in various environments with different pursuer-to-evader speed ratios.

Modularized Textual Grounding for Counterfactual Resilience

1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang

Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.

Attribute counterfactual +4

TKD: Temporal Knowledge Distillation for Active Perception

no code implementations • 4 Mar 2019 • Mohammad Farhadi, Yezhou Yang

Deep neural network-based methods have been shown to achieve outstanding performance on object detection and classification tasks.

Knowledge Distillation Object +3

Image Decomposition and Classification through a Generative Model

no code implementations • 9 Feb 2019 • Houpu Yao, Malcolm Regan, Yezhou Yang, Yi Ren

We demonstrate in this paper that a generative model can be designed to perform classification tasks under challenging settings, including adversarial attacks and input distribution shifts.

Classification General Classification

Augmenting Model Robustness with Transformation-Invariant Attacks

no code implementations • 31 Jan 2019 • Houpu Yao, Zhe Wang, GuangYu Nie, Yassine Mazboudi, Yezhou Yang, Yi Ren

The vulnerability of neural networks under adversarial attacks has raised serious concerns and motivated extensive research.

Image Cropping Translation

How Shall I Drive? Interaction Modeling and Motion Planning towards Empathetic and Socially-Graceful Driving

no code implementations • 28 Jan 2019 • Yi Ren, Steven Elliott, Yiwei Wang, Yezhou Yang, Wenlong Zhang

While intelligence of autonomous vehicles (AVs) has significantly advanced in recent years, accidents involving AVs suggest that these autonomous systems lack gracefulness in driving when interacting with human drivers.

Robotics Computer Science and Game Theory

Spatial Knowledge Distillation to aid Visual Reasoning

no code implementations • 10 Dec 2018 • Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral

We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering.

Knowledge Distillation Question Answering +3

GAPLE: Generalizable Approaching Policy LEarning for Robotic Object Searching in Indoor Environment

no code implementations • 21 Sep 2018 • Xin Ye, Zhe Lin, Joon-Young Lee, Jianming Zhang, Shibin Zheng, Yezhou Yang

We study the problem of learning a generalizable action policy for an intelligent agent to actively approach an object of interest in an indoor environment solely from its visual inputs.

Semantic Segmentation Visual Navigation

Active Object Perceiver: Recognition-guided Policy Learning for Object Searching on Mobile Robots

no code implementations • 30 Jul 2018 • Xin Ye, Zhe Lin, Haoxiang Li, Shibin Zheng, Yezhou Yang

We study the problem of learning a navigation policy for a robot to actively search for an object of interest in an indoor environment solely from its visual inputs.

Object Object Recognition +1

Interpretable Partitioned Embedding for Customized Fashion Outfit Composition

no code implementations • 13 Jun 2018 • Zunlei Feng, Zhenyun Yu, Yezhou Yang, Yongcheng Jing, Junxiao Jiang, Mingli Song

In the supervised attributes module, multiple attribute labels are adopted to ensure that different parts of the overall embedding correspond to different attributes.

Attribute

Weakly Supervised Attention Learning for Textual Phrases Grounding

no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang

Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.

Prospects for Theranostics in Neurosurgical Imaging: Empowering Confocal Laser Endomicroscopy Diagnostics via Deep Learning

no code implementations • 26 Apr 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul

We present an overview of deep learning models for automatic detection of diagnostic CLE images and discuss the effect of various training regimes and ensemble modeling on the power of deep learning predictive models.

Weakly-Supervised Learning-Based Feature Localization in Confocal Laser Endomicroscopy Glioma Images

no code implementations • 25 Apr 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Claudio Cavallo, Xiaochun Zhao, Sirin Gandhi, Leandro Borba Moreira, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

To overcome this problem, we propose a Weakly-Supervised Learning (WSL)-based model for feature localization that trains on image-level annotations, and then localizes incidences of a class-of-interest in the test image.

Decision Making Image Segmentation +4

Transductive Unbiased Embedding for Zero-Shot Learning

no code implementations • CVPR 2018 • Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, Mingli Song

Most existing Zero-Shot Learning (ZSL) methods suffer from a strong bias problem, in which instances of unseen (target) classes tend to be categorized as one of the seen (source) classes.

Transductive Learning Zero-Shot Learning

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

no code implementations • 23 Mar 2018 • Somak Aditya, Yezhou Yang, Chitta Baral

Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image.

Question Answering Visual Question Answering

Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields

1 code implementation • ECCV 2018 • Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, DaCheng Tao, Mingli Song

In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control.

Style Transfer

DeepSIC: Deep Semantic Image Compression

no code implementations • 29 Jan 2018 • Sihui Luo, Yezhou Yang, Mingli Song

The same practice also enables the compressed code to carry the image's semantic information during storage and transmission.

Benchmarking Image Compression +1

Improving utility of brain tumor confocal laser endomicroscopy: objective value assessment and diagnostic frame detection with convolutional neural networks

no code implementations • 6 Jan 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul

Examining all the hundreds or thousands of images from a single case to discriminate diagnostic images from nondiagnostic ones can be tedious.

Specificity

TripletGAN: Training Generative Model with Triplet Loss

no code implementations • 14 Nov 2017 • Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song

As an effective approach to metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (person-ReID), leading to many state-of-the-art results.

Face Recognition General Classification +1

Convolutional Neural Networks: Ensemble Modeling, Fine-Tuning and Unsupervised Semantic Localization for Intraoperative CLE Images

no code implementations • 10 Sep 2017 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang

While manual examination of thousands of nondiagnostic images during surgery would be impractical, this creates an opportunity for a model to select diagnostic images for the pathologist's or surgeon's review.

On the Importance of Consistency in Training Deep Neural Networks

no code implementations • 2 Aug 2017 • Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

We conclude this paper with the construction of a novel contractive neural network.

Neural Style Transfer: A Review

8 code implementations • 11 May 2017 • Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, Mingli Song

We first propose a taxonomy of current algorithms in the field of NST.

Style Transfer

Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic

no code implementations • 17 Nov 2016 • Somak Aditya, Yezhou Yang, Chitta Baral, Yiannis Aloimonos

We compile a dataset of over 3k riddles where each riddle consists of 4 images and a groundtruth answer.

Activity Recognition Question Answering

Fast Task-Specific Target Detection via Graph Based Constraints Representation and Checking

no code implementations • 14 Nov 2016 • Wentao Luan, Yezhou Yang, Cornelia Fermuller, John S. Baras

In this work, we present a fast target detection framework for real-world robotics applications.

Object

Prediction of Manipulation Actions

no code implementations • 3 Oct 2016 • Cornelia Fermüller, Fang Wang, Yezhou Yang, Konstantinos Zampogiannis, Yi Zhang, Francisco Barranco, Michael Pfeiffer

In psychophysical experiments, we evaluated human observers' skills in predicting actions from video sequences of different lengths, depicting the hand movement in the preparation and execution of actions before and after contact with the object.

Co-active Learning to Adapt Humanoid Movement for Manipulation

no code implementations • 12 Sep 2016 • Ren Mao, John S. Baras, Yezhou Yang, Cornelia Fermuller

It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints.

Active Learning

Reliable Attribute-Based Object Recognition Using High Predictive Value Classifiers

no code implementations • 12 Sep 2016 • Wentao Luan, Yezhou Yang, Cornelia Fermuller, John Baras

We consider the problem of object recognition in 3D using an ensemble of attribute-based classifiers.

Attribute Decision Making +3

LightNet: A Versatile, Standalone Matlab-based Environment for Deep Learning

1 code implementation • 9 May 2016 • Chengxi Ye, Chen Zhao, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

LightNet is a lightweight, versatile and purely Matlab-based deep learning framework.

What Can I Do Around Here? Deep Functional Scene Understanding for Cognitive Robots

no code implementations • 29 Jan 2016 • Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

For robots that have the capability to interact with the physical environment through their end effectors, understanding the surrounding scenes is not merely a task of image classification or object recognition.

Image Classification Object Recognition +1

Neural Self Talk: Image Understanding via Continuous Questioning and Answering

no code implementations • 10 Dec 2015 • Yezhou Yang, Yi Li, Cornelia Fermuller, Yiannis Aloimonos

In this paper we consider the problem of continuously discovering image contents by actively asking image based questions and subsequently answering the questions being asked.

Ranked #3 on Question Generation on COCO Visual Question Answering (VQA) real images 1.0 open ended

Question Answering Question Generation +2

Learning the Semantics of Manipulation Action

no code implementations • IJCNLP 2015 • Yezhou Yang, Yiannis Aloimonos, Cornelia Fermuller, Eren Erdal Aksoy

In this paper we present a formal computational framework for modeling manipulation actions.

Semantic Parsing

From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge

no code implementations • 10 Nov 2015 • Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, Yiannis Aloimonos

Specifically, commonsense reasoning is applied to (a) detections obtained from existing perception methods on given images, (b) a "commonsense" knowledge base constructed using natural language processing of image annotations, and (c) lexical ontological knowledge from resources such as WordNet.

image-sentence alignment Sentence

Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision

no code implementations • CVPR 2015 • Yezhou Yang, Cornelia Fermuller, Yi Li, Yiannis Aloimonos

The grasp type provides crucial information about human action.

Action Segmentation Action Understanding +2

Detection of Manipulation Action Consequences (MAC)

no code implementations • CVPR 2013 • Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions.

Action Recognition Temporal Action Localization

