no code implementations • 27 Nov 2024 • Maitreya Patel, Song Wen, Dimitris N. Metaxas, Yezhou Yang
In this work, we first develop a theoretical and empirical understanding of the vector field dynamics of RFMs in efficiently guiding the denoising trajectory.
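As background for this vector-field view, the sketch below shows the generic way a rectified flow model (RFM) is sampled: a learned velocity field is integrated from noise to data with Euler steps. This is a minimal illustration of RFM denoising dynamics, not the paper's guidance method; `velocity_model` is a hypothetical stand-in for any trained v(x, t) network.

```python
import torch

def euler_sample(velocity_model, x0, num_steps=50):
    """Integrate a learned velocity field v(x, t) from noise (t=0) to data (t=1).

    Minimal rectified-flow sampling sketch; `velocity_model` is hypothetical.
    """
    x = x0
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + velocity_model(x, t) * dt  # one Euler step along the flow
    return x
```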
1 code implementation • 7 Nov 2024 • Sheng Cheng, Maitreya Patel, Yezhou Yang
Despite advancements in text-to-image models, generating images that precisely align with textual descriptions remains challenging due to misalignment in training data.
no code implementations • 4 Nov 2024 • Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang
However, the lack of compositional diversity in contemporary image-text datasets limits the compositional reasoning ability of CLIP.
1 code implementation • 17 Oct 2024 • Shailaja Keyur Sampat, Maitreya Patel, Yezhou Yang, Chitta Baral
An ability to learn about new objects from a small amount of visual data and produce convincing linguistic justification about the presence/absence of certain concepts (that collectively compose the object) in novel scenarios is an important characteristic of human cognition.
1 code implementation • 17 Oct 2024 • Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral
We present baseline results of ActionCOMET over the collected dataset and compare them with the performance of the best existing VQA approaches.
1 code implementation • 17 Oct 2024 • Shailaja Keyur Sampat, Mutsumi Nakamura, Shankar Kailas, Kartik Aggarwal, Mandy Zhou, Yezhou Yang, Chitta Baral
We show that this benchmark is quite challenging for existing large-scale vision-language models and encourage development of systems that possess robust visuo-linguistic reasoning capabilities.
1 code implementation • 30 Sep 2024 • Joshua Feinglass, Yezhou Yang
Zero-shot inference, where pre-trained models perform tasks without specific training data, is an exciting emergent ability of large models like CLIP.
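For readers new to the setting, zero-shot classification with CLIP needs nothing beyond the pre-trained checkpoint. A minimal sketch using the public Hugging Face API (not this paper's evaluation code); the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder path
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity
print(dict(zip(labels, logits.softmax(dim=-1)[0].tolist())))
```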
no code implementations • 5 Sep 2024 • Sheng Cheng, Deqian Kong, Jianwen Xie, Kookjin Lee, Ying Nian Wu, Yezhou Yang
This family of models generates each data point in the time series by a neural emission model, which is a non-linear transformation of a latent state vector.
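A minimal sketch of such a model, assuming a recurrent transition over the latent state and a small MLP as the non-linear emission network; the authors' exact parameterization may differ.

```python
import torch
import torch.nn as nn

class LatentEmissionModel(nn.Module):
    """Latent states evolve recurrently; each observation is x_t = g(z_t)."""

    def __init__(self, latent_dim=16, obs_dim=32):
        super().__init__()
        self.transition = nn.GRUCell(latent_dim, latent_dim)  # z_t -> z_{t+1}
        self.emission = nn.Sequential(  # non-linear emission network g
            nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, obs_dim)
        )

    def forward(self, z0, num_steps):
        z, xs = z0, []
        for _ in range(num_steps):
            z = self.transition(torch.zeros_like(z), z)  # no external input
            xs.append(self.emission(z))
        return torch.stack(xs, dim=1)  # (batch, time, obs_dim)
```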
no code implementations • 1 Sep 2024 • Manthan Chelenahalli Satish, Duo Lu, Bharatesh Chakravarthi, Mohammad Farhadi, Yezhou Yang
This research represents an advancement in roundabout DZ data mining and forecasting, contributing to the assurance of intersection safety in the era of autonomous vehicles.
1 code implementation • 24 Aug 2024 • Bharatesh Chakravarthi, Aayush Atul Verma, Kostas Daniilidis, Cornelia Fermuller, Yezhou Yang
Event-based vision, inspired by the human visual system, offers transformative capabilities such as low latency, high dynamic range, and reduced power consumption.
1 code implementation • 18 Aug 2024 • Tiejin Chen, Prithvi Shirke, Bharatesh Chakravarthi, Arpitsinh Vaghela, Longchao Da, Duo Lu, Yezhou Yang, Hua Wei
This paper introduces SynTraC, the first public image-based traffic signal control dataset, aimed at bridging the gap between simulated environments and real-world traffic management challenges.
no code implementations • 12 Aug 2024 • Utkarsh Nath, Rajeev Goel, Eun Som Jeon, Changhoon Kim, Kyle Min, Yezhou Yang, Yingzhen Yang, Pavan Turaga
To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines.
no code implementations • 5 Aug 2024 • Agneet Chatterjee, Yiran Luo, Tejas Gokhale, Yezhou Yang, Chitta Baral
Text-to-Image (T2I) and multimodal large language models (MLLMs) have been adopted in solutions for several computer vision and multimodal learning tasks.
no code implementations • 25 May 2024 • Changhoon Kim, Kyle Min, Yezhou Yang
In the evolving landscape of text-to-image (T2I) diffusion models, the remarkable capability to generate high-quality images from textual descriptions faces challenges with the potential misuse of reproducing sensitive content.
2 code implementations • 24 May 2024 • Yiran Luo, Joshua Feinglass, Tejas Gokhale, Kuan-Cheng Lee, Chitta Baral, Yezhou Yang
We first introduce two new quantitative measures ICV and IDD to describe domain shifts in terms of consistency of classes within one domain and similarity between two stylistic domains.
no code implementations • 12 Apr 2024 • Joshua Feinglass, Jayaraman J. Thiagarajan, Rushil Anirudh, T. S. Jayram, Yezhou Yang
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
2 code implementations • CVPR 2024 • Agneet Chatterjee, Tejas Gokhale, Chitta Baral, Yezhou Yang
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.
1 code implementation • 12 Apr 2024 • Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang
In response to this gap, we present SEVD, a first-of-its-kind multi-view ego and fixed perception synthetic event-based dataset created using multiple dynamic vision sensors within the CARLA simulator.
1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
no code implementations • CVPR 2024 • Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang
Event cameras, with their high temporal resolution, high dynamic range, and minimal memory usage, have found applications in various fields.
no code implementations • 21 Mar 2024 • Jinyung Hong, Eun Som Jeon, Changhoon Kim, Keun Hee Park, Utkarsh Nath, Yezhou Yang, Pavan Turaga, Theodore P. Pavlic
Biased attributes, spuriously correlated with target labels in a dataset, can problematically lead to neural networks that learn improper shortcuts for classifications and limit their capabilities for out-of-distribution (OOD) generalization.
no code implementations • 17 Mar 2024 • Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang
Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set.
1 code implementation • 7 Feb 2024 • Maitreya Patel, Sangmin Jung, Chitta Baral, Yezhou Yang
While LDMs offer distinct advantages, P-T2I methods' reliance on the latent space of these diffusion models significantly escalates resource demands, leading to inconsistent results and necessitating numerous iterations for a single desired image.
no code implementations • 16 Jan 2024 • Wenwen Li, Chia-Yu Hsu, Sizhe Wang, Yezhou Yang, Hyunho Lee, Anna Liljedahl, Chandi Witharana, Yili Yang, Brendan M. Rogers, Samantha T. Arundel, Matthew B. Jones, Kenton McHenry, Patricia Solis
To evaluate the performance of large AI vision models, especially Meta's Segment Anything Model (SAM), we implemented different instance segmentation pipelines that minimize the changes to SAM to leverage its power as a foundation model.
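For context, the "minimal changes" usage pattern described here corresponds to running SAM's automatic mask generator as shipped. A sketch using the public `segment_anything` package; checkpoint variant and file paths are placeholders:

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Checkpoint variant and paths are placeholders.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("tile.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # dicts with 'segmentation', 'area', ...
print(f"{len(masks)} candidate instance masks")
```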
1 code implementation • 30 Dec 2023 • Longchao Da, Kuanru Liou, Tiejin Chen, Xuesong Zhou, Xiangyong Luo, Yezhou Yang, Hua Wei
Transportation has greatly benefited the development of cities over the course of modern civilization.
no code implementations • CVPR 2024 • Maitreya Patel, Changhoon Kim, Sheng Cheng, Chitta Baral, Yezhou Yang
The T2I prior model alone adds a billion parameters compared to the Latent Diffusion Models, which increases the computational and high-quality data requirements.
no code implementations • 4 Sep 2023 • Himanshu Pahadia, Duo Lu, Bharatesh Chakravarthi, Yezhou Yang
Intelligent transportation systems (ITS) have revolutionized modern road infrastructure, providing essential functionalities such as traffic monitoring, road safety assessment, congestion reduction, and law enforcement.
1 code implementation • 1 Sep 2023 • Joshua Feinglass, Yezhou Yang
Object proposal generation serves as a standard pre-processing step in Vision-Language (VL) tasks (image captioning, visual question answering, etc.).
1 code implementation • ICCV 2023 • Sheng Cheng, Tejas Gokhale, Yezhou Yang
Generalizing to unseen image domains is a challenging problem primarily due to the lack of diverse training data, inaccessible target data, and the large domain shift that may exist in many real-world settings.
Ranked #5 on Photo to Rest Generalization on PACS
1 code implementation • 7 Jun 2023 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
To quantify the ability of T2I models in learning and synthesizing novel visual concepts (a.k.a.
1 code implementation • CVPR 2024 • Changhoon Kim, Kyle Min, Maitreya Patel, Sheng Cheng, Yezhou Yang
This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse.
1 code implementation • 1 Jun 2023 • Man Luo, Zhiyuan Fang, Tejas Gokhale, Yezhou Yang, Chitta Baral
We investigate knowledge retrieval with multi-modal queries, i.e., queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval.
1 code implementation • 31 May 2023 • Duo Lu, Eric Eaton, Matt Weg, Wei Wang, Steven Como, Jeffrey Wishart, Hongbin Yu, Yezhou Yang
Road traffic scene reconstruction from videos has long been desired by road safety regulators, city planners, researchers, and autonomous driving technology developers.
1 code implementation • 17 Apr 2023 • GuangYu Nie, Changhoon Kim, Yezhou Yang, Yi Ren
This paper investigates the use of latent semantic dimensions as fingerprints, from where we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff.
1 code implementation • 30 Mar 2023 • Ethan Wisdom, Tejas Gokhale, Chaowei Xiao, Yezhou Yang
In this work, we present a data poisoning attack that confounds machine learning models without any manipulation of the image or label.
1 code implementation • 20 Dec 2022 • Tejas Gokhale, Hamid Palangi, Besmira Nushi, Vibhav Vineet, Eric Horvitz, Ece Kamar, Chitta Baral, Yezhou Yang
We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
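A hedged sketch of a VISOR-style check, assuming an object detector has already produced one box per named object and that relations are decided by comparing box centroids; the metric's precise definition is given in the paper.

```python
def spatial_relation(box_a, box_b):
    """Relation between two boxes (x1, y1, x2, y2); image y grows downward."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    if abs(ax - bx) >= abs(ay - by):
        return "left of" if ax < bx else "right of"
    return "above" if ay < by else "below"

def spatial_accuracy(detections, expected):
    """Fraction of generated images whose detected relation matches the
    prompt; `detections` holds one (box_a, box_b) pair per image."""
    hits = sum(spatial_relation(a, b) == expected for a, b in detections)
    return hits / len(detections)
```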
no code implementations • 7 Dec 2022 • Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral
'Actions' play a vital role in how humans interact with the world.
1 code implementation • 7 Dec 2022 • Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral
'Actions' play a vital role in how humans interact with the world.
1 code implementation • 7 Nov 2022 • Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
Videos often capture objects, their visible properties, their motion, and the interactions between different objects.
Ranked #1 on Counterfactual Planning on CRIPP-VQA
no code implementations • 15 Jul 2022 • Shailaja Keyur Sampat, Maitreya Patel, Subhasish Das, Yezhou Yang, Chitta Baral
'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals.
1 code implementation • 29 Jun 2022 • Mohammad Hekmatnejad, Bardh Hoxha, Jyotirmoy V. Deshmukh, Yezhou Yang, Georgios Fainekos
Automated vehicles (AV) heavily depend on robust perception systems.
1 code implementation • 15 Jun 2022 • Tejas Gokhale, Rushil Anirudh, Jayaraman J. Thiagarajan, Bhavya Kailkhura, Chitta Baral, Yezhou Yang
To be successful in single source domain generalization, maximizing diversity of synthesized domains has emerged as one of the most effective strategies.
1 code implementation • 28 Apr 2022 • Arnav Chakravarthy, Zhiyuan Fang, Yezhou Yang
In videos that contain actions performed unintentionally, agents do not achieve their desired goals.
no code implementations • 27 Apr 2022 • Sheng Cheng, Yi Ren, Yezhou Yang
This paper follows cognitive studies to investigate a graph representation for sketches, where information about strokes, i.e., the parts of a sketch, is encoded on vertices and information about inter-stroke relations on edges.
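An illustrative construction of such a stroke graph with networkx; the endpoint-proximity rule used for inter-stroke edges is an assumption made for this sketch, not necessarily the paper's relation.

```python
import itertools
import networkx as nx

def build_sketch_graph(strokes, max_gap=20.0):
    """Strokes (lists of (x, y) points) become vertices; strokes whose
    endpoints fall within `max_gap` pixels are linked by an edge."""
    g = nx.Graph()
    for i, stroke in enumerate(strokes):
        g.add_node(i, points=stroke)  # per-stroke information on the vertex
    for i, j in itertools.combinations(range(len(strokes)), 2):
        (x1, y1), (x2, y2) = strokes[i][-1], strokes[j][0]
        if ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5 <= max_gap:
            g.add_edge(i, j)  # inter-stroke relation on the edge
    return g
```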
1 code implementation • 30 Mar 2022 • Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image.
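The two reported biases are simple enough to state in code. A sketch of the heuristic baselines, assuming person boxes in (x1, y1, x2, y2) format:

```python
def largest_box_heuristic(names, boxes):
    """Bias #1: link the first name in the caption to the largest box."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    return {names[0]: boxes[areas.index(max(areas))]}

def left_to_right_heuristic(names, boxes):
    """Bias #2: pair names, in caption order, with boxes sorted
    left-to-right by horizontal center."""
    ordered = sorted(boxes, key=lambda b: (b[0] + b[2]) / 2)
    return dict(zip(names, ordered))
```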
1 code implementation • CVPR 2022 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lin Liang, Zhe Gan, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting regional features.
1 code implementation • Findings (ACL) 2022 • Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms.
1 code implementation • 16 Sep 2021 • Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, Yi Ren
This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality and effectiveness of the existing ones.
no code implementations • ICCV 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
In this work, we evaluate the faithfulness of V&L models to such geometric understanding by formulating the prediction of pair-wise relative locations of objects as both a classification and a regression task.
1 code implementation • ACL 2021 • Joshua Feinglass, Yezhou Yang
The open-ended nature of visual captioning makes it a challenging area for evaluation.
1 code implementation • NAACL 2021 • Shailaja Keyur Sampat, Akshay Kumar, Yezhou Yang, Chitta Baral
Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video.
no code implementations • ICCV 2021 • Zhiyuan Fang, JianFeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
In this paper, we study knowledge distillation (KD) to effectively compress a transformer-based large VL model into a small VL model.
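For orientation, the classic logit-distillation objective underlying KD looks like the following; the paper's VL-specific distillation goes beyond this generic loss.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation: temperature-softened KL to the teacher,
    mixed with the usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```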
1 code implementation • CVPR 2021 • Xin Ye, Yezhou Yang
We present a novel two-layer hierarchical reinforcement learning approach equipped with a Goals Relational Graph (GRG) for tackling the partially observable goal-driven task, such as goal-driven visual navigation.
Hierarchical Reinforcement Learning • Reinforcement Learning (RL)
1 code implementation • ICLR 2021 • Zhiyuan Fang, JianFeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu
This paper is concerned with self-supervised learning for small models.
no code implementations • Findings (ACL) 2021 • Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets.
3 code implementations • 3 Dec 2020 • Tejas Gokhale, Rushil Anirudh, Bhavya Kailkhura, Jayaraman J. Thiagarajan, Chitta Baral, Yezhou Yang
While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes.
no code implementations • ICLR 2021 • Changhoon Kim, Yi Ren, Yezhou Yang
Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement.
no code implementations • 16 Oct 2020 • Xin Ye, Yezhou Yang
Although its success at enabling robots with autonomous behaviors makes deep reinforcement learning a promising approach for the robotic object search task, the approach suffers severely from the task's naturally sparse reward setting.
2 code implementations • EMNLP 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge.
1 code implementation • 13 Jul 2020 • Kausic Gunasekar, Qiang Qiu, Yezhou Yang
While hallucinating data from a modality with richer information, e.g., RGB to depth, has been researched extensively, we investigate the more challenging low-to-high modality hallucination, with interesting use cases in robotics and autonomous systems.
no code implementations • 21 Jun 2020 • Zhiyuan Fang, Shu Kong, Zhe Wang, Charless Fowlkes, Yezhou Yang
The referring attention is our designed mechanism acting as a scoring function for grounding the given queries over frames temporally.
no code implementations • 15 May 2020 • Zhe Wang, Jun Wang, Yezhou Yang
Pedestrian detection has been heavily studied in the last decade due to its wide application.
2 code implementations • ECCV 2020 • Zhe Wang, Zhiyuan Fang, Jun Wang, Yezhou Yang
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.
Ranked #20 on Text based Person Retrieval on CUHK-PEDES
1 code implementation • 8 May 2020 • Jingke Wang, Yue Wang, Dongkun Zhang, Yezhou Yang, Rong Xiong
To improve tactical decision-making in learning-based driving solutions, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model behavior in the learned solution.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shailaja Keyur Sampat, Yezhou Yang, Chitta Baral
Understanding images and text together is an important aspect of cognition and building advanced Artificial Intelligence (AI) systems.
no code implementations • 30 Apr 2020 • Aadhavan Sadasivam, Kausic Gunasekar, Hasan Davulcu, Yezhou Yang
For a given input sentence, an image meme is generated by combining a meme template image and a text caption: the template image is selected from a set of popular candidates by a selection module, and the meme caption is generated by an encoder-decoder model.
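A hedged sketch of the selection step, assuming precomputed sentence and template embeddings compared by cosine similarity; the encoder-decoder captioning step is indicated only as a comment because its details are model-specific.

```python
import numpy as np

def select_template(sentence_vec, template_vecs):
    """Return the index of the template embedding closest (by cosine
    similarity) to the input-sentence embedding."""
    sims = template_vecs @ sentence_vec / (
        np.linalg.norm(template_vecs, axis=1) * np.linalg.norm(sentence_vec)
    )
    return int(np.argmax(sims))

# caption = encoder_decoder.generate(sentence)  # hypothetical seq2seq call
# meme = overlay(templates[best_idx], caption)  # compose caption onto template
```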
no code implementations • 13 Apr 2020 • Mohammad Farhadi Bajestani, Mehdi Ghasemi, Sarma Vrudhula, Yezhou Yang
However, only limited knowledge of the observed environment is needed at inference time, and it can be learned using a shallow neural network (SHNN).
2 code implementations • EMNLP 2020 • Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene.
no code implementations • 26 Feb 2020 • Xin Ye, Yezhou Yang
Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities especially with the recently reported success from learning-based methods.
no code implementations • ECCV 2020 • Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation.
no code implementations • 21 Oct 2019 • Jiaying Lu, Xin Ye, Yi Ren, Yezhou Yang
Multiple-choice VQA has drawn increasing attention from researchers and end-users recently.
no code implementations • 25 Sep 2019 • Xin Ye, Shibin Zheng, Yezhou Yang
Despite significant progress in Robotic Object Search (ROS) over the recent years with deep reinforcement learning based approaches, the sparsity issue in reward setting as well as the lack of interpretability of the previous ROS approaches leave much to be desired.
no code implementations • 5 Sep 2019 • Mohammad Farhadi, Mehdi Ghasemi, Yezhou Yang
On the other hand, for a large chunk of recognition challenges, a system can classify images correctly using simple models or so-called shallow networks.
no code implementations • 24 Jun 2019 • Somak Aditya, Yezhou Yang, Chitta Baral
Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition, semantic segmentation to visual question answering.
no code implementations • 28 May 2019 • Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence.
no code implementations • 15 May 2019 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Xiaochun Zhao, Leandro Borba Moreira, Sirin Gandhi, Claudio Cavallo, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang
To improve the diagnostic quality of CLE, we used a micrograph of an H&E slide from a glioma tumor biopsy and image style transfer, a neural network method for integrating the content and style of two images.
no code implementations • 19 Apr 2019 • Varun Chandra Jammula, Anshul Rai, Yezhou Yang
To validate the efficiency of the framework, we conduct several experiments in simulation using Gazebo and evaluate the success rate of tracking an evader in various environments with different pursuer-to-evader speed ratios.
1 code implementation • CVPR 2019 • Zhiyuan Fang, Shu Kong, Charless Fowlkes, Yezhou Yang
Computer Vision applications often require a textual grounding module with precision, interpretability, and resilience to counterfactual inputs/queries.
no code implementations • 4 Mar 2019 • Mohammad Farhadi, Yezhou Yang
Deep neural networks based methods have been proved to achieve outstanding performance on object detection and classification tasks.
no code implementations • 9 Feb 2019 • Houpu Yao, Malcolm Regan, Yezhou Yang, Yi Ren
We demonstrate in this paper that a generative model can be designed to perform classification tasks under challenging settings, including adversarial attacks and input distribution shifts.
no code implementations • 31 Jan 2019 • Houpu Yao, Zhe Wang, GuangYu Nie, Yassine Mazboudi, Yezhou Yang, Yi Ren
The vulnerability of neural networks under adversarial attacks has raised serious concerns and motivated extensive research.
no code implementations • 28 Jan 2019 • Yi Ren, Steven Elliott, Yiwei Wang, Yezhou Yang, Wenlong Zhang
While intelligence of autonomous vehicles (AVs) has significantly advanced in recent years, accidents involving AVs suggest that these autonomous systems lack gracefulness in driving when interacting with human drivers.
Robotics • Computer Science and Game Theory
no code implementations • 10 Dec 2018 • Somak Aditya, Rudra Saha, Yezhou Yang, Chitta Baral
We propose a framework that combines recent advances in knowledge distillation (teacher-student framework), relational reasoning and probabilistic logical languages to incorporate such knowledge in existing neural networks for the task of Visual Question Answering.
no code implementations • 21 Sep 2018 • Xin Ye, Zhe Lin, Joon-Young Lee, Jianming Zhang, Shibin Zheng, Yezhou Yang
We study the problem of learning a generalizable action policy for an intelligent agent to actively approach an object of interest in an indoor environment solely from its visual inputs.
no code implementations • 30 Jul 2018 • Xin Ye, Zhe Lin, Haoxiang Li, Shibin Zheng, Yezhou Yang
We study the problem of learning a navigation policy for a robot to actively search for an object of interest in an indoor environment solely from its visual inputs.
no code implementations • 13 Jun 2018 • Zunlei Feng, Zhenyun Yu, Yezhou Yang, Yongcheng Jing, Junxiao Jiang, Mingli Song
In the supervised attributes module, multiple attributes labels are adopted to ensure that different parts of the overall embedding correspond to different attributes.
no code implementations • 1 May 2018 • Zhiyuan Fang, Shu Kong, Tianshu Yu, Yezhou Yang
Grounding textual phrases in visual content is a meaningful yet challenging problem with various potential applications such as image-text inference or text-driven multimedia interaction.
no code implementations • 26 Apr 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul
We present an overview of deep learning models for automatic detection of diagnostic CLE images and discuss how various training regimes and ensemble modeling affect the predictive power of these models.
no code implementations • 25 Apr 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Claudio Cavallo, Xiaochun Zhao, Sirin Gandhi, Leandro Borba Moreira, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang
To overcome this problem, we propose a Weakly-Supervised Learning (WSL)-based model for feature localization that trains on image-level annotations, and then localizes incidences of a class-of-interest in the test image.
no code implementations • CVPR 2018 • Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, Mingli Song
Most existing Zero-Shot Learning (ZSL) methods have the strong bias problem, in which instances of unseen (target) classes tend to be categorized as one of the seen (source) classes.
no code implementations • 23 Mar 2018 • Somak Aditya, Yezhou Yang, Chitta Baral
Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image.
1 code implementation • ECCV 2018 • Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, DaCheng Tao, Mingli Song
In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control.
no code implementations • 29 Jan 2018 • Sihui Luo, Yezhou Yang, Mingli Song
The same practice also enables the compressed code to carry the image's semantic information during storage and transmission.
no code implementations • 6 Jan 2018 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul
Examining all the hundreds or thousands of images from a single case to discriminate diagnostic images from nondiagnostic ones can be tedious.
no code implementations • 14 Nov 2017 • Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song
As an effective way of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person-ReID, leading to many state-of-the-art results.
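For reference, the triplet objective referred to here, in a minimal PyTorch form (equivalent to `torch.nn.TripletMarginLoss` with Euclidean distance):

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor toward the positive and push it at least
    `margin` farther from the negative."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```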
no code implementations • 10 Sep 2017 • Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Michael Mooney, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Mark C. Preul, Yezhou Yang
Manual examination of thousands of nondiagnostic images during surgery would be impractical, which creates an opportunity for a model to select diagnostic images for the pathologist's or surgeon's review.
no code implementations • 2 Aug 2017 • Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos
We conclude this paper with the construction of a novel contractive neural network.
8 code implementations • 11 May 2017 • Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, Mingli Song
We first propose a taxonomy of current algorithms in the field of NST.
no code implementations • 17 Nov 2016 • Somak Aditya, Yezhou Yang, Chitta Baral, Yiannis Aloimonos
We compile a dataset of over 3k riddles where each riddle consists of 4 images and a groundtruth answer.
no code implementations • 14 Nov 2016 • Wentao Luan, Yezhou Yang, Cornelia Fermuller, John S. Baras
In this work, we present a fast target detection framework for real-world robotics applications.
no code implementations • 3 Oct 2016 • Cornelia Fermüller, Fang Wang, Yezhou Yang, Konstantinos Zampogiannis, Yi Zhang, Francisco Barranco, Michael Pfeiffer
In psychophysical experiments, we evaluated human observers' skills in predicting actions from video sequences of different length, depicting the hand movement in the preparation and execution of actions before and after contact with the object.
no code implementations • 12 Sep 2016 • Wentao Luan, Yezhou Yang, Cornelia Fermuller, John Baras
We consider the problem of object recognition in 3D using an ensemble of attribute-based classifiers.
no code implementations • 12 Sep 2016 • Ren Mao, John S. Baras, Yezhou Yang, Cornelia Fermuller
It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints.
1 code implementation • 9 May 2016 • Chengxi Ye, Chen Zhao, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos
LightNet is a lightweight, versatile, and purely MATLAB-based deep learning framework.
no code implementations • 29 Jan 2016 • Chengxi Ye, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos
For robots that have the capability to interact with the physical environment through their end effectors, understanding the surrounding scenes is not merely a task of image classification or object recognition.
no code implementations • 10 Dec 2015 • Yezhou Yang, Yi Li, Cornelia Fermuller, Yiannis Aloimonos
In this paper we consider the problem of continuously discovering image contents by actively asking image based questions and subsequently answering the questions being asked.
no code implementations • IJCNLP 2015 • Yezhou Yang, Yiannis Aloimonos, Cornelia Fermuller, Eren Erdal Aksoy
In this paper we present a formal computational framework for modeling manipulation actions.
no code implementations • 10 Nov 2015 • Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, Yiannis Aloimonos
Specifically, commonsense reasoning is applied on (a) detections obtained from existing perception methods on given images, (b) a "commonsense" knowledge base constructed using natural language processing of image annotations and (c) lexical ontological knowledge from resources such as WordNet.
no code implementations • CVPR 2015 • Yezhou Yang, Cornelia Fermuller, Yi Li, Yiannis Aloimonos
The grasp type provides crucial information about human action.
no code implementations • CVPR 2013 • Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos
There is a small set of fundamental primitive action consequences that provides a systematic high-level classification of manipulation actions.