Search Results for author: Qi Wu

Found 84 papers, 29 papers with code

Memory Regulation and Alignment toward Generalizer RGB-Infrared Person Re-identification

1 code implementation 18 Sep 2021 Feng Chen, Fei Wu, Qi Wu, Zhiguo Wan

The domain shift, arising from the non-negligible modality gap and the non-overlapping identity classes between training and test sets, is a major issue in RGB-Infrared person re-identification.

Metric Learning · Person Re-Identification

Data-driven advice for interpreting local and global model predictions in bioinformatics problems

no code implementations 13 Aug 2021 Markus Loecher, Qi Wu

For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations.

Feature Importance
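
As a concrete illustration of the comparison described above, here is a minimal sketch (not the paper's code) that correlates global SHAP values with conditional feature contributions (CFC) for a random forest; the dataset, model settings, and the use of the `shap` and `treeinterpreter` packages are assumptions.

```python
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from treeinterpreter import treeinterpreter as ti

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# SHAP values: one (n_samples, n_features) attribution matrix for regression.
shap_vals = shap.TreeExplainer(model).shap_values(X)

# CFC scores: Saabas-style per-sample path contributions (illustrative stand-in).
_, _, cfc_vals = ti.predict(model, X)

# Global importances: mean |attribution| per feature, then rank correlation.
rho, _ = spearmanr(np.abs(shap_vals).mean(axis=0), np.abs(cfc_vals).mean(axis=0))
print(f"Spearman correlation of global rankings: {rho:.3f}")
```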

Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

1 code implementation 5 Aug 2021 Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo

In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve the object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.

Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography

no code implementations 20 Jul 2021 Olivia Byrnes, Wendy La, Hu Wang, Congbo Ma, Minhui Xue, Qi Wu

Data hiding is the process of embedding information into a noise-tolerant signal such as a piece of audio, video, or image.
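
As one classical example of the techniques such a survey unifies, here is a hedged sketch of least-significant-bit (LSB) image steganography; it is illustrative only and not taken from the paper.

```python
import numpy as np

def embed(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Hide a bit-stream in the least significant bits of a uint8 image."""
    stego = cover.copy().ravel()
    stego[: bits.size] = (stego[: bits.size] & 0xFE) | bits  # clear LSB, set bit
    return stego.reshape(cover.shape)

def extract(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the first n_bits back out of the LSBs."""
    return stego.ravel()[:n_bits] & 1

cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
message = np.random.randint(0, 2, size=128, dtype=np.uint8)
assert np.array_equal(extract(embed(cover, message), 128), message)
```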

Neighbor-view Enhanced Model for Vision and Language Navigation

1 code implementation 15 Jul 2021 Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan

Specifically, our NvEM utilizes a subject module and a reference module to collect contexts from neighbor views.

Vision and Language Navigation

Sketch, Ground, and Refine: Top-Down Dense Video Captioning

no code implementations CVPR 2021 Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu

The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.

Dense Video Captioning

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

1 code implementation CVPR 2021 Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu

The Remote Embodied Referring Expression (REVERIE) is a recently proposed task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

no code implementations 5 May 2021 Wei Suo, Mengyang Sun, Peng Wang, Qi Wu

Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering.

Question Answering · Referring Expression Comprehension +2

Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads

no code implementations 30 Apr 2021 Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu

Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of the VisualBERT at an instance level when dealing with different questions.

Question Answering · Visual Question Answering +1
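
The paper's chopping is dynamic and instance-level; as a rough static analogue, the sketch below removes attention heads from a BERT-style backbone with the Hugging Face `prune_heads` API (plain `bert-base-uncased` stands in for VisualBERT here).

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
before = sum(p.numel() for p in model.parameters())

# Statically remove heads 0-3 of layer 0 and every head but head 5 of layer 11.
model.prune_heads({0: [0, 1, 2, 3], 11: [h for h in range(12) if h != 5]})

after = sum(p.numel() for p in model.parameters())
print(f"parameters: {before / 1e6:.1f}M -> {after / 1e6:.1f}M")
```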

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

1 code implementation CVPR 2021 Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu

This task, however, is very challenging because an image often contains complex texts and visual information that are hard to describe comprehensively.

Image Captioning

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

no code implementations 9 Apr 2021 Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation · Vision-Language Navigation

Diagnosing Vision-and-Language Navigation: What Really Matters

no code implementations 30 Mar 2021 Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

Results show that indoor navigation agents refer to both object tokens and direction tokens in the instruction when making decisions.

Vision and Language Navigation

Jo-SRC: A Contrastive Approach for Combating Noisy Labels

1 code implementation CVPR 2021 Yazhou Yao, Zeren Sun, Chuanyi Zhang, Fumin Shen, Qi Wu, Jian Zhang, Zhenmin Tang

Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance.

Contrastive Learning
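
Jo-SRC selects clean samples with a Jensen-Shannon-divergence criterion; the sketch below shows the simpler "small-loss" selection heuristic that this line of work builds on, with all shapes and the keep ratio chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def select_clean(logits: torch.Tensor, labels: torch.Tensor, keep_ratio: float):
    """Keep the fraction of samples with the smallest loss as likely-clean."""
    losses = F.cross_entropy(logits, labels, reduction="none")
    n_keep = int(keep_ratio * labels.size(0))
    return losses.topk(n_keep, largest=False).indices

logits = torch.randn(32, 10)                 # model outputs for a mini-batch
labels = torch.randint(0, 10, (32,))         # possibly noisy labels
clean_idx = select_clean(logits, labels, keep_ratio=0.7)
loss = F.cross_entropy(logits[clean_idx], labels[clean_idx])  # train on these
```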

Higher-Order Orthogonal Causal Learning for Treatment Effect

no code implementations 22 Mar 2021 Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu

Most existing studies of the double/debiased machine learning method concentrate on recovering the causal parameter estimate from the first-order orthogonal score function.
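
For context, here is a minimal sketch of that first-order baseline, the partialling-out double/debiased ML estimator with cross-fitting; the synthetic data and the choice of random forests as nuisance learners are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                      # observed confounders
D = X[:, 0] + rng.normal(size=n)                 # treatment, confounded by X
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)       # outcome, true effect = 2

# Cross-fitted nuisance estimates of E[D|X] and E[Y|X].
d_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, D, cv=5)
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)

# First-order orthogonal (partialling-out) estimate of the treatment effect.
d_res, y_res = D - d_hat, Y - y_hat
theta = (d_res @ y_res) / (d_res @ d_res)
print(f"estimated effect: {theta:.2f}")          # close to 2
```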

Learning for Visual Navigation by Imagining the Success

no code implementations 28 Feb 2021 Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e.g., either a sub-goal state that is important to reach before the target, or the goal state itself.

Visual Navigation

Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline

no code implementations 24 Jan 2021 Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen

To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.

How to Train Your Agent to Read and Write

1 code implementation 4 Jan 2021 Li Liu, Mengge He, Guanghui Xu, Mingkui Tan, Qi Wu

Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice.

KG-to-Text Generation · Knowledge Graphs

Semantics for Robotic Mapping, Perception and Interaction: A Survey

no code implementations 2 Jan 2021 Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford

In robotics and related research fields, the study of understanding is often referred to as semantics, which concerns what the world "means" to a robot and is strongly tied to the question of how to represent that meaning.

Autonomous Driving · Human-Robot Interaction

Memory-Gated Recurrent Networks

1 code implementation 24 Dec 2020 Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang

The essence of multivariate sequential learning lies in how to extract dependencies from data.

Time Series
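
The paper proposes refined gating; purely as orientation, the snippet below shows the vanilla recurrent baseline for multivariate sequences that such memory gates modify (all dimensions are invented).

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=32, batch_first=True)
series = torch.randn(4, 100, 8)    # 4 sequences, 100 steps, 8 variables each
outputs, h_n = gru(series)         # cross-step dependencies live in the states
print(outputs.shape, h_n.shape)    # torch.Size([4, 100, 32]) torch.Size([1, 4, 32])
```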

The Causal Learning of Retail Delinquency

no code implementations 17 Dec 2020 Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang

Classical estimators overlook the confounding effects, and hence the estimation error can be substantial.

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

1 code implementation 9 Dec 2020 Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

Texts appearing in daily scenes that can be recognized by OCR (Optical Character Recognition) tools contain significant information, such as street name, product brand and prices.

Image Captioning · Optical Character Recognition +2
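
The OCR step such systems rely on can be approximated with an off-the-shelf tool; here is a hedged sketch using the `pytesseract` wrapper (the image path is hypothetical and a local Tesseract install is assumed).

```python
from PIL import Image
import pytesseract

image = Image.open("street_scene.jpg")            # hypothetical scene photo
tokens = pytesseract.image_to_string(image).split()
print(tokens)   # recognized scene text, e.g. street names, brands, prices
```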

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

1 code implementation 7 Dec 2020 Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Our CNMT consists of reading, reasoning, and generation modules, in which the reading module employs better OCR systems to enhance text-reading ability and a confidence embedding to select the most noteworthy tokens.

Image Captioning · Optical Character Recognition

Generative Learning of Heterogeneous Tail Dependence

no code implementations 26 Nov 2020 Xiangqian Sun, Xing Yan, Qi Wu

We propose a multivariate generative model to capture the complex dependence structure often encountered in business and financial data.

Modular Graph Attention Network for Complex Visual Relational Reasoning

no code implementations 22 Nov 2020 Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, YaoWei Wang, Qi Wu

Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.

Graph Attention · Question Answering +4

Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

no code implementations 22 Nov 2020 Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang

We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference.

Imitation Learning · Vision and Language Navigation

Language and Visual Entity Relationship Graph for Agent Navigation

1 code implementation NeurIPS 2020 Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould

From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.

Dynamic Time Warping · Test unseen +1

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

no code implementations NeurIPS 2018 Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu

We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns.

Time Series
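
The building block underneath any such framework is the pinball (quantile) loss; here is a minimal PyTorch version for reference, not the paper's full sequential model.

```python
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, tau: float):
    """Asymmetric loss whose minimizer is the tau-quantile of the target."""
    err = target - pred
    return torch.mean(torch.maximum(tau * err, (tau - 1.0) * err))

pred = torch.zeros(1000)            # a constant predictor, for illustration
target = torch.randn(1000)          # stand-in for asset returns
print(pinball_loss(pred, target, tau=0.05))   # 5% tail quantile, as in VaR
```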

MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model

no code implementations 20 Sep 2020 Ling Pei, Songpengcheng Xia, Lei Chu, Fanyi Xiao, Qi Wu, Wenxian Yu, Robert Qiu

Together with the rapid development of the Internet of Things (IoT), human activity recognition (HAR) using wearable Inertial Measurement Units (IMUs) becomes a promising technology for many research areas.

Activity Recognition · Motion Capture +1

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation

1 code implementation 16 Sep 2020 Jing Yu, Yuan Chai, Yujing Wang, Yue Hu, Qi Wu

We first build a cognitive structure CogTree to organize the relationships based on the prediction of a biased SGG model.

Ranked #1 on Scene Graph Generation on Visual Genome (mean Recall @20 metric)

Graph Generation · Scene Graph Generation

Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze

1 code implementation 15 Sep 2020 Jinquan Li, Ling Pei, Danping Zou, Songpengcheng Xia, Qi Wu, Tao Li, Zhen Sun, Wenxian Yu

This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM.

Simultaneous Localization and Mapping

Data-driven Meta-set Based Fine-Grained Visual Classification

1 code implementation 6 Aug 2020 Chuanyi Zhang, Yazhou Yao, Xiangbo Shu, Zechao Li, Zhenmin Tang, Qi Wu

To this end, we propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.

Classification · Fine-Grained Image Classification +3

Object-and-Action Aware Model for Visual Language Navigation

no code implementations ECCV 2020 Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton Van Den Hengel, Qi Wu

The first is object description (e.g., 'table', 'door'), each serving as a cue for the agent to determine the next action by finding the item visible in the environment; the second is action specification (e.g., 'go straight', 'turn left'), which allows the robot to directly predict the next movement without relying on visual perception.

Vision and Language Navigation

Soft Expert Reward Learning for Vision-and-Language Navigation

no code implementations ECCV 2020 Hu Wang, Qi Wu, Chunhua Shen

In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward-engineering and generalisation problems of the VLN task.

Vision and Language Navigation

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

1 code implementation ECCV 2020 Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang

However, there are few works studying the data augmentation problem for VQA, and none of the existing image-based augmentation schemes (such as rotation and flipping) can be directly applied to VQA, because its semantic structure, an ⟨image, question, answer⟩ triplet, needs to be maintained correctly.

Adversarial Attack · Data Augmentation +2

Length-Controllable Image Captioning

1 code implementation ECCV 2020 Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu

We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.

Image Captioning
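
A minimal sketch of the idea of a length level embedding: a learned vector for the desired caption-length bucket is added to every token embedding so the decoder can condition on it. The class name, bucket count, and dimensions below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LengthConditionedEmbedding(nn.Module):        # hypothetical helper
    def __init__(self, vocab_size=10000, dim=512, n_levels=4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)    # ordinary word embedding
        self.level = nn.Embedding(n_levels, dim)    # one vector per length bucket

    def forward(self, tokens, level):
        # tokens: (batch, seq); level: (batch,) bucket ids in {0..n_levels-1}
        return self.tok(tokens) + self.level(level).unsqueeze(1)

emb = LengthConditionedEmbedding()
x = emb(torch.randint(0, 10000, (2, 16)), torch.tensor([0, 3]))
print(x.shape)   # torch.Size([2, 16, 512])
```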

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

1 code implementation 7 Jul 2020 Xiaoze Jiang, Jing Yu, Yajing Sun, Zengchang Qin, Zihao Zhu, Yue Hu, Qi Wu

The ability of generating detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation.

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

no code implementations 16 Jun 2020 Joya Chen, Qi Wu, Dong Liu, Tong Xu

Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer vision.

Object Detection

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering

no code implementations 16 Jun 2020 Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu

In this paper, we depict an image by a multi-modal heterogeneous graph, which contains multiple layers of information corresponding to the visual, semantic and factual features.

Question Answering · Visual Question Answering

Structured Multimodal Attentions for TextVQA

no code implementations 1 Jun 2020 Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton Van Den Hengel, Qi Wu

Most state-of-the-art (SoTA) VQA methods fail to answer these questions because of i) poor text-reading ability; ii) a lack of text-visual reasoning capacity; and iii) the adoption of a discriminative answering mechanism instead of a generative one, which makes it hard to cover both OCR tokens and general text tokens in the final answer.

Graph Attention · Optical Character Recognition +3

Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation

no code implementations 7 Apr 2020 Mahdi Kazemi Moghaddam, Qi Wu, Ehsan Abbasnejad, Javen Qinfeng Shi

Through empirical studies, we show that our agent, dubbed the optimistic agent, has a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate.

Visual Navigation

Sub-Instruction Aware Vision-and-Language Navigation

1 code implementation EMNLP 2020 Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.

Vision and Language Navigation

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

1 code implementation CVPR 2020 Shizhe Chen, Qin Jin, Peng Wang, Qi Wu

From the ASG, we propose a novel ASG2Caption model, which is able to recognise user intentions and semantics in the graph, and therefore generate desired captions according to the graph structure.

Image Captioning

Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only

1 code implementation CVPR 2020 Qi Chen, Qi Wu, Rui Tang, Yu-Han Wang, Shuai Wang, Mingkui Tan

To this end, we propose a House Plan Generative Model (HPGM) that first translates the language input to a structural graph representation and then predicts the layout of rooms with a Graph Conditioned Layout Prediction Network (GC LPN) and generates the interior texture with a Language Conditioned Texture GAN (LCT-GAN).

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

2 code implementations CVPR 2020 Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu

To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.

Cross-Modal Retrieval · Text Matching +1

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

1 code implementation 17 Nov 2019 Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu

More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.

Feature Selection · Question Answering +2

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

no code implementations 15 Oct 2019 Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu

This notebook paper presents our model in the VATEX video captioning challenge.

Video Captioning

Neural Learning of Online Consumer Credit Risk

no code implementations 5 Jun 2019 Di Wang, Qi Wu, Wen Zhang

This paper takes a deep learning approach to understanding consumer credit risk when e-commerce platforms issue unsecured credit to finance customers' purchases.

Time Series

Understanding Distributional Ambiguity via Non-robust Chance Constraint

no code implementations 3 Jun 2019 Qi Wu, Shumin Ma, Cheuk Hang Leung, Wei Liu, Nanbo Peng

Without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution.

Portfolio Optimization

Show, Price and Negotiate: A Negotiator with Online Value Look-Ahead

no code implementations 7 May 2019 Amin Parvaneh, Ehsan Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton Van Den Hengel

Negotiation, as an essential and complicated aspect of online shopping, is still challenging for an intelligent agent.

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation CVPR 2020 Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Vision and Language Navigation

You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding

no code implementations 12 Feb 2019 Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan

Most state-of-the-art methods in VG operate in a two-stage manner: in the first stage, an object detector generates a set of object proposals from the input image; the second stage is then formulated as a cross-modal matching problem that finds the best match between the language query and all region proposals.

Object Detection · Region Proposal +1
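
Schematically, that second stage reduces to scoring proposals against the query in a shared embedding space; here is a toy sketch with invented shapes (not the paper's one-stage method, which avoids this pipeline).

```python
import torch
import torch.nn.functional as F

def best_proposal(region_feats: torch.Tensor, query_feat: torch.Tensor) -> int:
    """region_feats: (n_proposals, d); query_feat: (d,). Highest-scoring index."""
    scores = F.cosine_similarity(region_feats, query_feat.unsqueeze(0), dim=1)
    return int(scores.argmax())

regions = torch.randn(100, 512)   # stage 1: features of detected proposals
query = torch.randn(512)          # embedded language query
print(best_proposal(regions, query))
```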

What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions

no code implementations CVPR 2019 Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output.

Information Seeking · Visual Dialog

Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Langauge Reasoning

no code implementations CVPR 2020 Ehsan Abbasnejad, Iman Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

For each potential action, a distribution of the expected outcomes is calculated, and the value of the potential information gain is assessed.

Visual Dialog

Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

no code implementations CVPR 2019 Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton Van Den Hengel

Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches.

Graph Attention · Referring Expression Comprehension

Deep Template Matching for Offline Handwritten Chinese Character Recognition

no code implementations 15 Nov 2018 Zhiyuan Li, Min Jin, Qi Wu, Huaxiang Lu

Mirroring their remarkable achievements in many computer vision tasks, convolutional neural networks (CNNs) provide a highly successful end-to-end solution for handwritten Chinese character recognition (HCCR).

Offline Handwritten Chinese Character Recognition · Template Matching

Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations ECCV 2018 Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Despite significant progress on a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a formidable challenge.

Question Generation

Connecting Language and Vision to Actions

no code implementations ACL 2018 Peter Anderson, Abhishek Das, Qi Wu

A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.

Image Captioning · Language Modelling +3

Topological Data Analysis Made Easy with the Topology ToolKit

no code implementations 21 Jun 2018 Guillaume Favelier, Charles Gueunet, Attila Gyulassy, Julien Jomier, Joshua Levine, Jonas Lukasczyk, Daisuke Sakurai, Maxime Soler, Julien Tierny, Will Usher, Qi Wu

This tutorial presents topological methods for the analysis and visualization of scientific data from a user's perspective, with the Topology ToolKit (TTK), a recently released open-source library for topological data analysis.

Topological Data Analysis

Visual Grounding via Accumulated Attention

no code implementations CVPR 2018 Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan

There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.

Visual Grounding

Learning Semantic Concepts and Order for Image and Sentence Matching

no code implementations CVPR 2018 Yan Huang, Qi Wu, Liang Wang

This mainly arises because pixel-level image representations usually lack the high-level semantic information present in their matched sentences.

Cross-Modal Retrieval

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations 21 Nov 2017 Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Despite significant progress on a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be a formidable challenge.

Question Generation

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

7 code implementations CVPR 2018 Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Vision and Language Navigation · Visual Navigation +1

Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

no code implementations 19 Nov 2017 Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu

These comments can describe the image, or objects, attributes, and scenes within it, and are normally used as user-provided tags.

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations CVPR 2018 Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural expression descriptions, from short phrase queries to long multi-round dialogs.

Object Discovery · Referring Expression Comprehension

Visual Question Answering with Memory-Augmented Networks

no code implementations CVPR 2018 Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid

In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.

Question Answering · Visual Question Answering

Classification of Medical Images and Illustrations in the Biomedical Literature Using Synergic Deep Learning

no code implementations 28 Jun 2017 Jianpeng Zhang, Yong Xia, Qi Wu, Yutong Xie

The classification of medical images and illustrations in the literature aims to label a medical image according to the modality by which it was produced, or an illustration according to its production attributes.

Classification · General Classification +1

Care about you: towards large-scale human-centric visual relationship detection

no code implementations 28 May 2017 Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than previously released datasets.

Human-Object Interaction Detection · Visual Relationship Detection

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

no code implementations CVPR 2017 Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel

To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best.

Question Answering · Visual Question Answering

Multi-Label Image Classification with Regional Latent Semantic Dependencies

no code implementations 4 Dec 2016 Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu

Recent state-of-the-art approaches to multi-label image classification exploit the label dependencies in an image, at global level, largely improving the labeling capacity.

Classification · General Classification +1

Visual Question Answering: A Survey of Methods and Datasets

1 code implementation 20 Jul 2016 Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.

Visual Question Answering

FVQA: Fact-based Visual Question Answering

no code implementations 17 Jun 2016 Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.

Common Sense Reasoning · Question Answering +1

Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

no code implementations CVPR 2016 Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Priming a recurrent neural network with this combined information, and the submitted question, leads to a very flexible visual question answering approach.

Question Answering · Visual Question Answering

Explicit Knowledge-based Reasoning for Visual Question Answering

no code implementations 9 Nov 2015 Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base.

Question Answering · Visual Question Answering

What value do explicit high level concepts have in vision to language problems?

1 code implementation CVPR 2016 Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel

Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Image Captioning · Question Answering +1
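
For orientation, here is a skeletal CNN-encoder/RNN-decoder captioner of the kind this line of work augments with explicit high-level concepts; every dimension and the toy convolutional stack are invented.

```python
import torch
import torch.nn as nn

class Captioner(nn.Module):
    def __init__(self, vocab=10000, dim=512):
        super().__init__()
        self.cnn = nn.Sequential(                   # toy stand-in for a real CNN
            nn.Conv2d(3, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, image, tokens):
        h0 = self.cnn(image).unsqueeze(0)           # image feature seeds the state
        seq, _ = self.rnn(self.embed(tokens), (h0, torch.zeros_like(h0)))
        return self.out(seq)                        # per-step next-token logits

model = Captioner()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 10000, (2, 12)))
print(logits.shape)   # torch.Size([2, 12, 10000])
```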

The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs

1 code implementation 1 May 2015 Hongping Cai, Qi Wu, Tadeo Corradi, Peter Hall

The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc.

Domain Adaptation