no code implementations • 1 Feb 2023 • Anthony Manchin, Jamie Sherrah, Qi Wu, Anton Van Den Hengel
The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence.
no code implementations • 8 Dec 2022 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang
Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice clone V2C) task aims to generate speeches that match the speaker's emotion presented in the video using the desired speaker voice as reference.
no code implementations • 18 Nov 2022 • Yufan Liao, Qi Wu, Xing Yan
Invariant learning methods try to find an invariant predictor across several environments and have become popular in OOD generalization.
no code implementations • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, Dan Zhu, Mengdi Sun, Ran Duan, Yan Gao, Lingshun Kong, Long Sun, Xiang Li, Xingdong Zhang, Jiawei Zhang, Yaqi Wu, Jinshan Pan, Gaocheng Yu, Jin Zhang, Feng Zhang, Zhe Ma, Hongbin Wang, Hojin Cho, Steve Kim, Huaen Li, Yanbo Ma, Ziwei Luo, Youwei Li, Lei Yu, Zhihong Wen, Qi Wu, Haoqiang Fan, Shuaicheng Liu, Lize Zhang, Zhikai Zong, Jeremy Kwon, Junxi Zhang, Mengyuan Li, Nianxiang Fu, Guanchen Ding, Han Zhu, Zhenzhong Chen, Gen Li, Yuanfan Zhang, Lei Sun, Dafeng Zhang, Neo Yang, Fitz Liu, Jerry Zhao, Mustafa Ayazoglu, Bahri Batuhan Bilecen, Shota Hirose, Kasidis Arunruangsirilert, Luo Ao, Ho Chun Leung, Andrew Wei, Jie Liu, Qiang Liu, Dahai Yu, Ao Li, Lei Luo, Ce Zhu, Seongmin Hong, Dongwon Park, Joonhee Lee, Byeong Hyun Lee, Seunggyu Lee, Se Young Chun, Ruiyuan He, Xuhao Jiang, Haihang Ruan, Xinjian Zhang, Jing Liu, Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He
While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints.
1 code implementation • ECCV 2022 2022 • Wei Suo, Mengyang Sun, Kai Niu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu
Text-based person search aims to associate pedestrian images with natural language descriptions.
no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen
Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.
no code implementations • 20 Sep 2022 • David Bauer, Qi Wu, Kwan-Liu Ma
We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization.
1 code implementation • 17 Sep 2022 • Qi Chen, Chaorui Deng, Qi Wu
Our innovative idea is to explore the rich modes in the training caption corpus to learn a set of "mode embeddings", and further use them to control the mode of the generated captions for existing image captioning models.
no code implementations • 5 Sep 2022 • Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Shumin Ma, Zhiri Yuan, Dongdong Wang, Zhixiang Huang
Theoretically, the RCL estimators i) are as consistent and doubly robust as the DML estimators, and ii) can get rid of the error-compounding issue.
no code implementations • 5 Sep 2022 • Yiyan Huang, Cheuk Hang Leung, Shumin Ma, Qi Wu, Dongdong Wang, Zhixiang Huang
In this paper, we propose a moderately-balanced representation learning (MBRL) framework based on recent covariates balanced representation learning methods and orthogonal machine learning theory.
no code implementations • 28 Aug 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Anton Van Den Hengel, Qi Wu
Besides, we further extend the clustering-guided attention from single-scale to multi-scale, which is conducive to dense prediction tasks.
2 code implementations • 24 Aug 2022 • Ziwei Luo, Youwei Li, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Shuaicheng Liu
The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI.
no code implementations • 23 Jul 2022 • Qi Wu, David Bauer, Michael J. Doyle, Kwan-Liu Ma
Neural networks have shown great potential in compressing volume data for visualization.
no code implementations • 22 Jun 2022 • Qi Wu, Yixiao Zhu, Hexun Jiang, Qunbi Zhuge, Weisheng Hu
For cost-sensitive short-reach optical networks, some advanced single-polarization (SP) optical field recovery schemes are recently proposed to avoid chromatic dispersion-induced power fading effect, and improve the spectral efficiency for larger potential capacity.
no code implementations • 7 May 2022 • Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang
Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires, and supports much more detailed information in text generation.
1 code implementation • 18 Apr 2022 • Ziwei Luo, Youwei Li, Shen Cheng, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Jian Sun, Shuaicheng Liu
To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction.
Ranked #1 on
Burst Image Super-Resolution
on BurstSR
1 code implementation • ACL 2022 • Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang
A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.
1 code implementation • CVPR 2022 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu
Pre-training has been adopted in a few of recent works for Vision-and-Language Navigation (VLN).
1 code implementation • CVPR 2022 • Yang Ding, Jing Yu, Bang Liu, Yue Hu, Mingxin Cui, Qi Wu
Knowledge-based visual question answering requires the ability of associating external knowledge for open-ended cross-modal scene understanding.
1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.
1 code implementation • 16 Feb 2022 • Aaron Babier, Rafid Mahmood, Binghao Zhang, Victor G. L. Alves, Ana Maria Barragán-Montero, Joel Beaudry, Carlos E. Cardenas, Yankui Chang, Zijie Chen, Jaehee Chun, Kelly Diaz, Harold David Eraso, Erik Faustmann, Sibaji Gaj, Skylar Gay, Mary Gronberg, Bingqi Guo, Junjun He, Gerd Heilemann, Sanchit Hira, Yuliang Huang, Fuxin Ji, Dashan Jiang, Jean Carlo Jimenez Giraldo, Hoyeon Lee, Jun Lian, Shuolin Liu, Keng-Chi Liu, José Marrugo, Kentaro Miki, Kunio Nakamura, Tucker Netherton, Dan Nguyen, Hamidreza Nourzadeh, Alexander F. I. Osman, Zhao Peng, José Darío Quinto Muñoz, Christian Ramsl, Dong Joo Rhee, Juan David Rodriguez, Hongming Shan, Jeffrey V. Siebers, Mumtaz H. Soomro, Kay Sun, Andrés Usuga Hoyos, Carlos Valderrama, Rob Verbeek, Enpei Wang, Siri Willems, Qi Wu, Xuanang Xu, Sen yang, Lulin Yuan, Simeng Zhu, Lukas Zimmermann, Kevin L. Moore, Thomas G. Purdie, Andrea L. McNiven, Timothy C. Y. Chan
The dose predictions were input to four optimization models to form 76 unique KBP pipelines that generated 7600 plans.
1 code implementation • CVPR 2022 • Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu
Existing VQA models can answer a compositional question well, but cannot work well in terms of reasoning consistency in answering the compositional question and its sub-questions.
no code implementations • 19 Dec 2021 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu
We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i. e. number of frames.
1 code implementation • 17 Dec 2021 • Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu
In this paper, we advocate bringing a wealth of 2D images like chest X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS.
1 code implementation • NeurIPS 2021 • Zhiquan Wen, Guanghui Xu, Mingkui Tan, Qingyao Wu, Qi Wu
From the sample perspective, we construct two types of negative samples to assist the training of the models, without introducing additional annotations.
1 code implementation • NeurIPS 2021 • Keji He, Yan Huang, Qi Wu, Jianhua Yang, Dong An, Shuanglin Sima, Liang Wang
In Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments following given instructions.
no code implementations • CVPR 2022 • Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu
Existing Voice Cloning (VC) tasks aim to convert a paragraph text to a speech with desired voice specified by a reference audio.
no code implementations • 19 Nov 2021 • Zhihong Lin, Donghao Zhang, Qingyi Tac, Danli Shi, Gholamreza Haffari, Qi Wu, Mingguang He, ZongYuan Ge
Medical Visual Question Answering~(VQA) is a combination of medical artificial intelligence and popular VQA challenges.
no code implementations • 28 Oct 2021 • Siyi Wang, Xing Yan, Bangqi Zheng, Hu Wang, Wangli Xu, Nanbo Peng, Qi Wu
We design a system for risk-analyzing and pricing portfolios of non-performing consumer credit loans.
1 code implementation • 18 Sep 2021 • Feng Chen, Fei Wu, Qi Wu, Zhiguo Wan
The domain shift, coming from unneglectable modality gap and non-overlapped identity classes between training and test sets, is a major issue of RGB-Infrared person re-identification.
no code implementations • 13 Aug 2021 • Markus Loecher, Qi Wu
For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations.
1 code implementation • 5 Aug 2021 • Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo
In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve the object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.
no code implementations • 20 Jul 2021 • Olivia Byrnes, Wendy La, Hu Wang, Congbo Ma, Minhui Xue, Qi Wu
Data hiding is the process of embedding information into a noise-tolerant signal such as a piece of audio, video, or image.
1 code implementation • 15 Jul 2021 • Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan
Specifically, our NvEM utilizes a subject module and a reference module to collect contexts from neighbor views.
Ranked #78 on
Vision and Language Navigation
on VLN Challenge
1 code implementation • CVPR 2021 • Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu
The Remote Embodied Referring Expression (REVERIE) is a recently raised task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.
no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
no code implementations • CVPR 2021 • Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu
The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.
no code implementations • 5 May 2021 • Wei Suo, Mengyang Sun, Peng Wang, Qi Wu
Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering.
no code implementations • 30 Apr 2021 • Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu
Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of the VisualBERT at an instance level when dealing with different questions.
1 code implementation • CVPR 2021 • Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu
This task, however, is very challenging because an image often contains complex texts and visual information that is hard to be described comprehensively.
1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu
Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.
1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang
Results show that indoor navigation agents refer to both object and direction tokens when making decisions.
1 code implementation • CVPR 2021 • Yazhou Yao, Tao Chen, GuoSen Xie, Chuanyi Zhang, Fumin Shen, Qi Wu, Zhenmin Tang, Jian Zhang
To further mine the non-salient region objects, we propose to exert the segmentation network's self-correction ability.
Weakly supervised Semantic Segmentation
Weakly-Supervised Semantic Segmentation
no code implementations • CVPR 2021 • Yazhou Yao, Zeren Sun, Chuanyi Zhang, Fumin Shen, Qi Wu, Jian Zhang, Zhenmin Tang
Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance.
no code implementations • 22 Mar 2021 • Yiyan Huang, Cheuk Hang Leung, Qi Wu, Xing Yan
Theoretically, the RCL estimators i) satisfy the (higher-order) orthogonal condition and are as \textit{consistent and doubly robust} as the DML estimators, and ii) get rid of the error-compounding issue.
no code implementations • 28 Feb 2021 • Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel
ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e. g. either a sub-goal state that is important to reach before the target, or the goal state itself.
no code implementations • 24 Jan 2021 • Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen
To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.
1 code implementation • 4 Jan 2021 • Li Liu, Mengge He, Guanghui Xu, Mingkui Tan, Qi Wu
Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice.
Ranked #3 on
KG-to-Text Generation
on AGENDA
no code implementations • 2 Jan 2021 • Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford
In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what does the world "mean" to a robot, and is strongly tied to the question of how to represent that meaning.
1 code implementation • 24 Dec 2020 • Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang
The essence of multivariate sequential learning is all about how to extract dependencies in data.
no code implementations • 17 Dec 2020 • Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang
Classical estimators overlook the confounding effects and hence the estimation error can be magnificent.
no code implementations • SEMEVAL 2020 • Qi Wu, Peng Wang, Chenghao Huang
Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis.
1 code implementation • 9 Dec 2020 • Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu
Texts appearing in daily scenes that can be recognized by OCR (Optical Character Recognition) tools contain significant information, such as street name, product brand and prices.
1 code implementation • 7 Dec 2020 • Zhaokai Wang, Renda Bao, Qi Wu, Si Liu
Our CNMT consists of a reading, a reasoning and a generation modules, in which Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens.
1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
no code implementations • 26 Nov 2020 • Xiangqian Sun, Xing Yan, Qi Wu
We propose a multivariate generative model to capture the complex dependence structure often encountered in business and financial data.
no code implementations • 22 Nov 2020 • Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang
We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference.
no code implementations • 22 Nov 2020 • Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, YaoWei Wang, Qi Wu
Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.
Ranked #2 on
Referring Expression Comprehension
on CLEVR-Ref+
1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
no code implementations • NeurIPS 2018 • Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu
We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns.
no code implementations • 20 Sep 2020 • Ling Pei, Songpengcheng Xia, Lei Chu, Fanyi Xiao, Qi Wu, Wenxian Yu, Robert Qiu
Together with the rapid development of the Internet of Things (IoT), human activity recognition (HAR) using wearable Inertial Measurement Units (IMUs) becomes a promising technology for many research areas.
1 code implementation • 16 Sep 2020 • Jing Yu, Yuan Chai, Yujing Wang, Yue Hu, Qi Wu
We first build a cognitive structure CogTree to organize the relationships based on the prediction of a biased SGG model.
Ranked #2 on
Scene Graph Generation
on Visual Genome
(mean Recall @20 metric)
1 code implementation • 15 Sep 2020 • Jinquan Li, Ling Pei, Danping Zou, Songpengcheng Xia, Qi Wu, Tao Li, Zhen Sun, Wenxian Yu
This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM.
1 code implementation • 6 Aug 2020 • Chuanyi Zhang, Yazhou Yao, Xiangbo Shu, Zechao Li, Zhenmin Tang, Qi Wu
To this end, we propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
no code implementations • ECCV 2020 • Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton Van Den Hengel, Qi Wu
The first is object description (e. g., 'table', 'door'), each presenting as a tip for the agent to determine the next action by finding the item visible in the environment, and the second is action specification (e. g., 'go straight', 'turn left') which allows the robot to directly predict the next movements without relying on visual perceptions.
no code implementations • ECCV 2020 • Hu Wang, Qi Wu, Chunhua Shen
In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering designing and generalisation problems of the VLN task.
1 code implementation • ECCV 2020 • Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu
We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.
1 code implementation • ECCV 2020 • Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang
However, there are few works studying the data augmentation problem for VQA and none of the existing image based augmentation schemes (such as rotation and flipping) can be directly applied to VQA due to its semantic structure -- an $\langle image, question, answer\rangle$ triplet needs to be maintained correctly.
no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu
In this survey, we first examine the state of the art by comparing modern approaches to the problem.
1 code implementation • 7 Jul 2020 • Xiaoze Jiang, Jing Yu, Yajing Sun, Zengchang Qin, Zihao Zhu, Yue Hu, Qi Wu
The ability of generating detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation.
no code implementations • 16 Jun 2020 • Joya Chen, Qi Wu, Dong Liu, Tong Xu
Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer vision.
no code implementations • 16 Jun 2020 • Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu
In this paper, we depict an image by a multi-modal heterogeneous graph, which contains multiple layers of information corresponding to the visual, semantic and factual features.
no code implementations • 2 Jun 2020 • Peng Wang, Dongyang Liu, Hui Li, Qi Wu
In this case, we need to use commonsense knowledge to identify the objects in the image.
2 code implementations • 1 Jun 2020 • Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton Van Den Hengel, Qi Wu
In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.
no code implementations • 7 Apr 2020 • Mahdi Kazemi Moghaddam, Qi Wu, Ehsan Abbasnejad, Javen Qinfeng Shi
Through empirical studies, we show that our agent, dubbed as the optimistic agent, has a more realistic estimate of the state value during a navigation episode which leads to a higher success rate.
1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
1 code implementation • CVPR 2020 • Qi Chen, Qi Wu, Rui Tang, Yu-Han Wang, Shuai Wang, Mingkui Tan
To this end, we propose a House Plan Generative Model (HPGM) that first translates the language input to a structural graph representation and then predicts the layout of rooms with a Graph Conditioned Layout Prediction Network (GC LPN) and generates the interior texture with a Language Conditioned Texture GAN (LCT-GAN).
4 code implementations • CVPR 2020 • Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.
1 code implementation • CVPR 2020 • Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
From the ASG, we propose a novel ASG2Caption model, which is able to recognise user intentions and semantics in the graph, and therefore generate desired captions according to the graph structure.
no code implementations • CVPR 2020 • Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu
To bridge the gap, we propose a new dataset for visual reasoning in context of referring expression comprehension with two main features.
1 code implementation • 17 Nov 2019 • Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu
More importantly, we can tell which modality (visual or semantic) has more contribution in answering the current question by visualizing the gate values.
Ranked #6 on
Visual Dialog
on VisDial v0.9 val
1 code implementation • Geoscientific Model Development 2019 • Xiaomeng Huang, Xing Huang, Dong Wang, Qi Wu, Yi Li, Shixun Zhang, YuWen Chen, Mingqing Wang, Yuan Gao, Qiang Tang, Yue Chen, Zheng Fang, Zhenya Song, Guangwen Yang
In this work, we design a simple computing library to bridge the gap and decouple the work of ocean modeling from parallel computing.
no code implementations • 15 Oct 2019 • Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu
This notebook paper presents our model in the VATEX video captioning challenge.
no code implementations • 21 Jun 2019 • Joshua Zoen Git Hiew, Xin Huang, Hao Mou, Duan Li, Qi Wu, Yabo Xu
On the other hand, by combining with the other two commonly-used methods when it comes to building the sentiment index in the financial literature, i. e., the option-implied and the market-implied approaches, we propose a more general and comprehensive framework for the financial sentiment analysis, and further provide convincing outcomes for the predictability of individual stock return by combining LSTM (with a feature of a nonlinear mapping).
no code implementations • 5 Jun 2019 • Di Wang, Qi Wu, Wen Zhang
This paper takes a deep learning approach to understand consumer credit risk when e-commerce platforms issue unsecured credit to finance customers' purchase.
no code implementations • 3 Jun 2019 • Qi Wu, Shumin Ma, Cheuk Hang Leung, Wei Liu, Nanbo Peng
Without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution.
no code implementations • 7 May 2019 • Amin Parvaneh, Ehsan Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton Van Den Hengel
Negotiation, as an essential and complicated aspect of online shopping, is still challenging for an intelligent agent.
1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel
One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.
no code implementations • 12 Feb 2019 • Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan
Most state-of-the-art methods in VG operate in a two-stage manner, wherein the first stage an object detector is adopted to generate a set of object proposals from the input image and the second stage is simply formulated as a cross-modal matching problem that finds the best match between the language query and all region proposals.
no code implementations • CVPR 2019 • Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel
We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output.
no code implementations • CVPR 2020 • Ehsan Abbasnejad, Iman Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel
For each potential action a distribution of the expected outcomes is calculated, and the value of the potential information gain assessed.
no code implementations • CVPR 2019 • Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton Van Den Hengel
Being composed of node attention component and edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships, and properties with a flexibility and power impossible with competing approaches.
no code implementations • 15 Nov 2018 • Zhiyuan Li, Min Jin, Qi Wu, Huaxiang Lu
Just like its remarkable achievements in many computer vision tasks, the convolutional neural networks (CNN) provide an end-to-end solution in handwritten Chinese character recognition (HCCR) with great success.
Offline Handwritten Chinese Character Recognition
Template Matching
no code implementations • ECCV 2018 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel
Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images is proven to be an inscrutable challenge.
no code implementations • ACL 2018 • Peter Anderson, Abhishek Das, Qi Wu
A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.
no code implementations • 21 Jun 2018 • Guillaume Favelier, Charles Gueunet, Attila Gyulassy, Julien Kitware, Joshua Levine, Jonas Lukasczyk, Daisuke Sakurai, Maxime Soler, Julien Tierny, Will Usher, Qi Wu
This tutorial presents topological methods for the analysis and visualization of scientific data from a user's perspective, with the Topology ToolKit (TTK), a recently released open-source library for topological data analysis.
no code implementations • CVPR 2018 • Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan
There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.
no code implementations • CVPR 2018 • Yan Huang, Qi Wu, Liang Wang
This mainly arises from that the representation of pixel-level image usually lacks of high-level semantic information as in its matched sentence.
Ranked #10 on
Image Retrieval
on Flickr30K 1K test
no code implementations • CVPR 2018 • Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton Van Den Hengel
The Visual Dialogue task requires an agent to engage in a conversation about an image with a human.
Ranked #4 on
Visual Dialog
on VisDial v0.9 val
no code implementations • 21 Nov 2017 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel
Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images is proven to be an inscrutable challenge.
7 code implementations • CVPR 2018 • Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
Ranked #3 on
Visual Navigation
on R2R
no code implementations • 19 Nov 2017 • Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu
These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags.
no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable length natural expression descriptions, from short phrases query to long multi-round dialogs.
no code implementations • CVPR 2018 • Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid
In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.
no code implementations • 28 Jun 2017 • Jianpeng Zhang, Yong Xia, Qi Wu, Yutong Xie
The Classification of medical images and illustrations in the literature aims to label a medical image according to the modality it was produced or label an illustration according to its production attributes.
no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than the previous released datasets.
Human-Object Interaction Detection
Visual Relationship Detection
no code implementations • CVPR 2017 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel
To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best.
no code implementations • 4 Dec 2016 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu
Recent state-of-the-art approaches to multi-label image classification exploit the label dependencies in an image, at global level, largely improving the labeling capacity.
1 code implementation • 20 Jul 2016 • Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel
Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.
no code implementations • 17 Jun 2016 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick
We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.
no code implementations • 9 Mar 2016 • Qi Wu, Chunhua Shen, Anton Van Den Hengel, Peng Wang, Anthony Dick
Much recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
no code implementations • CVPR 2016 • Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel
Priming a recurrent neural network with this combined information, and the submitted question, leads to a very flexible visual question answering approach.
no code implementations • 9 Nov 2015 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick
We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base.
1 code implementation • CVPR 2016 • Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel
Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
1 code implementation • 1 May 2015 • Hongping Cai, Qi Wu, Tadeo Corradi, Peter Hall
The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc.