Search Results for author: Qi Wu

Found 161 papers, 74 papers with code

G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images

1 code implementation • 11 Apr 2024 • Zixiong Huang, Qi Chen, Libo Sun, Yifan Yang, Naizhou Wang, Mingkui Tan, Qi Wu

Novel view synthesis aims to generate images of new views from a given collection of view images.

PairAug: What Can Augmented Image-Text Pairs Do for Radiology?

2 code implementations • 7 Apr 2024 • Yutong Xie, Qi Chen, Sinuo Wang, Minh-Son To, Iris Lee, Ee Win Khoo, Kerolos Hendy, Daniel Koh, Yong Xia, Qi Wu

Acknowledging this limitation, our objective is to devise a framework capable of concurrently augmenting medical image and text data.

Image Classification Language Modelling +3

VL-Mamba: Exploring State Space Models for Multimodal Learning

no code implementations • 20 Mar 2024 • Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

The extensive experiments on diverse multimodal benchmarks with competitive performance show the effectiveness of our proposed VL-Mamba and demonstrate the great potential of applying state space models for multimodal learning tasks.

Ranked #61 on Visual Question Answering on MM-Vet

Language Modelling Large Language Model +1

Thermal-NeRF: Neural Radiance Fields from an Infrared Camera

no code implementations • 15 Mar 2024 • Tianxiang Ye, Qi Wu, Junyuan Deng, Guoqing Liu, Liu Liu, Songpengcheng Xia, Liang Pang, Wenxian Yu, Ling Pei

In recent years, Neural Radiance Fields (NeRFs) have demonstrated significant potential in encoding highly-detailed 3D geometry and environmental appearance, positioning themselves as a promising alternative to traditional explicit representations for 3D scene reconstruction.

3D Scene Reconstruction

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

1 code implementation • 12 Mar 2024 • Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, BoWen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans

Medical vision language pre-training (VLP) has emerged as a frontier of research, enabling zero-shot pathological recognition by comparing the query image with the textual descriptions for each disease.

Language Modelling Large Language Model

Unveiling the Potential of Robustness in Evaluating Causal Inference Models

no code implementations • 28 Feb 2024 • Yiyan Huang, Cheuk Hang Leung, Siyi Wang, Yijun Li, Qi Wu

The growing demand for personalized decision-making has led to a surge of interest in estimating the Conditional Average Treatment Effect (CATE).

Causal Inference counterfactual +2

Explicit Interaction for Fusion-Based Place Recognition

1 code implementation • 27 Feb 2024 • Jingyi Xu, Junyi Ma, Qi Wu, Zijie Zhou, Yue Wang, Xieyuanli Chen, Ling Pei

Fusion-based place recognition is an emerging technique that jointly utilizes multi-modal perception data to recognize previously visited places in GPS-denied scenarios for robots and autonomous vehicles.

Autonomous Vehicles

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

no code implementations • 24 Feb 2024 • Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions.

Decision Making Instruction Following +3

ModaVerse: Efficiently Transforming Modalities with LLMs

1 code implementation • 12 Jan 2024 • Xinyu Wang, Bohan Zhuang, Qi Wu

This alignment process, which synchronizes a language model trained on textual data with encoders and decoders trained on multi-modal data, often necessitates extensive training of several projection layers in multiple stages.

Language Modelling Large Language Model

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

no code implementations • 2 Jan 2024 • Xixu Hu, Runkai Zheng, Jindong Wang, Cheuk Hang Leung, Qi Wu, Xing Xie

In this study, we address this gap by introducing SpecFormer, specifically designed to enhance ViTs' resilience against adversarial attacks, with support from carefully derived theoretical guarantees.

Computational Efficiency
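
The snippet names the mechanism (penalizing the maximum singular value of ViT weights) but not how it is computed. As a rough sketch only, assuming the penalty is a weighted sum of the largest singular value of each weight matrix, estimated by power iteration, it could look like the following; the function names and the weight `lam` are hypothetical and not taken from the paper.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]  # row-major list-of-lists


def _matvec(w: Matrix, v: Sequence[float]) -> List[float]:
    # Computes w @ v.
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def _rmatvec(w: Matrix, u: Sequence[float]) -> List[float]:
    # Computes w^T @ u.
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def _normalize(x: List[float]) -> List[float]:
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


def max_singular_value(w: Matrix, n_iters: int = 100, seed: int = 0) -> float:
    """Estimate sigma_max(w) by power iteration on w^T w."""
    rng = random.Random(seed)
    v = _normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(n_iters):
        u = _normalize(_matvec(w, v))   # left singular vector estimate
        v = _normalize(_rmatvec(w, u))  # right singular vector estimate
    # For a unit top right singular vector v, ||w v|| equals sigma_max.
    return math.sqrt(sum(t * t for t in _matvec(w, v)))


def spectral_penalty(weights: Sequence[Matrix], lam: float = 1e-2) -> float:
    """Hypothetical regularizer: lam times the sum of sigma_max over all weight matrices."""
    return lam * sum(max_singular_value(w) for w in weights)
```

For example, `max_singular_value([[3.0, 0.0], [0.0, 1.0]])` is approximately 3.0. In an actual training loop this term would be added to the task loss using autodiff tensors, and the power-iteration vector could be carried over between steps instead of being re-initialized.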

WebVLN: Vision-and-Language Navigation on Websites

1 code implementation • 25 Dec 2023 • Qi Chen, Dileepa Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu

The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments, ultimately reaching specific target locations.

Navigate Vision and Language Navigation

Subject-Oriented Video Captioning

no code implementations • 20 Dec 2023 • Yunchuan Ma, Chang Teng, Yuankai Qi, Guorong Li, Laiyu Qing, Qi Wu, Qingming Huang

To address this problem, we propose a new video captioning task, subject-oriented video captioning, which allows users to specify the target to be described via a bounding box.

Video Captioning

MMBaT: A Multi-task Framework for mmWave-based Human Body Reconstruction and Translation Prediction

no code implementations • 16 Dec 2023 • Jiarui Yang, Songpengcheng Xia, YiFan Song, Qi Wu, Ling Pei

Human body reconstruction with Millimeter Wave (mmWave) radar point clouds has gained significant interest due to its ability to work in adverse environments and its capacity to mitigate privacy concerns associated with traditional camera-based solutions.

The Causal Impact of Credit Lines on Spending Distributions

1 code implementation • 16 Dec 2023 • Yijun Li, Cheuk Hang Leung, Xiangqian Sun, Chaoqun Wang, Yiyan Huang, Xing Yan, Qi Wu, Dongdong Wang, Zhixiang Huang

Consumer credit services offered by e-commerce platforms provide customers with convenient loan access during shopping and have the potential to stimulate sales.

Invariant Random Forest: Tree-Based Model Solution for OOD Generalization

no code implementations • 7 Dec 2023 • Yufan Liao, Qi Wu, Xing Yan

This paper introduces a novel and effective solution for OOD generalization of decision tree models, named Invariant Decision Tree (IDT).

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

1 code implementation • 2 Dec 2023 • Yu Zhang, Songpengcheng Xia, Lei Chu, Jiarui Yang, Qi Wu, Ling Pei

This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods reliant on synthetic data.

Pose Estimation

Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning

1 code implementation • 29 Nov 2023 • Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Qi Wu, Yong Xia

In this paper, we reconsider versatile self-supervised learning from the perspective of continual learning and propose MedCoSS, a continuous self-supervised learning approach for multi-modal medical data.

Continual Learning Representation Learning +1

Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

1 code implementation • 10 Nov 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Xiangyan Qu, Yue Hu, Gang Xiong, Qi Wu

Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks.

Model extraction

Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition

no code implementations • 30 Oct 2023 • Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub

Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments.

Object object-detection +2

Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

1 code implementation • 28 Sep 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Yujing Wang, Yue Hu, Gang Xiong, Qi Wu

Conventional research mainly models the implicit correlations between images and texts for query-ads matching, ignoring the alignment of detailed product information and resulting in suboptimal search performance. In this work, we propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text, which leverages the co-occurrence structure consistency between vision and language spaces without requiring expensive labeled training data.

Ranked #1 on Image-text matching on CommercialAdsDataset

Image-text matching Natural Language Queries

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

1 code implementation • 28 Sep 2023 • Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu

Unlike the Composed Image Retrieval task, which requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute.

Ranked #1 on Zero-Shot Composed Image Retrieval (ZS-CIR) on MS COCO

Attribute Image Retrieval +4

SwitchGPT: Adapting Large Language Models for Non-Text Outputs

no code implementations • 14 Sep 2023 • Xinyu Wang, Bohan Zhuang, Qi Wu

To bridge this gap, we propose a novel approach, SwitchGPT, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one.

S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning

no code implementations • CVPR 2023 • Wei Suo, Mengyang Sun, Weisong Liu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu

The VQA Natural Language Explanation (VQA-NLE) task aims to explain the decision-making process of VQA models in natural language.

Decision Making Visual Question Answering (VQA)

DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series

no code implementations • 26 Aug 2023 • Chaoqun Wang, Yijun Li, Xiangqian Sun, Qi Wu, Dongdong Wang, Zhixiang Huang

The tensorized LSTM assigns each variable a unique hidden state, making up a matrix $\mathbf{h}_t$, whereas the standard LSTM models all the variables with a shared hidden state $\mathbf{H}_t$.

Time Series Time Series Forecasting

BHSD: A 3D Multi-Class Brain Hemorrhage Segmentation Dataset

1 code implementation • 22 Aug 2023 • Biao Wu, Yutong Xie, Zeyu Zhang, Jinchao Ge, Kaspar Yaxley, Suzan Bahadir, Qi Wu, Yifan Liu, Minh-Son To

Intracranial hemorrhage (ICH) is a pathological condition characterized by bleeding inside the skull or brain, which can be attributed to various factors.

Image Segmentation Medical Image Segmentation +2

March in Chat: Interactive Prompting for Remote Embodied Referring Expression

1 code implementation • ICCV 2023 • Yanyuan Qiao, Yuankai Qi, Zheng Yu, Jing Liu, Qi Wu

Nevertheless, this poses more challenges than other VLN tasks since it requires agents to infer a navigation plan based only on a short instruction.

Referring Expression Vision and Language Navigation

VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

1 code implementation • ICCV 2023 • Yanyuan Qiao, Zheng Yu, Qi Wu

The performance of Vision-and-Language Navigation (VLN) tasks has improved rapidly in recent years thanks to the use of large pre-trained vision-and-language models.

Ranked #2 on Visual Navigation on Cooperative Vision-and-Dialogue Navigation

Transfer Learning Vision and Language Navigation +1

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

1 code implementation • 16 Aug 2023 • Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu

In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.

Text-to-Image Generation

Identity-Consistent Aggregation for Video Object Detection

1 code implementation • ICCV 2023 • Chaorui Deng, Da Chen, Qi Wu

In Video Object Detection (VID), a common practice is to leverage the rich temporal contexts from the video to enhance the object representations in each frame.

Ranked #1 on Video Object Detection on ImageNet VID (MAP metric)

Object object-detection +1

Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation

1 code implementation • 15 Aug 2023 • Qi Wu, Yuyao Zhang, Marawan Elbatel

Recent advancements in large foundation models have shown promising potential in the medical industry due to their flexible prompting capability.

Image Segmentation Medical Image Segmentation +2

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

1 code implementation • ICCV 2023 • Chaorui Deng, Qi Chen, Pengda Qin, Da Chen, Qi Wu

In text-video retrieval, recent works have benefited from the powerful learning capabilities of pre-trained text-image foundation models (e.g., CLIP) by adapting them to the video domain.

Retrieval Video Captioning +1

AerialVLN: Vision-and-Language Navigation for UAVs

1 code implementation • ICCV 2023 • Shubo Liu, Hongsheng Zhang, Yuankai Qi, Peng Wang, Yaning Zhang, Qi Wu

Navigating in the sky is more complicated than on the ground because agents need to consider the flying height and more complex spatial relationship reasoning.

Navigate Vision and Language Navigation

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes

no code implementations • 7 Aug 2023 • Chongyang Zhao, Yuankai Qi, Qi Wu

Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction.

Navigate Vision and Language Navigation

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning Vision and Language Navigation +1

Probabilistic Learning of Multivariate Time Series with Temporal Irregularity

1 code implementation • 15 Jun 2023 • Yijun Li, Cheuk Hang Leung, Qi Wu

Multivariate sequential data collected in practice often exhibit temporal irregularities, including nonuniform time intervals and component misalignment.

Imputation Time Series

Deep into The Domain Shift: Transfer Learning through Dependence Regularization

1 code implementation • 31 May 2023 • Shumin Ma, Zhiri Yuan, Qi Wu, Yiyan Huang, Xixu Hu, Cheuk Hang Leung, Dongdong Wang, Zhixiang Huang

This paper proposes a new domain adaptation approach in which one can measure the differences in the internal dependence structure separately from those in the marginals.

Domain Adaptation Transfer Learning

Attention Mechanisms in Medical Image Segmentation: A Survey

no code implementations • 29 May 2023 • Yutong Xie, Bing Yang, Qingbiao Guan, Jianpeng Zhang, Qi Wu, Yong Xia

This paper systematically reviews the basic principles of attention mechanisms and their applications in medical image segmentation.

Image Segmentation Medical Image Segmentation +3

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

1 code implementation • 26 May 2023 • Gengze Zhou, Yicong Hong, Qi Wu

Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling.

Instruction Following Vision and Language Navigation +1

S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts

1 code implementation • 26 May 2023 • Qi Chen, Yutong Xie, Biao Wu, Minh-Son To, James Ang, Qi Wu

In this paper, we seek to design a report generation model that is able to generate reasonable reports even given different images of various body parts.

Realistic Noise Synthesis with Diffusion Models

no code implementations • 23 May 2023 • Qi Wu, Mingyan Han, Ting Jiang, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Deep image denoising models often rely on a large amount of training data for high-quality performance.

Image Denoising Noise Estimation

Photon Field Networks for Dynamic Real-Time Volumetric Global Illumination

no code implementations • 14 Apr 2023 • David Bauer, Qi Wu, Kwan-Liu Ma

In this paper, we present a novel method to enable real-time global illumination for volume data visualization.

Data Visualization

DIPNet: Efficiency Distillation and Iterative Pruning for Image Super-Resolution

no code implementations • 14 Apr 2023 • Lei Yu, Xinpeng Li, Youwei Li, Ting Jiang, Qi Wu, Haoqiang Fan, Shuaicheng Liu

To address this issue, we propose a novel multi-stage lightweight network boosting method, which can enable lightweight networks to achieve outstanding performance.

Image Super-Resolution Network Pruning

HyperINR: A Fast and Predictive Hypernetwork for Implicit Neural Representations via Knowledge Distillation

no code implementations • 9 Apr 2023 • Qi Wu, David Bauer, Yuyang Chen, Kwan-Liu Ma

Implicit Neural Representations (INRs) have recently exhibited immense potential in the field of scientific visualization for both data generation and visualization tasks.

Knowledge Distillation Novel View Synthesis +1

Distributed Neural Representation for Reactive in situ Visualization

no code implementations • 28 Mar 2023 • Qi Wu, Joseph A. Insley, Victor A. Mateevitsi, Silvio Rizzi, Michael E. Papka, Kwan-Liu Ma

In this work, we develop an implicit neural representation for distributed volume data and incorporate it into the DIVA reactive programming system.

Program Generation from Diverse Video Demonstrations

no code implementations • 1 Feb 2023 • Anthony Manchin, Jamie Sherrah, Qi Wu, Anton Van Den Hengel

The ability to use inductive reasoning to extract general rules from multiple observations is a vital indicator of intelligence.

Dynamic CVaR Portfolio Construction with Attention-Powered Generative Factor Learning

no code implementations • 18 Jan 2023 • Chuting Sun, Qi Wu, Xing Yan

The dynamic portfolio construction problem requires dynamic modeling of the joint distribution of multivariate stock returns.

Portfolio Optimization
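
The snippet refers to CVaR without defining it. For reference only, the sketch below computes the standard historical (empirical) CVaR of a portfolio from return scenarios, i.e. the mean loss over the worst (1 − α) fraction of scenarios. It is a generic estimator with hypothetical function names and does not attempt to reproduce the paper's attention-powered generative factor model.

```python
import math
from typing import List, Sequence


def portfolio_returns(weights: Sequence[float], scenarios: Sequence[Sequence[float]]) -> List[float]:
    """Portfolio return w . r for each scenario row of asset returns."""
    return [sum(w * r for w, r in zip(weights, row)) for row in scenarios]


def historical_cvar(returns: Sequence[float], alpha: float = 0.95) -> float:
    """Empirical CVaR at level alpha: mean loss over the worst (1 - alpha) tail.

    Losses are negated returns, so a positive result means an expected loss.
    """
    if not returns or not 0.0 < alpha < 1.0:
        raise ValueError("need non-empty returns and 0 < alpha < 1")
    losses = sorted((-r for r in returns), reverse=True)
    # round() guards against float error (e.g. 20 * (1 - 0.95) lands slightly above 1.0).
    k = max(1, math.ceil(round(len(losses) * (1.0 - alpha), 9)))
    return sum(losses[:k]) / k
```

A dynamic construction would re-estimate the scenario distribution at each rebalancing date and choose weights that trade expected return against this tail measure; how the paper does that is not shown in the snippet.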

ShapeScaffolder: Structure-Aware 3D Shape Generation from Text

no code implementations • ICCV 2023 • Xi Tian, Yong-Liang Yang, Qi Wu

However, humans tend to understand both shape and text as being structure-based.

3D Shape Generation Text Matching

Learning to Dub Movies via Hierarchical Prosody Models

1 code implementation • CVPR 2023 • Gaoxiang Cong, Liang Li, Yuankai Qi, ZhengJun Zha, Qi Wu, Wenyu Wang, Bin Jiang, Ming-Hsuan Yang, Qingming Huang

Given a piece of text, a video clip and a reference audio, the movie dubbing (also known as visual voice cloning, V2C) task aims to generate speech that matches the speaker's emotion presented in the video, using the desired speaker voice as reference.

Decorr: Environment Partitioning for Invariant Learning and OOD Generalization

no code implementations • 18 Nov 2022 • Yufan Liao, Qi Wu, Xing Yan

Invariant learning methods try to find an invariant predictor across several environments and have become popular in OOD generalization.

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

2 code implementations • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, Dan Zhu, Mengdi Sun, Ran Duan, Yan Gao, Lingshun Kong, Long Sun, Xiang Li, Xingdong Zhang, Jiawei Zhang, Yaqi Wu, Jinshan Pan, Gaocheng Yu, Jin Zhang, Feng Zhang, Zhe Ma, Hongbin Wang, Hojin Cho, Steve Kim, Huaen Li, Yanbo Ma, Ziwei Luo, Youwei Li, Lei Yu, Zhihong Wen, Qi Wu, Haoqiang Fan, Shuaicheng Liu, Lize Zhang, Zhikai Zong, Jeremy Kwon, Junxi Zhang, Mengyuan Li, Nianxiang Fu, Guanchen Ding, Han Zhu, Zhenzhong Chen, Gen Li, Yuanfan Zhang, Lei Sun, Dafeng Zhang, Neo Yang, Fitz Liu, Jerry Zhao, Mustafa Ayazoglu, Bahri Batuhan Bilecen, Shota Hirose, Kasidis Arunruangsirilert, Luo Ao, Ho Chun Leung, Andrew Wei, Jie Liu, Qiang Liu, Dahai Yu, Ao Li, Lei Luo, Ce Zhu, Seongmin Hong, Dongwon Park, Joonhee Lee, Byeong Hyun Lee, Seunggyu Lee, Se Young Chun, Ruiyuan He, Xuhao Jiang, Haihang Ruan, Xinjian Zhang, Jing Liu, Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He

While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints.

Image Super-Resolution

A Simple and Robust Correlation Filtering Method for Text-based Person Search

1 code implementation • ECCV 2022 • Wei Suo, Mengyang Sun, Kai Niu, Yiqi Gao, Peng Wang, Yanning Zhang, Qi Wu

Text-based person search aims to associate pedestrian images with natural language descriptions.

Ranked #8 on Text based Person Retrieval on ICFG-PEDES

Denoising Person Search +3

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning Optical Character Recognition (OCR) +2

FoVolNet: Fast Volume Rendering using Foveated Deep Neural Networks

no code implementations • 20 Sep 2022 • David Bauer, Qi Wu, Kwan-Liu Ma

We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization.

Data Visualization Image Reconstruction +1

Learning Distinct and Representative Styles for Image Captioning

1 code implementation • 17 Sep 2022 • Qi Chen, Chaorui Deng, Qi Wu

Our innovative idea is to explore the rich modes in the training caption corpus to learn a set of "mode embeddings", and further use them to control the mode of the generated captions for existing image captioning models.

Image Captioning Word Embeddings

Moderately-Balanced Representation Learning for Treatment Effects with Orthogonality Information

no code implementations • 5 Sep 2022 • Yiyan Huang, Cheuk Hang Leung, Shumin Ma, Qi Wu, Dongdong Wang, Zhixiang Huang

In this paper, we propose a moderately-balanced representation learning (MBRL) framework based on recent covariates balanced representation learning methods and orthogonal machine learning theory.

Learning Theory Multi-Task Learning +2

Robust Causal Learning for the Estimation of Average Treatment Effects

no code implementations • 5 Sep 2022 • Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Shumin Ma, Zhiri Yuan, Dongdong Wang, Zhixiang Huang

Theoretically, the RCL estimators i) are as consistent and doubly robust as the DML estimators, and ii) can get rid of the error-compounding issue.

Decision Making
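
The snippet compares RCL with DML estimators on double robustness. To make that property concrete, the sketch below implements the standard augmented inverse-propensity-weighted (AIPW) estimator of the average treatment effect, which is the doubly robust score that DML builds on. It is not the RCL estimator itself, and the function name and arguments are hypothetical.

```python
from typing import Callable, Sequence


def aipw_ate(
    x: Sequence[float],
    t: Sequence[int],
    y: Sequence[float],
    mu0: Callable[[float], float],
    mu1: Callable[[float], float],
    e: Callable[[float], float],
) -> float:
    """Doubly robust (AIPW) estimate of the average treatment effect.

    mu0 / mu1: outcome models E[Y | X, T=0] and E[Y | X, T=1].
    e: propensity model P(T=1 | X).
    The estimate is consistent if either the outcome models or e are correct.
    """
    total = 0.0
    for xi, ti, yi in zip(x, t, y):
        m0, m1, p = mu0(xi), mu1(xi), e(xi)
        # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
        total += (m1 - m0
                  + ti * (yi - m1) / p
                  - (1 - ti) * (yi - m0) / (1 - p))
    return total / len(y)
```

On toy data generated as y = x + 2t, the estimate recovers an effect of 2 when either the outcome models are correct (with a wrong propensity) or the propensity is correct (with wrong outcome models), which is what "doubly robust" means.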

ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers

no code implementations • 28 Aug 2022 • Yutong Xie, Jianpeng Zhang, Yong Xia, Anton Van Den Hengel, Qi Wu

Besides, we further extend the clustering-guided attention from single-scale to multi-scale, which is conducive to dense prediction tasks.

Clustering Language Modelling

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

2 code implementations • 24 Aug 2022 • Ziwei Luo, Youwei Li, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Shuaicheng Liu

The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI.

Image Super-Resolution Quantization
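
The snippet states that nearest convolution matches nearest upsampling. A minimal sketch of why this can hold, assuming the operation is a fixed-weight convolution that copies each input channel r² times, followed by depth-to-space (pixel shuffle): it is modeled here as a 1×1 convolution, and the paper's actual kernel size and channel ordering are assumptions. All function names are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def nearest_upsample(x: Tensor3, r: int) -> Tensor3:
    """Reference nearest-neighbour upsampling by an integer factor r."""
    return [[[ch[i // r][j // r] for j in range(len(ch[0]) * r)]
             for i in range(len(ch) * r)] for ch in x]


def conv1x1(x: Tensor3, weight: List[List[float]]) -> Tensor3:
    """1x1 convolution: out[k] = sum over c of weight[k][c] * x[c]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weight[k][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for k in range(len(weight))]


def depth_to_space(y: Tensor3, r: int) -> Tensor3:
    """Pixel shuffle: channel c*r*r + dy*r + dx goes to pixel (i*r + dy, j*r + dx) of channel c."""
    c_out, h, w = len(y) // (r * r), len(y[0]), len(y[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for dy in range(r):
            for dx in range(r):
                src = y[c * r * r + dy * r + dx]
                for i in range(h):
                    for j in range(w):
                        out[c][i * r + dy][j * r + dx] = src[i][j]
    return out


def nearest_conv(x: Tensor3, r: int) -> Tensor3:
    """Fixed 0/1 weights copy each input channel r*r times; depth-to-space then tiles the copies."""
    c_in = len(x)
    weight = [[1.0 if k // (r * r) == c else 0.0 for c in range(c_in)]
              for k in range(c_in * r * r)]
    return depth_to_space(conv1x1(x, weight), r)
```

This only checks output equivalence; the speed and NNAPI-suitability claims are the paper's and are not demonstrated by this pure-Python sketch.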

Interactive Volume Visualization via Multi-Resolution Hash Encoding based Neural Representation

1 code implementation • 23 Jul 2022 • Qi Wu, David Bauer, Michael J. Doyle, Kwan-Liu Ma

Neural networks have shown great potential in compressing volume data for visualization.

Optical Field Recovery in Jones Space

no code implementations • 22 Jun 2022 • Qi Wu, Yixiao Zhu, Hexun Jiang, Qunbi Zhuge, Weisheng Hu

For cost-sensitive short-reach optical networks, some advanced single-polarization (SP) optical field recovery schemes have recently been proposed to avoid the chromatic dispersion-induced power fading effect and to improve the spectral efficiency for larger potential capacity.

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

no code implementations • 7 May 2022 • Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang

Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires and supports much more detailed information in text generation.

Text Generation Video Captioning

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

1 code implementation • 18 Apr 2022 • Ziwei Luo, Youwei Li, Shen Cheng, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Jian Sun, Shuaicheng Liu

To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction.

Ranked #1 on Burst Image Super-Resolution on BurstSR

Burst Image Reconstruction Burst Image Super-Resolution +2

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu

Pre-training has been adopted in a few recent works for Vision-and-Language Navigation (VLN).

Ranked #4 on Visual Navigation on R2R

Decision Making Language Modelling +3

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

1 code implementation • ACL 2022 • Jing Gu, Eliana Stefani, Qi Wu, Jesse Thomason, Xin Eric Wang

A long-term goal of AI research is to build intelligent agents that can communicate with humans in natural language, perceive the environment, and perform real-world tasks.

Vision and Language Navigation

MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering

1 code implementation • CVPR 2022 • Yang Ding, Jing Yu, Bang Liu, Yue Hu, Mingxin Cui, Qi Wu

Knowledge-based visual question answering requires the ability to associate external knowledge for open-ended cross-modal scene understanding.

Implicit Relations Question Answering +2

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould

To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.

Imitation Learning Vision and Language Navigation

OpenKBP-Opt: An international and reproducible evaluation of 76 knowledge-based planning pipelines

1 code implementation • 16 Feb 2022 • Aaron Babier, Rafid Mahmood, Binghao Zhang, Victor G. L. Alves, Ana Maria Barragán-Montero, Joel Beaudry, Carlos E. Cardenas, Yankui Chang, Zijie Chen, Jaehee Chun, Kelly Diaz, Harold David Eraso, Erik Faustmann, Sibaji Gaj, Skylar Gay, Mary Gronberg, Bingqi Guo, Junjun He, Gerd Heilemann, Sanchit Hira, Yuliang Huang, Fuxin Ji, Dashan Jiang, Jean Carlo Jimenez Giraldo, Hoyeon Lee, Jun Lian, Shuolin Liu, Keng-Chi Liu, José Marrugo, Kentaro Miki, Kunio Nakamura, Tucker Netherton, Dan Nguyen, Hamidreza Nourzadeh, Alexander F. I. Osman, Zhao Peng, José Darío Quinto Muñoz, Christian Ramsl, Dong Joo Rhee, Juan David Rodriguez, Hongming Shan, Jeffrey V. Siebers, Mumtaz H. Soomro, Kay Sun, Andrés Usuga Hoyos, Carlos Valderrama, Rob Verbeek, Enpei Wang, Siri Willems, Qi Wu, Xuanang Xu, Sen yang, Lulin Yuan, Simeng Zhu, Lukas Zimmermann, Kevin L. Moore, Thomas G. Purdie, Andrea L. McNiven, Timothy C. Y. Chan

The dose predictions were input to four optimization models to form 76 unique KBP pipelines that generated 7600 plans.

Maintaining Reasoning Consistency in Compositional Visual Question Answering

1 code implementation • CVPR 2022 • Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu

Existing VQA models can answer a compositional question well, but they do not reason consistently when answering the compositional question and its sub-questions.

Question Answering Visual Question Answering

LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach

no code implementations • 19 Dec 2021 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hiroya Takamura, Qi Wu

We propose LocFormer, a Transformer-based model for video grounding which operates at a constant memory footprint regardless of the video length, i.e., the number of frames.

Inductive Bias Video Grounding
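
The snippet says memory stays constant regardless of the number of frames. One simple way to get that property, assumed here for illustration and not necessarily LocFormer's actual sampling scheme, is to keep a fixed number of frame features taken at the centres of equal-length segments. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_fixed_length(features: Sequence[T], num_samples: int) -> List[T]:
    """Pick num_samples frame features at the centres of equal-length segments.

    The output length, and hence downstream attention memory, does not depend on len(features).
    """
    n = len(features)
    if n == 0 or num_samples <= 0:
        raise ValueError("need at least one frame and a positive sample count")
    # Centre of segment k is (k + 0.5) * n / num_samples; integer arithmetic avoids float error.
    idx = [min(n - 1, (2 * k + 1) * n // (2 * num_samples)) for k in range(num_samples)]
    return [features[i] for i in idx]
```

For a 100-frame video and 4 samples this returns frames 12, 37, 62 and 87; for videos shorter than `num_samples`, frames are repeated so the output length still stays fixed.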

UniMiSS: Universal Medical Self-Supervised Learning via Breaking Dimensionality Barrier

1 code implementation • 17 Dec 2021 • Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu

In this paper, we advocate bringing a wealth of 2D images like chest X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS.

Image Classification Medical Image Classification +2

Debiased Visual Question Answering from Feature and Sample Perspectives

1 code implementation • NeurIPS 2021 • Zhiquan Wen, Guanghui Xu, Mingkui Tan, Qingyao Wu, Qi Wu

From the sample perspective, we construct two types of negative samples to assist the training of the models, without introducing additional annotations.

Bias Detection Question Answering +1

Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

1 code implementation • NeurIPS 2021 • Keji He, Yan Huang, Qi Wu, Jianhua Yang, Dong An, Shuanglin Sima, Liang Wang

In the Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments following given instructions.

Navigate Vision and Language Navigation

V2C: Visual Voice Cloning

no code implementations • CVPR 2022 • Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu

Existing Voice Cloning (VC) tasks aim to convert a paragraph of text into speech with a desired voice specified by a reference audio.

Voice Cloning

Medical Visual Question Answering: A Survey

no code implementations • 19 Nov 2021 • Zhihong Lin, Donghao Zhang, Qingyi Tao, Danli Shi, Gholamreza Haffari, Qi Wu, Mingguang He, ZongYuan Ge

Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.

Medical Visual Question Answering Question Answering +1

Risk and return prediction for pricing portfolios of non-performing consumer credit

no code implementations • 28 Oct 2021 • Siyi Wang, Xing Yan, Bangqi Zheng, Hu Wang, Wangli Xu, Nanbo Peng, Qi Wu

We design a system for risk-analyzing and pricing portfolios of non-performing consumer credit loans.

Memory Regulation and Alignment toward Generalizer RGB-Infrared Person

1 code implementation • 18 Sep 2021 • Feng Chen, Fei Wu, Qi Wu, Zhiguo Wan

The domain shift, arising from the non-negligible modality gap and non-overlapping identity classes between training and test sets, is a major issue in RGB-Infrared person re-identification.

Attribute Metric Learning +1

Data-driven advice for interpreting local and global model predictions in bioinformatics problems

no code implementations • 13 Aug 2021 • Markus Loecher, Qi Wu

For random forests, we find extremely high similarities and correlations of both local and global SHAP values and CFC scores, leading to very similar rankings and interpretations.

Feature Importance

Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

1 code implementation • 5 Aug 2021 • Qi Wu, Cheng-Ju Wu, Yixin Zhu, Jungseock Joo

In a series of experiments, we demonstrate that human gesture cues, even without predefined semantics, improve the object-goal navigation for an embodied agent, outperforming various state-of-the-art methods.

Data Hiding with Deep Learning: A Survey Unifying Digital Watermarking and Steganography

no code implementations • 20 Jul 2021 • Zihan Wang, Olivia Byrnes, Hu Wang, Ruoxi Sun, Congbo Ma, Huaming Chen, Qi Wu, Minhui Xue

Deep learning techniques for data hiding have significantly advanced the fields of secure communication and identity verification.

Neighbor-view Enhanced Model for Vision and Language Navigation

1 code implementation • 15 Jul 2021 • Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan

Specifically, our NvEM utilizes a subject module and a reference module to collect contexts from neighbor views.

Ranked #82 on Vision and Language Navigation on VLN Challenge

Navigate Vision and Language Navigation

Sketch, Ground, and Refine: Top-Down Dense Video Captioning

no code implementations • CVPR 2021 • Chaorui Deng, ShiZhe Chen, Da Chen, Yuan He, Qi Wu

The dense video captioning task aims to detect and describe a sequence of events in a video for detailed and coherent storytelling.

Dense Video Captioning Sentence

VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.

Decision Making Referring Expression +1

Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression

1 code implementation • CVPR 2021 • Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu

The Remote Embodied Referring Expression (REVERIE) is a recently proposed task that requires an agent to navigate to and localise a referred remote object according to a high-level language instruction.

Instruction Following Navigate +2

Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention

no code implementations • 5 May 2021 • Wei Suo, Mengyang Sun, Peng Wang, Qi Wu

Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering.

Question Answering Referring Expression +3

Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads

no code implementations • 30 Apr 2021 • Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu

Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of the VisualBERT at an instance level when dealing with different questions.

Question Answering Visual Question Answering +1

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

1 code implementation • CVPR 2021 • Guanghui Xu, Shuaicheng Niu, Mingkui Tan, Yucheng Luo, Qing Du, Qi Wu

This task, however, is very challenging because an image often contains complex texts and visual information that are hard to describe comprehensively.

Caption Generation Image Captioning

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton Van Den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation Vision-Language Navigation

Diagnosing Vision-and-Language Navigation: What Really Matters

1 code implementation • NAACL 2022 • Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

Results show that indoor navigation agents refer to both object and direction tokens when making decisions.

Object Vision and Language Navigation

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

1 code implementation • CVPR 2021 • Yazhou Yao, Tao Chen, GuoSen Xie, Chuanyi Zhang, Fumin Shen, Qi Wu, Zhenmin Tang, Jian Zhang

To further mine the non-salient region objects, we propose to exploit the segmentation network's self-correction ability.

Object Segmentation +2

Jo-SRC: A Contrastive Approach for Combating Noisy Labels

no code implementations • CVPR 2021 • Yazhou Yao, Zeren Sun, Chuanyi Zhang, Fumin Shen, Qi Wu, Jian Zhang, Zhenmin Tang

Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels usually results in inferior model performance.

Contrastive Learning Memorization

Robust Orthogonal Machine Learning of Treatment Effects

no code implementations • 22 Mar 2021 • Yiyan Huang, Cheuk Hang Leung, Qi Wu, Xing Yan

Theoretically, the RCL estimators i) satisfy the (higher-order) orthogonal condition and are as consistent and doubly robust as the DML estimators, and ii) avoid the error-compounding issue.

BIG-bench Machine Learning

Learning for Visual Navigation by Imagining the Success

no code implementations • 28 Feb 2021 • Mahdi Kazemi Moghaddam, Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e.g., either a sub-goal state that is important to reach before the target, or the goal state itself.

Navigate Reinforcement Learning (RL) +1

Multi-intersection Traffic Optimisation: A Benchmark Dataset and a Strong Baseline

no code implementations • 24 Jan 2021 • Hu Wang, Hao Chen, Qi Wu, Congbo Ma, Yidong Li, Chunhua Shen

To address these issues, in this work we carefully design our settings and propose a new dataset including both synthetic and real traffic data in more complex scenarios.

How to Train Your Agent to Read and Write

1 code implementation • 4 Jan 2021 • Li Liu, Mengge He, Guanghui Xu, Mingkui Tan, Qi Wu

Typically, this requires an agent to fully understand the knowledge from the given text materials and generate correct and fluent novel paragraphs, which is very challenging in practice.

Ranked #3 on KG-to-Text Generation on AGENDA

KG-to-Text Generation Knowledge Graphs

Semantics for Robotic Mapping, Perception and Interaction: A Survey

no code implementations • 2 Jan 2021 • Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford

In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning.

Autonomous Driving Navigate

Memory-Gated Recurrent Networks

1 code implementation • 24 Dec 2020 • Yaquan Zhang, Qi Wu, Nanbo Peng, Min Dai, Jing Zhang, Hu Wang

The essence of multivariate sequential learning lies in how to extract dependencies in data.

Time Series Time Series Analysis

The Causal Learning of Retail Delinquency

no code implementations • 17 Dec 2020 • Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Nanbo Peng, Dongdong Wang, Zhixiang Huang

Classical estimators overlook the confounding effects, and hence the estimation error can be substantial.

MeisterMorxrc at SemEval-2020 Task 9: Fine-Tune Bert and Multitask Learning for Sentiment Analysis of Code-Mixed Tweets

no code implementations • SEMEVAL 2020 • Qi Wu, Peng Wang, Chenghao Huang

Natural language processing (NLP) has been applied to various fields including text classification and sentiment analysis.

Sentiment Analysis text-classification +1

Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps

1 code implementation • 9 Dec 2020 • Qi Zhu, Chenyu Gao, Peng Wang, Qi Wu

Texts appearing in daily scenes that can be recognized by OCR (Optical Character Recognition) tools contain significant information, such as street names, product brands, and prices.

Image Captioning Optical Character Recognition +3

Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

1 code implementation • 7 Dec 2020 • Zhaokai Wang, Renda Bao, Qi Wu, Si Liu

Our CNMT consists of reading, reasoning, and generation modules, in which the Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens.

Image Captioning Optical Character Recognition +1

Generative Learning of Heterogeneous Tail Dependence

no code implementations • 26 Nov 2020 • Xiangqian Sun, Xing Yan, Qi Wu

We propose a multivariate generative model to capture the complex dependence structure often encountered in business and financial data.

A Recurrent Vision-and-Language BERT for Navigation

1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

In this paper, we propose a recurrent BERT model that is time-aware for use in VLN.

Ranked #7 on Visual Navigation on R2R

Decision Making Multi-Task Learning +3

Modular Graph Attention Network for Complex Visual Relational Reasoning

no code implementations • 22 Nov 2020 • Yihan Zheng, Zhiquan Wen, Mingkui Tan, Runhao Zeng, Qi Chen, YaoWei Wang, Qi Wu

Moreover, to capture the complex logic in a query, we construct a relational graph to represent the visual objects and their relationships, and propose a multi-step reasoning method to progressively understand the complex logic.

Ranked #2 on Referring Expression Comprehension on CLEVR-Ref+

Graph Attention Question Answering +5

Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

no code implementations • 22 Nov 2020 • Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang

We then propose to recursively alternate the learning schemes of imitation and exploration to narrow the discrepancy between training and inference.

Imitation Learning Navigate +1

Language and Visual Entity Relationship Graph for Agent Navigation

1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould

From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.

Dynamic Time Warping Navigate +2

Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning

no code implementations • NeurIPS 2018 • Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, Qi Wu

We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns.

regression Time Series +1

MARS: Mixed Virtual and Real Wearable Sensors for Human Activity Recognition with Multi-Domain Deep Learning Model

no code implementations • 20 Sep 2020 • Ling Pei, Songpengcheng Xia, Lei Chu, Fanyi Xiao, Qi Wu, Wenxian Yu, Robert Qiu

Together with the rapid development of the Internet of Things (IoT), human activity recognition (HAR) using wearable Inertial Measurement Units (IMUs) has become a promising technology for many research areas.

Human Activity Recognition Transfer Learning

CogTree: Cognition Tree Loss for Unbiased Scene Graph Generation

1 code implementation • 16 Sep 2020 • Jing Yu, Yuan Chai, Yujing Wang, Yue Hu, Qi Wu

We first build a cognitive structure CogTree to organize the relationships based on the prediction of a biased SGG model.

Ranked #2 on Scene Graph Generation on Visual Genome (mean Recall @20 metric)

Graph Generation Unbiased Scene Graph Generation

Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze

1 code implementation • 15 Sep 2020 • Jinquan Li, Ling Pei, Danping Zou, Songpengcheng Xia, Qi Wu, Tao Li, Zhen Sun, Wenxian Yu

This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM.

Simultaneous Localization and Mapping

Data-driven Meta-set Based Fine-Grained Visual Classification

1 code implementation • 6 Aug 2020 • Chuanyi Zhang, Yazhou Yao, Xiangbo Shu, Zechao Li, Zhenmin Tang, Qi Wu

To this end, we propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.

Classification Fine-Grained Image Classification +3

Object-and-Action Aware Model for Visual Language Navigation

no code implementations • ECCV 2020 • Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton Van Den Hengel, Qi Wu

The first is object description (e.g., 'table', 'door'), each serving as a cue for the agent to determine the next action by finding the item visible in the environment; the second is action specification (e.g., 'go straight', 'turn left'), which allows the robot to directly predict the next movements without relying on visual perception.

Object Vision and Language Navigation

Soft Expert Reward Learning for Vision-and-Language Navigation

no code implementations • ECCV 2020 • Hu Wang, Qi Wu, Chunhua Shen

In this paper, we introduce a Soft Expert Reward Learning (SERL) model to overcome the reward-engineering and generalisation problems of the VLN task.

Reinforcement Learning (RL) Vision and Language Navigation

Referring Expression Comprehension: A Survey of Methods and Datasets

no code implementations • 19 Jul 2020 • Yanyuan Qiao, Chaorui Deng, Qi Wu

In this survey, we first examine the state of the art by comparing modern approaches to the problem.

object-detection Object Detection +2

Length-Controllable Image Captioning

1 code implementation • ECCV 2020 • Chaorui Deng, Ning Ding, Mingkui Tan, Qi Wu

We verify the merit of the proposed length level embedding on three models: two state-of-the-art (SOTA) autoregressive models with different types of decoder, as well as our proposed non-autoregressive model, to show its generalization ability.

controllable image captioning

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

1 code implementation • ECCV 2020 • Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang

However, few works study the data augmentation problem for VQA, and none of the existing image-based augmentation schemes (such as rotation and flipping) can be directly applied to VQA due to its semantic structure: an ⟨image, question, answer⟩ triplet needs to be maintained correctly.

Adversarial Attack Data Augmentation +2

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

4 code implementations • 7 Jul 2020 • Xiaoze Jiang, Jing Yu, Yajing Sun, Zengchang Qin, Zihao Zhu, Yue Hu, Qi Wu

The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation.

Foreground-Background Imbalance Problem in Deep Object Detectors: A Review

no code implementations • 16 Jun 2020 • Joya Chen, Qi Wu, Dong Liu, Tong Xu

Recent years have witnessed the remarkable developments made by deep learning techniques for object detection, a fundamentally challenging problem of computer vision.

Object object-detection +1

Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering

no code implementations • 16 Jun 2020 • Zihao Zhu, Jing Yu, Yujing Wang, Yajing Sun, Yue Hu, Qi Wu

In this paper, we depict an image by a multi-modal heterogeneous graph, which contains multiple layers of information corresponding to the visual, semantic and factual features.

Question Answering Visual Question Answering

Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge

1 code implementation • 2 Jun 2020 • Peng Wang, Dongyang Liu, Hui Li, Qi Wu

In this case, we need to use commonsense knowledge to identify the objects in the image.

16k Referring Expression +1

Structured Multimodal Attentions for TextVQA

2 code implementations • 1 Jun 2020 • Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton Van Den Hengel, Qi Wu

In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.

Graph Attention Optical Character Recognition (OCR) +3

Optimistic Agent: Accurate Graph-Based Value Estimation for More Successful Visual Navigation

no code implementations • 7 Apr 2020 • Mahdi Kazemi Moghaddam, Qi Wu, Ehsan Abbasnejad, Javen Qinfeng Shi

Through empirical studies, we show that our agent, dubbed the optimistic agent, has a more realistic estimate of the state value during a navigation episode, which leads to a higher success rate.

Reinforcement Learning (RL) Visual Navigation

Sub-Instruction Aware Vision-and-Language Navigation

1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.

Navigate Vision and Language Navigation

Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

4 code implementations • CVPR 2020 • Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu

To improve fine-grained video-text retrieval, we propose a Hierarchical Graph Reasoning (HGR) model, which decomposes video-text matching into global-to-local levels.

Cross-Modal Retrieval Retrieval +3

Intelligent Home 3D: Automatic 3D-House Design from Linguistic Descriptions Only

1 code implementation • CVPR 2020 • Qi Chen, Qi Wu, Rui Tang, Yu-Han Wang, Shuai Wang, Mingkui Tan

To this end, we propose a House Plan Generative Model (HPGM) that first translates the language input to a structural graph representation and then predicts the layout of rooms with a Graph Conditioned Layout Prediction Network (GC LPN) and generates the interior texture with a Language Conditioned Texture GAN (LCT-GAN).

Text to 3D

Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

1 code implementation • CVPR 2020 • Shizhe Chen, Qin Jin, Peng Wang, Qi Wu

From the ASG, we propose a novel ASG2Caption model, which is able to recognise user intentions and semantics in the graph, and therefore generate desired captions according to the graph structure.

Attribute Caption Generation +1

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

no code implementations • CVPR 2020 • Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, Qi Wu

To bridge the gap, we propose a new dataset for visual reasoning in context of referring expression comprehension with two main features.

Referring Expression Referring Expression Comprehension +1

DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue

1 code implementation • 17 Nov 2019 • Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu

More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values.

Ranked #6 on Visual Dialog on VisDial v0.9 val

feature selection Question Answering +2

OpenArray v1.0: a simple operator library for the decoupling of ocean modeling and parallel computing

1 code implementation • Geoscientific Model Development 2019 • Xiaomeng Huang, Xing Huang, Dong Wang, Qi Wu, Yi Li, Shixun Zhang, YuWen Chen, Mingqing Wang, Yuan Gao, Qiang Tang, Yue Chen, Zheng Fang, Zhenya Song, Guangwen Yang

In this work, we design a simple computing library to bridge the gap and decouple the work of ocean modeling from parallel computing.

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

no code implementations • 15 Oct 2019 • Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu

This notebook paper presents our model in the VATEX video captioning challenge.

Video Captioning

BERT-based Financial Sentiment Index and LSTM-based Stock Return Predictability

no code implementations • 21 Jun 2019 • Joshua Zoen Git Hiew, Xin Huang, Hao Mou, Duan Li, Qi Wu, Yabo Xu

On the other hand, by combining it with the two other commonly used methods for building a sentiment index in the financial literature, i.e., the option-implied and the market-implied approaches, we propose a more general and comprehensive framework for financial sentiment analysis, and further provide convincing evidence for the predictability of individual stock returns when combined with an LSTM (which captures a nonlinear mapping).

Sentiment Analysis

Neural Learning of Online Consumer Credit Risk

no code implementations • 5 Jun 2019 • Di Wang, Qi Wu, Wen Zhang

This paper takes a deep learning approach to understanding consumer credit risk when e-commerce platforms issue unsecured credit to finance customers' purchases.

Time Series Time Series Analysis

Understanding Distributional Ambiguity via Non-robust Chance Constraint

no code implementations • 3 Jun 2019 • Qi Wu, Shumin Ma, Cheuk Hang Leung, Wei Liu, Nanbo Peng

Without the boundedness constraint, the CCO problem is shown to perform uniformly better than the DRO problem, irrespective of the radius of the ambiguity set, the choice of the divergence measure, or the tail heaviness of the center distribution.

Portfolio Optimization

Show, Price and Negotiate: A Negotiator with Online Value Look-Ahead

no code implementations • 7 May 2019 • Amin Parvaneh, Ehsan Abbasnejad, Qi Wu, Javen Qinfeng Shi, Anton Van Den Hengel

Negotiation, as an essential and complicated aspect of online shopping, is still challenging for an intelligent agent.

REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments

1 code implementation • CVPR 2020 • Yuankai Qi, Qi Wu, Peter Anderson, Xin Wang, William Yang Wang, Chunhua Shen, Anton Van Den Hengel

One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language.

Referring Expression Vision and Language Navigation

You Only Look & Listen Once: Towards Fast and Accurate Visual Grounding

no code implementations • 12 Feb 2019 • Chaorui Deng, Qi Wu, Guanghui Xu, Zhuliang Yu, Yanwu Xu, Kui Jia, Mingkui Tan

Most state-of-the-art methods in VG operate in a two-stage manner: in the first stage, an object detector is adopted to generate a set of object proposals from the input image, and the second stage is simply formulated as a cross-modal matching problem that finds the best match between the language query and all region proposals.

object-detection Object Detection +2

Gold Seeker: Information Gain from Policy Distributions for Goal-oriented Vision-and-Language Reasoning

no code implementations • CVPR 2020 • Ehsan Abbasnejad, Iman Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

For each potential action a distribution of the expected outcomes is calculated, and the value of the potential information gain assessed.

Visual Dialog

What's to know? Uncertainty as a Guide to Asking Goal-oriented Questions

no code implementations • CVPR 2019 • Ehsan Abbasnejad, Qi Wu, Javen Shi, Anton Van Den Hengel

We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output.

Visual Dialog

Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks

no code implementations • CVPR 2019 • Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton Van Den Hengel

Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches.

Graph Attention Object +2

Deep Template Matching for Offline Handwritten Chinese Character Recognition

no code implementations • 15 Nov 2018 • Zhiyuan Li, Min Jin, Qi Wu, Huaxiang Lu

As in many other computer vision tasks where they have achieved remarkable results, convolutional neural networks (CNNs) provide an end-to-end solution to handwritten Chinese character recognition (HCCR) with great success.

Binary Classification Offline Handwritten Chinese Character Recognition +1

Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations • ECCV 2018 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be an inscrutable challenge.

Informativeness Question Generation +2

Connecting Language and Vision to Actions

no code implementations • ACL 2018 • Peter Anderson, Abhishek Das, Qi Wu

A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment.

Image Captioning Language Modelling +3

Topological Data Analysis Made Easy with the Topology ToolKit

no code implementations • 21 Jun 2018 • Guillaume Favelier, Charles Gueunet, Attila Gyulassy, Julien Kitware, Joshua Levine, Jonas Lukasczyk, Daisuke Sakurai, Maxime Soler, Julien Tierny, Will Usher, Qi Wu

This tutorial presents topological methods for the analysis and visualization of scientific data from a user's perspective, with the Topology ToolKit (TTK), a recently released open-source library for topological data analysis.

Topological Data Analysis

Visual Grounding via Accumulated Attention

no code implementations • CVPR 2018 • Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, Mingkui Tan

There are three main challenges in VG: 1) what is the main focus in a query; 2) how to understand an image; 3) how to locate an object.

Sentence Visual Grounding

Learning Semantic Concepts and Order for Image and Sentence Matching

no code implementations • CVPR 2018 • Yan Huang, Qi Wu, Liang Wang

This mainly arises because the pixel-level representation of an image usually lacks the high-level semantic information present in its matched sentence.

Ranked #11 on Image Retrieval on Flickr30K 1K test

Cross-Modal Retrieval Sentence

Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

no code implementations • CVPR 2018 • Qi Wu, Peng Wang, Chunhua Shen, Ian Reid, Anton Van Den Hengel

The Visual Dialogue task requires an agent to engage in a conversation about an image with a human.

Ranked #4 on Visual Dialog on VisDial v0.9 val

Question Answering Visual Dialog +1

Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards

no code implementations • 21 Nov 2017 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu, Anton Van Den Hengel

Informativeness Question Generation +2

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

8 code implementations • CVPR 2018 • Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Ranked #10 on Visual Navigation on R2R

Translation Vision and Language Navigation +2

Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement

no code implementations • 19 Nov 2017 • Jun-Jie Zhang, Qi Wu, Jian Zhang, Chunhua Shen, Jianfeng Lu

These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags.

Retrieval TAG

Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To this end, we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural expression descriptions, from short phrase queries to long multi-round dialogs.

Object Object Discovery +2

Visual Question Answering with Memory-Augmented Networks

no code implementations • CVPR 2018 • Chao Ma, Chunhua Shen, Anthony Dick, Qi Wu, Peng Wang, Anton Van Den Hengel, Ian Reid

In this paper, we exploit a memory-augmented neural network to predict accurate answers to visual questions, even when those answers occur rarely in the training set.

Question Answering Visual Question Answering

Classification of Medical Images and Illustrations in the Biomedical Literature Using Synergic Deep Learning

no code implementations • 28 Jun 2017 • Jianpeng Zhang, Yong Xia, Qi Wu, Yutong Xie

The classification of medical images and illustrations in the literature aims to label a medical image according to the modality with which it was produced, or to label an illustration according to its production attributes.

General Classification Image Classification

Care about you: towards large-scale human-centric visual relationship detection

no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel

To address this problem, we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than previously released datasets.

Human-Object Interaction Detection Relationship Detection

The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions

no code implementations • CVPR 2017 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel

Training a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging; aiming to achieve them all with a limited set of such training data seems ambitious at best.

BIG-bench Machine Learning Question Answering

Multi-Label Image Classification with Regional Latent Semantic Dependencies

no code implementations • 4 Dec 2016 • Jun-Jie Zhang, Qi Wu, Chunhua Shen, Jian Zhang, Jianfeng Lu

Recent state-of-the-art approaches to multi-label image classification exploit label dependencies in an image at the global level, substantially improving labeling capacity.

Classification General Classification

Visual Question Answering: A Survey of Methods and Datasets

1 code implementation • 20 Jul 2016 • Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities.

General Knowledge Visual Question Answering

FVQA: Fact-based Visual Question Answering

no code implementations • 17 Jun 2016 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We evaluate several baseline models on the FVQA dataset, and describe a novel model which is capable of reasoning about an image on the basis of supporting facts.

Ranked #2 on Visual Question Answering (VQA) on F-VQA

Common Sense Reasoning Question Answering

Image Captioning and Visual Question Answering Based on Attributes and External Knowledge

no code implementations • 9 Mar 2016 • Qi Wu, Chunhua Shen, Anton Van Den Hengel, Peng Wang, Anthony Dick

Much recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Ranked #9 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

General Knowledge Image Captioning

Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources

no code implementations • CVPR 2016 • Qi Wu, Peng Wang, Chunhua Shen, Anthony Dick, Anton Van Den Hengel

Priming a recurrent neural network with this combined information, and the submitted question, leads to a very flexible visual question answering approach.

General Knowledge Question Answering

Explicit Knowledge-based Reasoning for Visual Question Answering

no code implementations • 9 Nov 2015 • Peng Wang, Qi Wu, Chunhua Shen, Anton Van Den Hengel, Anthony Dick

We describe a method for visual question answering that is capable of reasoning about the contents of an image on the basis of information extracted from a large-scale knowledge base.

Question Answering Visual Question Answering

What value do explicit high level concepts have in vision to language problems?

1 code implementation • CVPR 2016 • Qi Wu, Chunhua Shen, Lingqiao Liu, Anthony Dick, Anton Van Den Hengel

Much of the recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).

Image Captioning Question Answering

The Cross-Depiction Problem: Computer Vision Algorithms for Recognising Objects in Artwork and in Photographs

1 code implementation • 1 May 2015 • Hongping Cai, Qi Wu, Tadeo Corradi, Peter Hall

The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc.

Domain Adaptation
