Search Results for author: Hanwang Zhang

Found 63 papers, 41 papers with code

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

1 code implementation 3 Oct 2021 Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao

Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST).

Question Answering Visual Question Answering

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Image Captioning Question Answering +1

Causal Attention for Unbiased Visual Recognition

1 code implementation ICCV 2021 Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang

An attention module does not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature is invariant to different backgrounds.

Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion

1 code implementation ACL 2021 Yixin Cao, Xiang Ji, Xin Lv, Juanzi Li, Yonggang Wen, Hanwang Zhang

We present InferWiki, a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns.

Knowledge Graph Completion

Transporting Causal Mechanisms for Unsupervised Domain Adaptation

1 code implementation ICCV 2021 Zhongqi Yue, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang

However, the theoretical solution provided by transportability is far from practical for UDA, because it requires the stratification and representation of the unobserved confounder that is the cause of the domain gap.

Unsupervised Domain Adaptation

Adversarial Visual Robustness by Causal Intervention

2 code implementations 17 Jun 2021 Kaihua Tang, Mingyuan Tao, Hanwang Zhang

As these visual confounders are imperceptible in general, we propose to use the instrumental variable that achieves causal intervention without the need for confounder observation.

Empowering Language Understanding with Counterfactual Reasoning

1 code implementation 6 Jun 2021 Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

Present language understanding methods have demonstrated an extraordinary ability to recognize patterns in texts via machine learning.

Natural Language Inference Sentiment Analysis

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching

no code implementations 12 May 2021 Wenbo Ma, Long Chen, Hanwang Zhang, Jian Shao, Yueting Zhuang, Jun Xiao

In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).

Text Matching

TransferNet: An Effective and Transparent Framework for Multi-hop Question Answering over Relation Graph

1 code implementation 15 Apr 2021 Jiaxin Shi, Shulin Cao, Lei Hou, Juanzi Li, Hanwang Zhang

Multi-hop Question Answering (QA) is a challenging task because it requires precise reasoning with entity relations at every step towards the answer.

Multi-hop Question Answering Question Answering

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

Distilling Causal Effect of Data in Class-Incremental Learning

1 code implementation CVPR 2021 Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation.

class-incremental learning Incremental Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

1 code implementation CVPR 2021 Zhongqi Yue, Tan Wang, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua

We show that the key reason is that the generation is not Counterfactual Faithful, and thus we propose a faithful one, whose generation is from the sample-specific counterfactual question: What would the sample look like, if we set its class attribute to a certain class, while keeping its sample attribute unchanged?

Open Set Learning Zero-Shot Learning

Interventional Few-Shot Learning

1 code implementation NeurIPS 2020 Zhongqi Yue, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua

Specifically, we develop three effective IFSL algorithmic implementations based on the backdoor adjustment, which is essentially a causal intervention towards the SCM of many-shot learning: the upper-bound of FSL in a causal view.

Few-Shot Learning

Clicks can be Cheating: Counterfactual Recommendation for Mitigating Clickbait Issue

no code implementations 21 Sep 2020 Wenjie Wang, Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

However, we argue that there is a significant gap between clicks and user satisfaction -- it is common for a user to be "cheated" into clicking an item by its attractive title or cover.

Click-Through Rate Prediction Counterfactual Inference

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

1 code implementation 3 Sep 2020 Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang

The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.

Feature Pyramid Transformer

1 code implementation ECCV 2020 Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun

Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.

Instance Segmentation Object Detection +1

KQA Pro: A Large-Scale Dataset with Interpretable Programs and Accurate SPARQLs for Complex Question Answering over Knowledge Base

no code implementations 8 Jul 2020 Jiaxin Shi, Shulin Cao, Liangming Pan, Yutong Xiang, Lei Hou, Juanzi Li, Hanwang Zhang, Bin He

Existing benchmarks have some shortcomings that limit the development of Complex KBQA: 1) they only provide QA pairs without explicit reasoning processes; 2) questions are either generated by templates, leading to poor diversity, or on a small scale.

Question Answering Semantic Parsing

Counterfactual VQA: A Cause-Effect Look at Language Bias

1 code implementation CVPR 2021 Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.

Counterfactual Inference Question Answering +1

Iterative Context-Aware Graph Inference for Visual Dialog

1 code implementation CVPR 2020 Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.

Graph Attention Graph Embedding +1

Learning to Segment the Tail

1 code implementation CVPR 2020 Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang

Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data.

Few-Shot Learning Incremental Learning

More Grounded Image Captioning by Distilling Image-Text Matching Model

1 code implementation CVPR 2020 Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang

To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision.

Image Captioning Knowledge Distillation +2

Counterfactual Samples Synthesizing for Robust Visual Question Answering

2 code implementations CVPR 2020 Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, ShiLiang Pu, Yueting Zhuang

To reduce language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on VQA-CP.

Ranked #1 on Visual Question Answering on VQA-CP (using extra training data)

Question Answering Visual Question Answering

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

The dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Causal Inference Image Captioning

Cross-GCN: Enhancing Graph Convolutional Network with $k$-Order Feature Interactions

no code implementations 5 Mar 2020 Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

Graph Convolutional Network (GCN) is an emerging technique that performs learning and reasoning on graph data.

Document Classification

Visual Commonsense R-CNN

2 code implementations CVPR 2020 Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun

We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA.

Image Captioning Representation Learning +1

Unbiased Scene Graph Generation from Biased Training

4 code implementations CVPR 2020 Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang

Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach".

Causal Inference Graph Generation +1

General Partial Label Learning via Dual Bipartite Graph Autoencoder

no code implementations 5 Jan 2020 Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang

Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.

Partial Label Learning

Two Causal Principles for Improving Visual Dialog

1 code implementation CVPR 2020 Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang

This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial).

Visual Dialog

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

no code implementations 8 Jul 2019 Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., each of them influences the estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.

Multiple Instance Learning

Joint Visual Grounding with Language Scene Graphs

no code implementations 9 Jun 2019 Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun

In this paper, we alleviate the missing-annotation problem and enable the joint reasoning by leveraging the language scene graph which covers both labeled referent and unlabeled contexts (other objects, attributes, and relationships).

Visual Grounding

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

1 code implementation 6 Jun 2019 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.

Image Captioning Image Paragraph Captioning +1

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

no code implementations 5 Jun 2019 Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang

Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space.

Visual Grounding Visual Reasoning

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design --- one for function words and three for visual content words (e.g., noun, adjective, and verb), 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation, 3) a linguistic loss that keeps the module controller faithful to part-of-speech collocations (e.g., adjective is before noun).

Image Captioning Visual Question Answering +1

Making History Matter: History-Advantage Sequence Training for Visual Dialog

no code implementations ICCV 2019 Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang

We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history.

Visual Dialog Visual Reasoning

Learning to Assemble Neural Module Tree Networks for Visual Grounding

no code implementations ICCV 2019 Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha

In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction as needed.

Dependency Parsing Natural Language Visual Grounding +3

Recursive Visual Attention in Visual Dialog

1 code implementation CVPR 2019 Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Question Answering Visual Dialog +1

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

no code implementations ICCV 2019 Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang

CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.

Graph Generation Scene Graph Generation +1

Auto-Encoding Scene Graphs for Image Captioning

1 code implementation CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Image Captioning

Explainable and Explicit Visual Reasoning over Scene Graphs

2 code implementations CVPR 2019 Jiaxin Shi, Hanwang Zhang, Juanzi Li

We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs --- objects as nodes and the pairwise relationships as edges --- for explainable and explicit reasoning with structured knowledge.

Visual Question Answering Visual Reasoning

Learning to Compose Dynamic Tree Structures for Visual Contexts

5 code implementations CVPR 2019 Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.

Graph Generation Scene Graph Generation +2

Learning to Embed Sentences Using Attentive Recursive Trees

2 code implementations 6 Nov 2018 Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

Sentence embedding is an effective feature representation for most deep learning-based NLP tasks.

Sentence Embedding

Stochastic Dynamics for Video Infilling

no code implementations 1 Sep 2018 Qiangeng Xu, Hanwang Zhang, Weiyue Wang, Peter N. Belhumeur, Ulrich Neumann

In this paper, we introduce a stochastic dynamics video infilling (SDVI) framework to generate frames between long intervals in a video.

Context-Aware Visual Policy Network for Sequence-Level Image Captioning

1 code implementation 16 Aug 2018 Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu

To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.

Image Captioning

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

1 code implementation ECCV 2018 Xu Yang, Hanwang Zhang, Jianfei Cai

By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.

Discrete Factorization Machines for Fast Feature-based Recommendation

1 code implementation 6 May 2018 Han Liu, Xiangnan He, Fuli Feng, Liqiang Nie, Rui Liu, Hanwang Zhang

In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation.

Binarization Quantization

Learning to Guide Decoding for Image Captioning

no code implementations 3 Apr 2018 Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task.

Image Captioning

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

no code implementations 7 Feb 2018 Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.

Binarization Video Retrieval

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

1 code implementation CVPR 2018 Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang

We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.

General Classification Zero-Shot Learning

Grounding Referring Expressions in Images by Variational Context

1 code implementation CVPR 2018 Hanwang Zhang, Yulei Niu, Shih-Fu Chang

This is a general yet challenging vision-language task since it requires not only the localization of objects, but also the multimodal comprehension of context --- visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.

Multiple Instance Learning

Neural Collaborative Filtering

35 code implementations WWW 2017 Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua

When it comes to modeling the key factor in collaborative filtering -- the interaction between user and item features -- they still resorted to matrix factorization and applied an inner product on the latent features of users and items.

Collaborative Filtering Recommendation Systems +1

Fast Matrix Factorization for Online Recommendation with Implicit Feedback

3 code implementations 16 Aug 2017 Xiangnan He, Hanwang Zhang, Min-Yen Kan, Tat-Seng Chua

To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing a MF model with variably-weighted missing data.

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

3 code implementations 15 Aug 2017 Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions.

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

no code implementations ICCV 2017 Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang

We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation groundtruths available only at the image level.

Weakly Supervised Object Detection

Attributed Social Network Embedding

1 code implementation 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles of friendship networks and textual content of citation networks.

Social and Information Networks

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

1 code implementation CVPR 2017 Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua

Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image.

Image Captioning

Online Collaborative Learning for Open-Vocabulary Visual Classifiers

no code implementations CVPR 2016 Hanwang Zhang, Xindi Shang, Wenzhuo Yang, Huan Xu, Huanbo Luan, Tat-Seng Chua

Leveraging the structure of the proposed collaborative learning formulation, we develop an efficient online algorithm that can jointly learn the label embeddings and visual classifiers.

Learning Image and User Features for Recommendation in Social Networks

no code implementations ICCV 2015 Xue Geng, Hanwang Zhang, Jingwen Bian, Tat-Seng Chua

It is often a great challenge for traditional recommender systems to learn representative features of both users and images in large social networks, in particular, social curation networks, which are characterized by extremely sparse links between users and images and extremely diverse visual content.

Recommendation Systems
