no code implementations • ACL 2022 • Moxin Li, Fuli Feng, Hanwang Zhang, Xiangnan He, Fengbin Zhu, Tat-Seng Chua
Neural discrete reasoning (NDR) has shown remarkable progress in combining deep models with discrete reasoning.
no code implementations • 25 Sep 2023 • Dongsheng Wang, Miaoge Li, Xinyang Liu, MingSheng Xu, Bo Chen, Hanwang Zhang
To address the limitation, we propose a multi-mode token-level tuning framework that leverages optimal transport to learn and align a set of prompt tokens across modalities.
1 code implementation • 22 Sep 2023 • Zhongqi Yue, Hanwang Zhang, Qianru Sun
Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment) that does not generalize to the target domain.
no code implementations • 17 Sep 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Many studies focus on improving pretraining or developing new backbones in text-video retrieval.
no code implementations • 26 Aug 2023 • Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua
In this work, we investigate strengthening the awareness of video dynamics in diffusion models (DMs) for high-quality text-to-video (T2V) generation.
no code implementations • 18 Aug 2023 • Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang
We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model.
1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang
To address this issue, we propose a generic and lightweight controllable knowledge re-injection module, which utilizes the sophisticated reasoning ability of LLMs to control the VPG to conditionally extract instruction-specific visual information and re-inject it into the LLM.
1 code implementation • 17 Jul 2023 • Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang
First, as the randomization is independent of the distribution of the limited known objects, the random proposals become the instrumental variable that prevents the training from being confounded by the known objects.
1 code implementation • 30 Jun 2023 • Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
Generative AI has made significant strides in computer vision, particularly in image/video synthesis conditioned on text descriptions.
1 code implementation • 12 Jun 2023 • Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang
To mitigate this, we propose a Fast Diffusion Model (FDM) which improves the diffusion process of DMs from a stochastic optimization perspective to speed up both training and sampling.
no code implementations • 7 Jun 2023 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Text-video retrieval contains various challenges, including biases coming from diverse sources.
4 code implementations • 23 May 2023 • Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang
In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and observe that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels.
1 code implementation • 25 Mar 2023 • Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.
1 code implementation • CVPR 2023 • Hui Lv, Zhongqi Yue, Qianru Sun, Bin Luo, Zhen Cui, Hanwang Zhang
At each MIL training iteration, we use the current detector to divide the samples into two groups with different context biases: the most confident abnormal/normal snippets and the rest ambiguous ones.
1 code implementation • CVPR 2023 • Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun
SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF).
1 code implementation • 1 Feb 2023 • Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun
Without bells and whistles, our RePro achieves a new state-of-the-art performance on two VidVRD benchmarks of not only the base training object and predicate categories, but also the unseen ones.
no code implementations • 29 Jan 2023 • Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang
We present a new paradigm for fine-tuning large-scale vision-language pre-trained models on downstream tasks, dubbed Prompt Regularization (ProReg).
no code implementations • 5 Jan 2023 • Zihua Wang, Xu Yang, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Weiwei Sun, Ming Yan, Songfang Huang, Fei Huang, Yu Zhang
We design a novel global-local Transformer named Ada-ClustFormer (ACF) to generate captions.
no code implementations • 5 Jan 2023 • Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang
To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.
1 code implementation • CVPR 2023 • Muli Yang, Liancheng Wang, Cheng Deng, Hanwang Zhang
Novel Class Discovery (NCD) aims to discover unknown classes without any annotation, by exploiting the transferable knowledge already learned from a base set of known classes.
no code implementations • 23 Nov 2022 • Haoxin Li, YuAn Liu, Hanwang Zhang, Boyang Li
In video action recognition, shortcut static features can interfere with the learning of motion features, resulting in poor out-of-distribution (OOD) generalization.
no code implementations • 20 Nov 2022 • Jianqiang Huang, Jian Wang, Qianru Sun, Hanwang Zhang
An intuitive solution is "coupling" the CAM with the long-range attention matrix of visual transformers (ViT). We find that the direct "coupling", e.g., pixel-wise multiplication of attention and activation, achieves a more global coverage (on the foreground), but unfortunately comes with a great increase of false positives, i.e., background pixels are mistakenly included.
Weakly-Supervised Semantic Segmentation
no code implementations • 23 Oct 2022 • Yulei Niu, Long Chen, Chang Zhou, Hanwang Zhang
The network response serves as additional supervision to formulate the machine domain, which uses the data collected from the human domain as a transfer set.
1 code implementation • 4 Oct 2022 • Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai
This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.
1 code implementation • 6 Aug 2022 • Jiaxin Qi, Kaihua Tang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang
If the context in every class is evenly distributed, OOD would be trivial because the context can be easily removed due to an underlying principle: class is invariant to context.
no code implementations • 27 Jul 2022 • Lin Li, Long Chen, Hanrong Shi, Hanwang Zhang, Yi Yang, Wei Liu, Jun Xiao
To this end, we propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST.
1 code implementation • 27 Jul 2022 • Xuanyu Yi, Kaihua Tang, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang
Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises are now turned into "hard" ones -- they are almost as outliers as the clean tail samples.
1 code implementation • 25 Jul 2022 • Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
We are interested in learning robust models from insufficient data, without the need for any externally pre-trained checkpoints.
1 code implementation • 19 Jul 2022 • Kaihua Tang, Mingyuan Tao, Jiaxin Qi, Zhenguang Liu, Hanwang Zhang
In fact, even if the class is balanced, samples within each class may still be long-tailed due to the varying attributes.
Ranked #1 on Long-tail Learning on ImageNet-GLT
1 code implementation • ICLR 2022 • Xinting Hu, Yulei Niu, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang
Our method is three-fold: 1) We propose Class-Aware Propensity (CAP) that exploits the unlabeled data to train an improved classifier using the biased labeled data.
1 code implementation • 29 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
In this report, we present our approach for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
Ranked #7 on Multi-Instance Retrieval on EPIC-KITCHENS-100
1 code implementation • 26 Jun 2022 • Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim
Most methods consider only one joint embedding space between global visual and textual features without considering the local structures of each modality.
Ranked #12 on Video Retrieval on YouCook2
1 code implementation • 30 May 2022 • Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang
Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt", e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]".
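The prompt-based zero-shot classifier described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the embeddings are made-up toy vectors standing in for the outputs of a VLM's image and text encoders, and `zero_shot_classify`, `cosine`, `softmax`, and the `temperature` value are hypothetical names and settings chosen for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.01):
    """Return a confidence per class.

    prompt_embeddings maps a class name to the (assumed) text embedding
    of the prompt sentence "a photo of a [CLASS]".
    """
    names = list(prompt_embeddings)
    sims = [cosine(image_embedding, prompt_embeddings[n]) for n in names]
    probs = softmax([s / temperature for s in sims])
    return dict(zip(names, probs))

# Toy vectors standing in for real encoder outputs (assumption).
image_embedding = [0.9, 0.1, 0.0]
prompt_embeddings = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

confidences = zero_shot_classify(image_embedding, prompt_embeddings)
predicted = max(confidences, key=confidences.get)
```

In a real pipeline the two embeddings would come from the pre-trained image and text encoders; only the similarity-then-softmax step is shown here.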
1 code implementation • 24 May 2022 • Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang
Deep learning models have achieved great success in many fields, yet they are vulnerable to adversarial examples.
1 code implementation • CVPR 2022 • Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun
Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field.
Weakly-Supervised Semantic Segmentation
1 code implementation • 31 Dec 2021 • Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, Hanwang Zhang
We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that the bias is the major visual reasoning bottleneck.
1 code implementation • 29 Dec 2021 • Beier Zhu, Yulei Niu, Xian-Sheng Hua, Hanwang Zhang
We address the overlooked unbiasedness in existing long-tailed classification methods: we find that their overall improvement is mostly attributed to the biased preference of tail over head, as the test distribution is assumed to be balanced; however, when the test is as imbalanced as the long-tailed training data -- let the test respect Zipf's law of nature -- the tail bias is no longer beneficial overall because it hurts the head majorities.
1 code implementation • NeurIPS 2021 • Xinshuai Dong, Luu Anh Tuan, Min Lin, Shuicheng Yan, Hanwang Zhang
The fine-tuning of pre-trained language models has achieved great success in many NLP fields.
1 code implementation • NeurIPS 2021 • Yulei Niu, Hanwang Zhang
Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.
1 code implementation • NeurIPS 2021 • Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang
A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the hidden modularized generative factors (semantics).
1 code implementation • 3 Oct 2021 • Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao
Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST).
no code implementations • ICCV 2021 • Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.
1 code implementation • ICCV 2021 • Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
The attention module does not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature is invariant to different backgrounds.
1 code implementation • ACL 2021 • Yixin Cao, Xiang Ji, Xin Lv, Juanzi Li, Yonggang Wen, Hanwang Zhang
We present InferWiki, a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns.
1 code implementation • ICCV 2021 • Zhongqi Yue, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang
However, the theoretical solution provided by transportability is far from practical for UDA, because it requires the stratification and representation of the unobserved confounder that is the cause of the domain gap.
2 code implementations • 17 Jun 2021 • Kaihua Tang, Mingyuan Tao, Hanwang Zhang
As these visual confounders are imperceptible in general, we propose to use the instrumental variable that achieves causal intervention without the need for confounder observation.
1 code implementation • Findings (ACL) 2021 • Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
Present language understanding methods have demonstrated an extraordinary ability to recognize patterns in texts via machine learning.
no code implementations • 12 May 2021 • Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen
In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).
1 code implementation • EMNLP 2021 • Jiaxin Shi, Shulin Cao, Lei Hou, Juanzi Li, Hanwang Zhang
Multi-hop Question Answering (QA) is a challenging task because it requires precise reasoning with entity relations at every step towards the answer.
1 code implementation • CVPR 2021 • YuAn Liu, Jingyuan Chen, Zhenfang Chen, Bing Deng, Jianqiang Huang, Hanwang Zhang
The key challenge is how to distinguish the action of interest segments from the background, which is unlabelled even on the video-level.
Weakly Supervised Temporal Action Localization
no code implementations • CVPR 2021 • Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai
Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.
1 code implementation • CVPR 2021 • Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang
We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation.
1 code implementation • CVPR 2021 • Zhongqi Yue, Tan Wang, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua
We show that the key reason is that the generation is not Counterfactual Faithful, and thus we propose a faithful one, whose generation is from the sample-specific counterfactual question: What would the sample look like, if we set its class attribute to a certain class, while keeping its sample attribute unchanged?
no code implementations • ACM International Conference on Multimedia 2020 • Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
We propose irredundant attention in SSG-RNN to improve the possibility of abstracting topics from rarely described sub-graphs and inheriting attention in WSG-RNN to generate more grounded sentences with the abstracted topics, both of which give rise to more distinctive paragraphs.
1 code implementation • NeurIPS 2020 • Zhongqi Yue, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua
Specifically, we develop three effective IFSL algorithmic implementations based on the backdoor adjustment, which is essentially a causal intervention towards the SCM of many-shot learning: the upper-bound of FSL in a causal view.
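For reference, the textbook form of the backdoor adjustment mentioned above is shown below. This is the generic formula from causal inference, not the paper's three specific implementations; the symbols $X$ (input), $Y$ (prediction), and $Z$ (a set of confounder strata satisfying the backdoor criterion) are notation assumed here for illustration.

```latex
P\bigl(Y \mid do(X = x)\bigr) \;=\; \sum_{z} P\bigl(Y \mid X = x,\, Z = z\bigr)\, P(Z = z)
```

The intervention replaces the observational likelihood $P(Y \mid X = x)$ with an average over confounder strata weighted by the prior $P(Z = z)$ rather than by $P(Z = z \mid X = x)$.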
2 code implementations • NeurIPS 2020 • Kaihua Tang, Jianqiang Huang, Hanwang Zhang
On one hand, it has a harmful causal effect that misleads the tail prediction biased towards the head.
Ranked #34 on Long-tail Learning on CIFAR-10-LT (ρ=10)
1 code implementation • NeurIPS 2020 • Dong Zhang, Hanwang Zhang, Jinhui Tang, Xian-Sheng Hua, Qianru Sun
We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS).
Ranked #26 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
1 code implementation • 21 Sep 2020 • Wenjie Wang, Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
However, we argue that there is a significant gap between clicks and user satisfaction -- it is common that a user is "cheated" into clicking an item by the attractive title/cover of the item.
1 code implementation • 3 Sep 2020 • Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang
The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.
1 code implementation • ECCV 2020 • Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun
Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.
2 code implementations • ACL 2022 • Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, Hanwang Zhang
To this end, we introduce KQA Pro, a dataset for Complex KBQA including ~120K diverse natural language questions.
1 code implementation • CVPR 2021 • Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.
1 code implementation • CVPR 2020 • Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.
Ranked #12 on Visual Dialog on VisDial v0.9 val
1 code implementation • CVPR 2020 • Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang
Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data.
1 code implementation • CVPR 2020 • Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang
To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision.
2 code implementations • CVPR 2020 • Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, ShiLiang Pu, Yueting Zhuang
To reduce the language biases, several recent works introduce an auxiliary question-only model to regularize the training of targeted VQA model, and achieve dominating performance on VQA-CP.
Ranked #1 on Visual Question Answering (VQA) on VQA-CP (using extra training data)
no code implementations • 9 Mar 2020 • Xu Yang, Hanwang Zhang, Jianfei Cai
The dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.
no code implementations • 5 Mar 2020 • Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
Graph Convolutional Network (GCN) is an emerging technique that performs learning and reasoning on graph data.
2 code implementations • CVPR 2020 • Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA.
Ranked #23 on Image Captioning on COCO Captions
6 code implementations • CVPR 2020 • Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang
Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach".
Ranked #1 on Scene Graph Generation on Visual Genome
no code implementations • 5 Jan 2020 • Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.
Ranked #1 on Partial Label Learning on MPII Movie Description
1 code implementation • CVPR 2020 • Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang
This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial).
no code implementations • 8 Jul 2019 • Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang
Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.
no code implementations • 9 Jun 2019 • Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun
In this paper, we alleviate the missing-annotation problem and enable the joint reasoning by leveraging the language scene graph which covers both labeled referent and unlabeled contexts (other objects, attributes, and relationships).
1 code implementation • 6 Jun 2019 • Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu
With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.
no code implementations • 5 Jun 2019 • Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang
Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space.
no code implementations • ICCV 2019 • Xu Yang, Hanwang Zhang, Jianfei Cai
To this end, we make the following technical contributions for CNM training: 1) compact module design (one for function words and three for visual content words, e.g., noun, adjective, and verb), 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation, 3) a linguistic loss for module controller being faithful to part-of-speech collocations (e.g., adjective is before noun).
no code implementations • ICCV 2019 • Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang
We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history.
Ranked #10 on Visual Dialog on VisDial v0.9 val
no code implementations • ICCV 2019 • Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha
In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction as needed.
2 code implementations • CVPR 2019 • Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai
We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.
1 code implementation • CVPR 2019 • Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen
Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.
Ranked #13 on Visual Dialog on VisDial v0.9 val
no code implementations • ICCV 2019 • Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang
CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.
6 code implementations • CVPR 2019 • Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu
We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.
Ranked #5 on Panoptic Scene Graph Generation on PSG Dataset
2 code implementations • CVPR 2019 • Jiaxin Shi, Hanwang Zhang, Juanzi Li
We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks, into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs (objects as nodes and the pairwise relationships as edges) for explainable and explicit reasoning with structured knowledge.
Ranked #10 on Visual Question Answering (VQA) on CLEVR
1 code implementation • 6 Nov 2018 • Jiaxin Shi, Chen Liang, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang
We propose DeepChannel, a robust, data-efficient, and interpretable neural model for extractive document summarization.
2 code implementations • 6 Nov 2018 • Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang
Sentence embedding is an effective feature representation for most deep learning-based NLP tasks.
no code implementations • NeurIPS 2018 • Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang
Deep neural networks suffer from over-fitting and catastrophic forgetting when trained with small data.
no code implementations • 1 Sep 2018 • Qiangeng Xu, Hanwang Zhang, Weiyue Wang, Peter N. Belhumeur, Ulrich Neumann
In this paper, we introduce a stochastic dynamics video infilling (SDVI) framework to generate frames between long intervals in a video.
1 code implementation • 16 Aug 2018 • Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu
To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.
1 code implementation • ECCV 2018 • Xu Yang, Hanwang Zhang, Jianfei Cai
By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.
1 code implementation • 6 May 2018 • Han Liu, Xiangnan He, Fuli Feng, Liqiang Nie, Rui Liu, Hanwang Zhang
In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation.
no code implementations • 3 Apr 2018 • Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu
Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task.
no code implementations • 7 Feb 2018 • Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.
1 code implementation • CVPR 2018 • Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang
We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.
1 code implementation • CVPR 2018 • Hanwang Zhang, Yulei Niu, Shih-Fu Chang
This is a general yet challenging vision-language task since it requires not only the localization of objects, but also the multimodal comprehension of context: visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.
42 code implementations • WWW 2017 • Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua
When it comes to modeling the key factor in collaborative filtering -- the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items.
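The matrix factorization baseline referred to above scores a user-item pair by the inner product of their latent vectors. A minimal sketch follows, with toy latent factors invented for illustration; `mf_score`, `rank_items`, and the factor values are hypothetical, and no training step is shown.

```python
def mf_score(user_vec, item_vec):
    """Matrix factorization interaction: inner product of latent features."""
    return sum(p * q for p, q in zip(user_vec, item_vec))

# Toy latent factors (assumption; normally learned from interaction data).
user_factors = {"u1": [0.5, 1.0], "u2": [1.0, 0.0]}
item_factors = {"i1": [2.0, 0.0], "i2": [0.0, 3.0]}

def rank_items(user_id):
    """Rank all items for a user by descending predicted score."""
    user_vec = user_factors[user_id]
    scores = {item: mf_score(user_vec, vec) for item, vec in item_factors.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranking_u1 = rank_items("u1")
ranking_u2 = rank_items("u2")
```

Because the inner product is a fixed, linear combination of latent dimensions, it is the component the abstract singles out as the limiting factor.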
3 code implementations • 16 Aug 2017 • Xiangnan He, Hanwang Zhang, Min-Yen Kan, Tat-Seng Chua
To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing a MF model with variably-weighted missing data.
6 code implementations • 15 Aug 2017 • Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua
Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions.
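To make the second-order interaction term concrete, here is a minimal sketch of the standard FM prediction, a linear model plus pairwise interactions weighted by inner products of per-feature latent vectors. It illustrates the plain FM described in this sentence, not the model proposed in the paper; the parameter values and the names `fm_predict` and `fm_predict_naive` are assumptions made up for the example.

```python
def fm_predict_naive(x, w0, w, V):
    """FM prediction with the pairwise term written out as a double loop."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            y += inner * x[i] * x[j]
    return y

def fm_predict(x, w0, w, V):
    """Same prediction using the linear-time identity:
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    """
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Toy parameters (assumption): 3 features, 2 latent dimensions.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.5], [0.2, 0.4], [0.3, -1.0]]

prediction = fm_predict(x, w0, w, V)
```

The two functions compute the same value; the second avoids the quadratic loop over feature pairs, which matters for sparse, high-dimensional inputs.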
no code implementations • ICCV 2017 • Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang
We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation groundtruths available only at the image level.
1 code implementation • 14 May 2017 • Lizi Liao, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles of friendship networks and textual content of citation networks.
Social and Information Networks
2 code implementations • CVPR 2017 • Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua
To the best of our knowledge, VTransE is the first end-to-end relation detection network.
2 code implementations • CVPR 2017 • Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua
Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image.
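A minimal sketch of the purely spatial attention described in this sentence: a softmax turns per-location scores into spatial probabilities, which then re-weight the feature vectors of a flattened feature map. How the scores are produced (for example, from a decoder state) is left out and treated as an input; `spatial_attention` and the toy values are hypothetical.

```python
import math

def spatial_attention(feature_map, scores):
    """Re-weight a flattened feature map with spatial probabilities.

    feature_map: list of L location vectors, each with C channels.
    scores: list of L unnormalized relevance scores (assumed given).
    Returns the attention weights and the attended C-dimensional context.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # spatial probabilities, sum to 1
    channels = len(feature_map[0])
    context = [
        sum(a * loc[c] for a, loc in zip(alphas, feature_map))
        for c in range(channels)
    ]
    return alphas, context

# Toy 2x2 feature map flattened to 4 locations with 2 channels (assumption).
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
uniform_scores = [0.0, 0.0, 0.0, 0.0]

alphas, context = spatial_attention(feature_map, uniform_scores)
```

With equal scores the weights are uniform and the context reduces to the spatial average; every channel shares the same weights, which is what "generally spatial" means here.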
no code implementations • CVPR 2016 • Hanwang Zhang, Xindi Shang, Wenzhuo Yang, Huan Xu, Huanbo Luan, Tat-Seng Chua
Leveraging on the structure of the proposed collaborative learning formulation, we develop an efficient online algorithm that can jointly learn the label embeddings and visual classifiers.
no code implementations • ICCV 2015 • Xue Geng, Hanwang Zhang, Jingwen Bian, Tat-Seng Chua
It is often a great challenge for traditional recommender systems to learn representative features of both users and images in large social networks, in particular, social curation networks, which are characterized as the extremely sparse links between users and images, and the extremely diverse visual contents of images.