Search Results for author: Hanwang Zhang

Found 143 papers, 93 papers with code

Pushing Rendering Boundaries: Hard Gaussian Splatting

no code implementations 6 Dec 2024 Qingshan Xu, Jiequan Cui, Xuanyu Yi, Yuxuan Wang, Yuan Zhou, Yew-Soon Ong, Hanwang Zhang

To address this problem, we propose Hard Gaussian Splatting, dubbed HGS, which considers multi-view significant positional gradients and rendering errors to grow hard Gaussians that fill the gaps of classical Gaussian Splatting on 3D scenes, thus achieving superior NVS results.

Novel View Synthesis

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

1 code implementation 5 Dec 2024 Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

no code implementations 28 Nov 2024 Xue Song, Jiequan Cui, Hanwang Zhang, Jiaxin Shi, Jingjing Chen, Chi Zhang, Yu-Gang Jiang

Furthermore, generalizable models for image editing with visual instructions typically require quad data, i.e., a before-after image pair, along with query and target images.

Specificity Text-based Image Editing

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

1 code implementation 25 Nov 2024 Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao, Long Chen

For causal generation, it introduces unidirectional feature computation, which ensures that the cache of conditional frames can be precomputed in previous autoregression steps and reused in every subsequent step, eliminating redundant computations.

Denoising Video Generation
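
The Ca2-VDM excerpt relies on a general property of unidirectional (causal) attention. A frame's keys and values never depend on later frames, so they can be computed once and reused in every later step. The Python sketch below shows only that general property and does not reproduce the paper's architecture. It makes two hypothetical simplifications: the "frame tokens" are short float vectors, and the key/value projections are the identity.

```python
import math
from typing import List

Vector = List[float]

def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention over the given keys/values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d)]

def recompute_causal(frames: List[Vector]) -> List[Vector]:
    """Baseline: at every step, recompute attention over all frames seen so far."""
    return [attend(frames[t], frames[: t + 1], frames[: t + 1]) for t in range(len(frames))]

class CausalKVCache:
    """Hypothetical cache that stores past keys/values once and reuses them.

    With identity projections (a simplifying assumption), a frame's key and value
    are the frame itself. A real model would apply learned projections here.
    """

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []
        self.projections_computed = 0  # number of frames whose K/V were computed

    def step(self, frame: Vector) -> Vector:
        # Only the new frame's key/value is computed. Earlier entries are reused
        # because causal attention never lets them depend on this frame.
        self.keys.append(list(frame))
        self.values.append(list(frame))
        self.projections_computed += 1
        return attend(frame, self.keys, self.values)

if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-0.5, 0.25]]
    cache = CausalKVCache()
    cached = [cache.step(f) for f in frames]
    baseline = recompute_causal(frames)
    print(all(abs(a - b) < 1e-12 for ca, ba in zip(cached, baseline) for a, b in zip(ca, ba)))
    print(cache.projections_computed)
```

In a real model, each baseline step would also have to re-project all earlier frames. The counter shows that the cached version needs only one new key/value computation per step, and the outputs stay the same.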

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

no code implementations 25 Nov 2024 Yuan Zhou, Qingshan Xu, Jiequan Cui, Junbao Zhou, Jing Zhang, Richang Hong, Hanwang Zhang

In this paper, we propose a new decoupled dual-interactive linear attention (CARE) mechanism, revealing that feature decoupling and interaction can fully unleash the power of linear attention.

Inductive Bias

Robust Fine-tuning of Zero-shot Models via Variance Reduction

1 code implementation 11 Nov 2024 Beier Zhu, Jiequan Cui, Hanwang Zhang

When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to excel on both in-distribution (ID) and out-of-distribution (OOD) data.

Unified Generative and Discriminative Training for Multi-modal Large Language Models

no code implementations 1 Nov 2024 Wei Chow, Juncheng Li, Qifan Yu, Kaihang Pan, Hao Fei, Zhiqi Ge, Shuai Yang, Siliang Tang, Hanwang Zhang, Qianru Sun

Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval, yet struggles with complex scenarios requiring fine-grained semantic differentiation.

Dynamic Time Warping Image-text Classification +5

Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting

no code implementations 25 Oct 2024 Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models, such as CLIP, have shown impressive generalization capacities when using appropriate text descriptions.

Few-shot NeRF by Adaptive Rendering Loss Regularization

no code implementations 23 Oct 2024 Qingshan Xu, Xuanyu Yi, Jianyao Xu, Wenbing Tao, Yew-Soon Ong, Hanwang Zhang

In this work, we reveal that there exists an inconsistency between the frequency regularization of PE and rendering loss.

Novel View Synthesis

Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

no code implementations 8 Oct 2024 Hao Fei, Shengqiong Wu, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

Recent vision large language models (LLMs) have made remarkable progress, yet they still face challenges in becoming multimodal generalists, such as coarse-grained instance-level understanding, lack of unified support for both images and videos, and insufficient coverage across various vision tasks.

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

1 code implementation 30 Sep 2024 Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun

In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting.

knowledge editing

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

no code implementations 9 Aug 2024 Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang

However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives.

Image to text Text-to-Image Generation

Selective Vision-Language Subspace Projection for Few-shot CLIP

1 code implementation 24 Jul 2024 Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang

Vision-language models such as CLIP map data from different modalities into a unified feature space, enabling zero/few-shot inference by measuring the similarity between given images and texts.

Few-Shot Learning

Visual Prompt Selection for In-Context Learning Segmentation

1 code implementation 14 Jul 2024 Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang

As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level.

Diversity Image Segmentation +3

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

1 code implementation 16 Jun 2024 Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

Inspired by the huge success of large language models (LLMs) and following GPT (generative pre-trained transformer), we bring causal (i.e., unidirectional) generation into VDMs and use past frames as prompts to generate future frames.

Video Generation

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

no code implementations 13 Jun 2024 Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

Recent advancements in image generation have enabled the creation of high-quality images from text conditions.

Conditional Image Generation

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

2 code implementations 10 Jun 2024 Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in under a second by integrating multi-view diffusion models with scalable multi-view reconstructors.

3D Generation Attribute

Towards Semantic Equivalence of Tokenization in Multimodal LLM

no code implementations 7 Jun 2024 Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.

Visual Question Answering

Non-confusing Generation of Customized Concepts in Diffusion Models

no code implementations 11 May 2024 Wang Lin, Jingyuan Chen, Jiaxin Shi, Yichen Zhu, Chen Liang, Junzhong Miao, Tao Jin, Zhou Zhao, Fei Wu, Shuicheng Yan, Hanwang Zhang

We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).

Auto-Encoding Morph-Tokens for Multimodal LLM

1 code implementation 3 May 2024 Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.

Image Reconstruction MORPH

Dual-Modal Prompting for Sketch-Based Image Retrieval

no code implementations 29 Apr 2024 Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang

In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval.

Retrieval Sketch-Based Image Retrieval

Diffusion Time-step Curriculum for One Image to 3D Generation

1 code implementation CVPR 2024 Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Hanwang Zhang

Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views when reconstructing 3D objects from a single image.

3D Generation Image to 3D +1

View-Consistent 3D Editing with Gaussian Splatting

no code implementations 18 Mar 2024 Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS.

Controllable Relation Disentanglement for Few-Shot Class-Incremental Learning

no code implementations 17 Mar 2024 Yuan Zhou, Richang Hong, Yanrong Guo, Lin Liu, Shijie Hao, Hanwang Zhang

In this paper, we propose to tackle Few-Shot Class-Incremental Learning (FSCIL) from a new perspective, i.e., relation disentanglement, which means enhancing FSCIL by disentangling spurious relations between categories.

class-incremental learning Disentanglement +3

Distributionally Generative Augmentation for Fair Facial Attribute Classification

1 code implementation CVPR 2024 Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang

This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.

Attribute Classification +2

Discriminative Probing and Tuning for Text-to-Image Generation

no code implementations CVPR 2024 Leigang Qu, Wenjie Wang, Yongqi Li, Hanwang Zhang, Liqiang Nie, Tat-Seng Chua

We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment.

Text-to-Image Generation

Doubly Abductive Counterfactual Inference for Text-based Image Editing

1 code implementation CVPR 2024 Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang

Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.

counterfactual Counterfactual Inference +2

Few-shot Learner Parameterization by Diffusion Time-steps

1 code implementation CVPR 2024 Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun

To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent.

Few-Shot Learning Inductive Bias
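
The "forward diffusion adds noise to an image at each time-step" in the excerpt above can be made concrete with the standard closed-form DDPM forward process: x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε, with ᾱ_t the running product of (1 − β_s). The Python sketch below assumes the common linear schedule (β from 1e-4 to 0.02 over 1000 steps). It does not reproduce the paper's schedule or its attribute analysis. It only shows how quickly the surviving signal fraction sqrt(ᾱ_t) shrinks.

```python
import math
import random
from typing import List

def linear_betas(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear noise schedule (a common DDPM default; assumed here, not taken from the paper)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0: List[float], t: int, abar: List[float], rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I) in one shot."""
    a = abar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

if __name__ == "__main__":
    rng = random.Random(0)
    abar = alpha_bars(linear_betas())
    x0 = [1.0, -0.5, 0.25]  # toy "image" with three pixel values
    for t in (0, 250, 500, 999):
        # sqrt(abar_t) is the fraction of the original signal left at step t
        print(t, round(math.sqrt(abar[t]), 4), [round(v, 3) for v in q_sample(x0, t, abar, rng)])
```

By the last step almost none of x_0 survives. Fine-grained detail with small amplitude is therefore swamped by noise well before large, visually prominent structure is.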

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

1 code implementation CVPR 2024 Jiequan Cui, Beier Zhu, Xin Wen, Xiaojuan Qi, Bei Yu, Hanwang Zhang

Second, with the proposed concept of Model Prediction Bias, we investigate the origins of problematic representation during optimization.

Contrastive Learning Data Augmentation +3

Exploring Diffusion Time-steps for Unsupervised Representation Learning

1 code implementation 21 Jan 2024 Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang

Representation learning is all about discovering the hidden modular attributes that generate the data faithfully.

Attribute counterfactual +3

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

1 code implementation CVPR 2024 Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang

To solve this issue, we first analyze SDS in depth and find that its distillation sampling process indeed corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance to optimize a 3D model.

3D Generation Text to 3D

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

1 code implementation 10 Jan 2024 Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang

Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.

3D Object Detection Data Augmentation +2

MGNet: Learning Correspondences via Multiple Graphs

no code implementations 10 Jan 2024 Luanyuan Dai, Xiaoyu Du, Hanwang Zhang, Jinhui Tang

To obtain information integrating implicit and explicit local graphs, we construct local graphs from implicit and explicit aspects and combine them effectively, which is used to build a global graph.

Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

2 code implementations 15 Dec 2023 Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

As Archimedes famously said, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." In this study, we propose to use a tiny Language Model (LM), e.g., a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters.

Image Captioning In-Context Learning +4

Invariant Feature Regularization for Fair Face Recognition

3 code implementations ICCV 2023 Jiali Ma, Zhongqi Yue, Kagaya Tomoyuki, Suzuki Tomoki, Karlekar Jayashree, Sugiri Pranata, Hanwang Zhang

Unfortunately, face datasets inevitably capture the imbalanced demographic attributes that are ubiquitous in real-world observations, and the model learns biased features that generalize poorly to the minority group.

Diversity Face Recognition

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models

1 code implementation NeurIPS 2023 Beier Zhu, Kaihua Tang, Qianru Sun, Hanwang Zhang

In this study, we systematically examine the biases in foundation models and demonstrate the efficacy of our proposed Generalized Logit Adjustment (GLA) method.

Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

1 code implementation NeurIPS 2023 Zhongqi Yue, Hanwang Zhang, Qianru Sun

Domain Adaptation (DA) is always challenged by the spurious correlation between domain-invariant features (e.g., class identity) and domain-specific features (e.g., environment) that does not generalize to the target domain.

Unsupervised Domain Adaptation

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

no code implementations ICCV 2023 Xuanyu Yi, Jiajun Deng, Qianru Sun, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-trained 2D model.

3D Shape Classification Retrieval

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation 8 Aug 2023 Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Caption Generation Image Captioning +2

Random Boxes Are Open-world Object Detectors

1 code implementation ICCV 2023 Yanghao Wang, Zhongqi Yue, Xian-Sheng Hua, Hanwang Zhang

First, as the randomization is independent of the distribution of the limited known objects, the random proposals become the instrumental variable that prevents the training from being confounded by the known objects.

Object object-detection +1

DisCo: Disentangled Control for Realistic Human Dance Generation

1 code implementation CVPR 2024 Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.


Fast Diffusion Model

1 code implementation 12 Jun 2023 Zike Wu, Pan Zhou, Kenji Kawaguchi, Hanwang Zhang

In this paper, we propose a Fast Diffusion Model (FDM) to significantly speed up DMs from a stochastic optimization perspective for both faster training and sampling.

Image Generation

An Overview of Challenges in Egocentric Text-Video Retrieval

no code implementations 7 Jun 2023 Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Text-video retrieval contains various challenges, including biases coming from diverse sources.

Retrieval Video Retrieval

Decoupled Kullback-Leibler Divergence Loss

4 code implementations 23 May 2023 Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels.

Adversarial Defense Adversarial Robustness +1
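
One standard identity relates KL divergence to the soft-label cross-entropy term named in the excerpt above. For a target distribution p and a prediction q, KL(p‖q) = H(p, q) − H(p): soft-label cross-entropy minus the entropy of the targets. The Python sketch below checks this numerically on made-up logits. It does not reproduce the paper's DKL decomposition or its weighted MSE term.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def soft_cross_entropy(p: List[float], q: List[float]) -> float:
    """Cross-entropy H(p, q) = -sum_i p_i * log(q_i) with soft labels p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p: List[float]) -> float:
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

if __name__ == "__main__":
    teacher = softmax([2.0, 0.5, -1.0])  # soft labels, e.g. from a teacher model
    student = softmax([1.0, 1.0, 0.0])   # model prediction
    kl = kl_divergence(teacher, student)
    print(round(kl, 6), round(soft_cross_entropy(teacher, student) - entropy(teacher), 6))
```

H(p) does not depend on the student, so minimizing KL and minimizing soft-label cross-entropy give the same gradients with respect to the student's logits.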

Equivariant Similarity for Vision-Language Foundation Models

1 code implementation ICCV 2023 Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Image-text Retrieval Text Retrieval +2

Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

1 code implementation CVPR 2023 Hui Lv, Zhongqi Yue, Qianru Sun, Bin Luo, Zhen Cui, Hanwang Zhang

At each MIL training iteration, we use the current detector to divide the samples into two groups with different context biases: the most confident abnormal/normal snippets and the remaining ambiguous ones.

Anomaly Detection Multiple Instance Learning +1
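
The per-iteration grouping described above can be pictured with a small Python sketch. Given anomaly scores from the current detector, it splits snippets into a confident-abnormal set, a confident-normal set and an ambiguous remainder. The function name, the fixed `low`/`high` thresholds and the snippet IDs are all hypothetical. The paper's actual selection rule, and how each group is then used in training, are not reproduced here.

```python
from typing import Dict, List, Tuple

def split_by_confidence(
    scores: Dict[str, float], low: float = 0.2, high: float = 0.8
) -> Tuple[List[str], List[str], List[str]]:
    """Split snippet IDs by the current detector's anomaly score.

    Returns (confident_abnormal, confident_normal, ambiguous). The fixed
    thresholds are a hypothetical choice; a real method might rank instead.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    abnormal = [k for k, s in scores.items() if s >= high]
    normal = [k for k, s in scores.items() if s <= low]
    ambiguous = [k for k, s in scores.items() if low < s < high]
    return abnormal, normal, ambiguous

if __name__ == "__main__":
    # Hypothetical scores for two snippets from each of two videos.
    scores = {"v1_s0": 0.95, "v1_s1": 0.55, "v2_s0": 0.05, "v2_s1": 0.30}
    print(split_by_confidence(scores))
```

Because the detector is re-run at every iteration, snippets can move between groups as training progresses.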

Semantic Scene Completion with Cleaner Self

1 code implementation CVPR 2023 Fengyun Wang, Dong Zhang, Hanwang Zhang, Jinhui Tang, Qianru Sun

SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF).
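
As a rough illustration of the TSDF representation mentioned above, the Python sketch below computes truncated signed distances for voxels sampled along a single camera ray. It measures distance along the ray rather than to the nearest surface point, which is a common projective simplification. The sign convention is an assumption (positive in observed free space, negative behind the surface), and the truncation distance is also assumed. SSC methods may use other variants, such as a flipped TSDF.

```python
from typing import List

def tsdf_along_ray(voxel_depths: List[float], surface_depth: float, trunc: float = 0.1) -> List[float]:
    """Truncated signed distance of voxels sampled along one camera ray.

    Assumed convention: positive in front of the observed surface, negative
    behind it, normalized by the truncation distance and clipped to [-1, 1].
    """
    if trunc <= 0:
        raise ValueError("trunc must be positive")
    values = []
    for d in voxel_depths:
        sdf = surface_depth - d  # signed distance to the surface along the ray
        values.append(max(-1.0, min(1.0, sdf / trunc)))
    return values

if __name__ == "__main__":
    # Surface observed at depth 2.0 m; voxels before, on and behind it.
    print(tsdf_along_ray([1.0, 1.95, 2.0, 2.05, 3.0], surface_depth=2.0))
```

Voxels far behind the surface saturate at -1 even though their true occupancy is unknown. This is why SSC models have to "imagine" the hidden geometry.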

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

1 code implementation 1 Feb 2023 Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun

Without bells and whistles, our RePro achieves new state-of-the-art performance on two VidVRD benchmarks, not only for the base training object and predicate categories but also for unseen ones.

Object Relation +1

Debiased Fine-Tuning for Vision-language Models by Prompt Regularization

no code implementations 29 Jan 2023 Beier Zhu, Yulei Niu, Saeil Lee, Minhoe Hur, Hanwang Zhang

We present a new paradigm, dubbed Prompt Regularization (ProReg), for fine-tuning large-scale vision-language pre-trained models on downstream tasks.

Adaptively Clustering Neighbor Elements for Image-Text Generation

1 code implementation 5 Jan 2023 Zihua Wang, Xu Yang, Hanwang Zhang, Haiyang Xu, Ming Yan, Fei Huang, Yu Zhang

In this gradual clustering process, a parsing tree is generated which embeds the hierarchical knowledge of the input sequence.

Clustering Decoder +5

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations ICCV 2023 Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.

Question Answering Retrieval +4

Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery

1 code implementation CVPR 2023 Muli Yang, Liancheng Wang, Cheng Deng, Hanwang Zhang

Novel Class Discovery (NCD) aims to discover unknown classes without any annotation, by exploiting the transferable knowledge already learned from a base set of known classes.

Novel Class Discovery

Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground

1 code implementation ICCV 2023 Haoxin Li, Yuan Liu, Hanwang Zhang, Boyang Li

The video background is clearly a source of static bias, but the video foreground, such as the clothing of the actor, can also provide static bias.

Action Recognition Data Augmentation +1

Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation

no code implementations 20 Nov 2022 Jianqiang Huang, Jian Wang, Qianru Sun, Hanwang Zhang

An intuitive solution is "coupling" the CAM with the long-range attention matrix of visual transformers (ViT). We find that direct "coupling", e.g., pixel-wise multiplication of attention and activation, achieves more global coverage (on the foreground) but unfortunately comes with a large increase in false positives, i.e., background pixels are mistakenly included.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Respecting Transfer Gap in Knowledge Distillation

no code implementations 23 Oct 2022 Yulei Niu, Long Chen, Chang Zhou, Hanwang Zhang

The network response serves as additional supervision to formulate the machine domain, which uses the data collected from the human domain as a transfer set.

Knowledge Distillation

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

1 code implementation 4 Oct 2022 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.

Image Captioning Sentence +2

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization

1 code implementation 6 Aug 2022 Jiaxin Qi, Kaihua Tang, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang

If the context in every class is evenly distributed, OOD would be trivial because the context can be easily removed due to an underlying principle: class is invariant to context.

Out-of-Distribution Generalization

Identifying Hard Noise in Long-Tailed Sample Distribution

1 code implementation 27 Jul 2022 Xuanyu Yi, Kaihua Tang, Xian-Sheng Hua, Joo-Hwee Lim, Hanwang Zhang

Such imbalanced training data makes a classifier less discriminative for the tail classes, whose previously "easy" noises now turn into "hard" ones: they look almost as much like outliers as the clean tail samples do.


Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

1 code implementation 25 Jul 2022 Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang

We are interested in learning robust models from insufficient data, without the need for any externally pre-trained checkpoints.

Inductive Bias

Invariant Feature Learning for Generalized Long-Tailed Classification

1 code implementation 19 Jul 2022 Kaihua Tang, Mingyuan Tao, Jiaxin Qi, Zhenguang Liu, Hanwang Zhang

In fact, even if the class is balanced, samples within each class may still be long-tailed due to the varying attributes.

Attribute Classification +1

On Non-Random Missing Labels in Semi-Supervised Learning

1 code implementation ICLR 2022 Xinting Hu, Yulei Niu, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

Our method is three-fold: 1) We propose Class-Aware Propensity (CAP) that exploits the unlabeled data to train an improved classifier using the biased labeled data.

Imputation Missing Labels +1

RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval

1 code implementation 26 Jun 2022 Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Most methods consider only one joint embedding space between global visual and textual features without considering the local structures of each modality.

Retrieval Text to Video Retrieval +1

Prompt-aligned Gradient for Prompt Tuning

1 code implementation ICCV 2023 Beier Zhu, Yulei Niu, Yucheng Han, Yue Wu, Hanwang Zhang

Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by "prompt", e.g., the confidence score of an image being "[CLASS]" can be obtained by using the VLM-provided similarity measure between the image and the prompt sentence "a photo of a [CLASS]".

Domain Adaptation Few-Shot Learning +2
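
The prompt-based zero-shot classifier described above reduces to a softmax over scaled cosine similarities between one image embedding and one text embedding per "a photo of a [CLASS]" prompt. In the Python sketch below, the three-dimensional embeddings are made up; in practice they would come from CLIP's image and text encoders. The logit scale of 100 is an assumption, chosen to be close to the learned temperature of the released CLIP models.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_probs(
    image_emb: List[float], prompt_embs: Dict[str, List[float]], logit_scale: float = 100.0
) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between an image and each class prompt."""
    logits = {c: logit_scale * cosine(image_emb, e) for c, e in prompt_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(z - m) for c, z in logits.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}

if __name__ == "__main__":
    # Hypothetical embeddings of "a photo of a [CLASS]" for three classes.
    prompts = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
    image = [0.9, 0.1, 0.0]  # hypothetical image embedding
    probs = zero_shot_probs(image, prompts)
    print(max(probs, key=probs.get))  # cat
```

No classifier weights are trained. Adding a class only requires embedding one more prompt sentence, which is why prompt wording matters so much for these models.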

Certified Robustness Against Natural Language Attacks by Causal Intervention

1 code implementation 24 May 2022 Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang

Deep learning models have achieved great success in many fields, yet they are vulnerable to adversarial examples.

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

1 code implementation CVPR 2022 Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun

Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Deconfounded Visual Grounding

no code implementations 31 Dec 2021 Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, Hanwang Zhang

We focus on the confounding bias between language and location in the visual grounding pipeline, where we find that the bias is the major visual reasoning bottleneck.

Referring Expression Visual Grounding +1

Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification

1 code implementation 29 Dec 2021 Beier Zhu, Yulei Niu, Xian-Sheng Hua, Hanwang Zhang

We address the overlooked unbiasedness in existing long-tailed classification methods: we find that their overall improvement is mostly attributed to the biased preference of tail over head, as the test distribution is assumed to be balanced; however, when the test is as imbalanced as the long-tailed training data -- let the test respect Zipf's law of nature -- the tail bias is no longer beneficial overall because it hurts the head majorities.


Introspective Distillation for Robust Question Answering

1 code implementation NeurIPS 2021 Yulei Niu, Hanwang Zhang

Question answering (QA) models are well known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.

counterfactual Inductive Bias +3

Self-Supervised Learning Disentangled Group Representation as Feature

1 code implementation NeurIPS 2021 Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang

A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the hidden modularized generative factors (semantics).

Colorization Contrastive Learning +1

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

1 code implementation 3 Oct 2021 Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao

Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST).

counterfactual Question Answering +1

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Image Captioning Question Answering +1

Causal Attention for Unbiased Visual Recognition

1 code implementation ICCV 2021 Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang

The attention module does not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature that is invariant to different backgrounds.

Are Missing Links Predictable? An Inferential Benchmark for Knowledge Graph Completion

1 code implementation ACL 2021 Yixin Cao, Xiang Ji, Xin Lv, Juanzi Li, Yonggang Wen, Hanwang Zhang

We present InferWiki, a Knowledge Graph Completion (KGC) dataset that improves upon existing benchmarks in inferential ability, assumptions, and patterns.

Knowledge Graph Completion

Transporting Causal Mechanisms for Unsupervised Domain Adaptation

1 code implementation ICCV 2021 Zhongqi Yue, Qianru Sun, Xian-Sheng Hua, Hanwang Zhang

However, the theoretical solution provided by transportability is far from practical for UDA, because it requires the stratification and representation of the unobserved confounder that is the cause of the domain gap.

Unsupervised Domain Adaptation

Adversarial Visual Robustness by Causal Intervention

2 code implementations 17 Jun 2021 Kaihua Tang, Mingyuan Tao, Hanwang Zhang

As these visual confounders are imperceptible in general, we propose to use the instrumental variable that achieves causal intervention without the need for confounder observation.

Adversarial Robustness

Empowering Language Understanding with Counterfactual Reasoning

1 code implementation Findings (ACL) 2021 Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

Present language understanding methods have demonstrated an extraordinary ability to recognize patterns in texts via machine learning.

counterfactual Counterfactual Reasoning +2

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching

no code implementations 12 May 2021 Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen

In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).

Image-text matching Referring Expression +2

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

Distilling Causal Effect of Data in Class-Incremental Learning

1 code implementation CVPR 2021 Xinting Hu, Kaihua Tang, Chunyan Miao, Xian-Sheng Hua, Hanwang Zhang

We propose a causal framework to explain the catastrophic forgetting in Class-Incremental Learning (CIL) and then derive a novel distillation method that is orthogonal to the existing anti-forgetting techniques, such as data replay and feature/label distillation.

class-incremental learning Class Incremental Learning +1

Counterfactual Zero-Shot and Open-Set Visual Recognition

1 code implementation CVPR 2021 Zhongqi Yue, Tan Wang, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua

We show that the key reason is that the generation is not Counterfactual Faithful, and thus we propose a faithful one, whose generation is from the sample-specific counterfactual question: What would the sample look like, if we set its class attribute to a certain class, while keeping its sample attribute unchanged?

Attribute Binary Classification

Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning

no code implementations ACM International Conference on Multimedia 2020 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose irredundant attention in SSG-RNN to improve the possibility of abstracting topics from rarely described sub-graphs and inheriting attention in WSG-RNN to generate more grounded sentences with the abstracted topics, both of which give rise to more distinctive paragraphs.

Decoder Image Paragraph Captioning

Interventional Few-Shot Learning

1 code implementation NeurIPS 2020 Zhongqi Yue, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua

Specifically, we develop three effective IFSL algorithmic implementations based on the backdoor adjustment, which is essentially a causal intervention towards the SCM of many-shot learning: the upper-bound of FSL in a causal view.
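The backdoor adjustment referred to here is the standard formula P(Y | do(X)) = Σ_z P(Y | X, z) P(z). It averages over confounder strata z using their prior P(z) rather than P(z | X). Below is a minimal sketch on a hypothetical discrete toy problem. The probabilities are invented for illustration and do not come from the IFSL paper. The code is the generic formula, not any of the three IFSL implementations.

```python
from typing import Dict, Tuple

def backdoor_adjust(p_y_given_xz: Dict[Tuple[str, str], float],
                    p_z: Dict[str, float], x: str) -> float:
    """Interventional P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

def observational(p_y_given_xz: Dict[Tuple[str, str], float],
                  p_z_given_x: Dict[Tuple[str, str], float], x: str) -> float:
    """Plain conditional P(Y=1 | X=x) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z | X=x)."""
    return sum(p_y_given_xz[(x, z)] * p
               for (xx, z), p in p_z_given_x.items() if xx == x)

# Hypothetical toy tables: confounder Z in {"a", "b"}, treatment value "x1".
p_y_given_xz = {("x1", "a"): 0.9, ("x1", "b"): 0.3}
p_z = {"a": 0.5, "b": 0.5}                        # prior over the confounder
p_z_given_x = {("x1", "a"): 0.8, ("x1", "b"): 0.2}  # confounder is skewed given X

do_effect = backdoor_adjust(p_y_given_xz, p_z, "x1")     # 0.9*0.5 + 0.3*0.5
obs_effect = observational(p_y_given_xz, p_z_given_x, "x1")  # 0.9*0.8 + 0.3*0.2
```

The interventional estimate (0.6) is lower than the observational one (0.78). The difference is the spurious boost that the skewed P(z | X) contributes and that the adjustment removes.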

Few-Shot Learning

Clicks can be Cheating: Counterfactual Recommendation for Mitigating Clickbait Issue

1 code implementation 21 Sep 2020 Wenjie Wang, Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

However, we argue that there is a significant gap between clicks and user satisfaction -- it is common that a user is "cheated" to click an item by the attractive title/cover of the item.

Click-Through Rate Prediction counterfactual

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

1 code implementation 3 Sep 2020 Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang

The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.

Referring Expression Vocal Bursts Valence Prediction

Feature Pyramid Transformer

1 code implementation ECCV 2020 Dong Zhang, Hanwang Zhang, Jinhui Tang, Meng Wang, Xiansheng Hua, Qianru Sun

Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.

Instance Segmentation object-detection

Counterfactual VQA: A Cause-Effect Look at Language Bias

1 code implementation CVPR 2021 Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.

counterfactual Counterfactual Inference

Iterative Context-Aware Graph Inference for Visual Dialog

1 code implementation CVPR 2020 Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.

Graph Attention Graph Embedding

Learning to Segment the Tail

1 code implementation CVPR 2020 Xinting Hu, Yi Jiang, Kaihua Tang, Jingyuan Chen, Chunyan Miao, Hanwang Zhang

Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data.

Few-Shot Learning Incremental Learning

More Grounded Image Captioning by Distilling Image-Text Matching Model

1 code implementation CVPR 2020 Yuanen Zhou, Meng Wang, Daqing Liu, Zhenzhen Hu, Hanwang Zhang

To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision.

Image Captioning Image-text matching

Counterfactual Samples Synthesizing for Robust Visual Question Answering

2 code implementations CVPR 2020 Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, ShiLiang Pu, Yueting Zhuang

To reduce the language biases, several recent works introduce an auxiliary question-only model to regularize the training of targeted VQA model, and achieve dominating performance on VQA-CP.

Ranked #1 on Visual Question Answering (VQA) on VQA-CP (using extra training data)

counterfactual Question Answering

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

Dataset bias in vision-language tasks is becoming one of the main problems which hinders the progress of our community.

Causal Inference Image Captioning

Cross-GCN: Enhancing Graph Convolutional Network with $k$-Order Feature Interactions

no code implementations 5 Mar 2020 Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

Graph Convolutional Network (GCN) is an emerging technique that performs learning and reasoning on graph data.

Document Classification

Unbiased Scene Graph Generation from Biased Training

6 code implementations CVPR 2020 Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, Hanwang Zhang

Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach".

Causal Inference counterfactual

Visual Commonsense R-CNN

1 code implementation CVPR 2020 Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun

We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA.

Image Captioning Representation Learning

General Partial Label Learning via Dual Bipartite Graph Autoencoder

no code implementations 5 Jan 2020 Brian Chen, Bo Wu, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang

Compared to the traditional Partial Label Learning (PLL) problem, GPLL relaxes the supervision assumption from instance-level -- a label set partially labels an instance -- to group-level: 1) a label set partially labels a group of instances, where the within-group instance-label link annotations are missing, and 2) cross-group links are allowed -- instances in a group may be partially linked to the label set from another group.

Partial Label Learning

Two Causal Principles for Improving Visual Dialog

1 code implementation CVPR 2020 Jiaxin Qi, Yulei Niu, Jianqiang Huang, Hanwang Zhang

This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for the Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial).

Visual Dialog Vocal Bursts Valence Prediction

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

no code implementations 8 Jul 2019 Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., each of them influences the estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.

Multiple Instance Learning Referring Expression

Joint Visual Grounding with Language Scene Graphs

no code implementations 9 Jun 2019 Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun

In this paper, we alleviate the missing-annotation problem and enable the joint reasoning by leveraging the language scene graph which covers both labeled referent and unlabeled contexts (other objects, attributes, and relationships).

Referring Expression Visual Grounding

Context-Aware Visual Policy Network for Fine-Grained Image Captioning

1 code implementation 6 Jun 2019 Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu

With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.

Image Captioning Image Paragraph Captioning

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

no code implementations 5 Jun 2019 Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, Hanwang Zhang

Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space.

Visual Grounding Visual Reasoning

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design, with one module for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation; 3) a linguistic loss that keeps the module controller faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Decoder Image Captioning

Making History Matter: History-Advantage Sequence Training for Visual Dialog

no code implementations ICCV 2019 Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang

We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history.

Answer Generation Decoder

Learning to Assemble Neural Module Tree Networks for Visual Grounding

no code implementations ICCV 2019 Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha

In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated bottom-up as needed.

Dependency Parsing Natural Language Visual Grounding

Auto-Encoding Scene Graphs for Image Captioning

2 code implementations CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Decoder Image Captioning

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

no code implementations ICCV 2019 Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang

CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.

counterfactual Graph Generation

Recursive Visual Attention in Visual Dialog

1 code implementation CVPR 2019 Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Question Answering Visual Dialog

Learning to Compose Dynamic Tree Structures for Visual Contexts

6 code implementations CVPR 2019 Kaihua Tang, Hanwang Zhang, Baoyuan Wu, Wenhan Luo, Wei Liu

We propose to compose dynamic tree structures that place the objects in an image into a visual context, helping visual reasoning tasks such as scene graph generation and visual Q&A.

Graph Generation Panoptic Scene Graph Generation

Explainable and Explicit Visual Reasoning over Scene Graphs

2 code implementations CVPR 2019 Jiaxin Shi, Hanwang Zhang, Juanzi Li

We aim to dismantle the prevalent black-box neural architectures used in complex visual reasoning tasks into the proposed eXplainable and eXplicit Neural Modules (XNMs), which advance beyond existing neural module networks towards using scene graphs (objects as nodes and pairwise relationships as edges) for explainable and explicit reasoning with structured knowledge.

Inductive Bias Visual Question Answering (VQA)

Learning to Embed Sentences Using Attentive Recursive Trees

2 code implementations 6 Nov 2018 Jiaxin Shi, Lei Hou, Juanzi Li, Zhiyuan Liu, Hanwang Zhang

Sentence embedding is an effective feature representation for most deep learning-based NLP tasks.

Sentence Sentence Embedding

Stochastic Dynamics for Video Infilling

no code implementations 1 Sep 2018 Qiangeng Xu, Hanwang Zhang, Weiyue Wang, Peter N. Belhumeur, Ulrich Neumann

In this paper, we introduce a stochastic dynamics video infilling (SDVI) framework to generate frames between long intervals in a video.

Discrete Factorization Machines for Fast Feature-based Recommendation

1 code implementation 6 May 2018 Han Liu, Xiangnan He, Fuli Feng, Liqiang Nie, Rui Liu, Hanwang Zhang

In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation.

Binarization Quantization

Learning to Guide Decoding for Image Captioning

no code implementations 3 Apr 2018 Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task.

Attribute Decoder

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

no code implementations 7 Feb 2018 Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.

Binarization Decoder

Grounding Referring Expressions in Images by Variational Context

1 code implementation CVPR 2018 Hanwang Zhang, Yulei Niu, Shih-Fu Chang

This is a general yet challenging vision-language task since it requires not only the localization of objects but also the multimodal comprehension of context: visual attributes (e.g., "largest", "baby") and relationships (e.g., "behind") that help to distinguish the referent from other objects, especially those of the same category.

Multiple Instance Learning Referring Expression

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

1 code implementation CVPR 2018 Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang

We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.

General Classification Zero-Shot Learning

Neural Collaborative Filtering

43 code implementations WWW 2017 Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, Tat-Seng Chua

When it comes to modeling the key factor in collaborative filtering (the interaction between user and item features), they still resorted to matrix factorization and applied an inner product on the latent features of users and items.
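In the matrix-factorization baseline that this abstract contrasts against, a user-item pair is scored as the inner product of the two latent vectors, ŷ_ui = p_u · q_i. Below is a minimal sketch with hypothetical 3-dimensional latent factors. NCF replaces this fixed inner product with a learned neural interaction function, which is not shown here.

```python
from typing import Sequence

def mf_score(p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """Matrix-factorization prediction: inner product of user and item latent vectors."""
    if len(p_u) != len(q_i):
        raise ValueError("latent vectors must have the same dimension")
    return sum(a * b for a, b in zip(p_u, q_i))

# Hypothetical latent factors (K = 3) for one user and two items.
user = [0.5, 1.0, -0.2]
item_a = [1.0, 0.4, 0.0]
item_b = [-0.3, 0.1, 1.5]

score_a = mf_score(user, item_a)  # 0.5 + 0.4 + 0.0
score_b = mf_score(user, item_b)  # -0.15 + 0.1 - 0.3
```

Here item_a (0.9) would be ranked above item_b (-0.35). Because the score is a fixed linear combination of latent dimensions, it can only express interactions of this inner-product form.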

Collaborative Filtering Recommendation Systems

Fast Matrix Factorization for Online Recommendation with Implicit Feedback

3 code implementations 16 Aug 2017 Xiangnan He, Hanwang Zhang, Min-Yen Kan, Tat-Seng Chua

To address this, we specifically design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) technique, for efficiently optimizing an MF model with variably-weighted missing data.

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

7 code implementations 15 Aug 2017 Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions.
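A standard second-order FM predicts ŷ(x) = w0 + Σ_i w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j. Rendle's identity rewrites the pairwise term as ½ Σ_f [(Σ_i v_if x_i)² − Σ_i v_if² x_i²], which brings the cost from O(k n²) down to O(k n). Below is a minimal sketch of both forms on hypothetical toy parameters. It is the plain FM, not AFM: AFM additionally learns an attention weight for each pairwise interaction.

```python
from typing import List

def fm_predict_naive(w0: float, w: List[float], V: List[List[float]], x: List[float]) -> float:
    """y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, enumerating all pairs (O(k n^2))."""
    n = len(x)
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise

def fm_predict_fast(w0: float, w: List[float], V: List[List[float]], x: List[float]) -> float:
    """Same model via the O(k n) identity:
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise

# Hypothetical toy parameters: n = 3 features, k = 2 latent factors.
w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
x = [1.0, 0.0, 2.0]

y_naive = fm_predict_naive(w0, w, V, x)
y_fast = fm_predict_fast(w0, w, V, x)
```

Both forms give 1.1 here: 0.9 from the bias and linear terms plus 0.2 from the single active pair (features 0 and 2).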


PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

no code implementations ICCV 2017 Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang

We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation ground truths available only at the image level.

Object object-detection

Attributed Social Network Embedding

1 code implementation 14 May 2017 Lizi Liao, Xiangnan He, Hanwang Zhang, Tat-Seng Chua

For social networks, besides the network structure, there also exists rich information about social actors, such as user profiles of friendship networks and textual content of citation networks.

Social and Information Networks

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

2 code implementations CVPR 2017 Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua

Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image.
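Below is a minimal sketch of the generic spatial attention described above. It is not SCA-CNN, which adds channel-wise attention on top. The feature map is assumed to be flattened into L = H×W location vectors of C channels each. Each location gets a relevance score; these scores are hypothetical inputs here, whereas in practice a learned network conditioned on the decoder state would produce them. A softmax turns the scores into spatial probabilities that re-weight the locations into one context vector.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def spatial_attend(features: List[List[float]], scores: List[float]) -> List[float]:
    """features: L location vectors (flattened H*W grid), each with C channels.
    scores: one hypothetical relevance score per location.
    Returns the context vector sum_l alpha_l * features[l], alpha = softmax(scores)."""
    alpha = softmax(scores)
    c = len(features[0])
    return [sum(a * f[ch] for a, f in zip(alpha, features)) for ch in range(c)]

# Hypothetical 1x2 feature map with C = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0]]
uniform_ctx = spatial_attend(features, [0.0, 0.0])  # equal weights on both locations
peaked_ctx = spatial_attend(features, [10.0, 0.0])  # almost all weight on location 0
```

Every channel at a given location is scaled by the same weight. This is why purely spatial attention cannot pick out individual channels, the limitation the abstract goes on to address.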

Image Captioning Sentence

Online Collaborative Learning for Open-Vocabulary Visual Classifiers

no code implementations CVPR 2016 Hanwang Zhang, Xindi Shang, Wenzhuo Yang, Huan Xu, Huanbo Luan, Tat-Seng Chua

Leveraging the structure of the proposed collaborative learning formulation, we develop an efficient online algorithm that can jointly learn the label embeddings and visual classifiers.

Learning Image and User Features for Recommendation in Social Networks

no code implementations ICCV 2015 Xue Geng, Hanwang Zhang, Jingwen Bian, Tat-Seng Chua

It is often a great challenge for traditional recommender systems to learn representative features of both users and images in large social networks, in particular social curation networks, which are characterized by extremely sparse links between users and images and by the extremely diverse visual content of images.

Recommendation Systems
