no code implementations • 9 Jun 2024 • Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua
Subsequently, we unify generation and retrieval in an autoregressive manner and propose an autonomous decision module that selects the better match between the generated and retrieved images as the response to the text query.
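The selection step can be illustrated with a toy scorer: score both candidate images against the text query and return the better match. The cosine scorer, the feature vectors, and all names below are illustrative assumptions, not the paper's decision module.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def decide(query_feat, generated_feat, retrieved_feat):
    """Pick whichever candidate image feature better matches the text query."""
    sims = {
        "generated": cosine(query_feat, generated_feat),
        "retrieved": cosine(query_feat, retrieved_feat),
    }
    return max(sims, key=sims.get)
```

In the paper the decision is learned; this sketch only shows the interface: two candidates in, one response out.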
no code implementations • 19 Apr 2024 • Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty
One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks.
1 code implementation • CVPR 2024 • Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications.
1 code implementation • CVPR 2024 • Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.
no code implementations • 3 May 2023 • Ruochen Zhao, Shafiq Joty, Yongjie Wang, Tan Wang
The emergence of large-scale pretrained language models has posed unprecedented challenges in explaining why a model makes particular predictions.
1 code implementation • ICCV 2023 • Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
Unlike the existing image-text similarity objective, which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires the similarity to vary faithfully with semantic changes.
Ranked #7 on Visual Reasoning on Winoground
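The contrast with a binary matched/unmatched objective can be sketched with a margin-style penalty: an image's similarity to a semantically edited caption should drop below its similarity to the matched caption. This is only an illustration of the idea, not the paper's loss; the feature vectors and margin are assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def equivariance_penalty(img, cap, edited_cap, margin=0.1):
    """Penalize the model when a semantically edited caption is not
    scored at least `margin` lower than the original matched caption."""
    s_match = cosine(img, cap)
    s_edit = cosine(img, edited_cap)
    return max(0.0, margin - (s_match - s_edit))
```

A binary objective is satisfied as long as matched pairs score high; the penalty above additionally asks the score to move when the caption's meaning moves.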
1 code implementation • 25 Jul 2022 • Tan Wang, Qianru Sun, Sugiri Pranata, Karlekar Jayashree, Hanwang Zhang
We are interested in learning robust models from insufficient data, without the need for any externally pre-trained checkpoints.
1 code implementation • CVPR 2022 • 26 Apr 2022 • Jianbin Jiang, Tan Wang, He Yan, Junhui Liu
Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; and 2) how to generate clothes and non-target body parts (e.g., arms, neck) in harmony with the complicated background. To address them, we propose ClothFormer, a novel video virtual try-on framework that synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments.
1 code implementation • CVPR 2022 • Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun
Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field.
Weakly-Supervised Semantic Segmentation
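Why independent per-class sigmoids (as in BCE) let one pixel respond to several co-occurring classes can be shown with a toy class activation map. The feature map and classifier weights below are invented for illustration; this is a sketch of the failure mode, not the paper's model.

```python
import numpy as np

# Toy setting: a 2x2 feature map with C=3 channels and K=2 classes.
feats = np.array([[[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
                  [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])  # H x W x C
W = np.array([[1.0, 0.0, 1.0],    # class-0 weights
              [1.0, 1.0, 0.0]])   # class-1 weights     # K x C

cam = feats @ W.T  # H x W x K: per-pixel, per-class scores

# With BCE, each class is scored by an independent sigmoid, so a single
# pixel can respond strongly to multiple classes at once:
sigmoid = 1.0 / (1.0 + np.exp(-cam))

# A softmax across classes instead forces the classes to compete at
# every pixel, suppressing all but the dominant one:
softmax = np.exp(cam) / np.exp(cam).sum(axis=-1, keepdims=True)
```

At pixel (1, 0) both sigmoid activations exceed 0.7, while the softmax version assigns that pixel predominantly to a single class.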
1 code implementation • NeurIPS 2021 • Tan Wang, Zhongqi Yue, Jianqiang Huang, Qianru Sun, Hanwang Zhang
A good visual representation is an inference map from observations (images) to features (vectors) that faithfully reflects the hidden modularized generative factors (semantics).
1 code implementation • ICCV 2021 • Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
Attention modules do not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature that is invariant to different backgrounds.
no code implementations • 4 Aug 2021 • Lingdong Kong, Prakhar Ganesh, Tan Wang, Junhao Liu, Le Zhang, Yao Chen
We hope that the scale, diversity, and quality of our dataset can benefit researchers in this area and beyond.
1 code implementation • CVPR 2021 • Zhongqi Yue, Tan Wang, Hanwang Zhang, Qianru Sun, Xian-Sheng Hua
We show that the key reason is that the generation is not Counterfactual Faithful, and thus we propose a faithful alternative whose generation answers the sample-specific counterfactual question: what would the sample look like if we set its class attribute to a certain class while keeping its sample attribute unchanged?
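The counterfactual question above can be sketched as a latent-code swap: keep the sample-specific code fixed and decode it with a different class attribute. The linear decoder, attribute vectors, and every name here are invented for illustration; they are not the paper's generator.

```python
import numpy as np

def counterfactual(decode, z_sample, class_attrs, target_class):
    """'What would the sample look like as target_class?' — keep the
    sample attribute z_sample, swap only the class attribute."""
    return decode(z_sample, class_attrs[target_class])

# Toy linear decoder: image = A @ z_sample + B @ class_attr
A = np.eye(2)
B = np.array([[0.5, 0.0], [0.0, 0.5]])
decode = lambda z, a: A @ z + B @ a

class_attrs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
z = np.array([0.2, 0.4])  # sample attribute, held fixed
x_cf = counterfactual(decode, z, class_attrs, "dog")
```

Faithfulness in this sketch means the sample attribute `z` is untouched by the swap; only the class-dependent term changes.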
1 code implementation • 16 Aug 2020 • Shengyu Zhang, Tan Jiang, Tan Wang, Kun Kuang, Zhou Zhao, Jianke Zhu, Jin Yu, Hongxia Yang, Fei Wu
In this paper, we propose to investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of downstream data on which the pretrained model will be fine-tuned.
1 code implementation • CVPR 2020 • Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA.
Ranked #23 on Image Captioning on COCO Captions
2 code implementations • 12 Aug 2019 • Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song
We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.
no code implementations • 1 Apr 2017 • Shuoxin Ma, Tan Wang
In this paper, we introduce and analyze scalable Whole Slide Imaging (sWSI), a novel high-throughput, cost-effective, and robust whole-slide imaging system on both Android and iOS platforms.