1 code implementation • 30 Mar 2022 • Hanqing Wang, Wei Liang, Jianbing Shen, Luc van Gool, Wenguan Wang
Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.
1 code implementation • 28 Mar 2022 • Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc van Gool
Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes.
2 code implementations • 27 Mar 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang
In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
1 code implementation • 27 Mar 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang
Our target is to learn visual correspondence from unlabeled videos.
1 code implementation • 26 Mar 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang
In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.
1 code implementation • 18 Mar 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang
In light of this, we present Locater (local-global context aware Transformer), which augments the Transformer architecture with a finite memory so as to query the entire video with the language expression in an efficient manner.
no code implementations • 10 Sep 2021 • Rui Gong, Martin Danelljan, Dengxin Dai, Wenguan Wang, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc van Gool
We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy.
1 code implementation • 2 Jul 2021 • Wenguan Wang, Tianfei Zhou, Fatih Porikli, David Crandall, Luc van Gool
Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing, to name a few.
1 code implementation • 2 Jul 2021 • Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang
As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.
no code implementations • 2 Jun 2021 • Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.
One-shot visual object segmentation
Semantic Segmentation
no code implementations • CVPR 2021 • Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang
Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.
Ranked #5 on Referring Expression Segmentation on J-HMDB
1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, Jianbing Shen
On existing public benchmarks, face forgery detection techniques have achieved great success.
1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool
To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.
1 code implementation • CVPR 2021 • Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen
Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions.
5 code implementations • ICCV 2021 • Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool
Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
no code implementations • 25 Dec 2020 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
3D data that contains rich geometric information about objects and scenes is valuable for understanding the 3D physical world.
no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He
The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate research on novel approaches that harness imperfect data and improve data efficiency during training.
1 code implementation • ECCV 2020 • Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc van Gool, Dengxin Dai
This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances.
1 code implementation • ECCV 2020 • Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen
Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments.
1 code implementation • ECCV 2020 • Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc van Gool
How to make a segmentation model efficiently adapt to a specific video and to online target appearance variations are fundamentally crucial issues in the field of video object segmentation.
2 code implementations • ECCV 2020 • Guolei Sun, Wenguan Wang, Jifeng Dai, Luc van Gool
Moreover, our approach ranked 1st in the Weakly-Supervised Semantic Segmentation Track of the CVPR 2020 Learning from Imperfect Data Challenge.
1 code implementation • CVPR 2020 • Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen
In this paper, we propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA, in order to learn a compact feature that is discriminative for both object motion and affinity measure.
1 code implementation • CVPR 2020 • Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao
As human bodies are inherently hierarchically structured, how to model human structures is the central theme in this task.
1 code implementation • CVPR 2020 • Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi
We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.
1 code implementation • CVPR 2020 • Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen
The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction.
1 code implementation • ICCV 2019 • Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao
The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively.
1 code implementation • ICCV 2019 • Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao
This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG).
1 code implementation • CVPR 2019 • Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen, Ling Shao, Fatih Porikli
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view.
Ranked #1 on Unsupervised Video Object Segmentation on YouTube
Semantic Segmentation
Unsupervised Video Object Segmentation
1 code implementation • ICCV 2019 • Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao
Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation.
no code implementations • CONLL 2019 • Xuewen Shi, He-Yan Huang, Wenguan Wang, Ping Jian, Yi-Kun Tang
To alleviate this problem, we propose an NMT approach that heightens the adequacy in machine translation by transferring the semantic knowledge learned from bilingual sentence alignment.
1 code implementation • ICCV 2019 • Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu
This paper addresses a new problem of understanding human gaze communication in social videos at both the atomic level and the event level, which is significant for studying human social interactions.
no code implementations • CVPR 2019 • Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao
The top-down process is used for coarse-to-fine saliency estimation, where high-level saliency is gradually integrated with finer lower-layer features to obtain a fine-grained result.
no code implementations • CVPR 2019 • Wenguan Wang, Shuyang Zhao, Jianbing Shen, Steven C. H. Hoi, Ali Borji
The first is the exploitation of an essential pyramid attention structure for salient object detection.
1 code implementation • CVPR 2019 • Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling
This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks.
Semantic Segmentation
Unsupervised Video Object Segmentation
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2019 • Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Jianwen Xie, Song-Chun Zhu
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image.
Ranked #5 on 3D Human Pose Estimation on HumanEva-I
1 code implementation • 19 Apr 2019 • Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang
As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years.
1 code implementation • CVPR 2019 • Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu
The answer to a given question is represented by a node with a missing value.
Ranked #14 on Visual Dialog on VisDial v0.9 val
1 code implementation • ECCV 2018 • Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam
This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)
1 code implementation • ECCV 2018 • Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
Ranked #20 on Human-Object Interaction Detection on V-COCO
no code implementations • CVPR 2018 • Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu
We collect a new dataset, VideoCoAtt, from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.
1 code implementation • CVPR 2018 • Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu
This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.
1 code implementation • CVPR 2018 • Wenguan Wang, Jianbing Shen, Xingping Dong, Ali Borji
Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner.
no code implementations • CVPR 2018 • Xingping Dong, Jianbing Shen, Wenguan Wang, Yu Liu, Ling Shao, Fatih Porikli
Hyperparameters are numerical presets whose values are assigned prior to the commencement of the learning process.
no code implementations • ICCV 2019 • Kai Zhao, Shang-Hua Gao, Wenguan Wang, Ming-Ming Cheng
By reformulating the standard F-measure, we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can be easily appended to the back of CNNs as the loss function.
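As a rough illustration of what a differentiable F-measure can look like, the sketch below replaces hard true/false positive counts with sums over predicted probabilities. This is an assumption about the general technique, not the authors' released code: the function names, the `beta_sq=0.3` default, and the `eps` stabilizer are all hypothetical choices made for this example.

```python
def relaxed_f_measure(probs, labels, beta_sq=0.3, eps=1e-10):
    """Soft F-measure over per-pixel probabilities (illustrative sketch).

    probs  -- predicted foreground probabilities in [0, 1]
    labels -- binary ground-truth values (0 or 1)
    beta_sq and eps are assumed defaults, not values taken from the listing.
    """
    # Soft counts: each pixel contributes its probability mass instead of
    # a thresholded 0/1 decision, so the result varies smoothly with probs.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    return (1 + beta_sq) * tp / (beta_sq * (tp + fn) + (tp + fp) + eps)


def f_measure_loss(probs, labels, beta_sq=0.3):
    """Loss to minimize: one minus the soft F-measure."""
    return 1.0 - relaxed_f_measure(probs, labels, beta_sq)
```

In a real network the same arithmetic would be written with tensor operations so that gradients flow back to the predictions; plain Python lists are used here only to keep the sketch self-contained.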
1 code implementation • CVPR 2018 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.
1 code implementation • CVPR 2018 • Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji
Existing video saliency datasets lack variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments.
no code implementations • 29 Oct 2017 • Quanshi Zhang, Wenguan Wang, Song-Chun Zhu
We aim to discover representation flaws caused by potential dataset bias.
no code implementations • ICCV 2017 • Wenguan Wang, Jianbing Shen
We model the photo cropping problem as a cascade of attention box regression and aesthetic quality classification, based on deep learning.
no code implementations • 17 Oct 2017 • Hao-Shu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Song-Chun Zhu
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation.
1 code implementation • journal 2017 • Wenguan Wang, Jianbing Shen
Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields.
no code implementations • ICCV 2017 • Wenguan Wang, Jianbing Shen, Jianwen Xie, Fatih Porikli
We introduce a novel semi-supervised video segmentation approach based on an efficient video representation, called "super-trajectory".
no code implementations • 28 Feb 2017 • Wenguan Wang, Jianbing Shen, Fatih Porikli
Conventional video segmentation approaches rely heavily on appearance models.
no code implementations • 2 Feb 2017 • Wenguan Wang, Jianbing Shen, Ling Shao
This paper proposes a deep learning model to efficiently detect salient regions in videos.
1 code implementation • CVPR 2015 • Wenguan Wang, Jianbing Shen, Fatih Porikli
Building on the observation that foreground areas are surrounded by regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background.
Ranked #5 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)
1 code implementation • IEEE Trans. on Image Processing 2014 • Jianbing Shen, Yunfan Du, Wenguan Wang, Xuelong Li
Then, the boundaries of initial superpixels are obtained according to the probabilities and the commute time.