Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation

1 code implementation 30 Mar 2022 Hanqing Wang, Wei Liang, Jianbing Shen, Luc van Gool, Wenguan Wang

Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.

Data Augmentation Vision-Language Navigation

Rethinking Semantic Segmentation: A Prototype View

1 code implementation 28 Mar 2022 Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc van Gool

Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes.

Semantic Segmentation

Deep Hierarchical Semantic Segmentation

2 code implementations 27 Mar 2022 Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.

Multi-Label Classification Semantic Segmentation

Visual Abductive Reasoning

1 code implementation 26 Mar 2022 Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.

Local-Global Context Aware Transformer for Language-Guided Video Segmentation

1 code implementation 18 Mar 2022 Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang

In light of this, we present Locater (local-global context aware Transformer), which augments the Transformer architecture with a finite memory so as to query the entire video with the language expression in an efficient manner.

Frame Semantic Segmentation +4

TADA: Taxonomy Adaptive Domain Adaptation

no code implementations 10 Sep 2021 Rui Gong, Martin Danelljan, Dengxin Dai, Wenguan Wang, Danda Pani Paudel, Ajad Chhatkuli, Fisher Yu, Luc van Gool

We extensively evaluate the effectiveness of our framework under different TADA settings: open taxonomy, coarse-to-fine taxonomy, and partially-overlapping taxonomy.

Contrastive Learning Domain Adaptation

A Survey on Deep Learning Technique for Video Segmentation

1 code implementation 2 Jul 2021 Wenguan Wang, Tianfei Zhou, Fatih Porikli, David Crandall, Luc van Gool

Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing, to name just a few.

Autonomous Driving Scene Understanding +3

Collaborative Visual Navigation

1 code implementation 2 Jul 2021 Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang

As a fundamental problem for Artificial Intelligence, the multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.

Multi-agent Reinforcement Learning Visual Navigation

Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation

no code implementations CVPR 2021 Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang

Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.

Frame Referring Expression Segmentation

Face Forensics in the Wild

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, Jianbing Shen

On existing public benchmarks, face forgery detection techniques have achieved great success.

Frame Multiple Instance Learning

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

1 code implementation CVPR 2021 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.

Human Parsing Multi-Person Pose Estimation +3

Structured Scene Memory for Vision-Language Navigation

1 code implementation CVPR 2021 Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen

Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., entailing an agent to navigate 3D environments through following linguistic instructions.

Decision Making Vision-Language Navigation

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

5 code implementations ICCV 2021 Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool

Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.

Metric Learning Optical Character Recognition +2

Weakly Supervised 3D Object Detection from Lidar Point Cloud

1 code implementation ECCV 2020 Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc van Gool, Dengxin Dai

This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances.

3D Object Detection

Active Visual Information Gathering for Vision-Language Navigation

1 code implementation ECCV 2020 Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

Vision-language navigation (VLN) is the task of having an agent carry out navigational instructions inside photo-realistic environments.

Vision-Language Navigation

Video Object Segmentation with Episodic Graph Memory Networks

1 code implementation ECCV 2020 Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc van Gool

How to make a segmentation model efficiently adapt to a specific video and to online target appearance variations are fundamentally crucial issues in the field of video object segmentation.

Frame Semantic Segmentation +3

A Unified Object Motion and Affinity Model for Online Multi-Object Tracking

1 code implementation CVPR 2020 Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen

In this paper, we propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA, in order to learn a compact feature that is discriminative for both object motion and affinity measure.

Metric Learning Multi-Object Tracking +2

Hierarchical Human Parsing with Typed Part-Relation Reasoning

1 code implementation CVPR 2020 Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao

As human bodies are inherently hierarchically structured, how to model human structures is the central theme in this task.

Human Parsing

Learning Video Object Segmentation from Unlabeled Videos

1 code implementation CVPR 2020 Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.

Representation Learning Semantic Segmentation +3

Cascaded Human-Object Interaction Recognition

1 code implementation CVPR 2020 Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen

The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction.

Human-Object Interaction Detection

Learning Compositional Neural Information Fusion for Human Parsing

1 code implementation ICCV 2019 Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao

The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively.

Human Parsing

Human-Aware Motion Deblurring

1 code implementation ICCV 2019 Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao

This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG).

Deblurring Image Deblurring

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

1 code implementation ICCV 2019 Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao

Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation.

Frame Semantic Segmentation +4

Improving Neural Machine Translation by Achieving Knowledge Transfer with Sentence Alignment Learning

no code implementations CONLL 2019 Xuewen Shi, He-Yan Huang, Wenguan Wang, Ping Jian, Yi-Kun Tang

To alleviate this problem, we propose an NMT approach that heightens the adequacy in machine translation by transferring the semantic knowledge learned from bilingual sentence alignment.

Machine Translation Sentence Embedding +3

Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning

1 code implementation ICCV 2019 Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu

This paper addresses a new problem of understanding human gaze communication in social videos at both the atomic level and the event level, which is significant for studying human social interactions.

An Iterative and Cooperative Top-Down and Bottom-Up Inference Network for Salient Object Detection

no code implementations CVPR 2019 Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao

The top-down process is used for coarse-to-fine saliency estimation, where high-level saliency is gradually integrated with finer lower-layer features to obtain a fine-grained result.

RGB Salient Object Detection Saliency Prediction +1

Salient Object Detection in the Deep Learning Era: An In-Depth Survey

1 code implementation 19 Apr 2019 Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang

As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years.

RGB Salient Object Detection Saliency Prediction +1

Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

1 code implementation ECCV 2018 Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam

This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).

Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)

Salient Object Detection Semantic Segmentation +3

Learning Human-Object Interactions by Graph Parsing Neural Networks

1 code implementation ECCV 2018 Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu

For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.

Human-Object Interaction Detection

Inferring Shared Attention in Social Scene Videos

no code implementations CVPR 2018 Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu

We collect a new dataset, VideoCoAtt, from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.

Frame Scene Understanding

Attentive Fashion Grammar Network for Fashion Landmark Detection and Clothing Category Classification

1 code implementation CVPR 2018 Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu

This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.

General Classification

Salient Object Detection Driven by Fixation Prediction

1 code implementation CVPR 2018 Wenguan Wang, Jianbing Shen, Xingping Dong, Ali Borji

Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner.

RGB Salient Object Detection Salient Object Detection

Optimizing the F-measure for Threshold-free Salient Object Detection

no code implementations ICCV 2019 Kai Zhao, Shang-Hua Gao, Wenguan Wang, Ming-Ming Cheng

By reformulating the standard F-measure, we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can be easily appended to the back of CNNs as the loss function.

RGB Salient Object Detection Salient Object Detection

Learning Descriptor Networks for 3D Shape Synthesis and Analysis

1 code implementation CVPR 2018 Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu

This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.

3D Object Super-Resolution

Revisiting Video Saliency: A Large-scale Benchmark and a New Model

1 code implementation CVPR 2018 Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji

Existing video saliency datasets lack variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments.

Examining CNN Representations with respect to Dataset Bias

no code implementations 29 Oct 2017 Quanshi Zhang, Wenguan Wang, Song-Chun Zhu

We aim to discover representation flaws caused by potential dataset bias.

Deep Cropping via Attention Box Prediction and Aesthetics Assessment

no code implementations ICCV 2017 Wenguan Wang, Jianbing Shen

We model the photo cropping problem as a cascade of attention box regression and aesthetic quality classification, based on deep learning.

Deep Visual Attention Prediction

1 code implementation journal 2017 Wenguan Wang, Jianbing Shen

Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields.

Saliency Prediction

Super-Trajectory for Video Segmentation

no code implementations ICCV 2017 Wenguan Wang, Jianbing Shen, Jianwen Xie, Fatih Porikli

We introduce a novel semi-supervised video segmentation approach based on an efficient video representation, called "super-trajectory".

Frame Video Segmentation +1

Saliency-Aware Geodesic Video Object Segmentation

1 code implementation CVPR 2015 Wenguan Wang, Jianbing Shen, Fatih Porikli

Building on the observation that foreground areas are surrounded by the regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background.

Ranked #5 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)

Frame Semantic Segmentation +2

Lazy Random Walks for Superpixel Segmentation

1 code implementation IEEE Trans. on Image Processing 2014 Jianbing Shen, Yunfan Du, Wenguan Wang, Xuelong Li

Then, the boundaries of initial superpixels are obtained according to the probabilities and the commute time.


