no code implementations • ICCV 2023 • Liulei Li, Wenguan Wang, Yi Yang
Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world.
no code implementations • 22 Sep 2023 • James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu
This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER.
no code implementations • ICCV 2023 • Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang
Recent advances in semi-supervised semantic segmentation have been heavily reliant on pseudo labeling to compensate for limited labeled data, disregarding the valuable relational knowledge among semantic concepts.
no code implementations • ICCV 2023 • Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang
CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.
no code implementations • ICCV 2023 • Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang
VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions.
no code implementations • ICCV 2023 • Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang
Vision-language navigation (VLN), which requires an agent to navigate 3D environments by following human instructions, has made great advances.
1 code implementation • ICCV 2023 • Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng
The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.
1 code implementation • 25 Jul 2023 • Haitian Zeng, Xiaohan Wang, Wenguan Wang, Yi Yang
We introduce a novel speaker model, KEFA, for navigation instruction generation.
1 code implementation • ICCV 2023 • Cheng Han, Qifan Wang, Yiming Cui, Zhiwen Cao, Wenguan Wang, Siyuan Qi, Dongfang Liu
Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning.
no code implementations • ICCV 2023 • Lu Yang, Liulei Li, Xueshi Xin, Yifan Sun, Qing Song, Wenguan Wang
Unlike existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article we focus on devising person positioning solutions that use overhead fisheye cameras.
1 code implementation • 11 May 2023 • Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang
This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.
1 code implementation • 3 May 2023 • James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang
We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme.
1 code implementation • CVPR 2023 • Yurong Zhang, Liulei Li, Wenguan Wang, Rong Xie, Li Song, Wenjun Zhang
Current top-leading solutions for video object segmentation (VOS) typically follow a matching-based regime: for each query frame, the segmentation mask is inferred according to its correspondence to previously processed and the first annotated frames.
1 code implementation • 6 Apr 2023 • Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.
no code implementations • CVPR 2023 • Liulei Li, Wenguan Wang, Tianfei Zhou, Jianwu Li, Yi Yang
The objective of this paper is self-supervised learning of video object segmentation.
1 code implementation • CVPR 2023 • Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang
Recently, visual-language navigation (VLN), which requires robot agents to follow navigation instructions, has made great progress.
1 code implementation • 30 Oct 2022 • Hanqing Wang, Wei Liang, Luc van Gool, Wenguan Wang
With the emergence of varied visual navigation tasks (e.g., image-/object-/audio-goal and vision-language navigation) that specify the target in different ways, the community has made appealing advances in training specialized agents capable of handling individual navigation tasks well.
no code implementations • 28 Oct 2022 • Wenguan Wang, Yi Yang, Fei Wu
Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years.
2 code implementations • 5 Oct 2022 • Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang
Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).
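The GMMSeg entry describes a dense generative classifier for the joint distribution p(pixel feature, class). As a hedged illustration of how any generative classifier makes a decision (not GMMSeg's implementation, whose mixture parameters are learned and are not reproduced here), the sketch below assumes each class is a hand-set diagonal-covariance Gaussian mixture and predicts the class that maximizes log p(x | c) + log p(c). All function names, class names, and numbers are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def class_log_likelihood(x, mixture):
    """log p(x | c) for a mixture given as a list of (weight, mean, var) components."""
    return log_sum_exp(
        [math.log(w) + log_gauss_diag(x, mean, var) for w, mean, var in mixture]
    )

def classify(x, class_mixtures, class_priors):
    """Return argmax_c of log p(x | c) + log p(c), i.e. the class maximizing p(x, c)."""
    scores = {
        c: class_log_likelihood(x, mix) + math.log(class_priors[c])
        for c, mix in class_mixtures.items()
    }
    return max(scores, key=scores.get)

# Hypothetical 2-D "pixel features"; each class is a two-component mixture.
mixtures = {
    "road": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 0.0], [1.0, 1.0])],
    "sky": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 5.0], [1.0, 1.0])],
}
priors = {"road": 0.5, "sky": 0.5}
print(classify([0.5, 0.2], mixtures, priors))  # road
print(classify([5.5, 4.8], mixtures, priors))  # sky
```

Because each class owns several components, one class can cover multiple modes in feature space, which a single softmax weight vector per class cannot express directly.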
1 code implementation • 3 Oct 2022 • Wenguan Wang, James Liang, Dongfang Liu
Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in which instance masks are derived by querying the image feature using a set of instance-aware embeddings.
1 code implementation • 15 Sep 2022 • Wenguan Wang, Cheng Han, Tianfei Zhou, Dongfang Liu
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers.
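The DNC entry revisits the classic Nearest Centroids classifier. A minimal sketch of that classic rule on plain feature vectors follows; it is not DNC itself, which operates on learned deep features, and the function names are hypothetical. Each class is summarized by the mean of its training features, and a query is assigned to the closest mean.

```python
import math

def fit_centroids(features, labels):
    """Return one centroid per class: the mean of that class's feature vectors."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(x), 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(x, centroids):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["cat", "cat", "dog", "dog"]
centroids = fit_centroids(features, labels)
print(centroids)                       # {'cat': [0.0, 1.0], 'dog': [5.0, 4.0]}
print(predict([1.0, 1.0], centroids))  # cat
```

The classifier has no trainable weights beyond the centroids themselves, which is what makes its decisions easy to inspect.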
1 code implementation • 26 Jul 2022 • Junbo Yin, Jin Fang, Dingfu Zhou, Liangjun Zhang, Cheng-Zhong Xu, Jianbing Shen, Wenguan Wang
To reduce the dependence on large supervision, semi-supervised learning (SSL) based approaches have been proposed.
1 code implementation • 26 Jul 2022 • Junbo Yin, Dingfu Zhou, Liangjun Zhang, Jin Fang, Cheng-Zhong Xu, Jianbing Shen, Wenguan Wang
Existing approaches for unsupervised point cloud pre-training are constrained to either scene-level or point/voxel-level instance discrimination.
1 code implementation • 25 Jul 2022 • JieZhang Cao, Jingyun Liang, Kai Zhang, Yawei Li, Yulun Zhang, Wenguan Wang, Luc van Gool
Reference-based image super-resolution (RefSR) aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images.
1 code implementation • 21 Jul 2022 • JieZhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc van Gool
These issues can be alleviated by a cascade of three separate sub-tasks, including video deblurring, frame interpolation, and super-resolution, which, however, would fail to capture the spatial and temporal correlations among video sequences.
1 code implementation • 19 Jul 2022 • Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu
Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.
1 code implementation • CVPR 2022 • Hanqing Wang, Wei Liang, Jianbing Shen, Luc van Gool, Wenguan Wang
Since the rise of vision-language navigation (VLN), great progress has been made in instruction following -- building a follower to navigate environments under the guidance of instructions.
1 code implementation • CVPR 2022 • Tianfei Zhou, Wenguan Wang, Ender Konukoglu, Luc van Gool
Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes.
2 code implementations • CVPR 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang
In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
1 code implementation • 27 Mar 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang
Our target is to learn visual correspondence from unlabeled videos.
1 code implementation • CVPR 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang
In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.
2 code implementations • 22 Mar 2022 • Zongxin Yang, Xiaohan Wang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Yi Yang
This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).
Semantic Segmentation
Semi-Supervised Video Object Segmentation
1 code implementation • 18 Mar 2022 • Chen Liang, Wenguan Wang, Tianfei Zhou, Jiaxu Miao, Yawei Luo, Yi Yang
In light of this, we present Locater (local-global context aware Transformer), which augments the Transformer architecture with a finite memory so as to query the entire video with the language expression in an efficient manner.
Ranked #7 on Referring Expression Segmentation on A2D Sentences
Referring Expression Segmentation
Referring Video Object Segmentation
no code implementations • CVPR 2022 • Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang
Our target is to learn visual correspondence from unlabeled videos.
1 code implementation • 2 Jul 2021 • Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang
Research on multi-agent systems (MAS), a fundamental problem in Artificial Intelligence, is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.
1 code implementation • 2 Jul 2021 • Tianfei Zhou, Fatih Porikli, David Crandall, Luc van Gool, Wenguan Wang
Video segmentation, i.e., partitioning video frames into multiple segments or objects, plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing.
no code implementations • 2 Jun 2021 • Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang
Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.
One-shot visual object segmentation
Referring Video Object Segmentation
no code implementations • CVPR 2021 • Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang
Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses the features of the target frame and yields inaccurate segmentation.
Ranked #8 on Referring Expression Segmentation on J-HMDB
1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Zhiyuan Liang, Jianbing Shen
On existing public benchmarks, face forgery detection techniques have achieved great success.
1 code implementation • CVPR 2021 • Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc van Gool
To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner.
1 code implementation • CVPR 2021 • Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen
Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions.
5 code implementations • ICCV 2021 • Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool
Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
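A supervised pixel-wise contrastive loss typically pulls embeddings of same-class pixels together and pushes embeddings of different-class pixels apart. The sketch below assumes a standard supervised InfoNCE form for a single anchor pixel, with unit-normalized embeddings and a temperature `tau`; the paper's pixel sampling and memory design are not reproduced, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pixel_contrastive_loss(anchor, positives, negatives, tau=0.1):
    """Supervised InfoNCE loss for one anchor pixel embedding.

    positives: embeddings of pixels sharing the anchor's class label
    negatives: embeddings of pixels from other classes
    """
    a = normalize(anchor)
    neg_sum = sum(math.exp(dot(a, normalize(n)) / tau) for n in negatives)
    total = 0.0
    for p in positives:
        pos_term = math.exp(dot(a, normalize(p)) / tau)
        # Each positive competes against all negatives in the denominator.
        total += -math.log(pos_term / (pos_term + neg_sum))
    return total / len(positives)

anchor = [1.0, 0.0]
aligned = pixel_contrastive_loss(anchor, [[0.9, 0.1]], [[-1.0, 0.0], [0.0, 1.0]])
misaligned = pixel_contrastive_loss(anchor, [[-1.0, 0.0]], [[0.9, 0.1], [0.0, 1.0]])
print(aligned < misaligned)  # True
```

The loss is small when the same-class embedding points the same way as the anchor and large when a different-class embedding does, which is the behavior the training objective rewards.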
no code implementations • 25 Dec 2020 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
3D data that contains rich geometry information of objects and scenes is valuable for understanding the 3D physical world.
no code implementations • 17 Oct 2020 • Yunchao Wei, Shuai Zheng, Ming-Ming Cheng, Hang Zhao, LiWei Wang, Errui Ding, Yi Yang, Antonio Torralba, Ting Liu, Guolei Sun, Wenguan Wang, Luc van Gool, Wonho Bae, Junhyug Noh, Jinhwan Seo, Gunhee Kim, Hao Zhao, Ming Lu, Anbang Yao, Yiwen Guo, Yurong Chen, Li Zhang, Chuangchuang Tan, Tao Ruan, Guanghua Gu, Shikui Wei, Yao Zhao, Mariia Dobko, Ostap Viniavskyi, Oles Dobosevych, Zhendong Wang, Zhenyuan Chen, Chen Gong, Huanqing Yan, Jun He
The purpose of the Learning from Imperfect Data (LID) workshop is to inspire and facilitate the research in developing novel approaches that would harness the imperfect data and improve the data-efficiency during training.
1 code implementation • ECCV 2020 • Qinghao Meng, Wenguan Wang, Tianfei Zhou, Jianbing Shen, Luc van Gool, Dengxin Dai
This work proposes a weakly supervised approach for 3D object detection, only requiring a small set of weakly annotated scenes, associated with a few precisely labeled object instances.
1 code implementation • ECCV 2020 • Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen
Vision-language navigation (VLN) is the task in which an agent must carry out navigational instructions inside photo-realistic environments.
1 code implementation • ECCV 2020 • Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc van Gool
Efficiently adapting a segmentation model to a specific video and to online variations in target appearance are fundamentally crucial issues in the field of video object segmentation.
2 code implementations • ECCV 2020 • Guolei Sun, Wenguan Wang, Jifeng Dai, Luc van Gool
Moreover, our approach ranked 1st in the Weakly-Supervised Semantic Segmentation Track of the CVPR 2020 Learning from Imperfect Data Challenge.
Object Localization
Weakly supervised Semantic Segmentation
1 code implementation • CVPR 2020 • Junbo Yin, Wenguan Wang, Qinghao Meng, Ruigang Yang, Jianbing Shen
In this paper, we propose a novel MOT framework, named UMA, that unifies the object motion and affinity models into a single network in order to learn a compact feature that is discriminative for both object motion and affinity measurement.
1 code implementation • CVPR 2020 • Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi
We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.
1 code implementation • CVPR 2020 • Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao
As human bodies are inherently hierarchically structured, modeling human structure is the central theme of this task.
1 code implementation • CVPR 2020 • Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen
The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction.
1 code implementation • ICCV 2019 • Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao
This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG).
1 code implementation • ICCV 2019 • Wenguan Wang, Xiankai Lu, Jianbing Shen, David Crandall, Ling Shao
Through parametric message passing, AGNN is able to efficiently capture and mine much richer and higher-order relations between video frames, thus enabling a more complete understanding of video content and more accurate foreground estimation.
Semantic Segmentation
Unsupervised Video Object Segmentation
1 code implementation • ICCV 2019 • Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, Ling Shao
The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively.
1 code implementation • CVPR 2019 • Xiankai Lu, Wenguan Wang, Chao Ma, Jianbing Shen, Ling Shao, Fatih Porikli
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view.
Semantic Segmentation
Unsupervised Video Object Segmentation
no code implementations • CONLL 2019 • Xuewen Shi, He-Yan Huang, Wenguan Wang, Ping Jian, Yi-Kun Tang
To alleviate this problem, we propose an NMT approach that heightens the adequacy in machine translation by transferring the semantic knowledge learned from bilingual sentence alignment.
1 code implementation • ICCV 2019 • Lifeng Fan, Wenguan Wang, Siyuan Huang, Xinyu Tang, Song-Chun Zhu
This paper addresses a new problem of understanding human gaze communication in social videos from both atomic-level and event-level, which is significant for studying human social interactions.
no code implementations • CVPR 2019 • Wenguan Wang, Shuyang Zhao, Jianbing Shen, Steven C. H. Hoi, Ali Borji
The first is the exploitation of an essential pyramid attention structure for salient object detection.
no code implementations • CVPR 2019 • Wenguan Wang, Jianbing Shen, Ming-Ming Cheng, Ling Shao
The top-down process is used for coarse-to-fine saliency estimation, where high-level saliency is gradually integrated with finer lower-layer features to obtain a fine-grained result.
no code implementations • IEEE Transactions on Pattern Analysis and Machine Intelligence 2019 • Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Jianwen Xie, Song-Chun Zhu
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation from a monocular RGB image.
Ranked #13 on 3D Human Pose Estimation on HumanEva-I
1 code implementation • CVPR 2019 • Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling
This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks.
Semantic Segmentation
Unsupervised Video Object Segmentation
1 code implementation • 19 Apr 2019 • Wenguan Wang, Qiuxia Lai, Huazhu Fu, Jianbing Shen, Haibin Ling, Ruigang Yang
As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years.
1 code implementation • CVPR 2019 • Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu
The answer to a given question is represented by a node with missing value.
Ranked #14 on Visual Dialog on VisDial v0.9 val
1 code implementation • ECCV 2018 • Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam
This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)
1 code implementation • ECCV 2018 • Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
Ranked #31 on Human-Object Interaction Detection on V-COCO
no code implementations • CVPR 2018 • Xingping Dong, Jianbing Shen, Wenguan Wang, Yu Liu, Ling Shao, Fatih Porikli
Hyperparameters are numerical presets whose values are assigned prior to the commencement of the learning process.
1 code implementation • CVPR 2018 • Wenguan Wang, Yuanlu Xu, Jianbing Shen, Song-Chun Zhu
This paper proposes a knowledge-guided fashion network to solve the problem of visual fashion analysis, e.g., fashion landmark localization and clothing category classification.
1 code implementation • CVPR 2018 • Wenguan Wang, Jianbing Shen, Xingping Dong, Ali Borji
Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner.
no code implementations • CVPR 2018 • Lifeng Fan, Yixin Chen, Ping Wei, Wenguan Wang, Song-Chun Zhu
We collect a new dataset, VideoCoAtt, from public TV show videos, containing 380 complex video sequences with more than 492,000 frames that include diverse social scenes for shared attention study.
no code implementations • ICCV 2019 • Kai Zhao, Shang-Hua Gao, Wenguan Wang, Ming-Ming Cheng
By reformulating the standard F-measure, we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can easily be appended to the back of CNNs as a loss function.
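The relaxed F-measure entry makes F-measure differentiable with respect to the network's posterior. One common relaxation, assumed here rather than taken from the paper, replaces hard counts with sums over predicted probabilities p_i and binary labels y_i: TP = sum(p_i * y_i), precision = TP / sum(p_i), recall = TP / sum(y_i), F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), and the loss is 1 - F_beta. The function name and the default beta^2 = 0.3 (a value often used in saliency evaluation) are assumptions.

```python
def relaxed_f_measure_loss(probs, labels, beta2=0.3, eps=1e-7):
    """1 - F_beta computed from soft predictions, so it varies smoothly with probs.

    probs:  predicted foreground probabilities in [0, 1]
    labels: binary ground-truth labels (0 or 1)
    beta2:  beta squared (assumed default of 0.3)
    eps:    guards against division by zero
    """
    tp = sum(p * y for p, y in zip(probs, labels))  # soft true positives
    precision = tp / (sum(probs) + eps)
    recall = tp / (sum(labels) + eps)
    f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return 1.0 - f

perfect = relaxed_f_measure_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
uncertain = relaxed_f_measure_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(round(perfect, 4), round(uncertain, 4))  # 0.0 0.5
```

Every term is a smooth function of the probabilities, so an autodiff framework can backpropagate through the same expression written with tensors.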
1 code implementation • CVPR 2018 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.
1 code implementation • CVPR 2018 • Wenguan Wang, Jianbing Shen, Fang Guo, Ming-Ming Cheng, Ali Borji
Existing video saliency datasets lack variety and generality of common dynamic scenes and fall short in covering challenging situations in unconstrained environments.
no code implementations • 29 Oct 2017 • Quanshi Zhang, Wenguan Wang, Song-Chun Zhu
We aim to discover representation flaws caused by potential dataset bias.
no code implementations • ICCV 2017 • Wenguan Wang, Jianbing Shen
We model the photo cropping problem as a cascade of attention box regression and aesthetic quality classification, based on deep learning.
no code implementations • 17 Oct 2017 • Hao-Shu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Song-Chun Zhu
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation.
Ranked #1 on 3D Absolute Human Pose Estimation on Human3.6M (Average MPJPE (mm) metric)
1 code implementation • journal 2017 • Wenguan Wang, Jianbing Shen
Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields.
no code implementations • 28 Feb 2017 • Wenguan Wang, Jianbing Shen, Fatih Porikli
Conventional video segmentation approaches rely heavily on appearance models.
no code implementations • ICCV 2017 • Wenguan Wang, Jianbing Shen, Jianwen Xie, Fatih Porikli
We introduce a novel semi-supervised video segmentation approach based on an efficient video representation, called "super-trajectory".
no code implementations • 2 Feb 2017 • Wenguan Wang, Jianbing Shen, Ling Shao
This paper proposes a deep learning model to efficiently detect salient regions in videos.
1 code implementation • CVPR 2015 • Wenguan Wang, Jianbing Shen, Fatih Porikli
Building on the observation that foreground areas are surrounded by regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background.
Ranked #5 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)
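The CVPR 2015 entry uses geodesic distance over spatiotemporal edges to obtain an initial foreground/background estimate. A minimal single-frame sketch follows, assuming the image border is background and that stepping into a pixel costs its edge value; the paper's actual graph and weighting may differ, and the function name is hypothetical. Dijkstra's algorithm from the border gives pixels enclosed by strong edges a large distance, marking them as likely foreground.

```python
import heapq

def geodesic_from_border(edge_map):
    """Geodesic distance from the image border to every pixel of a 2-D edge map.

    Moving into a pixel costs its edge value, so a region enclosed by strong
    edges ends up far from the border (a hint that it is foreground).
    """
    h, w = len(edge_map), len(edge_map[0])
    dist = [[float("inf")] * w for _ in range(h)]
    heap = []
    # Seeds: border pixels are assumed to be background (distance 0).
    for r in range(h):
        for c in range(w):
            if r in (0, h - 1) or c in (0, w - 1):
                dist[r][c] = 0.0
                heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry, a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nd = d + edge_map[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# A low-edge center enclosed by a ring of strong edges.
edges = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 0, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
dist = geodesic_from_border(edges)
print(dist[2][2], dist[0][0])  # 9.0 0.0
```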
1 code implementation • IEEE Trans. on Image Processing 2014 • Jianbing Shen, Yunfan Du, Wenguan Wang, Xuelong Li
Then, the boundaries of initial superpixels are obtained according to the probabilities and the commute time.