1 code implementation • 11 Jan 2023 • Bo Dong, Pichao Wang, Fan Wang
On the ADE20K dataset, our model achieves 41.8 mIoU and 4.6 GFLOPs, which is 4.4 mIoU higher than SegFormer with 45% fewer GFLOPs.
1 code implementation • 16 Nov 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
Although improving motion recognition to some extent, these methods still face sub-optimal situations in the following aspects: (i) data augmentation, i.e., the scale of the RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure brings more challenges to spatiotemporal information modeling; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion.
1 code implementation • NeurIPS 2022 • Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li
Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem.
no code implementations • 6 Oct 2022 • Zhimin Gao, Peitao Wang, Pei Lv, Xiaoheng Jiang, Qidong Liu, Pichao Wang, Mingliang Xu, Wanqing Li
Moreover, these methods compute pair-wise global self-attention equally for all joints in both the spatial and temporal dimensions, undervaluing the effect of discriminative local joints and short-range temporal dynamics.
no code implementations • 29 Sep 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang
To achieve these two purposes, we propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
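A minimal sketch of how such a difficulty measure could work, assuming difficulty is proxied by the current per-sample loss (an illustrative assumption; the paper's actual criterion may differ):

```python
import torch
import torch.nn.functional as F

def rank_by_difficulty(model, images, labels):
    """Rank a batch by per-sample loss, a simple proxy for sample 'difficulty'."""
    model.eval()
    with torch.no_grad():
        logits = model(images)
        # reduction='none' keeps one loss value per sample
        losses = F.cross_entropy(logits, labels, reduction="none")
    # Higher loss -> 'harder' sample at the current training stage
    return torch.argsort(losses, descending=True)
```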
1 code implementation • 21 Sep 2022 • Zihui Guo, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
It has been studied using either first-person vision (FPV) or third-person vision (TPV).
1 code implementation • CVPR 2022 • Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li
The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.
Ranked #4 on 6D Pose Estimation using RGB on LineMOD
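The KL-based training objective described above can be illustrated with a toy sketch that minimizes the divergence between predicted and target distributions over a discretized set of candidate poses (the discretization is an assumption for illustration; the paper works with continuous pose distributions):

```python
import torch
import torch.nn.functional as F

def pose_kl_loss(pred_logits, target_probs):
    """KL divergence between target and predicted distributions over candidate poses."""
    log_pred = F.log_softmax(pred_logits, dim=-1)
    # F.kl_div expects log-probabilities for the input, probabilities for the target
    return F.kl_div(log_pred, target_probs, reduction="batchmean")

# Usage: logits over 1024 candidate poses for a batch of 8 images
pred = torch.randn(8, 1024)
target = F.softmax(torch.randn(8, 1024), dim=-1)
loss = pose_kl_loss(pred, target)
```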
no code implementations • 19 Feb 2022 • Shanshan Wang, Lei Zhang, Pichao Wang
In our work, considering the different importance of pair-wise samples for both feature learning and domain alignment, we deduce our BP-Triplet loss for effective UDA from the perspective of Bayesian learning.
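A minimal sketch of a pair-weighted triplet loss in this spirit, where harder triplets receive larger weights (the sigmoid weighting here is an illustrative assumption, not the paper's Bayesian derivation):

```python
import torch
import torch.nn.functional as F

def weighted_triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss in which harder triplets receive larger per-pair weights."""
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distance
    # Weight each triplet by how strongly it violates the margin (no gradient)
    w = torch.sigmoid(d_ap - d_an + margin).detach()
    return (w * F.relu(d_ap - d_an + margin)).mean()
```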
no code implementations • 21 Jan 2022 • Pichao Wang, Fan Wang, Hao Li
During the KD process, the TCL loss transfers the local structure, exploits the higher order information, and mitigates the misalignment of the heterogeneous output of teacher and student networks.
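The abstract does not spell out the loss, but one common way to transfer local structure between heterogeneous teacher and student outputs is to match their batch-wise similarity matrices; the sketch below shows that generic idea, not the paper's exact TCL formulation:

```python
import torch
import torch.nn.functional as F

def relational_kd_loss(student_feats, teacher_feats):
    """Match the batch-wise cosine-similarity structure of student and teacher."""
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)
    sim_s = s @ s.t()  # student pairwise similarities (B x B)
    sim_t = t @ t.t()  # teacher pairwise similarities (B x B)
    return F.mse_loss(sim_s, sim_t)
```

Since both similarity matrices are batch-by-batch, the comparison is well defined even when teacher and student embedding dimensions differ.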
1 code implementation • 23 Dec 2021 • Jingkai Zhou, Pichao Wang, Fan Wang, Qiong Liu, Hao Li, Rong Jin
Self-attention is powerful in modeling long-range dependencies, but weak at learning fine-grained local features.
Ranked #36 on Instance Segmentation on COCO minival
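One generic way to compensate for this weakness is to run a depthwise-convolution branch in parallel with global attention; a hedged sketch of that pattern (the paper's actual local-attention mechanism differs):

```python
import torch
import torch.nn as nn

class LocalGlobalBlock(nn.Module):
    """Parallel global self-attention and local depthwise convolution."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x):                 # x: (B, H*W, C), square feature map
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        g, _ = self.attn(x, x, x)         # global long-range dependencies
        l = self.local(x.transpose(1, 2).reshape(b, c, h, w))
        l = l.flatten(2).transpose(1, 2)  # back to (B, N, C)
        return x + g + l                  # residual fusion of both branches
```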
1 code implementation • CVPR 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin
Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors.
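A toy sketch of the decoupling idea, with a per-frame spatial encoder followed by a separate temporal module (the module choices here are assumptions for illustration, not the paper's architecture):

```python
import torch
import torch.nn as nn

class DecoupledSTEncoder(nn.Module):
    """Per-frame spatial encoding followed by a separate temporal module."""
    def __init__(self, dim=256):
        super().__init__()
        self.spatial = nn.Sequential(          # spatial factor, applied per frame
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.AdaptiveAvgPool2d(1),
        )
        self.temporal = nn.GRU(dim, dim, batch_first=True)  # temporal factor

    def forward(self, video):                  # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.spatial(video.flatten(0, 1))  # (B*T, dim, 1, 1)
        f = f.flatten(1).view(b, t, -1)        # (B, T, dim)
        out, _ = self.temporal(f)
        return out[:, -1]                      # clip-level representation
```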
1 code implementation • 2 Dec 2021 • Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
Ranked #1 on Unsupervised Semantic Segmentation on COCO-Stuff-171 (using extra training data)
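For orientation, a common baseline in this setting clusters per-pixel embeddings into pseudo-classes; a toy k-means sketch (illustrative only, not the paper's method):

```python
import torch

def kmeans_pixel_labels(feats, k=27, iters=10):
    """Cluster flattened per-pixel features (N, C) into k pseudo-classes."""
    centers = feats[torch.randperm(feats.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(feats, centers).argmin(dim=1)  # nearest center
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = feats[mask].mean(dim=0)        # update center
    return assign
```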
1 code implementation • CVPR 2022 • Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc van Gool
Estimating 3D human poses from monocular videos is a challenging task due to depth ambiguity and self-occlusion.
Ranked #8 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • 23 Nov 2021 • Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin
We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks.
Ranked #1 on Unsupervised Person Re-Identification on Market-1501 (using extra training data)
1 code implementation • ICLR 2022 • Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin
Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.
Ranked #1 on Domain Adaptation on Office-31
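A minimal sketch of the cross-attention branch, where queries from one domain attend to keys and values from the other (weight sharing across the three branches, as in the paper, is omitted here):

```python
import torch
import torch.nn as nn

class CrossDomainAttention(nn.Module):
    """Cross-attention: source tokens attend to target tokens for alignment."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, src_tokens, tgt_tokens):
        # Queries from the source domain; keys and values from the target domain
        aligned, _ = self.attn(src_tokens, tgt_tokens, tgt_tokens)
        return aligned
```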
no code implementations • 8 Sep 2021 • Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin
In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not account for stable training; rather, the scaled ReLU operation in the convolutional stem (conv-stem) matters.
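A hedged sketch of a conv-stem with a scaled ReLU, assuming the scale is a learnable scalar applied after the activation (the paper's exact scaling may differ):

```python
import torch
import torch.nn as nn

class ConvStem(nn.Module):
    """Conv-stem with a scaled ReLU; the learnable scalar is an assumption."""
    def __init__(self, dim=96):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3)
        self.bn = nn.BatchNorm2d(dim)
        self.scale = nn.Parameter(torch.ones(1))  # hypothetical learnable scale

    def forward(self, x):
        return self.scale * torch.relu(self.bn(self.conv(x)))
```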
1 code implementation • 28 May 2021 • Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin
A key component in vision transformers is the fully-connected self-attention, which is more powerful than CNNs at modelling long-range dependencies.
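For reference, the fully-connected self-attention referred to here is the standard scaled dot-product form, in which every token attends to every other token:

```python
import torch
import torch.nn.functional as F

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a token sequence x: (B, N, C)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = F.softmax(scores, dim=-1)  # every token attends to every other token
    return attn @ v
```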
no code implementations • 15 Apr 2021 • Zitong Yu, Xiaobai Li, Pichao Wang, Guoying Zhao
3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from emergent 3D mask attacks.
no code implementations • 30 Mar 2021 • Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Jiashi Feng
Temporal action proposal generation (TAPG) is a fundamental and challenging task in video understanding, especially in temporal action detection.
1 code implementation • 26 Mar 2021 • Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang
The modified VTE is termed the Strided Transformer Encoder (STE), which is built upon the outputs of the VTE.
Ranked #1 on 3D Human Pose Estimation on HumanEva-I
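A simplified sketch of a strided encoder layer: attention runs at full sequence length, while the feed-forward path strides over time to shrink the sequence (the paper's STE replaces fully-connected layers in the FFN with strided convolutions; the details below are illustrative):

```python
import torch
import torch.nn as nn

class StridedEncoderLayer(nn.Module):
    """Encoder layer whose feed-forward path strides over time, shrinking T."""
    def __init__(self, dim, num_heads=4, stride=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Conv1d(dim, dim, kernel_size=3, stride=stride, padding=1)
        self.pool = nn.MaxPool1d(kernel_size=stride, stride=stride)

    def forward(self, x):                # x: (B, T, C); assumes T % stride == 0
        x = x + self.attn(x, x, x)[0]    # full-length self-attention
        y = self.ffn(x.transpose(1, 2))  # strided conv shortens the sequence
        r = self.pool(x.transpose(1, 2)) # pooled residual of matching length
        return (y + r).transpose(1, 2)   # (B, T/stride, C)
```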
4 code implementations • ICCV 2021 • Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, Wei Jiang
Extracting robust feature representation is one of the key challenges in object re-identification (ReID).
Ranked #1 on Person Re-Identification on Market-1501-C
2 code implementations • 1 Feb 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
Compared with previous NAS methods, the proposed Zen-NAS is orders of magnitude faster on multiple server-side and mobile-side GPU platforms, with state-of-the-art accuracy on ImageNet.
Ranked #1 on Neural Architecture Search on ImageNet
no code implementations • 5 Jan 2021 • Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear.
2 code implementations • ICCV 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin
To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures.
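A simplified zero-shot proxy in the spirit of Zen-Score, scoring an untrained architecture by the sensitivity of its features to small input perturbations (the actual Zen-Score also accounts for BatchNorm statistics):

```python
import torch

@torch.no_grad()
def zero_shot_score(model, input_shape=(1, 3, 224, 224), eps=1e-2, repeats=8):
    """Score an untrained net by its feature sensitivity to input perturbations."""
    score = 0.0
    for _ in range(repeats):
        x = torch.randn(input_shape)
        dx = eps * torch.randn(input_shape)
        diff = model(x + dx) - model(x)       # feature change under perturbation
        score += torch.log(diff.norm() / eps)
    return score / repeats
```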
no code implementations • 8 Dec 2020 • Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li
In this paper, we propose a method consisting of two camera pose estimators that deal with the information from pairwise images and a short sequence of images respectively.
no code implementations • 29 Oct 2020 • Haoyuan Zhang, Yonghong Hou, Pichao Wang, Zihui Guo, Wanqing Li
The recently developed DARTS (Differentiable Architecture Search) is adopted to search for an effective network architecture that is built upon the two types of cells.
1 code implementation • 21 Aug 2020 • Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu, Stan Z. Li, Guoying Zhao
Gesture recognition has attracted considerable attention owing to its great potential in applications.
no code implementations • 21 Feb 2020 • Jingkun Gao, Xiaomin Song, Qingsong Wen, Pichao Wang, Liang Sun, Huan Xu
It is deployed as a public online service and widely adopted in different business scenarios at Alibaba Group.
no code implementations • 17 Mar 2018 • Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Philip Ogunbona
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition.
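The DDI construction follows the rank-pooling idea from the dynamic-image literature; a sketch using the standard approximate rank-pooling weights (assuming this paper uses the same approximation):

```python
import numpy as np

def approx_rank_pool(depth_seq):
    """Collapse a depth sequence (T, H, W) into one dynamic depth image using
    the approximate rank-pooling weights alpha_t = 2t - T - 1, t = 1..T."""
    T = depth_seq.shape[0]
    alphas = 2 * np.arange(1, T + 1) - T - 1         # temporal ordering weights
    ddi = np.tensordot(alphas, depth_seq, axes=1)    # weighted sum over time
    return (ddi - ddi.min()) / (np.ptp(ddi) + 1e-8)  # normalize to [0, 1]
```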
no code implementations • 5 Dec 2017 • Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu
Unlike a conventional ConvNet, which learns deep separable features for homogeneous-modality classification with a single softmax loss, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities.
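A generic sketch of jointly optimizing a softmax loss with a pairwise ranking term (the c-ConvNet's actual ranking loss over homogeneous and heterogeneous modality pairs is more specific):

```python
import torch
import torch.nn.functional as F

def joint_loss(feats, logits, labels, margin=1.0, lam=0.5):
    """Softmax classification loss plus a pairwise ranking term that pulls
    same-class features together and pushes different classes apart."""
    cls_loss = F.cross_entropy(logits, labels)
    d = torch.cdist(feats, feats)                     # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    rank_loss = torch.where(same, d, F.relu(margin - d)).mean()
    return cls_loss + lam * rank_loss
```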
no code implementations • 31 Oct 2017 • Pichao Wang, Wanqing Li, Philip Ogunbona, Jun Wan, Sergio Escalera
Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data.
no code implementations • 6 Jul 2017 • Chuankun Li, Pichao Wang, Shuang Wang, Yonghong Hou, Wanqing Li
Recent methods based on 3D skeleton data have achieved outstanding performance due to the conciseness, robustness, and view-independence of the skeleton representation.
1 code implementation • 2 May 2017 • Zewei Ding, Pichao Wang, Philip O. Ogunbona, Wanqing Li
The proposed method achieved state-of-the-art performance on NTU RGB+D dataset for 3D human action analysis.
Ranked #84 on Skeleton Based Action Recognition on NTU RGB+D (Accuracy (CV) metric)
no code implementations • CVPR 2017 • Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, Philip Ogunbona
Based on the scene flow vectors, we propose a new representation, namely, Scene Flow to Action Map (SFAM), that describes several long term spatio-temporal dynamics for action recognition.
Ranked #3 on Hand Gesture Recognition on ChaLearn val
no code implementations • 7 Jan 2017 • Pichao Wang, Wanqing Li, Song Liu, Zhimin Gao, Chang Tang, Philip Ogunbona
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI).
Ranked #2 on Hand Gesture Recognition on ChaLearn val
no code implementations • 30 Dec 2016 • Pichao Wang, Wanqing Li, Chuankun Li, Yonghong Hou
Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition.
Ranked #1 on Skeleton Based Action Recognition on Gaming 3D (G3D)
no code implementations • 8 Nov 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition.
no code implementations • 22 Aug 2016 • Pichao Wang, Wanqing Li, Song Liu, Yuyao Zhang, Zhimin Gao, Philip Ogunbona
This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets).
no code implementations • 1 Feb 2016 • Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li
This paper proposes a new framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and effectively exploits both local and global temporal information.
no code implementations • 21 Jan 2016 • Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010.
no code implementations • 10 Nov 2015 • Chang Tang, Pichao Wang, Wanqing Li
This paper presents a fast yet effective method to recognize actions from a stream of noisy skeleton data, adopting a novel weighted covariance descriptor to accumulate evidence.
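A minimal sketch of a weighted covariance descriptor over a skeleton sequence; here the per-frame weights are simply given as input, whereas the paper derives them to accumulate evidence:

```python
import numpy as np

def weighted_covariance(joints, weights):
    """Weighted covariance of skeleton joints over a sequence.
    joints: (T, D) flattened joint coordinates; weights: (T,) non-negative."""
    w = weights / weights.sum()
    mu = w @ joints                              # weighted mean pose
    centered = joints - mu
    return (centered * w[:, None]).T @ centered  # (D, D) descriptor
```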
no code implementations • IEEE Transactions on Human-Machine Systems 2016 • Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona
In addition, the method was evaluated on the large dataset constructed from the above datasets.
Ranked #9 on Multimodal Activity Recognition on EV-Action
no code implementations • 20 Jan 2015 • Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona
The results show that our approach achieves state-of-the-art results on the individual datasets without dramatic performance degradation on the Combined Dataset.
no code implementations • 14 Sep 2014 • Pichao Wang, Wanqing Li, Philip Ogunbona, Zhimin Gao, Hanling Zhang
These parts are referred to as Frequent Local Parts or FLPs.