Search Results for author: Pichao Wang

Found 57 papers, 25 papers with code

Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences

no code implementations20 Jan 2015 Pichao Wang, Wanqing Li, Zhimin Gao, Jing Zhang, Chang Tang, Philip Ogunbona

The results show that our approach achieves state-of-the-art results on the individual datasets without dramatic performance degradation on the Combined Dataset.

Action Recognition Temporal Action Localization

Online Action Recognition based on Incremental Learning of Weighted Covariance Descriptors

no code implementations10 Nov 2015 Chang Tang, Pichao Wang, Wanqing Li

This paper presents a fast yet effective method to recognize actions from a stream of noisy skeleton data, and a novel weighted covariance descriptor is adopted to accumulate evidence.

Action Recognition Incremental Learning +1
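The entry above describes accumulating evidence in a weighted covariance descriptor as skeleton frames stream in. A minimal NumPy sketch of an incremental weighted mean/covariance update in that spirit (the function name, the recency weights, and the exact update rule are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def update_weighted_cov(mean, cov, w_sum, x, w):
    """Fold one weighted sample x into a running weighted mean and covariance.
    Illustrative rank-1 update; not the paper's exact descriptor."""
    w_new = w_sum + w
    delta = x - mean
    mean_new = mean + (w / w_new) * delta
    # update the weighted scatter matrix, then renormalise
    cov_new = (w_sum * cov + w * np.outer(delta, x - mean_new)) / w_new
    return mean_new, cov_new, w_new

# usage: a stream of 3-D skeleton feature vectors with recency weights
rng = np.random.default_rng(0)
mean, cov, w_sum = np.zeros(3), np.zeros((3, 3)), 0.0
for t in range(100):
    x = rng.normal(size=3)
    mean, cov, w_sum = update_weighted_cov(mean, cov, w_sum, x, w=0.99 ** (99 - t))
```

Because each frame touches only a rank-1 term, the descriptor can be maintained online without re-reading past frames.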

RGB-D-based Action Recognition Datasets: A Survey

no code implementations21 Jan 2016 Jing Zhang, Wanqing Li, Philip O. Ogunbona, Pichao Wang, Chang Tang

Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010.

Action Recognition Temporal Action Localization

Combining ConvNets with Hand-Crafted Features for Action Recognition Based on an HMM-SVM Classifier

no code implementations1 Feb 2016 Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

This paper proposes a new framework for RGB-D-based action recognition that takes advantage of hand-designed features from skeleton data and deeply learned features from depth maps, and exploits effectively both the local and global temporal information.

Action Recognition Temporal Action Localization

Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

no code implementations22 Aug 2016 Pichao Wang, Wanqing Li, Song Liu, Yuyao Zhang, Zhimin Gao, Philip Ogunbona

This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets).

General Classification Gesture Recognition

Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks

no code implementations8 Nov 2016 Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li

Recently, Convolutional Neural Networks (ConvNets) have shown promising performances in many computer vision tasks, especially image-based recognition.

Action Recognition Temporal Action Localization

Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks

no code implementations30 Dec 2016 Pichao Wang, Wanqing Li, Chuankun Li, Yonghong Hou

Convolutional Neural Networks (ConvNets) have recently shown promising performance in many computer vision tasks, especially image-based recognition.

Action Recognition Skeleton Based Action Recognition +1

Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

no code implementations7 Jan 2017 Pichao Wang, Wanqing Li, Song Liu, Zhimin Gao, Chang Tang, Philip Ogunbona

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI).

General Classification Gesture Recognition

Scene Flow to Action Map: A New Representation for RGB-D based Action Recognition with Convolutional Neural Networks

no code implementations CVPR 2017 Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, Philip Ogunbona

Based on the scene flow vectors, we propose a new representation, namely, Scene Flow to Action Map (SFAM), that describes several long term spatio-temporal dynamics for action recognition.

3D Action Recognition

Skeleton-based Action Recognition Using LSTM and CNN

no code implementations6 Jul 2017 Chuankun Li, Pichao Wang, Shuang Wang, Yonghong Hou, Wanqing Li

Recent methods based on 3D skeleton data have achieved outstanding performance due to the conciseness, robustness, and view-independence of the skeleton representation.

Action Analysis Action Recognition +2

RGB-D-based Human Motion Recognition with Deep Learning: A Survey

no code implementations31 Oct 2017 Pichao Wang, Wanqing Li, Philip Ogunbona, Jun Wan, Sergio Escalera

Specifically, deep learning methods based on the CNN and RNN architectures have been adopted for motion recognition using RGB-D data.

Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition

no code implementations5 Dec 2017 Pichao Wang, Wanqing Li, Jun Wan, Philip Ogunbona, Xinwang Liu

Unlike the conventional ConvNet, which learns deep separable features for homogeneous modality-based classification with only one softmax loss function, the c-ConvNet enhances the discriminative power of the deeply learned features and weakens the undesired modality discrepancy by jointly optimizing a ranking loss and a softmax loss for both homogeneous and heterogeneous modalities.

Action Recognition Temporal Action Localization
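The c-ConvNet entry above describes jointly optimizing a ranking loss and a softmax loss across modalities. A small NumPy sketch of such a combined objective (the function names, the triplet form of the ranking loss, and the weighting `lam` are illustrative assumptions, not the paper's exact losses):

```python
import numpy as np

def softmax_ce(logits, label):
    """Standard softmax cross-entropy for one sample."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def triplet_ranking(anchor, pos, neg, margin=0.5):
    """Margin ranking loss: pull the positive closer than the negative."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

def joint_loss(logits, label, feat_rgb, feat_depth_pos, feat_depth_neg, lam=1.0):
    # classification term + cross-modal ranking term (lam balances the two)
    return softmax_ce(logits, label) + lam * triplet_ranking(
        feat_rgb, feat_depth_pos, feat_depth_neg)
```

The ranking term operates on features from different modalities, which is what lets the shared network reduce the modality discrepancy while the softmax term keeps the features discriminative.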

Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

no code implementations17 Mar 2018 Pichao Wang, Wanqing Li, Zhimin Gao, Chang Tang, Philip Ogunbona

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition.

3D Action Recognition Gesture Recognition

SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching

no code implementations29 Oct 2020 Haoyuan Zhang, Yonghong Hou, Pichao Wang, Zihui Guo, Wanqing Li

The recently developed DARTS (Differentiable Architecture Search) is adopted to search for an effective network architecture that is built upon the two types of cells.

Action Recognition Skeleton Based Action Recognition

Transformer Guided Geometry Model for Flow-Based Unsupervised Visual Odometry

no code implementations8 Dec 2020 Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li

In this paper, we propose a method consisting of two camera pose estimators that deal with information from pairwise images and from a short sequence of images, respectively.

Visual Odometry

Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition

2 code implementations ICCV 2021 Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures.

Neural Architecture Search Vocal Bursts Intensity Prediction
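Zen-NAS ranks candidate architectures with a zero-shot score instead of training an accuracy predictor. A toy NumPy sketch of the general idea, scoring a randomly initialised network by its sensitivity to input perturbations (the function name, probe scheme, and log-norm score are assumptions in the spirit of Zen-Score, not its actual definition):

```python
import numpy as np

def zero_shot_score(widths, n_probe=8, eps=1e-2, seed=0):
    """Toy zero-shot proxy: log perturbation-sensitivity of a random ReLU MLP.
    Illustrative only; not the actual Zen-Score."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(scale=np.sqrt(2.0 / fi), size=(fi, fo))
          for fi, fo in zip(widths[:-1], widths[1:])]

    def forward(x):
        for W in Ws:
            x = np.maximum(x @ W, 0.0)  # ReLU MLP with random weights
        return x

    x = rng.normal(size=(n_probe, widths[0]))
    dx = eps * rng.normal(size=x.shape)
    return float(np.log(np.linalg.norm(forward(x + dx) - forward(x)) / eps + 1e-12))

# rank candidate architectures without training any of them
cands = {"narrow": [16, 8, 8], "wide": [16, 64, 64]}
ranked = sorted(cands, key=lambda name: zero_shot_score(cands[name]), reverse=True)
```

Because no gradient step is ever taken, the whole search reduces to cheap forward passes, which is what makes such proxies orders of magnitude faster than predictor-based NAS.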

Trear: Transformer-based RGB-D Egocentric Action Recognition

no code implementations5 Jan 2021 Xiangyu Li, Yonghong Hou, Pichao Wang, Zhimin Gao, Mingliang Xu, Wanqing Li

In this paper, we propose a Transformer-based RGB-D egocentric action recognition framework, called Trear.

Action Recognition Optical Flow Estimation

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

2 code implementations1 Feb 2021 Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

Compared with previous NAS methods, the proposed Zen-NAS is orders of magnitude faster on multiple server-side and mobile-side GPU platforms, with state-of-the-art accuracy on ImageNet.

Image Classification Neural Architecture Search

Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation

no code implementations30 Mar 2021 Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Jiashi Feng

Temporal action proposal generation (TAPG) is a fundamental and challenging task in video understanding, especially in temporal action detection.

Action Detection Temporal Action Proposal Generation +1

TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection

no code implementations15 Apr 2021 Zitong Yu, Xiaobai Li, Pichao Wang, Guoying Zhao

3D mask face presentation attack detection (PAD) plays a vital role in securing face recognition systems from emergent 3D mask attacks.

Face Presentation Attack Detection Face Recognition

KVT: k-NN Attention for Boosting Vision Transformers

1 code implementation28 May 2021 Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin

A key component in vision transformers is the fully-connected self-attention which is more powerful than CNNs in modelling long range dependencies.
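The KVT entry above proposes restricting each query to its k nearest keys rather than attending to all tokens. A minimal NumPy sketch of this top-k masking idea (the function name `knn_attention` and the masking scheme are illustrative, not the paper's implementation):

```python
import numpy as np

def knn_attention(q, k, v, topk):
    """Each query attends only to its top-k most similar keys (sketch)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n, n) scaled similarities
    # indices of the (n - topk) smallest scores per row -> masked out
    drop = np.argpartition(scores, -topk, axis=-1)[:, :-topk]
    masked = scores.copy()
    np.put_along_axis(masked, drop, -np.inf, axis=-1)
    # softmax over the surviving top-k entries
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Zeroing out low-similarity keys before the softmax filters noisy tokens from the attention map while keeping the computation a standard weighted sum over values.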

Scaled ReLU Matters for Training Vision Transformers

no code implementations8 Sep 2021 Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin

In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not help for stable training, but the scaled ReLU operation in the convolutional stem (conv-stem) matters.

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

2 code implementations ICLR 2022 Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin

Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.

Unsupervised Domain Adaptation

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

2 code implementations23 Nov 2021 Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks.

Ranked #1 on Unsupervised Person Re-Identification on Market-1501 (using extra training data)

Self-Supervised Learning Unsupervised Domain Adaptation +1

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

1 code implementation2 Dec 2021 Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin

Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.

Ranked #2 on Unsupervised Semantic Segmentation on COCO-Stuff-171 (using extra training data)

Segmentation Self-Supervised Learning +1

ELSA: Enhanced Local Self-Attention for Vision Transformer

1 code implementation23 Dec 2021 Jingkai Zhou, Pichao Wang, Fan Wang, Qiong Liu, Hao Li, Rong Jin

Self-attention is powerful in modeling long-range dependencies, but it is weak in local finer-level feature learning.

Image Classification Instance Segmentation +2

Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer

no code implementations21 Jan 2022 Pichao Wang, Fan Wang, Hao Li

During the KD process, the TCL loss transfers the local structure, exploits the higher order information, and mitigates the misalignment of the heterogeneous output of teacher and student networks.

Knowledge Distillation Transfer Learning

BP-Triplet Net for Unsupervised Domain Adaptation: A Bayesian Perspective

no code implementations19 Feb 2022 Shanshan Wang, Lei Zhang, Pichao Wang

In our work, considering the different importance of pair-wise samples for both feature learning and domain alignment, we deduce our BP-Triplet loss for effective UDA from the perspective of Bayesian learning.

Metric Learning Unsupervised Domain Adaptation

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

1 code implementation CVPR 2022 Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li

The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.

3D Object Detection 6D Pose Estimation using RGB +1
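The EPro-PnP entry above describes learning 2D-3D correspondence weights by minimizing a KL divergence between predicted and target pose distributions. A minimal NumPy sketch of a KL loss over a discretised set of candidate poses (the discretisation and the function name are illustrative assumptions; the paper works with continuous densities on the pose manifold):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discretised pose distributions (sketch)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# toy target vs. predicted probability mass over three sampled poses;
# the loss is minimised w.r.t. the predicted weights during training
target = np.array([0.7, 0.2, 0.1])
pred   = np.array([0.4, 0.4, 0.2])
loss = kl_divergence(target, pred)
```

Treating the pose as a distribution rather than a point estimate is what makes the PnP layer differentiable end-to-end: the gradient of the KL loss flows back into the 2D-3D coordinates and weights.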

Effective Vision Transformer Training: A Data-Centric Perspective

no code implementations29 Sep 2022 Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

To achieve these two purposes, we propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.

Focal and Global Spatial-Temporal Transformer for Skeleton-based Action Recognition

no code implementations6 Oct 2022 Zhimin Gao, Peitao Wang, Pei Lv, Xiaoheng Jiang, Qidong Liu, Pichao Wang, Mingliang Xu, Wanqing Li

Besides, these methods directly calculate the pair-wise global self-attention equally for all the joints in both the spatial and temporal dimensions, undervaluing the effect of discriminative local joints and the short-range temporal dynamics.

Action Recognition Skeleton Based Action Recognition

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

1 code implementation NeurIPS 2022 Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li

Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem.

A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition

1 code implementation16 Nov 2022 Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang

Although these methods improve motion recognition to some extent, they still face sub-optimal situations in the following aspects: (i) data augmentation, i.e., the scale of the RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure brings more challenges to spatiotemporal information modeling; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion.

Action Recognition Data Augmentation +2

Head-Free Lightweight Semantic Segmentation with Linear Transformer

1 code implementation11 Jan 2023 Bo Dong, Pichao Wang, Fan Wang

On the ADE20K dataset, our model achieves 41.8 mIoU with 4.6 GFLOPs, which is 4.4 mIoU higher than SegFormer with 45% fewer GFLOPs.

Segmentation Semantic Segmentation

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm

no code implementations14 Mar 2023 Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou

In view of the practicality of PETL, previous works focus on tuning a small set of parameters for each downstream task in an end-to-end manner while rarely considering the task distribution shift issue between the pre-training task and the downstream task.

Transfer Learning Vocal Bursts Valence Prediction

Making Vision Transformers Efficient from A Token Sparsification View

1 code implementation CVPR 2023 Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou

In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone for downstream tasks.

Image Classification Instance Segmentation +4

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

2 code implementations22 Mar 2023 Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li

In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold.

3D Object Detection 6D Pose Estimation using RGB +1

Selective Structured State-Spaces for Long-Form Video Understanding

no code implementations CVPR 2023 Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid

To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.

Contrastive Learning Video Understanding
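The S5 entry above hinges on a mask generator that keeps only the informative image tokens. A minimal NumPy sketch of score-based token selection (the function name, the `keep_ratio` parameter, and the hard top-k rule are illustrative; the paper's mask generator is learned):

```python
import numpy as np

def select_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the highest-scoring fraction of tokens, preserving their order."""
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep = np.argsort(scores)[-n_keep:]       # indices of the top-scoring tokens
    return tokens[np.sort(keep)]              # re-sort to preserve sequence order

# usage: 6 tokens of dimension 2, scored by a (here hand-written) mask generator
toks = np.arange(12, dtype=float).reshape(6, 2)
sc = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
kept = select_tokens(toks, sc, keep_ratio=0.5)
```

Dropping low-scoring tokens before the sequence model shortens the input, which is where the efficiency gain on long videos comes from.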

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

2 code implementations CVPR 2023 Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen

However, in real scenarios, the performance of PoseFormer and its follow-ups is limited by two factors: (a) The length of the input joint sequence; (b) The quality of 2D joint detection.

3D Human Pose Estimation Human Dynamics

DOAD: Decoupled One Stage Action Detection Network

no code implementations1 Apr 2023 Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou

Specifically, one branch focuses on detection representation for actor detection, and the other on action recognition.

Action Detection Action Recognition +1

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

no code implementations ICCV 2023 Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar

Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment.

Retrieval Text to Video Retrieval +2

Revisiting Vision Transformer from the View of Path Ensemble

no code implementations ICCV 2023 Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou

Therefore, we propose the path pruning and EnsembleScale skills for improvement, which cut out the underperforming paths and re-weight the ensemble components, respectively, to optimize the path combination and make the short paths focus on providing high-quality representation for subsequent paths.

Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition

1 code implementation23 Aug 2023 Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang

RGB-D action and gesture recognition remain an interesting topic in human-centered scene understanding, primarily due to the multiple granularities and large variation in human motion.

Gesture Recognition Scene Understanding

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

2 code implementations15 Sep 2023 Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou

Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only 1% of extra parameters could surpass full fine-tuning in low-data resource scenarios.

Domain Generalization Few-Shot Learning

Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey

no code implementations19 Oct 2023 Lijuan Zhou, Xiang Meng, Zhihuan Liu, Mengqi Wu, Zhimin Gao, Pichao Wang

This paper presents a comprehensive survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition. Pose estimation involves the determination of human joint positions from images or image sequences.

2D Pose Estimation 3D Pose Estimation +3

Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation

1 code implementation20 Nov 2023 Wenhao Li, Mengyuan Liu, Hong Liu, Pichao Wang, Jialun Cai, Nicu Sebe

Transformers have been successfully applied in the field of video-based 3D human pose estimation.

3D Human Pose Estimation

Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval

1 code implementation26 Mar 2024 Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

Correspondingly, a single text embedding may not be expressive enough to capture the video embedding and empower the retrieval.

Retrieval Video Retrieval
