Search Results for author: Ailing Zeng

Found 43 papers, 27 papers with code

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation 25 Jan 2024 Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

GPAvatar: Generalizable and Precise Head Avatar from Image(s)

1 code implementation 18 Jan 2024 Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada

Head avatar reconstruction, crucial for applications in virtual reality, online meetings, gaming, and film industries, has garnered substantial attention within the computer vision community.

Neural Rendering Novel View Synthesis

DPoser: Diffusion Model as Robust 3D Human Pose Prior

1 code implementation 9 Dec 2023 Junzhe Lu, Jing Lin, Hongkun Dou, Ailing Zeng, Yue Deng, Yulun Zhang, Haoqian Wang

Our approach demonstrates considerable enhancements over common uniform scheduling used in image domains, boasting improvements of 5.4%, 17.2%, and 3.8% across human mesh recovery, pose completion, and motion denoising, respectively.

Denoising Human Mesh Recovery +1

PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

no code implementations 7 Dec 2023 Yinhuai Wang, Jing Lin, Ailing Zeng, Zhengyi Luo, Jian Zhang, Lei Zhang

To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset that contains eight whole-body basketball skills.

Human-Object Interaction Detection Object

HumanTOMATO: Text-aligned Whole-body Motion Generation

no code implementations 19 Oct 2023 Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang, Lei Zhang, Heung-Yeung Shum

This work targets a novel text-driven whole-body motion generation task, which takes a given textual description as input and aims at generating high-quality, diverse, and coherent facial expressions, hand gestures, and body motions simultaneously.

UniPose: Detecting Any Keypoints

1 code implementation 12 Oct 2023 Jie Yang, Ailing Zeng, Ruimao Zhang, Lei Zhang

This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, and soft objects via visual or textual prompts for fine-grained vision understanding and manipulation.

Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)

2D Human Pose Estimation 2D Pose Estimation +4

Bridging the Gap between Human Motion and Action Semantics via Kinematic Phrases

no code implementations 6 Oct 2023 Xinpeng Liu, Yong-Lu Li, Ailing Zeng, Zizheng Zhou, Yang You, Cewu Lu

The goal of motion understanding is to establish a reliable mapping between motion and action semantics, while it is a challenging many-to-many problem.

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

1 code implementation 2 Oct 2023 Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.

Image Generation Text-based Image Editing

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

1 code implementation 10 Sep 2023 Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang

To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.

3D Human Pose Estimation 3D Pose Estimation +1

Neural Interactive Keypoint Detection

1 code implementation ICCV 2023 Jie Yang, Ailing Zeng, Feng Li, Shilong Liu, Ruimao Zhang, Lei Zhang

Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct the predicted keypoints in an interactive way for a faster and more effective annotation process.

Keypoint Detection

Effective Whole-body Pose Estimation with Two-stages Distillation

1 code implementation 29 Jul 2023 Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li

Different from the previous self-knowledge distillation, this stage finetunes the student's head with only 20% training time as a plug-and-play training strategy.

Ranked #1 on 2D Human Pose Estimation on COCO-WholeBody (using extra training data)

2D Human Pose Estimation Pose Estimation +1

FITS: Modeling Time Series with 10k Parameters

1 code implementation 6 Jul 2023 Zhijian Xu, Ailing Zeng, Qiang Xu

In this paper, we introduce FITS, a lightweight yet powerful model for time series analysis.

Anomaly Detection Time Series +1

detrex: Benchmarking Detection Transformers

1 code implementation 12 Jun 2023 Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation 20 May 2023 Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection Zero-Shot Human-Object Interaction Detection

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations 25 Apr 2023 Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

1 code implementation ICCV 2023 Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu

While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.

Denoising Image Generation

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

1 code implementation CVPR 2023 Jing Lin, Ailing Zeng, Haoqian Wang, Lei Zhang, Yu Li

It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions.

3D Human Pose Estimation 3D Human Reconstruction +1

From Knowledge Distillation to Self-Knowledge Distillation: A Unified Approach with Normalized Loss and Customized Soft Labels

1 code implementation ICCV 2023 Zhendong Yang, Ailing Zeng, Zhe Li, Tianke Zhang, Chun Yuan, Yu Li

We decompose the KD loss and find the non-target loss from it forces the student's non-target logits to match the teacher's, but the sum of the two non-target logits is different, preventing them from being identical.

Self-Knowledge Distillation

Introducing Depth into Transformer-based 3D Object Detection

no code implementations 25 Feb 2023 Hao Zhang, Hongyang Li, Ailing Zeng, Feng Li, Shilong Liu, Xingyu Liao, Lei Zhang

To address the second issue, we introduce an auxiliary learning task called Depth-aware Negative Suppression loss.

3D Object Detection Auxiliary Learning +3

FrAug: Frequency Domain Augmentation for Time Series Forecasting

no code implementations 18 Feb 2023 Muxi Chen, Zhijian Xu, Ailing Zeng, Qiang Xu

In time series forecasting (TSF), we need to model the fine-grained temporal relationship within time series segments to generate accurate forecasting results given data in a look-back window.

Anomaly Detection Data Augmentation +3

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation

3 code implementations 3 Feb 2023 Jie Yang, Ailing Zeng, Shilong Liu, Feng Li, Ruimao Zhang, Lei Zhang

This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose, where it unifies the contextual learning between human-level (global) and keypoint-level (local) information.

2D Human Pose Estimation Human Detection +3

Are Transformers Effective for Time Series Forecasting?

4 code implementations 26 May 2022 Ailing Zeng, Muxi Chen, Lei Zhang, Qiang Xu

Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task.

Anomaly Detection Temporal Relation Extraction +2

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

1 code implementation 16 Mar 2022 Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu

This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch.

2D Human Pose Estimation 3D Human Pose Estimation +2

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

2 code implementations 27 Dec 2021 Ailing Zeng, Lei Yang, Xuan Ju, Jiefeng Li, Jianyi Wang, Qiang Xu

With a simple yet effective motion-aware fully-connected network, SmoothNet improves the temporal smoothness of existing pose estimators significantly and enhances the estimation accuracy of those challenging frames as a side-effect.

2D Human Pose Estimation 3D Human Pose Estimation +2

T-WaveNet: A Tree-Structured Wavelet Neural Network for Time Series Signal Analysis

no code implementations ICLR 2022 Minhao Liu, Ailing Zeng, Qiuxia Lai, Ruiyuan Gao, Min Li, Jing Qin, Qiang Xu

In this work, we propose a novel tree-structured wavelet neural network for time series signal analysis, namely T-WaveNet, by taking advantage of an inherent property of various types of signals, known as the dominant frequency range.

Activity Recognition Representation Learning +3

Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation

no code implementations ICCV 2021 Ailing Zeng, Xiao Sun, Lei Yang, Nanxuan Zhao, Minhao Liu, Qiang Xu

While the average prediction accuracy has been improved significantly over the years, the performance on hard poses with depth ambiguity, self-occlusion, and complex or rare poses is still far from satisfactory.

3D Human Pose Estimation 3D Pose Estimation +3

Information Bottleneck Approach to Spatial Attention Learning

1 code implementation 7 Aug 2021 Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification).

Decision Making domain classification +1

Human Pose Regression with Residual Log-likelihood Estimation

3 code implementations ICCV 2021 Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu

In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution.

3D Human Pose Estimation Multi-Person Pose Estimation +1

SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction

3 code implementations 17 Jun 2021 Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, Qiang Xu

One unique property of time series is that the temporal relations are largely preserved after downsampling into two sub-sequences.

Ranked #1 on Time Series Forecasting on ETTh1 (24) Multivariate (using extra training data)

Time Series Traffic Prediction +1

Relational Graph Neural Network Design via Progressive Neural Architecture Search

no code implementations 30 May 2021 Ailing Zeng, Minhao Liu, Zhiwei Liu, Ruiyuan Gao, Jing Qin, Qiang Xu

We propose a novel solution to addressing a long-standing dilemma in the representation learning of graph neural networks (GNNs): how to effectively capture and represent useful information embedded in long-distance nodes to improve the performance of nodes with low homophily without leading to performance degradation in nodes with high homophily.

Neural Architecture Search Node Classification +1

Skimming and Scanning for Untrimmed Video Action Recognition

no code implementations 21 Apr 2021 Yunyan Hong, Ailing Zeng, Min Li, Cewu Lu, Li Jiang, Qiang Xu

Video action recognition (VAR) is a primary task of video understanding, and untrimmed videos are more common in real-life scenes.

Action Recognition Temporal Action Localization +1

T-WaveNet: Tree-Structured Wavelet Neural Network for Sensor-Based Time Series Analysis

no code implementations 10 Dec 2020 Minhao Liu, Ailing Zeng, Qiuxia Lai, Qiang Xu

Motivated by the fact that usually a small subset of the frequency components carries the primary information for sensor data, we propose a novel tree-structured wavelet neural network for sensor data analysis, namely T-WaveNet.

Activity Recognition Brain Computer Interface +5

SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

1 code implementation ECCV 2020 Ailing Zeng, Xiao Sun, Fuyang Huang, Minhao Liu, Qiang Xu, Stephen Lin

With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference.

Monocular 3D Human Pose Estimation

DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image

no code implementations 9 Dec 2019 Fuyang Huang, Ailing Zeng, Minhao Liu, Qiuxia Lai, Qiang Xu

In this paper, we propose a two-stage fully 3D network, namely DeepFuse, to estimate human pose in 3D space by fusing body-worn Inertial Measurement Unit (IMU) data and multi-view images deeply.

3D Human Pose Estimation 3D Pose Estimation

Structure-Aware 3D Hourglass Network for Hand Pose Estimation from Single Depth Image

no code implementations 26 Dec 2018 Fuyang Huang, Ailing Zeng, Minhao Liu, Jing Qin, Qiang Xu

Experimental results show that the proposed structure-aware 3D hourglass network is able to achieve a mean joint error of 7.4 mm in MSRA and 8.9 mm in NYU datasets, respectively.

Hand Pose Estimation
