no code implementations • 15 Apr 2025 • Pu Wang, Zhihua Zhang, Dianjie Lu, Guijuan Zhang, Youshan Zhang, Zhuoran Zheng
Owing to interference from human and environmental factors, captured polyp images often suffer from dim lighting, blur, and overexposure, which pose challenges for downstream polyp segmentation tasks.
no code implementations • 6 Apr 2025 • Foram Niravbhai Shah, Parshwa Shah, Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Ahmed Helmy
Recent advances in dance generation have enabled automatic synthesis of 3D dance motions.
Ranked #1 on Motion Synthesis on FineDance
no code implementations • 2 Apr 2025 • Pu Wang, Yu Zhang, Zhuoran Zheng
Label Distribution Learning (LDL) aims to characterize the polysemy of an instance by building a set of descriptive degrees corresponding to the instance.
no code implementations • 24 Mar 2025 • Yuan Gao, Shaobo Xia, Pu Wang, Xiaohuan Xi, Sheng Nie, Cheng Wang
This review, for the first time, adopts a unified weakly supervised learning perspective to systematically examine research on both LiDAR interpretation and inversion.
no code implementations • 26 Feb 2025 • Pu Wang, Huaizhi Ma, Zhihua Zhang, Zhuoran Zheng
Accurate polyp segmentation remains challenging due to irregular lesion morphologies, ambiguous boundaries, and heterogeneous imaging conditions.
no code implementations • 21 Jan 2025 • Bingyi Liu, Jian Teng, Hongfei Xue, Enshu Wang, Chuanhui Zhu, Pu Wang, Libing Wu
Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents.
no code implementations • 14 Jan 2025 • Farnoosh Koleini, Muhammad Usama Saleem, Pu Wang, Hongfei Xue, Ahmed Helmy, Abbey Fenwick
Recent advancements in 3D human pose estimation from single-camera images and videos have relied on parametric models, like SMPL.
Ranked #4 on 3D Human Pose Estimation on EMDB
1 code implementation • 10 Jan 2025 • Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Srijan Das
In this work, we innovatively adapt the Mamba architecture for action detection and propose Multi-scale Temporal Mamba (MS-Temba), comprising two key components: Temporal Mamba (Temba) Blocks and the Temporal Mamba Fuser.
1 code implementation • 29 Dec 2024 • Qucheng Peng, Ce Zheng, Zhengming Ding, Pu Wang, Chen Chen
To cope with the label deficiency issue, one common solution is to train the HPE models with easily available synthetic datasets (source) and apply them to real-world data (target) through domain adaptation (DA).
no code implementations • 19 Dec 2024 • Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Srijan Das, Chen Chen
HMR from monocular images has predominantly been addressed by deterministic methods that output a single prediction for a given 2D image.
Ranked #3 on 3D Human Pose Estimation on EMDB
no code implementations • 18 Dec 2024 • Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang
Traditional discriminative methods, which learn a deterministic mapping from a 2D image to a single 3D mesh, often struggle with the inherent ambiguities in 2D-to-3D mapping.
no code implementations • 26 Nov 2024 • Pu Wang, Hugo Van hamme
End-to-end transformer-based automatic speech recognition (ASR) systems often capture multiple speech traits in their learned representations that are highly entangled, leading to a lack of interpretability.
Automatic Speech Recognition (ASR)
no code implementations • 14 Oct 2024 • Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov
To further enhance control precision, we introduce inference-time logit editing, which manipulates the predicted conditional motion distribution so that the generated motion, sampled from the adjusted distribution, closely adheres to the input control signals.
no code implementations • 9 Sep 2024 • Jianyi Zhang, Hao Frank Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li
We introduce a novel federated learning framework, named Multimodal Large Language Model Assisted Federated Learning (MLLM-LLaVA-FL), which employs powerful MLLMs at the server end to address the heterogeneous and long-tailed challenges.
no code implementations • 16 Jun 2024 • Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang
Leveraging this rich dataset, we further formulate crash event feature learning as a novel text reasoning problem and fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash type, severity, and number of injuries, based on contextual and environmental factors.
no code implementations • 13 Jun 2024 • Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang
To overcome these challenges, we propose DiffGMM, a denoising model based on diffusion and Gaussian mixture models.
no code implementations • 13 Jun 2024 • Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das
To address this, we propose a semi-automated framework for curating ADL datasets, creating ADL-X, a multiview, multimodal RGBS instruction-tuning dataset.
1 code implementation • 28 Mar 2024 • Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen
To address these challenges, we propose Bidirectional Autoregressive Motion Model (BAMM), a novel text-to-motion generation framework.
Ranked #7 on Motion Synthesis on KIT Motion-Language
1 code implementation • CVPR 2024 • Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, Chen Chen
MMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into a sequence of discrete tokens in latent space, and (2) a conditional masked motion transformer that learns to predict randomly masked motion tokens, conditioned on the pre-computed text tokens.
Ranked #13 on Motion Synthesis on KIT Motion-Language
no code implementations • 30 Oct 2023 • Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang
Recent high-performance transformer-based speech enhancement models demonstrate that time domain methods can achieve performance similar to that of time-frequency domain methods.
no code implementations • 23 Mar 2023 • Ce Zheng, Xianpeng Liu, Qucheng Peng, Tianfu Wu, Pu Wang, Chen Chen
While image-based HMR methods have achieved impressive results, they often struggle to recover humans in dynamic scenarios, leading to temporal inconsistencies and non-smooth 3D motion predictions due to the absence of human motion.
Ranked #66 on 3D Human Pose Estimation on 3DPW
no code implementations • 31 Jan 2023 • Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi
In this research, we address the challenge faced by existing deep learning-based human mesh reconstruction methods in balancing accuracy and computational efficiency.
no code implementations • 31 Jan 2023 • Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi
Interest in skeleton-based action recognition has recently grown remarkably within the research community, owing to its advantages, including computational efficiency, representative features, and illumination invariance.
1 code implementation • 31 Jan 2023 • Ekkasit Pinyoanuntapong, Ayman Ali, Kalvik Jakkala, Pu Wang, Minwoo Lee, Qucheng Peng, Chen Chen, Zhi Sun
mmWave radar-based gait recognition is a novel user identification method that captures human gait biometrics from mmWave radar return signals.
1 code implementation • 27 Oct 2022 • Ekkasit Pinyoanuntapong, Ayman Ali, Pu Wang, Minwoo Lee, Chen Chen
Most existing gait recognition methods are appearance-based, which rely on the silhouettes extracted from the video data of human walking activities.
Ranked #9 on Multiview Gait Recognition on CASIA-B
no code implementations • 4 Oct 2022 • Guangyu Sun, Umar Khalid, Matias Mendieta, Pu Wang, Chen Chen
Recently, the use of small pre-trained models has been shown to be effective in federated learning optimization and improving convergence.
no code implementations • 18 Jul 2022 • Toshiaki Koike-Akino, Pu Wang, Genki Yamashita, Wataru Tsujita, Makoto Nakajima
Learning-based THz multi-layer imaging has recently been used for contactless three-dimensional (3D) positioning and encoding.
no code implementations • 28 Jun 2022 • Pu Wang, Hugo Van hamme
End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data.
1 code implementation • 18 May 2022 • Kunqi Wang, Daolin Si, Pu Wang, Jing Ge, Peiyuan Ni, Shuguo Wang
Matching the rail cross-section profiles measured on site with the designed profile is necessary for evaluating rail wear, which is very important for track maintenance and rail safety.
no code implementations • 17 May 2022 • Toshiaki Koike-Akino, Pu Wang, Ye Wang
Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor the indoor environment.
no code implementations • 17 May 2022 • Toshiaki Koike-Akino, Pu Wang, Ye Wang
Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment.
no code implementations • CVPR 2022 • Peizhao Li, Pu Wang, Karl Berntorp, Hongfu Liu
We consider the object recognition problem in autonomous driving using automotive radar sensors.
Ranked #2 on Multiple Object Tracking on RADIATE
1 code implementation • CVPR 2022 • Matias Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chen Chen
To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model.
1 code implementation • 24 Nov 2021 • Ce Zheng, Matias Mendieta, Pu Wang, Aidong Lu, Chen Chen
We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh.
Ranked #71 on 3D Human Pose Estimation on 3DPW
no code implementations • 18 Oct 2021 • Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang
Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm.
no code implementations • 14 Oct 2021 • Pinyarash Pinyoanuntapong, Prabhu Janakaraj, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang
To solve such an MDP, multi-agent reinforcement learning (MA-RL) algorithms, along with domain-specific action space refining schemes, are developed, which learn the delay-minimum forwarding paths online to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server.
no code implementations • 15 Jun 2021 • Pu Wang, Bagher BabaAli, Hugo Van hamme
The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and finetuning on a mixture of dysarthric and normal speech.
Automatic Speech Recognition (ASR)
1 code implementation • 14 May 2021 • Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.
no code implementations • 30 Mar 2021 • Pu Wang, Hugo Van hamme
In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether pre-training methodology can reduce training data requirements for the NMF and capsule network.
1 code implementation • 26 Nov 2020 • Jiawei Zhu, Xin Han, Hanhan Deng, Chao Tao, Ling Zhao, Pu Wang, Lin Tao, Haifeng Li
Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks.
no code implementations • 22 Nov 2020 • Jiawei Zhu, Chao Tao, Hanhan Deng, Ling Zhao, Pu Wang, Tao Lin, Haifeng Li
Traffic forecasting is a fundamental and challenging task in the field of intelligent transportation.
1 code implementation • 6 Feb 2019 • Kalvik Jakkala, Arupjyoti Bhuya, Zhi Sun, Pu Wang, Zhuo Cheng
Gait is a person's natural walking style and a complex biological process that is unique to each person.
10 code implementations • 12 Nov 2018 • Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, Haifeng Li
However, traffic forecasting has always been considered an open scientific issue, owing to the constraints of urban road network topological structure and the law of dynamic change with time, namely, spatial dependence and temporal dependence.
Ranked #4 on Traffic Prediction on SZ-Taxi
no code implementations • 9 Nov 2013 • Jun Fang, Yanning Shen, Hongbin Li, Pu Wang
In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns.