Search Results for author: Pu Wang

Found 44 papers, 13 papers with code

AgentPolyp: Accurate Polyp Segmentation via Image Enhancement Agent

no code implementations 15 Apr 2025 Pu Wang, Zhihua Zhang, Dianjie Lu, Guijuan Zhang, Youshan Zhang, Zhuoran Zheng

Since human and environmental factors interfere, captured polyp images usually suffer from issues such as dim lighting, blur, and overexposure, which pose challenges for downstream polyp segmentation tasks.

Denoising Image Enhancement +1

UAKNN: Label Distribution Learning via Uncertainty-Aware KNN

no code implementations 2 Apr 2025 Pu Wang, Yu Zhang, Zhuoran Zheng

Label Distribution Learning (LDL) aims to characterize the polysemy of an instance by building a set of descriptive degrees corresponding to the instance.

Descriptive

LiDAR Remote Sensing Meets Weak Supervision: Concepts, Methods, and Perspectives

no code implementations 24 Mar 2025 Yuan Gao, Shaobo Xia, Pu Wang, Xiaohuan Xi, Sheng Nie, Cheng Wang

This review, for the first time, adopts a unified weakly supervised learning perspective to systematically examine research on both LiDAR interpretation and inversion.

Weakly-supervised Learning

PolypFlow: Reinforcing Polyp Segmentation with Flow-Driven Dynamics

no code implementations 26 Feb 2025 Pu Wang, Huaizhi Ma, Zhihua Zhang, Zhuoran Zheng

Accurate polyp segmentation remains challenging due to irregular lesion morphologies, ambiguous boundaries, and heterogeneous imaging conditions.

Segmentation

mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework

no code implementations 21 Jan 2025 Bingyi Liu, Jian Teng, Hongfei Xue, Enshu Wang, Chuanhui Zhu, Pu Wang, Libing Wu

Collaborative perception significantly enhances individual vehicle perception performance through the exchange of sensory information among agents.

MS-Temba: Multi-Scale Temporal Mamba for Efficient Temporal Action Detection

1 code implementation 10 Jan 2025 Arkaprava Sinha, Monish Soundar Raj, Pu Wang, Ahmed Helmy, Srijan Das

In this work, we innovatively adapt the Mamba architecture for action detection and propose Multi-scale Temporal Mamba (MS-Temba), comprising two key components: Temporal Mamba (Temba) Blocks and the Temporal Mamba Fuser.

Action Detection Mamba

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation

1 code implementation 29 Dec 2024 Qucheng Peng, Ce Zheng, Zhengming Ding, Pu Wang, Chen Chen

To cope with the label deficiency issue, one common solution is to train the HPE models with easily available synthetic datasets (source) and apply them to real-world data (target) through domain adaptation (DA).

Domain Adaptation Pose Estimation

GenHMR: Generative Human Mesh Recovery

no code implementations 19 Dec 2024 Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Srijan Das, Chen Chen

HMR from monocular images has predominantly been addressed by deterministic methods that output a single prediction for a given 2D image.

3D Reconstruction Human Mesh Recovery +1

MMHMR: Generative Masked Modeling for Hand Mesh Recovery

no code implementations 18 Dec 2024 Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Mayur Jagdishbhai Patel, Hongfei Xue, Ahmed Helmy, Srijan Das, Pu Wang

Traditional discriminative methods, which learn a deterministic mapping from a 2D image to a single 3D mesh, often struggle with the inherent ambiguities in 2D-to-3D mapping.

3D Hand Pose Estimation

Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation

no code implementations 26 Nov 2024 Pu Wang, Hugo Van hamme

End-to-end transformer-based automatic speech recognition (ASR) systems often capture multiple speech traits in their learned representations that are highly entangled, leading to a lack of interpretability.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

ControlMM: Controllable Masked Motion Generation

no code implementations 14 Oct 2024 Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chen Chen, Chuan Guo, Junli Cao, Jian Ren, Sergey Tulyakov

To further enhance control precision, we introduce inference-time logit editing, which manipulates the predicted conditional motion distribution so that the generated motion, sampled from the adjusted distribution, closely adheres to the input control signals.

Motion Generation

MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning

no code implementations 9 Sep 2024 Jianyi Zhang, Hao Frank Yang, Ang Li, Xin Guo, Pu Wang, Haiming Wang, Yiran Chen, Hai Li

We introduce a novel federated learning framework, named Multimodal Large Language Model Assisted Federated Learning (MLLM-LLaVA-FL), which employs powerful MLLMs at the server end to address the heterogeneous and long-tailed challenges.

Federated Learning Image Captioning +5

Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

no code implementations 16 Jun 2024 Zhiwen Fan, Pu Wang, Yang Zhao, Yibo Zhao, Boris Ivanovic, Zhangyang Wang, Marco Pavone, Hao Frank Yang

Leveraging this rich dataset, we further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash types, severity and number of injuries, based on contextual and environmental factors.

Ensemble Learning

Diffusion Gaussian Mixture Audio Denoise

no code implementations 13 Jun 2024 Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

To overcome these challenges, we propose a DiffGMM model, a denoising model based on the diffusion and Gaussian mixture models.

Audio Denoising Denoising

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living

no code implementations 13 Jun 2024 Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das

To address this, we propose a semi-automated framework for curating ADL datasets, creating ADL-X, a multiview, multimodal RGBS instruction-tuning dataset.

Benchmarking Human-Object Interaction Detection +3

BAMM: Bidirectional Autoregressive Motion Model

1 code implementation 28 Mar 2024 Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang, Minwoo Lee, Srijan Das, Chen Chen

To address these challenges, we propose Bidirectional Autoregressive Motion Model (BAMM), a novel text-to-motion generation framework.

Denoising model +2

MMM: Generative Masked Motion Model

1 code implementation CVPR 2024 Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee, Chen Chen

MMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into a sequence of discrete tokens in latent space, and (2) a conditional masked motion transformer that learns to predict randomly masked motion tokens, conditioned on the pre-computed text tokens.

model Motion Generation +1

DPATD: Dual-Phase Audio Transformer for Denoising

no code implementations 30 Oct 2023 Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang

Recent high-performance transformer-based speech enhancement models demonstrate that time domain methods could achieve similar performance as time-frequency domain methods.

Denoising Speech Enhancement

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos

no code implementations 23 Mar 2023 Ce Zheng, Xianpeng Liu, Qucheng Peng, Tianfu Wu, Pu Wang, Chen Chen

While image-based HMR methods have achieved impressive results, they often struggle to recover humans in dynamic scenarios, leading to temporal inconsistencies and non-smooth 3D motion predictions due to the absence of human motion.

3D Human Pose Estimation Human Mesh Recovery

A Modular Multi-stage Lightweight Graph Transformer Network for Human Pose and Shape Estimation from 2D Human Pose

no code implementations 31 Jan 2023 Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi

In this research, we address the challenge faced by existing deep learning-based human mesh reconstruction methods in balancing accuracy and computational efficiency.

Computational Efficiency

Skeleton-based Human Action Recognition via Convolutional Neural Networks (CNN)

no code implementations 31 Jan 2023 Ayman Ali, Ekkasit Pinyoanuntapong, Pu Wang, Mohsen Dorodchi

Recently, there has been a remarkable increase in the interest towards skeleton-based action recognition within the research community, owing to its various advantageous features, including computational efficiency, representative features, and illumination invariance.

Action Recognition Computational Efficiency +3

GaitSADA: Self-Aligned Domain Adaptation for mmWave Gait Recognition

1 code implementation 31 Jan 2023 Ekkasit Pinyoanuntapong, Ayman Ali, Kalvik Jakkala, Pu Wang, Minwoo Lee, Qucheng Peng, Chen Chen, Zhi Sun

mmWave radar-based gait recognition is a novel user identification method that captures human gait biometrics from mmWave radar return signals.

Contrastive Learning Domain Adaptation +2

GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer

1 code implementation 27 Oct 2022 Ekkasit Pinyoanuntapong, Ayman Ali, Pu Wang, Minwoo Lee, Chen Chen

Most existing gait recognition methods are appearance-based, which rely on the silhouettes extracted from the video data of human walking activities.

Multiview Gait Recognition Representation Learning

Exploring Parameter-Efficient Fine-Tuning to Enable Foundation Models in Federated Learning

no code implementations 4 Oct 2022 Guangyu Sun, Umar Khalid, Matias Mendieta, Pu Wang, Chen Chen

Recently, the use of small pre-trained models has been shown to be effective in federated learning optimization and improving convergence.

Federated Learning parameter-efficient fine-tuning

Quantum Feature Extraction for THz Multi-Layer Imaging

no code implementations 18 Jul 2022 Toshiaki Koike-Akino, Pu Wang, Genki Yamashita, Wataru Tsujita, Makoto Nakajima

A learning-based THz multi-layer imaging has been recently used for contactless three-dimensional (3D) positioning and encoding.

BIG-bench Machine Learning Quantum Machine Learning

Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding

no code implementations 28 Jun 2022 Pu Wang, Hugo Van hamme

End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data.

Spoken Language Understanding

Deep learning on rail profiles matching

1 code implementation 18 May 2022 Kunqi Wang, Daolin Si, Pu Wang, Jing Ge, Peiyuan Ni, Shuguo Wang

Matching the rail cross-section profiles measured on site with the designed profile is a must to evaluate the wear of the rail, which is very important for track maintenance and rail safety.

Deep Learning

AutoQML: Automated Quantum Machine Learning for Wi-Fi Integrated Sensing and Communications

no code implementations 17 May 2022 Toshiaki Koike-Akino, Pu Wang, Ye Wang

Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor indoor environment.

BIG-bench Machine Learning ISAC +1

Quantum Transfer Learning for Wi-Fi Sensing

no code implementations 17 May 2022 Toshiaki Koike-Akino, Pu Wang, Ye Wang

Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment.

Transfer Learning

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

1 code implementation CVPR 2022 Matias Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chen Chen

To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model.

Federated Learning Privacy Preserving

A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose

1 code implementation 24 Nov 2021 Ce Zheng, Matias Mendieta, Pu Wang, Aidong Lu, Chen Chen

We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh.

3D Human Pose Estimation 3D Human Shape Estimation +2

Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing

no code implementations 18 Oct 2021 Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang

Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm.

Edge-computing Federated Learning +3

EdgeML: Towards Network-Accelerated Federated Learning over Wireless Edge

no code implementations 14 Oct 2021 Pinyarash Pinyoanuntapong, Prabhu Janakaraj, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang

To solve such MDP, multi-agent reinforcement learning (MA-RL) algorithms along with domain-specific action space refining schemes are developed, which online learn the delay-minimum forwarding paths to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server.

Edge-computing Federated Learning +1

A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

no code implementations 15 Jun 2021 Pu Wang, Bagher BabaAli, Hugo Van hamme

The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and finetuning on a mixture of dysarthric and normal speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation 14 May 2021 Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet, 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.

Action Recognition Image Classification +2

Pre-training for low resource speech-to-intent applications

no code implementations 30 Mar 2021 Pu Wang, Hugo Van hamme

In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether pre-training methodology can reduce training data requirements for the NMF and capsule network.

Decoder

KST-GCN: A Knowledge-Driven Spatial-Temporal Graph Convolutional Network for Traffic Forecasting

1 code implementation 26 Nov 2020 Jiawei Zhu, Xin Han, Hanhan Deng, Chao Tao, Ling Zhao, Pu Wang, Lin Tao, Haifeng Li

On this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks.

Knowledge Graphs Representation Learning

Deep CSI Learning for Gait Biometric Sensing and Recognition

1 code implementation 6 Feb 2019 Kalvik Jakkala, Arupjyoti Bhuya, Zhi Sun, Pu Wang, Zhuo Cheng

Gait is a person's natural walking style and a complex biological process that is unique to each person.

Denoising Gait Identification +1

T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction

10 code implementations 12 Nov 2018 Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, Haifeng Li

However, traffic forecasting has always been considered an open scientific issue, owing to the constraints of urban road network topological structure and the law of dynamic change with time, namely, spatial dependence and temporal dependence.

Management Prediction +1

Pattern-Coupled Sparse Bayesian Learning for Recovery of Block-Sparse Signals

no code implementations 9 Nov 2013 Jun Fang, Yanning Shen, Hongbin Li, Pu Wang

In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns.
