MMM consists of two key components: (1) a motion tokenizer that transforms 3D human motion into a sequence of discrete tokens in latent space, and (2) a conditional masked motion transformer that learns to predict randomly masked motion tokens, conditioned on the pre-computed text tokens.
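The random masking step that the masked motion transformer is trained on can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_tokens`, the mask id `-1`, the 50% mask ratio, and the token values are all assumptions made for illustration, and text conditioning is left out.

```python
import random

def mask_tokens(tokens, mask_ratio, mask_id, seed=None):
    """Replace a random subset of tokens with mask_id.

    Returns the masked sequence and the sorted positions that were masked.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = mask_id
    return masked, positions

# Made-up codebook indices standing in for a tokenized motion clip.
motion_tokens = [17, 4, 99, 23, 23, 8, 61, 5]
MASK_ID = -1  # hypothetical id reserved for the mask token

masked, positions = mask_tokens(motion_tokens, mask_ratio=0.5,
                                mask_id=MASK_ID, seed=0)
# A model would then be trained to predict motion_tokens[p]
# for every p in positions, given the masked sequence and the text.
```

In such a setup the training loss would be computed only at the masked positions, since the unmasked tokens are given to the model as input.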
Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to that of time-frequency-domain methods.
mmWave radar-based gait recognition is a novel user identification method that captures human gait biometrics from mmWave radar return signals.
Interest in skeleton-based action recognition has grown markedly in the research community, owing to its computational efficiency, representative features, and illumination invariance.
In this research, we address the challenge faced by existing deep learning-based human mesh reconstruction methods in balancing accuracy and computational efficiency.
Most existing gait recognition methods are appearance-based and rely on silhouettes extracted from video of people walking.
Ranked #5 on Multiview Gait Recognition on CASIA-B
Learning-based THz multi-layer imaging has recently been used for contactless three-dimensional (3D) positioning and encoding.
End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data.
Matching the rail cross-section profiles measured on site with the designed profile is necessary to evaluate rail wear, which is very important for track maintenance and rail safety.
Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment.
Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor the indoor environment.
We consider the object recognition problem in autonomous driving using automotive radar sensors.
Ranked #1 on Multiple Object Tracking on RADIATE
To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model.
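As a rough illustration of what a proximal term does to a local update, the sketch below adds a quadratic penalty of the form (mu / 2) * ||w - w_global||^2 to the local loss, in the style of FedProx. The abstract does not say which proximal terms are meant, so this particular form, the function name `proximal_sgd_step`, and the values of `lr` and `mu` are assumptions.

```python
def proximal_sgd_step(local_w, global_w, grad, lr, mu):
    """One SGD step on  loss(w) + (mu / 2) * ||w - global_w||^2.

    grad is the gradient of the plain local loss at local_w; the
    proximal term contributes the extra gradient mu * (w - global_w).
    """
    return [
        w - lr * (g + mu * (w - wg))
        for w, wg, g in zip(local_w, global_w, grad)
    ]

global_w = [0.0, 0.0]   # model broadcast by the server
local_w = [1.0, -2.0]   # client model after drifting on local data
zero_grad = [0.0, 0.0]  # pretend the local loss is flat here

# With a flat local loss, the proximal term alone pulls the client
# weights back toward the global model.
pulled = proximal_sgd_step(local_w, global_w, zero_grad, lr=0.1, mu=1.0)
```

Setting `mu` to zero recovers the plain local SGD step; larger values restrain the local update more strongly. The client also has to keep a copy of the global model while it trains, which is one source of the memory overhead mentioned above.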
We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh.
Ranked #51 on 3D Human Pose Estimation on 3DPW
Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm.
To solve this MDP, multi-agent reinforcement learning (MA-RL) algorithms, along with domain-specific action space refining schemes, are developed; they learn the minimum-delay forwarding paths online to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server.
The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and finetuning on a mixture of dysarthric and normal speech.
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.
In this paper we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether pre-training methodology can reduce training data requirements for the NMF and capsule network.
Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks.
Traffic forecasting is a fundamental and challenging task in the field of intelligent transportation.
Gait is a person's natural walking style and a complex biological process that is unique to each person.
However, traffic forecasting has always been considered an open scientific issue, owing to the constraints imposed by the topology of the urban road network and by the way traffic changes dynamically over time, namely spatial dependence and temporal dependence.
In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns.