Learning-based THz multi-layer imaging has recently been used for contactless three-dimensional (3D) positioning and encoding.
End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data.
Matching the rail cross-section profiles measured on site with the designed profile is essential for evaluating rail wear, which in turn is critical for track maintenance and rail safety.
Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment.
Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor the indoor environment.
We consider the object recognition problem in autonomous driving using automotive radar sensors.
To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model.
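As an illustration of what a proximal term does, the following is a minimal sketch in the style of FedProx, not the method of any paper listed here. It assumes a toy quadratic local loss, and the function name `local_update` and all of its parameters are hypothetical. The penalty `(mu / 2) * ||w - w_global||^2` adds a pull toward the global model to each local gradient step.

```python
from typing import List


def local_update(
    w_start: List[float],
    w_global: List[float],
    target: List[float],
    lr: float,
    mu: float,
    steps: int,
) -> List[float]:
    """Gradient descent on 0.5*||w - target||^2 + (mu/2)*||w - w_global||^2.

    The first term is a toy stand-in for a client's local loss (assumption);
    the second is the proximal penalty that pulls w toward the global model.
    """
    w = list(w_start)
    for _ in range(steps):
        for i in range(len(w)):
            loss_grad = w[i] - target[i]           # gradient of the toy local loss
            prox_grad = mu * (w[i] - w_global[i])  # gradient of the proximal term
            w[i] -= lr * (loss_grad + prox_grad)
    return w
```

For this toy loss the minimizer is `(target + mu * w_global) / (1 + mu)`, so `mu = 0` recovers the purely local optimum, while larger `mu` keeps the client closer to the global model.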
We propose a pose analysis module that uses graph transformers to exploit structured and implicit joint correlations, and a mesh regression module that combines the extracted pose feature with the mesh template to reconstruct the final human mesh.
Ranked #44 on 3D Human Pose Estimation on 3DPW
Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm.
To solve this MDP, multi-agent reinforcement learning (MA-RL) algorithms are developed along with domain-specific action space refining schemes; these learn the minimum-delay forwarding paths online to minimize the model exchange latency between the edge devices (i.e., workers) and the remote server.
The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech and finetuning on a mixture of dysarthric and normal speech.
MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks such as MobileNets and ResNet, and 3D networks such as SlowFast and X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and it is demonstrated to achieve consistent improvements on a variety of datasets.
In this paper, we combine the encoder of an end-to-end ASR system with the prior NMF/capsule network-based user-taught decoder, and investigate whether this pre-training methodology can reduce training data requirements for the NMF and capsule network.
Against this background, this study presents a knowledge representation-driven traffic forecasting method based on spatial-temporal graph convolutional networks.
Traffic forecasting is a fundamental and challenging task in the field of intelligent transportation.
Gait is a person's natural walking style and a complex biological process that is unique to each person.
However, traffic forecasting has always been considered an open scientific issue, owing to the constraints imposed by the topological structure of urban road networks and by the way traffic changes dynamically over time, namely spatial dependence and temporal dependence.
In this paper, we develop a new sparse Bayesian learning method for recovery of block-sparse signals with unknown cluster patterns.