30 papers with code • 0 benchmarks • 9 datasets
Detecting activities in extended videos.
In this work, we propose novel decoding algorithms to enable streaming automatic speech recognition (ASR) on unsegmented long-form recordings without voice activity detection (VAD), based on monotonic chunkwise attention (MoChA) with an auxiliary connectionist temporal classification (CTC) objective.
In particular, most existing studies ignore the common sparsity pattern shared by the received pilot and data signals, and they do not use auxiliary information from channel decoding for user activity detection.
A deep neural network trained to separate speech frames from non-speech frames is obtained by concatenating the decoder to the encoder, resembling the Diffusion Nets architecture.
Both knowledge distillation and the first fine-tuning step are carried out on manually segmented real and synthetic data, the latter generated with a machine translation (MT) system trained on the available corpora.
Speech Activity Detection (SAD), the task of locating speech segments within an audio recording, is a core component of most speech technology applications.
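The simplest SAD baseline is frame-energy thresholding. The sketch below is a hypothetical illustration and not the method of any paper listed here. It splits the signal into non-overlapping frames and computes each frame's mean-square energy in dB. It then reports contiguous runs of frames above a fixed threshold as speech segments. The function names, `frame_len`, `threshold_db`, and the 16 kHz sample rate in the usage lines are assumptions. Practical detectors replace the fixed threshold with learned models and add smoothing.

```python
import math
from typing import List, Tuple


def frame_energies_db(samples: List[float], frame_len: int) -> List[float]:
    """Mean-square energy (dB) of each complete, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log10(0)
    return energies


def detect_speech_segments(samples: List[float], frame_len: int,
                           threshold_db: float) -> List[Tuple[int, int]]:
    """Return (start_sample, end_sample) spans whose frames exceed threshold_db."""
    energies = frame_energies_db(samples, frame_len)
    segments = []
    seg_start = None
    for i, e in enumerate(energies):
        if e > threshold_db and seg_start is None:
            seg_start = i * frame_len  # a speech run begins at this frame
        elif e <= threshold_db and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # run ended at the previous frame
            seg_start = None
    if seg_start is not None:  # the recording ends during speech
        segments.append((seg_start, len(energies) * frame_len))
    return segments


# Usage: 50 ms silence, 50 ms of a 1 kHz tone, 50 ms silence (assumed 16 kHz sampling).
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 16000) for n in range(800)]
print(detect_speech_segments(silence + tone + silence, frame_len=160, threshold_db=-30.0))
# [(800, 1600)]
```

Each 160-sample frame here spans 10 ms, so the segment boundaries land on frame multiples. A finer frame hop trades resolution for more computation.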
Moreover, the proposed algorithm-unrolling approach inherits the structure and domain knowledge of the iterative shrinkage-thresholding algorithm (ISTA). It therefore retains the algorithm's robustness and can handle non-Gaussian preamble sequence matrices in massive access.
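For orientation, classic ISTA solves the LASSO problem min_x 0.5*||y - A x||^2 + lam*||x||_1. Each iteration takes a gradient step and then applies soft thresholding. Algorithm unrolling turns each iteration into a network layer whose matrices and thresholds become trainable, as in LISTA.

The sketch below shows only plain, real-valued ISTA on a hypothetical toy problem and is not the paper's unrolled network. It assumes device activity is read off from the nonzero entries of x. The matrix, `lam`, the step size bound, and all helper names are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Compute A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, r: Vector) -> Vector:
    """Compute A^T r."""
    return [sum(A[m][j] * r[m] for m in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A: Matrix, y: Vector, lam: float, n_iter: int = 500) -> Vector:
    # Step size 1/L. The squared Frobenius norm upper-bounds the largest
    # eigenvalue of A^T A, so 1/L is a safe (if conservative) step.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        grad_step = rmatvec(A, residual)  # equals -gradient of the quadratic term
        # An unrolled network would learn these per-layer maps and thresholds.
        x = [soft_threshold(xi + gi / L, lam / L) for xi, gi in zip(x, grad_step)]
    return x


# Usage: a 4x4 Hadamard matrix scaled by 1/2 has orthonormal columns.
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
A = [[0.5 * h for h in row] for row in H]
x_true = [2.0, 0.0, 0.0, -1.5]  # devices 0 and 3 are active
y = matvec(A, x_true)
x_hat = ista(A, y, lam=0.1, n_iter=300)
print([n for n, v in enumerate(x_hat) if abs(v) > 1e-3])  # [0, 3]
```

With orthonormal columns the LASSO solution is simply the soft-thresholded A^T y, so here x_hat converges to about [1.9, 0, 0, -1.4]. Each active coefficient is shrunk by `lam`.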
The availability of large-scale video action understanding datasets has facilitated advances in the interpretation of visual scenes containing people.
The Internet of Things (IoT) has triggered a rapid increase in the number of connected devices and in new use cases for wireless communications.
Specifically, at each iteration, the proposed active set coordinate descent (CD) algorithm first selects a small subset of all devices, namely the active set. This set contains the few devices that contribute most to the deviation from the first-order optimality condition of the maximum likelihood estimation (MLE) problem, and these devices can therefore potentially provide the most improvement to the objective function. The algorithm then applies CD to perform detection for the devices in the active set.
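The following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the common covariance-based formulation, in which the MLE problem is min over gamma >= 0 of log det(S) + tr(S^-1 S_hat), where S = sum_n gamma_n a_n a_n^T + noise_var * I. It uses real-valued signals for brevity.

Each outer iteration scores every device by its projected-gradient violation of the first-order condition. It keeps the `k` highest-scoring devices and runs closed-form CD updates only on them, refreshing S^-1 with Sherman-Morrison. The scoring rule is one plausible reading of "contribute the most to the deviation". The names `active_set_cd`, `k`, `n_outer`, and `n_inner` are hypothetical, and the orthonormal toy signatures are chosen only so the result is easy to check.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def coordinate_stats(S_inv: Matrix, S_hat: Matrix, a: Vector) -> Tuple[float, float, float, Vector]:
    """Return (g, q, r, u): u = S^-1 a, q = a^T S^-1 a, r = a^T S^-1 S_hat S^-1 a.

    g = q - r is the partial derivative of log det(S) + tr(S^-1 S_hat) w.r.t. gamma_n.
    """
    u = matvec(S_inv, a)  # S^-1 is symmetric, so u^T = a^T S^-1
    q = dot(a, u)
    r = dot(u, matvec(S_hat, u))
    return q - r, q, r, u


def active_set_cd(cols: List[Vector], S_hat: Matrix, noise_var: float, k: int,
                  n_outer: int = 20, n_inner: int = 5) -> Vector:
    dim, n_dev = len(S_hat), len(cols)
    gamma = [0.0] * n_dev
    # With all gamma_n = 0, S = noise_var * I, so S^-1 is diagonal.
    S_inv = [[1.0 / noise_var if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for _ in range(n_outer):
        # 1) Projected-gradient score: how far gamma_n is from the first-order
        #    optimality condition of the nonnegatively constrained problem.
        scores = []
        for n in range(n_dev):
            g = coordinate_stats(S_inv, S_hat, cols[n])[0]
            scores.append(abs(gamma[n] - max(gamma[n] - g, 0.0)))
        active = sorted(range(n_dev), key=lambda n: scores[n], reverse=True)[:k]
        # 2) Plain CD passes restricted to the active set.
        for _ in range(n_inner):
            for n in active:
                _, q, r, u = coordinate_stats(S_inv, S_hat, cols[n])
                # Closed-form 1-D minimizer, clipped so gamma_n stays >= 0.
                d = max((r - q) / (q * q), -gamma[n])
                if d == 0.0:
                    continue
                gamma[n] += d
                # Sherman-Morrison: (S + d a a^T)^-1 = S^-1 - d u u^T / (1 + d q)
                c = d / (1.0 + d * q)
                S_inv = [[S_inv[i][j] - c * u[i] * u[j] for j in range(dim)]
                         for i in range(dim)]
    return gamma


# Usage: orthonormal toy signatures (Hadamard columns scaled by 1/2).
H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
cols = [[0.5 * H[i][n] for i in range(4)] for n in range(4)]
gamma_true = [2.0, 0.0, 0.0, 1.5]
# The exact covariance stands in for the sample covariance S_hat.
S_hat = [[(1.0 if i == j else 0.0)
          + sum(g * c[i] * c[j] for g, c in zip(gamma_true, cols))
          for j in range(4)] for i in range(4)]
gamma = active_set_cd(cols, S_hat, noise_var=1.0, k=2)
print([n for n, g in enumerate(gamma) if g > 0.1])  # [0, 3]
```

In massive access the number of devices exceeds the signature length, so CD over all devices is costly. Restricting each pass to the `k` worst violators concentrates the work where the objective can improve most.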
In this paper, we propose a turbo receiver for joint activity detection and data decoding in grant-free massive random access, which iterates between a detector and a belief propagation (BP)-based channel decoder.