Activity Detection
63 papers with code • 1 benchmark • 12 datasets
Detecting activities in extended videos.
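Detecting activities in long videos is usually framed as temporal localization: a detector outputs labeled (start, end) segments, and each prediction is matched to the ground truth by temporal overlap. The following is a minimal Python sketch of temporal intersection-over-union (tIoU). It assumes segments are (start, end) pairs in seconds. The `is_match` helper and its 0.5 threshold are hypothetical illustrations, not values taken from any particular benchmark.

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Intersection-over-union of two time segments given as (start, end)."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred, gt, threshold: float = 0.5) -> bool:
    # Hypothetical matching rule: a prediction counts as a hit when tIoU >= threshold.
    return temporal_iou(pred, gt) >= threshold


print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # → 0.3333333333333333
```

Real evaluation protocols add further rules on top of an overlap measure like this one, such as one-to-one matching and per-class scoring.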
Libraries
Use these libraries to find Activity Detection models and implementations.
Datasets
Latest papers
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
In this paper, we explore the effectiveness of wav2vec 2.0 on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection.
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Most automatic speech processing systems register degraded performance when applied to noisy or reverberant speech.
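Speech-to-noise ratio and C50 are standard quantities. C50 is the energy of the first 50 ms of a room impulse response (RIR) divided by the energy of the rest, expressed in dB. Brouhaha estimates both from the noisy recording. The following is a minimal Python sketch of the definitions themselves, which could serve as regression targets when clean speech, noise, and the RIR are available separately, for example in simulated data. That use is an assumption, not a description of the paper's pipeline. The sketch also assumes `rir[0]` is the arrival of the direct sound.

```python
import math


def snr_db(speech: list[float], noise: list[float]) -> float:
    """Speech-to-noise ratio in dB from separately known speech and noise signals."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def c50_db(rir: list[float], sample_rate: int) -> float:
    """Clarity index C50: early (first 50 ms) vs. late energy of a room impulse response.

    Assumes rir[0] corresponds to the direct-sound arrival.
    """
    split = int(round(0.050 * sample_rate))  # sample index of the 50 ms boundary
    early = sum(h * h for h in rir[:split])
    late = sum(h * h for h in rir[split:])
    return 10.0 * math.log10(early / late)
```

A high C50 means most of the energy arrives early, which indicates little reverberation. A low or negative C50 indicates a strongly reverberant room.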
MM-ALT: A Multimodal Automatic Lyric Transcription System
Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential.
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels
We leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting changes in the vessel's movement pattern that tend to indicate fishing activity.
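Clustering trajectory geometry can be illustrated with a simple feature: how sharply a track turns. Transit tends to be straight, while the abstract associates changes in the movement pattern with fishing. The following is a minimal Python sketch. The paper's actual features and clustering method are not given here. The sketch assumes the mean absolute turning angle per window as the geometric feature and plain two-cluster k-means. Treating the high-turning cluster as "fishing-like" is also an assumption.

```python
import math


def turning_angles(points):
    """Absolute change of heading (radians) at each interior point of a track."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        d = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        angles.append(abs(d))
    return angles


def window_tortuosity(points, window=5):
    """Mean absolute turning angle over consecutive, non-overlapping windows."""
    angles = turning_angles(points)
    return [sum(angles[i:i + window]) / len(angles[i:i + window])
            for i in range(0, len(angles), window)]


def kmeans_1d(values, iters=20):
    """Two-cluster k-means on scalars; label 1 marks the higher-tortuosity cluster."""
    c_low, c_high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [int(abs(v - c_high) < abs(v - c_low)) for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if low:
            c_low = sum(low) / len(low)
        if high:
            c_high = sum(high) / len(high)
    return labels


# A straight leg followed by a zigzag leg (hypothetical positions).
track = [(float(i), 0.0) for i in range(11)] + [(float(11 + i), float(i % 2)) for i in range(11)]
print(kmeans_1d(window_tortuosity(track)))  # → [0, 0, 1, 1]
```

In a semi-supervised setup, cluster labels like these could then serve as weak labels for a downstream classifier. That step is not shown.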
Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay
Introducing adversarial multi-task learning to the model was observed to increase performance in terms of area under the curve (AUC), particularly in noisy environments, without degrading performance at higher SNR levels.
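For frame-level voice activity detection, AUC summarizes how well speech scores rank speech frames above non-speech frames across all thresholds. The following is a minimal Python sketch using the Mann-Whitney formulation of ROC AUC: the fraction of (speech, non-speech) frame pairs ordered correctly, with ties counted as one half. The function name and the 0/1 label convention are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    scores: per-frame speech scores; labels: 1 for speech frames, 0 for non-speech.
    Ties between a speech and a non-speech frame count as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one speech and one non-speech frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # → 0.75
```

The pairwise loop is quadratic in the number of frames. For long recordings, a rank-based computation after sorting gives the same value in O(n log n).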
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD.
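Zero-frequency filtering passes the differenced speech signal through resonators centered at 0 Hz and then removes the resulting polynomial trend. What remains mainly reflects the glottal excitation (voice source). The following is a minimal Python sketch of the commonly used ZFF recipe. It is not either of the paper's two VAD approaches. The 10 ms window and the three trend-removal passes are assumed defaults.

```python
def _local_mean(y, n, half):
    """Mean of y over the window [n - half, n + half], clipped at the edges."""
    lo, hi = max(0, n - half), min(len(y), n + half + 1)
    return sum(y[lo:hi]) / (hi - lo)


def zero_frequency_filter(signal, fs, window_ms=10.0, passes=3):
    """Zero-frequency filtered signal of a speech waveform.

    window_ms: trend-removal window, assumed to span roughly one to two pitch periods.
    passes: number of mean-subtraction passes (assumed; repeated passes are common).
    """
    if not signal:
        return []
    # 1. Difference the signal to suppress any DC or low-frequency offset.
    x = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    # 2. Cascade of two ideal zero-frequency resonators (double pole at z = 1 each):
    #    y[n] = 2*y[n-1] - y[n-2] + x[n].
    y = x
    for _ in range(2):
        out, prev1, prev2 = [], 0.0, 0.0
        for v in y:
            cur = 2.0 * prev1 - prev2 + v
            out.append(cur)
            prev1, prev2 = cur, prev1
        y = out
    # 3. The resonators make the output grow polynomially; subtracting a local
    #    mean removes that trend and leaves the zero-frequency component.
    half = max(1, round(window_ms * 1e-3 * fs / 2))
    for _ in range(passes):
        y = [v - _local_mean(y, n, half) for n, v in enumerate(y)]
    return y
```

Epochs are commonly located at the negative-to-positive zero crossings of this signal. A VAD could, for instance, threshold frame-wise ZFF energy, but that rule is a hypothetical illustration. Because the resonator output grows polynomially, long recordings may need to be processed in segments to limit floating-point error.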
Low-Latency Speech Separation Guided Diarization for Telephone Conversations
In particular, we compare two low-latency speech separation models.
GAN-Based Joint Activity Detection and Channel Estimation for Grant-Free Random Access
Joint activity detection and channel estimation (JADCE) for grant-free random access is a critical issue that needs to be addressed to support massive connectivity in IoT networks.
Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios
Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.
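The core idea of scoring speech features against speaker embeddings can be illustrated with cosine similarity. When several speakers score above a threshold in the same frame, that frame is treated as overlapped speech. The following is a minimal Python sketch. SEND's learned similarity scorers, label encoding, and post-processing network are not reproduced. The plain cosine score, the `speaker_activity` helper, and the 0.5 threshold are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def speaker_activity(frames, speakers, threshold=0.5):
    """Frame-by-speaker activity matrix; a row with several 1s marks overlapped speech."""
    return [[int(cosine(f, s) >= threshold) for s in speakers] for f in frames]


speakers = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical speaker embeddings
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical frame-level speech features
print(speaker_activity(frames, speakers))  # → [[1, 0], [0, 1], [1, 1]]
```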
HGCN: Harmonic gated compensation network for speech enhancement
Neural-network-based mask processing in the time-frequency (T-F) domain has become one of the mainstream approaches to single-channel speech enhancement.
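In mask-based enhancement, a network predicts a real-valued gain for each T-F bin, and that gain is multiplied with the noisy short-time Fourier transform (STFT). The following is a minimal Python sketch of the ideal ratio mask, a common training target, and of applying a mask to complex STFT bins stored as frames × frequency nested lists. The STFT is assumed to be computed elsewhere. The exponent `beta=0.5` is an assumed common choice. This is not HGCN's harmonic gated compensation.

```python
def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """IRM per T-F bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta, from magnitude spectrograms."""
    return [
        [(s * s / (s * s + n * n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def apply_mask(mixture_stft, mask):
    """Scale each complex mixture bin by its real-valued mask; the noisy phase is kept."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture_stft)]


print(apply_mask([[1 + 1j]], [[0.5]]))  # → [[(0.5+0.5j)]]
```

Because the mask is real-valued, only magnitudes change and the noisy phase is reused. An inverse STFT would then return the enhanced waveform.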