Activity Detection

63 papers with code • 1 benchmark • 12 datasets

Detecting activities in extended videos.

Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0

mkunes/w2v2_audioframeclassification 26 Oct 2022

In this paper, we explore the effectiveness of this model on three basic speech classification tasks: speaker change detection, overlapped speech detection, and voice activity detection.

26

Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation

marianne-m/brouhaha-vad 24 Oct 2022

Most automatic speech processing systems register degraded performance when applied to noisy or reverberant speech.

108

MM-ALT: A Multimodal Automatic Lyric Transcription System

guxm2021/MM_ALT 13 Jul 2022

Automatic lyric transcription (ALT) is a nascent field of study attracting increasing interest from both the speech and music information retrieval communities, given its significant application potential.

11

A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels

marthadais/aisclassification 12 Jul 2022

To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting the changes in the vessel's moving pattern, which tend to indicate fishing activity.

10

Adversarial Multi-Task Deep Learning for Noise-Robust Voice Activity Detection with Low Algorithmic Delay

aau-es-ml/VAD-with-adversarial-multi-task-learning 4 Jul 2022

Introducing adversarial multi-task learning to the model is observed to increase performance in terms of Area Under Curve (AUC), particularly in noisy environments, while the performance is not degraded at higher SNR levels.

3

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

idiap/zff_vad 27 Jun 2022

This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD.

18

Low-Latency Speech Separation Guided Diarization for Telephone Conversations

dr-pato/ssgd 5 Apr 2022

In particular, we compare two low-latency speech separation models.

11

GAN-Based Joint Activity Detection and Channel Estimation For Grant-free Random Access

deeeeeeplearning/jadce 4 Apr 2022

Joint activity detection and channel estimation (JADCE) for grant-free random access is a critical issue that needs to be addressed to support massive connectivity in IoT networks.

3

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

alibaba-damo-academy/FunASR 18 Mar 2022

Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.

3,096

HGCN: Harmonic gated compensation network for speech enhancement

wangtianrui/HGCN 30 Jan 2022

Mask processing in the time-frequency (T-F) domain through the neural network has been one of the mainstreams for single-channel speech enhancement.

51