Activity Detection

51 papers with code • 0 benchmarks • 12 datasets

Detecting activities in extended videos.

Most implemented papers

Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks

imatge-upc/activitynet-2016-cvprw 29 Aug 2016

This thesis explores different approaches using Convolutional and Recurrent Neural Networks to classify and temporally localize activities in videos, and proposes an implementation to achieve this.

An End-to-End Architecture for Keyword Spotting and Voice Activity Detection

mindorii/kws 28 Nov 2016

We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection.

Fine-grained Activity Recognition in Baseball Videos

piergiaj/mlb-youtube 9 Apr 2018

In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection.

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

zhenghuatan/rvadfast 9 Jun 2019

In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity.

pyannote.audio: neural building blocks for speaker diarization

pyannote/pyannote-audio 4 Nov 2019

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization.

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

butspeechfit/eend 12 Nov 2022

End-to-end diarization presents an attractive alternative to standard cascaded diarization systems because a single system can handle all aspects of the task at once.

Learning Latent Super-Events to Detect Multiple Activities in Videos

piergiaj/super-events-cvpr18 CVPR 2018

In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos.

Personal VAD: Speaker-Conditioned Voice Activity Detection

pirxus/personalVAD 12 Aug 2019

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

ROAD: The ROad event Awareness Dataset for Autonomous Driving

gurkirt/road-dataset 23 Feb 2021

We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving.