1 code implementation • 16 Dec 2021 • Lam Pham, Dat Ngo, Phu X. Nguyen, Truong Hoang, Alexander Schindler
This paper presents an audio-visual scene classification (SC) task in which each input video is classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'.
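To make the task definition concrete, the following minimal sketch treats it as a five-way, single-label classification problem. Only the five label names come from the text. The index order, the `predict_scene` helper, and the idea of one score per class are illustrative assumptions, not the authors' implementation.

```python
# The five scene labels named in the abstract.
# The index order is an assumption made for this illustration.
SCENES = [
    "Riot",
    "Noise-Street",
    "Firework-Event",
    "Music-Event",
    "Sport-Atmosphere",
]


def predict_scene(scores):
    """Return the label with the highest score (hypothetical helper).

    `scores` is assumed to hold one number per scene, in SCENES order,
    for example the output of some audio-visual classifier.
    """
    if len(scores) != len(SCENES):
        raise ValueError(f"expected {len(SCENES)} scores, got {len(scores)}")
    # Each video receives exactly one label: the highest-scoring class.
    best_index = max(range(len(SCENES)), key=lambda i: scores[i])
    return SCENES[best_index]
```

For example, `predict_scene([0.1, 0.2, 0.5, 0.1, 0.1])` selects the third label, 'Firework-Event'.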