no code implementations • 22 Mar 2022 • Meng Feng, Chieh-Chi Kao, Qingming Tang, Ming Sun, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Standard acoustic event classification (AEC) solutions require large-scale collection of data from client devices for model optimization.
no code implementations • 27 Jan 2022 • Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang
We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware".
Automatic Speech Recognition (ASR)
no code implementations • 12 Feb 2021 • Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang
Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication.
Ranked #4 on Speech Emotion Recognition on MSP-Podcast (Dominance) (using extra training data)
no code implementations • 1 Jul 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.
no code implementations • NIPS Workshop CDNNRIA 2018 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
In this paper, we present a compression approach that combines low-rank matrix factorization with quantization training to reduce the complexity of neural-network-based acoustic event detection (AED) models.
no code implementations • 29 Apr 2019 • Bowen Shi, Ming Sun, Chieh-Chi Kao, Viktor Rozgic, Spyros Matsoukas, Chao Wang
This paper presents our work on training acoustic event detection (AED) models using an unlabeled dataset.
no code implementations • 18 May 2017 • Zhuolin Jiang, Viktor Rozgic, Sancar Adali
Experimental results demonstrate that our approach can achieve state-of-the-art average precision (AP) performances on the InfAR dataset: (1) the proposed two-stream 3D CNN achieves the best reported 77.5% AP, and (2) our 3D CNN model applied to the optical flow fields achieves the best reported single stream 75.42% AP.
no code implementations • 3 Feb 2016 • Zhuolin Jiang, Yaming Wang, Larry Davis, Walt Andrews, Viktor Rozgic
Deep Convolutional Neural Networks (CNNs) enforce supervised information only at the output layer, and hidden layers are trained by back-propagating the prediction error from the output layer without explicit supervision.