Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection

28 Nov 2017  ·  Cheng-Tao Chung, Lin-shan Lee

In this paper, we compare two paradigms for unsupervised discovery of structured acoustic tokens directly from speech corpora without any human annotation. The Multigranular Paradigm seeks to capture all available information in the corpora with multiple sets of tokens for different model granularities. The Hierarchical Paradigm attempts to jointly learn several levels of signal representations in a hierarchical structure. We unify the two paradigms within a single theoretical framework. Query-by-Example Spoken Term Detection (QbE-STD) experiments on the QUESST dataset of MediaEval 2015 verify the competitiveness of the acoustic tokens. The Enhanced Relevance Score (ERS) proposed in this work improves both paradigms on the QbE-STD task. We also report results on the ABX evaluation task of the Zero Resource Challenge 2015 to compare the two paradigms.
