Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning

This paper describes the development of a chunker for spoken data by supervised machine learning using the CRFs, based on a small reference corpus composed of two kinds of discourse: prepared monologue vs. spontaneous talk in interaction. The methodology considers the specific character of the spoken data. The machine learning uses the results of several available taggers, without correcting the results manually. Experiments show that the discourse type (monologue vs. free talk), the speech nature (spontaneous vs. prepared) and the corpus size can influence the results of the machine learning process and must be considered while interpreting the results.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here