On Semi-Supervised Multiple Representation Behavior Learning

21 Oct 2019 · Ruqian Lu, Shengluan Hou

We propose a novel paradigm of semi-supervised learning (SSL): semi-supervised multiple representation behavior learning (SSMRBL). SSMRBL aims to tackle the difficulty of learning a grammar for natural language parsing, where the data are natural language texts and the 'labels' marking the data are parsing trees and/or grammar rule pieces. We call such labels compound structured labels; producing them for training is laborious. SSMRBL is an incremental learning process that learns more than one representation, which makes it an appropriate solution both to the scarcity of labeled training data in the age of big data and to the heavy workload of learning compound structured labels. We also present a typical instance of SSMRBL: behavior learning in the form of a grammatical approach to domain-based multi-text summarization (DBMTS). DBMTS works under the framework of rhetorical structure theory (RST). SSMRBL includes two representations: a text embedding (representing the information contained in the texts) and a grammar model (representing parsing as a behavior). The first representation is learned as embedded numerical vectors, called impacts, in a low-dimensional space. The grammar model is learned iteratively. Based on these two representations, we then propose an automatic domain-oriented multi-text summarization approach. Experimental results on the large-scale Chinese dataset SogouCA indicate that the proposed method performs well on our defined automated metrics even when only a few labeled texts are used for training.
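To make the paradigm concrete, below is a minimal, hypothetical sketch in Python of such an incremental loop. It is not the authors' implementation: the hashing 'embedding', the toy grammar model, and every name in it (GrammarModel, ssmrbl_loop, and so on) are illustrative placeholders. It only shows the idea of combining two representations (an embedding and a grammar model) to self-label unlabeled texts over several rounds.

```python
# Hypothetical sketch of an SSMRBL-style iterative loop; all names and
# components are illustrative stand-ins, not the paper's implementation.
import math
from dataclasses import dataclass, field

@dataclass
class GrammarModel:
    """Toy grammar model: counts 'rule pieces' observed in accepted parses."""
    rule_counts: dict = field(default_factory=dict)

    def update(self, parse):
        for rule in parse:
            self.rule_counts[rule] = self.rule_counts.get(rule, 0) + 1

    def parse(self, text):
        # Stand-in parser: map each word to a rule piece and score
        # confidence by how familiar those pieces already are.
        rules = [f"RULE({w})" for w in text.split()]
        seen = sum(1 for r in rules if r in self.rule_counts)
        return rules, seen / max(len(rules), 1)

def embed(text, dim=8):
    # Stand-in for the learned low-dimensional 'impact' vectors:
    # a deterministic hashing bag-of-words, purely for illustration.
    vec = [0.0] * dim
    for w in text.split():
        vec[sum(ord(c) for c in w) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def ssmrbl_loop(labeled, unlabeled, rounds=3, threshold=0.5):
    """Incremental loop using both representations: a text whose parse is
    confident (grammar model) and whose embedding is close to labeled data
    gets self-labeled, and its parse grows the grammar model."""
    model = GrammarModel()
    for _, parse in labeled:
        model.update(parse)
    anchors = [embed(text) for text, _ in labeled]
    for _ in range(rounds):
        remaining = []
        for text in unlabeled:
            rules, conf = model.parse(text)
            sim = max(cosine(embed(text), a) for a in anchors)
            if 0.5 * conf + 0.5 * sim >= threshold:
                model.update(rules)       # accept the parse as a new 'label'
            else:
                remaining.append(text)    # keep for a later round
        unlabeled = remaining
    return model

labeled = [("the cat sat", ["RULE(the)", "RULE(cat)", "RULE(sat)"])]
unlabeled = ["the dog sat", "a bird flew", "the cat ran"]
print(sorted(ssmrbl_loop(labeled, unlabeled).rule_counts))
```

A real system would replace the hashing embedding with the learned impact vectors and the toy parser with an RST-based grammar over compound structured labels, but the control flow sketched here (parse, score with both representations, accept confident parses, retrain) is the core of the incremental process the abstract describes.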
