Interpretable Natural Language Segmentation Based on Link Grammar

14 Nov 2020 · Vignav Ramesh, Anton Kolonin

Natural language segmentation (NLS), or text segmentation, is the process of dividing written text into meaningful units. Sentence segmentation, a subfield of text segmentation, is the problem of dividing a string of natural language text into its component sentences. Current methods of sentence segmentation are often either hardcoded (they require manual implementation of fixed grammar and segmentation rules) or require extensive training on labeled corpora and are not explainable (they are "black box" algorithms that cannot be understood by humans). In this paper, we present a novel explainable sentence segmentation method capable of separating bodies of text into grammatically valid sentences based solely on the grammatical relationships between individual words or tokens. The proposed NLS architecture can automate both the input query parsing and semantic query execution components of voice-activated question answering and information retrieval systems, and can enable automatic summarization, entity extraction, sentiment identification, and a variety of other natural language processing (NLP) algorithms that operate at the sentential level.
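
To make the idea of segmenting by grammatical validity concrete, the following is a minimal sketch and not the method from the paper. It assumes a toy lexicon and a hand-written set of accepted part-of-speech patterns in place of a real Link Grammar parse; the names `LEXICON`, `VALID_PATTERNS`, `is_valid_sentence`, and `segment` are all hypothetical. It greedily takes the longest run of tokens that the toy grammar check accepts as a sentence.

```python
from typing import Callable, List

# Hypothetical toy lexicon (word -> part-of-speech tag); not from the paper.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN",
    "chased": "VERB", "sleeps": "VERB",
}

# Toy stand-in for a grammar check: tag sequences accepted as one sentence.
# A real system would ask a grammar parser whether the tokens form a valid parse.
VALID_PATTERNS = {
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
}


def is_valid_sentence(tokens: List[str]) -> bool:
    """Return True if the tokens' tag sequence is an accepted pattern."""
    tags = tuple(LEXICON.get(tok.lower()) for tok in tokens)
    return tags in VALID_PATTERNS


def segment(
    tokens: List[str],
    is_valid: Callable[[List[str]], bool] = is_valid_sentence,
) -> List[List[str]]:
    """Greedily split an unpunctuated token stream into valid sentences."""
    sentences: List[List[str]] = []
    start = 0
    while start < len(tokens):
        # Try the longest candidate first, shrinking until one is valid.
        for end in range(len(tokens), start, -1):
            if is_valid(tokens[start:end]):
                sentences.append(tokens[start:end])
                start = end
                break
        else:
            # No valid sentence starts here: emit the token on its own.
            sentences.append([tokens[start]])
            start += 1
    return sentences


if __name__ == "__main__":
    stream = "the dog chased a cat the cat sleeps".split()
    for sentence in segment(stream):
        print(" ".join(sentence))
```

Because every boundary comes from an explicit validity check, each split can be traced back to the rule that allowed it, which is the sense in which such an approach is explainable. The greedy longest-match strategy is a simplification chosen for this sketch and can mis-segment ambiguous input.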
