23 papers with code • 0 benchmarks • 0 datasets
Text segmentation deals with the correct division of a document into semantically coherent blocks.
We propose a novel domain-independent framework, called CoType, that runs a data-driven text segmentation algorithm to extract entity mentions, and jointly embeds entity mentions, relation mentions, text features and type labels into two low-dimensional spaces (for entity and relation mentions respectively), where, in each space, objects whose types are close will also have similar representations.
We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation.
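As a rough illustration of how word vectors can drive segmentation, the sketch below sums word vectors into sentence vectors and places a boundary wherever the cosine similarity between adjacent sentences falls below a threshold. This is a much simpler adjacent-similarity heuristic, not C99 and not the algorithms from the abstract above; the toy embeddings, the `threshold` value and all function names are assumptions made for the example.

```python
import math

def sentence_vector(sentence, embeddings, dim):
    """Sum the vectors of the words in a sentence (unknown words count as zero)."""
    vec = [0.0] * dim
    for word in sentence.lower().split():
        for k, x in enumerate(embeddings.get(word, [0.0] * dim)):
            vec[k] += x
    return vec

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def find_boundaries(sentences, embeddings, dim, threshold=0.5):
    """Return indices i such that a new segment starts at sentences[i]."""
    vecs = [sentence_vector(s, embeddings, dim) for s in sentences]
    return [
        i + 1
        for i in range(len(vecs) - 1)
        if cosine(vecs[i], vecs[i + 1]) < threshold  # low similarity: topic shift
    ]

# Hypothetical 2-dimensional embeddings, hand-made for the example.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "stock": [0.0, 1.0],
    "market": [0.1, 0.9],
}
toy_sentences = ["cat dog", "dog cat", "stock market", "market stock"]
toy_boundaries = find_boundaries(toy_sentences, toy_embeddings, dim=2)
```

With these toy vectors the only low-similarity pair is the second and third sentences, so the single boundary falls before the third sentence. Real embeddings would need a tuned threshold or a relative criterion such as local similarity minima.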
The trained CRF segmenter was compared empirically to a baseline approach based on maximum matching that used a dictionary extracted from the manually segmented corpus.
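For readers unfamiliar with the baseline, a minimal sketch of greedy forward maximum matching follows: at each position it takes the longest dictionary entry that matches and falls back to a single character when nothing matches. The function name, the `max_len` parameter and the toy dictionary are assumptions for illustration; the abstract does not specify which variant of maximum matching was used.

```python
def max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over an unsegmented string."""
    if max_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical toy dictionary; a real one would come from a segmented corpus.
toy_dictionary = {"the", "them", "theme", "me", "men", "end"}
segmented = max_match("themend", toy_dictionary)
```

On this input the greedy choice of "theme" leaves "n" and "d" unmatched, although "them" followed by "end" would cover the whole string. Errors of this kind are one reason a dictionary-only baseline is a natural point of comparison for a trained sequence model.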
The main cause of poor output in Optical Character Recognition (OCR) for Bangla text is segmentation-related error.
For training our network, we develop a cross-entropy-based loss function that addresses the imbalance problems.
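The abstract does not say which imbalance is meant or how the loss handles it. One common approach, shown here purely as an assumption, is to weight each example's cross-entropy term by the inverse frequency of its class so that rare labels contribute more. All names below are hypothetical, and this is not the loss from the paper.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (num_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum

# Toy data: three examples of class 0 and one example of class 1.
toy_labels = [0, 0, 0, 1]
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
toy_weights = inverse_frequency_weights(toy_labels)
toy_loss = weighted_cross_entropy(toy_probs, toy_labels, toy_weights)
```

With uniform weights the function reduces to the ordinary mean negative log-likelihood, which makes the effect of the weighting easy to compare on the same predictions.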