We introduce a novel scientific document processing task: making previously inaccessible information in printed paper documents available for automatic processing.
We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions.
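The procedure itself is not spelled out here. Purely as an illustration of what a word-level alignment between a printed (for example, OCR-derived) token sequence and its full-text counterpart can look like, the following is a minimal sketch. It assumes both versions are already tokenized into words and uses Python's standard `difflib.SequenceMatcher` as a stand-in for the actual method. The function name `align_words` and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_words(ocr_words, text_words):
    """Return (ocr_index, text_index) pairs for words that match exactly.

    Hypothetical helper: a generic longest-matching-block alignment,
    not the procedure described in the source.
    """
    # Compare case-insensitively so capitalization noise does not block a match.
    a = [w.lower() for w in ocr_words]
    b = [w.lower() for w in text_words]
    # autojunk=False disables the popularity heuristic, which could otherwise
    # ignore very frequent words in long documents.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block covers `size` consecutive matching words;
        # the final sentinel block has size 0 and adds nothing.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


# Assumed sample input: the "printed" side contains one recognition error.
ocr = "We descrbe a simple procedure".split()
full = "We describe a simple procedure".split()
alignment = align_words(ocr, full)
```

Words that differ between the two versions, such as the misrecognized `descrbe`, are left unaligned in this sketch. A fuller approach would probably pair such leftovers by position or by character-level similarity.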
Thus, it outputs discourse-level topic structures.
This article has been withdrawn by arXiv administrators because the submitter did not have the legal authority to grant the license applied to the work.
We describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking.