Document Translation
14 papers with code • 3 benchmarks • 3 datasets
Most implemented papers
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
Pre-training via Paraphrasing
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.
Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations
In this work, we propose the task of translating Bilingual Multi-Speaker Conversations and explore neural architectures that exploit both source- and target-side conversation histories for this task.
CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task
We present CLIReval, an easy-to-use toolkit for evaluating machine translation (MT) with the proxy task of cross-lingual information retrieval (CLIR).
Rethinking Document-level Neural Machine Translation
This paper does not aim to introduce a novel model for document-level neural machine translation.
UDAAN: Machine Learning based Post-Editing tool for Document Translation
UDAAN has an end-to-end Machine Translation (MT) plus post-editing pipeline in which users can upload a document to obtain raw MT output.
Neural Approaches to Multilingual Information Retrieval
Providing access to information across languages has been a goal of Information Retrieval (IR) for decades.
Modeling Context With Linear Attention for Scalable Document-Level Translation
Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations.
TransDocs: Optical Character Recognition with word to word translation
While OCR has been used in various applications, its output is not always accurate, leading to misfit words.
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.