CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

Pre-training via Paraphrasing

lucidrains/marge-pytorch NeurIPS 2020

The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.

Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations

sameenmaruf/Bi-MSMT WS 2018

In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task.

CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task

ssun32/CLIReval ACL 2020

We present CLIReval, an easy-to-use toolkit for evaluating machine translation (MT) with the proxy task of cross-lingual information retrieval (CLIR).

Rethinking Document-level Neural Machine Translation

sunzewei2715/Doc2Doc_NMT Findings (ACL) 2022

This paper does not aim to introduce a novel model for document-level neural machine translation.

UDAAN: Machine Learning based Post-Editing tool for Document Translation

IITB-OpenOCRCorrect/iitb-openocr-digit-tool 3 Mar 2022

UDAAN has an end-to-end Machine Translation (MT) plus post-editing pipeline wherein users can upload a document to obtain raw MT output.

Modeling Context With Linear Attention for Scalable Document-Level Translation

zhaofengwu/rfa-doc-mt 16 Oct 2022

Document-level machine translation leverages inter-sentence dependencies to produce more coherent and consistent translations.

TransDocs: Optical Character Recognition with word to word translation

abhishekbamotra/transdocs 15 Apr 2023

While OCR has been used in various applications, its output is not always accurate, leading to misfit words.

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

indonlp/nusa-writes 19 Sep 2023

We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.