Code Translation

34 papers with code • 2 benchmarks • 7 datasets

Code translation is the process of converting code written in one programming language into another while preserving its functionality. It is also known as code conversion, source-to-source translation, or transpilation. Developers typically translate code to adopt a new language, improve performance, or migrate away from legacy systems; common examples include translating Python to Java, or JavaScript to TypeScript.
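For illustration, here is a hand-written example of the task: a small Python function and an equivalent Java translation (written by hand, not produced by any of the models below).

```python
# Source function in Python.
def fizzbuzz(n):
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

# An equivalent Java target, shown as a string for illustration.
JAVA_TRANSLATION = """
static String fizzbuzz(int n) {
    if (n % 15 == 0) return "FizzBuzz";
    if (n % 3 == 0) return "Fizz";
    if (n % 5 == 0) return "Buzz";
    return Integer.toString(n);
}
"""
```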


Most implemented papers

Unsupervised Translation of Programming Languages

facebookresearch/CodeGen NeurIPS 2020

We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
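The approach combines cross-lingual language model pretraining, denoising auto-encoding, and back-translation. A minimal sketch of the back-translation step, assuming a hypothetical seq2seq `model` object with `translate` and `train_step` methods (placeholders, not the paper's actual API):

```python
def back_translation_step(model, python_fn):
    # Translate a monolingual Python function into C++ with the current model.
    cpp_guess = model.translate(python_fn, src_lang="python", tgt_lang="cpp")
    # Train the model to reconstruct the original Python from its own (noisy)
    # C++ output, yielding a (cpp_guess -> python_fn) training pair without
    # any parallel data.
    return model.train_step(src=cpp_guess, src_lang="cpp",
                            tgt=python_fn, tgt_lang="python")
```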

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

salesforce/codet5 EMNLP 2021

We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.
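CodeT5 checkpoints are published on the Hugging Face hub; a minimal usage sketch with the pre-trained `Salesforce/codet5-base` model filling a masked identifier span (this mirrors the model card example; outputs may differ across library versions):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Ask the pre-trained model to fill a masked span in a Python snippet.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```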

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.
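CodeXGLUE's code-to-code translation subset (Java to C#) is mirrored on the Hugging Face hub; a loading sketch, assuming the dataset id and field names shown on the hub card:

```python
from datasets import load_dataset

ds = load_dataset("code_x_glue_cc_code_to_code_trans", split="train")
print(ds[0]["java"][:80])  # Java source function
print(ds[0]["cs"][:80])    # its paired C# translation
```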

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization

p-lambda/composed_finetuning 29 Jun 2020

Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
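A minimal PyTorch sketch of the idea, where `base_model` and `denoiser` stand in for the paper's trainable base predictor and frozen pre-trained denoising autoencoder:

```python
import torch.nn as nn

def compose(base_model: nn.Module, denoiser: nn.Module) -> nn.Module:
    """Stack a frozen pre-trained denoiser on top of a trainable base model."""
    for p in denoiser.parameters():
        p.requires_grad = False  # freeze the denoising autoencoder
    # Gradients flow only into base_model; the denoiser keeps mapping its
    # rough predictions onto the space of valid outputs.
    return nn.Sequential(base_model, denoiser)
```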

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

THUDM/CodeGeeX 22 Sep 2020

Evaluation metrics play a vital role in the growth of a research area, as they define the standard for distinguishing between good and bad models.
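CodeBLEU scores a hypothesis as a weighted sum of standard n-gram BLEU, keyword-weighted BLEU, AST match, and data-flow match. A sketch of the combination, with the equal 0.25 weights used in the paper's experiments as defaults (the component scores themselves are placeholders a real implementation would compute):

```python
def code_bleu(bleu, weighted_bleu, ast_match, dataflow_match,
              alpha=0.25, beta=0.25, gamma=0.25, delta=0.25):
    # CodeBLEU = alpha*BLEU + beta*BLEU_weight
    #            + gamma*Match_ast + delta*Match_df
    return (alpha * bleu + beta * weighted_bleu
            + gamma * ast_match + delta * dataflow_match)
```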

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

facebookresearch/CodeGen NeurIPS 2021

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
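DOBF replaces class, function, and variable names with placeholder tokens and trains the model to recover the original names. A toy version of that input construction, using a regex where the paper uses a proper lexer (illustration only):

```python
import re

PY_KEYWORDS = {"def", "return", "if", "else", "for", "in", "while"}

def obfuscate(code: str):
    mapping = {}
    def rename(match):
        name = match.group(0)
        if name in PY_KEYWORDS:
            return name
        # DOBF distinguishes FUNC_i / CLASS_i / VAR_i; we use VAR_i for all.
        mapping.setdefault(name, f"VAR_{len(mapping)}")
        return mapping[name]
    obfuscated = re.sub(r"[A-Za-z_]\w*", rename, code)
    # The pre-training task: predict `mapping` given `obfuscated`.
    return obfuscated, mapping

print(obfuscate("def add(a, b): return a + b"))
# ('def VAR_0(VAR_1, VAR_2): return VAR_1 + VAR_2',
#  {'add': 'VAR_0', 'a': 'VAR_1', 'b': 'VAR_2'})
```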

The impact of lexical and grammatical processing on generating code from natural language

codegenfactors/BertranX Findings (ACL) 2022

Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms.
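Of those components, the copy mechanism is the easiest to make concrete: the decoder mixes generating from the vocabulary with copying tokens from the input, weighted by a gate p_gen. A minimal PyTorch sketch (a generic pointer-generator mixture, not BertranX's exact implementation):

```python
import torch

def copy_distribution(vocab_probs, attn, src_ids, p_gen):
    # vocab_probs: [V] generator distribution; attn: [src_len] attention
    # weights; src_ids: [src_len] vocabulary ids of the source tokens.
    out = p_gen * vocab_probs
    # Scatter the copy probability mass onto the ids of the source tokens.
    return out.index_add(0, src_ids, (1 - p_gen) * attn)

V = 6
vocab_probs = torch.full((V,), 1.0 / V)   # uniform generator distribution
attn = torch.tensor([0.7, 0.3])           # attention over two source tokens
src_ids = torch.tensor([4, 2])            # their vocabulary ids
print(copy_distribution(vocab_probs, attn, src_ids, p_gen=0.5))  # sums to 1
```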

NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

magnumresearchgroup/magnum-nlc2cmd 15 Feb 2023

First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text.
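Hand-written examples of the task's input/output shape (illustrative, not drawn from the NL2CMD dataset itself):

```python
examples = [
    ("list all files in the current directory larger than 1 MB",
     "find . -type f -size +1M"),
    ("count the lines in every Python file",
     "wc -l *.py"),
]
```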

GraphCodeBERT: Pre-training Code Representations with Data Flow

microsoft/CodeBERT ICLR 2021

Instead of taking the syntactic-level structure of code, such as the abstract syntax tree (AST), we use data flow in the pre-training stage: a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
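A toy extraction of that relation for Python, using the standard `ast` module (GraphCodeBERT itself derives data flow from tree-sitter parses across several languages, not this simplification):

```python
import ast

def dataflow_edges(code: str):
    edges = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Assign):
            # Variables the assigned value is computed from.
            sources = [n.id for n in ast.walk(node.value)
                       if isinstance(n, ast.Name)]
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges += [(src, target.id) for src in sources]
    return edges

print(dataflow_edges("c = a + b\nd = c * 2"))
# [('a', 'c'), ('b', 'c'), ('c', 'd')]
```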

Unified Pre-training for Program Understanding and Generation

wasiahmad/PLBART NAACL 2021

Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.
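PLBART checkpoints fine-tuned for translation are available on the Hugging Face hub; a usage sketch for Java to C#, following the `transformers` documentation for `uclanlp/plbart-java-cs`:

```python
from transformers import PLBartForConditionalGeneration, PLBartTokenizer

tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-java-cs",
                                            src_lang="java", tgt_lang="cs")
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-java-cs")

java = "public int add(int a, int b) { return a + b; }"
inputs = tokenizer(java, return_tensors="pt")
out = model.generate(**inputs,
                     decoder_start_token_id=tokenizer.lang_code_to_id["__cs__"])
print(tokenizer.decode(out[0], skip_special_tokens=True))
```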