Code Translation
54 papers with code • 2 benchmarks • 10 datasets
Code translation is the process of converting code written in one programming language to another programming language while maintaining the same functionality. This process is also known as code conversion, source-to-source translation, or transpilation. Code translation is often performed when developers want to take advantage of new programming languages, improve code performance, or maintain legacy systems. Some common examples include translating code from Python to Java, or from JavaScript to TypeScript.
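As a concrete illustration, here is a minimal sketch of running a pre-trained code translation model, assuming the Hugging Face transformers library and the publicly released PLBART checkpoint uclanlp/plbart-java-cs (fine-tuned on the CodeXGLUE Java-to-C# translation task). The checkpoint name and language codes come from that model's own documentation, not from this page.

```python
# Minimal sketch: Java-to-C# translation with a pre-trained PLBART checkpoint.
# Assumes `pip install transformers torch` and the checkpoint below.
from transformers import PLBartForConditionalGeneration, PLBartTokenizer

tokenizer = PLBartTokenizer.from_pretrained(
    "uclanlp/plbart-java-cs", src_lang="java", tgt_lang="cs"
)
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-java-cs")

java_code = "public static int add(int a, int b) { return a + b; }"
inputs = tokenizer(java_code, return_tensors="pt")

# PLBART expects the target-language code as the decoder start token.
translated = model.generate(
    **inputs, decoder_start_token_id=tokenizer.lang_code_to_id["__cs__"]
)
print(tokenizer.batch_decode(translated, skip_special_tokens=True)[0])
```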
Libraries
Use these libraries to find Code Translation models and implementations.
Most implemented papers
Unsupervised Translation of Programming Languages
We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy.
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers.
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Evaluation metrics play a vital role in the growth of a research area, as they define the standard for distinguishing between good and bad models.
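Concretely, the CodeBLEU paper defines the metric as a weighted sum of four components: standard n-gram BLEU, a keyword-weighted BLEU, an AST match score, and a data-flow match score. The sketch below shows only that final combination; the component scores are assumed to be computed by their own sub-metrics (a reference implementation ships with CodeXGLUE), and the equal default weights follow the paper.

```python
def codebleu(ngram_bleu, weighted_bleu, ast_match, dataflow_match,
             weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination from the CodeBLEU paper.
    Each argument is a component score in [0, 1]."""
    a, b, c, d = weights
    return (a * ngram_bleu + b * weighted_bleu
            + c * ast_match + d * dataflow_match)
```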
Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization
Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
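The idea named in the title can be sketched in a few lines: fine-tune a base predictor while a pre-trained denoising autoencoder stays frozen on top of it, so every output passes through the fixed denoiser. The PyTorch sketch below is a minimal reading of that setup, not the paper's implementation.

```python
import torch.nn as nn

class ComposedModel(nn.Module):
    """Sketch of composed fine-tuning: output = frozen_denoiser(base(x))."""
    def __init__(self, base: nn.Module, denoiser: nn.Module):
        super().__init__()
        self.base = base
        self.denoiser = denoiser
        for p in self.denoiser.parameters():
            p.requires_grad = False  # only the base model is fine-tuned

    def forward(self, x):
        return self.denoiser(self.base(x))
```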
DOBF: A Deobfuscation Pre-Training Objective for Programming Languages
Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
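DOBF's pre-training objective obfuscates identifier names in source code and trains a model to recover the original names. The toy sketch below builds such a training pair with a regex; the real method uses a proper parser and distinct CLASS_i / FUNC_i / VAR_i placeholder tokens, so treat this as illustrative only.

```python
import builtins
import keyword
import re

def dobf_pair(source: str):
    """Toy DOBF-style training pair: obfuscated code plus the mapping
    the model must recover. Illustrative only; the paper parses code
    properly instead of using a regex."""
    mapping = {}

    def mask(match):
        name = match.group(0)
        if keyword.iskeyword(name) or hasattr(builtins, name):
            return name  # leave keywords and builtins untouched
        if name not in mapping:
            mapping[name] = f"VAR_{len(mapping)}"
        return mapping[name]

    obfuscated = re.sub(r"\b[A-Za-z_]\w*\b", mask, source)
    return obfuscated, {v: k for k, v in mapping.items()}

# dobf_pair("def add(a, b): return a + b")
# -> ("def VAR_0(VAR_1, VAR_2): return VAR_1 + VAR_2",
#     {"VAR_0": "add", "VAR_1": "a", "VAR_2": "b"})
```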
Unified Pre-training for Program Understanding and Generation
Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.
The impact of lexical and grammatical processing on generating code from natural language
Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms.
CodeAttack: Code-Based Adversarial Attacks for Pre-trained Programming Language Models
Pre-trained programming language (PL) models (such as CodeT5, CodeBERT, and GraphCodeBERT) have the potential to automate software engineering tasks involving code understanding and code generation.
NatGen: Generative pre-training by "Naturalizing" source code
Pre-trained generative language models for source code (e.g., PLBART, CodeT5, SPT-Code) have yielded strong results on several tasks in the past few years, including code generation and translation.