34 papers with code • 2 benchmarks • 7 datasets
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.
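As an illustration (not taken from the paper itself), a released CodeT5 checkpoint can be driven through the Hugging Face transformers API; the checkpoint name "Salesforce/codet5-base" and the masked-span prompt below are assumptions made for this sketch.

```python
# Minimal sketch: loading a CodeT5 checkpoint with Hugging Face transformers.
# The checkpoint name "Salesforce/codet5-base" is assumed for illustration.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Ask the model to fill a masked span (here, an identifier position) in Python code.
code = "def greet(user): print(f'hello <extra_id_0>!')"
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```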
Empirically, we show that composed fine-tuning improves over standard fine-tuning on two pseudocode-to-code translation datasets (3% and 6% relative).
Evaluation metrics play a vital role in the growth of an area, as they define the standard for distinguishing between good and bad models.
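For context, the sketch below shows how a common surface-level baseline metric, corpus BLEU, is typically computed for code translation output; it is an illustration only (sacrebleu is assumed to be installed), not a metric proposed by any paper listed here.

```python
# Illustrative only: corpus-level BLEU is a common (if imperfect) baseline
# metric for scoring code translation output against references.
import sacrebleu

# One hypothesis and one reference per example (whitespace-tokenized code).
hypotheses = ["int add ( int a , int b ) { return a + b ; }"]
references = [["int add ( int x , int y ) { return x + y ; }"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```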
Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms.
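As a rough illustration of the last of those components, the sketch below shows the pointer-generator style mixing that a copy mechanism typically performs: a gate blends the decoder's vocabulary distribution with an attention-derived distribution over source tokens. It is a toy example with illustrative names, not the TranX implementation.

```python
# Toy sketch of a copy mechanism (pointer-generator style mixing); all names
# are illustrative and not taken from the TranX codebase.
import torch
import torch.nn.functional as F

def mix_copy_and_generate(gen_logits, copy_attn, src_token_ids, p_gen):
    """Blend a generation distribution over the vocabulary with a copy
    distribution over source tokens, weighted by the gating probability p_gen."""
    gen_probs = p_gen * F.softmax(gen_logits, dim=-1)   # (batch, vocab_size)
    copy_probs = (1.0 - p_gen) * copy_attn               # (batch, src_len)
    # Scatter copy probability mass onto the vocabulary ids of the source tokens.
    return gen_probs.scatter_add(-1, src_token_ids, copy_probs)

# Toy usage: 10-word vocabulary, 3-token source sequence.
gen_logits = torch.randn(1, 10)
copy_attn = F.softmax(torch.randn(1, 3), dim=-1)
src_token_ids = torch.tensor([[4, 7, 2]])
p_gen = torch.tensor([[0.8]])
probs = mix_copy_and_generate(gen_logits, copy_attn, src_token_ids, p_gen)
print(probs.sum())  # ~1.0: the mixture is still a valid distribution
```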
First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text.
Instead of taking a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage, a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
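As a toy illustration of that relation (not GraphCodeBERT's actual extraction pipeline), the sketch below links each assigned variable to the variables its value is computed from, using Python's ast module; the function name and the simplifications are illustrative.

```python
# Toy sketch: extract "where-the-value-comes-from" edges from simple
# assignment statements. Illustrative only; real data-flow extraction
# handles many more cases (loops, calls, attribute access, reassignment).
import ast

def value_comes_from(source: str):
    edges = []  # (assigned_variable, variable_its_value_comes_from)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            target = node.targets[0].id
            for used in ast.walk(node.value):
                if isinstance(used, ast.Name):
                    edges.append((target, used.id))
    return edges

print(value_comes_from("a = 1\nb = a + 2\nc = a * b"))
# [('b', 'a'), ('c', 'a'), ('c', 'b')]
```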
Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.
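As a usage sketch, a released PLBART checkpoint can be loaded through the Hugging Face transformers API; the Java-to-C# checkpoint name and language codes below follow the library's documented example and are assumptions rather than details from the paper.

```python
# Minimal sketch: Java -> C# translation with a PLBART checkpoint via
# Hugging Face transformers. Checkpoint name and language codes are assumptions.
from transformers import PLBartForConditionalGeneration, PLBartTokenizer

tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-java-cs", src_lang="java", tgt_lang="cs")
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-java-cs")

java_code = "public int add(int a, int b) { return a + b; }"
inputs = tokenizer(java_code, return_tensors="pt")
outputs = model.generate(
    **inputs,
    decoder_start_token_id=tokenizer.lang_code_to_id["__cs__"],  # force C# as the target language
    max_length=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```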