CodeT5

CodeT5 is a Transformer-based encoder-decoder model for code understanding and generation, built on the T5 architecture. It uses an identifier-aware pre-training objective that exploits a crucial kind of token type information in code: the identifiers assigned by developers. Specifically, the denoising Seq2Seq objective of T5 is extended with two tasks, identifier tagging and masked identifier prediction, so that the model better leverages token type information from programming languages. To improve the alignment between natural language and programming language, a bimodal dual generation objective additionally trains the model on bidirectional conversion between the two.

Source: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
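
As a concrete illustration of the denoising objective, the following sketch runs masked-span infilling with the released pre-trained checkpoint. It assumes the Hugging Face transformers library and follows the usage pattern from the Salesforce/codet5-base model card; the input string and output are illustrative.

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 pairs a RoBERTa-style BPE tokenizer trained on code
# with the T5 encoder-decoder architecture.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# <extra_id_0> marks a masked span for the model to fill in,
# mirroring the span-denoising objective used in pre-training.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# e.g. "{user.username}"
```

Fine-tuned variants reuse the same encoder-decoder interface for the NL-PL directions targeted by the bimodal dual generation objective, e.g. generating a docstring summary from a function body, or code from a natural language description.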

Tasks

Task Papers Share
Code Generation 5 17.86%
Program Repair 4 14.29%
Language Modelling 3 10.71%
Code Translation 3 10.71%
Translation 2 7.14%
Reinforcement Learning (RL) 1 3.57%
Machine Translation 1 3.57%
Synthetic Data Generation 1 3.57%
Retrieval 1 3.57%

Components

Component Type
T5 Transformers

Categories

Code Generation Transformers