CodeT5 is a unified pre-trained encoder-decoder model for code understanding and generation built on the T5 architecture. It employs an identifier-aware pre-training objective that exploits a crucial token type in code: the identifiers assigned by developers. Specifically, T5's denoising Seq2Seq objective is extended with two tasks, identifier tagging and masked identifier prediction, so that the model learns to recognize which tokens are identifiers and to recover them when masked. To improve natural language-programming language alignment, a bimodal dual generation objective additionally trains the model on bidirectional conversion between natural language and code.
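As a concrete illustration, the snippet below loads a released CodeT5 checkpoint with the Hugging Face Transformers library and asks it to fill in a masked span, exercising the T5-style span-denoising interface described above. The checkpoint name `Salesforce/codet5-base` and the example input follow the public model release; this is a minimal usage sketch, not the paper's training code.

```python
# Minimal usage sketch (assumes the public Salesforce/codet5-base release,
# not the paper's training pipeline).
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# CodeT5 keeps T5's denoising interface: a sentinel token such as
# <extra_id_0> marks a masked span for the decoder to reconstruct.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate the missing span; identifier-aware pre-training makes the model
# likely to propose an identifier-based completion such as "{user}".
generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```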
Source: CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
| Task | Papers | Share |
|---|---|---|
| Code Generation | 6 | 15.00% |
| Language Modelling | 5 | 12.50% |
| Program Repair | 4 | 10.00% |
| Code Translation | 3 | 7.50% |
| Retrieval | 2 | 5.00% |
| Vulnerability Detection | 2 | 5.00% |
| Decoder | 2 | 5.00% |
| Translation | 2 | 5.00% |
| Kolmogorov-Arnold Networks | 1 | 2.50% |