Assemble Foundation Models for Automatic Code Summarization

13 Jan 2022  ·  Jian Gu, Pasquale Salza, Harald C. Gall

Automatic code summarization benefits daily software development because it reduces the effort of writing summaries manually. Currently, artificial intelligence is undergoing a paradigm shift: foundation models pretrained on massive data and finetuned for downstream tasks surpass specially customized models. This trend inspired us to reuse foundation models instead of learning from scratch. To this end, we propose a flexible and robust approach for automatic code summarization, based on neural models. We assemble available foundation models, such as CodeBERT and GPT-2, into a single neural model, named AdaMo. Moreover, we utilize Gaussian noise as a simulation of contextual information to optimize the latent representation. Furthermore, we introduce two adaptive schemes from the perspective of knowledge transfer, namely continuous pretraining and intermediate finetuning, and design intermediate-stage tasks for general sequence-to-sequence learning. Finally, we evaluate AdaMo on a benchmark dataset for code summarization, comparing it with state-of-the-art models.
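The abstract describes two ideas: assembling a pretrained encoder (CodeBERT) and decoder (GPT-2) into one sequence-to-sequence model, and injecting Gaussian noise into the latent representation as a stand-in for contextual information. The assembly step maps naturally onto Hugging Face's `EncoderDecoderModel.from_encoder_decoder_pretrained("microsoft/codebert-base", "gpt2")`. The noise step can be sketched in a few lines; the sketch below is illustrative only, and the function name `perturb_latents` and the `sigma` value are assumptions, not details from the paper.

```python
import numpy as np

def perturb_latents(hidden_states, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise to encoder latent states.

    Illustrates the paper's idea of using Gaussian noise to simulate
    contextual information and regularize the latent representation.
    `sigma` is a hypothetical hyperparameter, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=sigma, size=hidden_states.shape)
    return hidden_states + noise

# Toy encoder output with shape (batch, seq_len, hidden_dim)
rng = np.random.default_rng(0)
latents = rng.standard_normal((2, 8, 16))
noisy = perturb_latents(latents, sigma=0.1, rng=rng)
print(noisy.shape)
```

During training, the perturbed latents would be fed to the decoder in place of the clean encoder output; at inference time the noise is simply omitted.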

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Source Code Summarization | CodeSearchNet - Python | AdaMo-basic | METEOR | 12.51% | #1 |
| Source Code Summarization | CodeSearchNet - Python | AdaMo-basic | BLEU-4 | 16.46 | #1 |
| Source Code Summarization | DeepCom-Java | AdaMo-noise | METEOR | 28.25% | #1 |
| Source Code Summarization | DeepCom-Java | AdaMo-noise | BLEU-4 | 45.35 | #1 |
| Source Code Summarization | DeepCom-Java | AdaMo-basic | METEOR | 28.19% | #2 |
| Source Code Summarization | DeepCom-Java | AdaMo-basic | BLEU-4 | 45.30 | #2 |
| Source Code Summarization | Hybrid-DeepCom-Java | AdaMo-basic | METEOR | 25.59% | #1 |
| Source Code Summarization | Hybrid-DeepCom-Java | AdaMo-basic | BLEU-4 | 37.64 | #1 |
| Source Code Summarization | ParallelCorpus-Python | AdaMo-noise | METEOR | 21.92% | #1 |
| Source Code Summarization | ParallelCorpus-Python | AdaMo-noise | BLEU-4 | 34.05 | #1 |
| Source Code Summarization | ParallelCorpus-Python | AdaMo-basic | METEOR | 21.68% | #2 |
| Source Code Summarization | ParallelCorpus-Python | AdaMo-basic | BLEU-4 | 33.85 | #2 |