Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen.
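The retrieval-based data re-sampling mentioned above could, in rough outline, work by keeping (or up-weighting) those automatically mined StackOverflow NL-code pairs whose intents are most similar to the target-domain intents. The sketch below illustrates that idea with a simple TF-IDF cosine retriever; the function names, tokenization, and top-k selection are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real preprocessing would be more careful.
    return text.lower().split()


def build_idf(corpus_tokens):
    # Smoothed inverse document frequency over all documents.
    n = len(corpus_tokens)
    df = Counter()
    for toks in corpus_tokens:
        df.update(set(toks))
    return {t: math.log(n / df[t]) + 1.0 for t in df}


def tfidf(tokens, idf):
    tf = Counter(tokens)
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieval_resample(mined_pairs, target_intents, top_k):
    """Keep the top_k mined (intent, code) pairs whose intents score
    highest (max cosine similarity) against any target-domain intent."""
    mined_toks = [tokenize(nl) for nl, _ in mined_pairs]
    target_toks = [tokenize(nl) for nl in target_intents]
    idf = build_idf(mined_toks + target_toks)
    target_vecs = [tfidf(t, idf) for t in target_toks]

    scored = []
    for pair, toks in zip(mined_pairs, mined_toks):
        vec = tfidf(toks, idf)
        score = max(cosine(vec, tv) for tv in target_vecs)
        scored.append((score, pair))
    scored.sort(key=lambda s: -s[0])
    return [pair for _, pair in scored[:top_k]]
```

For example, given mined pairs like `("sort a list in python", "sorted(xs)")` and a target intent `"how to sort a list of tuples"`, the sort-related pair would be retrieved ahead of unrelated pairs, biasing the pre-training distribution toward the target domain.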

ACL 2020: PDF | Abstract

Results from the Paper


Ranked #3 on Code Generation on CoNaLa (using extra training data)

Task            | Dataset    | Model                                    | Metric | Value | Global Rank | Uses Extra Training Data
Code Generation | CoNaLa     | External Knowledge With API              | BLEU   | 30.69 | #4          | Yes
Code Generation | CoNaLa     | External Knowledge With API + Reranking  | BLEU   | 32.26 | #3          | Yes
Code Generation | CoNaLa-Ext | External Knowledge With API              | BLEU   | 20.37 | #4          | Yes
Code Generation | CoNaLa-Ext | External Knowledge With API + Reranking  | BLEU   | 20.54 | #3          | Yes

Methods


No methods listed for this paper.