The impact of lexical and grammatical processing on generating code from natural language

Considering the seq2seq architecture of TranX for natural language to code translation, we identify four key components of importance: grammatical constraints, lexical preprocessing, input representations, and copy mechanisms. To study the impact of these components, we use a state-of-the-art architecture that relies on a BERT encoder and a grammar-based decoder, for which a formalization is provided. The paper highlights the importance of the lexical substitution component in current natural language to code systems.
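To make the lexical substitution idea concrete, here is a minimal sketch of one common form of it: quoted spans in the natural language intent are swapped for placeholders before the model sees them, and the placeholders are swapped back in the generated code. The abstract does not specify the substitution rules, so the `str_N` placeholder scheme, the function names `substitute` and `restore`, and the choice to restore every value as a string literal are all assumptions made for illustration.

```python
import re

# Matches a span wrapped in matching single quotes, double quotes, or backticks.
# Assumption: quoted spans mark the literals to substitute; apostrophes inside
# ordinary words (e.g. "don't") are not handled by this sketch.
QUOTED = re.compile(r"""(['"`])(.+?)\1""")
PLACEHOLDER = re.compile(r"\bstr_\d+\b")


def substitute(intent: str):
    """Replace quoted spans with hypothetical str_N placeholders."""
    mapping = {}

    def repl(match):
        literal = match.group(2)
        # Reuse the placeholder if the same literal was already seen.
        for key, value in mapping.items():
            if value == literal:
                return key
        key = f"str_{len(mapping)}"
        mapping[key] = literal
        return key

    return QUOTED.sub(repl, intent), mapping


def restore(code: str, mapping: dict) -> str:
    """Put the original values back into generated code as string literals."""

    def repl(match):
        key = match.group(0)
        # Leave unknown placeholders untouched.
        return repr(mapping[key]) if key in mapping else key

    return PLACEHOLDER.sub(repl, code)


masked, mapping = substitute("open file 'data.txt' in mode 'r'")
print(masked)
print(restore("open(str_0, str_1)", mapping))
```

The decoder then only has to learn to copy a small, closed set of placeholder tokens rather than arbitrary rare strings, which is one plausible reason such preprocessing matters for these systems.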

Findings (ACL) 2022


Results from the Paper

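| Task            | Dataset | Model                | Metric Name          | Metric Value | Global Rank |
|-----------------|---------|----------------------|----------------------|--------------|-------------|
| Code Generation | CoNaLa  | TranX + BERT w/mined | BLEU                 | 34.2         | # 2         |
| Code Generation | CoNaLa  | TranX + BERT w/mined | Exact Match Accuracy | 5.8          | # 2         |
| Code Generation | Django  | TranX + BERT w/mined | Accuracy             | 81.03        | # 2         |
| Code Generation | Django  | TranX + BERT w/mined | BLEU Score           | 79.86        | # 2         |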