Fine-grained Software Vulnerability Detection via Information Theory and Contrastive Learning

29 Sep 2021 · Van Nguyen, Trung Le, John C. Grundy, Dinh Phung

Software vulnerabilities in the programs and functions of computer systems have become a serious concern. In a program or function consisting of hundreds or thousands of source code statements, only a few statements cause the corresponding vulnerabilities. Vulnerability labeling at the function or program level is usually done by experts with the assistance of machine learning tools; labeling at the statement level, however, is far more costly and time-consuming. To tackle this problem, we propose a novel end-to-end deep learning-based approach that identifies the vulnerability-relevant code statements of a given function. Inspired by previous approaches, we first leverage mutual information to learn a set of latent variables that represent the relevance of each source code statement to the corresponding function's vulnerability. We then propose a novel clustered spatial contrastive learning method to further improve representation learning and to make the selection of vulnerability-relevant code statements more robust. Experimental results on real-world datasets show that our proposed method outperforms state-of-the-art baselines.
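
The statement-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes some model has already produced one raw relevance score per statement, and it treats the sigmoid of each score as that statement's selection probability. The function name `select_relevant_statements`, the example statements, and the scores are all hypothetical.

```python
import math


def select_relevant_statements(scores, k):
    """Pick the k statements with the highest relevance probability.

    `scores` are hypothetical per-statement outputs of a learned model;
    nothing here reproduces the training procedure from the paper.
    """
    # Sigmoid maps each raw score to a probability in (0, 1).
    probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Rank statement indices from most to least relevant.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Return the top-k indices in source order, plus all probabilities.
    return sorted(ranked[:k]), probs


# Hypothetical function body and hypothetical relevance scores.
statements = [
    "char buf[8];",
    "int n = read_len();",
    "memcpy(buf, src, n);",
    "return 0;",
]
scores = [-1.0, 0.5, 3.0, -2.0]

selected, probs = select_relevant_statements(scores, k=2)
for i in selected:
    print(f"{probs[i]:.2f}  {statements[i]}")
```

In this toy setup the two highest-scoring statements are flagged as vulnerability-relevant; a trained model would learn the scores jointly with the function-level prediction rather than receive them as input.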
