Efficient Pattern-based Static Analysis Approach via Regular-Expression Rules

Pattern-based static analyzers such as SpotBugs use bug patterns (rules) to detect bugs, but they may have several limitations: they (1) can be slow, (2) usually do not support the analysis of partial programs, (3) require parsing code into an AST or CFG, and (4) can have a high false-positive rate. Each pattern relies on analysis context (e.g., data-flow analysis) to improve the accuracy of the analysis. To understand the analysis context required by each pattern, we study the design of bug patterns in SpotBugs. Based on this study, we present Codegex, an efficient pattern-based static analysis approach that uses regular expressions, together with several strategies for extracting more information (syntax and type information) from program text. Codegex can analyze both partial and complete code quickly without parsing it into an AST. We evaluate Codegex in two settings. First, we compare the effectiveness and efficiency of Codegex and SpotBugs on 52 projects. Our results show that Codegex detects bugs with accuracy comparable to SpotBugs while running up to 590× faster, which suggests that Codegex could serve as the fast first stage of a two-stage approach with SpotBugs for instant feedback. Second, we evaluate Codegex in automated code review by running it on 4,256 pull requests (PRs), where it generated 372 review comments and received 116 pieces of feedback. Overall, 78.45% of the feedback we received was positive, indicating that Codegex is promising for automated code review.
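To illustrate the general idea of matching bug patterns directly on program text, here is a minimal sketch in Python. It is not Codegex's implementation. The rule table, the regular expressions, and the `scan` helper are hypothetical approximations written for this example. The two rule names are real SpotBugs pattern identifiers, but SpotBugs detects them through bytecode analysis, not through these regexes. Because the scan works line by line on raw text, it also accepts a fragment that is not a compilable class, such as a diff hunk in a PR. The trade-off is accuracy. A plain regex also fires on matches inside comments or string literals, and that kind of false positive is what the extra syntax and type strategies described above are meant to reduce.

```python
import re
from typing import List, Tuple

# Hypothetical rule table: SpotBugs pattern name -> regex over raw Java text.
# These regexes are illustrative approximations, not Codegex's actual rules.
RULES = {
    # `new BigDecimal(0.1)` builds from an imprecise double literal.
    "DMI_BIGDECIMAL_CONSTRUCTED_FROM_DOUBLE": re.compile(
        r"new\s+(?:java\.math\.)?BigDecimal\s*\(\s*-?\d+\.\d*(?:[eE][+-]?\d+)?[dD]?\s*\)"
    ),
    # `==` / `!=` against a string literal compares references, not contents.
    "ES_COMPARING_STRINGS_WITH_EQ": re.compile(
        r'(?:[=!]=\s*"[^"]*")|(?:"[^"]*"\s*[=!]=)'
    ),
}


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, rule name) for every rule that matches a line."""
    warnings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                warnings.append((lineno, name))
    return warnings


# A partial program: no class or method declaration is needed.
snippet = "\n".join([
    "BigDecimal price = new BigDecimal(0.1);",
    'BigDecimal ok = new BigDecimal("0.1");',
    'if (name == "admin") { grant(); }',
])
print(scan(snippet))
# → [(1, 'DMI_BIGDECIMAL_CONSTRUCTED_FROM_DOUBLE'), (3, 'ES_COMPARING_STRINGS_WITH_EQ')]
```

No AST or CFG is built, so the cost of the scan is roughly one regex pass per rule per line. This is the kind of speed advantage the abstract contrasts with full SpotBugs analysis.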
