Using Machine Learning to amortize this expensive process could lower the cost of code coverage by requiring only the source code context, and the task of code coverage prediction can be a novel benchmark for judging the ability of models to understand code.
We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing.
We recognize that the current advances in machine learning can be used to detect vulnerable code patterns on syntactically incomplete code snippets as the developer is writing the code at EditTime.
Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code.
Large Transformer models have achieved state-of-the-art status for Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code.
Additionally, we evaluate DeepPERF on 50 open source C# repositories on GitHub using both benchmark and unit tests and find that our model is able to suggest valid performance improvements that can improve both CPU usage and memory allocations.
This scenario motivates the code adaptation task -- a variant of program repair which aims to adapt variable identifiers in a pasted snippet of code to the surrounding, preexisting source code.
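To make the input and output of the code adaptation task concrete, here is a minimal Python sketch. It is not the model from the paper: the identifier mapping is supplied by hand, whereas the task is to predict that mapping from the surrounding code. The names `RenameIdentifiers` and `adapt_snippet` are hypothetical and exist only for this illustration.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Rewrite variable names in a parsed snippet using a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Replace the identifier if the mapping has an entry; otherwise keep it.
        node.id = self.mapping.get(node.id, node.id)
        return node


def adapt_snippet(snippet, mapping):
    """Return the pasted snippet with identifiers adapted to the target context.

    Assumption: `mapping` (pasted name -> name in the surrounding code) is
    given. In the task described above, a model would have to infer it.
    """
    tree = RenameIdentifiers(mapping).visit(ast.parse(snippet))
    return ast.unparse(tree)  # requires Python 3.9+


pasted = "total = sum(values) / len(values)"
adapted = adapt_snippet(pasted, {"values": "scores", "total": "mean_score"})
print(adapted)
```

Identifiers without an entry in the mapping, such as the built-ins `sum` and `len`, are left unchanged.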
Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks.
In this research, we focus on utilizing pre-training techniques for the tasks in the code review scenario.
Due to increasingly complex software design and rapid iterative development, code defects and security vulnerabilities are prevalent in modern software.
We study the feasibility of a Data Science assistant powered by a sequence-to-sequence transformer by training a new model JuPyT5 on all publicly available Jupyter Notebook GitHub repositories and developing a new metric: Data Science Problems (DSP).
While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window.
Our model achieves 63-68% accuracy for merge resolution synthesis, yielding nearly a 3x performance improvement over existing semi-structured tools and a 2x improvement over neural program merge tools.
Pre-trained transformers have recently clinched top spots in the gamut of natural language tasks and pioneered solutions to software engineering tasks.
The joint task of bug localization and program repair is an integral part of the software development process.
In this work we introduce DeepDebug: a data-driven program repair approach which learns to detect and fix bugs in Java methods mined from real-world GitHub repositories.
To demonstrate the effectiveness of our model designs, we perform extensive experiments with CodeSearchNet which contains template functions and CoNaLa which contains Stack Overflow intent-snippet pairs.
3 code implementations • 9 Feb 2021 • Shuai Lu, Daya Guo, Shuo Ren, JunJie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
Ranked #1 on Cloze Test on CodeXGLUE - CT-maxmin
Simultaneously modeling source code and natural language has many exciting applications in automated software development and understanding.
Evaluation metrics play a vital role in the growth of an area, as they define the standard for distinguishing between good and bad models.
1 code implementation • Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou
Instead of using a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
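As a rough illustration of the "where-the-value-comes-from" relation, the following Python sketch lists, for each simple assignment, the names its value is read from. It is a deliberate simplification and not the extraction procedure used in the paper: it ignores control flow, scoping, and reassignment, and it treats every name on the right-hand side (including function names) as a source. The function name `value_flow_edges` is hypothetical.

```python
import ast


def value_flow_edges(source):
    """Return (target, source_name) pairs for simple assignments.

    Assumption: every name read on the right-hand side counts as a place the
    value "comes from". Real data-flow analysis is considerably more involved.
    """
    edges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # Collect the distinct names read on the right-hand side.
        sources = sorted(
            {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
        )
        for target in node.targets:
            if isinstance(target, ast.Name):
                edges.extend((target.id, src) for src in sources)
    return edges


code = "x = a + b\ny = x * 2\n"
for target, src in value_flow_edges(code):
    print(f"{target} <- {src}")
```

Unlike an AST, which records how the expression `a + b` is nested, these edges only record that the value of `x` depends on `a` and `b`, and that `y` in turn depends on `x`.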
Ranked #1 on Type prediction on ManyTypes4TypeScript
In this paper we present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
We execute the test cases, collect test coverage information, and compare them with test cases generated by EvoSuite and GPT-3, finding that our approach outperforms GPT-3 and has comparable coverage w.r.t.
In software development through integrated development environments (IDEs), code completion is one of the most widely used features.
We identify a novel instance of the background subtraction problem that focuses on extracting near-field foreground objects captured using handheld cameras.
We describe a completely automated large scale visual recommendation system for fashion.
Given the enormous growth in user-generated videos, it is becoming increasingly important to be able to navigate them efficiently.