Code Repair
9 papers with code • 1 benchmark • 6 datasets
Most implemented papers
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
OctoPack: Instruction Tuning Code Large Language Models
We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B-parameter StarCoder model and achieve state-of-the-art performance among models not trained on OpenAI outputs on the HumanEval Python benchmark (46.2% pass@1).
Learning Performance-Improving Code Edits
Next, we propose a broad range of adaptation strategies for code optimization; for prompting, these include retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
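As a concrete illustration of the retrieval-based few-shot prompting strategy, the sketch below retrieves similar slow/fast program pairs from a small corpus and assembles them into a prompt. The toy corpus, the similarity metric, and the helper names (retrieve_examples, build_prompt) are assumptions for illustration, not the paper's pipeline.

```python
# Illustrative sketch of retrieval-based few-shot prompting for code
# optimization; the corpus, similarity metric, and helper names are
# assumptions, not the paper's implementation.
from difflib import SequenceMatcher

# Toy corpus of (slow, fast) program pairs that a retriever would search.
CORPUS = [
    ("total = 0\nfor x in xs:\n    total += x",
     "total = sum(xs)"),
    ("out = []\nfor x in xs:\n    out.append(x * 2)",
     "out = [x * 2 for x in xs]"),
]

def retrieve_examples(query: str, k: int = 1):
    """Return the k corpus pairs whose slow version is most similar to the query."""
    scored = [(SequenceMatcher(None, query, slow).ratio(), slow, fast)
              for slow, fast in CORPUS]
    scored.sort(reverse=True)
    return [(slow, fast) for _, slow, fast in scored[:k]]

def build_prompt(query: str, k: int = 1) -> str:
    """Assemble a few-shot prompt from the retrieved slow/fast pairs."""
    parts = ["Rewrite the program so it runs faster.\n"]
    for slow, fast in retrieve_examples(query, k):
        parts.append(f"# Slow:\n{slow}\n# Fast:\n{fast}\n")
    parts.append(f"# Slow:\n{query}\n# Fast:")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_prompt("result = 0\nfor v in values:\n    result += v"))
```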
MACER: A Modular Framework for Accelerated Compilation Error Repair
Automated compilation error repair, the problem of suggesting fixes to buggy programs that fail to compile, has generated significant interest in recent years.
Break-It-Fix-It: Unsupervised Learning for Program Repair
To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code.
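A minimal sketch of one BIFI-style data-collection round is shown below; the fixer and breaker are trivial stand-ins for learned models, and only the critic (here, a Python syntax check) is a real verifier.

```python
# Minimal sketch of one BIFI-style round; `fixer` and `breaker` are placeholder
# models, and the critic is a simple Python syntax check.
def critic(code: str) -> bool:
    """Return True if the code is 'good' (here: parses as Python)."""
    try:
        compile(code, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def bifi_round(bad_inputs, good_inputs, fixer, breaker):
    """Collect verified (bad, fixed) and (broken, good) pairs for retraining."""
    paired_data = []
    # Idea (i): run the fixer on real bad code and keep outputs the critic accepts.
    for bad in bad_inputs:
        fixed = fixer(bad)
        if critic(fixed):
            paired_data.append((bad, fixed))
    # Idea (ii): run the breaker on good code to generate realistic bad examples.
    for good in good_inputs:
        broken = breaker(good)
        if not critic(broken):
            paired_data.append((broken, good))
    return paired_data  # used to retrain the fixer (and breaker) in the next round

# Toy usage with trivial stand-ins for the learned models.
pairs = bifi_round(
    bad_inputs=["print('hi'"],          # missing closing parenthesis
    good_inputs=["print('hi')"],
    fixer=lambda code: code + ")",      # stand-in fixer
    breaker=lambda code: code[:-1],     # stand-in breaker
)
print(pairs)
```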
Guiding Language Models of Code with Global Context using Monitors
We construct PragmaticCode, a repository-level dataset for method completion in Java, and evaluate monitor-guided decoding (MGD) on it.
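The toy sketch below illustrates the general idea of monitor-guided decoding: a static-analysis "monitor" restricts which identifiers the model may emit after a dereference. The MONITOR table, the decode_member function, and the candidate scores are illustrative assumptions, not the paper's system.

```python
# Toy sketch of monitor-guided decoding: a static-analysis "monitor" limits the
# identifiers the model may emit. The monitor table and candidate scores are
# illustrative assumptions.
MONITOR = {  # type -> members visible in the repository context
    "List<String>": {"size", "isEmpty", "get", "add"},
}

def decode_member(candidates, receiver_type):
    """Pick the highest-scoring candidate identifier the monitor allows."""
    allowed = MONITOR.get(receiver_type, set())
    legal = [(score, name) for name, score in candidates.items() if name in allowed]
    if not legal:
        return None  # fall back to unconstrained decoding
    return max(legal)[1]

# The unconstrained model prefers a hallucinated member; the monitor vetoes it.
candidates = {"length": 0.7, "size": 0.2, "count": 0.1}
print(decode_member(candidates, "List<String>"))  # -> "size"
```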
INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair
INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the code repair process, functioning as both a Code Learner and a Code Teacher.
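A hedged sketch of an interactive chain-of-repair loop in this spirit is given below; `learner`, `teacher`, and `run_tests` are illustrative stand-ins for the two LLM roles and the execution feedback, not the paper's exact prompts.

```python
# Sketch of an interactive chain-of-repair loop; learner, teacher, and
# run_tests are stand-ins for the LLM roles and execution feedback.
def chain_of_repair(task, learner, teacher, run_tests, max_rounds=3):
    """Alternate Code Learner and Code Teacher roles until the tests pass."""
    code = learner(task, instruction=None)             # initial attempt
    for _ in range(max_rounds):
        error = run_tests(code)                        # compile/execute for feedback
        if error is None:
            return code                                # repaired successfully
        instruction = teacher(code, error)             # turn the error into a repair plan
        code = learner(task, instruction=instruction)  # learner follows the plan
    return code

# Toy usage with deterministic stand-ins.
fixed = chain_of_repair(
    task="return the square of x",
    learner=lambda task, instruction: "def square(x): return x * x"
            if instruction else "def square(x): return x + x",
    teacher=lambda code, error: "use multiplication instead of addition",
    run_tests=lambda code: None if "x * x" in code else "square(3) returned 6, expected 9",
)
print(fixed)
```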
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct
We introduce AutoCoder, the first Large Language Model to surpass GPT-4 Turbo (April 2024) and GPT-4o in pass@1 on the HumanEval benchmark (90.9% vs. 90.2%).
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed specifically for test generation.