MBPP

64 papers with code • 1 benchmark • 0 datasets

MBPP (Mostly Basic Python Problems) is a benchmark of roughly 1,000 crowd-sourced Python programming problems designed to be solvable by entry-level programmers, covering programming fundamentals and standard-library usage. Each problem consists of a natural-language task description, a reference solution, and three automated test cases; generated code is judged by whether it passes the tests.
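
Evaluation on MBPP reduces to executing a generated function against the problem's assert-style test cases. A minimal sketch of that check is shown below with an invented MBPP-style problem; the task, candidate solution, and tests are illustrative stand-ins, not actual dataset entries.

    # Minimal sketch of MBPP-style scoring: a candidate solution counts as
    # correct only if it passes every assert-based test case for the problem.
    import subprocess
    import sys

    candidate_solution = "def remove_odd(nums):\n    return [n for n in nums if n % 2 == 0]\n"

    test_cases = [
        "assert remove_odd([1, 2, 3, 4]) == [2, 4]",
        "assert remove_odd([5, 7]) == []",
        "assert remove_odd([]) == []",
    ]

    def passes_tests(solution: str, tests: list, timeout: float = 5.0) -> bool:
        """Run the solution plus its asserts in a fresh interpreter process."""
        program = solution + "\n" + "\n".join(tests) + "\n"
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

    print(passes_tests(candidate_solution, test_cases))  # prints True for this example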

Most implemented papers

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

nlpxucan/wizardlm 14 Jun 2023

Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+.

ReCode: Robustness Evaluation of Code Generation Models

amazon-science/recode 20 Dec 2022

Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation.
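
A rough sketch of this style of robustness check is below, assuming a placeholder generate callable for the model under test; the function-renaming perturbation is just one illustrative semantics-preserving transformation of the kind such a benchmark applies.

    # Compare whether a model's completion still passes its tests after the
    # prompt is perturbed. `generate` is any prompt-to-code callable.
    from typing import Callable

    def rename_function(prompt: str, tests: str, old: str, new: str) -> tuple:
        """Semantics-preserving perturbation: rename the target function everywhere."""
        return prompt.replace(old, new), tests.replace(old, new)

    def passes(program: str) -> bool:
        """Execute solution plus asserts in-process (use a sandbox in practice)."""
        try:
            exec(compile(program, "<candidate>", "exec"), {})
            return True
        except Exception:
            return False

    def robustness_check(generate: Callable[[str], str], prompt: str, tests: str,
                         old_name: str, new_name: str) -> dict:
        """Pass/fail on the nominal prompt and on its perturbed variant."""
        nominal = passes(generate(prompt) + "\n" + tests)
        p_prompt, p_tests = rename_function(prompt, tests, old_name, new_name)
        perturbed = passes(generate(p_prompt) + "\n" + p_tests)
        return {"nominal": nominal, "perturbed": perturbed,
                "robust": nominal and perturbed}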

EffiLearner: Enhancing Efficiency of Generated Code via Self-Optimization

huangd1999/effilearner 24 May 2024

To address this issue, we propose EffiLearner, a self-optimization framework that utilizes execution overhead profiles to improve the efficiency of LLM-generated code.
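
A rough sketch of such a self-optimization loop, assuming a placeholder llm callable and a crude wall-clock measurement standing in for the paper's execution overhead profile; the prompt wording is illustrative, not EffiLearner's.

    # Profile the generated code's execution overhead, feed the profile back to
    # the model, and keep the fastest version produced.
    import time
    from typing import Callable

    def profile_overhead(program: str, repeats: int = 5) -> float:
        """Crude execution-time profile: average wall-clock time over repeats."""
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            exec(compile(program, "<generated>", "exec"), {})
            timings.append(time.perf_counter() - start)
        return sum(timings) / len(timings)

    def self_optimize(llm: Callable[[str], str], task: str, program: str,
                      rounds: int = 3) -> str:
        """Iteratively rewrite the program, keeping the fastest version seen."""
        best_prog, best_time = program, profile_overhead(program)
        for _ in range(rounds):
            prompt = (
                f"Task:\n{task}\n\nCurrent solution:\n{best_prog}\n\n"
                f"Average runtime: {best_time:.4f}s. "
                "Rewrite the solution to reduce its execution overhead."
            )
            candidate = llm(prompt)
            cand_time = profile_overhead(candidate)
            if cand_time < best_time:
                best_prog, best_time = candidate, cand_time
        return best_prog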

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

salesforce/coderl 5 Jul 2022

To address the limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL).
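
The core ingredient of such an RL setup is a reward computed from actually executing the sampled program against unit tests, as sketched below; the graded reward values here are illustrative rather than the paper's exact scheme.

    # Execute a sampled program against its unit tests and map the outcome to a
    # scalar reward for policy optimization.
    import subprocess
    import sys

    def execution_reward(program: str, tests: str, timeout: float = 5.0) -> float:
        """Graded reward: syntax error < timeout < failing tests < all tests pass."""
        try:
            compile(program, "<sample>", "exec")
        except SyntaxError:
            return -1.0                      # does not even parse
        try:
            result = subprocess.run(
                [sys.executable, "-c", program + "\n" + tests],
                capture_output=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return -0.6                      # hangs or is far too slow
        if result.returncode != 0:
            return -0.3                      # raises at runtime or fails a test
        return 1.0                           # passes all tests

    # In an actor-critic setup, this reward weights the log-likelihood of each
    # sampled program when updating the code-generating policy.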

Underwater Object Tracker: UOSTrack for Marine Organism Grasping of Underwater Vehicles

liyunfenglyf/uostrack 4 Jan 2023

The UOHT training paradigm is designed to train the sample-imbalanced underwater tracker, exposing it to a large number of underwater-domain training samples so that it learns the corresponding feature representations.

Teaching Large Language Models to Self-Debug

amazon-science/SDFeedback 11 Apr 2023

In particular, we demonstrate that Self-Debugging can teach the large language model to perform rubber duck debugging; i.e., without any human feedback on code correctness or error messages, the model is able to identify its mistakes by investigating the execution results and explaining the generated code in natural language.
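
A minimal sketch of this generate-execute-explain-refine loop, assuming a placeholder llm callable; the prompts are paraphrases of the idea, not the paper's own few-shot prompts.

    # The model runs its own code, reads the execution feedback, explains the
    # code, and revises it, without human-provided error labels.
    import subprocess
    import sys
    from typing import Callable

    def run(program: str, timeout: float = 5.0) -> tuple:
        """Execute the program and return (passed, captured output/traceback)."""
        try:
            result = subprocess.run([sys.executable, "-c", program],
                                    capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return result.returncode == 0, result.stdout + result.stderr

    def self_debug(llm: Callable[[str], str], task: str, tests: str,
                   max_turns: int = 3) -> str:
        code = llm(f"Write a Python solution for:\n{task}")
        for _ in range(max_turns):
            passed, feedback = run(code + "\n" + tests)
            if passed:
                break
            code = llm(
                f"Task:\n{task}\n\nYour code:\n{code}\n\n"
                f"Execution feedback:\n{feedback}\n\n"
                "Explain what the code does line by line, identify the mistake, "
                "and return a corrected solution."
            )
        return code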

InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

princeton-nlp/intercode NeurIPS 2023

Our framework is language and platform agnostic, uses self-contained Docker environments to provide safe and reproducible execution, and is compatible out-of-the-box with traditional seq2seq coding methods, while enabling the development of new methods for interactive code generation.
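
The interaction pattern can be sketched as a gym-style environment whose step executes a code action in a container and returns the observed output as feedback; this is not InterCode's actual API, and the reset/step interface, reward rule, and container name below are assumptions.

    # An agent emits code actions; a sandboxed environment executes them and
    # returns an observation and reward.
    import subprocess

    class SandboxedCodeEnv:
        def __init__(self, container: str = "intercode-sandbox", timeout: float = 10.0):
            self.container = container      # assumed to be a running Docker container
            self.timeout = timeout
            self.task = ""

        def reset(self, task: str) -> str:
            self.task = task
            return task                      # initial observation is the task text

        def step(self, action: str) -> tuple:
            """Execute a code action inside the container, return (obs, reward, done)."""
            try:
                result = subprocess.run(
                    ["docker", "exec", self.container, "python", "-c", action],
                    capture_output=True, text=True, timeout=self.timeout,
                )
            except subprocess.TimeoutExpired:
                return "timeout", 0.0, False
            observation = result.stdout + result.stderr
            reward = 1.0 if result.returncode == 0 else 0.0
            return observation, reward, result.returncode == 0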

Code Llama: Open Foundation Models for Code

facebookresearch/codellama 24 Aug 2023

We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

Clover: Closed-Loop Verifiable Code Generation

ChuyueSun/Clover 26 Oct 2023

In this paper, we introduce a new approach for addressing this challenge: the Clover paradigm, short for Closed-Loop Verifiable Code Generation, which uses consistency checking to provide a strong filter for incorrect code.
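
A loose Python analogue of such a consistency filter is sketched below; Clover itself checks agreement between code, docstrings, and formally verifiable Dafny annotations, whereas here the annotation is an executable property check and llm is a placeholder used as an equivalence judge.

    # Accept generated code only if its views (code, docstring, annotation)
    # agree with one another.
    from typing import Callable

    def passes_property(code: str, property_check: str) -> bool:
        """Run the code together with an executable property/assertion block."""
        try:
            exec(compile(code + "\n" + property_check, "<clover>", "exec"), {})
            return True
        except Exception:
            return False

    def consistent(llm: Callable[[str], str], docstring: str, code: str,
                   property_check: str) -> bool:
        """Filter: keep the code only if code, docstring, and annotation agree."""
        # code <-> annotation: the code must satisfy the executable property.
        if not passes_property(code, property_check):
            return False
        # code <-> docstring: a description reconstructed from the code should
        # match the original intent (delegated to the model as a judge).
        reconstructed = llm(f"Describe in one sentence what this code does:\n{code}")
        verdict = llm(
            "Do these two descriptions ask for the same behaviour?\n"
            f"A: {docstring}\nB: {reconstructed}\nAnswer yes or no."
        )
        return verdict.strip().lower().startswith("yes")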

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

codelion/optillm 13 Feb 2024

To evaluate code large language models (LLMs), research has relied on a few small, manually curated benchmarks, such as HumanEval and MBPP, which represent only a narrow part of real-world software domains.
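
Round-trip correctness can be sketched as: describe existing code in natural language, regenerate code from that description, and compare the two programs' behaviour on sample inputs; llm, the prompts, and the input set below are placeholders.

    # Forward pass: code -> description. Backward pass: description -> code.
    # The round trip is correct if both programs behave the same on the inputs.
    from typing import Any, Callable, Iterable

    def behaviour(program: str, entry_point: str, inputs: list) -> list:
        """Run the named function on each input and record outputs (or errors)."""
        namespace: dict = {}
        exec(compile(program, "<rtc>", "exec"), namespace)
        fn = namespace[entry_point]
        outputs = []
        for x in inputs:
            try:
                outputs.append(fn(x))
            except Exception as err:
                outputs.append(f"error: {type(err).__name__}")
        return outputs

    def round_trip_correct(llm: Callable[[str], str], code: str, entry_point: str,
                           inputs: Iterable[Any]) -> bool:
        inputs = list(inputs)
        description = llm(f"Describe what this function does:\n{code}")
        regenerated = llm(
            f"Write a Python function named {entry_point} that does:\n{description}"
        )
        return behaviour(code, entry_point, inputs) == behaviour(regenerated, entry_point, inputs)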