Code Generation

322 papers with code • 17 benchmarks • 41 datasets

Code Generation is the task of predicting explicit code or program structure from multimodal data sources such as incomplete code, programs in another programming language, natural language descriptions, or execution examples. Code Generation tools can support automatic programming and improve programmer productivity.

Source: Deep Learning for Source Code Modeling and Generation

Image source: Measuring Coding Challenge Competence With APPS

DevBench: A Comprehensive Benchmark for Software Development

open-compass/devbench 13 Mar 2024

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities.

57

CleanAgent: Automating Data Standardization with LLM-based Agents

sfu-db/CleanAgent 13 Mar 2024

Data standardization is a crucial part of the data science life cycle.

6

Bugs in Large Language Models Generated Code: An Empirical Study

flowss/bugsinllms 13 Mar 2024

The bug patterns are presented in the form of a taxonomy.

1

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

155

Automatic Generation of Python Programs Using Context-Free Grammars

marwanair/tinypy-generator 11 Mar 2024

In recent years, data has emerged as the new gold, serving as a powerful tool for creating intelligent systems.

4

Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

mulns/text2qr 11 Mar 2024

This approach harnesses the potent generation capabilities of stable-diffusion models, navigating the trade-off between image aesthetics and QR code scannability.

3

UniSparse: An Intermediate Language for General Sparse Format Customization

cornell-zhang/unisparse 9 Mar 2024

The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound.

18

Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models

yale-nlp/code-llm-contamination 6 Mar 2024

While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns regarding potential contamination of these benchmarks as they may be leaked into pretraining and finetuning data.

6

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

UKPLab/arxiv2024-IRCoder 6 Mar 2024

In particular, most mainstream Code-LMs have been pre-trained on source code files alone.

4

DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

shirley-wu/daco 4 Mar 2024

We construct the DACO dataset, containing (1) 440 databases (of tabular data) collected from real-world scenarios, (2) ~2k query-answer pairs that can serve as weak supervision for model training, and (3) a concentrated but high-quality test set with human-refined annotations that serves as our main evaluation benchmark.

3