Code Completion

104 papers with code • 6 benchmarks • 12 datasets

Most implemented papers

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

microsoft/CodeXGLUE 9 Feb 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks.

StarCoder 2 and The Stack v2: The Next Generation

bigcode-project/starcoder2 29 Feb 2024

Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.

Open Vocabulary Learning on Source Code with a Graph-Structured Cache

mwcvitkovic/Deep_Learning_On_Code_With_A_Graph_Vocabulary--Code_Preprocessor ICLR 2019

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

thudm/longbench 28 Aug 2023

In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long context understanding.

DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning

8023looker/datasculpt 2 Sep 2024

Through extensive experimental analysis, we identified three key challenges in designing effective data management strategies that enable the model to achieve long-context capability without sacrificing performance in other tasks: (1) a shortage of long documents across multiple domains, (2) effective construction of context windows, and (3) efficient organization of large-scale datasets.

Structural Language Models of Code

tech-srl/slm-code-generation ICML 2020

We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree: structural language modeling (SLM).

Neural Software Analysis

superli3/codenavi 16 Nov 2020

The resulting tools complement and outperform traditional program analyses, and are used in industrial practice.

UniXcoder: Unified Cross-Modal Pre-training for Code Representation

microsoft/CodeBERT ACL 2022

Furthermore, we propose to utilize multi-modal contents to learn representations of code fragments with contrastive learning, and then align representations among programming languages using a cross-modal generation task.

Multi-lingual Evaluation of Code Generation Models

amazon-research/mbxp-exec-eval 26 Oct 2022

Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion and discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings.

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

greshake/llm-security 23 Feb 2023

Large Language Models (LLMs) are increasingly being integrated into various applications.