Code Completion
104 papers with code • 6 benchmarks • 12 datasets
Most implemented papers
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Benchmark datasets have a significant impact on accelerating research in programming language tasks.
StarCoder 2 and The Stack v2: The Next Generation
Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.
Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques.
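A core obstacle for NLP-style models of source code is the open vocabulary of identifiers. A common preprocessing step in this line of work is splitting identifiers into subtokens at snake_case and camelCase boundaries; the sketch below is an illustrative implementation of that idea, not code from the paper.

```python
import re

def split_identifier(name):
    """Split a source-code identifier into subtokens at
    snake_case and camelCase/PascalCase boundaries (illustrative only)."""
    parts = []
    for chunk in name.split("_"):
        # Break camelCase runs, keeping acronyms intact:
        # "parseHTTPResponse" -> parse, HTTP, Response
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|\d+", chunk))
    return [p.lower() for p in parts if p]

print(split_identifier("parseHTTPResponse_v2"))
# → ['parse', 'http', 'response', 'v', '2']
```

Subtokenization shrinks the effective vocabulary, but rare subtokens remain; approaches like the graph-structured cache go further by letting the model attend to vocabulary observed in the current context.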
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long-context capabilities.
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning
Through extensive experimental analysis, we identified three key challenges in designing effective data management strategies that enable the model to achieve long-context capability without sacrificing performance in other tasks: (1) a shortage of long documents across multiple domains, (2) effective construction of context windows, and (3) efficient organization of large-scale datasets.
Structural Language Models of Code
We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree: structural language modeling (SLM).
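Modeling a snippet as a tree means the model conditions on syntactic structure rather than a flat token sequence. As a rough illustration (using Python's standard `ast` module, not the paper's implementation), the same snippet a token-level model would see as a string decomposes into nested syntax nodes:

```python
import ast

code = "def add(a, b):\n    return a + b"
tree = ast.parse(code)

# Walk the AST and print each node type with indentation proportional
# to its depth, showing the tree a structural model conditions on.
def show(node, depth=0):
    print("  " * depth + type(node).__name__)
    for child in ast.iter_child_nodes(node):
        show(child, depth + 1)

show(tree)
```

A structural language model can then predict a missing subtree from the paths leading to it, rather than predicting left-to-right over tokens.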
Neural Software Analysis
The resulting tools complement and outperform traditional program analyses, and are used in industrial practice.
UniXcoder: Unified Cross-Modal Pre-training for Code Representation
Furthermore, we propose to utilize multi-modal contents to learn representation of code fragment with contrastive learning, and then align representations among programming languages using a cross-modal generation task.
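Contrastive learning of this kind pulls the representations of matched pairs (e.g., a code fragment and its description) together while pushing apart mismatched pairs, typically via an InfoNCE-style loss. The toy sketch below assumes a precomputed similarity matrix with positives on the diagonal; it is a generic illustration, not UniXcoder's training code.

```python
import math

def info_nce(sim_matrix, temperature=0.07):
    """Toy InfoNCE loss: sim_matrix[i][j] is the similarity between
    anchor i and candidate j; the diagonal holds the positive pairs."""
    n = len(sim_matrix)
    loss = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim_matrix[i]]
        m = max(logits)  # subtract max for numerical stability
        denom = sum(math.exp(l - m) for l in logits)
        # Negative log-probability assigned to the positive pair (i, i)
        loss += -(logits[i] - m - math.log(denom))
    return loss / n

# Well-separated positives on the diagonal yield a low loss
print(info_nce([[1.0, 0.1], [0.1, 1.0]]))
```

The loss is minimized when each anchor is most similar to its own positive, which is what aligns representations across modalities and languages.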
Multi-lingual Evaluation of Code Generation Models
Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion, and observe the generalization ability of language models to out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings.
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Large Language Models (LLMs) are increasingly being integrated into various applications.