Code Classification
9 papers with code • 0 benchmarks • 6 datasets
Most implemented papers
SCC: Automatic Classification of Code Snippets
Determining the programming language of a source code file has been studied by the research community, and Machine Learning (ML) and Natural Language Processing (NLP) algorithms have been shown to identify it effectively.
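A minimal sketch of this kind of snippet language identification, assuming scikit-learn; the tiny inline dataset, character n-gram features, and naive Bayes classifier are illustrative choices, not the configuration used in the paper.

```python
# Toy language-identification pipeline: TF-IDF over character n-grams + naive Bayes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

snippets = [
    "def add(a, b):\n    return a + b",                 # Python
    "public int add(int a, int b) { return a + b; }",   # Java
    "function add(a, b) { return a + b; }",             # JavaScript
]
labels = ["python", "java", "javascript"]

# Character n-grams are robust to identifier choice and whitespace.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    MultinomialNB(),
)
clf.fit(snippets, labels)

print(clf.predict(["const add = (a, b) => a + b;"]))
```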
Embedding Java Classes with code2vec: Improvements from Variable Obfuscation
code2vec is a recently released embedding approach that uses the proxy task of method name prediction to map Java methods to feature vectors.
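The paper's reported improvement comes from obfuscating variable names before embedding, so the model relies on structure rather than identifiers. A minimal sketch of that preprocessing idea, using Python's standard ast module on a Python snippet for brevity; the authors' tooling targets Java methods inside the code2vec pipeline.

```python
# Rename user-defined identifiers to anonymous placeholders, keeping builtins.
import ast
import builtins

class Obfuscator(ast.NodeTransformer):
    def __init__(self):
        self.mapping = {}

    def visit_arg(self, node):
        # Function parameters, e.g. "prices" -> "var0".
        node.arg = self.mapping.setdefault(node.arg, f"var{len(self.mapping)}")
        return node

    def visit_Name(self, node):
        # Uses of variables in the body; builtins such as sum() are left intact.
        if not hasattr(builtins, node.id):
            node.id = self.mapping.setdefault(node.id, f"var{len(self.mapping)}")
        return node

source = "def total(prices, tax):\n    return sum(prices) * (1 + tax)"
tree = Obfuscator().visit(ast.parse(source))
print(ast.unparse(tree))  # def total(var0, var1): return sum(var0) * (1 + var1)
```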
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks
In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques.
Semantic Code Classification for Automated Machine Learning
A range of automated machine learning applications require the generation process to be controllable.
Learning Program Semantics with Code Representations: An Empirical Study
However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing.
MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
Data augmentation has been a popular approach to supplement training data in domains such as computer vision and NLP.
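A minimal sketch of mixup-style augmentation applied to code representations, assuming the snippets have already been encoded as fixed-size vectors with one-hot labels; this is generic mixup interpolation, not the exact MIXCODE recipe, which mixes code at several granularities.

```python
# Generic mixup: convex combination of two (embedding, label) pairs.
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Interpolate two training examples with a Beta-distributed weight."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_a, y_a = rng.normal(size=128), np.array([1.0, 0.0])  # e.g. "sorting" class
x_b, y_b = rng.normal(size=128), np.array([0.0, 1.0])  # e.g. "search" class
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b)
print(x_mix.shape, y_mix)
```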
Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification
In this study, we propose to represent the AST as a heterogeneous directed hypergraph (HDHG) and to process the graph with a heterogeneous directed hypergraph neural network (HDHGN) for code classification.
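A minimal sketch of the kind of AST structure such graph-based classifiers consume, extracted with Python's standard ast module; constructing the heterogeneous directed hypergraph and the neural network itself is not shown here.

```python
# List typed AST nodes and parent-child edges for a small snippet.
import ast

source = "def square(x):\n    return x * x"
tree = ast.parse(source)

nodes, edges = [], []
for parent in ast.walk(tree):
    nodes.append(type(parent).__name__)          # node type, e.g. FunctionDef
    for child in ast.iter_child_nodes(parent):
        edges.append((type(parent).__name__, type(child).__name__))

print(nodes)
print(edges)
```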
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
These findings show that early layers can be used to obtain better results using the same resources, as well as to reduce resource usage during fine-tuning and inference.
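A minimal sketch of classifying from an early encoder layer in the spirit of EarlyBIRD, assuming the Hugging Face transformers library and the public microsoft/codebert-base checkpoint; the layer index and the untrained linear head are illustrative, not the paper's exact setup.

```python
# Pool the hidden state of an early encoder layer and classify from it.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base", output_hidden_states=True)

code = "int add(int a, int b) { return a + b; }"
inputs = tokenizer(code, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embeddings + one entry per layer

early = hidden_states[4]                   # representation after the 4th encoder layer
pooled = early[:, 0]                       # CLS-style pooling
head = torch.nn.Linear(pooled.size(-1), 2)  # e.g. buggy vs. non-buggy
logits = head(pooled)
print(logits.shape)
```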
Understanding Programs by Exploiting (Fuzzing) Test Cases
The effectiveness of the proposed method is verified on two program understanding tasks, code clone detection and code classification, where it outperforms the current state of the art by large margins.