Search Results for author: Saurabh Pujar

Found 9 papers, 4 papers with code

D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis

1 code implementation • 16 Feb 2021 • Yunhui Zheng, Saurabh Pujar, Burn Lewis, Luca Buratti, Edward Epstein, Bo Yang, Jim Laredo, Alessandro Morari, Zhong Su

However, existing datasets to train models for vulnerability identification suffer from multiple limitations such as limited bug context, limited size, and synthetic and unrealistic source code.

Bug fixing, Vulnerability Detection

CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

1 code implementation • 25 May 2021 • Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, Julian Dolby, Jie Chen, Mihir Choudhury, Lindsey Decker, Veronika Thost, Luca Buratti, Saurabh Pujar, Shyam Ramji, Ulrich Finkler, Susan Malaika, Frederick Reiss

In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques.

BIG-bench Machine Learning, Code Classification, +1

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

no code implementations • ACL 2022 • Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

We pre-train our model with a much smaller dataset, the size of which is only 5% of the state-of-the-art models' training datasets, to illustrate the effectiveness of our data augmentation and the pre-training approach.

Clone Detection, Contrastive Learning, +2

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

1 code implementation • 21 Oct 2023 • Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and conventional accuracy of a model at the same time.

Code Generation, Code Summarization

Learning Transfers over Several Programming Languages

no code implementations • 25 Oct 2023 • Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney

Third, which characteristics of a language pair are predictive of transfer performance, and how does that depend on the given task?

Cross-Lingual Transfer, In-Context Learning, +1

Ansible Lightspeed: A Code Generation Service for IT Automation

no code implementations • 27 Feb 2024 • Priyam Sahoo, Saurabh Pujar, Ganesh Nalawade, Richard Gebhardt, Louis Mandel, Luca Buratti

The analysis shows that the user acceptance rate of Ansible Lightspeed suggestions is higher than comparable tools that are more general and not specific to a programming language.

Code Generation
