Search Results for author: Baishakhi Ray

Found 49 papers, 32 papers with code

DIRECT : A Transformer-based Model for Decompiled Identifier Renaming

no code implementations • ACL (NLP4Prog) 2021 • Vikram Nitin, Anthony Saieva, Baishakhi Ray, Gail Kaiser

Decompiling binary executables to high-level code is an important step in reverse engineering scenarios, such as malware analysis and legacy code maintenance.

Malware Analysis

Vulnerability Detection with Code Language Models: How Far Are We?

1 code implementation • 27 Mar 2024 • Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng Chen

Evaluating code LMs on PrimeVul reveals that existing benchmarks significantly overestimate the performance of these models.

Vulnerability Detection

CYCLE: Learning to Self-Refine the Code Generation

1 code implementation • 27 Mar 2024 • Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray

Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers.

Code Generation

PropTest: Automatic Property Testing for Improved Visual Programming

no code implementations • 25 Mar 2024 • Jaywon Koo, Ziyan Yang, Paola Cascante-Bonilla, Baishakhi Ray, Vicente Ordonez

Visual Programming has emerged as an alternative to end-to-end black-box visual reasoning models.

Question Answering Referring Expression +3

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

no code implementations • 31 Jan 2024 • Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray

Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance.

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

1 code implementation • 21 Oct 2023 • Marcus J. Min, Yangruibo Ding, Luca Buratti, Saurabh Pujar, Gail Kaiser, Suman Jana, Baishakhi Ray

In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and conventional accuracy of a model at the same time.

Code Generation Code Summarization

Towards Causal Deep Learning for Vulnerability Detection

no code implementations • 12 Oct 2023 • Md Mahbubur Rahman, Ira Ceka, Chengzhi Mao, Saikat Chakraborty, Baishakhi Ray, Wei Le

Our results show that CausalVul consistently improved model accuracy, robustness, and out-of-distribution (OOD) performance for all the state-of-the-art models and datasets we experimented with.

Vulnerability Detection

Language-Guided Traffic Simulation via Scene-Level Diffusion

no code implementations • 10 Jun 2023 • Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray

Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development.

Language Modelling Large Language Model

A Static Evaluation of Code Completion by Large Language Models

no code implementations • 5 Jun 2023 • Hantian Ding, Varun Kumar, Yuchen Tian, Zijian Wang, Rob Kwiatkowski, Xiaopeng Li, Murali Krishna Ramanathan, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

Large language models trained on code have shown great potential to increase productivity of software developers.

Code Completion Code Generation

Variation of Gender Biases in Visual Recognition Models Before and After Finetuning

no code implementations • 14 Mar 2023 • Jaspreet Ranjit, Tianlu Wang, Baishakhi Ray, Vicente Ordonez

We also find that models finetuned on larger-scale datasets are more likely to introduce new biased associations.

Object Recognition

Greener yet Powerful: Taming Large Code Generation Models with Quantization

no code implementations • 9 Mar 2023 • Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang

Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint.

Code Generation Code Summarization +2

On ML-Based Program Translation: Perils and Promises

1 code implementation • 21 Feb 2023 • Aniketh Malyala, Katelyn Zhou, Baishakhi Ray, Saikat Chakraborty

In the future, we envision an end-to-end program translation tool where programming domain knowledge can be embedded into an ML-based translation pipeline using pre- and post-processing steps.

Translation

ReCode: Robustness Evaluation of Code Generation Models

2 code implementations • 20 Dec 2022 • Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation.

Code Generation

Guided Conditional Diffusion for Controllable Traffic Simulation

1 code implementation • 31 Oct 2022 • Ziyuan Zhong, Davis Rempe, Danfei Xu, Yuxiao Chen, Sushant Veer, Tong Che, Baishakhi Ray, Marco Pavone

Controllable and realistic traffic simulation is critical for developing and verifying autonomous vehicles.

Autonomous Vehicles Collision Avoidance

Multi-lingual Evaluation of Code Generation Models

2 code implementations • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang

Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion and discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings.

Code Completion Code Translation +1

NeuDep: Neural Binary Memory Dependence Analysis

no code implementations • 4 Oct 2022 • Kexin Pei, Dongdong She, Michael Wang, Scott Geng, Zhou Xuan, Yaniv David, Junfeng Yang, Suman Jana, Baishakhi Ray

Notably, NeuDep also outperforms the current state-of-the-art on these tasks.

ContraCLM: Contrastive Learning For Causal Language Model

no code implementations • 3 Oct 2022 • Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

Specifically, we attain a 44% relative improvement on Semantic Textual Similarity tasks and 34% on Code-to-Code Search tasks.

Code Generation Code Search +4

NatGen: Generative pre-training by "Naturalizing" source code

1 code implementation • 15 Jun 2022 • Saikat Chakraborty, Toufique Ahmed, Yangruibo Ding, Premkumar Devanbu, Baishakhi Ray

Pre-trained generative language models for source code (e.g., PLBART, CodeT5, SPT-Code) have yielded strong results on several tasks in the past few years, including code generation and translation.

Code Translation Few-Shot Learning +1

Summarize and Generate to Back-translate: Unsupervised Translation of Programming Languages

1 code implementation • 23 May 2022 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

In code generation, the model learns to do the opposite: generating code from a natural-language summary.

Code Generation Code Summarization +2

Repairing Group-Level Errors for DNNs Using Weighted Regularization

1 code implementation • 24 Mar 2022 • Ziyuan Zhong, Yuchi Tian, Conor J. Sweeney, Vicente Ordonez, Baishakhi Ray

In particular, it can repair confusion errors and bias errors of DNN models for both single-label and multi-label image classification.

Unicorn: Reasoning about Configurable System Performance through the lens of Causality

1 code implementation • 20 Jan 2022 • Md Shahriar Iqbal, Rahul Krishna, Mohammad Ali Javidian, Baishakhi Ray, Pooyan Jamshidi

Understanding and reasoning about the performance behavior of highly configurable systems, over a vast and variable space, is challenging.

BIG-bench Machine Learning Causal Inference +1

VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements

1 code implementation • 20 Dec 2021 • Yangruibo Ding, Sahil Suneja, Yunhui Zheng, Jim Laredo, Alessandro Morari, Gail Kaiser, Baishakhi Ray

Automatically locating vulnerable statements in source code is crucial to assure software security and alleviate developers' debugging efforts.

Ensemble Learning

A Survey on Scenario-Based Testing for Automated Driving Systems in High-Fidelity Simulation

no code implementations • 2 Dec 2021 • Ziyuan Zhong, Yun Tang, Yuan Zhou, Vania de Oliveira Neves, Yang Liu, Baishakhi Ray

To bridge this gap, in this work, we provide a generic formulation of scenario-based testing in high-fidelity simulation and conduct a literature review on the existing works.

Towards Learning (Dis)-Similarity of Source Code from Program Contrasts

no code implementations • ACL 2022 • Yangruibo Ding, Luca Buratti, Saurabh Pujar, Alessandro Morari, Baishakhi Ray, Saikat Chakraborty

We pre-train our model with a much smaller dataset, the size of which is only 5% of the state-of-the-art models' training datasets, to illustrate the effectiveness of our data augmentation and the pre-training approach.

Clone Detection Contrastive Learning +2

Detecting Multi-Sensor Fusion Errors in Advanced Driver-Assistance Systems

3 code implementations • 14 Sep 2021 • Ziyuan Zhong, Zhisheng Hu, Shengjian Guo, Xinyang Zhang, Zhenyu Zhong, Baishakhi Ray

We define the failures (e.g., car crashes) caused by faulty multi-sensor fusion (MSF) as fusion errors and develop a novel evolutionary-based, domain-specific search framework, FusED, for the efficient detection of fusion errors.

Autonomous Driving Sensor Fusion

Neural Network Guided Evolutionary Fuzzing for Finding Traffic Violations of Autonomous Vehicles

1 code implementation • 13 Sep 2021 • Ziyuan Zhong, Gail Kaiser, Baishakhi Ray

Self-driving cars and trucks, i.e., autonomous vehicles (AVs), should not be accepted by regulatory bodies and the public until there is much higher confidence in their safety and reliability, which can most practically and convincingly be achieved by testing.

Self-Driving Cars

Retrieval Augmented Code Generation and Summarization

1 code implementation • Findings (EMNLP) 2021 • Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

To mimic developers' code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models.

Ranked #1 on Code Generation on CodeXGLUE - CodeSearchNet (using extra training data)

Code Generation Code Summarization +1
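
The retrieve-then-supplement pattern described above can be sketched as follows. This is an illustration under stated assumptions, not REDCODER's implementation: simple token-overlap (Jaccard) similarity is swapped in for the paper's retriever, the `[SEP]` separator is an arbitrary choice, and `retrieve` and `augment` are hypothetical helper names.

```python
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    # Lower-cased whitespace tokens; a stand-in for a learned encoder (assumption).
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Token-set overlap in [0, 1]; defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, database: List[str], k: int) -> List[str]:
    """Return the k database entries most similar to the query."""
    q = tokenize(query)
    scored: List[Tuple[float, str]] = [(jaccard(q, tokenize(d)), d) for d in database]
    # Stable sort: ties keep their original database order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def augment(query: str, database: List[str], k: int = 2, sep: str = " [SEP] ") -> str:
    """Concatenate the query with its retrieved neighbours as generator input."""
    return sep.join([query] + retrieve(query, database, k))
```

With a small `database` of code snippets, `augment(query, database, k=1)` returns the query followed by its single most similar entry; that string would then be passed to a code generation or summarization model as its supplemented input.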

On Multi-Modal Learning of Editing Source Code

1 code implementation • 15 Aug 2021 • Saikat Chakraborty, Baishakhi Ray

With in-depth investigation and analysis, we show that using developers' hints as an input modality narrows the search space for patches and outperforms state-of-the-art models in generating correctly patched code in the top-1 position.

NMT

Unified Pre-training for Program Understanding and Generation

1 code implementation • NAACL 2021 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.

Clone Detection Code Summarization +6

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

1 code implementation • 16 Dec 2020 • Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, Baishakhi Ray

We thus train the model to learn execution semantics from the functions' micro-traces, without any manual labeling effort.

Transfer Learning Vulnerability Detection

Understanding Local Robustness of Deep Neural Networks under Natural Variations

1 code implementation • 9 Oct 2020 • Ziyuan Zhong, Yuchi Tian, Baishakhi Ray

To this end, we study the local per-input robustness properties of the DNNs and leverage those properties to build a white-box (DeepRobust-W) and a black-box (DeepRobust-B) tool to automatically identify the non-robust points.

Autonomous Driving Image Classification

Deep Learning & Software Engineering: State of Research and Future Directions

1 code implementation • 17 Sep 2020 • Prem Devanbu, Matthew Dwyer, Sebastian Elbaum, Michael Lowry, Kevin Moran, Denys Poshyvanyk, Baishakhi Ray, Rishabh Singh, Xiangyu Zhang

The intent of this report is to serve as a potential roadmap to guide future work that sits at the intersection of SE & DL.

Deep Learning based Vulnerability Detection: Are We There Yet?

1 code implementation • 3 Sep 2020 • Saikat Chakraborty, Rahul Krishna, Yangruibo Ding, Baishakhi Ray

In this paper, we ask, "how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?".

Software Engineering

Patching as Translation: the Data and the Metaphor

1 code implementation • 24 Aug 2020 • Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J. Hellendoorn

Given these findings, we demonstrate how a more principled approach to model design, grounded in our empirical results and general knowledge of software development, can lead to better solutions.

General Knowledge Program Repair +1

Multitask Learning Strengthens Adversarial Robustness

1 code implementation • ECCV 2020 • Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick

Although deep networks achieve strong accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, where imperceptible input perturbations fool the network.

Adversarial Defense Adversarial Robustness

MTFuzz: Fuzzing with a Multi-Task Neural Network

1 code implementation • 25 May 2020 • Dongdong She, Rahul Krishna, Lu Yan, Suman Jana, Baishakhi Ray

The compact embedding can be used to guide the mutation process effectively by focusing most of the mutations on the parts of the embedding where the gradient is high.

Software Engineering
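
The gradient-guided mutation idea in this snippet can be illustrated with a small sketch. It assumes a per-byte gradient (saliency) vector for the seed input is already available and simply concentrates mutations on the highest-magnitude positions; how MTFuzz derives such gradients from the compact embedding of its multi-task network is not shown, and `select_hot_bytes` and `mutate` are hypothetical helper names.

```python
import random
from typing import List, Sequence


def select_hot_bytes(grad: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k input bytes with the largest absolute gradient."""
    # Rank byte positions by gradient magnitude, largest first.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return order[:k]


def mutate(seed: bytes, grad: Sequence[float], k: int, rng: random.Random) -> bytes:
    """Mutate only the k highest-gradient bytes of `seed`, leaving the rest intact."""
    if len(seed) != len(grad):
        raise ValueError("need exactly one gradient value per input byte")
    out = bytearray(seed)
    for i in select_hot_bytes(grad, k):
        out[i] = rng.randrange(256)  # overwrite the hot byte with a random value
    return bytes(out)


# Usage: with gradients [0.1, -2.0, 0.0, 0.5, 1.5], the two hot bytes are at positions 1 and 4.
hot = select_hot_bytes([0.1, -2.0, 0.0, 0.5, 1.5], 2)
child = mutate(b"HELLO", [0.1, -2.0, 0.0, 0.5, 1.5], 2, random.Random(0))
```

Restricting mutations to a few high-gradient bytes keeps most of the seed unchanged, which matches the intuition the snippet describes: spend the mutation budget where the model indicates the input matters most.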

Pythia: Grammar-Based Fuzzing of REST APIs with Coverage-guided Feedback and Learning-based Mutations

no code implementations • 23 May 2020 • Vaggelis Atlidakis, Roxana Geambasu, Patrice Godefroid, Marina Polishchuk, Baishakhi Ray

This paper introduces Pythia, the first fuzzer that augments grammar-based fuzzing with coverage-guided feedback and a learning-based mutation strategy for stateful REST API fuzzing.

valid

A Transformer-based Approach for Source Code Summarization

9 code implementations • ACL 2020 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Generating a readable summary that describes the functionality of a program is known as source code summarization.

Code Summarization Position +1

ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance

3 code implementations • 17 Oct 2019 • Rahul Krishna, Chong Tang, Kevin Sullivan, Baishakhi Ray

For cost reduction, we developed and experimentally validated two approaches: using scaled-up big data jobs as proxies for the objective function for larger jobs, and using a dynamic job similarity measure to infer that results obtained for one kind of big data problem will work well for similar problems.

Efficient Exploration

AdvSPADE: Realistic Unrestricted Attacks for Semantic Segmentation

no code implementations • 6 Oct 2019 • Guangyu Shen, Chengzhi Mao, Junfeng Yang, Baishakhi Ray

Due to the inherent robustness of segmentation models, traditional norm-bounded attack methods have limited effect on such models.

Adversarial Attack Segmentation +1

Metric Learning for Adversarial Robustness

1 code implementation • NeurIPS 2019 • Chengzhi Mao, Ziyuan Zhong, Junfeng Yang, Carl Vondrick, Baishakhi Ray

Deep networks are well known to be fragile to adversarial attacks.

Adversarial Robustness Metric Learning

Neutaint: Efficient Dynamic Taint Analysis with Neural Networks

no code implementations • 8 Jul 2019 • Dongdong She, Yizheng Chen, Baishakhi Ray, Suman Jana

Dynamic taint analysis (DTA) is widely used by various applications to track information flow during runtime execution.

Cryptography and Security

Testing DNN Image Classifiers for Confusion & Bias Errors

1 code implementation • 20 May 2019 • Yuchi Tian, Ziyuan Zhong, Vicente Ordonez, Gail Kaiser, Baishakhi Ray

We found that many of the reported erroneous cases in popular DNN image classifiers occur because the trained models confuse one class with another or show biases towards some classes over others.

Avg DNN Testing +2

Tree2Tree Neural Translation Model for Learning Source Code Changes

no code implementations • 30 Sep 2018 • Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray

Our evaluation shows the effectiveness of CODIT in learning and suggesting abstract change templates.

Software Engineering

A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks

no code implementations • 8 Aug 2018 • Md Masudur Rahman, Saikat Chakraborty, Gail Kaiser, Baishakhi Ray

In particular, we analyze two previously proposed tools for project recommendation and bug localization tasks, which leverage diverse software artifacts, and observe that an informed choice of similarity measure indeed leads to improved performance of the existing SE tools.

Information Retrieval Retrieval

NEUZZ: Efficient Fuzzing with Neural Program Smoothing

1 code implementation • 15 Jul 2018 • Dongdong She, Kexin Pei, Dave Epstein, Junfeng Yang, Baishakhi Ray, Suman Jana

However, even state-of-the-art fuzzers are not very efficient at finding hard-to-trigger software bugs.

Evolutionary Algorithms

Obfuscation Resilient Search through Executable Classification

1 code implementation • 6 Jun 2018 • Fang-Hsiang Su, Jonathan Bell, Gail Kaiser, Baishakhi Ray

It is challenging to search for executables relevant to an obfuscated application so that developers can analyze them efficiently.

Software Engineering Cryptography and Security

Building Language Models for Text with Named Entities

2 code implementations • ACL 2018 • Md. Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang

Text in many domains involves a significant amount of named entities.

Ranked #1 on Recipe Generation on Now You're Cooking!

Code Generation Language Modelling +1

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

1 code implementation • 28 Aug 2017 • Yuchi Tian, Kexin Pei, Suman Jana, Baishakhi Ray

Most existing testing techniques for DNN-driven vehicles depend heavily on the manual collection of test data under different driving conditions, which becomes prohibitively expensive as the number of test conditions increases.

Autonomous Vehicles
