no code implementations • 17 Dec 2024 • Luís F. Gomes, Vincent J. Hellendoorn, Jonathan Aldrich, Rui Abreu
We investigate developers' mental models by analyzing patterns commonly observed in their sketches when developing an ML workflow.
no code implementations • 2 Nov 2023 • Kamel Alrashedy, Vincent J. Hellendoorn, Alessandro Orso
To investigate this conjecture, we propose an approach that uses learned representations to identify the subsets of these large yet unrealistic datasets that are most similar to examples in real-world datasets.
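A minimal sketch of this kind of representation-based filtering, assuming embeddings for both corpora are already available (the array sizes, similarity measure, and 10% cutoff below are illustrative stand-ins, not the paper's setup):

```python
import numpy as np

# Illustrative stand-ins: rows are learned representations (e.g., encoder
# embeddings) of synthetic-corpus examples and real-world examples.
rng = np.random.default_rng(0)
synthetic_embs = rng.normal(size=(10_000, 256))   # large, possibly unrealistic corpus
real_embs = rng.normal(size=(500, 256))           # smaller real-world corpus

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

synthetic_n, real_n = normalize(synthetic_embs), normalize(real_embs)

# Score each synthetic example by its highest cosine similarity to any real example.
similarity = synthetic_n @ real_n.T               # shape (10_000, 500)
scores = similarity.max(axis=1)

# Keep the subset most similar to real-world data (top 10% here is arbitrary).
keep = np.argsort(scores)[-len(scores) // 10:]
print(f"Selected {keep.size} of {scores.size} synthetic examples")
```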
1 code implementation • 3 Oct 2023 • Aidan Z. H. Yang, Ruben Martins, Claire Le Goues, Vincent J. Hellendoorn
Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs. The result is LLMAO, the first language-model-based fault localization approach that locates buggy lines of code without any test coverage information.
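A rough sketch of the idea, a small bidirectional module trained on top of frozen per-line LLM representations to score each line for bugginess (the dimensions, layer counts, and module choices below are placeholders, not LLMAO's actual configuration):

```python
import torch
import torch.nn as nn

class BidirectionalFaultLocalizer(nn.Module):
    """Toy stand-in: a small bidirectional Transformer over frozen per-line
    LLM representations, emitting a bugginess probability per line."""

    def __init__(self, llm_dim=2048, adapter_dim=256, num_layers=2):
        super().__init__()
        self.project = nn.Linear(llm_dim, adapter_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=adapter_dim, nhead=4, batch_first=True)
        # Self-attention here attends in both directions, unlike the
        # left-to-right LLM that produced the input representations.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(adapter_dim, 1)

    def forward(self, line_reprs):             # (batch, num_lines, llm_dim)
        h = self.encoder(self.project(line_reprs))
        return torch.sigmoid(self.classifier(h)).squeeze(-1)  # (batch, num_lines)

model = BidirectionalFaultLocalizer()
fake_llm_states = torch.randn(1, 40, 2048)     # placeholder for frozen LLM outputs
print(model(fake_llm_states).shape)            # per-line bug probabilities
```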
2 code implementations • 2 Oct 2023 • Nikitha Rao, Kush Jain, Uri Alon, Claire Le Goues, Vincent J. Hellendoorn
We also drastically increase the maximum sequence length of inputs to 8,192 tokens, 4x more than typical code generation models, to ensure that the code context is available to the model when generating test code.
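A simplified sketch of why the longer window matters, packing the code under test ahead of the test file and trimming from the top when the budget is exceeded (the whitespace "tokenizer" and trimming policy are stand-ins, not the model's actual preprocessing):

```python
MAX_TOKENS = 8192  # the enlarged context window; typical code models use ~2,048

def count_tokens(text):
    # Stand-in for a real subword tokenizer.
    return len(text.split())

def pack_context(code_file, test_prefix, max_tokens=MAX_TOKENS):
    """Keep the full test prefix; trim the code context from the top if needed."""
    budget = max_tokens - count_tokens(test_prefix)
    code_lines = code_file.splitlines()
    while code_lines and count_tokens("\n".join(code_lines)) > budget:
        code_lines.pop(0)  # drop the earliest (least local) code lines first
    return "\n".join(code_lines) + "\n" + test_prefix
```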
no code implementations • 5 Jun 2023 • Manisha Mukherjee, Vincent J. Hellendoorn
We align established best practices for pre-training large language models with the properties of SO as a data source, in particular using a large context window (2,048 tokens), coupled with a powerful toolkit (Megatron-LM), to train two models: SOBertBase (125M parameters) and SOBertLarge (762M parameters), at a budget of just $374 and $1,600, respectively.
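A minimal sketch of a BERT-style masked-language-modeling step over a Stack Overflow post with a 2,048-token window, using Hugging Face utilities rather than the paper's Megatron-LM toolkit; the checkpoint, example post, and masking rate are illustrative assumptions:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder tokenizer; the actual models train on Stack Overflow data.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.model_max_length = 2048   # the long context window described above

post = "How do I reverse a list in Python? <code>xs[::-1]</code> works in place?"
encoded = tokenizer(post, truncation=True, max_length=2048, return_tensors="pt")

# Standard MLM objective: randomly mask ~15% of tokens and predict them.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
batch = collator([{k: v[0] for k, v in encoded.items()}])
print(batch["input_ids"].shape, batch["labels"].shape)
```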
no code implementations • 31 May 2023 • Nikitha Rao, Jason Tsay, Kiran Kate, Vincent J. Hellendoorn, Martin Hirzel
We task 20 developers, with varying levels of AI expertise, with implementing four ML pipelines using LowCoder, replacing the LowCoder_NL component with a simple keyword search in half the tasks.
no code implementations • 30 Oct 2022 • Machel Reid, Vincent J. Hellendoorn, Graham Neubig
In text generation, the dominant paradigm is currently models that generate text from scratch, one token at a time.
3 code implementations • 26 Feb 2022 • Frank F. Xu, Uri Alon, Graham Neubig, Vincent J. Hellendoorn
We aim to fill in some of these blanks through a systematic evaluation of the largest existing models: Codex, GPT-J, GPT-Neo, GPT-NeoX-20B, and CodeParrot, across various programming languages.
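A hedged sketch of the kind of sampling such an evaluation involves, with the smallest GPT-Neo checkpoint from the Hugging Face hub standing in for the larger models in the study (the prompt, decoding settings, and model size are illustrative, not the paper's protocol):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"        # small stand-in for the evaluated models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.2,             # low temperature, as is common for code completion
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```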
no code implementations • ICLR 2022 • Frank F. Xu, Junxian He, Graham Neubig, Vincent J. Hellendoorn
Structural locality is a ubiquitous feature of real-world datasets, wherein data points are organized into local hierarchies.
2 code implementations • 16 Jun 2021 • Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour, Vincent J. Hellendoorn
The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models.
2 code implementations • 7 Jun 2021 • Md Rafiqul Islam Rabin, Vincent J. Hellendoorn, Mohammad Amin Alipour
Our approach, SIVAND, uses simplification techniques that reduce the size of input programs to a code intelligence (CI) model while preserving the model's predictions.
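An illustrative reduction loop in the spirit of that approach, greedily dropping lines and keeping a removal only when the model's prediction is unchanged (the `predict` callable, line-level granularity, and toy model are placeholders for the paper's delta-debugging-style simplification over valid programs):

```python
def simplify(program_lines, predict):
    """Greedy stand-in for prediction-preserving simplification: repeatedly try
    removing one line and keep the removal only if the model's prediction on
    the smaller program is unchanged."""
    original = predict("\n".join(program_lines))
    lines = list(program_lines)
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if candidate and predict("\n".join(candidate)) == original:
                lines = candidate
                changed = True
                break
    return lines

# Usage with a toy "model" that only looks for a suspicious identifier.
toy_predict = lambda code: "bug" if "tmp" in code else "ok"
program = ["x = 1", "tmp = x", "y = x + 2", "print(y)"]
print(simplify(program, toy_predict))   # minimal program still predicted "bug"
```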
1 code implementation • 24 Aug 2020 • Yangruibo Ding, Baishakhi Ray, Premkumar Devanbu, Vincent J. Hellendoorn
Given these findings, we demonstrate how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions.
1 code implementation • ICLR 2020 • Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, David Bieber
By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation.
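For readers unfamiliar with the task, a hypothetical example of the kind of single-token bug a variable-misuse model must flag (the function and names are made up for illustration):

```python
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return low      # variable misuse: should return `high`
    return value
```

The misused variable is syntactically valid, which is why detecting it benefits from both local sequence context and longer-range program structure.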