Search Results for author: Arjun Guha

Found 13 papers, 8 papers with code

Activation Steering for Robust Type Prediction in CodeLLMs

no code implementations • 2 Apr 2024 • Francesca Lucchetti, Arjun Guha

We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript.

StarCoder 2 and The Stack v2: The Next Generation

no code implementations • 29 Feb 2024 • Anton Lozhkov, Raymond Li, Loubna Ben allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.

Ranked #25 on Code Generation on MBPP

Code Completion Code Generation +1

Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions

1 code implementation • 11 Dec 2023 • Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha

These are tasks in which the model is provided a block of code and an instruction to modify the code.

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

no code implementations • 19 Aug 2023 • Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha

We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket.

Transfer Learning

StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

no code implementations • 7 Jun 2023 • Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

We use StudentEval to evaluate 5 Code LLMs and find that StudentEval is a better discriminator of model performance than existing benchmarks.

Code Generation

Type Prediction With Program Decomposition and Fill-in-the-Type Training

1 code implementation • 25 May 2023 • Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen

TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain.

Type prediction

StarCoder: may the source be with you!

4 code implementations • 9 May 2023 • Raymond Li, Loubna Ben allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention.

Ranked #43 on Code Generation on MBPP

8k Code Generation

SantaCoder: don't reach for the stars!

5 code implementations • 9 Jan 2023 • Loubna Ben allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code.

Code Generation

MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

1 code implementation • 17 Aug 2022 • Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder.

Benchmarking Code Generation

Iterative Program Synthesis for Adaptable Social Navigation

1 code implementation • 8 Mar 2021 • Jarrett Holtz, Simon Andrews, Arjun Guha, Joydeep Biswas

Robot social navigation is influenced by human preferences and environment-specific scenarios such as elevators and doors, thus necessitating end-user adaptability.

Program Synthesis Robotics Programming Languages

Accelerating Graph Sampling for Graph Machine Learning using GPUs

no code implementations • 14 Sep 2020 • Abhinav Jangda, Sandeep Polisetty, Arjun Guha, Marco Serafini

Several representation learning algorithms for graph data, such as DeepWalk, node2vec, and GraphSAGE, sample the graph to produce mini-batches that are suitable for training a DNN.

BIG-bench Machine Learning Graph Sampling +1

Robot Action Selection Learning via Layered Dimension Informed Program Synthesis

1 code implementation • 10 Aug 2020 • Jarrett Holtz, Arjun Guha, Joydeep Biswas

Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art.

Autonomous Driving Program Repair

Interactive Robot Transition Repair With SMT

2 code implementations • 5 Feb 2018 • Jarrett Holtz, Arjun Guha, Joydeep Biswas

Complex robot behaviors are often structured as state machines, where states encapsulate actions and a transition function switches between states.

Robotics Programming Languages
