2 code implementations • 31 Oct 2024 • Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang
In our primary experiments, we use SelfCodeAlign with CodeQwen1. 5-7B to generate a dataset of 74k instruction-response pairs.
2 code implementations • 5 Sep 2024 • Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang
We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language.
1 code implementation • 12 Jun 2024 • Chloe Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark
We introduce DafnyBench, the largest benchmark of its kind for training and evaluating machine learning systems for formal software verification.
4 code implementations • 29 Feb 2024 • Anton Lozhkov, Raymond Li, Loubna Ben allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
Our large model, StarCoder2- 15B, significantly outperforms other models of comparable size.
Ranked #32 on Code Generation on MBPP
1 code implementation • 13 Feb 2024 • David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin
In this paper, we present VerMCTS, an approach to begin to resolve this issue by generating verified programs in Dafny and Coq.
1 code implementation • 11 Dec 2023 • Federico Cassano, Luisa Li, Akul Sethi, Noah Shinn, Abby Brennan-Jones, Jacob Ginesin, Edward Berman, George Chakhnashvili, Anton Lozhkov, Carolyn Jane Anderson, Arjun Guha
These are tasks in which the model is provided a block of code and an instruction to modify the code.
no code implementations • 19 Aug 2023 • Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Anders Freeman, Carolyn Jane Anderson, Molly Q Feldman, Michael Greenberg, Abhinav Jangda, Arjun Guha
We apply this approach to generate tens of thousands of validated training items for Julia, Lua, OCaml, R, and Racket.
1 code implementation • 25 May 2023 • Federico Cassano, Ming-Ho Yee, Noah Shinn, Arjun Guha, Steven Holtzen
TypeScript and Python are two programming languages that support optional type annotations, which are useful but tedious to introduce and maintain.
4 code implementations • NeurIPS 2023 • Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao
Large language models (LLMs) have been increasingly used to interact with external environments (e. g., games, compilers, APIs) as goal-driven agents.
1 code implementation • 17 Aug 2022 • Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda
Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex, CodeGen, and InCoder.