1 code implementation • 29 May 2025 • Manish Shetty, Naman Jain, Jinjian Liu, Vijay Kethanaboyina, Koushik Sen, Ion Stoica
Developing high-performance software is a complex task that requires specialized expertise.
no code implementations • 9 Apr 2025 • Naman Jain, Jaskirat Singh, Manish Shetty, Liang Zheng, Koushik Sen, Ion Stoica
Improving open-source models on real-world SWE tasks (solving GitHub issues) faces two key challenges: 1) scalable curation of execution environments to train these models, and 2) optimal scaling of test-time compute.
no code implementations • 28 Mar 2025 • Alex Gu, Naman Jain, Wen-Ding Li, Manish Shetty, Yijia Shao, Ziyang Li, Diyi Yang, Kevin Ellis, Koushik Sen, Armando Solar-Lezama
First, we provide a structured taxonomy of concrete tasks in AI for software engineering, emphasizing the many other tasks in software engineering beyond code generation and completion.
no code implementations • 22 Jan 2025 • Naman Jain, Amir Kalev
We introduce Quantum Feature Extraction (QuFeX), a novel quantum machine learning module.
no code implementations • 18 Dec 2024 • Manish Shetty, Naman Jain, Adwait Godbole, Sanjit A. Seshia, Koushik Sen
We validate the translation by testing equivalence with the source C program on a set of inputs.
2 code implementations • 31 Oct 2024 • Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang
In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs.
4 code implementations • 22 Jun 2024 • Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, Zijian Wang, Binyuan Hui, Niklas Muennighoff, David Lo, Daniel Fried, Xiaoning Du, Harm de Vries, Leandro von Werra
In addition, using multiple tools to solve a task requires compositional reasoning grounded in an accurate understanding of complex instructions.
Ranked #1 on Code Generation on BigCodeBench-Instruct
1 code implementation • 15 Mar 2024 • Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in "open-book" in-domain settings.
no code implementations • 12 Mar 2024 • Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica
Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry.
no code implementations • 29 Feb 2024 • Alex Gu, Wen-Ding Li, Naman Jain, Theo X. Olausson, Celine Lee, Koushik Sen, Armando Solar-Lezama
In this work, we focus on these counterfeit samples: programs sampled from a language model that 1) have a high enough log-probability to be generated at a moderate temperature and 2) pass weak correctness checks.
4 code implementations • 29 Feb 2024 • Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauß, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries
Our large model, StarCoder2-15B, significantly outperforms other models of comparable size.
Ranked #36 on Code Generation on MBPP
no code implementations • 25 Nov 2023 • Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
no code implementations • 7 Aug 2023 • Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, Yujie Wang
Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone.
no code implementations • 26 May 2021 • Akhilesh Ravi, Amit Yadav, Jainish Chauhan, Jatin Dholakia, Naman Jain, Mayank Singh
The increasing use of dialogue agents makes it highly desirable for them to understand and acknowledge implied emotions so that they can respond with human-like empathy.
3 code implementations • 15 Feb 2021 • Ajaykrishna Karthikeyan, Naman Jain, Nagarajan Natarajan, Prateek Jain
Decision trees provide a rich family of highly non-linear yet efficient models, which is why practitioners across domains continue to treat them as their go-to family of predictive models.
1 code implementation • 25 Jul 2020 • Naman Jain, Ankush Chauhan, Atharva Chewale, Ojas Mithbavkar, Ujjaval Shah, Mayank Singh
Song lyrics convey a meaningful story in a creative manner with complex rhythmic patterns.
no code implementations • WS 2020 • Sriram Balasubramanian, Naman Jain, Gaurav Jindal, Abhijeet Awasthi, Sunita Sarawagi
We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input.
no code implementations • LREC 2020 • Arjit Srivastava, Avijit Vajpayee, Syed Sarfaraz Akhtar, Naman Jain, Vinay Singh, Manish Shrivastava
The advent of social media has vastly increased the number of opinions and arguments voiced on the internet.
2 code implementations • 23 Nov 2019 • Davinder Singh, Naman Jain, Pranjali Jain, Pratik Kayal, Sudhakar Kumawat, Nipun Batra
Early detection of plant diseases remains difficult due to the lack of lab infrastructure and expertise.
no code implementations • 16 Oct 2019 • Monarch Parmar, Naman Jain, Pranjali Jain, P Jayakrishna Sahit, Soham Pachpande, Shruti Singh, Mayank Singh
It also provides temporal statistics, such as the year-wise popularity of topics, datasets, and seminal papers.
no code implementations • 18 Aug 2019 • Sahil Shah, Naman Jain, Abhishek Sharma, Arjun Jain
This paper provides a comprehensive study of adversarial attacks on human pose estimation models and an evaluation of their robustness.
no code implementations • COLING 2016 • Riyaz A. Bhat, Irshad A. Bhat, Naman Jain, Dipti Misra Sharma
With respect to text processing, addressing the differences between the Hindi and Urdu texts would be beneficial in the following ways: (a) instead of training separate models, their individual resources can be augmented to train single, unified models for better generalization, and (b) their individual text processing applications can be used interchangeably under varied resource conditions.