Search Results for author: Pengcheng Yin

Found 35 papers, 17 papers with code

Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

no code implementations 13 Feb 2024 Miltiadis Allamanis, Sheena Panthaplackel, Pengcheng Yin

To evaluate code large language models (LLMs), research has relied on a few small, manually curated benchmarks, such as HumanEval and MBPP, which represent only a narrow slice of real-world software domains.
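As a rough sketch of round-trip correctness as an unsupervised signal (assuming hypothetical model wrappers `describe` and `synthesize` and an existing test runner; the paper's exact protocol may differ):

```python
# Minimal sketch of round-trip correctness (RTC) evaluation.
# `describe` and `synthesize` are hypothetical wrappers around a code LLM;
# `passes_tests` runs the project's existing unit tests against a candidate.

def round_trip_correct(code, tests, describe, synthesize, passes_tests):
    """Forward pass: code -> NL description; backward pass: description -> code.
    The round trip succeeds if the regenerated code still passes the tests."""
    description = describe(code)           # forward model: summarize the snippet
    regenerated = synthesize(description)  # backward model: re-synthesize code
    return passes_tests(regenerated, tests)

def rtc_score(snippets, tests_for, describe, synthesize, passes_tests):
    """Fraction of snippets whose round trip preserves test-level semantics."""
    ok = sum(
        round_trip_correct(s, tests_for(s), describe, synthesize, passes_tests)
        for s in snippets
    )
    return ok / len(snippets)
```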

Grounding Data Science Code Generation with Input-Output Specifications

no code implementations 12 Feb 2024 Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

Specifically, we propose GIFT4Code, a novel approach for the instruction fine-tuning of LLMs with respect to I/O specifications.
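To make "I/O specification" concrete, here is a hypothetical example of a specification-grounded instruction in this spirit (the prompt format below is an assumption, not the paper's):

```python
# Hypothetical instruction grounded with an I/O specification; the exact
# format GIFT4Code uses is not shown in this snippet.
prompt = """Intent: normalize the `price` column to the [0, 1] range.
Input: pandas DataFrame with columns ['item': str, 'price': float].
Expected output: a DataFrame with the same columns, where every `price`
value lies in [0.0, 1.0]."""
```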

Code Generation

Universal Self-Consistency for Large Language Model Generation

no code implementations 29 Nov 2023 Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou

Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on various challenging tasks, by utilizing multiple reasoning paths sampled from large language models (LLMs).
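As background, vanilla self-consistency is a majority vote over sampled final answers; universal self-consistency generalizes it to free-form outputs by asking the LLM itself to select the most consistent of the sampled responses. A minimal sketch of the voting baseline, with `sample_answer` as a hypothetical sampler:

```python
from collections import Counter

def self_consistency(sample_answer, n_samples=20):
    """Classic self-consistency: draw several chain-of-thought completions
    and keep the most frequent final answer. `sample_answer` is a hypothetical
    callable that samples one completion and extracts its final answer."""
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```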

Code Generation · Language Modelling +3

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

no code implementations 29 Sep 2023 Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner.

Code Generation · Math +1

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

no code implementations 26 Jul 2023 Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

When writing programs, people can tackle a complex new task by decomposing it into smaller, more familiar subtasks.

Program Synthesis

SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL

no code implementations 26 May 2023 Ruoxi Sun, Sercan O. Arik, Hootan Nakhost, Hanjun Dai, Rajarishi Sinha, Pengcheng Yin, Tomas Pfister

One impressive emergent capability of large language models (LLMs) is the generation of code, including Structured Query Language (SQL) for databases.

In-Context Learning · Language Modelling +2

PaLM 2 Technical Report

1 code implementation 17 May 2023 Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English-language, multilingual, and reasoning tasks, we demonstrate that PaLM 2 significantly improves quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference than PaLM.

Code Generation · Common Sense Reasoning +6

Natural Language to Code Generation in Interactive Data Science Notebooks

no code implementations 19 Dec 2022 Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov, Charles Sutton

To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks.
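For flavor, a hypothetical problem in the ARCADE style (not taken from the benchmark): a small notebook context plus a natural-language intent, answered in pandas.

```python
import pandas as pd

# Notebook context
df = pd.DataFrame({
    "city": ["Pittsburgh", "Seattle", "Pittsburgh", "Austin"],
    "sales": [120, 340, 80, 210],
})

# NL intent: "What are the total sales per city, highest first?"
totals = df.groupby("city")["sales"].sum().sort_values(ascending=False)
print(totals)
```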

Code Generation · Language Modelling

SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization

1 code implementation 19 Jul 2022 Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, Pengcheng Yin

Recent advances in data processing have stimulated the demand for learning graphs of very large scales.

Graph Embedding · Graph Learning

Compositional Generalization and Decomposition in Neural Program Synthesis

no code implementations 7 Apr 2022 Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

We first characterize several axes along which program synthesis methods should generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data.

Program Synthesis

Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data

1 code implementation ACL 2022 Shuyan Zhou, Li Zhang, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch, Graham Neubig

To this end, we develop a simple and efficient method that links steps (e.g., "purchase a camera") in an article to other articles with similar goals (e.g., "how to choose a camera"), recursively constructing the KB.
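A hedged sketch of that recursive construction, assuming a hypothetical `similarity` function and a simple dict-based data model (the paper's actual linking method is learned, not a fixed threshold):

```python
def build_hierarchy(article, articles_by_goal, similarity,
                    threshold=0.8, depth=0, max_depth=3):
    """Link each step to the most similar article goal, then recurse
    into that article; unlinked steps stay as leaves."""
    node = {"goal": article["goal"], "steps": []}
    if depth >= max_depth:
        return node
    for step in article["steps"]:
        best = max(articles_by_goal.values(),
                   key=lambda a: similarity(step, a["goal"]))
        if similarity(step, best["goal"]) >= threshold:
            node["steps"].append(build_hierarchy(
                best, articles_by_goal, similarity,
                threshold, depth + 1, max_depth))
        else:
            node["steps"].append({"goal": step, "steps": []})
    return node
```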

Retrieval · Video Retrieval

Learning to Superoptimize Real-world Programs

no code implementations 28 Sep 2021 Alex Shypula, Pengcheng Yin, Jeremy Lacomis, Claire Le Goues, Edward Schwartz, Graham Neubig

We also report that SILO's rate of superoptimization on our test set is over five times that of a standard policy gradient approach and a model pre-trained on compiler optimization demonstrations.

Compiler Optimization · Imitation Learning

Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language

no code implementations NAACL (SUKI) 2022 Shuyan Zhou, Pengcheng Yin, Graham Neubig

When humans plan how to perform a particular task, they do so hierarchically, splitting higher-level tasks into smaller sub-tasks.

Instruction Following

Learning Structural Edits via Incremental Tree Transformations

1 code implementation ICLR 2021 Ziyu Yao, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig

To show the unique benefits of modeling tree edits directly, we further propose a novel edit encoder for learning to represent edits, as well as an imitation learning method that allows the editor to be more robust.

Imitation Learning

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

1 code implementation ACL 2020 Pengcheng Yin, Graham Neubig, Wen-tau Yih, Sebastian Riedel

Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks.

Semantic Parsing · Text-To-SQL

Merging Weak and Active Supervision for Semantic Parsing

1 code implementation 29 Nov 2019 Ansong Ni, Pengcheng Yin, Graham Neubig

Experiments on WikiTableQuestions with human annotators show that our method can improve the performance with only 100 active queries, especially for weakly-supervised parsers learnt from a cold start.

Active Learning · Semantic Parsing

Reranking for Neural Semantic Parsing

no code implementations ACL 2019 Pengcheng Yin, Graham Neubig

Semantic parsing considers the task of transducing natural language (NL) utterances into machine executable meaning representations (MRs).
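The titular reranking step can be sketched as rescoring an n-best list from a base parser (the feature functions and weights below are placeholders, not the paper's learned model):

```python
def rerank(nbest, base_score, features):
    """Return the best candidate MR under a combined score: the base
    parser's score plus weighted auxiliary feature scores.
    `features` is a list of (feature_fn, weight) pairs."""
    def score(mr):
        return base_score(mr) + sum(w * f(mr) for f, w in features)
    return max(nbest, key=score)
```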

Code Generation · Semantic Parsing

Improving Open Information Extraction via Iterative Rank-Aware Learning

1 code implementation ACL 2019 Zhengbao Jiang, Pengcheng Yin, Graham Neubig

We found that the extraction likelihood, a confidence measure used by current supervised open IE systems, is not well calibrated when comparing the quality of assertions extracted from different sentences.

Binary Classification · General Classification +1

TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation

4 code implementations EMNLP 2018 Pengcheng Yin, Graham Neubig

We present TRANX, a transition-based neural semantic parser that maps natural language (NL) utterances into formal meaning representations (MRs).
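Conceptually, a transition-based decoder of this kind grows an abstract syntax tree through grammar-licensed actions. The sketch below illustrates the idea and is not the TRANX codebase; `choose_action` and the field/constructor objects are hypothetical:

```python
def decode_ast(choose_action, root_field):
    """Expand frontier fields until none remain open; return the AST."""
    frontier = [root_field]
    while frontier:
        field = frontier.pop()
        action, payload = choose_action(field)  # neural model picks an action
        if action == "APPLY_CONSTR":
            node = payload.instantiate()         # payload: grammar constructor
            field.fill(node)
            frontier.extend(reversed(node.open_fields()))
        elif action == "GEN_TOKEN":
            field.add_token(payload)             # payload: primitive token
        elif action == "REDUCE":
            field.close()                        # close optional/multi field
    return root_field.value()
```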

Code Generation · Semantic Parsing

Retrieval-Based Neural Code Generation

1 code implementation EMNLP 2018 Shirley Anugrah Hayati, Raphael Olivier, Pravalika Avvaru, Pengcheng Yin, Anthony Tomasic, Graham Neubig

In models that generate program source code from natural language, representing the code as a tree structure has been a common approach.

Code Generation · Retrieval +2

A Tree-based Decoder for Neural Machine Translation

1 code implementation EMNLP 2018 Xinyi Wang, Hieu Pham, Pengcheng Yin, Graham Neubig

Recent advances in Neural Machine Translation (NMT) show that adding syntactic information to NMT systems can improve the quality of their translations.

Machine Translation · NMT +2

StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

7 code implementations ACL 2018 Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig

Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures.
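For unlabeled utterances, a VAE of this kind can optimize the usual evidence lower bound, with the latent variable z being the MR tree and the semantic parser acting as the inference network; a standard statement of the bound (the snippet does not give the paper's exact objective):

```latex
\log p(x) \;\ge\;
  \mathbb{E}_{q_{\phi}(z \mid x)}\!\left[\log p_{\theta}(x \mid z)\right]
  - \mathrm{KL}\!\left(q_{\phi}(z \mid x) \,\|\, p(z)\right)
```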

Code Generation · Semantic Parsing

Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow

no code implementations 23 May 2018 Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, Graham Neubig

For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise.

Code Summarization · Retrieval +1

Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

no code implementations ICLR 2018 Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy

Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes.
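For reference, the RAML objective as commonly stated (Norouzi et al., 2016): the model is trained on targets drawn from an exponentiated-payoff distribution rather than on the ground truth alone.

```latex
\mathcal{L}_{\mathrm{RAML}}(\theta)
  = -\sum_{(x,\, y^{*})} \sum_{y \in \mathcal{Y}}
      q(y \mid y^{*}; \tau)\, \log p_{\theta}(y \mid x),
\qquad
q(y \mid y^{*}; \tau) \propto \exp\bigl(r(y, y^{*}) / \tau\bigr)
```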

Dependency Parsing · Image Captioning +6

A Syntactic Neural Model for General-Purpose Code Generation

6 code implementations ACL 2017 Pengcheng Yin, Graham Neubig

We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python.

Code Generation · Semantic Parsing +1

DyNet: The Dynamic Neural Network Toolkit

4 code implementations 15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
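By contrast, DyNet uses dynamic declaration: a fresh graph is built for every example. A minimal sketch of that style (API details are hedged, since they vary across DyNet versions):

```python
import dynet as dy

model = dy.ParameterCollection()
W = model.add_parameters((1, 2))
b = model.add_parameters((1,))
trainer = dy.SimpleSGDTrainer(model)

data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]  # toy (x, y) pairs
for x_val, y_val in data:
    dy.renew_cg()                     # start a fresh computation graph
    x = dy.inputVector(x_val)
    y_hat = dy.logistic(W * x + b)    # the graph is built on the fly
    loss = dy.binary_log_loss(y_hat, dy.scalarInput(y_val))
    loss.value()                      # forward
    loss.backward()                   # backward
    trainer.update()
```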

graph construction

Neural Enquirer: Learning to Query Tables with Natural Language

no code implementations 3 Dec 2015 Pengcheng Yin, Zhengdong Lu, Hang Li, Ben Kao

Neural Enquirer can be trained end-to-end with gradient descent, so that not only the parameters of the controller and semantic parsing components, but also the embeddings of the tables and query words, can be learned from scratch.

Semantic Parsing
