1 code implementation • 11 Sep 2024 • Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu
Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch.
no code implementations • 20 Jun 2024 • Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes.
1 code implementation • 10 Jun 2024 • Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
While more expressive variants of linear transformers, which replace the additive outer-product update with the delta rule, have been found to be more effective at associative recall, existing algorithms for training such models do not parallelize over sequence length and are thus inefficient to train on modern hardware.
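As a rough illustration of the delta-rule update mentioned above, here is a minimal recurrent-form sketch in numpy, contrasted with the plain additive outer-product update (an illustration only, not the paper's hardware-efficient parallel training algorithm; variable names are ours):

```python
import numpy as np

def additive_step(S, k, v):
    """Vanilla linear-attention update: accumulate the key-value outer product."""
    return S + np.outer(v, k)

def delta_rule_step(S, k, v, beta):
    """Delta-rule update: move the memory's response to key k toward v with rate beta."""
    v_old = S @ k                              # what the memory currently returns for key k
    return S + beta * np.outer(v - v_old, k)   # overwrite rather than accumulate

d = 4
S = np.zeros((d, d))
k = np.eye(d)[0]
S = delta_rule_step(S, k, np.array([1., 0., 0., 0.]), beta=1.0)
S = delta_rule_step(S, k, np.array([0., 1., 0., 0.]), beta=1.0)
print(S @ k)  # [0. 1. 0. 0.] -- the newer value replaced the older one (recall-friendly)
```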
1 code implementation • 4 Apr 2024 • Yi Ren, Shangmin Guo, Linlu Qiu, Bailin Wang, Danica J. Sutherland
With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase.
1 code implementation • 6 Mar 2024 • Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag
We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level.
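A toy sketch of what token-level interleaving can look like (the router and models below are hypothetical stubs; the paper learns the who-generates-next decision rather than hard-coding it):

```python
def generate_collaboratively(models, router, prompt, max_tokens=8):
    """Interleave generations: at each step a router picks which model emits the next token.

    models: list of callables mapping the current token list to a proposed next token.
    router: callable mapping the current token list to an index into `models`.
    """
    tokens = list(prompt)
    for _ in range(max_tokens):
        idx = router(tokens)                # decide which model speaks next
        tokens.append(models[idx](tokens))  # that model contributes one token
    return tokens

# Hypothetical stand-ins: a general model and a specialist invoked after "=".
general = lambda toks: "the"
specialist = lambda toks: "42"
router = lambda toks: 1 if toks and toks[-1] == "=" else 0

print(generate_collaboratively([general, specialist], router, ["2", "+", "40", "="], max_tokens=1))
# ['2', '+', '40', '=', '42']
```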
1 code implementation • 23 Jan 2024 • Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas
Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL but also on natural language modeling -- improving the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
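As a loose, self-contained illustration of the kind of behaviour such heads capture (an in-context n-gram predictor; this is our toy sketch, not the paper's implementation):

```python
def ngram_head_predict(tokens, n=2):
    """Predict the token that followed the most recent earlier occurrence of the current (n-1)-gram context."""
    context = tuple(tokens[-(n - 1):])
    for i in range(len(tokens) - n, -1, -1):
        if tuple(tokens[i:i + n - 1]) == context:
            return tokens[i + n - 1]
    return None  # context never seen before in this sequence

seq = list("abcabdab")
print(ngram_head_predict(seq, n=2))  # 'd': the most recent earlier 'b' was followed by 'd'
```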
no code implementations • 19 Jan 2024 • Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen
In this work, we explore data-efficient adaptation of pre-trained code models by further pre-training and fine-tuning them with program structures.
3 code implementations • 11 Dec 2023 • Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, Yoon Kim
When used as a replacement for the standard attention layer in Transformers, the resulting gated linear attention (GLA) Transformer is found to perform competitively against the LLaMA-architecture Transformer (Touvron et al., 2023) as well as recent linear-time-inference baselines such as RetNet (Sun et al., 2023a) and Mamba (Gu & Dao, 2023) on moderate-scale language modeling experiments.
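A minimal recurrent-form sketch of a gated linear attention step (numpy, single head, shapes simplified; not the chunkwise-parallel form the paper actually trains with):

```python
import numpy as np

def gla_step(S, q, k, v, alpha):
    """One recurrent step; alpha in (0, 1) is a data-dependent forget gate over key dimensions.

    S: (d_k, d_v) running key-value state; q, k, alpha: (d_k,); v: (d_v,).
    """
    S = alpha[:, None] * S + np.outer(k, v)  # gated decay of the state, then write
    return S, q @ S                          # read out with the current query

rng = np.random.default_rng(0)
d_k = d_v = 4
S = np.zeros((d_k, d_v))
for _ in range(3):
    q, k, v = rng.standard_normal((3, d_k))
    alpha = 1.0 / (1.0 + np.exp(-rng.standard_normal(d_k)))  # sigmoid gate
    S, o = gla_step(S, q, k, v, alpha)
print(o.shape)  # (4,)
```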
1 code implementation • 13 Nov 2023 • Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim
This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models.
1 code implementation • 12 Oct 2023 • Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence.
1 code implementation • 10 Oct 2023 • Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.
1 code implementation • 8 Oct 2023 • Chengwen Qi, Bowen Li, Binyuan Hui, Bailin Wang, Jinyang Li, Jinwang Wu, Yuanjun Laili
Our ConvRE features two tasks, Re2Text and Text2Re, which are formulated as multiple-choice question answering to evaluate LLMs' ability to determine the matching between relations and associated text.
1 code implementation • 2 Oct 2023 • Lirui Wang, Yiyang Ling, Zhecheng Yuan, Mohit Shridhar, Chen Bao, Yuzhe Qin, Bailin Wang, Huazhe Xu, Xiaolong Wang
Collecting large amounts of real-world interaction data to train general robotic policies is often prohibitively expensive, thus motivating the use of simulation data.
1 code implementation • 5 Jul 2023 • Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim
The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills.
2 code implementations • NeurIPS 2023 • Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, Yoon Kim
Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples.
1 code implementation • 27 May 2023 • Daking Rai, Bailin Wang, Yilun Zhou, Ziyu Yao
Compositional and domain generalization present significant challenges in semantic parsing, even for state-of-the-art semantic parsers based on pre-trained language models (LMs).
Ranked #6 on Text-To-SQL on Spider
1 code implementation • 22 May 2023 • Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li
Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single iteration of demonstration processing can generalize effectively to a given test sample.
no code implementations • NeurIPS 2023 • Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li
Our emphasis on database values highlights the new challenges of dirty database contents, external knowledge that bridges NL questions and database contents, and SQL efficiency, particularly in the context of massive databases.
Ranked #1 on Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) (Execution Accuracy (Human) metric)
no code implementations • 25 Jan 2023 • Daking Rai, Yilun Zhou, Bailin Wang, Ziyu Yao
While large language models (LLMs) have demonstrated strong capability in structured prediction tasks such as semantic parsing, little research has explored the underlying mechanisms of their success.
1 code implementation • 15 Nov 2022 • Bailin Wang, Ivan Titov, Jacob Andreas, Yoon Kim
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference.
2 code implementations • 28 Jun 2022 • Lihan Wang, Bowen Qin, Binyuan Hui, Bowen Li, Min Yang, Bailin Wang, Binhua Li, Fei Huang, Luo Si, Yongbin Li
The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQL queries.
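For context, schema linking in its simplest form can be approximated by surface matching between question tokens and schema item names; the naive baseline below only illustrates the task and its limitations, and is not this paper's approach:

```python
from difflib import SequenceMatcher

def link_schema(question_tokens, schema_items, threshold=0.75):
    """Return (token, schema item) pairs whose surface forms roughly match."""
    links = []
    for tok in question_tokens:
        for item in schema_items:
            if SequenceMatcher(None, tok.lower(), item.lower()).ratio() >= threshold:
                links.append((tok, item))
    return links

question = ["singers", "older", "than", "30"]
schema = ["singer", "singer.age", "concert", "stadium"]
print(link_schema(question, schema))  # [('singers', 'singer')] -- implicit mentions like "older" -> age are missed
```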
1 code implementation • 16 Jan 2022 • Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases.
Ranked #1 on Task-Oriented Dialogue Systems on KVRET
1 code implementation • ACL 2021 • Henry Conklin, Bailin Wang, Kenny Smith, Ivan Titov
Natural language is compositional; the meaning of a sentence is a function of the meaning of its parts.
1 code implementation • NeurIPS 2021 • Bailin Wang, Mirella Lapata, Ivan Titov
Despite success in many domains, neural models struggle in settings where train and test examples are drawn from different distributions.
1 code implementation • NAACL 2021 • Bailin Wang, Mirella Lapata, Ivan Titov
Based on the observation that programs which correspond to NL utterances must always be executable, we propose to encourage a parser to generate executable programs for unlabeled utterances.
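A tiny sketch of the underlying intuition, executability as a filter over candidate programs for unlabeled utterances (the schema and candidates below are toy examples of ours; the paper's actual training objective is more involved):

```python
import sqlite3

def filter_executable(candidates, conn):
    """Keep only candidate SQL programs that run without error on the database."""
    keep = []
    for sql in candidates:
        try:
            conn.execute(sql)
            keep.append(sql)
        except sqlite3.Error:
            pass
    return keep

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")  # toy schema
candidates = [
    "SELECT name FROM singer WHERE age > 30",  # executable
    "SELECT nam FROM singer WHERE age > 30",   # fails: no such column
]
print(filter_executable(candidates, conn))  # keeps only the first candidate
```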
1 code implementation • NAACL 2021 • Bailin Wang, Wenpeng Yin, Xi Victoria Lin, Caiming Xiong
Moreover, explicitly modeling compositions using a PCFG leads to better exploration of unseen programs, thus generating more diverse data.
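To make the PCFG idea concrete, here is a toy grammar and sampler (ours, not the grammar induced in the paper); sampling from an explicit grammar produces program skeletons, including compositions never seen in the training data:

```python
import random

GRAMMAR = {  # nonterminal -> list of (right-hand side, probability)
    "QUERY": [(["SELECT", "COLS", "FROM t", "COND"], 1.0)],
    "COLS":  [(["col1"], 0.5), (["col1", ",", "col2"], 0.5)],
    "COND":  [([""], 0.4), (["WHERE", "col1", "OP", "0"], 0.6)],
    "OP":    [([">"], 0.5), (["="], 0.5)],
}

def sample(symbol, rng):
    if symbol not in GRAMMAR:                      # terminal symbol
        return [symbol] if symbol else []
    rhs, weights = zip(*GRAMMAR[symbol])
    chosen = rng.choices(rhs, weights=weights)[0]  # pick a production by its probability
    return [tok for sym in chosen for tok in sample(sym, rng)]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("QUERY", rng)))  # e.g. "SELECT col1 , col2 FROM t WHERE col1 > 0"
```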
no code implementations • NAACL 2021 • Bailin Wang, Mirella Lapata, Ivan Titov
The importance of building semantic parsers which can be applied to new domains and generate programs unseen at training has long been acknowledged, and datasets testing out-of-domain performance are becoming increasingly available.
1 code implementation • ICLR 2021 • Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong
We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.
Ranked #8 on Semantic Parsing on Spider
4 code implementations • ACL 2020 • Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson
The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query.
Ranked #9 on Semantic Parsing on Spider
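A single-head numpy sketch of relation-aware self-attention in the spirit of this paper's challenge (a), with projections and shapes heavily simplified: pairwise relation embeddings bias both the attention logits and the values, which is how schema relations are exposed to the encoder.

```python
import numpy as np

def relation_aware_attention(X, R, r_k, r_v):
    """X: (n, d) question/schema representations; R: (n, n) integer relation ids;
    r_k, r_v: (num_relations, d) learned relation embeddings."""
    n, d = X.shape
    Q = K = V = X                                              # identity projections for brevity
    logits = (Q @ K.T + np.einsum("id,ijd->ij", Q, r_k[R])) / np.sqrt(d)
    alpha = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return alpha @ V + np.einsum("ij,ijd->id", alpha, r_v[R])  # relation-biased values

rng = np.random.default_rng(0)
n, d, num_rel = 5, 8, 3  # e.g. relations: none / same-table / column-mentioned-in-question
X = rng.standard_normal((n, d))
R = rng.integers(0, num_rel, size=(n, n))
r_k, r_v = rng.standard_normal((2, num_rel, d))
print(relation_aware_attention(X, R, r_k, r_v).shape)  # (5, 8)
```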
1 code implementation • IJCNLP 2019 • Bailin Wang, Ivan Titov, Mirella Lapata
Semantic parsing aims to map natural language utterances onto machine-interpretable meaning representations, i.e., programs whose execution against a real-world environment produces a denotation.
Ranked #14 on Semantic Parsing on WikiTableQuestions
1 code implementation • IJCNLP 2019 • Bailin Wang, Wei Lu
In medical documents, it is possible that an entity of interest not only contains a discontiguous sequence of words but also overlaps with another entity.
1 code implementation • EMNLP 2018 • Bailin Wang, Wei Lu, Yu Wang, Hongxia Jin
It is common for entity mentions to contain other mentions recursively.
Ranked #6 on Nested Named Entity Recognition on NNE
Nested Mention Recognition • Nested Named Entity Recognition +1
1 code implementation • EMNLP 2018 • Bailin Wang, Wei Lu
In this work, we propose a novel segmental hypergraph representation to model overlapping entity mentions that are prevalent in many practical datasets.
Ranked #5 on Nested Named Entity Recognition on NNE
Nested Mention Recognition • Nested Named Entity Recognition +1
1 code implementation • AAAI 2018 • Bailin Wang, Wei Lu
Aspect-level sentiment classification aims at detecting the sentiment expressed towards a particular target in a sentence.