no code implementations • CL (ACL) 2021 • Lifeng Jin, Lane Schwartz, Finale Doshi-Velez, Timothy Miller, William Schuler
This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation dataset of child-directed speech.
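To make the boundary claim concrete, constituent-boundary prediction is typically scored as unlabeled bracketing recall; a minimal sketch with invented spans (not the paper's evaluation code):

```python
def unlabeled_recall(gold_spans, predicted_spans):
    """Fraction of attested (gold) constituent spans recovered by the
    induced parser; spans are (start, end) word-index pairs."""
    gold, pred = set(gold_spans), set(predicted_spans)
    return len(gold & pred) / len(gold)

# Invented example: 2 of 3 gold spans recovered -> recall ~0.67
print(round(unlabeled_recall({(0, 2), (3, 5), (0, 5)},
                             {(0, 2), (0, 5), (2, 4)}), 2))
```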
no code implementations • EMNLP 2021 • Lifeng Jin, Linfeng Song, Kun Xu, Dong Yu
In order to alleviate the huge demand for annotated datasets for different tasks, many recent natural language processing datasets have adopted automated pipelines for fast-tracking usable data.
no code implementations • Findings (EMNLP) 2021 • Lifeng Jin, Byung-Doh Oh, William Schuler
A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.
1 code implementation • 29 Jan 2025 • Ved Sirdeshmukh, Kaustubh Deshpande, Johannes Mols, Lifeng Jin, Ed-Yeremai Cardona, Dean Lee, Jeremy Kritz, Willow Primack, Summer Yue, Chen Xing
We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications.
1 code implementation • 18 Apr 2024 • Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu
Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning.
Ranked #1 on GSM8K (Accuracy metric)
no code implementations • 14 Apr 2024 • Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu
Current state-of-the-art approaches refine decoding by contrasting early-exit distributions from a lower layer against the final layer's distribution, exploiting factuality-related information within the model's forward pass.
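A minimal sketch of this layer contrast, assuming access to logits from an early exit and from the final layer; the plausibility cutoff and its threshold are illustrative assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def contrast_next_token_logits(final_logits, early_logits, alpha=0.1):
    """Contrast the final-layer distribution against an early-exit one.
    Tokens whose probability grows between the early and final layers
    are rewarded; implausible tokens are masked out entirely."""
    log_p_final = F.log_softmax(final_logits, dim=-1)
    log_p_early = F.log_softmax(early_logits, dim=-1)
    # Plausibility mask: keep only tokens within a factor alpha of the
    # most likely token under the final layer.
    cutoff = log_p_final.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    contrast = log_p_final - log_p_early
    return torch.where(log_p_final >= cutoff, contrast,
                       torch.full_like(contrast, float("-inf")))

# Usage with random stand-in logits over a 32k vocabulary:
final, early = torch.randn(32000), torch.randn(32000)
next_token = contrast_next_token_logits(final, early).argmax()
```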
no code implementations • 14 Mar 2024 • Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu
Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development.
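For reference, calibration is commonly quantified with expected calibration error (ECE), the gap between accuracy and confidence averaged over confidence bins; a minimal sketch of this standard metric (not the paper's specific method):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: |accuracy - mean confidence| per equal-width confidence bin,
    weighted by the fraction of predictions falling in the bin."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Overconfident toy predictions yield a large ECE:
print(expected_calibration_error([0.9, 0.95, 0.8, 0.85], [1, 0, 0, 1]))
```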
1 code implementation • 6 Mar 2024 • Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu
In this paper, we present a high-quality benchmark named multi-source Wizard of Wikipedia (Ms. WoW) for evaluating multi-source dialogue knowledge selection and response generation.
no code implementations • 28 Feb 2024 • Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu
The most common training pipeline for large language models includes pretraining, finetuning, and alignment phases, with their respective resulting models, such as the pretrained model and the finetuned model.
no code implementations • 23 Feb 2024 • Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu
Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs.
no code implementations • 14 Feb 2024 • Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng
Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., "hallucinations", even when they hold relevant knowledge.
1 code implementation • 18 Jan 2024 • Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu
One critical issue for chat systems is staying consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem.
1 code implementation • 9 Nov 2023 • Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu
We construct a hierarchical task tree encompassing 7 major areas, over 200 categories, and over 800 tasks, covering diverse capabilities such as question answering, reasoning, multi-turn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner.
1 code implementation • 28 Sep 2023 • Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
We propose Contrast Instructions -- a benchmarking strategy for the consistency of reward models (RMs).
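A minimal sketch of the kind of ranking-consistency check such a benchmark performs; the `reward_model` interface and the pair format are hypothetical stand-ins, not the benchmark's actual protocol:

```python
def consistency_rate(reward_model, contrast_pairs):
    """Fraction of contrast pairs on which a reward model keeps the same
    preference ordering across two closely related instructions.
    `contrast_pairs` is a list of (instruction_a, instruction_b,
    better_response, worse_response) tuples (hypothetical format)."""
    consistent = 0
    for inst_a, inst_b, better, worse in contrast_pairs:
        pref_a = reward_model(inst_a, better) > reward_model(inst_a, worse)
        pref_b = reward_model(inst_b, better) > reward_model(inst_b, worse)
        consistent += int(pref_a == pref_b)
    return consistent / len(contrast_pairs)
```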
no code implementations • 18 Sep 2023 • Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using reinforcement learning from human feedback (RLHF) remains a significant challenge.
no code implementations • 31 Jan 2023 • Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Xiabing Zhou, Dong Yu
Current self-training methods such as standard self-training, co-training, tri-training, and others often focus on improving model performance on a single task, utilizing differences in input features, model architectures, and training processes.
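For context, a minimal sketch of the standard self-training loop mentioned here; the fit/predict interface is a generic assumption, not the paper's setup:

```python
def self_train(model, labeled, unlabeled, threshold=0.9, rounds=3):
    """Standard self-training: fit on labeled data, pseudo-label the
    unlabeled pool, and keep only high-confidence predictions for the
    next round's training set."""
    train_set = list(labeled)
    for _ in range(rounds):
        model.fit(train_set)
        pseudo = []
        for x in unlabeled:
            label, confidence = model.predict_with_confidence(x)
            if confidence >= threshold:
                pseudo.append((x, label))
        train_set = list(labeled) + pseudo
    return model
```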
no code implementations • 8 Nov 2022 • Wenyue Hua, Lifeng Jin, Linfeng Song, Haitao Mi, Yongfeng Zhang, Dong Yu
Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors.
1 code implementation • 22 Oct 2022 • Fei Wang, Kaiqiang Song, Hongming Zhang, Lifeng Jin, Sangwoo Cho, Wenlin Yao, Xiaoyang Wang, Muhao Chen, Dong Yu
Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.
Ranked #7 on Abstractive Text Summarization on CNN / Daily Mail
1 code implementation • 22 Oct 2022 • Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo
While previous work focuses on building systems that induce grammars from text well-aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.
1 code implementation • 22 Jun 2022 • Lisa Jin, Linfeng Song, Lifeng Jin, Dong Yu, Daniel Gildea
HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context.
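A toy illustration of the two stages; the tag inventory and slot syntax are simplified assumptions, not HCT's actual rule format:

```python
# Stage (i): token-level edit tags, some carrying slotted rules.
source = ["did", "she", "win", "?"]
tags   = ["KEEP", "REPLACE(SLOT_1)", "KEEP", "KEEP"]
# Stage (ii): slots filled with spans taken from the dialogue context.
slots  = {"SLOT_1": "Serena Williams"}

rewritten = []
for token, tag in zip(source, tags):
    if tag == "KEEP":
        rewritten.append(token)
    elif tag.startswith("REPLACE("):
        rewritten.append(slots[tag[len("REPLACE("):-1]])
print(" ".join(rewritten))  # did Serena Williams win ?
```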
no code implementations • 27 Apr 2022 • Lifeng Jin, Kun Xu, Linfeng Song, Dong Yu
Approaches to stance classification, an important task for understanding argumentation in debates and detecting fake news, have relied on models that deal with individual debate topics.
1 code implementation • EMNLP 2021 • Wenlin Yao, Xiaoman Pan, Lifeng Jin, Jianshu Chen, Dian Yu, Dong Yu
We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.
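A minimal sketch of gloss-based sense selection, with a generic sentence encoder standing in for the trained equivalence model:

```python
import numpy as np

def best_sense(encode, context, glosses):
    """Return the gloss most similar to the target word's context.
    `encode` is any sentence encoder mapping text to a vector; the paper
    instead trains a dedicated semantic-equivalence model."""
    ctx = encode(context)
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [cosine(ctx, encode(gloss)) for gloss in glosses]
    return glosses[int(np.argmax(scores))]

# Hypothetical usage for the target word "bank":
# best_sense(encoder, "she sat on the bank of the river",
#            ["a financial institution", "sloping land beside water"])
```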
no code implementations • ACL 2021 • Han Wu, Kun Xu, Linfeng Song, Lifeng Jin, Haisong Zhang, Linqi Song
Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks.
1 code implementation • NAACL 2021 • Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo
We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.
no code implementations • AACL 2020 • Lifeng Jin, William Schuler
Recent work in unsupervised parsing has tried to incorporate visual information into learning, but results suggest that these models need linguistic bias to compete against models that only rely on text.
no code implementations • WS 2020 • Lifeng Jin, William Schuler
Syntactic surprisal has been shown to have an effect on human sentence processing, and can be predicted from prefix probabilities of generative incremental parsers.
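Concretely, the surprisal of word w_t is -log P(w_t | w_1..t-1), which equals the difference between consecutive log prefix probabilities from the parser; a minimal sketch with invented numbers:

```python
def surprisal_from_prefixes(log_prefix_probs):
    """Word surprisals from a generative parser's prefix probabilities:
    surprisal(w_t) = log P(w_1..t-1) - log P(w_1..t).
    log_prefix_probs[t] is log P(w_1..w_t); index 0 is the empty prefix."""
    return [log_prefix_probs[t - 1] - log_prefix_probs[t]
            for t in range(1, len(log_prefix_probs))]

# Invented prefix log-probabilities for a 3-word sentence:
print(surprisal_from_prefixes([0.0, -2.3, -5.1, -6.0]))  # ~[2.3, 2.8, 0.9]
```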
no code implementations • WS 2020 • Lifeng Jin, William Schuler
Recent progress has shown that grammar induction is possible without explicit assumptions of language-specific knowledge.
no code implementations • ACL 2019 • Lifeng Jin, William Schuler
In unsupervised grammar induction, data likelihood is known to be only weakly correlated with parsing accuracy, especially at convergence after multiple runs.
no code implementations • ACL 2019 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, Lane Schwartz, William Schuler
This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information.
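A minimal sketch of the flow building block being cited, an additive coupling layer in the style of Dinh et al. (2015); this is a generic component, not the paper's full inducer:

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """NICE-style additive coupling: split the vector, shift one half by
    a learned function of the other. The map is invertible with a unit
    Jacobian, so exact densities can be computed for embeddings."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.shift = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, dim - self.half))

    def forward(self, x):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        return torch.cat([x1, x2 + self.shift(x1)], dim=-1)

    def inverse(self, y):
        y1, y2 = y[..., :self.half], y[..., self.half:]
        return torch.cat([y1, y2 - self.shift(y1)], dim=-1)

# Stacking such layers maps context embeddings into a space where simple
# per-category distributions can score them during PCFG induction.
```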
1 code implementation • EMNLP 2018 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018).
no code implementations • WS 2018 • Lifeng Jin, David King, Amad Hussein, Michael White, Douglas Danforth
When interpreting questions in a virtual patient dialogue system, one must inevitably tackle the challenge of a long tail of relatively infrequently asked questions.
1 code implementation • TACL 2018 • Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016).
no code implementations • WS 2017 • Lifeng Jin, Michael White, Evan Jaffe, Laura Zimmerman, Douglas Danforth
For medical students, virtual patient dialogue systems can provide useful training opportunities without the cost of employing actors to portray standardized patients.
no code implementations • COLING 2016 • Cory Shain, William Bryce, Lifeng Jin, Victoria Krakovna, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM).