Search Results for author: Lingpeng Kong

Found 132 papers, 80 papers with code

PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models

1 code implementation4 Mar 2025 Xueliang Zhao, Wei Wu, Jian Guan, Lingpeng Kong

The ability of large language models to solve complex mathematical problems has progressed significantly, particularly for tasks requiring advanced reasoning.

GSM8K Math +1

Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions

1 code implementation4 Mar 2025 Zirui Wu, Xiao Liu, Jiayi Li, Lingpeng Kong, Yansong Feng

While Large Language Model-based agents have demonstrated substantial progress in task completion, existing evaluation benchmarks tend to overemphasize single-task performance, with insufficient attention given to the crucial aspects of multitask planning and execution efficiency required in real-world scenarios.

Language Modeling Language Modelling +1

Implicit Search via Discrete Diffusion: A Study on Chess

1 code implementation27 Feb 2025 Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

Furthermore, DiffuSearch demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment.

Reasoning Does Not Necessarily Improve Role-Playing Ability

no code implementations24 Feb 2025 Xiachong Feng, Longxu Dou, Lingpeng Kong

The application of role-playing large language models (LLMs) is rapidly expanding in both academic and commercial domains, driving an increasing demand for high-precision role-playing models.

BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning

1 code implementation23 Feb 2025 Haiteng Zhao, Chang Ma, Fangzhi Xu, Lingpeng Kong, Zhi-Hong Deng

The applications of large language models (LLMs) in various biological domains have been explored recently, but their reasoning ability in complex biological systems, such as pathways, remains underexplored, which is crucial for predicting biological phenomena, formulating hypotheses, and designing experiments.

Benchmarking

ParallelComp: Parallel Long-Context Compressor for Length Extrapolation

no code implementations20 Feb 2025 Jing Xiong, Jianghan Shen, Chuanyang Zheng, Zhongwei Wan, Chenyang Zhao, Chiwun Yang, Fanghua Ye, Hongxia Yang, Lingpeng Kong, Ngai Wong

To mitigate the attention sink issue, we propose an attention calibration strategy that reduces biases, ensuring more stable long-range attention.

4k 8k

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

no code implementations26 Nov 2024 Lei LI, Yuancheng Wei, Zhihui Xie, Xuqing Yang, YiFan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu

Vision-language generative reward models (VL-GenRMs) play a crucial role in aligning and evaluating multimodal AI systems, yet their own evaluation remains under-explored.

Hallucination

Why Does the Effective Context Length of LLMs Fall Short?

no code implementations24 Oct 2024 Chenxin An, Jun Zhang, Ming Zhong, Lei LI, Shansan Gong, Yao Luo, Jingjing Xu, Lingpeng Kong

Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs).

Attribute

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

1 code implementation23 Oct 2024 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models.

In-Context Learning Language Modeling +1

Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration

no code implementations22 Oct 2024 Qintong Li, Jiahui Gao, Sheng Wang, Renjie Pi, Xueliang Zhao, Chuan Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

In this paper, we present a novel approach, ReverseGen, designed to automatically generate effective training samples that expose the weaknesses of LLMs.

Math

Non-myopic Generation of Language Models for Reasoning and Planning

1 code implementation22 Oct 2024 Chang Ma, Haiteng Zhao, Junlei Zhang, Junxian He, Lingpeng Kong

Large Language Models have demonstrated remarkable abilities in reasoning and planning by breaking down complex problems into sequential steps.

Computational Efficiency Language Modelling +3

Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

1 code implementation18 Oct 2024 Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, Lingpeng Kong

Our work highlights the potential of diffusion-based approaches in advancing AI capabilities for sophisticated language understanding and problem-solving tasks.

ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom

no code implementations18 Oct 2024 Jingqi Zhou, Sheng Wang, Jingwei Dong, Lei LI, Jiahui Gao, Lingpeng Kong, Chuan Wu

Notably, the disassociation of capabilities allows seamless integration of existing large language models (LLMs) to compensate for the reasoning deficits of LVLMs.

Visual Reasoning

Understanding the Role of LLMs in Multimodal Evaluation Benchmarks

1 code implementation16 Oct 2024 Botian Jiang, Lei LI, Xiaonan Li, Zhaowei Li, Xiachong Feng, Lingpeng Kong, Qi Liu, Xipeng Qiu

The rapid advancement of Multimodal Large Language Models (MLLMs) has been accompanied by the development of various benchmarks to evaluate their capabilities.

Benchmarking Large Language Model +2

QSpec: Speculative Decoding with Complementary Quantization Schemes

no code implementations15 Oct 2024 Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu

Compared to high-precision quantization methods, QSPEC empirically boosts token generation throughput by up to 1.64x without any quality compromise, distinguishing it from other low-precision quantization approaches.
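As a rough illustration of the speculative-decoding loop this abstract builds on (a generic sketch, not QSPEC's actual algorithm: the greedy accept rule is a common simplification, and draft_next / verify_argmax are hypothetical placeholders for the low- and high-precision model calls):

```python
def speculative_decode(prefix, draft_next, verify_argmax, k=4, max_len=32):
    """Generic speculative decoding: a cheap draft proposes k tokens,
    a more precise verifier keeps the longest agreeing prefix."""
    tokens = list(prefix)
    while len(tokens) < max_len:
        # 1) Draft model proposes k tokens autoregressively (fast / low precision).
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Verifier (higher precision) checks the proposals; in practice this
        #    is a single parallel forward pass over all k positions.
        accepted = 0
        for i, t in enumerate(proposal):
            if verify_argmax(tokens + proposal[:i]) == t:
                accepted += 1
            else:
                break
        tokens.extend(proposal[:accepted])
        if accepted < k:                    # repair the first mismatch
            tokens.append(verify_argmax(tokens))
    return tokens
```

QSPEC's contribution, per the abstract, is to realize the two roles with complementary quantization schemes of the same model rather than with two separate models.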

Quantization

TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs

1 code implementation14 Oct 2024 Haochuan Wang, Xiachong Feng, Lei LI, Zhanyue Qin, Dianbo Sui, Lingpeng Kong

The rapid advancement of large language models (LLMs) has accelerated their application in reasoning, with strategic reasoning drawing increasing attention.

Synthetic Data Generation

VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment

no code implementations12 Oct 2024 Lei LI, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong, Qi Liu

As large vision-language models (LVLMs) evolve rapidly, the demand for high-quality and diverse data to align these models becomes increasingly crucial.

Diversity Hallucination +3

Temporal Reasoning Transfer from Text to Video

no code implementations8 Oct 2024 Lei LI, Yuanxin Liu, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu sun, Lingpeng Kong, Qi Liu

Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships.

Diagnostic MME +2

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation

no code implementations3 Oct 2024 Zixuan Li, Jing Xiong, Fanghua Ye, Chuanyang Zheng, Xun Wu, Jianqiao Lu, Zhongwei Wan, Xiaodan Liang, Chengming Li, Zhenan Sun, Lingpeng Kong, Ngai Wong

We present UncertaintyRAG, a novel approach for long-context Retrieval-Augmented Generation (RAG) that utilizes Signal-to-Noise Ratio (SNR)-based span uncertainty to estimate similarity between text chunks.

Chunking Language Modeling +3

MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards

no code implementations1 Oct 2024 Sheng Wang, Liheng Chen, Pengan Chen, Jingwei Dong, Boyang Xue, Jiyue Jiang, Lingpeng Kong, Chuan Wu

The rapid scaling of large language models necessitates more lightweight finetuning methods to reduce the explosive GPU memory overhead when numerous customized models are served simultaneously.

CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration

no code implementations17 Sep 2024 Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, Zhenguo Li

The deployment of multimodal large language models (MLLMs) has demonstrated remarkable success in engaging in conversations involving visual inputs, thanks to the superior power of large language models (LLMs).

How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models

1 code implementation29 Aug 2024 Jiyue Jiang, Pengan Chen, Liheng Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages.

Benchmarking General Knowledge

SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

1 code implementation20 Aug 2024 Xueliang Zhao, Lin Zheng, Haige Bo, Changran Hu, Urmish Thakker, Lingpeng Kong

This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment.

Ranked #3 on Automated Theorem Proving on miniF2F-test (using extra training data)

Automated Theorem Proving

Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting

no code implementations24 Jun 2024 Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu

The thought generated by the progressive thought generator serves as a prompt to prevent the generated dialogue from having significant semantic deviations, while the psychology knowledge generator produces psychological knowledge to serve as the dialogue history for the LLM, guiding the dialogue generator to create multi-turn psychological dialogue.

Data Augmentation Dialogue Generation

Jailbreaking as a Reward Misspecification Problem

1 code implementation20 Jun 2024 Zhihui Xie, Jiahui Gao, Lei LI, Zhenguo Li, Qi Liu, Lingpeng Kong

In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process.

Red Teaming

A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond

2 code implementations21 Mar 2024 Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, XiaoLi Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.

Survey

ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

1 code implementation5 Mar 2024 Xijia Tao, Shuai Zhong, Lei LI, Qi Liu, Lingpeng Kong

In this paper, we propose a novel jailbreaking attack against VLMs, aiming to bypass their safety barrier when a user inputs harmful instructions.

Training-Free Long-Context Scaling of Large Language Models

1 code implementation27 Feb 2024 Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong

The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length.

16k

LoRA Meets Dropout under a Unified Framework

no code implementations25 Feb 2024 Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

Hence, a possible contradiction arises between LoRA's negligible number of trainable parameters and the effectiveness of previous dropout methods, which has been largely overlooked.

Empowering Large Language Model Agents through Action Learning

1 code implementation24 Feb 2024 Haiteng Zhao, Chang Ma, Guoyin Wang, Jing Su, Lingpeng Kong, Jingjing Xu, Zhi-Hong Deng, Hongxia Yang

Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error, a key element of intelligent behavior.

Language Modeling Language Modelling +2

PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

1 code implementation24 Feb 2024 Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu

Hopefully, the conspicuously higher parameter efficiency can establish PRoLoRA as a resource-friendly alternative to LoRA.

OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

1 code implementation12 Feb 2024 Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong

Autonomous interaction with the computer has been a longstanding challenge with great potential, and the recent proliferation of large language models (LLMs) has markedly accelerated progress in building digital agents.

Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models

1 code implementation12 Feb 2024 Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong

Recently, diffusion models have garnered significant interest in the field of text processing due to their many potential advantages compared to conventional autoregressive models.

Language Modeling Language Modelling +1

AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents

2 code implementations24 Jan 2024 Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He

Evaluating Large Language Models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications.

Benchmarking

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model

2 code implementations18 Dec 2023 Jiahui Gao, Renjie Pi, Jipeng Zhang, Jiacheng Ye, Wanjun Zhong, YuFei Wang, Lanqing Hong, Jianhua Han, Hang Xu, Zhenguo Li, Lingpeng Kong

We first analyze the limitations of current Multimodal Large Language Models (MLLMs) in this area: they struggle to accurately comprehend basic geometric elements and their relationships.

Language Modeling Language Modelling +2

Linear Attention via Orthogonal Memory

no code implementations18 Dec 2023 Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong

Given that orthogonal memory compresses global information, we further dissect the context to amplify fine-grained local information.

Causal Language Modeling Computational Efficiency +2

Silkie: Preference Distillation for Large Visual Language Models

no code implementations17 Dec 2023 Lei LI, Zhihui Xie, Mukai Li, Shunian Chen, Peiyi Wang, Liang Chen, Yazheng Yang, Benyou Wang, Lingpeng Kong

This paper explores preference distillation for large vision language models (LVLMs), improving their ability to generate helpful and faithful responses anchoring the visual context.

Hallucination MME +1

Self-Infilling Code Generation

1 code implementation29 Nov 2023 Lin Zheng, Jianbo Yuan, Zhi Zhang, Hongxia Yang, Lingpeng Kong

This work introduces self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding.

Code Generation

Exploring the Reliability of Large Language Models as Customized Evaluators for Diverse NLP Tasks

1 code implementation30 Oct 2023 Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

Previous work adopts large language models (LLMs) as evaluators to evaluate natural language processing (NLP) tasks.

Fairness Math +1

SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving

no code implementations19 Oct 2023 Xueliang Zhao, Xinting Huang, Wei Bi, Lingpeng Kong

Large Language Models (LLMs) have driven substantial progress in artificial intelligence in recent years, exhibiting impressive capabilities across a wide range of tasks, including mathematical problem-solving.

GSM8K Math +1

Attentive Multi-Layer Perceptron for Non-autoregressive Generation

1 code implementation14 Oct 2023 Shuyang Jiang, Jun Zhang, Jiangtao Feng, Lin Zheng, Lingpeng Kong

Furthermore, we marry AMLP with popular NAR models, deriving a highly efficient NAR-AMLP architecture with linear time and space complexity.

Machine Translation Speech Synthesis +2

Lemur: Harmonizing Natural Language and Code for Language Agents

1 code implementation10 Oct 2023 Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents.

Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration

1 code implementation30 Sep 2023 Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong

Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge.

World Knowledge

Extrapolating Large Language Models to Non-English by Aligning Languages

2 code implementations9 Aug 2023 Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen, Lei LI

We start from targeting individual languages by performing cross-lingual instruction-tuning (CoIT) on LLaMA, i.e., tuning it with translation task data and cross-lingual general task data to obtain cross-lingual models (x-LLaMAs), and formulate underlying scaling laws to investigate the advantages of using scalable translation data.

Translation

L-Eval: Instituting Standardized Evaluation for Long Context Language Models

3 code implementations20 Jul 2023 Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu

Recently, there has been growing interest in extending the context length of large language models (LLMs), aiming to effectively process long inputs of one turn or conversations with more extensive histories.

Instruction Following

Linearized Relative Positional Encoding

no code implementations18 Jul 2023 Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers.

Image Classification Language Modeling +3

Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability

1 code implementation11 Jun 2023 Jiacheng Ye, Xijia Tao, Lingpeng Kong

First, does multilingual transfer ability exist in English-centric models and how does it compare with multilingual pretrained models?

INK: Injecting kNN Knowledge in Nearest Neighbor Machine Translation

1 code implementation10 Jun 2023 Wenhao Zhu, Jingjing Xu, ShuJian Huang, Lingpeng Kong, Jiajun Chen

We propose an effective training framework INK to directly smooth the representation space via adjusting representations of kNN neighbors with a small number of new parameters.

Machine Translation Translation

M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning

no code implementations7 Jun 2023 Lei LI, Yuwei Yin, Shicheng Li, Liang Chen, Peiyi Wang, Shuhuai Ren, Mukai Li, Yazheng Yang, Jingjing Xu, Xu sun, Lingpeng Kong, Qi Liu

To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions.

World Knowledge

Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving

1 code implementation25 May 2023 Xueliang Zhao, Wenda Li, Lingpeng Kong

Large language models (LLMs) present an intriguing avenue of exploration in the domain of formal theorem proving.

Automated Theorem Proving

Optimizing Non-Autoregressive Transformers with Contrastive Learning

no code implementations23 May 2023 Chenxin An, Jiangtao Feng, Fei Huang, Xipeng Qiu, Lingpeng Kong

In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.

Contrastive Learning Machine Translation +2

Can Language Models Understand Physical Concepts?

1 code implementation23 May 2023 Lei LI, Jingjing Xu, Qingxiu Dong, Ce Zheng, Qi Liu, Lingpeng Kong, Xu sun

Language models (LMs) gradually become general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite.

Generating Data for Symbolic Language with Large Language Models

1 code implementation23 May 2023 Jiacheng Ye, Chengzu Li, Lingpeng Kong, Tao Yu

However, such an approach has primarily been applied to natural language tasks and has not yet been explored for symbolic language tasks with complex structured outputs (e.g., semantic parsing and code generation).

Code Generation Semantic Parsing

DetGPT: Detect What You Need via Reasoning

1 code implementation23 May 2023 Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang

Overall, our proposed paradigm and DetGPT demonstrate the potential for more sophisticated and intuitive interactions between humans and machines.

Autonomous Driving Object +2

A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment

no code implementations14 May 2023 Jiyue Jiang, Sheng Wang, Qintong Li, Lingpeng Kong, Chuan Wu

In this paper, we propose a multi-source knowledge fusion method for CS dialogue (CSD), to generate open-ended responses guided by the CS principle and emotional support strategy.

Decoder

A Challenging Benchmark for Low-Resource Learning

1 code implementation7 Mar 2023 Yudong Wang, Chang Ma, Qingxiu Dong, Lingpeng Kong, Jingjing Xu

Experiments on a wide range of models show that neural networks, even pre-trained language models, have sharp performance drops on our benchmark, demonstrating its effectiveness in evaluating the weaknesses of neural networks.

Retrieved Sequence Augmentation for Protein Representation Learning

1 code implementation24 Feb 2023 Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong

RSA links query protein sequences to a set of sequences with similar structures or properties in the database and combines these sequences for downstream prediction.

Prediction Property Prediction +2

Compositional Exemplars for In-context Learning

1 code implementation11 Feb 2023 Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Tao Yu, Lingpeng Kong

The performance of ICL is largely determined by the quality of the selected in-context examples.

Code Generation Contrastive Learning +6

A Reparameterized Discrete Diffusion Model for Text Generation

1 code implementation11 Feb 2023 Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong

This work studies discrete diffusion probabilistic models with applications to natural language generation.

Text Generation

In-Context Learning with Many Demonstration Examples

1 code implementation9 Feb 2023 Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jun Zhang, Zhiyong Wu, Lingpeng Kong

Based on EVALM, we scale up the size of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data.

16k 8k +3

Efficient Attention via Control Variates

1 code implementation9 Feb 2023 Lin Zheng, Jianbo Yuan, Chong Wang, Lingpeng Kong

Built upon previous progress of RFA, we characterize this gap through the lens of control variates and show that RFA can be decomposed into a sum of multiple control variate estimators for each element in the sequence.
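For readers who want the textbook identity behind the phrase "control variate estimators" (this is the generic construction, not the paper's specific decomposition of RFA):

```latex
% Generic control-variate estimator: for f(X) estimating \mu and a correlated
% statistic g(X) with known mean, the corrected estimator stays unbiased while
% its variance can shrink.
\hat{\mu}_{\mathrm{cv}} = f(X) - \beta\bigl(g(X) - \mathbb{E}[g(X)]\bigr),
\qquad
\operatorname{Var}[\hat{\mu}_{\mathrm{cv}}]
  = \operatorname{Var}[f] - 2\beta\operatorname{Cov}[f, g] + \beta^{2}\operatorname{Var}[g],
```

which is minimized at β* = Cov[f, g] / Var[g].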

Audio-Visual Segmentation with Semantics

1 code implementation30 Jan 2023 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

Self-Adaptive In-Context Learning: An Information Compression Perspective for In-Context Example Selection and Ordering

1 code implementation20 Dec 2022 Zhiyong Wu, Yaoxiang Wang, Jiacheng Ye, Lingpeng Kong

Despite the surprising few-shot performance of in-context learning (ICL), it is still a common practice to randomly sample examples to serve as context.

In-Context Learning

Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation

1 code implementation20 Dec 2022 Fei Yuan, Yinquan Lu, Wenhao Zhu, Lingpeng Kong, Lei LI, Yu Qiao, Jingjing Xu

To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT.

Machine Translation Translation

Explanation Regeneration via Information Bottleneck

1 code implementation19 Dec 2022 Qintong Li, Zhiyong Wu, Lingpeng Kong, Wei Bi

Explaining the black-box predictions of NLP models naturally and accurately is an important open problem in natural language generation.

Explanation Generation Language Modeling +3

Unsupervised Explanation Generation via Correct Instantiations

no code implementations21 Nov 2022 Sijie Cheng, Zhiyong Wu, Jiangjie Chen, Zhixing Li, Yang Liu, Lingpeng Kong

The major difficulty is finding the conflict point, where the statement contradicts the real world.

Explanation Generation

An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks

1 code implementation24 Oct 2022 Changlong Yu, Tianyi Xiao, Lingpeng Kong, Yangqiu Song, Wilfred Ng

Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning.

Language Modeling Language Modelling

ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

2 code implementations22 Oct 2022 Jiacheng Ye, Jiahui Gao, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong

To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages the feedback from the task-specific model to guide the generation of new training data via in-context examples.

Data-free Knowledge Distillation Dataset Generation +4

The Devil in Linear Transformer

1 code implementation19 Oct 2022 Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong

In this paper, we examine existing kernel-based linear transformers and identify two key issues that lead to such performance gaps: 1) unbounded gradients in the attention computation adversely impact the convergence of linear transformer models; 2) attention dilution which trivially distributes attention scores over long sequences while neglecting neighbouring structures.

Language Modeling Language Modelling +1

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

1 code implementation17 Oct 2022 Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

Bringing together theoretical analysis and empirical evidence, we demonstrate the great potential of diffusion models in complex conditional language generation tasks.

Diversity Text Generation

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling

1 code implementation14 Oct 2022 Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong

In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions.

Benchmarking Language Modeling +1

Audio-Visual Segmentation

2 code implementations11 Jul 2022 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

Vicinity Vision Transformer

1 code implementation21 Jun 2022 Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.

Image Classification

CoNT: Contrastive Neural Text Generation

2 code implementations29 May 2022 Chenxin An, Jiangtao Feng, Kai Lv, Lingpeng Kong, Xipeng Qiu, Xuanjing Huang

We validate CoNT on five generation tasks with ten benchmarks, including machine translation, summarization, code comment generation, data-to-text generation and commonsense generation.

Code Comment Generation Comment Generation +4

Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning

2 code implementations25 May 2022 Jiahui Gao, Renjie Pi, Yong Lin, Hang Xu, Jiacheng Ye, Zhiyong Wu, Weizhong Zhang, Xiaodan Liang, Zhenguo Li, Lingpeng Kong

In this paradigm, the synthesized data from the PLM acts as the carrier of knowledge, which is used to train a task-specific model with orders of magnitude fewer parameters than the PLM, achieving both higher performance and efficiency than prompt-based zero-shot learning methods on PLMs.

text-classification Text Classification +1

Language Models Can See: Plugging Visual Controls in Text Generation

1 code implementation5 May 2022 Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier

MAGIC is a flexible framework and is theoretically compatible with any text generation tasks that incorporate image grounding.

Image Captioning Image-text matching +3

Lexical Knowledge Internalization for Neural Dialog Generation

1 code implementation ACL 2022 Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao

We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models.

Contrastive Learning

Event Transition Planning for Open-ended Text Generation

1 code implementation Findings (ACL) 2022 Qintong Li, Piji Li, Wei Bi, Zhaochun Ren, Yuxuan Lai, Lingpeng Kong

Open-ended text generation tasks, such as dialogue generation and story completion, require models to generate a coherent continuation given limited preceding context.

Dialogue Generation Diversity +1

Linear Complexity Randomized Self-attention Mechanism

1 code implementation10 Apr 2022 Lin Zheng, Chong Wang, Lingpeng Kong

By combining the expressiveness in RA and the efficiency in RFA, we develop a novel linear complexity self-attention mechanism called linear randomized attention (LARA).

cosFormer: Rethinking Softmax in Attention

3 code implementations ICLR 2022 Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong

As one of its core components, the softmax attention helps to capture long-range dependencies yet prohibits its scale-up due to the quadratic space and time complexity to the sequence length.

Ranked #6 on D4RL

D4RL Language Modeling +2

ZeroGen: Efficient Zero-shot Learning via Dataset Generation

3 code implementations16 Feb 2022 Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong

There is a growing interest in dataset generation recently due to the superior generative capacity of large pre-trained language models (PLMs).
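Schematically, the dataset-generation-then-train recipe looks like the following sketch (hypothetical prompt template and a placeholder `plm_generate` call, not ZeroGen's exact templates or filtering):

```python
def synthesize_dataset(plm_generate, labels, per_label=100):
    """Prompt a PLM with label-conditioned templates to obtain (text, label) pairs."""
    dataset = []
    for label in labels:
        # Hypothetical template; real systems tune these per task.
        prompt = f"Write a movie review expressing a {label} opinion:\n"
        for _ in range(per_label):
            text = plm_generate(prompt)             # synthetic input x
            dataset.append((text.strip(), label))   # paired with pseudo-label y
    return dataset

# The synthetic pairs then train a small task-specific model, so the large PLM
# is only needed at dataset-construction time, not at inference time.
```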

Data-free Knowledge Distillation Dataset Generation +6

A Contrastive Framework for Neural Text Generation

2 code implementations13 Feb 2022 Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier

Text generation is of great importance to many natural language processing applications.

Diversity Text Generation

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

1 code implementation16 Jan 2022 Hao Wang, Yangguang Li, Zhen Huang, Yong Dou, Lingpeng Kong, Jing Shao

To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE).

Contrastive Learning Data Augmentation +7

Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling

1 code implementation NAACL 2022 Jakob Prange, Nathan Schneider, Lingpeng Kong

We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling.

Language Modeling Language Modelling

Ripple Attention for Visual Perception with Sub-quadratic Complexity

no code implementations6 Oct 2021 Lin Zheng, Huijie Pan, Lingpeng Kong

Transformer architectures are now central to sequence modeling tasks.

Cascaded Head-colliding Attention

1 code implementation ACL 2021 Lin Zheng, Zhiyong Wu, Lingpeng Kong

Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks.

Language Modeling Language Modelling +2

Good for Misconceived Reasons: An Empirical Revisiting on the Need for Visual Context in Multimodal Machine Translation

no code implementations ACL 2021 Zhiyong Wu, Lingpeng Kong, Wei Bi, Xiang Li, Ben Kao

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Multimodal Machine Translation Translation

Random Feature Attention

no code implementations ICLR 2021 Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism.
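To make the "drop-in replacement" concrete, here is a NumPy sketch of attention approximated with random Fourier features (the generic recipe; RFA's exact feature map, normalization, and the optional gating mechanism differ):

```python
import numpy as np

def random_fourier_features(x, W, b):
    # phi(x) such that phi(x) @ phi(y) approximates exp(-||x - y||^2 / 2).
    return np.sqrt(2.0 / W.shape[0]) * np.cos(x @ W.T + b)

def rfa_attention(Q, K, V, num_features=256, seed=0):
    """Linear-time approximate softmax attention for (roughly) unit-norm Q, K.

    With ||q|| = ||k|| = 1, exp(q.k) equals a constant times the Gaussian
    kernel exp(-||q - k||^2 / 2); the constant cancels in the softmax, so the
    weights can be approximated by phi(q).phi(k) and computed in O(n) via
    associativity: phi(Q) (phi(K)^T V).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(num_features, Q.shape[-1]))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    phi_q = random_fourier_features(Q, W, b)              # (n, D)
    phi_k = random_fourier_features(K, W, b)              # (m, D)
    numerator = phi_q @ (phi_k.T @ V)                     # (n, d), never n x m
    denominator = phi_q @ phi_k.sum(axis=0)[:, None]      # (n, 1)
    return numerator / (denominator + 1e-6)
```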

Language Modeling Language Modelling +4

Adaptive Semiparametric Language Models

no code implementations4 Feb 2021 Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong

We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture.

Language Modeling Language Modelling

Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation

no code implementations1 Jan 2021 Zhiyong Wu, Lingpeng Kong, Ben Kao

A neural multimodal machine translation (MMT) system is one that aims to perform better translation by extending conventional text-only translation models with multimodal information.

Multimodal Machine Translation Translation

Syntactic Structure Distillation Pretraining For Bidirectional Encoders

no code implementations27 May 2020 Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence.

Knowledge Distillation Language Modeling +4

A Mutual Information Maximization Perspective of Language Representation Learning

no code implementations ICLR 2020 Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama

We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence).
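One standard instance of such a bound (generic notation, shown only as a reminder of the InfoNCE-style objectives this analysis covers):

```latex
% InfoNCE-style lower bound on mutual information between two views a, b of
% the same word sequence, with one positive pair and N - 1 negatives b_j:
I(A; B) \;\ge\; \log N + \mathbb{E}\!\left[
    \log \frac{\exp f(a, b)}{\sum_{j=1}^{N} \exp f(a, b_j)}
  \right].
```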

Representation Learning Sentence

Better Document-Level Machine Translation with Bayes' Rule

no code implementations TACL 2020 Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, Chris Dyer

We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit as parallel documents are not always available.
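Concretely, the factorization referred to here is the noisy-channel decomposition (schematic notation):

```latex
% Choose the target document y that best explains the source x under a
% channel (translation) model, weighted by a document-level language model:
\hat{y} \;=\; \arg\max_{y} \; p(y \mid x)
        \;=\; \arg\max_{y} \; p(x \mid y)\, p(y),
```

where p(x | y) can be learned from parallel sentences and the prior p(y) from monolingual documents, which is exactly the data regime the abstract describes.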

Document Level Machine Translation Document Translation +5

Putting Machine Translation in Context with the Noisy Channel Model

no code implementations25 Sep 2019 Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, Chris Dyer

We show that Bayes' rule provides a compelling mechanism for controlling unconditional document language models, using the long-standing challenge of effectively leveraging document context in machine translation.

Document Translation Language Modeling +4

Relative Pixel Prediction For Autoregressive Image Generation

no code implementations25 Sep 2019 Wang Ling, Chris Dyer, Lei Yu, Lingpeng Kong, Dani Yogatama, Susannah Young

In natural images, transitions between adjacent pixels tend to be smooth and gradual, a fact that has long been exploited in image compression models based on predictive coding.
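A toy illustration of that predictive-coding intuition (illustrative only, not the paper's model): predicting each pixel from its left neighbour turns a smooth row into mostly small residuals, which are cheaper to model or encode.

```python
import numpy as np

row = np.array([100, 101, 103, 104, 180, 181], dtype=np.int64)  # one image row
prediction = np.concatenate(([0], row[:-1]))     # "previous pixel" predictor
residual = row - prediction                      # what still needs encoding

print(residual)                    # [100   1   2   1  76   1] -> mostly small
reconstructed = np.cumsum(residual)              # the decoder inverts the prediction
assert np.array_equal(reconstructed, row)
```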

Colorization Image Colorization +5

Episodic Memory in Lifelong Language Learning

2 code implementations NeurIPS 2019 Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama

We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.

Continual Learning General Classification +3

Learning and Evaluating General Linguistic Intelligence

no code implementations31 Jan 2019 Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly.

Natural Language Understanding Question Answering

Neural Phrase-to-Phrase Machine Translation

no code implementations6 Nov 2018 Jiangtao Feng, Lingpeng Kong, Po-Sen Huang, Chong Wang, Da Huang, Jiayuan Mao, Kan Qiao, Dengyong Zhou

We also design an efficient dynamic programming algorithm to decode segments that allows the model to be trained faster than the existing neural phrase-based machine translation method by Huang et al. (2018).

Decoder Machine Translation +1

End-to-End Neural Segmental Models for Speech Recognition

no code implementations1 Aug 2017 Hao Tang, Liang Lu, Lingpeng Kong, Kevin Gimpel, Karen Livescu, Chris Dyer, Noah A. Smith, Steve Renals

Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time.

Decoder speech-recognition +1

DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks

1 code implementation13 Mar 2017 Lingpeng Kong, Chris Alberti, Daniel Andor, Ivan Bogatyy, David Weiss

In this work, we present a compact, modular framework for constructing novel recurrent neural architectures.

Decoder Dependency Parsing +2

Multitask Learning with CTC and Segmental CRF for Speech Recognition

no code implementations21 Feb 2017 Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models.

speech-recognition Speech Recognition

DyNet: The Dynamic Neural Network Toolkit

4 code implementations15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.
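By contrast, dynamic declaration (the strategy DyNet advocates) rebuilds the graph for every example, so its shape can follow input-dependent structure. The following framework-free sketch (plain Python, not DyNet's actual API) shows the per-example flavour:

```python
def encode(tokens, embed, combine):
    # One fresh computation "graph" per example: a chain whose length
    # follows len(tokens), rather than a fixed pre-declared shape.
    h = embed(tokens[0])
    for tok in tokens[1:]:
        h = combine(h, embed(tok))
    return h

# Toy stand-in operations so the sketch runs end to end.
embed = lambda tok: float(len(tok))
combine = lambda acc, x: 0.9 * acc + 0.1 * x

for sent in [["a", "tiny", "example"], ["variable", "length"]]:
    print(sent, "->", encode(sent, embed, combine))
```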

graph construction

What Do Recurrent Neural Network Grammars Learn About Syntax?

1 code implementation EACL 2017 Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, Noah A. Smith

We investigate what information they learn, from a linguistic perspective, through various ablations to the model and the data, and by augmenting the model with an attention mechanism (GA-RNNG) to enable closer inspection.

Constituency Parsing Dependency Parsing +2

Segmental Recurrent Neural Networks for End-to-end Speech Recognition

no code implementations1 Mar 2016 Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals

This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction.

Acoustic Modelling Language Modeling +3

Segmental Recurrent Neural Networks

2 code implementations18 Nov 2015 Lingpeng Kong, Chris Dyer, Noah A. Smith

Representations of the input segments (i.e., contiguous subsequences of the input) are computed by encoding their constituent tokens using bidirectional recurrent neural nets, and these "segment embeddings" are used to define compatibility scores with output labels.
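Schematically (notation mine, not the paper's exact parameterization), the scoring described above can be written as:

```latex
% Score of a labelled segmentation s = ((i_1:j_1, y_1), ..., (i_K:j_K, y_K))
% of the input x, where each segment is embedded by a bidirectional RNN:
\mathrm{score}(x, s) \;=\; \sum_{k=1}^{K}
  w^{\top}\, \psi\!\bigl(\mathrm{BiRNN}(x_{i_k : j_k}),\, y_k\bigr),
\qquad
p(s \mid x) \;=\; \frac{\exp \mathrm{score}(x, s)}
                       {\sum_{s'} \exp \mathrm{score}(x, s')},
```

with the partition function over all valid segmentations computed by dynamic programming.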

Chinese Word Segmentation Handwriting Recognition +2

Document Context Language Models

1 code implementation12 Nov 2015 Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein

Text documents are structured on multiple levels of detail: individual words are related by syntax, but larger units of text are related by discourse structure.

Sentence

An Empirical Comparison of Parsing Methods for Stanford Dependencies

no code implementations16 Apr 2014 Lingpeng Kong, Noah A. Smith

Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems.

Dependency Parsing
