Search Results for author: Maosong Sun

Found 300 papers, 190 papers with code

Pass off Fish Eyes for Pearls: Attacking Model Selection of Pre-trained Models

1 code implementation ACL 2022 Biru Zhu, Yujia Qin, Fanchao Qi, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu

To validate our viewpoints, we design two methods to evaluate the robustness of FMS: (1) model disguise attack, which post-trains an inferior PTM with a contrastive objective, and (2) evaluation data selection, which selects a subset of the data points for FMS evaluation based on K-means clustering.

Backdoor Attack Model Selection
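The evaluation data selection step above boils an FMS benchmark down to a few representative points per cluster. Below is a minimal sketch of that idea, assuming scikit-learn and a nearest-to-centroid selection rule (the paper's exact criterion may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def select_evaluation_subset(features: np.ndarray, n_clusters: int = 10) -> np.ndarray:
    """Pick one representative example per K-means cluster.

    `features` holds one embedding per candidate evaluation example;
    returns the indices of the points closest to each centroid.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    selected = []
    for c, center in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(features[members] - center, axis=1)
        selected.append(members[np.argmin(dists)])
    return np.array(selected)

# Example: 1000 candidate examples with 768-dim embeddings.
subset = select_evaluation_subset(np.random.randn(1000, 768), n_clusters=20)
```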

TopWORDS-Seg: Simultaneous Text Segmentation and Word Discovery for Open-Domain Chinese Texts via Bayesian Inference

no code implementations ACL 2022 Changzai Pan, Maosong Sun, Ke Deng

Processing open-domain Chinese texts has been a critical bottleneck in computational linguistics for decades, partially because text segmentation and word discovery often entangle with each other in this challenging scenario.

Bayesian Inference Segmentation +1

Self-Supervised Quality Estimation for Machine Translation

no code implementations EMNLP 2021 Yuanhang Zheng, Zhixing Tan, Meng Zhang, Mieradilijiang Maimaiti, Huanbo Luan, Maosong Sun, Qun Liu, Yang Liu

Quality estimation (QE) of machine translation (MT) aims to evaluate the quality of machine-translated sentences without references and is important in practical applications of MT.

Machine Translation Sentence +1

CodRED: A Cross-Document Relation Extraction Dataset for Acquiring Knowledge in the Wild

1 code implementation EMNLP 2021 Yuan Yao, Jiaju Du, Yankai Lin, Peng Li, Zhiyuan Liu, Jie Zhou, Maosong Sun

Existing relation extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents.

Relation Relation Extraction

BMInf: An Efficient Toolkit for Big Model Inference and Tuning

1 code implementation ACL 2022 Xu Han, Guoyang Zeng, Weilin Zhao, Zhiyuan Liu, Zhengyan Zhang, Jie Zhou, Jun Zhang, Jia Chao, Maosong Sun

In recent years, large-scale pre-trained language models (PLMs) containing billions of parameters have achieved promising results on various NLP tasks.

Quantization Scheduling

Going “Deeper”: Structured Sememe Prediction via Transformer with Tree Attention

1 code implementation Findings (ACL) 2022 Yining Ye, Fanchao Qi, Zhiyuan Liu, Maosong Sun

However, all existing sememe prediction studies ignore the hierarchical structures of sememes, which are important in the sememe-based semantic description system.

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

1 code implementation 14 Mar 2024 Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su

Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences.

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

no code implementations 13 Mar 2024 Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously.

Math

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

1 code implementation 12 Mar 2024 Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu

The virtual API server contains a caching system and API simulators, which are complementary in alleviating changes in API status.

Benchmarking
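The caching-plus-simulation design can be pictured as a small fallback wrapper: answer from the cache when possible, otherwise query a simulator. A toy sketch; the class and its interface are hypothetical, not StableToolBench's actual code:

```python
import hashlib
import json

class VirtualAPIServer:
    """Serve tool-API responses from a cache, simulating on misses."""

    def __init__(self, simulate):
        self.cache = {}           # request fingerprint -> cached response
        self.simulate = simulate  # stand-in for an LLM-based API simulator

    def call(self, api_name: str, args: dict):
        payload = json.dumps([api_name, args], sort_keys=True).encode()
        key = hashlib.sha256(payload).hexdigest()
        if key not in self.cache:  # miss: API unseen or currently unstable
            self.cache[key] = self.simulate(api_name, args)
        return self.cache[key]

server = VirtualAPIServer(simulate=lambda name, args: {"status": "ok", "api": name})
print(server.call("weather/today", {"city": "Beijing"}))
```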

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

no code implementations 29 Feb 2024 Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax": a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness).

Navigate

Beyond Language Models: Byte Models are Digital World Simulators

no code implementations 29 Feb 2024 Shangda Wu, Xu Tan, Zili Wang, Rui Wang, Xiaobing Li, Maosong Sun

Traditional deep learning often overlooks bytes, the basic units of the digital world, where all forms of information and operations are encoded and manipulated in binary format.

Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

1 code implementation 28 Feb 2024 Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs).

Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models

no code implementations 27 Feb 2024 Xiaolong Wang, Yile Wang, Yuanchi Zhang, Fuwen Luo, Peng Li, Maosong Sun, Yang Liu

Based on the characteristics of the tasks and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.

Dark Humor Detection Dialogue Generation +3

Cross-domain Chinese Sentence Pattern Parsing

no code implementations 26 Feb 2024 Jingsi Yu, Cunliang Kong, Liner Yang, Meishan Zhang, Lin Zhu, Yujie Wang, Haozhe Lin, Maosong Sun, Erhong Yang

Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching. Existing SPS parsers rely heavily on textbook corpora for training, lacking cross-domain capability. To overcome this constraint, this paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.

Sentence

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

no code implementations 23 Feb 2024 Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.

Memorization Multi-Task Learning

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting

1 code implementation 21 Feb 2024 Weilin Zhao, Yuxiang Huang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Maosong Sun

In this paper, we introduce Ouroboros, which constructs a phrase candidate pool from the verification process of LLMs to provide candidates for draft generation of the small model.

Text Generation
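The central data structure is a pool of phrases harvested while the target LLM verifies drafts, then reused to lengthen the small model's next drafts. A toy illustration of that harvest-and-lookup cycle; phrase length and the last-wins ranking are illustrative assumptions:

```python
from collections import defaultdict

class PhrasePool:
    """Toy phrase candidate pool in the spirit of Ouroboros."""

    def __init__(self, phrase_len: int = 4):
        self.phrase_len = phrase_len
        self.pool = defaultdict(list)  # first token -> continuation phrases

    def harvest(self, verified_tokens: list[int]) -> None:
        # Slide over the verified continuation and store short phrases.
        n = self.phrase_len
        for i in range(len(verified_tokens) - n + 1):
            phrase = verified_tokens[i:i + n]
            self.pool[phrase[0]].append(tuple(phrase[1:]))

    def extend_draft(self, draft: list[int]) -> list[int]:
        # If the draft's last token starts a known phrase, append it.
        candidates = self.pool.get(draft[-1])
        return draft + list(candidates[-1]) if candidates else draft

pool = PhrasePool()
pool.harvest([5, 11, 42, 7, 9])
print(pool.extend_draft([3, 5]))  # -> [3, 5, 11, 42, 7]
```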

∞Bench: Extending Long Context Evaluation Beyond 100K Tokens

1 code implementation 21 Feb 2024 Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, JunHao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.

OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

no code implementations 21 Feb 2024 Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Yang Liu, Maosong Sun, Erhong Yang

We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capability of LLMs.

General Knowledge Logical Reasoning

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

1 code implementation 21 Feb 2024 Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun

Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance.

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

1 code implementation 21 Feb 2024 Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

Notably, the best-performing model, GPT-4V, attains an average score of 17.23% on OlympiadBench, with a mere 11.28% in physics, highlighting the rigor of the benchmark and the intricacy of physical reasoning.

Logical Fallacies

Model Composition for Multimodal Large Language Models

no code implementations 20 Feb 2024 Chi Chen, Yiyang Du, Zheng Fang, Ziyue Wang, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

In this paper, we propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

no code implementations 19 Feb 2024 Ziyue Wang, Chi Chen, Yiqi Zhu, Fuwen Luo, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Maosong Sun, Yang Liu

With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks.

Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages

1 code implementation 19 Feb 2024 Yuanchi Zhang, Yile Wang, Zijun Liu, Shuo Wang, Xiaolong Wang, Peng Li, Maosong Sun, Yang Liu

While large language models (LLMs) have been pre-trained on multilingual corpora, their performance still lags behind in most languages compared to a few resource-rich languages.

Transfer Learning

LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks

no code implementations 18 Feb 2024 Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun

Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights.

Math

MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization

no code implementations 18 Feb 2024 Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns.

Code Generation Data Visualization

Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents

1 code implementation 14 Feb 2024 Cheng Qian, Bingxiang He, Zhong Zhuang, Jia Deng, Yujia Qin, Xin Cong, Zhong Zhang, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.

Language Modelling

Exploring Perceptual Limitation of Multimodal Large Language Models

1 code implementation 12 Feb 2024 Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip Ilievski, Maosong Sun

Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception.

Object Question Answering

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

1 code implementation 7 Feb 2024 Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun

Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs.

Cross-Lingual Transfer Data Augmentation

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

no code implementations 7 Feb 2024 Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, Maosong Sun

To alleviate these issues, existing efforts employ sliding attention windows and discard distant tokens to achieve the processing of extremely long sequences.

ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs

no code implementations 6 Feb 2024 Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun

To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity.
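The activation the title refers to, squared ReLU, is a one-liner. A PyTorch sketch for intuition; the module name is ours, not the paper's code:

```python
import torch
import torch.nn as nn

class ReLU2(nn.Module):
    """Squared ReLU: x -> max(x, 0)^2.

    Squaring keeps the exact zeros of ReLU (hence activation sparsity)
    while smoothing the positive branch.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x).pow(2)

print(ReLU2()(torch.randn(4)))  # negative inputs map to exactly 0
```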

UniMem: Towards a Unified View of Long-Context Large Language Models

no code implementations 5 Feb 2024 Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, Zhiyuan Liu, Maosong Sun

Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments.

Management

Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution

no code implementations 25 Jan 2024 Cheng Qian, Shihao Liang, Yujia Qin, Yining Ye, Xin Cong, Yankai Lin, Yesai Wu, Zhiyuan Liu, Maosong Sun

This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution.

DebugBench: Evaluating Debugging Capability of Large Language Models

1 code implementation 9 Jan 2024 Runchu Tian, Yining Ye, Yujia Qin, Xin Cong, Yankai Lin, Yinxu Pan, Yesai Wu, Zhiyuan Liu, Maosong Sun

Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs.

Code Generation

Experiential Co-Learning of Software-Developing Agents

1 code implementation 28 Dec 2023 Chen Qian, Yufan Dang, Jiahao Li, Wei Liu, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun

Recent advancements in large language models (LLMs) have brought significant changes to various domains, especially through LLM-driven autonomous agents.

GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension

no code implementations 28 Dec 2023 Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun

Since GitHub hosts a multitude of repositories that can serve as a rich resource of tools, a promising solution is for LLM-based agents to autonomously integrate GitHub repositories according to user queries to extend their tool set.

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

2 code implementations 1 Dec 2023 Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, Tat-Seng Chua

Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction.

Hallucination

Sparse Low-rank Adaptation of Pre-trained Language Models

1 code implementation 20 Nov 2023 Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun

Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process.

Memorization
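The dynamic-rank mechanism can be sketched as a LoRA update with a learnable gate vector whose zeroed entries prune rank components. A minimal PyTorch sketch, assuming an L1-style penalty as a stand-in for the paper's proximal gradient update:

```python
import torch
import torch.nn as nn

class SoRALayer(nn.Module):
    """Minimal sketch of sparse low-rank adaptation (SoRA)."""

    def __init__(self, weight: torch.Tensor, max_rank: int = 8):
        super().__init__()
        out_f, in_f = weight.shape
        self.register_buffer("weight", weight)  # frozen pre-trained weight
        self.A = nn.Parameter(torch.randn(max_rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, max_rank))
        self.gate = nn.Parameter(torch.ones(max_rank))  # zeros prune rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = self.B @ torch.diag(self.gate) @ self.A  # B diag(g) A
        return x @ (self.weight + delta).T

    def gate_penalty(self) -> torch.Tensor:
        return self.gate.abs().sum()  # encourages sparse gates, lower rank

layer = SoRALayer(torch.randn(16, 32))
loss = layer(torch.randn(2, 32)).pow(2).mean() + 1e-3 * layer.gate_penalty()
loss.backward()
```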

ProAgent: From Robotic Process Automation to Agentic Process Automation

1 code implementation 2 Nov 2023 Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun

Empirical experiments detail the construction and execution procedure of its workflows, showcasing the feasibility of APA and unveiling the possibility of a new paradigm of automation driven by agents.

Decision Making

MUSER: A Multi-View Similar Case Retrieval Dataset

1 code implementation 24 Oct 2023 Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen

Furthermore, the case similarities are typically measured solely by the textual semantics of the fact descriptions, which may fail to capture the full complexity of legal cases from the perspective of legal knowledge.

Fairness Retrieval +3

Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

1 code implementation 24 Oct 2023 Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs.

Computational Efficiency

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

no code implementations 19 Oct 2023 Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise.

Self-Knowledge Guided Retrieval Augmentation for Large Language Models

no code implementations 8 Oct 2023 Yile Wang, Peng Li, Maosong Sun, Yang Liu

Large language models (LLMs) have shown superior performance without task-specific fine-tuning.

Question Answering Retrieval +1

UltraFeedback: Boosting Language Models with High-quality Feedback

1 code implementation 2 Oct 2023 Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun

However, the scarcity of diverse, naturalistic datasets of human preferences on LLM outputs at scale poses a great challenge to RLHF as well as feedback learning research within the open-source community.

Language Modelling

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

2 code implementations 1 Oct 2023 Tianyu Yu, Jinyi Hu, Yuan Yao, Haoye Zhang, Yue Zhao, Chongyi Wang, Shan Wang, Yinxv Pan, Jiao Xue, Dahai Li, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun

The capabilities of MLLMs depend on two crucial factors: the model architecture that facilitates feature alignment between visual modules and large language models, and the multimodal instruction-tuning datasets for human instruction following.

Instruction Following

ConPET: Continual Parameter-Efficient Tuning for Large Language Models

1 code implementation 26 Sep 2023 Chenyang Song, Xu Han, Zheni Zeng, Kuai Li, Chen Chen, Zhiyuan Liu, Maosong Sun, Tao Yang

First, Static ConPET can adapt former continual learning methods originally designed for relatively smaller models to LLMs through PET and a dynamic replay strategy, which largely reduces the tuning costs and alleviates the over-fitting and forgetting issue.

Continual Learning

QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation

no code implementations 19 Sep 2023 Kunlun Zhu, Shihao Liang, Xu Han, Zhi Zheng, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

Recent years have witnessed the success of question answering (QA), especially its potential to be a foundation paradigm for tackling diverse NLP tasks.

Data Augmentation Question Answering

Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

1 code implementation 25 Aug 2023 Chi Chen, Ruoyu Qin, Fuwen Luo, Xiaoyue Mi, Peng Li, Maosong Sun, Yang Liu

However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment.

Position

Rational Decision-Making Agent with Internalized Utility Judgment

no code implementations 24 Aug 2023 Yining Ye, Xin Cong, Shizuo Tian, Yujia Qin, Chong Liu, Yankai Lin, Zhiyuan Liu, Maosong Sun

Central to the development of rationality is the construction of an internalized utility judgment, capable of assigning numerical utilities to each decision.

Decision Making Language Modelling +1

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

2 code implementations 23 Aug 2023 Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun

Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., the lack of large-scale, high-quality image-text data).

Language Modelling Large Language Model +1

AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors

1 code implementation 21 Aug 2023 Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

Autonomous agents empowered by Large Language Models (LLMs) have undergone significant improvements, enabling them to generalize across a broad spectrum of tasks.

Exploring Format Consistency for Instruction Tuning

1 code implementation 28 Jul 2023 Shihao Liang, Runchu Tian, Kunlun Zhu, Yujia Qin, Huadong Wang, Xin Cong, Zhiyuan Liu, Xiaojiang Liu, Maosong Sun

Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions.

Denoising

Communicative Agents for Software Development

1 code implementation 16 Jul 2023 Chen Qian, Xin Cong, Wei Liu, Cheng Yang, Weize Chen, Yusheng Su, Yufan Dang, Jiahao Li, Juyuan Xu, Dahai Li, Zhiyuan Liu, Maosong Sun

At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting.

Decision Making

CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models

no code implementations 15 Jul 2023 Weilin Zhao, Yuxiang Huang, Xu Han, Zhiyuan Liu, Zhengyan Zhang, Maosong Sun

Parameter-efficient tuning (PET) has been widely explored in recent years because it tunes much fewer parameters (PET modules) than full-parameter fine-tuning (FT) while still stimulating sufficient knowledge from large language models (LLMs) for downstream tasks.

Won't Get Fooled Again: Answering Questions with False Premises

1 code implementation 5 Jul 2023 Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun

In this paper, we find that the PLMs already possess the knowledge required to rebut such questions, and the key is how to activate the knowledge.

Question Answering

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

1 code implementation 5 Jul 2023 Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun

The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning.

Interactive Molecular Discovery with Natural Language

1 code implementation 21 Jun 2023 Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, Zhiyuan Liu

Natural language is expected to be a key medium for various human-machine interactions in the era of large language models.

Property Prediction

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

1 code implementation 4 Jun 2023 Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun

Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters.

From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

1 code implementation 29 May 2023 Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji

In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework.

Adversarial Attack

Emergent Modularity in Pre-trained Transformers

1 code implementation 28 May 2023 Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes.

Plug-and-Play Knowledge Injection for Pre-trained Language Models

1 code implementation 28 May 2023 Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models.

Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning

1 code implementation 28 May 2023 Weize Chen, Xu Han, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Since it is non-trivial to directly model the intermediate states and design a running cost function, we propose to use latent stochastic bridges to regularize the intermediate states and use the regularization as the running cost of PETs.

Plug-and-Play Document Modules for Pre-trained Models

1 code implementation 28 May 2023 Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun

By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document one time to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders.

Question Answering

Weakly Supervised Vision-and-Language Pre-training with Relative Representations

no code implementations 24 May 2023 Chi Chen, Peng Li, Maosong Sun, Yang Liu

Weakly supervised vision-and-language pre-training (WVLP), which learns cross-modal representations with limited cross-modal supervision, has been shown to effectively reduce the data cost of pre-training while maintaining decent performance on downstream tasks.

Retrieval

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

1 code implementation 23 May 2023 Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, Bowen Zhou

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT.

Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

no code implementations 19 May 2023 Jinyi Hu, Xu Han, Xiaoyuan Yi, Yutong Chen, Wenhao Li, Zhiyuan Liu, Maosong Sun

IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space with the English one in CLIP.

Cross-Lingual Transfer Image Generation

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

1 code implementation NeurIPS 2023 Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He

We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.

Multiple-choice

Recyclable Tuning for Continual Pre-training

1 code implementation 15 May 2023 Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou

In pilot studies, we find that after continual pre-training, the upgraded PLM remains compatible with the outdated adapted weights to some extent.

UNTER: A Unified Knowledge Interface for Enhancing Pre-trained Language Models

no code implementations 2 May 2023 Deming Ye, Yankai Lin, Zhengyan Zhang, Maosong Sun

In this paper, we propose a UNified knowledge inTERface, UNTER, to provide a unified perspective to exploit both structured knowledge and unstructured knowledge.

Entity Typing named-entity-recognition +2

CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval

2 code implementations 21 Apr 2023 Shangda Wu, Dingyao Yu, Xu Tan, Maosong Sun

We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal representations between natural language and symbolic music using a music encoder and a text encoder trained jointly with a contrastive loss.

Data Augmentation Information Retrieval +4
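The joint training objective is a symmetric contrastive (CLIP-style InfoNCE) loss over batches of (music, text) pairs; a generic PyTorch sketch, with the temperature value as an assumption:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(music_emb: torch.Tensor, text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: the i-th music clip matches the i-th text."""
    music = F.normalize(music_emb, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    logits = music @ text.T / temperature   # cosine similarity matrix
    targets = torch.arange(len(logits))     # diagonal pairs are positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```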

READIN: A Chinese Multi-Task Benchmark with Realistic and Diverse Input Noises

1 code implementation 14 Feb 2023 Chenglei Si, Zhengyan Zhang, Yingfa Chen, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

In order to fill this important gap, we construct READIN: a Chinese multi-task benchmark with REalistic And Diverse Input Noises.

Data Augmentation Fairness +2

An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation

1 code implementation 19 Dec 2022 Xuancheng Huang, Zijun Liu, Peng Li, Tao Li, Maosong Sun, Yang Liu

Recently, multi-aspect controllable text generation that controls the generated text in multiple aspects (e.g., sentiment, topic, and keywords) has attracted increasing attention.

Machine Translation Text Generation +1

Continual Knowledge Distillation for Neural Machine Translation

1 code implementation 18 Dec 2022 Yuanchi Zhang, Peng Li, Maosong Sun, Yang Liu

While many parallel corpora are not publicly accessible for data copyright, data privacy and competitive differentiation reasons, trained translation models are increasingly available on open platforms.

Knowledge Distillation Machine Translation +2

Decoder Tuning: Efficient Language Understanding as Decoding

2 code implementations 16 Dec 2022 Ganqu Cui, Wentao Li, Ning Ding, Longtao Huang, Zhiyuan Liu, Maosong Sun

With the ever-growing sizes of pre-trained models (PTMs), it has become an emerging practice to provide only the inference APIs for users, namely the model-as-a-service (MaaS) setting.

Natural Language Understanding

Visually Grounded Commonsense Knowledge Acquisition

1 code implementation 22 Nov 2022 Yuan Yao, Tianyu Yu, Ao Zhang, Mengdi Li, Ruobing Xie, Cornelius Weber, Zhiyuan Liu, Hai-Tao Zheng, Stefan Wermter, Tat-Seng Chua, Maosong Sun

In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances.

Language Modelling

Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

1 code implementation 21 Nov 2022 Shangda Wu, Maosong Sun

Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum.

Music Generation Text-to-Music Generation

Evade the Trap of Mediocrity: Promoting Diversity and Novelty in Text Generation via Concentrating Attention

1 code implementation 14 Nov 2022 Wenhao Li, Xiaoyuan Yi, Jinyi Hu, Maosong Sun, Xing Xie

In this work, we dig into the intrinsic mechanism of this problem and find that sparser attention values in Transformer could improve diversity.

Attribute Text Generation

FPT: Improving Prompt Tuning Efficiency via Progressive Training

1 code implementation 13 Nov 2022 Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu

Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.

Sparse Structure Search for Delta Tuning

1 code implementation NeurIPS 2022 Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Generally, DT methods exquisitely design delta modules (DT modules) which could be applied to arbitrary fine-grained positions inside PTMs.

Exploring Mode Connectivity for Pre-trained Language Models

1 code implementation 25 Oct 2022 Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou

(3) How does the PLM's task knowledge change along the path connecting two minima?

Different Tunes Played with Equal Skill: Exploring a Unified Optimization Subspace for Delta Tuning

1 code implementation 24 Oct 2022 Jing Yi, Weize Chen, Yujia Qin, Yankai Lin, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou

To fathom the mystery, we hypothesize that the adaptations of different DETs could all be reparameterized as low-dimensional optimizations in a unified optimization subspace, which could be found by jointly decomposing independent solutions of different DETs.

Recurrence Boosts Diversity! Revisiting Recurrent Latent Variable in Transformer-Based Variational AutoEncoder for Diverse Text Generation

no code implementations 22 Oct 2022 Jinyi Hu, Xiaoyuan Yi, Wenhao Li, Maosong Sun, Xing Xie

We demonstrate that TRACE could enhance the entanglement of each segment and preceding latent variables and deduce a non-zero lower bound of the KL term, providing a theoretical guarantee of generation diversity.

Text Generation

Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP

1 code implementation 19 Oct 2022 Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun

We discuss the deficiencies in previous work and propose our suggestions that the research on the Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate their methods on security tasks to demonstrate the real-world concerns; (2) consider real-world attackers' goals, instead of developing impractical methods.

Data Augmentation

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

1 code implementation COLING 2022 Zichun Yu, Tianyu Gao, Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample-efficient compared to standard fine-tuning with pre-trained models.

Few-Shot Learning Language Modelling +1

A Unified Understanding of Deep NLP Models for Text Classification

no code implementations 19 Jun 2022 Zhen Li, Xiting Wang, Weikai Yang, Jing Wu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun, Hui Zhang, Shixia Liu

The rapid development of deep natural language processing (NLP) models for text classification has led to an urgent need for a unified understanding of these models proposed individually.

text-classification Text Classification

A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks

1 code implementation 17 Jun 2022 Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun

However, we highlight two issues in previous backdoor learning evaluations: (1) the differences between real-world scenarios (e.g., releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns and thus requires specific evaluation protocols; (2) the evaluation metrics only consider whether the attacks could flip the models' predictions on poisoned samples and retain performance on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving.

text similarity

Sparse Structure Search for Parameter-Efficient Tuning

no code implementations 15 Jun 2022 Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

The searched structures preserve more than 99% of the fine-tuning performance with 0.01% trainable parameters.

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models

1 code implementation 23 May 2022 Yuan Yao, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves the performance on position-insensitive tasks with grounded inputs.

Language Modelling Object +7

Prompt Tuning for Discriminative Pre-trained Language Models

1 code implementation Findings (ACL) 2022 Yuan Yao, Bowen Dong, Ao Zhang, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Leyu Lin, Maosong Sun, Jianyong Wang

Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.

Language Modelling Question Answering +2

A Template-based Method for Constrained Neural Machine Translation

1 code implementation 23 May 2022 Shuo Wang, Peng Li, Zhixing Tan, Zhaopeng Tu, Maosong Sun, Yang Liu

In this work, we propose a template-based method that can yield results with high translation quality and match accuracy, and the inference speed of our method is comparable with that of unconstrained NMT models.

Machine Translation NMT +1

Efficient and Training-Free Control of Language Generation

no code implementations 12 May 2022 Shangda Wu, Maosong Sun

In recent years, there has been a growing interest in the development of language models capable of generating text with controllable attributes.

Attribute Language Modelling +2

Symphony Generation with Permutation Invariant Language Model

1 code implementation 10 May 2022 Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li, Feng Yu, Maosong Sun

In this work, we propose a permutation invariant language model, SymphonyNet, as a solution for symbolic symphony music generation.

Audio Generation Language Modelling +2

LEVEN: A Large-Scale Chinese Legal Event Detection Dataset

1 code implementation Findings (ACL) 2022 Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun

However, existing Legal Event Detection (LED) datasets only concern incomprehensive event types and have limited annotated data, which restricts the development of LED methods and their downstream applications.

Event Detection Retrieval

A Simple but Effective Pluggable Entity Lookup Table for Pre-trained Language Models

1 code implementation ACL 2022 Deming Ye, Yankai Lin, Peng Li, Maosong Sun, Zhiyuan Liu

Pre-trained language models (PLMs) cannot well recall rich factual knowledge of entities exhibited in large-scale corpora, especially those rare entities.

Domain Adaptation

QuoteR: A Benchmark of Quote Recommendation for Writing

1 code implementation ACL 2022 Fanchao Qi, Yanhui Yang, Jing Yi, Zhili Cheng, Zhiyuan Liu, Maosong Sun

To facilitate the research on this task, we build a large and fully open quote recommendation dataset called QuoteR, which comprises three parts including English, standard Chinese and classical Chinese.

Chord-Conditioned Melody Harmonization with Controllable Harmonicity

1 code implementation 17 Feb 2022 Shangda Wu, Xiaobing Li, Maosong Sun

Melody harmonization has long been closely associated with chorales composed by Johann Sebastian Bach.

YACLC: A Chinese Learner Corpus with Multidimensional Annotation

1 code implementation 30 Dec 2021 Yingying Wang, Cunliang Kong, Liner Yang, Yijun Wang, Xiaorong Lu, Renfen Hu, Shan He, Zhenghao Liu, Yun Chen, Erhong Yang, Maosong Sun

This resource is of great relevance for second language acquisition research, foreign-language teaching, and automatic grammatical error correction.

Grammatical Error Correction Language Acquisition +1

On Transferability of Prompt Tuning for Natural Language Processing

1 code implementation NAACL 2022 Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou

To explore whether we can improve PT via prompt transfer, we empirically investigate the transferability of soft prompts across different downstream tasks and PLMs in this work.

Natural Language Understanding Transfer Learning

OpenPrompt: An Open-source Framework for Prompt-learning

2 code implementations ACL 2022 Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun

Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to cloze-style prediction, autoregressive modeling, or sequence-to-sequence generation, resulting in promising performances on various tasks.

Textual Backdoor Attacks Can Be More Harmful via Two Simple Tricks

1 code implementation 15 Oct 2021 Yangyi Chen, Fanchao Qi, Hongcheng Gao, Zhiyuan Liu, Maosong Sun

In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.

Vocal Bursts Valence Prediction

Exploring Universal Intrinsic Task Subspace via Prompt Tuning

1 code implementation 15 Oct 2021 Yujia Qin, Xiaozhi Wang, Yusheng Su, Yankai Lin, Ning Ding, Jing Yi, Weize Chen, Zhiyuan Liu, Juanzi Li, Lei Hou, Peng Li, Maosong Sun, Jie Zhou

In the experiments, we study diverse few-shot NLP tasks and surprisingly find that in a 250-dimensional subspace found with 100 tasks, by only tuning 250 free parameters, we can recover 97% and 83% of the full prompt tuning performance for 100 seen tasks (using different training data) and 20 unseen tasks, respectively, showing great generalization ability of the found intrinsic task subspace.
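The reparameterization behind these numbers is a soft prompt generated from a small trainable vector through a fixed projection back to the full prompt space. A minimal sketch, with a random projection standing in for the subspace the paper decomposes from many tasks:

```python
import torch
import torch.nn as nn

class SubspacePrompt(nn.Module):
    """Soft prompt parameterized by a low-dimensional intrinsic vector."""

    def __init__(self, prompt_len: int = 64, hidden: int = 768,
                 intrinsic_dim: int = 250):
        super().__init__()
        self.z = nn.Parameter(torch.zeros(intrinsic_dim))  # only this trains
        # Frozen projection; the paper finds it by decomposing prompts
        # from 100 tasks, random here purely for illustration.
        self.register_buffer("proj", torch.randn(intrinsic_dim,
                                                 prompt_len * hidden) * 0.01)
        self.prompt_len, self.hidden = prompt_len, hidden

    def forward(self) -> torch.Tensor:
        return (self.z @ self.proj).view(self.prompt_len, self.hidden)

prompt = SubspacePrompt()()  # a (64, 768) soft prompt from 250 parameters
```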

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

1 code implementation EMNLP 2021 Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun

In this paper, we make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which is aimed at altering the style of a sentence while preserving its meaning.

Backdoor Attack Sentence +2

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

1 code implementation 24 Sep 2021 Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Pre-Trained Vision-Language Models (VL-PTMs) have shown promising capabilities in grounding natural language in image data, facilitating a broad variety of cross-modal tasks.

Visual Grounding

Packed Levitated Marker for Entity and Relation Extraction

2 code implementations ACL 2022 Deming Ye, Yankai Lin, Peng Li, Maosong Sun

In particular, we propose a neighborhood-oriented packing strategy, which considers the neighbor spans integrally to better model the entity boundary information.

Joint Entity and Relation Extraction Relation

Lingxi: A Diversity-aware Chinese Modern Poetry Generation System

no code implementations 27 Aug 2021 Xinran Zhang, Maosong Sun, Jiafeng Liu, Xiaobing Li

We propose the nucleus sampling with randomized head (NS-RH) algorithm, which randomizes the high-frequency part (the "head") of the predicted distribution in order to emphasize "comparatively low-frequency" words.

Semantic Similarity Semantic Textual Similarity +2
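A toy rendering of NS-RH on top of ordinary nucleus sampling; the head size and the shuffle are illustrative choices, not necessarily the paper's exact recipe:

```python
import numpy as np

def ns_rh_sample(probs: np.ndarray, p: float = 0.9, head_k: int = 5,
                 rng=np.random.default_rng()) -> int:
    """Nucleus sampling whose high-probability head is shuffled."""
    order = np.argsort(probs)[::-1]                # tokens by falling prob
    nucleus = order[np.cumsum(probs[order]) <= p]  # usual top-p truncation
    keep = order[:max(len(nucleus), 1)]            # always keep >= 1 token
    weights = probs[keep].copy()
    h = min(head_k, len(weights))
    weights[:h] = rng.permutation(weights[:h])     # randomize the "head"
    weights /= weights.sum()
    return int(rng.choice(keep, p=weights))

print(ns_rh_sample(np.array([0.5, 0.2, 0.15, 0.1, 0.03, 0.02])))
```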

Language Models are Good Translators

no code implementations 25 Jun 2021 Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu

Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limited scenario, we first demonstrate that a single language model (LM4MT) can achieve comparable performance with strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar amount of model parameters.

Language Modelling Machine Translation +2

CPM-2: Large-scale Cost-effective Pre-trained Language Models

2 code implementations 20 Jun 2021 Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun

We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

1 code implementation ACL 2021 Fanchao Qi, Yuan Yao, Sophia Xu, Zhiyuan Liu, Maosong Sun

Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.

On the Language Coverage Bias for Neural Machine Translation

no code implementations Findings (ACL) 2021 Shuo Wang, Zhaopeng Tu, Zhixing Tan, Shuming Shi, Maosong Sun, Yang Liu

Language coverage bias, which indicates the content-dependent differences between sentence pairs originating from the source and target languages, is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.

Data Augmentation Machine Translation +3

CCPM: A Chinese Classical Poetry Matching Dataset

1 code implementation 3 Jun 2021 Wenhao Li, Fanchao Qi, Maosong Sun, Xiaoyuan Yi, Jiarui Zhang

We hope this dataset can further enhance the study on incorporating deep semantics into the understanding and generation system of Chinese classical poetry.

Translation

Sub-Character Tokenization for Chinese Pretrained Language Models

2 code implementations 1 Jun 2021 Chenglei Si, Zhengyan Zhang, Yingfa Chen, Fanchao Qi, Xiaozhi Wang, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to homophone typos.

Chinese Word Segmentation Computational Efficiency +2
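The pronunciation-based variant can be illustrated with a pinyin transliteration pass (via the pypinyin package); the subword vocabulary that SubChar then trains on these sequences is omitted here:

```python
from pypinyin import lazy_pinyin  # pip install pypinyin

def to_pronunciation_sequence(text: str) -> str:
    """Map Chinese characters to pinyin so homophones collapse together."""
    return " ".join(lazy_pinyin(text))

# "做作业" and the homophone typo "做做业" transliterate identically,
# so a tokenizer built on these sequences is robust to such typos.
print(to_pronunciation_sequence("做作业"))  # zuo zuo ye
print(to_pronunciation_sequence("做做业"))  # zuo zuo ye
```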

Open Hierarchical Relation Extraction

1 code implementation NAACL 2021 Kai Zhang, Yuan Yao, Ruobing Xie, Xu Han, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun

To establish the bidirectional connections between OpenRE and relation hierarchy, we propose the task of open hierarchical relation extraction and present a novel OHRE framework for the task.

Clustering Relation +1

Transfer Learning for Sequence Generation: from Single-source to Multi-source

1 code implementation ACL 2021 Xuancheng Huang, Jingfang Xu, Maosong Sun, Yang Liu

Although directly finetuning pretrained models on MSG tasks and concatenating multiple sources into a single long sequence is regarded as a simple method to transfer pretrained models to MSG tasks, we conjecture that the direct finetuning method leads to catastrophic forgetting and solely relying on pretrained self-attention layers to capture cross-source information is not sufficient.

Automatic Post-Editing Document Summarization +3

Fully Hyperbolic Neural Networks

1 code implementation ACL 2022 Weize Chen, Xu Han, Yankai Lin, Hexu Zhao, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Hyperbolic neural networks have shown great potential for modeling complex data.

Knowledge Inheritance for Pre-trained Language Models

2 code implementations NAACL 2022 Yujia Qin, Yankai Lin, Jing Yi, Jiajie Zhang, Xu Han, Zhengyan Zhang, Yusheng Su, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Specifically, we introduce a pre-training framework named "knowledge inheritance" (KI) and explore how could knowledge distillation serve as auxiliary supervision during pre-training to efficiently learn larger PLMs.

Domain Adaptation Knowledge Distillation +2

Automatic Construction of Sememe Knowledge Bases via Dictionaries

1 code implementation Findings (ACL) 2021 Fanchao Qi, Yangyi Chen, Fengyu Wang, Zhiyuan Liu, Xiao Chen, Maosong Sun

We use this method to build an English SKB and a French SKB, and conduct comprehensive evaluations from both intrinsic and extrinsic perspectives.

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

2 code implementations ACL 2021 Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun

As far as we know, almost all existing textual backdoor attack methods insert additional contents into normal samples as triggers, which causes the trigger-embedded samples to be detected and the backdoor attacks to be blocked without much effort.

Backdoor Attack

TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference

1 code implementation NAACL 2021 Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

To address this issue, we propose a dynamic token reduction approach to accelerate PLMs' inference, named TR-BERT, which could flexibly adapt the layer number of each token in inference to avoid redundant calculation.

PTR: Prompt Tuning with Rules for Text Classification

1 code implementation 24 May 2021 Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, Maosong Sun

This indicates that PTR is a promising approach to take advantage of both human prior knowledge and PLMs for those complicated classification tasks.

Natural Language Inference Relation Classification +4

Dynamic Multi-Branch Layers for On-Device Neural Machine Translation

1 code implementation 14 May 2021 Zhixing Tan, Zeyuan Yang, Meng Zhang, Qun Liu, Maosong Sun, Yang Liu

With the rapid development of artificial intelligence (AI), there is a trend in moving AI applications, such as neural machine translation (NMT), from cloud to mobile devices.

Machine Translation NMT +1

Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents

1 code implementation 9 May 2021 Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun

Legal artificial intelligence (LegalAI) aims to benefit legal systems with the technology of artificial intelligence, especially natural language processing (NLP).

Language Modelling Question Answering +2

Visual Distant Supervision for Scene Graph Generation

1 code implementation ICCV 2021 Yuan Yao, Ao Zhang, Xu Han, Mengdi Li, Cornelius Weber, Zhiyuan Liu, Stefan Wermter, Maosong Sun

In this work, we propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.

Graph Generation Predicate Classification +2

Equality before the Law: Legal Judgment Consistency Analysis for Fairness

no code implementations 25 Mar 2021 Yuzhong Wang, Chaojun Xiao, Shirong Ma, Haoxi Zhong, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun

We propose to simulate judges from different groups with legal judgment prediction (LJP) models and measure the judicial inconsistency with the disagreement of the judgment results given by LJP models trained on different groups.

Fairness

Improving Diversity of Neural Text Generation via Inverse Probability Weighting

no code implementations 13 Mar 2021 Xinran Zhang, Maosong Sun, Jiafeng Liu, Xiaobing Li

Traditional stochastic sampling methods only focus on truncating the unreliable "tail" of the distribution, and do not address the "head" part, which we show might contain tedious or even repetitive candidates with high probability that lead to repetition loops.

Language Modelling Text Generation
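A toy sketch of inverse probability weighting applied to the distribution's head; the head-mass threshold and the mass-preserving rescale are illustrative assumptions:

```python
import numpy as np

def ipw_sample(probs: np.ndarray, head_mass: float = 0.3,
               rng=np.random.default_rng()) -> int:
    """Flatten the high-probability head by weighting tokens with 1/p."""
    order = np.argsort(probs)[::-1]
    in_head = np.cumsum(probs[order]) <= head_mass
    in_head[0] = True                            # head keeps >= 1 token
    head = order[in_head]
    weights = probs.copy()
    weights[head] = 1.0 / probs[head]            # inverse weighting
    weights[head] *= probs[head].sum() / weights[head].sum()  # keep head mass
    weights /= weights.sum()
    return int(rng.choice(len(probs), p=weights))

print(ipw_sample(np.array([0.6, 0.2, 0.1, 0.06, 0.04])))
```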

Optimal Embedding Calibration for Symbolic Music Similarity

no code implementations 13 Mar 2021 Xinran Zhang, Maosong Sun, Jiafeng Liu, Xiaobing Li

In natural language processing (NLP), the semantic similarity task requires large-scale, high-quality human-annotated labels for fine-tuning or evaluation.

Language Modelling Representation Learning +2

UPRec: User-Aware Pre-training for Recommender Systems

no code implementations 22 Feb 2021 Chaojun Xiao, Ruobing Xie, Yuan Yao, Zhiyuan Liu, Maosong Sun, Xu Zhang, Leyu Lin

Existing sequential recommendation methods rely on large amounts of training data and usually suffer from the data sparsity problem.

Self-Supervised Learning Sequential Recommendation

CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models

1 code implementation 7 Feb 2021 Yusheng Su, Xu Han, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Peng Li, Jie Zhou, Maosong Sun

We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.

Representation Learning for Natural Language Processing

no code implementations 7 Feb 2021 Zhiyuan Liu, Yankai Lin, Maosong Sun

This book aims to review and present the recent advances of distributed representation learning for NLP, including why representation learning can improve NLP, how representation learning takes part in various important topics of NLP, and what challenges are still not well addressed by distributed representation.

Representation Learning

OpenMatch: An Open Source Library for Neu-IR Research

1 code implementation 30 Jan 2021 Zhenghao Liu, Kaitao Zhang, Chenyan Xiong, Zhiyuan Liu, Maosong Sun

OpenMatch is a Python-based library that serves for Neural Information Retrieval (Neu-IR) research.

Document Ranking Information Retrieval +1

Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

1 code implementation ICML Workshop AML 2021 Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun

In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks.

Backdoor Attack

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

1 code implementation 31 Dec 2020 Chenglei Si, Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

In this work, we propose a simple and effective method to cover a much larger proportion of the attack search space, called Adversarial and Mixup Data Augmentation (AMDA).

Adversarial Robustness Text Augmentation +2
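The mixup half of AMDA interpolates two examples' representations and labels; a generic PyTorch sketch, where one side may be an adversarial example and the Beta(alpha, alpha) recipe follows common convention rather than the paper's exact setting:

```python
import torch

def mixup(emb_a, emb_b, label_a, label_b, alpha: float = 0.4):
    """Linearly interpolate two examples' embeddings and one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * emb_a + (1 - lam) * emb_b, lam * label_a + (1 - lam) * label_b

emb, label = mixup(torch.randn(16, 768), torch.randn(16, 768),
                   torch.tensor([1., 0.]).repeat(16, 1),
                   torch.tensor([0., 1.]).repeat(16, 1))
```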

Neural Machine Translation: A Review of Methods, Resources, and Tools

no code implementations 31 Dec 2020 Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu

Machine translation (MT) is an important sub-field of natural language processing that aims to translate natural languages using computers.

Data Augmentation Machine Translation +2

Towards a Universal Continuous Knowledge Base

no code implementations 25 Dec 2020 Gang Chen, Maosong Sun, Yang Liu

In this work, we propose a method for building a continuous knowledge base (CKB) that can store knowledge imported from multiple, diverse neural networks.

Knowledge Distillation text-classification +2

Mask-Align: Self-Supervised Neural Word Alignment

1 code implementation ACL 2021 Chi Chen, Maosong Sun, Yang Liu

Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks.

Machine Translation Translation +1

Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet

1 code implementation COLING 2020 Bairu Hou, Fanchao Qi, Yuan Zang, Xurui Zhang, Zhiyuan Liu, Maosong Sun

In this paper, we propose a new unsupervised method for HowNet-based Chinese WSD, which exploits the masked language model task of pre-trained language models.

Language Modelling Word Sense Disambiguation
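
The core scoring step — asking a masked language model how well a sense-specific substitute fits the target's context — can be sketched with the transformers library. The checkpoint is an illustrative assumption, and this toy version handles only single-character candidates; in the paper, candidates are derived from the HowNet sememes of each sense, and the sense whose substitutes score highest is chosen.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

def substitute_score(sentence, target, candidate):
    """Probability the masked LM assigns to `candidate` at the position of
    `target`. Works for single-token candidates; multi-character words
    would need one mask per character."""
    masked = sentence.replace(target, tok.mask_token, 1)
    inputs = tok(masked, return_tensors="pt")
    pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, pos]
    cand_id = tok.convert_tokens_to_ids(candidate)
    return logits.softmax(-1)[cand_id].item()
```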

Denoising Relation Extraction from Document-level Distant Supervision

1 code implementation EMNLP 2020 Chaojun Xiao, Yuan Yao, Ruobing Xie, Xu Han, Zhiyuan Liu, Maosong Sun, Fen Lin, Leyu Lin

Distant supervision (DS) has been widely used to generate auto-labeled data for sentence-level relation extraction (RE), which improves RE performance.

Denoising Document-level Relation Extraction +2

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads

no code implementations 7 Nov 2020 Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun

To measure the informativeness of attention heads, we train our Single-Shot Meta-Pruner (SMP) with a meta-learning paradigm aiming to maintain the distribution of text representations after pruning.

Informativeness Meta-Learning +1
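
Once a pruner such as SMP has decided which heads are uninformative, applying that decision is straightforward in transformers; the layer and head indices below are placeholders, not the paper's learned selection.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
# {layer index: [head indices to remove]} -- placeholder choices; SMP would
# produce this mapping by scoring heads with its meta-learned pruner.
heads_to_prune = {0: [2, 5], 3: [0], 11: [4, 7]}
model.prune_heads(heads_to_prune)  # permanently removes the listed heads
```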

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Explanation Generation Natural Language Understanding

Learning from Context or Names? An Empirical Study on Neural Relation Extraction

1 code implementation EMNLP 2020 Hao Peng, Tianyu Gao, Xu Han, Yankai Lin, Peng Li, Zhiyuan Liu, Maosong Sun, Jie Zhou

We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks.

Memorization Relation +1

WantWords: An Open-source Online Reverse Dictionary System

1 code implementation EMNLP 2020 Fanchao Qi, Lei Zhang, Yanhui Yang, Zhiyuan Liu, Maosong Sun

A reverse dictionary takes descriptions of words as input and outputs words semantically matching the input descriptions.

Reverse Dictionary

IsOBS: An Information System for Oracle Bone Script

no code implementations EMNLP 2020 Xu Han, Yuzhuo Bai, Keyue Qiu, Zhiyuan Liu, Maosong Sun

Oracle bone script (OBS) is the earliest known ancient Chinese writing system and the ancestor of modern Chinese.

Few-Shot Learning Retrieval

CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models

1 code implementation 29 Sep 2020 Yusheng Su, Xu Han, Zhengyan Zhang, Peng Li, Zhiyuan Liu, Yankai Lin, Jie Zhou, Maosong Sun

In this paper, we propose a novel framework named Coke that dynamically selects contextual knowledge and embeds knowledge context according to the textual context for PLMs, avoiding the effect of redundant and ambiguous knowledge in KGs that does not match the input text.

Knowledge Graphs

Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect

no code implementations 19 Sep 2020 Zheni Zeng, Chaojun Xiao, Yuan Yao, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun

Recommender systems aim to provide item recommendations for users, and usually face the data sparsity problem (e.g., cold start) in real-world scenarios.

Recommendation Systems Transfer Learning

Country Image in COVID-19 Pandemic: A Case Study of China

1 code implementation 12 Sep 2020 Huimin Chen, Zeyu Zhu, Fanchao Qi, Yining Ye, Zhiyuan Liu, Maosong Sun, Jianbin Jin

Therefore, in this study, we take China as a specific and typical case and investigate its image with aspect-based sentiment analysis on a large-scale Twitter dataset.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA)

Modeling Voting for System Combination in Machine Translation

1 code implementation 14 Jul 2020 Xuancheng Huang, Jiacheng Zhang, Zhixing Tan, Derek F. Wong, Huanbo Luan, Jingfang Xu, Maosong Sun, Yang Liu

System combination is an important technique for combining the hypotheses of different machine translation systems to improve translation performance.

Machine Translation Translation

Continual Relation Learning via Episodic Memory Activation and Reconsolidation

no code implementations ACL 2020 Xu Han, Yi Dai, Tianyu Gao, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Continual relation learning aims to continually train a model on new data to learn newly emerging relations while avoiding catastrophic forgetting of old ones.

Continual Learning Relation
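
A generic episodic-memory buffer, the substrate the paper builds on, can be sketched as follows; the paper's actual contribution (activating memory and reconsolidating it during replay) sits on top of such a buffer and is not implemented here.

```python
import random

class EpisodicMemory:
    """Keep a few exemplars per relation and replay them while training on
    new relations, to counter catastrophic forgetting."""

    def __init__(self, per_relation=10, seed=0):
        self.per_relation = per_relation
        self.rng = random.Random(seed)
        self.store = {}  # relation -> list of stored examples

    def add(self, relation, examples):
        kept = self.store.setdefault(relation, [])
        kept.extend(examples)
        if len(kept) > self.per_relation:
            self.store[relation] = self.rng.sample(kept, self.per_relation)

    def replay_batch(self, k=32):
        pool = [ex for exs in self.store.values() for ex in exs]
        return self.rng.sample(pool, min(k, len(pool)))
```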

Fast Network Embedding Enhancement via High Order Proximity Approximation

2 code implementations 2020 Cheng Yang, Maosong Sun, Zhiyuan Liu, Cunchao Tu

Many Network Representation Learning (NRL) methods have recently been proposed to learn vector representations for vertices in a network.

Dimensionality Reduction Link Prediction +3
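
The enhancement step itself reduces to two matrix products with the normalized adjacency matrix, which is what makes the method fast. A NumPy sketch of the update; the weighting coefficients are tunable assumptions rather than a prescribed setting.

```python
import numpy as np

def neu(R, A, lam1=0.5, lam2=0.25):
    """Enhance base embeddings R (n x d) with higher-order proximity:
    R' = R + lam1 * (A_hat @ R) + lam2 * (A_hat @ (A_hat @ R)),
    where A_hat is the row-normalized adjacency matrix."""
    A_hat = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    AR = A_hat @ R
    return R + lam1 * AR + lam2 * (A_hat @ AR)
```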

KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion

1 code implementation Findings (ACL) 2021 Jie Zhou, Shengding Hu, Xin Lv, Cheng Yang, Zhiyuan Liu, Wei Xu, Jie Jiang, Juanzi Li, Maosong Sun

Based on the datasets, we propose novel tasks such as multi-hop knowledge abstraction (MKA) and multi-hop knowledge concretization (MKC), and design a comprehensive benchmark.

Knowledge Graphs Transfer Learning

How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence

2 code implementations ACL 2020 Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun

Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain.

Train No Evil: Selective Masking for Task-Guided Pre-Training

1 code implementation EMNLP 2020 Yuxian Gu, Zhengyan Zhang, Xiaozhi Wang, Zhiyuan Liu, Maosong Sun

In this stage, the model is trained by masked language modeling on in-domain unsupervised data to learn domain-specific patterns and we propose a novel selective masking strategy to learn task-specific patterns.

Language Modelling Masked Language Modeling +1
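
The selective-masking idea can be sketched as "rank tokens by task relevance and mask the top ones", in contrast to uniform random masking. The importance scorer is the paper's learned component and is taken as an input here.

```python
def selective_mask(tokens, importance, mask_token="[MASK]", ratio=0.15):
    """Mask the most task-relevant tokens. `importance[i]` is a
    task-relevance score for position i, produced elsewhere (in the paper,
    by a learned scorer trained with downstream supervision)."""
    n_mask = max(1, int(len(tokens) * ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: -importance[i])
    chosen = set(ranked[:n_mask])
    return [mask_token if i in chosen else t for i, t in enumerate(tokens)]
```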

Coreferential Reasoning Learning for Language Representation

2 code implementations EMNLP 2020 Deming Ye, Yankai Lin, Jiaju Du, Zheng-Hao Liu, Peng Li, Maosong Sun, Zhiyuan Liu

Language representation models such as BERT can effectively capture contextual semantic information from plain text, and have been proven to achieve promising results in many downstream NLP tasks with appropriate fine-tuning.

Relation Extraction

MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

no code implementations 13 Mar 2020 Xiaoyuan Yi, Ruoyu Li, Cheng Yang, Wenhao Li, Maosong Sun

Though recent neural models make notable progress on some criteria of poetry quality, generated poems still suffer from poor diversity.

Generating Major Types of Chinese Classical Poetry in a Uniformed Framework

no code implementations LREC 2020 Jinyi Hu, Maosong Sun

In this paper, we propose a GPT-2 based unified framework for generating major types of Chinese classical poems.

Text Generation
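
Since the listing shows no released code, the following is only a generic GPT-2 sampling sketch with transformers; the checkpoint and the format-control prefix are placeholders for the paper's fine-tuned model and its conditioning scheme.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "五言绝句:"  # hypothetical form-control prefix for a fine-tuned model
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=48, do_sample=True, top_p=0.9,
                     temperature=0.8, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
```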

Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence

1 code implementation 16 Jan 2020 Jiaju Du, Fanchao Qi, Maosong Sun, Zhiyuan Liu

We find that sememes of each word are usually semantically matched to different words in its dictionary definition, and we name this matching relationship local semantic correspondence.

Semantic correspondence

Multi-channel Reverse Dictionary Model

1 code implementation 18 Dec 2019 Lei Zhang, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Qun Liu, Maosong Sun

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description.

Reverse Dictionary Sentence
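
Stripped of its extra channels, a reverse dictionary reduces to nearest-neighbor search between an encoded description and the word-embedding matrix. A single-channel baseline sketch; the paper's model adds POS, morpheme, and sememe predictor channels on top.

```python
import numpy as np

def reverse_lookup(desc_vec, word_vecs, vocab, k=5):
    """Rank vocabulary words by cosine similarity between the encoded
    description (desc_vec, shape (d,)) and each word embedding
    (word_vecs, shape (V, d))."""
    q = desc_vec / np.linalg.norm(desc_vec)
    W = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    scores = W @ q
    top = np.argsort(-scores)[:k]
    return [(vocab[i], float(scores[i])) for i in top]
```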

Learning to Predict Explainable Plots for Neural Story Generation

no code implementations 5 Dec 2019 Gang Chen, Yang Liu, Huanbo Luan, Meng Zhang, Qun Liu, Maosong Sun

While the use of neural networks has proven effective in improving story generation, how to learn to generate an explainable high-level plot remains a major challenge.

Sentence Story Generation

JEC-QA: A Legal-Domain Question Answering Dataset

no code implementations 27 Nov 2019 Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun

We present JEC-QA, the largest question answering dataset in the legal domain, collected from the National Judicial Examination of China.

Question Answering Reading Comprehension

Neural Machine Translation with Explicit Phrase Alignment

no code implementations 26 Nov 2019 Jiacheng Zhang, Huanbo Luan, Maosong Sun, FeiFei Zhai, Jingfang Xu, Yang Liu

The lack of alignment in NMT models leads to three problems: it is hard to (1) interpret the translation process, (2) impose lexical constraints, and (3) impose structural constraints.

Machine Translation NMT +1

Learning to Copy for Automatic Post-Editing

2 code implementations IJCNLP 2019 Xuancheng Huang, Yang Liu, Huanbo Luan, Jingfang Xu, Maosong Sun

To better identify translation errors, our method learns the representations of source sentences and system outputs in an interactive way.

Automatic Post-Editing Translation

Multi-Paragraph Reasoning with Knowledge-enhanced Graph Neural Network

no code implementations 6 Nov 2019 Deming Ye, Yankai Lin, Zheng-Hao Liu, Zhiyuan Liu, Maosong Sun

Multi-paragraph reasoning is indispensable for open-domain question answering (OpenQA), yet it receives little attention in current OpenQA systems.

Open-Domain Question Answering

Adversarial Language Games for Advanced Natural Language Intelligence

no code implementations 5 Nov 2019 Yuan Yao, Haoxi Zhong, Zhengyan Zhang, Xu Han, Xiaozhi Wang, Chaojun Xiao, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

In this work, we propose a challenging adversarial language game called Adversarial Taboo as an example, in which an attacker and a defender compete around a target word.

Board Games

HMEAE: Hierarchical Modular Event Argument Extraction

1 code implementation IJCNLP 2019 Xiaozhi Wang, Ziqi Wang, Xu Han, Zhiyuan Liu, Juanzi Li, Peng Li, Maosong Sun, Jie Zhou, Xiang Ren

Existing event extraction methods classify each argument role independently, ignoring the conceptual correlations between different argument roles.

Event Argument Extraction Event Extraction +1

Word-level Textual Adversarial Attacking as Combinatorial Optimization

1 code implementation ACL 2020 Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu, Maosong Sun

Further experiments show that our model has higher transferability and brings greater robustness gains to victim models via adversarial training.

Adversarial Attack Combinatorial Optimization +3
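
For contrast with the paper's particle swarm optimization search, here is the simple greedy baseline over a substitution table; `substitutes` and `victim_prob` are stand-ins for the sememe-based candidate source and the victim model's predicted probability of the true label.

```python
def greedy_attack(words, true_label, substitutes, victim_prob):
    """Greedily swap each word for the substitute that most reduces the
    victim's confidence in the true label. The paper instead searches the
    combinatorial space with particle swarm optimization."""
    words = list(words)
    best = victim_prob(words, true_label)
    for i, w in enumerate(words):
        for cand in substitutes.get(w, []):
            trial = words[:i] + [cand] + words[i + 1:]
            p = victim_prob(trial, true_label)
            if p < best:
                best, words = p, trial
        if best < 0.5:  # heuristic stop for a binary victim classifier
            break
    return words, best
```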
