Search Results for author: Tianxiang Sun

Found 32 papers, 24 papers with code

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

1 code implementation 25 Mar 2024 Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu

Pretraining data of large language models comprises multiple domains (e.g., web texts, academic papers, code), whose mixture proportions crucially impact the competence of the resulting models.

Language Modelling
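
A minimal sketch of the mixing-law idea summarized above: fit a simple parametric function from mixture proportions to validation loss on a few small-scale runs, then pick the mixture the fitted function predicts to perform best. The exponential form, the toy numbers, and all names here are illustrative assumptions rather than the paper's exact law or data.

```python
import numpy as np
from scipy.optimize import minimize

# Observed (mixture proportions, validation loss) pairs from small-scale runs.
mixtures = np.array([
    [0.8, 0.1, 0.1],   # proportions of web, academic, code data (made up)
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.3, 0.4, 0.3],
])
losses = np.array([2.95, 2.88, 2.91, 2.97])          # made-up validation losses

def predict(params, r):
    """Assumed law: L(r) = c + k * exp(t . r)."""
    c, k, *t = params
    return c + k * np.exp(r @ np.array(t))

def mse(params):
    return np.mean((predict(params, mixtures) - losses) ** 2)

fit = minimize(mse, x0=[2.0, 1.0, 0.0, 0.0, 0.0], method="Nelder-Mead")

# Score candidate mixtures and keep the one with the lowest predicted loss.
candidates = np.random.dirichlet(np.ones(3), size=1000)
best = candidates[np.argmin(predict(fit.x, candidates))]
print("predicted-best mixture:", best.round(3))
```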

In-Memory Learning: A Declarative Learning Framework for Large Language Models

no code implementations 5 Mar 2024 Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

Whether agents can align with their environment without relying on human-labeled data is an intriguing research question.

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT

no code implementations 19 Feb 2024 Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Sparse dictionary learning has been a rapidly growing technique in mechanistic interpretability to attack superposition and extract more human-understandable features from model activations.

Dictionary Learning
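
The following is a generic sparse-autoencoder sketch of the kind of dictionary learning referred to above: model activations are reconstructed as sparse combinations of learned dictionary features. Dimensions, the L1 coefficient, and all names are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))      # sparse feature activations
        recon = self.decoder(codes)                 # reconstructed activations
        return recon, codes

sae = SparseAutoencoder(d_model=512, d_dict=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

acts = torch.randn(64, 512)                         # stand-in for model activations
for _ in range(100):
    recon, codes = sae(acts)
    # Reconstruction error plus an L1 penalty that encourages sparse codes.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```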

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

1 code implementation 19 Feb 2024 Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music.

Language Modelling Large Language Model

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation

no code implementations 17 Feb 2024 Siyin Wang, ShiMin Li, Tianxiang Sun, Jinlan Fu, Qinyuan Cheng, Jiasheng Ye, Junjie Ye, Xipeng Qiu, Xuanjing Huang

HAG extends the current paradigm of the text generation process, highlighting the feasibility of endowing LLMs with self-regulated decoding strategies.

Text Generation

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

no code implementations 17 Feb 2024 Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.

Computational Efficiency
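
Below is a toy sketch of the setting the excerpt describes: top-1 routing with a per-expert capacity drops overflow tokens, and a simple rectification pass re-assigns those dropped tokens to experts with spare capacity. It only approximates the idea; the paper's Intra-GPU and Fill-in Rectification are not reproduced here.

```python
import torch

num_tokens, num_experts, capacity = 16, 4, 5
logits = torch.randn(num_tokens, num_experts)        # router logits
choice = logits.argmax(dim=-1)                        # top-1 expert per token

assigned = {e: [] for e in range(num_experts)}
dropped = []
for tok, e in enumerate(choice.tolist()):
    if len(assigned[e]) < capacity:
        assigned[e].append(tok)
    else:
        dropped.append(tok)                           # overflow: would normally be discarded

# Rectification sketch: route each dropped token to the best-scoring expert
# that still has spare capacity instead of discarding it.
for tok in dropped:
    for e in logits[tok].argsort(descending=True).tolist():
        if len(assigned[e]) < capacity:
            assigned[e].append(tok)
            break

print({e: toks for e, toks in assigned.items()})
```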

DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

1 code implementation 24 Jan 2024 Xinghao Wang, Junliang He, Pengyu Wang, Yunhua Zhou, Tianxiang Sun, Xipeng Qiu

These methods regularize the representation space by pulling similar sentence representations closer and pushing dissimilar ones apart, and have proven effective in various NLP tasks, e.g., semantic textual similarity (STS) tasks.

Contrastive Learning Denoising +4
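
For reference, a standard in-batch contrastive (InfoNCE) objective of the kind the excerpt describes, which pulls paired sentence representations together and pushes the rest of the batch apart. The temperature and dimensions are illustrative; DenoSent's own denoising objective is not shown.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sims = z1 @ z2.T / temperature                    # pairwise cosine similarities
    targets = torch.arange(z1.size(0))                # positives lie on the diagonal
    return F.cross_entropy(sims, targets)

anchor = torch.randn(32, 768)                          # embeddings of sentences
positive = torch.randn(32, 768)                        # embeddings of paired views
print(info_nce(anchor, positive))
```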

Can AI Assistants Know What They Don't Know?

1 code implementation 24 Jan 2024 Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, ShiMin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu

To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets.

Math Open-Domain Question Answering +1
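
One plausible way to build such a model-specific Idk dataset is sketched below: sample the assistant's answers to each question and label the question "known" if it is usually answered correctly, otherwise "unknown". The threshold, sample count, and the `model_answer` callable are hypothetical, not the paper's exact procedure.

```python
def is_correct(prediction: str, gold: str) -> bool:
    return gold.lower() in prediction.lower()         # naive containment check

def build_idk_dataset(model_answer, qa_pairs, n_samples=10, threshold=0.5):
    """`model_answer(question)` is a hypothetical callable wrapping the assistant."""
    dataset = []
    for question, gold in qa_pairs:
        answers = [model_answer(question) for _ in range(n_samples)]
        accuracy = sum(is_correct(a, gold) for a in answers) / n_samples
        label = "known" if accuracy >= threshold else "unknown"
        dataset.append({"question": question, "answer": gold, "label": label})
    return dataset
```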

Agent Alignment in Evolving Social Norms

no code implementations 9 Jan 2024 ShiMin Li, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values.

LLatrieval: LLM-Verified Retrieval for Verifiable Generation

1 code implementation 14 Nov 2023 Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu

Thus, the LLM can iteratively provide feedback to the retriever, helping the retrieval results fully support verifiable generation.

Language Modelling Large Language Model +1
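
A schematic sketch of the verify-then-update loop the excerpt describes: retrieve candidates, let the LLM judge whether they support the answer, and refine the query from the LLM's feedback if not. The callables `retrieve`, `llm_verify`, and `llm_refine_query` are hypothetical stand-ins, not LLatrieval's actual components.

```python
def verified_retrieval(question, retrieve, llm_verify, llm_refine_query, max_rounds=3):
    """Iteratively retrieve until the LLM judges the documents sufficient."""
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        verdict = llm_verify(question, docs)          # e.g. {"sufficient": bool, "feedback": str}
        if verdict["sufficient"]:
            return docs
        query = llm_refine_query(question, verdict["feedback"])
    return docs                                        # best effort after the final round
```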

Flames: Benchmarking Value Alignment of LLMs in Chinese

1 code implementation 12 Nov 2023 Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin

The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values.

Benchmarking Fairness

CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors

1 code implementation 9 May 2023 Peng Li, Tianxiang Sun, Qiong Tang, Hang Yan, Yuanbin Wu, Xuanjing Huang, Xipeng Qiu

A common practice is to recast the task into a text-to-text format such that generative LLMs of natural language (NL-LLMs) like GPT-3 can be prompted to solve it.

Code Generation Few-Shot Learning +4
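
As a rough illustration of recasting information extraction as code generation, the snippet below builds a code-style NER prompt that a Code-LLM would be asked to complete. The exact prompt format used by CodeIE may differ; this is only a sketch.

```python
def build_ner_prompt(sentence: str) -> str:
    """Build a hypothetical code-style NER prompt for a code LLM to complete."""
    return (
        "def named_entity_recognition(input_text):\n"
        '    """ extract named entities from the input_text """\n'
        f'    input_text = "{sentence}"\n'
        "    entity_list = []\n"
        "    # extracted named entities\n"
        '    entity_list.append({"text": "'
    )

prompt = build_ner_prompt("Steve Jobs founded Apple in Cupertino.")
# The code LLM is asked to continue `prompt`, emitting entries such as
# {"text": "Steve Jobs", "type": "person"} until the list is complete.
print(prompt)
```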

Improving Contrastive Learning of Sentence Embeddings from AI Feedback

1 code implementation 3 May 2023 Qinyuan Cheng, Xiaogui Yang, Tianxiang Sun, Linyang Li, Xipeng Qiu

Our method utilizes AI feedback from large pre-trained language models (LLMs) to construct sample pairs with fine-grained sample similarity scores to improve contrastive learning.

Contrastive Learning Data Augmentation +5
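
One simple way to exploit fine-grained, LLM-provided similarity scores is sketched below: regress the cosine similarity of each sentence pair toward its score rather than relying on binary positive/negative labels. This is an illustrative objective and not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def scored_pair_loss(z1, z2, scores):
    """`scores` are LLM-assigned similarities in [0, 1] for each sentence pair."""
    cos = F.cosine_similarity(z1, z2, dim=-1)
    return F.mse_loss(cos, scores)

z1, z2 = torch.randn(16, 768), torch.randn(16, 768)   # embeddings of the two sentences
scores = torch.rand(16)                                # stand-in for AI feedback scores
print(scored_pair_loss(z1, z2, scores))
```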

Origin Tracing and Detecting of LLMs

no code implementations 27 Apr 2023 Linyang Li, Pengyu Wang, Ke Ren, Tianxiang Sun, Xipeng Qiu

The extraordinary performance of large language models (LLMs) heightens the importance of detecting whether the context is generated by an AI system.

Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts

1 code implementation 20 Oct 2022 Xiangyang Liu, Tianxiang Sun, Xuanjing Huang, Xipeng Qiu

Through extensive experimental results across various tasks and PTMs, we show that LPT can achieve competitive performance to full model tuning and other PETuning methods under both full-data and few-shot scenarios while possessing faster training speed and lower memory cost.

BBTv2: Towards a Gradient-Free Future with Large Language Models

1 code implementation 23 May 2022 Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu

By contrast, gradient-free methods only require the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment.

Few-Shot Learning Language Modelling
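
A minimal gradient-free prompt-tuning sketch in the spirit of the excerpt: a low-dimensional vector is mapped through a fixed random projection to soft-prompt embeddings and optimized with a simple (1+1) evolution strategy using forward passes only. BBT/BBTv2 use a stronger derivative-free optimizer (CMA-ES) plus a divide-and-conquer scheme over layers; `loss_from_forward_pass`, the dimensions, and the step size below are hypothetical.

```python
import numpy as np

d_z, prompt_len, d_model = 100, 20, 256
A = np.random.randn(d_z, prompt_len * d_model) / np.sqrt(d_z)   # fixed random projection

def loss_from_forward_pass(prompt_embeds) -> float:
    # Hypothetical placeholder: in practice, prepend `prompt_embeds` to the
    # input and score the PTM's predictions on a small labeled batch.
    return float(np.abs(prompt_embeds).mean())

z = np.random.randn(d_z)
best = loss_from_forward_pass((z @ A).reshape(prompt_len, d_model))
for _ in range(200):
    candidate = z + 0.1 * np.random.randn(d_z)                   # mutate z
    loss = loss_from_forward_pass((candidate @ A).reshape(prompt_len, d_model))
    if loss < best:                                               # keep improvements only
        z, best = candidate, loss
print("best surrogate loss:", best)
```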

A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation

1 code implementation Findings (ACL) 2022 Tianxiang Sun, Xiangyang Liu, Wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu

Previous works usually adopt heuristic metrics such as the entropy of internal outputs to measure instance difficulty, which suffer from poor generalization and the need for threshold tuning.
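
For context, the entropy-threshold baseline the excerpt criticizes can be sketched as follows: after each internal classifier, exit as soon as the prediction entropy drops below a threshold. The classifiers and threshold are illustrative; the paper's own hash-based routing is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entropy(logits):
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-9).log()).sum(-1)

def early_exit(hidden_states, classifiers, threshold=0.3):
    """`classifiers[i]` maps the layer-i hidden state to class logits."""
    for layer, clf in enumerate(classifiers):
        logits = clf(hidden_states[layer])
        if entropy(logits) < threshold:               # confident enough: exit here
            return logits, layer
    return logits, len(classifiers) - 1               # otherwise use the last layer

# Toy usage with random hidden states and linear internal classifiers.
hidden = [torch.randn(768) for _ in range(12)]
clfs = [nn.Linear(768, 2) for _ in range(12)]
logits, exit_layer = early_exit(hidden, clfs)
print("exited at layer", exit_layer)
```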

Black-Box Tuning for Language-Model-as-a-Service

2 code implementations 10 Jan 2022 Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu

In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable.

In-Context Learning Language Modelling

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

1 code implementation NAACL 2022 Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

ELUE is dedicated to depicting the Pareto frontier of various language understanding tasks, such that it can tell whether and by how much a method achieves a Pareto improvement.
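
The Pareto bookkeeping mentioned above can be sketched in a few lines: given (FLOPs, accuracy) points for existing methods, a new method is a Pareto improvement if no existing point is at least as cheap and at least as accurate. The numbers below are made up for illustration.

```python
def dominates(a, b):
    """a = (flops, acc); a dominates b if it is no worse on both axes and differs."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def is_pareto_improvement(candidate, existing):
    """True if no existing method is at least as cheap and at least as accurate."""
    return not any(dominates(p, candidate) for p in existing)

existing = [(1.0, 0.80), (2.0, 0.86), (4.0, 0.90)]    # (relative FLOPs, accuracy)
print(is_pareto_improvement((1.5, 0.88), existing))   # True: a new Pareto point
print(is_pareto_improvement((3.0, 0.85), existing))   # False: dominated by (2.0, 0.86)
```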

Paradigm Shift in Natural Language Processing

1 code implementation 26 Sep 2021 Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang

In this paper, we review such phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.

Chunking NER +3

Learning to Teach with Student Feedback

no code implementations 10 Sep 2021 Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang

This one-way interaction prevents the teacher from perceiving the student's characteristics and training progress.

Knowledge Distillation

Early Exiting with Ensemble Internal Classifiers

no code implementations 28 May 2021 Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

In this paper, we show that a novel objective function for the training of the ensemble internal classifiers can be naturally induced from the perspective of ensemble learning and information theory.

Ensemble Learning

Accelerating BERT Inference for Sequence Labeling via Early-Exit

1 code implementation ACL 2021 Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang

To alleviate this problem, we extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks.

Sentence

Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa

1 code implementation NAACL 2021 Junqi Dai, Hang Yan, Tianxiang Sun, PengFei Liu, Xipeng Qiu

In this paper, we first compare the induced trees from PTMs and the dependency parsing trees on several popular models for the ABSA task, showing that the induced tree from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree.

Aspect-Based Sentiment Analysis (ABSA) +1

CoLAKE: Contextualized Language and Knowledge Embedding

1 code implementation COLING 2020 Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang, Zheng Zhang

With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models.

Entity Embeddings Knowledge Graph Completion +1

Pre-trained Models for Natural Language Processing: A Survey

3 code implementations 18 Mar 2020 Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era.

Representation Learning

Learning Sparse Sharing Architectures for Multiple Tasks

1 code implementation 12 Nov 2019 Tianxiang Sun, Yunfan Shao, Xiaonan Li, PengFei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang

Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing.

Multi-Task Learning
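
An illustrative sketch of the sparse-sharing idea: each task applies its own binary mask to one shared parameter set, so tasks interact only where their subnetworks overlap. The masks here are random stand-ins; in the paper, task subnetworks are extracted by pruning, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class SparseSharedLinear(nn.Module):
    def __init__(self, d_in, d_out, num_tasks, keep_prob=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)   # shared base weights
        # One fixed binary mask per task (random stand-in for pruning-derived masks).
        self.register_buffer(
            "masks", (torch.rand(num_tasks, d_out, d_in) < keep_prob).float()
        )

    def forward(self, x, task_id: int):
        return x @ (self.weight * self.masks[task_id]).T

layer = SparseSharedLinear(d_in=128, d_out=64, num_tasks=3)
x = torch.randn(8, 128)
print(layer(x, task_id=0).shape)                      # torch.Size([8, 64])
```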
