Search Results for author: Tianxiang Sun

Found 32 papers, 24 papers with code

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance

1 code implementation 25 Mar 2024 Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu

Pretraining data of large language models comprises multiple domains (e.g., web texts, academic papers, code), whose mixture proportions crucially impact the competence of the resulting models.

Language Modelling
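
A minimal sketch of the mixing-law idea summarized above: fit a simple parametric function from mixture proportions to validation loss on a few small-scale runs, then pick the mixture the fitted function predicts to perform best. The exponential form, the toy numbers, and all names here are illustrative assumptions rather than the paper's exact law or data.

```python
import numpy as np
from scipy.optimize import minimize

# Observed (mixture proportions, validation loss) pairs from small-scale runs.
mixtures = np.array([
    [0.8, 0.1, 0.1],   # proportions of web, academic, code data (made up)
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.3, 0.4, 0.3],
])
losses = np.array([2.95, 2.88, 2.91, 2.97])          # made-up validation losses

def predict(params, r):
    """Assumed law: L(r) = c + k * exp(t . r)."""
    c, k, *t = params
    return c + k * np.exp(r @ np.array(t))

def mse(params):
    return np.mean((predict(params, mixtures) - losses) ** 2)

fit = minimize(mse, x0=[2.0, 1.0, 0.0, 0.0, 0.0], method="Nelder-Mead")

# Score candidate mixtures and keep the one with the lowest predicted loss.
candidates = np.random.dirichlet(np.ones(3), size=1000)
best = candidates[np.argmin(predict(fit.x, candidates))]
print("predicted-best mixture:", best.round(3))
```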

In-Memory Learning: A Declarative Learning Framework for Large Language Models

no code implementations 5 Mar 2024 Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

Whether agents can align with their environment without relying on human-labeled data is an intriguing research question.

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT

no code implementations 19 Feb 2024 Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Sparse dictionary learning has been a rapidly growing technique in mechanistic interpretability to attack superposition and extract more human-understandable features from model activations.

Dictionary Learning
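
The following is a generic sparse-autoencoder sketch of the kind of dictionary learning referred to above: model activations are reconstructed as sparse combinations of learned dictionary features. Dimensions, the L1 coefficient, and all names are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))      # sparse feature activations
        recon = self.decoder(codes)                 # reconstructed activations
        return recon, codes

sae = SparseAutoencoder(d_model=512, d_dict=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

acts = torch.randn(64, 512)                         # stand-in for model activations
for _ in range(100):
    recon, codes = sae(acts)
    # Reconstruction error plus an L1 penalty that encourages sparse codes.
    loss = ((recon - acts) ** 2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```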

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

1 code implementation 19 Feb 2024 Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music.

Language Modelling Large Language Model

LLM can Achieve Self-Regulation via Hyperparameter Aware Generation

no code implementations 17 Feb 2024 Siyin Wang, ShiMin Li, Tianxiang Sun, Jinlan Fu, Qinyuan Cheng, Jiasheng Ye, Junjie Ye, Xipeng Qiu, Xuanjing Huang

HAG extends the current paradigm of the text generation process, highlighting the feasibility of endowing LLMs with self-regulated decoding strategies.

Text Generation

Turn Waste into Worth: Rectifying Top-$k$ Router of MoE

no code implementations 17 Feb 2024 Zhiyuan Zeng, Qipeng Guo, Zhaoye Fei, Zhangyue Yin, Yunhua Zhou, Linyang Li, Tianxiang Sun, Hang Yan, Dahua Lin, Xipeng Qiu

To address the dropped tokens and padding, we propose the Rectify-Router, comprising the Intra-GPU Rectification and the Fill-in Rectification.

Computational Efficiency
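
Below is a toy sketch of the setting the excerpt describes: top-1 routing with a per-expert capacity drops overflow tokens, and a simple rectification pass re-assigns those dropped tokens to experts with spare capacity. It only approximates the idea; the paper's Intra-GPU and Fill-in Rectification are not reproduced here.

```python
import torch

num_tokens, num_experts, capacity = 16, 4, 5
logits = torch.randn(num_tokens, num_experts)        # router logits
choice = logits.argmax(dim=-1)                        # top-1 expert per token

assigned = {e: [] for e in range(num_experts)}
dropped = []
for tok, e in enumerate(choice.tolist()):
    if len(assigned[e]) < capacity:
        assigned[e].append(tok)
    else:
        dropped.append(tok)                           # overflow: would normally be discarded

# Rectification sketch: route each dropped token to the best-scoring expert
# that still has spare capacity instead of discarding it.
for tok in dropped:
    for e in logits[tok].argsort(descending=True).tolist():
        if len(assigned[e]) < capacity:
            assigned[e].append(tok)
            break

print({e: toks for e, toks in assigned.items()})
```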

DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

1 code implementation 24 Jan 2024 Xinghao Wang, Junliang He, Pengyu Wang, Yunhua Zhou, Tianxiang Sun, Xipeng Qiu

These methods regularize the representation space by pulling similar sentence representations closer and pushing dissimilar ones apart, and have proven effective in various NLP tasks, e.g., semantic textual similarity (STS) tasks.

Contrastive Learning Denoising +4
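
For reference, a standard in-batch contrastive (InfoNCE) objective of the kind the excerpt describes, which pulls paired sentence representations together and pushes the rest of the batch apart. The temperature and dimensions are illustrative; DenoSent's own denoising objective is not shown.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.05):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sims = z1 @ z2.T / temperature                    # pairwise cosine similarities
    targets = torch.arange(z1.size(0))                # positives lie on the diagonal
    return F.cross_entropy(sims, targets)

anchor = torch.randn(32, 768)                          # embeddings of sentences
positive = torch.randn(32, 768)                        # embeddings of paired views
print(info_nce(anchor, positive))
```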

Can AI Assistants Know What They Don't Know?

1 code implementation 24 Jan 2024 Qinyuan Cheng, Tianxiang Sun, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, ShiMin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu

To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets.

Math Open-Domain Question Answering +1
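
One plausible way to build such a model-specific Idk dataset is sketched below: sample the assistant's answers to each question and label the question "known" if it is usually answered correctly, otherwise "unknown". The threshold, sample count, and the `model_answer` callable are hypothetical, not the paper's exact procedure.

```python
def is_correct(prediction: str, gold: str) -> bool:
    return gold.lower() in prediction.lower()         # naive containment check

def build_idk_dataset(model_answer, qa_pairs, n_samples=10, threshold=0.5):
    """`model_answer(question)` is a hypothetical callable wrapping the assistant."""
    dataset = []
    for question, gold in qa_pairs:
        answers = [model_answer(question) for _ in range(n_samples)]
        accuracy = sum(is_correct(a, gold) for a in answers) / n_samples
        label = "known" if accuracy >= threshold else "unknown"
        dataset.append({"question": question, "answer": gold, "label": label})
    return dataset
```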

Agent Alignment in Evolving Social Norms

no code implementations 9 Jan 2024 ShiMin Li, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu

Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values.

LLatrieval: LLM-Verified Retrieval for Verifiable Generation

1 code implementation 14 Nov 2023 Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu

Thus, the LLM can iteratively provide feedback to the retriever, helping the retrieval results fully support verifiable generation.

Language Modelling Large Language Model +1
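
A schematic sketch of the verify-then-update loop the excerpt describes: retrieve candidates, let the LLM judge whether they support the answer, and refine the query from the LLM's feedback if not. The callables `retrieve`, `llm_verify`, and `llm_refine_query` are hypothetical stand-ins, not LLatrieval's actual components.

```python
def verified_retrieval(question, retrieve, llm_verify, llm_refine_query, max_rounds=3):
    """Iteratively retrieve until the LLM judges the documents sufficient."""
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        verdict = llm_verify(question, docs)          # e.g. {"sufficient": bool, "feedback": str}
        if verdict["sufficient"]:
            return docs
        query = llm_refine_query(question, verdict["feedback"])
    return docs                                        # best effort after the final round
```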

Flames: Benchmarking Value Alignment of LLMs in Chinese

1 code implementation 12 Nov 2023 Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, Yaru Wang, Zeyang Zhou, Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin

The widespread adoption of large language models (LLMs) across various regions underscores the urgent need to evaluate their alignment with human values.

Benchmarking Fairness

CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors

1 code implementation 9 May 2023 Peng Li, Tianxiang Sun, Qiong Tang, Hang Yan, Yuanbin Wu, Xuanjing Huang, Xipeng Qiu

A common practice is to recast the task into a text-to-text format such that generative LLMs of natural language (NL-LLMs) like GPT-3 can be prompted to solve it.

Code Generation Few-Shot Learning +4
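
As a rough illustration of recasting information extraction as code generation, the snippet below builds a code-style NER prompt that a Code-LLM would be asked to complete. The exact prompt format used by CodeIE may differ; this is only a sketch.

```python
def build_ner_prompt(sentence: str) -> str:
    """Build a hypothetical code-style NER prompt for a code LLM to complete."""
    return (
        "def named_entity_recognition(input_text):\n"
        '    """ extract named entities from the input_text """\n'
        f'    input_text = "{sentence}"\n'
        "    entity_list = []\n"
        "    # extracted named entities\n"
        '    entity_list.append({"text": "'
    )

prompt = build_ner_prompt("Steve Jobs founded Apple in Cupertino.")
# The code LLM is asked to continue `prompt`, emitting entries such as
# {"text": "Steve Jobs", "type": "person"} until the list is complete.
print(prompt)
```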

Improving Contrastive Learning of Sentence Embeddings from AI Feedback

1 code implementation 3 May 2023 Qinyuan Cheng, Xiaogui Yang, Tianxiang Sun, Linyang Li, Xipeng Qiu

Our method utilizes AI feedback from large pre-trained language models (LLMs) to construct sample pairs with fine-grained sample similarity scores to improve contrastive learning.

Contrastive Learning Data Augmentation +5
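
One simple way to exploit fine-grained, LLM-provided similarity scores is sketched below: regress the cosine similarity of each sentence pair toward its score rather than relying on binary positive/negative labels. This is an illustrative objective and not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def scored_pair_loss(z1, z2, scores):
    """`scores` are LLM-assigned similarities in [0, 1] for each sentence pair."""
    cos = F.cosine_similarity(z1, z2, dim=-1)
    return F.mse_loss(cos, scores)

z1, z2 = torch.randn(16, 768), torch.randn(16, 768)   # embeddings of the two sentences
scores = torch.rand(16)                                # stand-in for AI feedback scores
print(scored_pair_loss(z1, z2, scores))
```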

Origin Tracing and Detecting of LLMs

no code implementations 27 Apr 2023 Linyang Li, Pengyu Wang, Ke Ren, Tianxiang Sun, Xipeng Qiu

The extraordinary performance of large language models (LLMs) heightens the importance of detecting whether the context is generated by an AI system.

Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts

1 code implementation 20 Oct 2022 Xiangyang Liu, Tianxiang Sun, Xuanjing Huang, Xipeng Qiu

Through extensive experimental results across various tasks and PTMs, we show that LPT can achieve competitive performance to full model tuning and other PETuning methods under both full-data and few-shot scenarios while possessing faster training speed and lower memory cost.

BBTv2: Towards a Gradient-Free Future with Large Language Models

1 code implementation 23 May 2022 Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu

By contrast, gradient-free methods only require the forward computation of the PTM to tune the prompt, retaining the benefits of efficient tuning and deployment.

Few-Shot Learning Language Modelling
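
A minimal gradient-free prompt-tuning sketch in the spirit of the excerpt: a low-dimensional vector is mapped through a fixed random projection to soft-prompt embeddings and optimized with a simple (1+1) evolution strategy using forward passes only. BBT/BBTv2 use a stronger derivative-free optimizer (CMA-ES) plus a divide-and-conquer scheme over layers; `loss_from_forward_pass`, the dimensions, and the step size below are hypothetical.

```python
import numpy as np

d_z, prompt_len, d_model = 100, 20, 256
A = np.random.randn(d_z, prompt_len * d_model) / np.sqrt(d_z)   # fixed random projection

def loss_from_forward_pass(prompt_embeds) -> float:
    # Hypothetical placeholder: in practice, prepend `prompt_embeds` to the
    # input and score the PTM's predictions on a small labeled batch.
    return float(np.abs(prompt_embeds).mean())

z = np.random.randn(d_z)
best = loss_from_forward_pass((z @ A).reshape(prompt_len, d_model))
for _ in range(200):
    candidate = z + 0.1 * np.random.randn(d_z)                   # mutate z
    loss = loss_from_forward_pass((candidate @ A).reshape(prompt_len, d_model))
    if loss < best:                                               # keep improvements only
        z, best = candidate, loss
print("best surrogate loss:", best)
```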

A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation

1 code implementation Findings (ACL) 2022 Tianxiang Sun, Xiangyang Liu, Wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu

Previous works usually adopt heuristic metrics such as the entropy of internal outputs to measure instance difficulty, which suffer from poor generalization and the need for threshold tuning.
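
For context, the entropy-threshold baseline the excerpt criticizes can be sketched as follows: after each internal classifier, exit as soon as the prediction entropy drops below a threshold. The classifiers and threshold are illustrative; the paper's own hash-based routing is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def entropy(logits):
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-9).log()).sum(-1)

def early_exit(hidden_states, classifiers, threshold=0.3):
    """`classifiers[i]` maps the layer-i hidden state to class logits."""
    for layer, clf in enumerate(classifiers):
        logits = clf(hidden_states[layer])
        if entropy(logits) < threshold:               # confident enough: exit here
            return logits, layer
    return logits, len(classifiers) - 1               # otherwise use the last layer

# Toy usage with random hidden states and linear internal classifiers.
hidden = [torch.randn(768) for _ in range(12)]
clfs = [nn.Linear(768, 2) for _ in range(12)]
logits, exit_layer = early_exit(hidden, clfs)
print("exited at layer", exit_layer)
```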

Black-Box Tuning for Language-Model-as-a-Service

2 code implementations 10 Jan 2022 Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu

In such a scenario, which we call Language-Model-as-a-Service (LMaaS), the gradients of PTMs are usually unavailable.

In-Context Learning Language Modelling

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

1 code implementation NAACL 2022 Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

ELUE is dedicated to depicting the Pareto frontier of various language understanding tasks, such that it can tell whether and by how much a method achieves a Pareto improvement.
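
The Pareto bookkeeping mentioned above can be sketched in a few lines: given (FLOPs, accuracy) points for existing methods, a new method is a Pareto improvement if no existing point is at least as cheap and at least as accurate. The numbers below are made up for illustration.

```python
def dominates(a, b):
    """a = (flops, acc); a dominates b if it is no worse on both axes and differs."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

def is_pareto_improvement(candidate, existing):
    """True if no existing method is at least as cheap and at least as accurate."""
    return not any(dominates(p, candidate) for p in existing)

existing = [(1.0, 0.80), (2.0, 0.86), (4.0, 0.90)]    # (relative FLOPs, accuracy)
print(is_pareto_improvement((1.5, 0.88), existing))   # True: a new Pareto point
print(is_pareto_improvement((3.0, 0.85), existing))   # False: dominated by (2.0, 0.86)
```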

Paradigm Shift in Natural Language Processing

1 code implementation 26 Sep 2021 Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang

In this paper, we review such phenomenon of paradigm shifts in recent years, highlighting several paradigms that have the potential to solve different NLP tasks.

Chunking NER +3

Learning to Teach with Student Feedback

no code implementations 10 Sep 2021 Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang

This one-way interaction prevents the teacher from perceiving the student's characteristics and training progress.

Knowledge Distillation

Early Exiting with Ensemble Internal Classifiers

no code implementations 28 May 2021 Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

In this paper, we show that a novel objective function for the training of the ensemble internal classifiers can be naturally induced from the perspective of ensemble learning and information theory.

Ensemble Learning

Accelerating BERT Inference for Sequence Labeling via Early-Exit

1 code implementation ACL 2021 Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang

To alleviate this problem, we extend the recent successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks.

Sentence

Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa

1 code implementation NAACL 2021 Junqi Dai, Hang Yan, Tianxiang Sun, PengFei Liu, Xipeng Qiu

In this paper, we first compare the induced trees from PTMs and the dependency parsing trees on several popular models for the ABSA task, showing that the induced tree from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree.

Aspect-Based Sentiment Analysis (ABSA) +1

CoLAKE: Contextualized Language and Knowledge Embedding

1 code implementation COLING 2020 Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang, Zheng Zhang

With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models.

Entity Embeddings Knowledge Graph Completion +1

Pre-trained Models for Natural Language Processing: A Survey

3 code implementations 18 Mar 2020 Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang

Recently, the emergence of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era.

Representation Learning

Learning Sparse Sharing Architectures for Multiple Tasks

1 code implementation 12 Nov 2019 Tianxiang Sun, Yunfan Shao, Xiaonan Li, PengFei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang

Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing.

Multi-Task Learning
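
An illustrative sketch of the sparse-sharing idea: each task applies its own binary mask to one shared parameter set, so tasks interact only where their subnetworks overlap. The masks here are random stand-ins; in the paper, task subnetworks are extracted by pruning, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn

class SparseSharedLinear(nn.Module):
    def __init__(self, d_in, d_out, num_tasks, keep_prob=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)   # shared base weights
        # One fixed binary mask per task (random stand-in for pruning-derived masks).
        self.register_buffer(
            "masks", (torch.rand(num_tasks, d_out, d_in) < keep_prob).float()
        )

    def forward(self, x, task_id: int):
        return x @ (self.weight * self.masks[task_id]).T

layer = SparseSharedLinear(d_in=128, d_out=64, num_tasks=3)
x = torch.randn(8, 128)
print(layer(x, task_id=0).shape)                      # torch.Size([8, 64])
```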
