Search Results for author: Jingang Wang

Found 59 papers, 21 papers with code

Ltri-LLM: Streaming Long Context Inference for LLMs with Training-Free Dynamic Triangular Attention Pattern

no code implementations • 6 Dec 2024 • Hongyin Tang, Di Xiu, Lanrui Wang, Xiurui Geng, Jingang Wang, Xunliang Cai

The quadratic computational complexity of the attention mechanism in current Large Language Models (LLMs) renders inference with long contexts prohibitively expensive.

Chunking
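
For context on the quadratic claim: standard softmax attention materializes an n × n score matrix over a length-n sequence, so time and memory grow quadratically in context length. This is a general property of the mechanism, not something specific to this paper:

$$
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,
\qquad QK^{\top} \in \mathbb{R}^{n \times n}
\;\Rightarrow\; O(n^{2}d)\ \text{time},\ O(n^{2})\ \text{memory}.
$$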

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

no code implementations • 5 Nov 2024 • Bei Li, Tong Zheng, Rui Wang, Jiahao Liu, Qingyan Guo, Junliang Guo, Xu Tan, Tong Xiao, Jingbo Zhu, Jingang Wang, Xunliang Cai

First, we introduce a predictor-corrector learning framework to minimize truncation errors, which consists of a high-order predictor and a multistep corrector.

Abstractive Text Summarization · Language Modelling +3
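
To make the predictor-corrector idea concrete, here is a minimal numpy sketch under the common ODE view of residual networks: an Euler-style predictor followed by a Heun-style corrector whose mixing coefficient is smoothed with an exponential moving average. The `block` function, the raw gate estimates, and the specific Heun form are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def block(x):
    # Stand-in for one Transformer layer's residual update f(x).
    return np.tanh(x)

def predictor_corrector_step(x, coeff):
    """Predictor: explicit Euler guess. Corrector: blend the slope at the
    current state with the slope at the prediction, using a learned
    coefficient in place of classical Heun's fixed 1/2."""
    k1 = block(x)          # slope at the current state
    x_pred = x + k1        # predictor (Euler guess)
    k2 = block(x_pred)     # slope re-evaluated at the prediction
    return x + (1 - coeff) * k1 + coeff * k2

# EMA coefficient learning: smooth the mixing coefficient across steps
# instead of trusting each step's raw estimate.
beta, coeff = 0.9, 0.5
x = np.zeros(8)
for raw in (0.4, 0.6, 0.55):                  # e.g. produced by a small gate
    coeff = beta * coeff + (1 - beta) * raw   # EMA update
    x = predictor_corrector_step(x, coeff)
```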

Multi-Programming Language Sandbox for LLMs

1 code implementation • 30 Oct 2024 • Shihan Dou, Jiazheng Zhang, Jianxiang Zang, Yunbo Tao, Weikang Zhou, Haoxiang Jia, Shichun Liu, Yuming Yang, Zhiheng Xi, Shenxi Wu, Shaoqing Zhang, Muling Wu, Changze Lv, Limao Xiong, WenYu Zhan, Lin Zhang, Rongxiang Weng, Jingang Wang, Xunliang Cai, Yueming Wu, Ming Wen, Rui Zheng, Tao Ji, Yixin Cao, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

We introduce MPLSandbox, an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler and analysis tools for Large Language Models (LLMs).
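
To give a flavor of the feedback loop such a sandbox enables, below is a minimal single-language sketch that executes a snippet in an isolated interpreter process and returns unified diagnostics. MPLSandbox's real API, language coverage, and analysis tooling are far richer; treat `run_python_snippet` as a hypothetical stand-in.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_python_snippet(source: str, timeout: float = 5.0) -> dict:
    """Run a snippet in a separate interpreter and collect unified
    feedback (exit status, stdout, stderr) for an LLM to consume."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snippet.py"
        path.write_text(source)
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )
    return {
        "language": "python",
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,  # diagnostics to feed back to the model
    }

feedback = run_python_snippet("print(1 / 0)")
print(feedback["ok"], feedback["stderr"].splitlines()[-1])
```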

FIRP: Faster LLM inference via future intermediate representation prediction

no code implementations • 27 Oct 2024 • Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

Recent advancements in Large Language Models (LLMs) have shown remarkable performance across a wide range of tasks.

Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning

no code implementations • 9 Oct 2024 • Zhengyu Hu, Yichuan Li, Zhengyu Chen, Jingang Wang, Han Liu, Kyumin Lee, Kaize Ding

Textual Attributed Graphs (TAGs) are crucial for modeling complex real-world systems, yet leveraging large language models (LLMs) for TAGs presents unique challenges due to the gap between sequential text processing and graph-structured data.

Graph Neural Network · In-Context Learning +2

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

no code implementations • 8 Oct 2024 • Siqi Wang, Zhengyu Chen, Bei Li, Keqing He, Min Zhang, Jingang Wang

The scaling of large language models (LLMs) is a critical research area for the efficiency and effectiveness of model training and deployment.
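
Comparisons of this kind typically fit power-law forms to the loss. A standard parameterization (the Chinchilla-style form, given as general background rather than this paper's fitted law) for a model with N parameters trained on D tokens is:

$$
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$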

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

1 code implementation • 7 Oct 2024 • Xinyu Liu, Runsong Zhao, Pengcheng Huang, Chunyang Xiao, Bei Li, Jingang Wang, Tong Xiao, Jingbo Zhu

We provide an extensive survey of the limitations of existing approaches and propose a new method, the forgetting curve, to measure the memorization capability of long-context models.

Memorization
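
One plausible way to operationalize a forgetting curve, sketched as an assumption rather than the paper's exact protocol: probe whether the model can reproduce a token seen a given distance back in its context, and plot recall accuracy against that distance. The `model.copy_token` call below is a hypothetical API.

```python
def recall_accuracy(model, token_ids, distance):
    """Fraction of positions where the model reproduces the token seen
    `distance` steps earlier in its context. `model.copy_token(prefix)`
    is a hypothetical API returning the model's guess for that token."""
    probes = range(distance, len(token_ids))
    hits = sum(
        model.copy_token(token_ids[: i + 1]) == token_ids[i - distance]
        for i in probes
    )
    return hits / len(probes)

# The forgetting curve is then recall accuracy as a function of distance:
# curve = [recall_accuracy(model, ids, d) for d in (256, 512, 1024, 2048)]
```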

Length Desensitization in Direct Preference Optimization

no code implementations • 10 Sep 2024 • Wei Liu, Yang Bai, Chengcheng Han, Rongxiang Weng, Jun Xu, Xuezhi Cao, Jingang Wang, Xunliang Cai

Direct Preference Optimization (DPO) is widely utilized in the Reinforcement Learning from Human Feedback (RLHF) phase to align Large Language Models (LLMs) with human preferences, thereby enhancing both their harmlessness and efficacy.
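
For reference, the standard DPO objective that this line of work modifies (the original DPO loss, with $y_w$ the preferred and $y_l$ the dispreferred response; the length-desensitized variant adjusts it to reduce the implicit length bias):

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x, y_w, y_l)}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
$$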

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

no code implementations • 28 Aug 2024 • Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao

While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models.

Mamba

Graph-Structured Speculative Decoding

no code implementations • 23 Jul 2024 • Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

We apply GSD across a range of LLMs, including a 70-billion-parameter LLaMA-2 model, and observe a remarkable speedup of 1.73× to 1.96×, significantly surpassing standard speculative decoding.

Language Modelling
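
As background for the reported speedups, here is a minimal numpy sketch of the standard speculative decoding accept/reject test that GSD builds on. Per the title, GSD's contribution is organizing drafted hypotheses as a graph so that tokens shared across hypotheses are verified once; that part is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(draft_probs, target_probs, proposed):
    """Accept a token drafted with probability q(t) w.p. min(1, p(t)/q(t));
    on rejection, resample from the residual max(0, p - q)."""
    p, q = target_probs[proposed], draft_probs[proposed]
    if rng.random() < min(1.0, p / q):
        return proposed, True
    residual = np.maximum(target_probs - draft_probs, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual)), False

draft = np.array([0.7, 0.2, 0.1])    # cheap draft model's distribution
target = np.array([0.5, 0.3, 0.2])   # large target model's distribution
token, accepted = speculative_step(draft, target, proposed=0)
```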

Rethinking LLM-based Preference Evaluation

no code implementations • 1 Jul 2024 • Zhengyu Hu, Linxin Song, Jieyu Zhang, Zheyuan Xiao, Jingang Wang, Zhenyu Chen, Hui Xiong

We decompose the preference evaluation metric, i.e., the win rate, from a human perspective to identify its underlying factors, and conclude that the win rate is affected by two axes of model response: desirability and information mass. The former is length-independent and related to trustworthiness; the latter is length-dependent and can be represented by conditional entropy.

Language Modelling · Large Language Model
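
The abstract says information mass "can be represented by conditional entropy"; for reference, the standard definition of the conditional entropy of a response Y given a prompt X (background, not the paper's concrete estimator) is:

$$
H(Y \mid X) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(y \mid x)
$$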

Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism

no code implementations • 6 Jun 2024 • Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai

The recent advancements in large language models (LLMs) have been extraordinary, yet the escalating inference costs associated with them present challenges in real-world applications.

Thompson Sampling
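
For readers unfamiliar with the control mechanism's namesake, here is a minimal Beta-Bernoulli Thompson sampling loop. Casting the decision as two arms ("exit early" vs. "run the full model") is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

successes = np.ones(2)   # Beta(1, 1) priors over each arm's success rate
failures = np.ones(2)

def reward(arm):         # unknown to the agent; arm 0 is truly better
    return rng.random() < (0.6 if arm == 0 else 0.4)

for _ in range(1000):
    samples = rng.beta(successes, failures)  # one posterior draw per arm
    arm = int(np.argmax(samples))            # act greedily on the draw
    if reward(arm):
        successes[arm] += 1
    else:
        failures[arm] += 1

print(successes / (successes + failures))    # posterior mean per arm
```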

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

no code implementations • 18 Apr 2024 • Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

In this paper, we propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.

Language Modelling · Large Language Model

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

no code implementations • 11 Mar 2024 • Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization.

Computational Efficiency · Quantization
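
The "lens of perturbation" framing can be made concrete: quantization replaces weights with grid-rounded values, and the difference is a perturbation whose magnitude and structure can be correlated with downstream performance. A minimal numpy sketch, with a symmetric max-scaled uniform grid assumed as the (common) scheme:

```python
import numpy as np

def uniform_quantize(w, n_bits=4):
    """Symmetric uniform quantization onto 2^(n_bits) - 1 signed levels."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
for bits in (8, 4, 2):
    perturbation = uniform_quantize(w, bits) - w
    print(bits, float(np.linalg.norm(perturbation)))  # grows as bits shrink
```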

Beyond the Known: Investigating LLMs Performance on Out-of-Domain Intent Detection

no code implementations • 27 Feb 2024 • Pei Wang, Keqing He, Yejie Wang, Xiaoshuai Song, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

Out-of-domain (OOD) intent detection aims to examine whether the user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems.

Intent Detection · Transfer Learning

C-ICL: Contrastive In-context Learning for Information Extraction

no code implementations • 17 Feb 2024 • Ying Mo, Jiahao Liu, Jian Yang, Qifan Wang, Shun Zhang, Jingang Wang, Zhoujun Li

There has been increasing interest in exploring the capabilities of advanced large language models (LLMs) in the field of information extraction (IE), specifically focusing on tasks related to named entity recognition (NER) and relation extraction (RE).

In-Context Learning · Miscellaneous +4

Sibyl: Empowering Empathetic Dialogue Generation in Large Language Models via Sensible and Visionary Commonsense Inference

1 code implementation • 26 Nov 2023 • Lanrui Wang, Jiangnan Li, Chenxu Yang, Zheng Lin, Hongyin Tang, Huan Liu, Yanan Cao, Jingang Wang, Weiping Wang

Recently, there has been a heightened interest in building chatbots based on Large Language Models (LLMs) to emulate human-like qualities in multi-turn conversations.

Dialogue Generation

Improving Input-label Mapping with Demonstration Replay for In-context Learning

no code implementations • 30 Oct 2023 • Zhuocheng Gong, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

The effectiveness of ICL can be attributed to the strong language modeling capabilities of large language models (LLMs), which enable them to learn the mapping between input and labels based on in-context demonstrations.

In-Context Learning · Language Modelling

Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression

no code implementations • 24 Oct 2023 • Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Ran Lucien Wang, Rui Yan

In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference.

Language Modelling · Large Language Model +3
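
Below is a minimal sketch of the retrieve-then-infer pattern the abstract describes, with a toy cosine-similarity index standing in for the knowledge store distilled from the LLM. The store contents and embedding model are placeholders, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy knowledge store: (embedding, text) pairs extracted from a large model.
store_vecs = rng.normal(size=(1000, 64))
store_vecs /= np.linalg.norm(store_vecs, axis=1, keepdims=True)
store_text = [f"fact-{i}" for i in range(1000)]

def retrieve(query_vec, k=3):
    """Cosine-similarity lookup; the retrieved snippets would be handed to
    the small model as extra context at inference time."""
    q = query_vec / np.linalg.norm(query_vec)
    top = np.argsort(store_vecs @ q)[-k:][::-1]
    return [store_text[i] for i in top]

print(retrieve(rng.normal(size=64)))
```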

APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection

no code implementations • 20 Oct 2023 • Pei Wang, Keqing He, Yutao Mou, Xiaoshuai Song, Yanan Wu, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

Detecting out-of-domain (OOD) intents from user queries is essential for a task-oriented dialogue system.

Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

1 code implementation • 16 Oct 2023 • Xiaoshuai Song, Keqing He, Pei Wang, Guanting Dong, Yutao Mou, Jingang Wang, Yunsen Xian, Xunliang Cai, Weiran Xu

The tasks of out-of-domain (OOD) intent discovery and generalized intent discovery (GID) aim to extend a closed intent classifier to open-world intent sets, which is crucial to task-oriented dialogue (TOD) systems.

In-Context Learning · Intent Discovery

mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning

no code implementations • 17 Aug 2023 • Ying Mo, Jian Yang, Jiahao Liu, Qifan Wang, Ruoyu Chen, Jingang Wang, Zhoujun Li

A multi-view contrastive learning framework is introduced to encompass semantic contrasts between source, codeswitched, and target sentences, as well as contrasts among token-to-token relations.

Contrastive Learning · named-entity-recognition +2

Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation

1 code implementation • 17 Jun 2023 • Weihao Zeng, Lulu Zhao, Keqing He, Ruotong Geng, Jingang Wang, Wei Wu, Weiran Xu

In this paper, we explore the compositional generalization for multi-attribute controllable dialogue generation where a model can learn from seen attribute values and generalize to unseen combinations.

Attribute · Dialogue Generation +1

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

1 code implementation • 11 Jun 2023 • Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang

Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices.

General Knowledge · Knowledge Distillation +1

PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

no code implementations • 30 May 2023 • Zhuocheng Gong, Jiahao Liu, Qifan Wang, Yang Yang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Rui Yan

While transformer-based pre-trained language models (PLMs) have dominated a number of NLP applications, these models are heavy to deploy and expensive to use.

parameter-efficient fine-tuning · Quantization

Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery

1 code implementation • 28 May 2023 • Yutao Mou, Xiaoshuai Song, Keqing He, Chen Zeng, Pei Wang, Jingang Wang, Yunsen Xian, Weiran Xu

Previous methods suffer from a coupling of pseudo label disambiguation and representation learning, that is, the reliability of pseudo labels relies on representation learning, and representation learning is restricted by pseudo labels in turn.

Intent Discovery · Pseudo Label +1

RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank

1 code implementation • 26 May 2023 • Jiduan Liu, Jiahao Liu, Qifan Wang, Jingang Wang, Wei Wu, Yunsen Xian, Dongyan Zhao, Kai Chen, Rui Yan

In this paper, we propose a novel approach, RankCSE, for unsupervised sentence representation learning, which incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.

Contrastive Learning · Learning-To-Rank +4

Task-agnostic Distillation of Encoder-Decoder Language Models

no code implementations • 21 May 2023 • Chen Zhang, Yang Yang, Jingang Wang, Dawei Song

Finetuning pretrained language models (LMs) has enabled appealing performance on a diverse array of tasks.

Abstractive Text Summarization · Decoder

Lifting the Curse of Capacity Gap in Distilling Language Models

1 code implementation • 20 May 2023 • Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song

However, when the capacity gap between the teacher and the student is large, a curse of capacity gap appears, invoking a deficiency in distilling LMs.

Knowledge Distillation

Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration

no code implementations • 15 Dec 2022 • Liqi Yan, Qifan Wang, Siqi Ma, Jingang Wang, Changbin Yu

Instance segmentation in videos, which aims to segment and track multiple objects in video frames, has garnered a flurry of research attention in recent years.

Depth Estimation · Instance Segmentation +3

UniNL: Aligning Representation Learning with Scoring Function for OOD Detection via Unified Neighborhood Learning

1 code implementation • 19 Oct 2022 • Yutao Mou, Pei Wang, Keqing He, Yanan Wu, Jingang Wang, Wei Wu, Weiran Xu

Specifically, we design a K-nearest neighbor contrastive learning (KNCL) objective for representation learning and introduce a KNN-based scoring function for OOD detection.

Contrastive Learning · Out of Distribution (OOD) Detection +2
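
A common form of KNN-based OOD scoring, shown as a generic sketch rather than UniNL's exact scoring function: score a query by its mean distance to the k nearest in-domain training features, with larger scores indicating out-of-domain inputs.

```python
import numpy as np

def knn_ood_score(query, train_feats, k=5):
    """Mean distance to the k nearest in-domain features."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    return float(np.sort(dists)[:k].mean())

rng = np.random.default_rng(0)
in_domain = rng.normal(0.0, 1.0, size=(500, 32))
print(knn_ood_score(rng.normal(0.0, 1.0, size=32), in_domain))  # low score
print(knn_ood_score(rng.normal(6.0, 1.0, size=32), in_domain))  # high score
```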

Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems

1 code implementation • 17 Oct 2022 • Weihao Zeng, Keqing He, Zechen Wang, Dayuan Fu, Guanting Dong, Ruotong Geng, Pei Wang, Jingang Wang, Chaobo Sun, Wei Wu, Weiran Xu

Recent advances in neural approaches greatly improve task-oriented dialogue (TOD) systems which assist users to accomplish their goals.

Watch the Neighbors: A Unified K-Nearest Neighbor Contrastive Learning Framework for OOD Intent Discovery

1 code implementation • 17 Oct 2022 • Yutao Mou, Keqing He, Pei Wang, Yanan Wu, Jingang Wang, Wei Wu, Weiran Xu

For the OOD clustering stage, we propose a KCC method to form compact clusters by mining true hard negative samples, which bridges the gap between clustering and representation learning.

Clustering · Contrastive Learning +4

XPrompt: Exploring the Extreme of Prompt Tuning

no code implementations • 10 Oct 2022 • Fang Ma, Chen Zhang, Lei Ren, Jingang Wang, Qifan Wang, Wei Wu, Xiaojun Quan, Dawei Song

Prompt tuning learns soft prompts to condition frozen Pre-trained Language Models (PLMs) for performing downstream tasks in a parameter-efficient manner.
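
A minimal PyTorch sketch of the soft-prompt mechanism the abstract describes: trainable prompt vectors are prepended to the input embeddings while the pretrained model stays frozen. The tiny stand-in model is illustrative, and XPrompt's pruning of negative prompt tokens is not shown.

```python
import torch
import torch.nn as nn

vocab, dim, prompt_len = 1000, 64, 8

# Frozen "PLM" stand-in: an embedding table plus one Transformer layer.
embed = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
for p in list(embed.parameters()) + list(layer.parameters()):
    p.requires_grad = False                      # the PLM stays frozen

# The only trainable parameters: the soft prompt.
soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

def forward(token_ids):
    x = embed(token_ids)                             # (B, T, D)
    prompt = soft_prompt.expand(x.size(0), -1, -1)   # (B, P, D)
    return layer(torch.cat([prompt, x], dim=1))      # condition on prompt

out = forward(torch.randint(0, vocab, (2, 16)))
print(out.shape, soft_prompt.requires_grad)  # torch.Size([2, 24, 64]) True
```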

Unified Knowledge Prompt Pre-training for Customer Service Dialogues

no code implementations • 31 Aug 2022 • Keqing He, Jingang Wang, Chaobo Sun, Wei Wu

In this paper, we propose a novel unified knowledge prompt pre-training framework, UFA (Unified Model For All Tasks), for customer service dialogues.

Natural Language Understanding · Text Generation

MiniDisc: Minimal Distillation Schedule for Language Model Compression

1 code implementation • 29 May 2022 • Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song

In particular, motivated by the finding that the performance of the student is positively correlated to the scale-performance tradeoff of the teacher assistant, MiniDisc is designed with a $\lambda$-tradeoff to measure the optimality of the teacher assistant without trial distillation to the student.

Knowledge Distillation · Language Modelling +2
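
The λ-tradeoff is described only abstractly here; as a loosely hedged illustration of a scale-performance scalarization (not MiniDisc's actual definition), one could score a candidate teacher assistant as:

$$
\mathrm{score}_{\lambda}(\mathrm{TA}) = \lambda \cdot \mathrm{performance}(\mathrm{TA}) - (1 - \lambda) \cdot \mathrm{scale}(\mathrm{TA})
$$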

Making Pretrained Language Models Good Long-tailed Learners

1 code implementation • 11 May 2022 • Chen Zhang, Lei Ren, Jingang Wang, Wei Wu, Dawei Song

Prompt-tuning has shown appealing performance in few-shot classification by virtue of its capability in effectively exploiting pre-trained knowledge.

Classification

GNN-encoder: Learning a Dual-encoder Architecture via Graph Neural Networks for Dense Passage Retrieval

no code implementations • 18 Apr 2022 • Jiduan Liu, Jiahao Liu, Yang Yang, Jingang Wang, Wei Wu, Dongyan Zhao, Rui Yan

To enhance the performance of dense retrieval models without loss of efficiency, we propose a GNN-encoder model in which query (passage) information is fused into passage (query) representations via graph neural networks that are constructed by queries and their top retrieved passages.

Natural Questions · Passage Retrieval +2

Deep Partial Multiplex Network Embedding

no code implementations • 5 Mar 2022 • Qifan Wang, Yi Fang, Anirudh Ravula, Ruining He, Bin Shen, Jingang Wang, Xiaojun Quan, Dongfang Liu

Network embedding is an effective technique to learn the low-dimensional representations of nodes in networks.

Link Prediction · Network Embedding +1

VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction

no code implementations • 8 Dec 2021 • Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen

With the booming of pre-trained transformers, representation-based models based on Siamese transformer encoders have become mainstream techniques for efficient text matching.

Text Matching

Improving Document Representations by Generating Pseudo Query Embeddings for Dense Retrieval

no code implementations • ACL 2021 • Hongyin Tang, Xingwu Sun, Beihong Jin, Jingang Wang, Fuzheng Zhang, Wei Wu

Recently, the retrieval models based on dense representations have been gradually applied in the first stage of the document retrieval tasks, showing better performance than traditional sparse vector space models.

Clustering · Retrieval

Query-aware Tip Generation for Vertical Search

no code implementations • 19 Oct 2020 • Yang Yang, Junmei Hao, Canjia Li, Zili Wang, Jingang Wang, Fuzheng Zhang, Rao Fu, Peixu Hou, Gong Zhang, Zhongyuan Wang

Existing work on tip generation does not take the query into consideration, which limits the impact of tips in search scenarios.

Decision Making · Decoder

Earlier Attention? Aspect-Aware LSTM for Aspect-Based Sentiment Analysis

no code implementations • 19 May 2019 • Bowen Xing, Lejian Liao, Dandan song, Jingang Wang, Fuzheng Zhang, Zhongyuan Wang, He-Yan Huang

This paper proposes a novel variant of LSTM, termed as aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA)

A Multi-task Learning Approach for Improving Product Title Compression with User Search Log Data

no code implementations • 5 Jan 2018 • Jingang Wang, Junfeng Tian, Long Qiu, Sheng Li, Jun Lang, Luo Si, Man Lan

It is a challenging and practical research problem to obtain effective compression of lengthy product titles for E-commerce.

Decoder · Multi-Task Learning +1
