Search Results for author: Yichun Yin

Found 23 papers, 9 papers with code

Preparing Lessons for Progressive Training on Language Models

1 code implementation • 17 Jan 2024 • Yu Pan, Ye Yuan, Yichun Yin, Jiaxin Shi, Zenglin Xu, Ming Zhang, Lifeng Shang, Xin Jiang, Qun Liu

The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes.

FIMO: A Challenge Formal Dataset for Automated Theorem Proving

1 code implementation • 8 Sep 2023 • Chengwu Liu, Jianhao Shen, Huajian Xin, Zhengying Liu, Ye Yuan, Haiming Wang, Wei Ju, Chuanyang Zheng, Yichun Yin, Lin Li, Ming Zhang, Qun Liu

We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems.

Automated Theorem Proving

AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models

no code implementations • 12 Aug 2023 • Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years.

Few-Shot Learning, Language Modelling

NewsDialogues: Towards Proactive News Grounded Conversation

1 code implementation • 12 Aug 2023 • Siheng Li, Yichun Yin, Cheng Yang, Wangjie Jiang, Yiwei Li, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang

In this paper, we propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news.

Response Generation

G-MAP: General Memory-Augmented Pre-trained Language Model for Domain Tasks

1 code implementation • 7 Dec 2022 • Zhongwei Wan, Yichun Yin, Wei Zhang, Jiaxin Shi, Lifeng Shang, Guangyong Chen, Xin Jiang, Qun Liu

Recently, domain-specific PLMs have been proposed to boost the task performance of specific domains (e.g., biomedical and computer science) by continuing to pre-train general PLMs with domain-specific corpora.

General Knowledge, Language Modelling, +3

FPT: Improving Prompt Tuning Efficiency via Progressive Training

1 code implementation • 13 Nov 2022 • Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu

Inspired by these observations, we propose Fast Prompt Tuning (FPT), which starts by conducting PT using a small-scale partial PLM, and then progressively expands its depth and width until the full-model size.

bert2BERT: Towards Reusable Pretrained Language Models

no code implementations • ACL 2022 • Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu

However, large language model pre-training costs intensive computational resources and most of the models are trained from scratch without reusing the existing pre-trained models, which is wasteful.

Language Modelling, Large Language Model

AutoTinyBERT: Automatic Hyper-parameter Optimization for Efficient Pre-trained Language Models

1 code implementation • ACL 2021 • Yichun Yin, Cheng Chen, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Specifically, we carefully design the techniques of one-shot learning and the search space to provide an adaptive and efficient development way of tiny PLMs for various latency constraints.

Neural Architecture Search, One-Shot Learning

Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation

no code implementations • 24 Apr 2021 • Cheng Chen, Yichun Yin, Lifeng Shang, Zhi Wang, Xin Jiang, Xiao Chen, Qun Liu

Task-agnostic knowledge distillation, a teacher-student framework, has been proved effective for BERT compression.

Knowledge Distillation

LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation

no code implementations • 11 Mar 2021 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

The multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks.

Natural Language Understanding, XLM-R

Improving Task-Agnostic BERT Distillation with Layer Mapping Search

no code implementations • 11 Dec 2020 • Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Comprehensive experiments on the evaluation benchmarks demonstrate that 1) layer mapping strategy has a significant effect on task-agnostic BERT distillation and different layer mappings can result in quite different performances; 2) the optimal layer mapping strategy from the proposed search process consistently outperforms the other heuristic ones; 3) with the optimal layer mapping, our student model achieves state-of-the-art performance on the GLUE tasks.

Knowledge Distillation

TernaryBERT: Distillation-aware Ultra-low Bit BERT

5 code implementations • EMNLP 2020 • Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices.

Knowledge Distillation, Quantization

PoD: Positional Dependency-Based Word Embedding for Aspect Term Extraction

no code implementations • COLING 2020 • Yichun Yin, Chenguang Wang, Ming Zhang

Dependency context-based word embedding jointly learns the representations of word and dependency context, and has been proved effective in aspect term extraction.

Aspect Term Extraction and Sentiment Classification, POS, +2

TinyBERT: Distilling BERT for Natural Language Understanding

7 code implementations • Findings of the Association for Computational Linguistics 2020 • Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of the Transformer-based models.

Knowledge Distillation, Language Modelling, +6

Dialog State Tracking with Reinforced Data Augmentation

no code implementations • 21 Aug 2019 • Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

Neural dialog state trackers are generally limited due to the lack of quantity and diversity of annotated training data.

Data Augmentation, Dialog State Tracking, +1

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

no code implementations • 25 May 2016 • Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.

Term Extraction
