Search Results for author: Shengding Hu

Found 26 papers, 19 papers with code

UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

1 code implementation • 11 Apr 2024 • Chaoqun He, Renjie Luo, Shengding Hu, Yuanqian Zhao, Jie zhou, Hanghao Wu, Jiajie Zhang, Xu Han, Zhiyuan Liu, Maosong Sun

The rapid development of LLMs calls for a lightweight and easy-to-use framework for swift evaluation deployment.

144

Paper
Code

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

2 code implementations • 9 Apr 2024 • Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan YAO, Chenyang Zhao, Jie zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.

Domain Adaptation

3,726

Paper
Code

Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

no code implementations • 23 Feb 2024 • Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models.

Memorization Multi-Task Learning

Paper
Add Code

$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

1 code implementation • 21 Feb 2024 • Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, JunHao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu, Maosong Sun

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction.

186

Paper
Code

ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models

1 code implementation • 21 Feb 2024 • Chenyang Song, Xu Han, Zhengyan Zhang, Shengding Hu, Xiyu Shi, Kuai Li, Chen Chen, Zhiyuan Liu, Guangli Li, Tao Yang, Maosong Sun

Some recent efforts have explored introducing ReLU or its variants as the substitutive activation function to help LLMs achieve activation sparsity and inference acceleration, but few can simultaneously obtain high sparsity and comparable model performance.

Paper
Code

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

1 code implementation • 21 Feb 2024 • Chaoqun He, Renjie Luo, Yuzhuo Bai, Shengding Hu, Zhen Leng Thai, Junhao Shen, Jinyi Hu, Xu Han, Yujie Huang, Yuxiang Zhang, Jie Liu, Lei Qi, Zhiyuan Liu, Maosong Sun

Notably, the best-performing model, GPT-4V, attains an average score of 17. 23% on OlympiadBench, with a mere 11. 28% in physics, highlighting the benchmark rigor and the intricacy of physical reasoning.

Logical Fallacies

Paper
Code

Predicting Emergent Abilities with Infinite Resolution Evaluation

no code implementations • 5 Oct 2023 • Shengding Hu, Xin Liu, Xu Han, Xinrong Zhang, Chaoqun He, Weilin Zhao, Yankai Lin, Ning Ding, Zebin Ou, Guoyang Zeng, Zhiyuan Liu, Maosong Sun

With PassUntil, we conduct a quantitative investigation into the scaling law of task performance.

Code Generation

Paper
Add Code

Won't Get Fooled Again: Answering Questions with False Premises

1 code implementation • 5 Jul 2023 • Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu, Maosong Sun

In this paper, we find that the PLMs already possess the knowledge required to rebut such questions, and the key is how to activate the knowledge.

Question Answering

Paper
Code

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

1 code implementation • 5 Jul 2023 • Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun

The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning.

938

Paper
Code

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

1 code implementation • 4 Jun 2023 • Yusheng Su, Chi-Min Chan, Jiali Cheng, Yujia Qin, Yankai Lin, Shengding Hu, Zonghan Yang, Ning Ding, Xingzhi Sun, Guotong Xie, Zhiyuan Liu, Maosong Sun

Our investigations reveal that model scaling (1) mitigates the effects of the positions of tunable parameters on performance, and (2) enables tuning methods to achieve performance comparable to full-parameter fine-tuning by optimizing fewer tunable parameters.

Paper
Code

Exploring Lottery Prompts for Pre-trained Language Models

no code implementations • 31 May 2023 • Yulin Chen, Ning Ding, Xiaobin Wang, Shengding Hu, Hai-Tao Zheng, Zhiyuan Liu, Pengjun Xie

Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning.

Paper
Add Code

Enhancing Chat Language Models by Scaling High-quality Instructional Conversations

1 code implementation • 23 May 2023 • Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, BoWen Zhou

Fine-tuning on instruction data has been widely validated as an effective practice for implementing chat language models like ChatGPT.

2,106

Paper
Code

Tool Learning with Foundation Models

3 code implementations • 17 Apr 2023 • Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, Yi Ren Fung, Yusheng Su, Huadong Wang, Cheng Qian, Runchu Tian, Kunlun Zhu, Shihao Liang, Xingyu Shen, Bokai Xu, Zhen Zhang, Yining Ye, Bowen Li, Ziwei Tang, Jing Yi, Yuzhang Zhu, Zhenning Dai, Lan Yan, Xin Cong, Yaxi Lu, Weilin Zhao, Yuxiang Huang, Junxi Yan, Xu Han, Xian Sun, Dahai Li, Jason Phang, Cheng Yang, Tongshuang Wu, Heng Ji, Zhiyuan Liu, Maosong Sun

Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools.

4,396

Paper
Code

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

no code implementations • 8 Apr 2023 • Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Grounded on our analysis, we propose a novel partial attention language model to solve the attention degeneration problem.

Data-to-Text Generation Language Modelling +1

Paper
Add Code

COPEN: Probing Conceptual Knowledge in Pre-trained Language Models

1 code implementation • 8 Nov 2022 • Hao Peng, Xiaozhi Wang, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, Qun Liu

We believe this is a critical bottleneck for realizing human-like cognition in PLMs.

Knowledge Probing

Paper
Code

Sparse Structure Search for Delta Tuning

1 code implementation • NIPS 2022 • Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

Generally, DT methods exquisitely design delta modules (DT modules) which could be applied to arbitrary fine-grained positions inside PTMs.

Paper
Code

Sparse Structure Search for Parameter-Efficient Tuning

no code implementations • 15 Jun 2022 • Shengding Hu, Zhen Zhang, Ning Ding, Yadao Wang, Yasheng Wang, Zhiyuan Liu, Maosong Sun

The searched structures preserve more than 99\% fine-tuning performance with 0. 01\% trainable parameters.

Paper
Add Code

A Roadmap for Big Model

no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.

Language Modelling Machine Translation +1

Paper
Add Code

Prototypical Verbalizer for Prompt-based Few-shot Tuning

1 code implementation • ACL 2022 • Ganqu Cui, Shengding Hu, Ning Ding, Longtao Huang, Zhiyuan Liu

However, manual verbalizers heavily depend on domain-specific prior knowledge and human efforts, while finding appropriate label words automatically still remains challenging. In this work, we propose the prototypical verbalizer (ProtoVerb) which is built directly from training data.

Contrastive Learning Entity Typing +2

4,148

Paper
Code

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

1 code implementation • 14 Mar 2022 • Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun

This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper.

Text Classification

938

Paper
Code

OpenPrompt: An Open-source Framework for Prompt-learning

2 code implementations • ACL 2022 • Ning Ding, Shengding Hu, Weilin Zhao, Yulin Chen, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun

Prompt-learning has become a new paradigm in modern natural language processing, which directly adapts pre-trained language models (PLMs) to $cloze$-style prediction, autoregressive modeling, or sequence to sequence generation, resulting in promising performances on various tasks.

4,148

Paper
Code

Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification

2 code implementations • ACL 2022 • Shengding Hu, Ning Ding, Huadong Wang, Zhiyuan Liu, Jingang Wang, Juanzi Li, Wei Wu, Maosong Sun

Tuning pre-trained language models (PLMs) with task-specific prompts has been a promising approach for text classification.

Few-Shot Text Classification Language Modelling +2

200

Paper
Code

Graph Policy Network for Transferable Active Learning on Graphs

1 code implementation • NeurIPS 2020 • Shengding Hu, Zheng Xiong, Meng Qu, Xingdi Yuan, Marc-Alexandre Côté, Zhiyuan Liu, Jian Tang

Graph neural networks (GNNs) have been attracting increasing popularity due to their simplicity and effectiveness in a variety of fields.

Active Learning

Paper
Code

KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion

1 code implementation • Findings (ACL) 2021 • Jie Zhou, Shengding Hu, Xin Lv, Cheng Yang, Zhiyuan Liu, Wei Xu, Jie Jiang, Juanzi Li, Maosong Sun

Based on the datasets, we propose novel tasks such as multi-hop knowledge abstraction (MKA), multi-hop knowledge concretization (MKC) and then design a comprehensive benchmark.

Knowledge Graphs Transfer Learning

Paper
Code

Transfer Active Learning For Graph Neural Networks

no code implementations • 25 Sep 2019 • Shengding Hu, Meng Qu, Zhiyuan Liu, Jian Tang

Moreover, we also studied how to learn a universal policy for labeling nodes on graphs with multiple training graphs and then transfer the learned policy to unseen graphs.

Active Learning Node Classification +1

Paper
Add Code

Graph Neural Networks: A Review of Methods and Applications

5 code implementations • 20 Dec 2018 • Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, LiFeng Wang, Changcheng Li, Maosong Sun

Lots of learning tasks require dealing with graph data which contains rich relation information among elements.

Graph Attention

15,521

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.