Search Results for author: Xiaozhe Ren

Found 15 papers, 7 papers with code

NEZHA: Neural Contextualized Representation for Chinese Language Understanding

10 code implementations · 31 Aug 2019 · Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu

Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora.

Named Entity Recognition +6

CAME: Confidence-guided Adaptive Memory Efficient Optimization

2 code implementations · 5 Jul 2023 · Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You

Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models.
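
For reference, the "adaptive gradient methods" mentioned above follow the standard Adam-style update sketched below; this is the well-known Adam rule, not the CAME optimizer itself, and the function name and defaults are illustrative only.

```python
# Minimal sketch of a standard Adam update step (not the CAME optimizer).
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: per-parameter step sizes adapt to gradient statistics."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```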

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

1 code implementation · NeurIPS 2023 · Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.
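
A hedged sketch of the scheduling idea described above: group queries whose predicted response lengths are similar into the same micro-batch, so little compute is wasted on padding. Function and variable names are assumptions for illustration, not taken from the paper's released code.

```python
# Group queries by predicted response length into micro-batches (illustrative sketch).
def schedule_micro_batches(queries, predicted_lengths, batch_size):
    """Sort queries by predicted response length, then cut into micro-batches."""
    order = sorted(range(len(queries)), key=lambda i: predicted_lengths[i])
    micro_batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        micro_batches.append([queries[i] for i in idx])
    return micro_batches

# Example: queries with very different expected lengths end up in separate batches.
queries = ["q0", "q1", "q2", "q3"]
predicted = [12, 480, 500, 8]
print(schedule_micro_batches(queries, predicted, batch_size=2))
# [['q3', 'q0'], ['q1', 'q2']]
```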

Quantization Scheduling

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

1 code implementation · Findings (EMNLP) 2021 · Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang

In this paper, we present the critical insight that improving the feed-forward network (FFN) in BERT yields a higher gain than improving the multi-head attention (MHA), since the computational cost of the FFN is 2-3 times that of MHA.
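
To make the 2-3x claim concrete, here is a back-of-the-envelope per-token FLOP count for BERT-base-like dimensions; the exact ratio depends on what one counts (biases, softmax, output projection, sequence length), so treat it as a rough sanity check rather than the paper's own accounting.

```python
# Rough per-token FLOP estimate for one Transformer layer (multiply-add counted as 2 FLOPs).
d = 768          # hidden size (BERT-base)
d_ff = 4 * d     # FFN inner size (3072)

ffn = 2 * (d * d_ff) * 2   # two linear maps: d -> 4d and 4d -> d
qkv = 3 * (d * d) * 2      # Q, K, V projections
out = (d * d) * 2          # attention output projection

print(ffn / qkv)           # ~2.7x counting only the Q/K/V projections
print(ffn / (qkv + out))   # ~2.0x including the output projection
```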

Data Augmentation Knowledge Distillation

SparseBERT: Rethinking the Importance Analysis in Self-attention

1 code implementation · 25 Feb 2021 · Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok

A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
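
A minimal illustration of what dropping those diagonal positions looks like: mask each token's attention to itself before the softmax. This is only a sketch of the observation, assuming a plain softmax-attention setup, not the full SparseBERT procedure.

```python
# Mask the diagonal (self-to-self attention) before the softmax (illustrative sketch).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_without_diagonal(scores):
    """scores: (seq_len, seq_len) raw attention logits."""
    masked = scores + np.diag(np.full(scores.shape[0], -1e9))  # suppress self-attention
    return softmax(masked, axis=-1)

scores = np.random.randn(4, 4)
probs = attention_without_diagonal(scores)
print(np.allclose(np.diag(probs), 0.0, atol=1e-6))  # True: diagonal weights are ~0
```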

AutoBERT-Zero: Evolving BERT Backbone from Scratch

no code implementations · 15 Jul 2021 · Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks.

Inductive Bias Language Modelling +3

NumGPT: Improving Numeracy Ability of Generative Pre-trained Models

no code implementations · 7 Sep 2021 · Zhihua Jin, Xin Jiang, Xingbo Wang, Qun Liu, Yong Wang, Xiaozhe Ren, Huamin Qu

However, those models do not consider the numerical properties of numbers and cannot perform robustly on numerical reasoning tasks (e.g., math word problems and measurement estimation).

Math

Large-Scale Deep Learning Optimizations: A Comprehensive Survey

no code implementations · 1 Nov 2021 · Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You

Deep learning has achieved promising results across a wide spectrum of AI applications.

One Student Knows All Experts Know: From Sparse to Dense

no code implementations · 26 Jan 2022 · Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You

Mixture-of-experts (MoE) is a powerful sparse architecture comprising multiple experts.
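
A hedged sketch of the sparse MoE pattern this sentence refers to: a router picks one expert per token, so only a fraction of the parameters is active for any given input. Names and shapes are illustrative assumptions, and this is not the paper's sparse-to-dense distillation method.

```python
# Minimal top-1 mixture-of-experts routing sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, n_tokens = 8, 4, 5

experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]  # one weight matrix per expert
router = rng.standard_normal((d, n_experts))                       # routing matrix
tokens = rng.standard_normal((n_tokens, d))

logits = tokens @ router         # (n_tokens, n_experts) routing scores
chosen = logits.argmax(axis=1)   # top-1 expert per token

outputs = np.stack([tokens[i] @ experts[chosen[i]] for i in range(n_tokens)])
print(chosen)          # which expert each token was routed to
print(outputs.shape)   # (5, 8): each token processed by a single expert
```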

Knowledge Distillation

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

no code implementations · 20 Mar 2023 · Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao

In this work, we develop a system that trains a trillion-parameter language model on a cluster of Ascend 910 AI processors with the MindSpore framework, and present the resulting 1.085T-parameter language model, PanGu-Σ.

Code Generation Language Modelling +4

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

no code implementations · 7 Mar 2024 · Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

In this paper, we introduce PixArt-Σ, a Diffusion Transformer (DiT) model capable of directly generating images at 4K resolution.

4k Image Captioning +1
