Search Results for author: Xucheng Ye

Found 6 papers, 2 papers with code

KwaiYiiMath: Technical Report

no code implementations • 11 Oct 2023 • Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, ShengNan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning.

Ranked #87 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +1

FedSkel: Efficient Federated Learning on Heterogeneous Systems with Skeleton Gradients Update

1 code implementation • 20 Aug 2021 • Junyu Luo, Jianlei Yang, Xucheng Ye, Xin Guo, Weisheng Zhao

Federated learning aims to protect users' privacy while performing data analysis across different participants.

Federated Learning

S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks

2 code implementations • 15 Jun 2021 • Jianlei Yang, Wenzhi Fu, Xingzhou Cheng, Xucheng Ye, Pengcheng Dai, Weisheng Zhao

Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks.

RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models

no code implementations • 7 Jun 2021 • Xin Guo, Jianlei Yang, Haoyi Zhou, Xucheng Ye, JianXin Li

In order to overcome these security problems, RoSearch is proposed as a comprehensive framework to search for student models with better adversarial robustness when performing knowledge distillation.

Adversarial Robustness Knowledge Distillation +1

Accelerating CNN Training by Pruning Activation Gradients

no code implementations • ECCV 2020 • Xucheng Ye, Pengcheng Dai, Junyu Luo, Xin Guo, Yingjie Qi, Jianlei Yang, Yiran Chen

Sparsification is an efficient approach to accelerate CNN inference, but it is challenging to take advantage of sparsity in the training procedure because the involved gradients change dynamically.
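
The snippet above only motivates the problem; the idea named in the title (pruning activation gradients during training) can be sketched as zeroing gradient entries whose magnitude falls below a threshold in the backward pass, so the backward computation can exploit sparsity. The PyTorch tensor-hook mechanism and the threshold value below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

def prune_small_gradients(grad, threshold=1e-3):
    # Assumed sparsification rule: zero out activation-gradient entries below the threshold.
    return grad * (grad.abs() >= threshold)

# Toy convolutional layer; the hook sparsifies the gradient of its output activation.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(4, 3, 32, 32, requires_grad=True)

out = conv(x)
out.register_hook(lambda g: prune_small_gradients(g, threshold=1e-3))

loss = out.pow(2).mean()
loss.backward()  # the pruned (sparse) activation gradient is what propagates back through the convolution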
