Search Results for author: Min Ye

Found 8 papers, 3 papers with code

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

1 code implementation • 13 Aug 2024 • Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang

In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.

Language Modelling • Transfer Learning
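The entry above describes a Mixture-of-Experts language model with 8 experts of 16B parameters each. As a point of reference only, the sketch below shows a toy top-k-routed MoE feed-forward layer in PyTorch; the layer sizes, routing rule, and class name (`ToyMoELayer`) are illustrative assumptions, not taken from AquilaMoE or the EfficientScale methodology.

```python
# Minimal sketch of a top-k-routed Mixture-of-Experts feed-forward layer.
# All sizes are toy values; this is NOT the AquilaMoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router assigns each token a score per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        gate_logits = self.router(x)                    # (tokens, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalise over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(ToyMoELayer()(tokens).shape)                      # torch.Size([10, 64])
```

The point of top-k routing is that each token activates only a few experts, so total parameter count can grow with the number of experts without a proportional increase in per-token compute.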

Improving the List Decoding Version of the Cyclically Equivariant Neural Decoder

1 code implementation • 15 Jun 2021 • Xiangyu Chen, Min Ye

In the same paper, a list decoding procedure was also introduced for two widely used classes of cyclic codes -- BCH codes and punctured Reed-Muller (RM) codes.

Decoder

Cyclically Equivariant Neural Decoders for Cyclic Codes

1 code implementation • 12 May 2021 • Xiangyu Chen, Min Ye

Finally, we propose a list decoding procedure that can significantly reduce the decoding error probability for BCH codes and punctured RM codes.

Decoder
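Both decoder entries above revolve around a list decoding procedure for cyclic codes. Purely to illustrate the generic "decode several cyclic shifts, keep the most likely candidate" pattern, here is a hedged NumPy sketch; the `base_decode` placeholder (a plain hard-decision rule) and the correlation metric are assumptions, while the papers themselves build the list around a cyclically equivariant neural decoder.

```python
# Generic list-decoding shell for a cyclic code: decode shifted copies of the
# received word, undo the shift, and keep the candidate that best matches the
# channel LLRs.  For a cyclic code, a cyclic shift of a codeword is again a
# codeword, so un-shifting a decoded word yields a valid candidate.
import numpy as np

def base_decode(llr):
    """Placeholder decoder: hard decision on the log-likelihood ratios."""
    return (llr < 0).astype(int)                      # bit = 1 where the LLR favours 1

def list_decode_cyclic(llr, decode, list_size):
    best, best_metric = None, np.inf
    for s in range(list_size):
        cand = np.roll(decode(np.roll(llr, s)), -s)   # decode shifted word, undo shift
        metric = np.where(cand == 1, llr, -llr).sum() # smaller = higher likelihood
        if metric < best_metric:
            best, best_metric = cand, metric
    return best

llr = np.random.randn(15)                             # toy length-15 received LLRs
print(list_decode_cyclic(llr, base_decode, list_size=4))
```

Note that the hard-decision placeholder is exactly shift-equivariant, so here all candidates coincide; the list only pays off when the underlying decoder (e.g., a neural decoder) produces different candidates across shifts or permutations, which is the setting the papers study.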

Exact recovery and sharp thresholds of Stochastic Ising Block Model

no code implementations • 13 Apr 2020 • Min Ye

We show that when $m\ge m^\ast$, one can recover the clusters from $m$ samples in $O(n)$ time as the number of vertices $n$ goes to infinity.

Stochastic Block Model
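To make the "recover the clusters from $m$ samples in $O(n)$ time" statement concrete, the toy sketch below clusters vertices of a two-community Ising model by majority spin agreement with a reference vertex. The sample generator and the agreement rule are illustrative assumptions, not the paper's algorithm or its sharp threshold $m^\ast$.

```python
# Toy illustration: cluster vertex v with vertex 0 iff their spins agree in a
# majority of the m samples (O(n*m) work).  Not claimed to be the paper's
# algorithm or to match its thresholds.
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 11
truth = np.array([0] * 5 + [1] * 5)                  # hidden two-community labels
signs = np.where(truth == 0, 1, -1)

# Fake, strongly aligned Ising samples: each vertex follows its community sign
# with probability 0.95, and every sample gets an independent global spin flip
# (the model is symmetric under flipping all spins).
samples = np.array([
    signs
    * rng.choice([1, -1], size=n, p=[0.95, 0.05])    # per-vertex noise
    * rng.choice([1, -1])                            # global flip of the sample
    for _ in range(m)
])                                                   # shape (m, n)

agree = (samples == samples[:, [0]]).mean(axis=0)    # agreement rate with vertex 0
clusters = (agree <= 0.5).astype(int)
print(clusters)   # with such strong alignment this almost surely equals `truth`
```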

Optimal locally private estimation under $\ell_p$ loss for $1\le p\le 2$

no code implementations • 16 Oct 2018 • Min Ye, Alexander Barg

In this paper, we sharpen this result by showing asymptotic optimality of the proposed scheme under the $\ell_p^p$ loss for all $1\le p\le 2.$ More precisely, we show that for any $p\in[1, 2]$ and any $k$ and $\epsilon,$ the ratio between the worst-case $\ell_p^p$ estimation loss of our scheme and the optimal value approaches $1$ as the number of samples tends to infinity.


Communication-Computation Efficient Gradient Coding

no code implementations • ICML 2018 • Min Ye, Emmanuel Abbe

This paper develops coding techniques to reduce the running time of distributed learning tasks.
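The snippet above is terse, so here is a hedged NumPy sketch of the basic gradient-coding idea: the master recovers the summed gradient exactly from any 2 of 3 workers using the classic textbook encoding matrix. The paper itself targets communication-computation trade-offs beyond this baseline, so treat the matrix `B` and the decoding step as an illustration of the general technique rather than the paper's construction.

```python
# Gradient coding with 3 workers tolerating 1 straggler: each worker sends one
# fixed linear combination of its partial gradients, and the master recovers
# g1 + g2 + g3 from any two workers.
import numpy as np

d = 4                                    # gradient dimension (toy)
G = np.random.randn(3, d)                # partial gradients g1, g2, g3
full = G.sum(axis=0)                     # what the master wants

# Row i = linear combination of partial gradients sent by worker i.
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

messages = B @ G                         # one coded vector per worker

# Suppose worker 1 (index 0) straggles: decode from the remaining two workers.
alive = [1, 2]
# Find coefficients a with a @ B[alive] = (1, 1, 1), so the sum is recovered.
a, *_ = np.linalg.lstsq(B[alive].T, np.ones(3), rcond=None)
recovered = a @ messages[alive]

print(np.allclose(recovered, full))      # True
```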

Asymptotically optimal private estimation under mean square loss

no code implementations • 31 Jul 2017 • Min Ye, Alexander Barg

In other words, for a large number of samples the worst-case estimation loss of our scheme was shown to differ from the optimal value by at most a constant factor.

Optimal Schemes for Discrete Distribution Estimation under Locally Differential Privacy

no code implementations • 2 Feb 2017 • Min Ye, Alexander Barg

For a given $\epsilon,$ we consider the problem of constructing optimal privatization schemes with $\epsilon$-privacy level, i.e., schemes that minimize the expected estimation loss for the worst-case distribution.
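As a concrete (but in general non-optimal) example of an $\epsilon$-locally-private privatization scheme and its estimation loss, the sketch below implements $k$-ary randomized response with the standard unbiased frequency estimator. The Ye-Barg papers above construct and analyse optimal schemes, which in general differ from $k$-RR, so this is only meant to make the problem statement runnable; all parameter values are arbitrary assumptions.

```python
# k-ary randomized response (k-RR) as a baseline eps-LDP privatization scheme,
# plus the usual unbiased estimator of the underlying distribution.
import numpy as np

rng = np.random.default_rng(1)
k, eps, n = 6, 1.0, 200_000
p = np.full(k, 1 / k)                        # true distribution (toy choice)
x = rng.choice(k, size=n, p=p)               # one raw sample per user

# k-RR: keep the true value w.p. e^eps / (e^eps + k - 1), otherwise report a
# uniformly random *other* value; each user's report is eps-LDP.
keep = rng.random(n) < np.exp(eps) / (np.exp(eps) + k - 1)
other = (x + rng.integers(1, k, size=n)) % k # uniform over the k - 1 other values
y = np.where(keep, x, other)

# Standard unbiased estimator of p from the privatized reports.
freq = np.bincount(y, minlength=k) / n
p_hat = (freq * (np.exp(eps) + k - 1) - 1) / (np.exp(eps) - 1)

print(np.sum((p_hat - p) ** 2))              # empirical squared ell_2 loss for this run
```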
