no code implementations • 16 Feb 2025 • Shange Tang, Yuanhao Wang, Chi Jin
Elo rating, widely used for skill assessment in domains ranging from competitive games to large language models, is often understood as an incremental update algorithm for estimating a stationary Bradley-Terry (BT) model.
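For reference, the incremental update in question is the classical Elo rule, in which the expected score is exactly the Bradley-Terry win probability under the conventional base-10, 400-point scaling. A minimal sketch (the K-factor and example ratings are illustrative defaults, not values from the paper):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One incremental Elo update after a single game.

    r_a, r_b: current ratings; score_a: 1.0 win, 0.5 draw, 0.0 loss for A.
    The expected score below is the Bradley-Terry win probability with
    the conventional base-10 / 400-point scaling.
    """
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# Example: a 1600-rated player beats a 1500-rated player.
print(elo_update(1600.0, 1500.0, 1.0))  # A gains ~11.5 points, B loses the same
```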
1 code implementation • 11 Feb 2025 • Yong Lin, Shange Tang, Bohan Lyu, Jiayun Wu, Hongzhou Lin, Kaiyu Yang, Jia Li, Mengzhou Xia, Danqi Chen, Sanjeev Arora, Chi Jin
On the miniF2F benchmark, it achieves a 57.6% success rate (Pass@32), exceeding the previous best open-source model by 7.6%.
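Pass@32 here denotes success when any of 32 sampled proofs for a problem is verified. A standard way to estimate pass@k from n samples per problem is the unbiased estimator of Chen et al. (2021); the sketch below is a generic implementation of that estimator, not code from the paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n is correct, given that
    c of the n samples are correct."""
    if n - c < k:  # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# E.g., if 8 of 64 sampled proofs for a problem pass verification:
print(pass_at_k(64, 8, 32))  # ~0.998
```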
no code implementations • 10 Feb 2025 • Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, Yue Wu, Ming Yin, Shange Tang, Yangsibo Huang, Chi Jin, Xinyun Chen, Chiyuan Zhang, Mengdi Wang
This issue is amplified when using original problems for in-context learning.
no code implementations • 19 Dec 2024 • Shange Tang, Jiayun Wu, Jianqing Fan, Chi Jin
Benign overfitting refers to the phenomenon where an over-parameterized model fits the training data perfectly, noise included, yet still generalizes well to unseen test data.
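A minimal numerical sketch of the phenomenon in over-parameterized linear regression (all dimensions and noise levels below are illustrative, and the spiked design is one standard setting in which interpolation is known to be benign): the minimum-norm interpolator drives training error to zero, noise included, yet still predicts well out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5000
# Spiked design: one strong signal direction plus many weak "junk"
# directions that absorb the label noise.
s = rng.normal(size=(n, 1))                 # strong feature, variance 1
Z = 0.05 * rng.normal(size=(n, d - 1))      # many weak features
X = np.hstack([s, Z])
y = s[:, 0] + 0.3 * rng.normal(size=n)      # signal + label noise

# Minimum-l2-norm interpolator: solves X theta_hat = y exactly.
theta_hat = X.T @ np.linalg.solve(X @ X.T, y)

s_te = rng.normal(size=(2000, 1))
Z_te = 0.05 * rng.normal(size=(2000, d - 1))
X_te = np.hstack([s_te, Z_te])
print("train MSE:", np.mean((X @ theta_hat - y) ** 2))              # ~0: fits the noise
print("test  MSE:", np.mean((X_te @ theta_hat - s_te[:, 0]) ** 2))  # small: still generalizes
```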
no code implementations • 22 Aug 2024 • Shange Tang, Soham Jana, Jianqing Fan
This paper studies a factor modeling-based approach for clustering high-dimensional data generated from a mixture of strongly correlated variables.
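One generic instance of the factor-model clustering recipe, not necessarily the paper's exact estimator: variables loading on the same latent factor are strongly correlated, so clustering the estimated loading vectors (here, rows of the top eigenvectors of the sample covariance) recovers the groups. The sizes and the PCA-plus-k-means pipeline are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
n, p, k = 500, 60, 3                   # samples, variables, clusters/factors
# Each variable loads on exactly one latent factor, giving strong
# within-cluster correlation, plus idiosyncratic noise.
labels = rng.integers(0, k, size=p)
B = np.zeros((p, k)); B[np.arange(p), labels] = 1.0
F = rng.normal(size=(n, k))            # latent factors
X = F @ B.T + 0.3 * rng.normal(size=(n, p))

# Estimate the loading space by PCA on the sample covariance,
# then cluster the variables by their estimated loading vectors.
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
loadings = eigvecs[:, -k:]             # top-k eigenvectors span the loadings
pred = KMeans(n_clusters=k, n_init=10).fit_predict(loadings)
print("agreement with truth (ARI):", adjusted_rand_score(labels, pred))  # ~1.0
```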
no code implementations • 27 Nov 2023 • Jiawei Ge, Shange Tang, Jianqing Fan, Cong Ma, Chi Jin
This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves minimax optimality for covariate shift in the well-specified setting.
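A toy check of the claim in the simplest well-specified case, linear regression with Gaussian noise, where the MLE on source data is plain least squares; the distributions and sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 2000
theta = rng.normal(size=d)

# Source and target covariate distributions differ (covariate shift),
# but the regression function y = x^T theta + noise is shared (well-specified).
X_src = rng.normal(size=(n, d))                  # source: standard normal
y_src = X_src @ theta + 0.1 * rng.normal(size=n)

# MLE on source data alone = ordinary least squares in this model.
theta_mle, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)

X_tgt = 3.0 * rng.normal(size=(5000, d)) + 1.0   # target: shifted and rescaled
excess = np.mean((X_tgt @ (theta_mle - theta)) ** 2)
print("excess target risk of the unmodified source MLE:", excess)  # small
```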
no code implementations • 2 Mar 2023 • Jiawei Ge, Shange Tang, Jianqing Fan, Chi Jin
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems.
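A deliberately simple stand-in for this pipeline, with PCA playing the role of unsupervised pretraining (an illustrative choice, not the paper's setting): a representation is learned from a large unlabeled pool, then a downstream regression is solved with very few labels in the learned low-dimensional space.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_unlabeled, n_labeled = 100, 5000, 30
# Latent 2-D structure embedded in 100-D observations.
U = np.linalg.qr(rng.normal(size=(d, 2)))[0]
Z_u = rng.normal(size=(n_unlabeled, 2))
X_u = Z_u @ U.T + 0.1 * rng.normal(size=(n_unlabeled, d))

# "Pretraining": learn a representation (here, the top-2 PCA directions)
# from the large unlabeled pool.
_, _, Vt = np.linalg.svd(X_u, full_matrices=False)
def encode(X): return X @ Vt[:2].T

# Downstream task: regression with very few labels, solved in the learned
# 2-D space; raw 100-D least squares would be underdetermined with 30 labels.
Z_l = rng.normal(size=(n_labeled, 2))
X_l = Z_l @ U.T + 0.1 * rng.normal(size=(n_labeled, d))
y_l = Z_l[:, 0] + 0.05 * rng.normal(size=n_labeled)
w, *_ = np.linalg.lstsq(encode(X_l), y_l, rcond=None)

Z_t = rng.normal(size=(1000, 2))
X_t = Z_t @ U.T + 0.1 * rng.normal(size=(1000, d))
print("downstream test MSE:", np.mean((encode(X_t) @ w - Z_t[:, 0]) ** 2))  # small
```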
no code implementations • 19 Jul 2022 • Yuzheng Hu, Tianle Cai, Jinyong Shan, Shange Tang, Chaochao Cai, Ethan Song, Bo Li, Dawn Song
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks whose protocols may differ from one another yet implicitly share a common procedure for obtaining local gradients.
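The shared procedure the analysis keys on can be sketched in plaintext as follows (in deployed protocols the residual is exchanged under encryption or secret sharing; names and shapes here are illustrative): each party holds a disjoint block of feature columns, a joint logit and residual are formed, and each party's local gradient is its own feature block times the residual, which is exactly the quantity whose leakage is at issue.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n = 8
# Vertical split: party A holds features X_a; party B holds X_b and labels y.
X_a, X_b = rng.normal(size=(n, 3)), rng.normal(size=(n, 2))
w_a, w_b = np.zeros(3), np.zeros(2)
y = rng.integers(0, 2, size=n).astype(float)

# One step of the shared procedure: partial logits are combined, the
# residual d = sigmoid(logit) - y is formed (under crypto in real
# protocols), and each party derives its local gradient from d.
logit = X_a @ w_a + X_b @ w_b
d = sigmoid(logit) - y
grad_a = X_a.T @ d / n   # party A's local gradient
grad_b = X_b.T @ d / n   # party B's local gradient
```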
no code implementations • 20 Dec 2019 • Yuzheng Hu, Licong Lin, Shange Tang
To the best of our knowledge, this is the first paper to seriously examine the necessity of the square root in adaptive gradient methods.
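The design choice in question can be made concrete with a hypothetical helper that toggles the square root in the denominator of a diagonal adaptive update (an RMSProp/Adam-style second-moment accumulator; all names and defaults are illustrative):

```python
import numpy as np

def adaptive_step(g, v, lr=1e-3, beta2=0.999, eps=1e-8, use_sqrt=True):
    """One diagonal-adaptive update for gradient g with second-moment state v.

    use_sqrt=True is the usual Adam/RMSProp-style denominator sqrt(v);
    use_sqrt=False divides by v itself, the variant whose behavior the
    question about the necessity of the square root concerns.
    """
    v = beta2 * v + (1.0 - beta2) * g * g
    denom = np.sqrt(v) + eps if use_sqrt else v + eps
    return -lr * g / denom, v

g, v = np.array([0.1, -0.2]), np.zeros(2)
step_sqrt, _ = adaptive_step(g, v, use_sqrt=True)
step_raw, _ = adaptive_step(g, v, use_sqrt=False)
# Dropping the square root changes the update's scale drastically
# (roughly 1/g instead of sign(g) for fresh state), so the two
# variants have genuinely different dynamics.
print(step_sqrt, step_raw)
```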