1 code implementation • 11 Jun 2025 • Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu
Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations.
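The general pattern this line of work targets can be sketched as a generate-execute loop: the model delegates exact computation to an interpreter instead of carrying it out inside the chain-of-thought. The sketch below is hypothetical and not the paper's implementation; `call_model` is an assumed stand-in for any LRM API, and the `<code>`/`<result>` tags are illustrative conventions.

```python
def solve_with_code_execution(question, call_model, max_turns=8):
    # Seed the context with an instruction to delegate calculations to code.
    context = (
        "Reason step by step. When a calculation is needed, write Python "
        "that stores the result in `answer` inside <code>...</code> tags, "
        "then wait for the interpreter output.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_turns):
        reply = call_model(context)    # assumed LRM API; any callable works
        context += reply
        if "<code>" not in reply:
            return reply               # no tool call: treat as final answer
        snippet = reply.split("<code>", 1)[1].split("</code>", 1)[0]
        scope = {}
        exec(snippet, scope)           # run the model-emitted calculation
        context += f"\n<result>{scope.get('answer')}</result>\n"
    return context
```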
1 code implementation • 24 Jan 2025 • Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
In this work, we introduce a new benchmark designed to assess the critique capabilities of LLMs.
no code implementations • 10 Jan 2025 • Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin
Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans.
no code implementations • 2 Dec 2024 • Linxin Yang, Bingheng Li, Tian Ding, Jianghua Wu, Akang Wang, Yuyi Wang, Jiliang Tang, Ruoyu Sun, Xiaodong Luo
Unlike the standard learning-to-optimize framework, which requires optimization solutions generated by solvers, our unsupervised method adjusts the network weights directly from the evaluation of the primal-dual gap.
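A minimal sketch of such a solver-free training signal, assuming an LP in standard form min c^T x s.t. Ax = b, x >= 0 with dual max b^T y s.t. A^T y <= c (the formulation and penalty weight here are illustrative, not the paper's exact loss):

```python
import torch

def primal_dual_loss(x, y, A, b, c, rho=10.0):
    """Solver-free loss: duality gap plus infeasibility penalties."""
    gap = c @ x - b @ y                               # zero at a primal-dual optimum
    primal_viol = (A @ x - b).pow(2).sum()            # Ax = b violation
    dual_viol = torch.relu(A.T @ y - c).pow(2).sum()  # A^T y <= c violation
    sign_viol = torch.relu(-x).pow(2).sum()           # x >= 0 violation
    return gap + rho * (primal_viol + dual_viol + sign_viol)
```

Backpropagating this loss through the network that produced the pair (x, y) adjusts the weights without any solver-generated labels.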
no code implementations • 30 Jul 2024 • Yupeng Chen, Senmiao Wang, Yushun Zhang, Zhihang Lin, Haozhe Zhang, Weijian Sun, Tian Ding, Ruoyu Sun
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks.
1 code implementation • 24 Jun 2024 • Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
Adam-mini reduces optimizer memory by cutting down the learning-rate resources in Adam (i.e., the per-coordinate $1/\sqrt{v}$ terms).
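A toy sketch of the core idea, assuming one second-moment scalar per parameter block (the paper partitions parameters more carefully, e.g. following the Hessian block structure; the per-tensor partition below is a simplification):

```python
import torch

@torch.no_grad()
def adam_mini_step(params, ms, vs, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    # One (m, v) pair per block: m stays per-coordinate as in Adam,
    # but v is a single scalar shared by the whole block, so the
    # per-coordinate 1/sqrt(v) memory is eliminated.
    for p, m, v in zip(params, ms, vs):
        g = p.grad
        m.mul_(b1).add_(g, alpha=1 - b1)             # first moment, as in Adam
        v.mul_(b2).add_((g * g).mean() * (1 - b2))   # ONE scalar v per block
        m_hat = m / (1 - b1 ** t)                    # bias corrections
        v_hat = v / (1 - b2 ** t)
        p.add_(-lr * m_hat / (v_hat.sqrt() + eps))   # shared step size in the block
```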
1 code implementation • 4 Jun 2024 • Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun
In this work, we propose PDHG-Net, a neural network (NN) that unrolls a first-order method (FOM), together with a two-stage learning-to-optimize (L2O) method for solving large-scale LP problems.
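The building block being unrolled is the primal-dual hybrid gradient (PDHG) iteration for LP. A plain, non-learned step for min c^T x s.t. Ax = b, x >= 0 looks roughly like the sketch below; in PDHG-Net each such iteration becomes a network layer with learned, channel-expanded weights, which is not shown here:

```python
import numpy as np

def pdhg_step(x, y, A, b, c, tau, sigma):
    # Plain PDHG converges when tau * sigma * ||A||_2^2 <= 1.
    x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)  # primal step + projection onto x >= 0
    y_new = y + sigma * (b - A @ (2 * x_new - x))     # dual step with extrapolation
    return x_new, y_new
```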
2 code implementations • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
no code implementations • 23 Apr 2022 • Huiyuan Yang, Tian Ding, Xiaojun Yuan
We then conduct a convergence analysis for federated learning (FL) that relates the aggregation distortion to the FL convergence performance.
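As a toy illustration of the quantity in question, assuming a simple additive-noise model of the lossy aggregation (the paper's channel model is more detailed than this):

```python
import numpy as np

def distorted_aggregate(client_updates, noise_std=0.01, rng=None):
    # Ideal FedAvg averages client updates exactly; a lossy (e.g.
    # over-the-air) channel delivers that average plus a distortion
    # term, whose magnitude is what the convergence analysis tracks.
    rng = rng or np.random.default_rng(0)
    ideal = np.mean(client_updates, axis=0)
    distortion = noise_std * rng.standard_normal(ideal.shape)
    return ideal + distortion, float(np.linalg.norm(distortion))
```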
no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant
Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.
no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun
More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).
no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun
Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?
no code implementations • 28 Jul 2018 • Tian Ding, Xiaojun Yuan, Soung Chang Liew
In this work, we study the multiuser detection (MUD) problem for a grant-free massive-device multiple access (MaDMA) system, where a large number of single-antenna user devices transmit sporadic data to a multi-antenna base station (BS).
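A toy simulation of that system model, with assumed dimensions (N potential users, K active per frame, signature length L, M BS antennas); the paper's actual message-passing detector is not included, only the row-sparse recovery problem it solves:

```python
import numpy as np

def simulate_madma(N=200, K=10, L=60, M=16, snr_db=20, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((L, N)) / np.sqrt(L)   # user signature sequences
    active = rng.choice(N, size=K, replace=False)  # sporadic activity pattern
    X = np.zeros((N, M), dtype=complex)            # row-sparse signal matrix
    X[active] = (rng.standard_normal((K, M)) +
                 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    n_var = 10 ** (-snr_db / 10)
    noise = np.sqrt(n_var / 2) * (rng.standard_normal((L, M)) +
                                  1j * rng.standard_normal((L, M)))
    Y = S @ X + noise                              # received signal at the BS
    return Y, S, X, active                         # MUD: recover X and `active` from (Y, S)
```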