1 code implementation • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.
no code implementations • 23 Apr 2022 • Huiyuan Yang, Tian Ding, Xiaojun Yuan
We then conduct a federated learning (FL) convergence analysis that relates the aggregation distortion to the FL convergence performance.
no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant
Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.
no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun
More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).
no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun
Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?
no code implementations • 28 Jul 2018 • Tian Ding, Xiaojun Yuan, Soung Chang Liew
In this work, we study the multiuser detection (MUD) problem for a grant-free massive-device multiple access (MaDMA) system, where a large number of single-antenna user devices transmit sporadic data to a multi-antenna base station (BS).