Search Results for author: Tian Ding

Found 6 papers, 1 paper with code

Why Transformers Need Adam: A Hessian Perspective

1 code implementation • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Federated Learning with Lossy Distributed Source Coding: Analysis and Optimization

no code implementations • 23 Apr 2022 • Huiyuan Yang, Tian Ding, Xiaojun Yuan

We then conduct an FL convergence analysis to connect the aggregation distortion and the FL convergence performance.

Federated Learning • Quantization

The Global Landscape of Neural Networks: An Overview

no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun

More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).

On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?

Sparsity Learning Based Multiuser Detection in Grant-Free Massive-Device Multiple Access

no code implementations • 28 Jul 2018 • Tian Ding, Xiaojun Yuan, Soung Chang Liew

In this work, we study the multiuser detection (MUD) problem for a grant-free massive-device multiple access (MaDMA) system, where a large number of single-antenna user devices transmit sporadic data to a multi-antenna base station (BS).
