# An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis

Yuandong Tian

In this paper, we explore theoretical properties of training a two-layered ReLU network $g(\mathbf{x}; \mathbf{w}) = \sum_{j=1}^K \sigma(\mathbf{w}_j^T\mathbf{x})$ with centered $d$-dimensional spherical Gaussian input $\mathbf{x}$ ($\sigma$=ReLU). We train our network with gradient descent on $\mathbf{w}$ to mimic the output of a teacher network with the same architecture and fixed parameters $\mathbf{w}^*$... (read more)

PDF Abstract ICML 2017 PDF ICML 2017 Abstract