# Kernel Activation Function

Introduced by Scardapane et al. in *Kafnets: kernel-based non-parametric activation functions for neural networks*.

A Kernel Activation Function is a non-parametric activation function defined as a one-dimensional kernel approximator:

$$f(s) = \sum_{i=1}^D \alpha_i \kappa( s, d_i)$$

where:

1. The dictionary of kernel elements $d_1, \ldots, d_D$ is fixed by uniformly sampling the $x$-axis with a constant step around zero.
2. The user selects the kernel function (e.g., Gaussian, ReLU, Softplus) and the number of kernel elements $D$ as a hyper-parameter. A larger dictionary leads to more expressive activation functions and a larger number of trainable parameters.
3. The linear (mixing) coefficients $\alpha_i$ are adapted independently at every neuron via standard back-propagation.
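As a minimal sketch of the expansion above, the following NumPy snippet evaluates a KAF with a Gaussian kernel; the function name `kaf` and the bandwidth `gamma` are illustrative choices, not part of the original definition:

```python
import numpy as np

def kaf(s, alpha, d, gamma=1.0):
    """Kernel activation function: f(s) = sum_i alpha_i * kappa(s, d_i),
    here with a Gaussian kernel kappa(s, d) = exp(-gamma * (s - d)^2)."""
    # s: (...,) pre-activations; d: (D,) fixed dictionary; alpha: (D,) coefficients
    K = np.exp(-gamma * (s[..., None] - d) ** 2)  # kernel matrix, shape (..., D)
    return K @ alpha

D = 20
d = np.linspace(-3.0, 3.0, D)           # dictionary: uniform sampling around 0
alpha = 0.3 * np.random.randn(D)        # per-neuron trainable coefficients
y = kaf(np.array([0.5, -1.2]), alpha, d)
```

In a network, `alpha` would be a trainable parameter per neuron, while `d` and `gamma` stay fixed.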

In addition, the linear coefficients can be initialized via kernel ridge regression so that the activation mimics a known function (e.g., $\tanh$) at the beginning of the optimization process.
