Deep Tabular Learning

TabNet

Introduced by Arik et al. in TabNet: Attentive Interpretable Tabular Learning

TabNet is a deep tabular data learning architecture that uses sequential attention to choose which features to reason from at each decision step.

The TabNet encoder is composed of a feature transformer, an attentive transformer and feature masking. A split block divides the processed representation into a part used by the attentive transformer of the subsequent step and a part that contributes to the overall output. At each step, the feature selection mask provides interpretable information about the model’s functionality, and the masks can be aggregated to obtain global feature importance attribution. The TabNet decoder is composed of a feature transformer block at each step.
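
The mask aggregation can be illustrated with a short PyTorch sketch. The `masks` and `step_contrib` tensors below are hypothetical stand-ins for the per-step selection masks and the per-step contribution of each sample to the output; this is a minimal illustration of the idea, not the authors' reference code.

```python
import torch

batch, n_features, n_steps = 32, 10, 3

# Hypothetical stand-ins: one selection mask per decision step
# (shape: batch x n_features) and a scalar per-sample weight measuring
# how much that step contributed to the final output.
masks = [torch.rand(batch, n_features).softmax(dim=-1) for _ in range(n_steps)]
step_contrib = [torch.rand(batch, 1) for _ in range(n_steps)]

# Aggregate: weight each step's mask by its contribution, sum over steps,
# then normalize so each row gives a per-sample feature attribution.
agg = sum(w * m for w, m in zip(step_contrib, masks))
importance = agg / agg.sum(dim=-1, keepdim=True)
print(importance.shape)  # torch.Size([32, 10])
```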

The feature transformer block is a 4-layer network, where 2 layers are shared across all decision steps and 2 are decision step-dependent. Each layer is composed of a fully-connected (FC) layer, batch normalization (BN) and a GLU nonlinearity. In the attentive transformer block, a single-layer mapping is modulated with prior scale information, which aggregates how much each feature has been used before the current decision step. Sparsemax is used to normalize the coefficients, resulting in sparse selection of the most salient features.
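
A minimal PyTorch sketch of these two building blocks is given below, assuming a standalone sparsemax implementation; the module names, dimensions and usage snippet are illustrative rather than taken from the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparsemax(z):
    """Sparsemax over the last dimension (Martins & Astudillo, 2016)."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    z_cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > z_cumsum      # features in the support set
    k_z = support.sum(dim=-1, keepdim=True)    # size of the support
    tau = (z_cumsum.gather(-1, k_z - 1) - 1) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

class GLULayer(nn.Module):
    """One feature transformer layer: FC -> BN -> GLU nonlinearity."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.fc = nn.Linear(d_in, 2 * d_out, bias=False)
        self.bn = nn.BatchNorm1d(2 * d_out)

    def forward(self, x):
        return F.glu(self.bn(self.fc(x)), dim=-1)

class AttentiveTransformer(nn.Module):
    """Single-layer mapping, modulated by the prior scale and normalized
    with sparsemax to yield a sparse feature-selection mask."""
    def __init__(self, d_a, n_features):
        super().__init__()
        self.fc = nn.Linear(d_a, n_features, bias=False)
        self.bn = nn.BatchNorm1d(n_features)

    def forward(self, a, prior):
        # prior encodes how much each feature was used in earlier steps
        return sparsemax(self.bn(self.fc(a)) * prior)

# Usage: a (batch x d_a) split output and a prior of ones at the first step.
a, prior = torch.randn(32, 8), torch.ones(32, 10)
mask = AttentiveTransformer(8, 10)(a, prior)   # sparse mask over 10 features
```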

Source: TabNet: Attentive Interpretable Tabular Learning
