Deep Tabular Learning

TaBERT is a pretrained language model (LM) that jointly learns representations for natural language sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts.

In summary, TaBERT's process for learning representations for NL sentences is as follows: Given an utterance $u$ and a table $T$, TaBERT first creates a content snapshot of $T$. This snapshot consists of sampled rows that summarize the information in $T$ most relevant to the input utterance. The model then linearizes each row in the snapshot, concatenates each linearized row with the utterance, and uses the concatenated string as input to a Transformer model, which outputs row-wise encoding vectors of utterance tokens and cells. The encodings for all the rows in the snapshot are fed into a series of vertical self-attention layers, where a cell representation (or an utterance token representation) is computed by attending to vertically-aligned vectors of the same column (or the same NL token). Finally, representations for each utterance token and column are generated from a pooling layer.

Source: TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data


Paper Code Results Date Stars


Task Papers Share
Semantic Parsing 3 42.86%
Question Answering 2 28.57%
Natural Language Understanding 1 14.29%
Text-To-SQL 1 14.29%


Component Type
🤖 No Components Found You can add them if they exist; e.g. Mask R-CNN uses RoIAlign