TaBERT

Introduced by Yin et al. in TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

TaBERT is a pretrained language model (LM) that jointly learns representations for natural language sentences and (semi-)structured tables. TaBERT is trained on a large corpus of 26 million tables and their English contexts.

In summary, TaBERT's process for learning representations for NL sentences is as follows: Given an utterance $u$ and a table $T$, TaBERT first creates a content snapshot of $T$. This snapshot consists of sampled rows that summarize the information in $T$ most relevant to the input utterance. The model then linearizes each row in the snapshot, concatenates each linearized row with the utterance, and uses the concatenated string as input to a Transformer model, which outputs row-wise encoding vectors of utterance tokens and cells. The encodings for all the rows in the snapshot are fed into a series of vertical self-attention layers, where a cell representation (or an utterance token representation) is computed by attending to vertically-aligned vectors of the same column (or the same NL token). Finally, representations for each utterance token and column are generated from a pooling layer.

Source: TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

Read Paper See Code

Papers

Paper	Code	Results	Date	Stars

Tasks

Task	Papers	Share
Semantic Parsing	3	42.86%
Question Answering	2	28.57%
Natural Language Understanding	1	14.29%
Text-To-SQL	1	14.29%

Usage Over Time

This feature is experimental; we are continuously improving our matching algorithm.

Components

Component	Type	Add Remove
🤖 No Components Found	You can add them if they exist; e.g. Mask R-CNN uses RoIAlign

Categories

Add Remove

Deep Tabular Learning