To improve efficiency while maintaining high accuracy, we propose a new architecture, DoT, a double transformer model that decomposes the problem into two sub-tasks: a shallow pruning transformer that selects the top-K tokens, followed by a deep task-specific transformer that takes those K tokens as input.
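As a rough illustration of this two-stage design, the sketch below wires a shallow scoring transformer to a deeper task transformer. The module names, dimensions, layer counts, and the plain top-K gather are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class DoTSketch(nn.Module):
    """Minimal sketch of a two-stage (pruning + task) transformer pipeline.

    Hypothetical dimensions and layer counts; the real DoT token scoring
    and training objective may differ.
    """

    def __init__(self, vocab_size=30522, dim=256, k=256):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab_size, dim)
        # Shallow transformer that scores every input token.
        self.pruner = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.score = nn.Linear(dim, 1)
        # Deep task-specific transformer that only sees the K selected tokens.
        self.task = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=12,
        )
        self.head = nn.Linear(dim, 2)  # e.g. entailed / refuted

    def forward(self, token_ids):
        x = self.embed(token_ids)                         # (batch, seq, dim)
        scores = self.score(self.pruner(x)).squeeze(-1)   # (batch, seq)
        top_idx = scores.topk(self.k, dim=-1).indices     # keep top-K tokens
        pruned = torch.gather(
            x, 1, top_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        )                                                 # (batch, K, dim)
        h = self.task(pruned)
        return self.head(h.mean(dim=1))                   # pooled classification


model = DoTSketch()
logits = model(torch.randint(0, 30522, (2, 1024)))  # long input, pruned to K=256
```

The key point of the design is that the expensive deep transformer runs over only K tokens, so its cost no longer grows with the full (table) input length.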
Recent advances in open-domain QA have led to strong models based on dense retrieval, but these have focused only on retrieving textual passages.
To make long examples usable as input to BERT models, we evaluate table pruning techniques as a pre-processing step that drastically improves training and prediction efficiency at a moderate cost in accuracy.
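To make the idea of pruning as a pre-processing step concrete, here is a minimal sketch of one plausible heuristic (word overlap between the question and each cell, under a token budget); the scoring rule, function name, and budget are assumptions, not the specific techniques evaluated in the paper.

```python
def prune_table(question, table, max_tokens=512):
    """Keep the highest-overlap cells until a token budget is reached.

    `table` is a list of rows, each row a list of cell strings. The word
    overlap scoring rule is an illustrative assumption, not the paper's
    exact pruning technique.
    """
    q_tokens = set(question.lower().split())
    scored = []
    for r, row in enumerate(table):
        for c, cell in enumerate(row):
            cell_tokens = cell.lower().split()
            overlap = len(q_tokens & set(cell_tokens))
            scored.append((overlap, r, c, len(cell_tokens)))

    # Greedily keep the best-scoring cells that still fit in the budget.
    kept, budget = set(), max_tokens
    for overlap, r, c, n_tokens in sorted(scored, reverse=True):
        if n_tokens <= budget:
            kept.add((r, c))
            budget -= n_tokens

    # Blank out dropped cells so the table shape is preserved.
    return [
        [cell if (r, c) in kept else "" for c, cell in enumerate(row)]
        for r, row in enumerate(table)
    ]


pruned = prune_table(
    "Which country hosted the 2014 world cup?",
    [["Year", "Host"], ["2014", "Brazil"], ["2018", "Russia"]],
)
```

A fixed pre-processing step like this shortens every example before the expensive transformer ever sees it, which is where the training and prediction speed-up comes from.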
For example, applying constraints a posteriori can result in incomplete recommendations or low-quality results for the tail of the distribution (i.e., less popular items).
Our method imposes a particular decomposition of the nonsymmetric kernel that enables tractable learning algorithms, which we analyze both theoretically and experimentally.
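The excerpt does not spell out the decomposition, so the sketch below is only a hedged illustration: it parameterizes a nonsymmetric kernel as a symmetric positive semi-definite part plus a skew-symmetric part, each built from low-rank factors, which is one common way to keep learning over such kernels tractable. The factor names and ranks are assumptions.

```python
import torch


def build_nonsymmetric_kernel(V, B, C):
    """Assemble a nonsymmetric kernel from low-rank factors.

    L = V @ V.T              (symmetric, positive semi-definite part)
      + B @ C.T - C @ B.T    (skew-symmetric part)

    This specific low-rank parameterization is an illustrative assumption;
    the paper's exact decomposition may differ.
    """
    sym = V @ V.T
    skew = B @ C.T - C @ B.T
    return sym + skew


n, rank = 100, 10
V = torch.randn(n, rank, requires_grad=True)
B = torch.randn(n, rank, requires_grad=True)
C = torch.randn(n, rank, requires_grad=True)

L = build_nonsymmetric_kernel(V, B, C)
# Only O(n * rank) parameters are learned instead of a full n x n kernel,
# and the symmetric-plus-skew structure holds by construction:
# L + L.T collapses to twice the symmetric part.
print(L.shape, torch.allclose(L + L.T, 2 * (V @ V.T), atol=1e-5))
```

Constraining the kernel to such a structured, low-rank form is what typically makes gradient-based learning over nonsymmetric kernels feasible at scale.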