Language Model Pre-Training with Sparse Latent Typing

23 Oct 2022  ·  Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, ChengXiang Zhai, Heng Ji ·

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and do not seek to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction-related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
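
The abstract only outlines the objective, so the following is a minimal, illustrative sketch of how a sparse latent typing head could sit on top of a PLM encoder: each token is (approximately) discretely assigned either a reserved "non-keyword" type or one of K latent types via Gumbel-softmax, and a sparsity penalty pushes most tokens toward the non-keyword type. This is an assumption-laden sketch, not the authors' implementation; the class names, the Gumbel-softmax choice, and all hyper-parameters are illustrative (see the linked repository for the actual code).

```python
# Illustrative sketch only -- not the authors' code; see https://github.com/renll/SparseLT.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLatentTyping(nn.Module):
    """Assigns each token a latent type, reserving index 0 for "non-keyword"."""

    def __init__(self, hidden_size: int, num_types: int = 16, tau: float = 1.0,
                 sparsity_weight: float = 0.1):
        super().__init__()
        # +1 for the reserved "non-keyword" type at index 0
        self.type_logits = nn.Linear(hidden_size, num_types + 1)
        self.type_embeddings = nn.Embedding(num_types + 1, hidden_size)
        self.tau = tau
        self.sparsity_weight = sparsity_weight

    def forward(self, hidden_states: torch.Tensor, attention_mask: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden) token representations from the PLM
        logits = self.type_logits(hidden_states)
        # Approximately discrete, differentiable type assignment per token
        type_probs = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=-1)
        typed_repr = type_probs @ self.type_embeddings.weight  # (batch, seq_len, hidden)

        # Sparsity loss: penalize probability mass off the "non-keyword" type (index 0),
        # so only a few tokens are extracted as typed keywords.
        keyword_prob = 1.0 - type_probs[..., 0]
        mask = attention_mask.float()
        sparsity_loss = (keyword_prob * mask).sum() / mask.sum().clamp(min=1.0)

        return typed_repr, self.sparsity_weight * sparsity_loss
```

In such a setup, the typed token representations would typically be passed to a lightweight decoder trained to reconstruct the input sentence, so the latent type inventory is learned purely self-supervised while the sparsity term keeps only a small subset of tokens typed as keywords.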


Datasets

Few-NERD

Results from the Paper


Ranked #6 on Few-shot NER on Few-NERD (INTRA) (using extra training data)

Task         | Dataset          | Model                     | Metric Name      | Metric Value | Global Rank
-------------|------------------|---------------------------|------------------|--------------|------------
Few-shot NER | Few-NERD (INTER) | BERT-SparseLT + CONTaiNER | 5 way 1~2 shot   | 57.14        | #7
Few-shot NER | Few-NERD (INTER) | BERT-SparseLT + CONTaiNER | 5 way 5~10 shot  | 66.17        | #7
Few-shot NER | Few-NERD (INTER) | BERT-SparseLT + CONTaiNER | 10 way 1~2 shot  | 52.75        | #6
Few-shot NER | Few-NERD (INTER) | BERT-SparseLT + CONTaiNER | 10 way 5~10 shot | 62.43        | #7
Few-shot NER | Few-NERD (INTRA) | BERT-SparseLT + CONTaiNER | 5 way 1~2 shot   | 47.20        | #6
Few-shot NER | Few-NERD (INTRA) | BERT-SparseLT + CONTaiNER | 5 way 5~10 shot  | 59.67        | #6
Few-shot NER | Few-NERD (INTRA) | BERT-SparseLT + CONTaiNER | 10 way 1~2 shot  | 40.48        | #6
Few-shot NER | Few-NERD (INTRA) | BERT-SparseLT + CONTaiNER | 10 way 5~10 shot | 53.04        | #6
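
For context on the metric names above: "N way K~2K shot" refers to Few-NERD's episodic evaluation protocol (Ding et al., 2021), in which each episode samples N entity types and greedily builds a support set containing between K and 2K examples of each type. The snippet below is a rough, simplified sketch of such a sampler, using token-level counting and illustrative names; it is not taken from this paper's or CONTaiNER's code.

```python
import random


def sample_support_set(sentences, entity_types, n_way=5, k_shot=1, seed=0):
    """Greedily sample an N-way K~2K-shot support set (simplified sketch).

    sentences: list of (tokens, labels) pairs with per-token type labels ('O' = no entity).
    entity_types: list of fine-grained entity type names to sample from.
    """
    rng = random.Random(seed)
    target_types = rng.sample(entity_types, n_way)
    counts = {t: 0 for t in target_types}
    support = []
    pool = list(sentences)
    rng.shuffle(pool)
    for tokens, labels in pool:
        present = {label for label in labels if label in counts}
        if not present:
            continue
        # Greedy rule (simplified, token-level rather than mention-level counting):
        # accept the sentence only if some contained class still needs examples and
        # no contained class would exceed the 2K upper bound.
        needs = any(counts[t] < k_shot for t in present)
        fits = all(counts[t] + labels.count(t) <= 2 * k_shot for t in present)
        if needs and fits:
            support.append((tokens, labels))
            for t in present:
                counts[t] += labels.count(t)
        if all(c >= k_shot for c in counts.values()):
            break
    return target_types, support
```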

Methods


No methods listed for this paper.