Improving NER's Performance with a Massive Financial Corpus

31 Jul 2020 · Han Zhang

Training large deep neural networks requires massive amounts of high-quality annotated data, but the time and labor costs are prohibitive for small businesses. We start from a company-name recognition task with small-scale, low-quality training data, then apply several techniques to improve training speed and prediction performance at minimal labor cost. The methods we use include pre-training a lite language model such as ALBERT-small or ELECTRA-small on a financial corpus, knowledge distillation, and multi-stage learning. As a result, we raise the recall rate by nearly 20 points and run roughly 4 times faster than a BERT-CRF model.
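Of the techniques listed, knowledge distillation is the one that most directly trades teacher quality for student speed: a large teacher (e.g. BERT-CRF) produces soft tag distributions that a small student (e.g. ALBERT-small) learns to imitate alongside the gold labels. The sketch below shows a token-level distillation loss of the kind such a setup might use; the temperature, loss weighting, and padding convention are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, ignore_index=-100):
    """Blend hard-label cross-entropy with soft-target KL divergence.

    student_logits, teacher_logits: (batch, seq_len, num_tags)
    labels: (batch, seq_len) gold tag ids; padding marked with ignore_index.
    """
    num_tags = student_logits.size(-1)
    # Hard loss: ordinary cross-entropy against the gold tags.
    hard = F.cross_entropy(student_logits.reshape(-1, num_tags),
                           labels.reshape(-1), ignore_index=ignore_index)
    # Soft loss: match the teacher's temperature-softened tag distribution,
    # averaged over non-padding tokens only.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="none",
    ).sum(-1)                                  # per-token KL, (batch, seq_len)
    mask = (labels != ignore_index).float()
    soft = (kl * mask).sum() / mask.sum().clamp(min=1.0)
    # The T^2 factor keeps soft-loss gradients on the same scale as the hard loss.
    return alpha * hard + (1.0 - alpha) * soft * temperature ** 2

# Toy usage: batch of 2 sentences, 8 tokens, 5 BIO tags (shapes are hypothetical).
student_logits = torch.randn(2, 8, 5, requires_grad=True)
teacher_logits = torch.randn(2, 8, 5)          # in practice, run the teacher under no_grad
labels = torch.randint(0, 5, (2, 8))
labels[:, 6:] = -100                           # pretend the last two tokens are padding
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

With a loss of this shape, the student can be trained purely from the teacher's outputs on unlabeled financial text plus whatever small labeled set exists, which is how distillation reduces the annotation burden the abstract describes.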
