Using Fisher's Exact Test to Evaluate Association Measures for N-grams

29 Apr 2021  ·  Yves Bestgen

To determine whether some widely used lexical association measures assign high scores to n-grams that chance alone could have produced as frequently as observed, we applied an extension of Fisher's exact test to sequences longer than two words in an analysis of a four-million-word corpus. The results, based on the precision-recall curve and a new index called chance-corrected average precision, show that, as expected, simple-ll is extremely effective. They also show, however, that MI3 is more efficient than the other hypothesis-test-based measures and even reaches a performance level almost equal to that of simple-ll for 3-grams. Some measures are also more efficient for 3-grams than for 2-grams, while others stagnate.
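The abstract gives no formulas, so the following is only a minimal sketch for the 2-gram case. It assumes the definitions commonly used in the collocation literature: expected frequency E = f1 * f2 / N, MI3 = log2(O^3 / E), simple-ll = 2 * (O * ln(O / E) - (O - E)), and a one-sided Fisher's exact test computed as the upper tail of the hypergeometric distribution. The paper's extension to sequences longer than two words is not reproduced here. All function names and the toy counts are hypothetical.

```python
import math


def expected_frequency(f1, f2, n):
    """Expected co-occurrence count of two words under independence."""
    return f1 * f2 / n


def mi3(observed, expected):
    """MI3 (assumed definition): log2 of the cubed observed count over the expected count."""
    return math.log2(observed ** 3 / expected)


def simple_ll(observed, expected):
    """simple-ll (assumed definition): 2 * (O * ln(O / E) - (O - E))."""
    if observed == 0:
        # The O * ln(O / E) term vanishes when O = 0.
        return 2 * expected
    return 2 * (observed * math.log(observed / expected) - (observed - expected))


def fisher_p_value(observed, f1, f2, n):
    """One-sided Fisher's exact test for a 2-gram.

    Probability of seeing at least `observed` co-occurrences when the
    f1 occurrences of word 1 and the f2 occurrences of word 2 are
    distributed independently over n positions (hypergeometric upper tail).
    """
    upper = min(f1, f2)
    tail = sum(
        math.comb(f1, k) * math.comb(n - f1, f2 - k)
        for k in range(observed, upper + 1)
    )
    return tail / math.comb(n, f2)


# Toy counts, invented for illustration only (not taken from the paper).
n, f1, f2, observed = 1000, 20, 30, 5
expected = expected_frequency(f1, f2, n)
print("E         =", expected)
print("MI3       =", mi3(observed, expected))
print("simple-ll =", simple_ll(observed, expected))
print("Fisher p  =", fisher_p_value(observed, f1, f2, n))
```

A small Fisher p-value indicates that chance alone is unlikely to have produced the 2-gram as often as observed. The evaluation described in the abstract asks whether a measure's high scores go to n-grams of this kind.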
