Fewer features perform well at Native Language Identification task

This paper describes our results at the NLI Shared Task 2017. We participated in the essay, speech, and fusion tracks, which use text, speech transcriptions, and i-vectors to identify the native language of the author or speaker of a given input. In the essay track, a linear SVM system using word bigrams and character 7-grams performed best. In the speech track, an LDA classifier based only on i-vectors performed better than a combination system using text features from speech transcriptions together with i-vectors. In the fusion track, we experimented with systems that combined i-vectors with higher-order n-gram features, combined i-vectors with word unigrams, a mean-probability ensemble, and a stacked ensemble. Our main finding is that word unigrams combined with i-vectors achieve a higher score than systems trained with a larger number of n-gram features. Our best-performing systems achieved F1-scores of 87.16%, 83.33% and 91.75% on the essay track, the speech track and the fusion track, respectively.
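The essay-track setup lends itself to a short sketch. The following is a minimal illustration only, not the authors' released code: it assumes scikit-learn, TF-IDF weighting, and hypothetical `texts`/`labels` inputs, and the exact n-gram ranges and SVM hyperparameters are guesses; it merely shows the overall shape of a linear SVM trained on word and character n-gram features as described above.

```python
# Sketch of an essay-track classifier: linear SVM over word and character
# n-gram features. Library choice (scikit-learn), n-gram ranges, TF-IDF
# weighting, and C are assumptions for illustration, not the paper's exact setup.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

features = FeatureUnion([
    # word n-grams up to bigrams (the abstract highlights word bigrams)
    ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                    sublinear_tf=True)),
    # character n-grams up to order 7 (the abstract highlights character 7-grams)
    ("char_ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 7),
                                    sublinear_tf=True)),
])

clf = Pipeline([
    ("features", features),
    ("svm", LinearSVC(C=1.0)),
])

# texts: list of essay strings; labels: native-language (L1) labels
# clf.fit(texts, labels)
# predictions = clf.predict(test_texts)
```

Character n-grams of high order capture sub-word spelling and morphology cues, which is consistent with the abstract's observation that character 7-grams were among the strongest essay-track features.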

Datasets

italki NLI
Task                            | Dataset    | Model   | Metric Name | Metric Value | Global Rank
Native Language Identification | italki NLI | Tubasfs | Average F1  | 0.5807       | #1

Methods

Linear SVM, LDA, i-vectors, word and character n-gram features, ensemble (mean probability and stacking)