Extraction of Verbal Synsets and Relations for FarsNet

GWC 2018  ·  Fatemeh Khalghani, Mehrnoush Shamsfard ·

WordNet or ontology development for resource-poor languages like Persian, requires composition of several strategies and employment of appropriate heuristics. Lexical and linguistic structured resources are limited for Persian and there is a lot of diversity and structural and syntagmatic complexities. This paper proposes a system for extraction of verbal synsets and relations to extend FarsNet (Persian WordNet). The proposed method extracts verbal words and concepts using noun and adjective words and synsets. It exploits the data from digital lexicon glossaries, which leads to the identification of 6890 proper verbal words and 2790 verbal synsets, with 91% and 67% precision respectively. The proposed system also extracts relations such as semantic roles of verbal arguments (instrument, location, agent, and patient) and also “related-to” (unlabeled) relations and co-occurrence among verbs and other concepts. For this purpose, a combination of linguistic approaches such as morphological analysis of words, semantic analysis, and use of key phrases and syntactic and semantic patterns, corpus-based approach, statistical techniques and co-occurrence analysis have been utilized. The presented strategy extracts 5600 proper relations between the existing concepts in FarsNet 2.0 with 76% precision.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here