Title (English)

Creating Domain Dictionaries for Serbian Language

Authors

Ljajić, Adela
Marovac, Ulfeta
Avdić, Aldina
Kajan, Ejub

Description (English)

Abstract: Automatically created thesauruses are used to improve methods for clustering, mining, and determining the sentiment of a specific data corpus. There are different methods for automatically discovering similar words. Some are based on text corpora and mathematical similarity measures, while others use graphs and monolingual dictionaries. The Serbian language is richer than English in both vocabulary and grammar. Known methods for automatic thesaurus generation may neglect some of these language-specific features. This paper presents a method for automatically generating a thesaurus from repositories of documents in the Serbian language, based on mathematical methods such as the chi-square test, cosine similarity, and the Jaccard similarity coefficient. The proposed method can be applied to either normalized or non-normalized documents.
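
The three measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the paper combines or applies them, so the function names, the input representations (per-document occurrence counts and document-id sets), and the 2x2 contingency layout below are assumptions made only for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def chi_square_2x2(n11: int, n10: int, n01: int, n00: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    Assumed layout (hypothetical, for two words w1 and w2):
      n11 = documents containing both words
      n10 = documents containing w1 only
      n01 = documents containing w2 only
      n00 = documents containing neither
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a degenerate table carries no association signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Hypothetical data: occurrence counts of two words per document id.
word_a = Counter({"doc1": 1, "doc2": 1})
word_b = Counter({"doc2": 1, "doc3": 1})

cos = cosine_similarity(word_a, word_b)
jac = jaccard(set(word_a), set(word_b))
chi = chi_square_2x2(10, 0, 0, 10)
```

Higher values of each measure suggest that two words tend to appear in the same documents, which is one plausible signal for grouping them under the same thesaurus entry.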

Language

English

Date

2016

License

© All rights reserved

Part of collection (1)

o:28516 Papers by teaching staff and associates of the State University of Novi Pazar