Title (eng)

N-gram analysis of COG categorized protein sequences

Authors

Mitić, Nenad
Marovac, Ulfeta

Description (eng)

Abstract: The classification of proteins into the categories of the Clusters of Orthologous Groups (COGs) is important for a better understanding of biological processes, as well as of various pathological conditions in humans and other organisms. In this paper, a model for classifying proteins into COG categories based on characteristic amino acid n-grams is proposed. A novel method, based on Boolean algebra, is presented for extracting the n-grams that characterize proteins belonging to a given COG category. The method significantly reduces the number of n-grams to be processed, which in turn reduces the required storage space and processing time. The results show that the proteins of a given COG category contain n-grams that satisfy specific patterns; such n-grams are unique to their respective COG categories. The classification model based on the proposed method assigns the correct COG category to a protein with a confidence of 96%.

Keywords: data mining, Boolean algebra, classification, protein categories
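The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of the general idea, assuming that "characteristic" n-grams can be approximated as those occurring in one category and in no other (a plain set difference). The function names, the toy sequences, and the overlap-count scoring are all hypothetical illustrations and are not taken from the paper.

```python
def ngrams(sequence, n):
    """Return the set of all length-n substrings of an amino acid sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def characteristic_ngrams(proteins_by_category, n):
    """Map each category to the n-grams found in it and in no other category.

    Assumption: set difference stands in for the paper's Boolean-algebra
    method, whose details are not given in the abstract.
    """
    per_category = {
        cat: set().union(*(ngrams(seq, n) for seq in seqs))
        for cat, seqs in proteins_by_category.items()
    }
    unique = {}
    for cat, grams in per_category.items():
        others = set().union(
            *(g for c, g in per_category.items() if c != cat)
        )
        unique[cat] = grams - others
    return unique


def classify(sequence, unique, n):
    """Pick the category whose characteristic n-grams overlap the most.

    Returns None when the sequence shares no characteristic n-gram
    with any category.
    """
    grams = ngrams(sequence, n)
    scores = {cat: len(grams & u) for cat, u in unique.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy data: made-up sequences under two COG category letters (J, K).
toy = {
    "J": ["MKVLAA", "MKVLGA"],
    "K": ["GGHTRP", "LAAHTRP"],
}
unique = characteristic_ngrams(toy, n=3)
predicted = classify("QMKVLQ", unique, n=3)
```

In this toy setup the 3-gram `LAA` occurs in both categories and is therefore discarded, which illustrates how keeping only category-specific n-grams shrinks the set that has to be stored and matched.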

Language

English

Date

2015

License

© All rights reserved

Part of collection (1)

o:28516 Papers by teaching staff and associates of the State University of Novi Pazar