Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Electrical and Computer Engineering > TÜBİTAK > Journal > 2017 > Bioinformatics

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Using Latent Semantic Analysis For Automated Keyword Extraction From Large Document Corpora, Tuğba Önal Süzek Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

In this study, we describe a keyword extraction technique that uses latent semantic analysis (LSA) to identify semantically important single topic words, or keywords. We compare our method against two other automated keyword extractors, TF-IDF (term frequency-inverse document frequency) and MetaMap, using human-annotated keywords as a reference. Our results suggest that the LSA-based keyword extraction method performs comparably to the other techniques. Therefore, in an incremental update setting, the LSA-based method may be preferable to existing keyword extraction methods for extracting keywords from text descriptions in big data.
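The abstract does not spell out how LSA scores individual terms. As a rough illustration only, the sketch below assumes one common scheme: weight a term-document matrix with TF-IDF, take a rank-k truncated SVD, and rank each term by the length of its vector in the latent space. The helper names (`tokenize`, `tfidf_matrix`, `lsa_keywords`), the naive tokenizer, the smoothed IDF, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase ASCII letter runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs, stopwords):
    """Build a terms x documents matrix of TF-IDF weights."""
    token_lists = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    row = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(token_lists):
        for term, c in Counter(toks).items():
            counts[row[term], j] = c
    df = np.count_nonzero(counts, axis=1)  # number of documents containing each term
    idf = np.log(len(docs) / df) + 1.0     # assumed smoothing: terms in every doc keep weight 1
    return vocab, counts * idf[:, None]


def lsa_keywords(docs, k=2, top_n=5, stopwords=()):
    """Rank terms by the norm of their coordinates in a rank-k LSA space."""
    vocab, A = tfidf_matrix(docs, set(stopwords))
    U, s, _ = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    k = min(k, len(s))
    term_vecs = U[:, :k] * s[:k]           # term coordinates scaled by singular values
    scores = np.linalg.norm(term_vecs, axis=1)
    top = np.argsort(-scores)[:top_n]
    return [(vocab[i], float(scores[i])) for i in top]


docs = [
    "protein folding and protein structure prediction",
    "protein structure classification with neural networks",
    "keyword extraction from document corpora",
    "semantic analysis of large document corpora",
]
keywords = lsa_keywords(docs, k=2, top_n=3, stopwords={"and", "with", "from", "of"})
print(keywords)  # 'protein' ranks first on this toy corpus
```

Truncating to k dimensions keeps only the dominant latent directions, so terms that load heavily on them rise to the top; ranking terms within each latent dimension separately would be an equally plausible variant.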


Protein Fold Classification With Grow-And-Learn Network, Özlem Polat, Zümray Dokur Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Protein fold classification is an important subject in computational biology and a challenging problem from a machine learning perspective. To address it, in this study we propose a method for classifying protein folds that uses the Grow-and-Learn (GAL) neural network together with the one-versus-others (OvO) method. To classify the 27 most common protein folds, 125-dimensional data derived from the physicochemical properties of amino acids are used. The study is conducted on a database of 694 proteins: 311 are used for training and 383 for testing. Overall, the classification system …
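The abstract does not describe the network's internals. The sketch below assumes a simplified reading of Grow-and-Learn as an incremental nearest-exemplar network: during training, a sample becomes a new exemplar unit whenever the current network misclassifies it, and training stops once an epoch adds no units (any pruning phase is omitted). Each class gets its own binary network in one-versus-others fashion. The margin-based rule that combines the binary outputs, the function names, and the toy 2-D data are hypothetical; in the study the inputs would be 125-dimensional feature vectors.

```python
import math


def nearest(exemplars, x):
    """Return (label, distance) of the stored exemplar closest to x (Euclidean)."""
    best_label, best_dist = None, math.inf
    for vec, label in exemplars:
        d = math.dist(vec, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


def train_gal(samples, labels, max_epochs=10):
    """Grow phase only: add a sample as a new exemplar unit whenever the
    current network misclassifies it; stop when an epoch adds nothing."""
    exemplars = []
    for _ in range(max_epochs):
        grew = False
        for x, y in zip(samples, labels):
            predicted, _ = nearest(exemplars, x)
            if predicted != y:
                exemplars.append((tuple(x), y))
                grew = True
        if not grew:
            break
    return exemplars


def train_one_vs_others(samples, labels):
    """One binary network per class: True for that class, False for all others."""
    return {c: train_gal(samples, [y == c for y in labels]) for c in sorted(set(labels))}


def dist_to(exemplars, x, target):
    """Distance from x to the nearest exemplar carrying the given binary label."""
    return min((math.dist(v, x) for v, lab in exemplars if lab == target), default=math.inf)


def predict(models, x):
    """Hypothetical combination rule: choose the class whose binary network shows the
    largest margin (distance to nearest 'others' minus distance to nearest 'class')."""
    def margin(c):
        return dist_to(models[c], x, False) - dist_to(models[c], x, True)
    return max(models, key=margin)


samples = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_others(samples, labels)
print(predict(models, (0.2, 0.5)))  # -> a
```

Because each binary network stores only the samples it got wrong, it stays much smaller than the training set; on the toy data the network for class "a" keeps just two exemplar units.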