Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons


Full-Text Articles in Computer Sciences

Data Mining Of Pancreatic Cancer Protein Databases, Peter Revesz, Christopher Assi Dec 2012

CSE Conference and Workshop Papers

Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a type of relational database that is also representable as an ARFF file. We developed a method to restructure protein databases so that they become amenable to various data mining and machine learning tools. Our restructuring method enabled us to apply both decision tree and support vector machine classifiers to a pancreatic protein database. The SVM classifier that used both GO term and PFAM families to characterize proteins gave …


Data Mining Of Protein Databases, Christopher Assi Jul 2012

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
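
The abstract above does not spell out the restructuring algorithms, so the following is only a minimal sketch of one common way to make set-valued attributes relational: expand each set into 0/1 indicator columns. The function name `flatten_set_attributes`, the field names, and the sample GO identifiers are illustrative assumptions, not taken from the thesis.

```python
def flatten_set_attributes(records, set_fields):
    """Expand set-valued fields into 0/1 indicator columns (hypothetical helper)."""
    # Build a sorted vocabulary per set-valued field so column order is stable.
    vocab = {
        field: sorted({value for rec in records for value in rec.get(field, ())})
        for field in set_fields
    }
    rows = []
    for rec in records:
        # Copy the ordinary (single-valued) attributes unchanged.
        row = {key: val for key, val in rec.items() if key not in set_fields}
        for field in set_fields:
            present = set(rec.get(field, ()))
            for value in vocab[field]:
                # One binary column per distinct set member.
                row[f"{field}={value}"] = 1 if value in present else 0
        rows.append(row)
    return rows


# Illustrative records only; the identifiers are placeholders, not study data.
proteins = [
    {"id": "P1", "go_terms": {"GO:0005515", "GO:0006508"}, "label": "yes"},
    {"id": "P2", "go_terms": {"GO:0005515"}, "label": "no"},
]
table = flatten_set_attributes(proteins, ["go_terms"])
```

Each resulting row has a fixed set of columns, which is the shape that relational tools and ARFF-style inputs expect; the cost is one column per distinct set member.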


Redistricting Using Constrained Polygonal Clustering, Deepti Joshi, Leen-Kiat Soh, Ashok Samal Jan 2012

School of Computing: Faculty Publications

Redistricting is the process of dividing a geographic area consisting of spatial units—often represented as spatial polygons—into smaller districts that satisfy some properties. It can therefore be formulated as a set partitioning problem where the objective is to cluster the set of spatial polygons into groups such that a value function is maximized [1]. Widely used algorithms developed for point-based data sets are not readily applicable because polygons introduce the concepts of spatial contiguity and other topological properties that cannot be captured by representing polygons as points. Furthermore, when clustering polygons, constraints such as spatial contiguity and unit distributedness should …
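
To illustrate why spatial contiguity cannot be captured by treating polygons as points, here is a minimal sketch of a contiguity check on a polygon adjacency graph. It is not the authors' algorithm; the function `is_contiguous`, the adjacency dictionary, and the unit names are assumptions made for illustration.

```python
from collections import deque


def is_contiguous(district, adjacency):
    """Return True if every unit in `district` is reachable from any other
    through shared borders, moving only through units in the district."""
    district = set(district)
    if not district:
        # Assumption: an empty district is treated as not contiguous.
        return False
    start = next(iter(district))
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for neighbor in adjacency.get(unit, ()):
            # Only traverse neighbors that belong to the same district.
            if neighbor in district and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == district


# Hypothetical spatial units arranged in a row: A - B - C - D.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

In this toy layout, `{"A", "B"}` forms a contiguous district while `{"A", "C"}` does not, even though a point-based method could place A and C in the same cluster.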