Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
So What Are You Going To Do With That? The Promises And Pitfalls Of Massive Data Sets, Sigrid Anderson Cordell, Melissa Gomis
UNL Libraries: Faculty Publications
This article takes as its case study the challenge of data sets for text mining, sources that offer tremendous promise for digital humanities (DH) methodology but present specific challenges for humanities scholars. These text sets raise a range of issues: What skills do you train humanists to have? What is the library’s role in enabling and supporting use of those materials? How do you allocate staff? Who oversees sustainability and data management? By addressing these questions through a specific use case scenario, this article shows how these questions are central to mapping out future directions for a range of library …
Data Mining The Functional Characterizations Of Proteins To Predict Their Cancer-Relatedness, Peter Revesz, Christopher Assi
School of Computing: Faculty Publications
This paper considers two types of protein data. The first is data about protein function, described in several ways, such as GO terms and PFAM families. The second is data about whether individual proteins are experimentally associated with cancer through an anomalous elevation or lowering of their expression in cancerous cells. We combine these two types of protein data and test whether the first type (the functional descriptors) can predict the second type (cancer-relatedness). By using data mining and machine learning, we derive a classifier algorithm that using only GO term and PFAM family …
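The abstract is truncated before it names the classifier. As a minimal sketch of the general idea only, the snippet below joins the two data types on a protein identifier and fits a decision stump (a one-level decision tree, a deliberately simple swapped-in technique, not the paper's method) that predicts cancer-relatedness from the presence or absence of a single functional descriptor. The protein names, descriptors, and labels are hypothetical toy data.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): set-valued functional descriptors per protein.
ANNOTATIONS = {
    "protA": {"GO:t1", "PF:f1"},
    "protB": {"GO:t1", "GO:t2"},
    "protC": {"GO:t3", "PF:f2"},
    "protD": {"GO:t3"},
}
# Hypothetical cancer-relatedness labels (True = associated with cancer).
LABELS = {"protA": True, "protB": True, "protC": False, "protD": False}


def join(annotations, labels):
    """Combine the two data types, keeping only proteins present in both sources."""
    return [(annotations[p], labels[p]) for p in sorted(annotations) if p in labels]


def majority(ys):
    """Most frequent label in ys; False for an empty list."""
    return Counter(ys).most_common(1)[0][0] if ys else False


def fit_stump(examples):
    """Fit a decision stump: one descriptor, split on presence vs. absence.

    Returns (descriptor, label_if_present, label_if_absent) with the fewest
    training errors; ties go to the alphabetically first descriptor.
    """
    descriptors = sorted(set().union(*(feats for feats, _ in examples)))
    best = None
    for d in descriptors:
        present = [y for feats, y in examples if d in feats]
        absent = [y for feats, y in examples if d not in feats]
        yp, ya = majority(present), majority(absent)
        errors = sum(y != yp for y in present) + sum(y != ya for y in absent)
        if best is None or errors < best[0]:
            best = (errors, d, yp, ya)
    _, d, yp, ya = best
    return d, yp, ya


def predict(stump, feats):
    """Predict cancer-relatedness for one protein's descriptor set."""
    d, yp, ya = stump
    return yp if d in feats else ya


stump = fit_stump(join(ANNOTATIONS, LABELS))
print(stump)  # ('GO:t1', True, False)
print(predict(stump, {"GO:t1", "PF:f9"}))  # True
```

A real study would use many more proteins and a stronger learner (the related papers below mention decision trees and SVMs); the stump just makes the "descriptors in, label out" framing concrete.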
Data Mining Of Pancreatic Cancer Protein Databases, Peter Revesz, Christopher Assi
CSE Conference and Workshop Papers
Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a type of relational database that is also representable as an ARFF file. We developed a method to restructure protein databases so that they become amenable to various data mining and machine learning tools. Our restructuring method enabled us to apply both decision tree and support vector machine (SVM) classifiers to a pancreatic protein database. The SVM classifier that used both GO terms and PFAM families to characterize proteins gave …
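The abstract does not spell out the restructuring method. One common way to make set-valued protein records tabular, sketched here as an assumption rather than the authors' procedure, is to turn every distinct GO term or PFAM family into a binary attribute and write the result as an ARFF file (the attribute-relation file format read by Weka). The function name `to_arff`, the relation name, and the toy records are hypothetical.

```python
def to_arff(relation, records, labels):
    """Flatten set-valued protein records into an ARFF string.

    records: dict protein_id -> set of descriptors (e.g. GO terms, PFAM families)
    labels:  dict protein_id -> nominal class value (no spaces)
    Each distinct descriptor becomes a binary {0,1} attribute.
    """
    descriptors = sorted(set().union(*records.values()))
    classes = sorted(set(labels.values()))
    lines = [f"@RELATION {relation}", ""]
    for d in descriptors:
        # Quote attribute names so identifiers with ':' are safe.
        lines.append(f"@ATTRIBUTE '{d}' {{0,1}}")
    lines.append(f"@ATTRIBUTE class {{{','.join(classes)}}}")
    lines += ["", "@DATA"]
    for pid in sorted(records):
        # One row per protein: 1 if it carries the descriptor, else 0, then the class.
        row = ["1" if d in records[pid] else "0" for d in descriptors]
        row.append(labels[pid])
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical toy records and labels, for illustration only.
records = {"protA": {"GO:t1", "PF:f1"}, "protB": {"GO:t2"}}
labels = {"protA": "yes", "protB": "no"}
print(to_arff("pancreatic_toy", records, labels))
```

For this toy input the output declares three binary attributes and a nominal class `{no,yes}`, followed by the data rows `1,0,1,yes` and `0,1,0,no`. One-hot encoding keeps every row the same width, which is what decision-tree and SVM implementations expect, at the cost of a wide, sparse table when there are many distinct descriptors.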
Data Mining Of Protein Databases, Christopher Assi
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
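Since the abstract attributes the non-relational structure mainly to set data types, a standard relational remedy (shown here as an assumption, not as the thesis's own algorithm) is to unnest each set-valued attribute into a separate two-column relation, which puts the data in first normal form so ordinary SQL applies. The helper names `unnest` and `term_counts`, the table and column names, and the toy records are hypothetical; the snippet uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3


def unnest(records, set_attr):
    """Turn one set-valued attribute into sorted (protein_id, value) tuples.

    A protein whose set is empty or missing contributes no tuples.
    """
    return sorted((rec["id"], v) for rec in records for v in rec.get(set_attr, set()))


def term_counts(rows):
    """Load the unnested relation into SQLite and count proteins per GO term."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE protein_go (protein_id TEXT, go_term TEXT)")
        conn.executemany("INSERT INTO protein_go VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT go_term, COUNT(*) FROM protein_go GROUP BY go_term ORDER BY go_term"
        ).fetchall()
    finally:
        conn.close()


# Hypothetical toy records with a set-valued "go" attribute.
records = [
    {"id": "protA", "go": {"GO:t1", "GO:t2"}},
    {"id": "protB", "go": {"GO:t1"}},
    {"id": "protC", "go": set()},
]
rows = unnest(records, "go")
print(rows)  # [('protA', 'GO:t1'), ('protA', 'GO:t2'), ('protB', 'GO:t1')]
print(term_counts(rows))  # [('GO:t1', 2), ('GO:t2', 1)]
```

Note that `protC`, with an empty set, disappears from the unnested relation; a full restructuring would keep a separate protein table so such records are not lost.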