Computer Engineering | Open Access Articles | Digital Commons Network™

Data Mining Of Protein Databases, Christopher Assi Jul 2012

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Data mining of protein databases poses special challenges because many protein databases are non-relational, whereas most data mining and machine learning algorithms assume that the input is a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable to various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, decision tree and SVM classifiers, and compared their …
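To make the notion of restructuring a set-valued attribute concrete, here is a minimal sketch. It is not the algorithm developed in the thesis, which this abstract does not describe; it uses plain one-hot encoding as a stand-in technique. The function name `flatten_set_attribute`, the record layout, and the sample protein records are all hypothetical.

```python
from typing import Any, Dict, List


def flatten_set_attribute(records: List[Dict[str, Any]], attr: str) -> List[Dict[str, Any]]:
    """One-hot encode a set-valued attribute into fixed boolean columns.

    Assumption: every record stores a Python set under `attr`.
    This is a generic illustration, not the thesis's restructuring algorithm.
    """
    # Collect every distinct value that appears in any record's set.
    vocabulary = sorted({value for record in records for value in record[attr]})

    rows = []
    for record in records:
        # Copy the ordinary (atomic) attributes unchanged.
        row = {key: val for key, val in record.items() if key != attr}
        # Replace the set with one boolean column per possible member.
        for value in vocabulary:
            row[f"{attr}={value}"] = value in record[attr]
        rows.append(row)
    return rows


# Hypothetical records: each protein has a set of domain labels.
proteins = [
    {"id": "P1", "domains": {"kinase", "SH2"}},
    {"id": "P2", "domains": {"SH2"}},
    {"id": "P3", "domains": set()},
]

table = flatten_set_attribute(proteins, "domains")
for row in table:
    print(row)
```

Every resulting row has the same fixed columns, which is the shape that decision tree and SVM implementations typically expect. One-hot encoding can produce very wide tables when the sets draw on a large vocabulary, so it should be read only as the simplest possible instance of the idea.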

The Censsis Web-Accessible Image Database System, Furong Yang, David Kaeli Apr 2012

David Kaeli

The Gordon-CenSSIS Web-accessible Image Database System (CenSSIS-DB) is a scientific database that enables effective collaborative scientific data sharing and accelerates fundamental research. We describe a state-of-the-art system using the Oracle RDBMS and J2EE technologies to provide remote, Internet-based data management. The system incorporates efficient submission and retrieval of images and metadata, indexing of metadata for efficient searching, and complex relational query capabilities.

Provable De-Anonymization Of Large Datasets With Sparse Dimensions, Anupam Datta, Divya Sharma, Arunesh Sinha Apr 2012

Research Collection School Of Computing and Information Systems

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional …
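As a rough illustration of the isolation attack, the sketch below scores every record against the adversary's auxiliary information and reports a match only when the best score stands out from the runner-up. It is a simplified scoring-and-eccentricity sketch in the spirit of the Narayanan-Shmatikov approach, not the variant analyzed in the paper. The exact-equality similarity, the `1 / log(1 + support)` weighting, the threshold of 1.5, the function name `match_record`, and the sample ratings are all assumptions made for this example.

```python
import math
import statistics
from typing import Dict, Optional

Record = Dict[str, int]  # attribute (e.g. a movie) -> value (e.g. a rating)


def match_record(aux: Record, database: Dict[str, Record],
                 threshold: float = 1.5) -> Optional[str]:
    """Return the id of the record that best matches `aux`, or None.

    Assumptions for this sketch: similarity is exact equality, rare
    attributes get a higher weight, and a match is reported only if the
    gap between the two best scores is at least `threshold` standard
    deviations of all scores.
    """
    # Support of an attribute = number of records in which it is present.
    support: Dict[str, int] = {}
    for record in database.values():
        for attr in record:
            support[attr] = support.get(attr, 0) + 1

    def score(record: Record) -> float:
        total = 0.0
        for attr, value in aux.items():
            if record.get(attr) == value:
                # Sparse (rarely filled) attributes are more identifying.
                total += 1.0 / math.log(1 + support[attr])
        return total

    scores = {rid: score(record) for rid, record in database.items()}
    if len(scores) < 2:
        return None

    ranked = sorted(scores, key=scores.get, reverse=True)
    best, second = ranked[0], ranked[1]
    sigma = statistics.pstdev(list(scores.values()))
    if sigma == 0:
        return None  # all records look alike; nothing stands out

    eccentricity = (scores[best] - scores[second]) / sigma
    return best if eccentricity >= threshold else None


# Hypothetical anonymized ratings; "rare" is an attribute only one user has.
ratings = {
    "u1": {"A": 5, "B": 3, "rare": 4},
    "u2": {"A": 5, "B": 2},
    "u3": {"A": 1, "B": 3},
    "u4": {"A": 5, "C": 1},
}

print(match_record({"A": 5, "rare": 4}, ratings))  # the rare attribute isolates one user
print(match_record({"A": 5}, ratings))             # a common attribute alone is ambiguous
```

The two calls show why sparse dimensions matter: knowing a single rarely rated item is enough to separate one record from the rest, while knowing only a popular item leaves several candidates tied.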
