Computer Sciences | Open Access Articles | Digital Commons Network™

Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini Oct 2012

Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini

Doctoral Dissertations

Rapid advances in data-rich domains of science, technology, and business has amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, framework, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains which is particularly detrimental to the …

Go to article

Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin Oct 2012

Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin

Computational Modeling & Simulation Engineering Theses & Dissertations

Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is a challenge when 1) a simulation requires hundreds or even thousands of input parameters or 2) re-initializing the simulation due to different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.

This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …

Go to article

A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson Aug 2012

A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson

Theses and Dissertations

Streams in small watersheds are often known to exhibit diel fluctuations, in which streamflow oscillates on a 24-hour cycle. Streamflow diel fluctuations, which we investigate in this study, are an informative indicator of environmental processes. However, in Environmental Data sets, as well as many others, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may include a range of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they …

Go to article

Data Mining Of Tetraloop-Tetraloop Receptors In Rna Xml Files, Sinan Ramazanoglu May 2012

Data Mining Of Tetraloop-Tetraloop Receptors In Rna Xml Files, Sinan Ramazanoglu

Theses

RNA (Ribonucleic acid) Motifs are tertiary structures that play an important role in the folding mechanism of the RNA molecule. The overall function of a RNA Motif depends on its specific bp (base pairs) sequence that constitutes the secondary structure. Data mining is a novel method in both discovering potential tertiary structures within DNA (Deoxyribonucleic acid), RNA, and protein molecules and storing the information in databases. The RNA Motif of interest is the tetraloop-tetraloop receptor, which is composed of a highly conserved 11 nt (nucleotide) sequence and a tetraloop with the generic form of GNRA (where N = any base …

Go to article

Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor Mar 2012

Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor

Theses and Dissertations

Software development is a process fraught with unpredictability, in part because software is created by people. Human interactions add complexity to development processes, and collaborative development can become a liability if not properly understood and managed. Recent years have seen an increase in the use of data mining techniques on publicly-available repository data with the goal of improving software development processes, and by extension, software quality. In this thesis, we introduce the concept of author entropy as a metric for quantifying interaction and collaboration (both within individual files and across projects), present results from two empirical observational studies of open-source …

Go to article

Medical Data Analysis Method For Epilepsy, Ameen Eetemadi Jan 2012

Medical Data Analysis Method For Epilepsy, Ameen Eetemadi

Wayne State University Theses

Applying data mining techniques on medical databases which contain un-structured and semi-structured data is a challenging task. It is not only due to the complexity of such databases but also due to the characteristics of the medical domain. This thesis describes how multiple layers of data mining techniques have been applied to a Human Brain Image Database system. It starts with data preparation which paves the way for conventional data analysis techniques to be applied to the data. A similarity based patient retrieval tool has been designed and developed to assist in treatment planning and outcome estimation for epileptic patients. …

Go to article

Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han Jan 2012

Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han

Theses and Dissertations--Computer Science

This thesis presents a new algorithm to predict the pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions.

It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNA is more important …

Go to article

Decision Rule Induction For Service Sector Using Data Mining- A Rough Set Theory Approach, Zhonghua Hu Jan 2012

Decision Rule Induction For Service Sector Using Data Mining- A Rough Set Theory Approach, Zhonghua Hu

Open Access Theses & Dissertations

Nowadays, data mining is more widely used than ever before; not only by the academic area, but also in the industry and business area. Apart from execution of business processes, the creation of knowledge base and its utilization for the benefit of the organization is becoming a strategy tool to compete. Despite of having ever growing data bases, the problem is that the finance company fails to fully capitalize the true benefits which can be gained from this great wealth of information. The data mining technology instead of classic statistical analysis is developed to help the people to discover the …

Go to article

Computer Sciences Commons^™

Full-Text Articles in Computer Sciences

Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini

Doctoral Dissertations

Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin

Computational Modeling & Simulation Engineering Theses & Dissertations

A Confidence-Prioritization Approach To Data Processing In Noisy Data Sets And Resulting Estimation Models For Predicting Streamflow Diel Signals In The Pacific Northwest, Nathaniel Lee Gustafson

Theses and Dissertations

Data Mining Of Tetraloop-Tetraloop Receptors In Rna Xml Files, Sinan Ramazanoglu

Theses

Analysis And Characterization Of Author Contribution Patterns In Open Source Software Development, Quinn Carlson Taylor

Theses and Dissertations

Medical Data Analysis Method For Epilepsy, Ameen Eetemadi

Wayne State University Theses

Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han

Theses and Dissertations--Computer Science

Decision Rule Induction For Service Sector Using Data Mining- A Rough Set Theory Approach, Zhonghua Hu

Open Access Theses & Dissertations