Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Bayesian Text Analytics For Document Collections, Daniel David Walker Nov 2012

Theses and Dissertations

Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved …


Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini Oct 2012

Doctoral Dissertations

Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains, which is particularly detrimental to the …


Contributions To K-Means Clustering And Regression Via Classification Algorithms, Raied Salman Apr 2012

Theses and Dissertations

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold: first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering method, the k-means algorithm, is proposed, dubbed the k-means2 (k-means squared) algorithm, applicable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller …
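
The idea of clustering a small portion of the data first can be illustrated with a minimal sketch. The abstract is cut off before it says how the first-stage centers are used, so this example assumes they simply seed a second k-means pass over the full dataset; it is not the dissertation's k-means2 algorithm. The function names and the `sample_fraction` parameter are hypothetical.

```python
import random


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def two_stage_kmeans(points, k, sample_fraction=0.1, seed=0):
    """Assumed scheme: cluster a random sample, then refine on all points."""
    rng = random.Random(seed)
    n_sample = min(len(points), max(k, int(len(points) * sample_fraction)))
    sample = rng.sample(points, n_sample)
    # Stage 1: cheap k-means on the sample only.
    stage1_centers = kmeans(sample, rng.sample(sample, k))
    # Stage 2 (assumption): use those centers to initialise the full run.
    return kmeans(points, stage1_centers)


if __name__ == "__main__":
    data = [(i * 0.1, 0.0) for i in range(50)]
    data += [(100 + i * 0.1, 0.0) for i in range(50)]
    print(sorted(two_stage_kmeans(data, 2)))
```

The first stage touches only a fraction of the points, so most of the iterations needed to find reasonable centers run on a much smaller input; the second stage then typically needs few passes over the full data.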


Detecting Surface Oil Using Unsupervised Learning Techniques On Modis Satellite Data, Joshua Kidd Mar 2012

USF Tampa Graduate Theses and Dissertations

The release of crude oil or other petroleum-based products into marine habitats can have a devastating impact on the environment as well as the local economies that rely on these waters for commercial fishing and tourism. The Deepwater Horizon catastrophe that started on April 20, 2010, leaked an estimated 4.4 million barrels of crude oil into the Gulf of Mexico over a 3-month period, threatening thousands of species and crippling the Gulf Coast. The National Oceanic and Atmospheric Administration (NOAA) used several satellite remote sensing technologies to manually track and predict the extent and location of oil on …


Approximate String Matching Methods For Duplicate Detection And Clustering Tasks, Oleksandr Rudniy Jan 2012

Dissertations

Approximate string matching methods are utilized by a vast number of duplicate detection and clustering applications in various knowledge domains. The application area is expected to grow due to the recent significant increase in the amount of digital data and knowledge sources. Despite the large number of existing string similarity metrics, there is a need for more precise approximate string matching methods to improve the efficiency of computer-driven data processing, thus decreasing labor-intensive human involvement.

This work introduces a family of novel string similarity methods, which outperform a number of effective well-known and widely used string similarity functions. The new …
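
For readers unfamiliar with the task, duplicate detection with a string similarity function can be sketched as follows. The example uses the classic Levenshtein edit distance, one of the well-known baselines, and not the novel methods this dissertation introduces. The `similarity` normalisation, the `find_duplicates` helper, and the 0.85 threshold are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row buffer as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a score in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def find_duplicates(records, threshold=0.85):
    """Return record pairs whose case-insensitive similarity meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i].lower(), records[j].lower()) >= threshold:
                pairs.append((records[i], records[j]))
    return pairs


if __name__ == "__main__":
    print(find_duplicates(["Jon Smith", "John Smith", "Jane Doe"]))
```

The all-pairs comparison is quadratic in the number of records, which is acceptable for a small illustration but would need blocking or indexing on a large collection.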


Semantic Preserving Text Representation And Its Applications In Text Clustering, Michael Howard Jan 2012

Masters Theses

Text mining using the vector space representation has proven to be a valuable tool for classification, prediction, information retrieval and extraction. The nature of text data presents several issues to these tasks, including large dimension and the existence of special polysemous and synonymous words. A variety of techniques have been devised to overcome these shortcomings, including feature selection and word sense disambiguation. Privacy preserving data mining is also an area of emerging interest. Existing techniques for privacy preserving data mining require the use of secure computation protocols, which often incur a greatly increased computational cost. In this paper, a generalization-based …


Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han Jan 2012

Theses and Dissertations--Computer Science

This thesis presents a new algorithm to predict the pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions.

It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNA is more important …