Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 16 of 16

Full-Text Articles in Physical Sciences and Mathematics

A Remote Sensing And Machine Learning-Based Approach To Forecast The Onset Of Harmful Algal Bloom (Red Tides), Moein Izadi Apr 2022

Dissertations

In the last few decades, harmful algal blooms (HABs, also known as “red tides”) have become one of the most detrimental natural phenomena worldwide, especially in Florida’s coastal areas, owing to local environmental factors and, on a larger scale, global warming. Karenia brevis produces toxins that have harmful effects on humans, fisheries, and ecosystems. In this study, I developed state-of-the-art machine learning models (e.g., XGBoost, Random Forest, and Support Vector Machine) and compared their efficiency in predicting the occurrence of HABs. In the proposed models, the K. brevis abundance is used as the target, and 10 …


Hierarchical Aggregation Of Multidimensional Data For Efficient Data Mining, Safaa Khalil Alwajidi Dec 2020

Dissertations

Big data analysis is essential for many smart applications in areas such as connected healthcare, intelligent transportation, human activity recognition, environment, and climate change monitoring. Traditional data mining algorithms do not scale well to big data because of the enormous number of data points and the velocity of their generation. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. This research focuses on the mining of big data using aggregated data as input. We developed a data structure for aggregating data at multiple resolutions. …


Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou Aug 2020

Dissertations

In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.

The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real applications in recent years and have shown great potential in many areas, including transportation. Even though transportation has become increasingly indispensable in people’s daily lives, related problems such as traffic congestion and energy waste have not been completely solved, and some have become even more critical. This dissertation uses deep learning to address three fundamental problems in the transportation field: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides useful complementary perspectives, but also poses new challenges for representing these attributes and integrating them into the learning procedure. In this dissertation, the spatial and temporal dependencies involved are explored in three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly to represent the dependencies in the respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Classification Using Association Rules, Colin Kane Jan 2018

Dissertations

This research investigates the use of an unsupervised learning technique, association rules, to make class predictions. The use of association rules to make class predictions is a growing area of focus within data mining research. The research to date has focused predominantly on balanced datasets or synthesized imbalanced datasets. Concerns have been raised that the algorithms using association rules to make classifications do not perform well on imbalanced datasets. This research comprehensively evaluates the accuracy of a number of association rule classifiers in predicting home loan sales in an Irish retail banking context. The experiments designed test three associative …


Clicking Into Mortgage Arrears: A Study Into Arrears Prediction With Clickstream Data, Gavin O'Brien Jan 2018

Dissertations

This research project investigates the predictive capability of clickstream data when used for mortgage arrears prediction. With an ever-growing number of people switching to digital channels to handle their daily banking requirements, there is a wealth of ever-increasing online usage data, otherwise known as clickstream data. If leveraged correctly, this clickstream data can be a powerful data source for organisations, as it provides detailed information about how their customers are interacting with their digital channels. Much of the current literature associated with clickstream data relates to organisations employing it within their customer relationship management mechanisms …


Development And Evaluation Of Machine Learning Algorithms For Biomedical Applications, Turki Talal Turki Apr 2017

Dissertations

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians gain a full understanding of which treatments are effective for patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.

This dissertation develops machine learning and data mining algorithms, and applies these algorithms to solve the two important biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques …


Big Data Analytics In Computational Biology And Bioinformatics, Kevin Byron Apr 2017

Dissertations

Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference.

The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a …


Statistical Learning Methods For Mining Marketing And Biological Data, Jie Zhang Apr 2017

Dissertations

Nowadays, the value of data has been broadly recognized and emphasized. More and more decisions are made based on data and analysis rather than solely on experience and intuition. With the fast development of networking, data storage, and data collection capacity, data have increased dramatically in industry, science and engineering domains, which brings both great opportunities and challenges. To take advantage of the data flood, new computational methods are in demand to process, analyze and understand these datasets.

This dissertation focuses on the development of statistical learning methods for online advertising and bioinformatics to model real world data with temporal …


Data Mining In Computational Proteomics And Genomics, Yang Song May 2015

Dissertations

This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …


Enhancing Web Marketing By Using Ontology, Xuan Zhou May 2006

Dissertations

The existence of the Web has a major impact on people's lifestyles. Online shopping, online banking, email, instant messenger services, search engines and bulletin boards have gradually become part of our daily lives. All kinds of information can be found on the Web. Web marketing is one of the ways to make use of online information. By extracting demographic information and interest information from the Web, marketing knowledge can be augmented by applying data mining algorithms. This knowledge, which connects customers to products, can then be used for marketing purposes and for targeting existing and potential customers. The Web …


Text Mining With Exploitation Of User's Background Knowledge : Discovering Novel Association Rules From Text, Xin Chen Jan 2006

Dissertations

The goal of text mining is to find interesting and non-trivial patterns or knowledge in unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because they do not consider the users' knowledge and interests. Subjective measures require explicit input of user expectations, which is difficult or even impossible to obtain in text mining environments.

This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two …


Pattern Discovery In Structural Databases With Applications To Bioinformatics, Sen Zhang Jan 2005

Dissertations

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model evolutionary histories of different species, among others.

The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our …
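
The cousin-pair definition above can be illustrated with a short Python sketch. It is not the dissertation's algorithm, which the truncated abstract does not show. It assumes, purely for illustration, that a tree is given as a child-to-parent dictionary of node identifiers, that both nodes of a pair sit the same number of levels below their nearest shared ancestor, and that the hypothetical `max_levels` parameter caps how far up to look (1 = parent, 2 = grandparent, 3 = great-grandparent).

```python
from itertools import combinations


def ancestors(parent, node):
    """Return the ancestors of `node`, nearest first (parent, grandparent, ...)."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def cousin_pairs(parent, max_levels=3):
    """List (u, v, k) for node pairs whose nearest shared ancestor
    lies exactly k levels above both nodes, with 1 <= k <= max_levels.

    `parent` maps each non-root node to its parent (an assumed encoding).
    """
    nodes = set(parent) | set(parent.values())
    pairs = []
    for u, v in combinations(sorted(nodes), 2):
        up_u, up_v = ancestors(parent, u), ancestors(parent, v)
        for k in range(1, max_levels + 1):
            # Both nodes need an ancestor k levels up, and it must be the same node.
            if k <= len(up_u) and k <= len(up_v) and up_u[k - 1] == up_v[k - 1]:
                pairs.append((u, v, k))
                break  # stop at the nearest shared ancestor
    return pairs


# Hypothetical example tree: a is the root; b and c are its children;
# d and e are children of b; f is a child of c.
tree = {"b": "a", "c": "a", "d": "b", "e": "b", "f": "c"}
print(cousin_pairs(tree))
```

In this example tree, `b` and `c` share a parent, as do `d` and `e`, while `d` and `f` (and `e` and `f`) share only a grandparent. The pairwise scan is quadratic in the number of nodes and is meant only to make the definition concrete.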


New Techniques For Improving Biological Data Quality Through Information Integration, Katherine Grace Herbert May 2004

Dissertations

As databases become more pervasive throughout the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these issues is non-trivial and can cause many problems for the database. For biological data to be corrected and standardized, methods and frameworks must be developed to handle both structural and traditional data.

The BIG-AJAX framework has been developed for solving these problems through both data cleaning and data integration. This framework exploits declarative data cleaning and exploratory data mining …


Knowledge Discovery In Biological Databases : A Neural Network Approach, Qicheng Ma Aug 2000

Dissertations

Knowledge discovery in databases, also known as data mining, aims to find significant information in a set of data. The knowledge to be mined from the dataset may refer to patterns, association rules, classification and clustering rules, and so forth. In this dissertation, we present a neural network approach to finding knowledge in biological databases. Specifically, we propose new methods to process biological sequences in two case studies: the classification of protein sequences and the prediction of E. coli promoters in DNA sequences. Our proposed methods, based on neural network architectures, combine techniques ranging from Bayesian inference, coding theory, …