Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Improving XRD Analysis With Machine Learning, Rachel E. Drapeau Aug 2023

Theses and Dissertations

X-ray diffraction analysis (XRD) is an inexpensive method to quantify the relative proportions of mineral phases in a rock or soil sample. However, the analytical software available for XRD requires extensive user input to choose phases to include in the analysis. Consequently, analysis accuracy depends greatly on the experience of the analyst, especially as the number of phases in a sample increases (Raven & Self, 2017; Omotoso, 2006). The purpose of this project is to test whether incorporating machine learning methods into XRD software can improve the accuracy of analyses by assisting in the phase-picking process. In order to provide …


Predicting Occurrence Of The Term Sarcopenia With Semi-Supervised Machine Learning, Kevin Flasch Dec 2021

Theses and Dissertations

Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text to see how well the term ‘sarcopenia’ can be predicted in records concerning the condition.

A variety of machine learning models combined with different features and text processing are tested against training data that mentions …


Analysis Of Music Genre Clustering Algorithms, Samuel Walter Stern Aug 2021

Theses and Dissertations

Classification and clustering of music genres have become an increasingly prevalent focus in recent years, prompting a push for research into relevant algorithms. The most successful approaches have typically applied the Naive Bayes or k-Nearest Neighbors algorithms, or used neural networks to perform classification. This thesis investigates the use of unsupervised clustering algorithms such as K-Means or hierarchical clustering, and seeks to establish their usefulness in comparison to, or in conjunction with, established methods.


Bayesian Nonparametric Model For Functional Data Analysis, Tahmidul Islam Apr 2021

Theses and Dissertations

Functional data analysis (FDA) experienced a burst of growth after Ramsay and Silverman published their textbook in 1997. Functional data analysis interests researchers because of the challenges it adds to well-established multivariate analysis. Instead of finite-dimensional random vectors, we work with infinite-dimensional random functions, for example curves, images, and brain scans. A vast amount of literature has been dedicated to developing models for functional data. The ideas are mostly based on basis function representations and kernel-based nonparametric methods. In this dissertation, we propose a Bayesian treatment of nonparametric functional data analysis by introducing a Gaussian process (GP) over the space …


A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley Jul 2020

Theses and Dissertations

According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older have Coronary Artery Disease in the United States. Early diagnosis is therefore of crucial importance to help prevent debilitating consequences, and principally death for many patients. In this study we use data containing gene expression values from peripheral blood samples in 198 non-diabetic patients, with the goal of developing an age and sex gene expression model for diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed forward …


Multi-Label Classification Models For Heterogeneous Data: An Ensemble-Based Approach., Jose Maria Moyano Murillo Jan 2020

Theses and Dissertations

In recent years, multi-label classification has gained the attention of the scientific community given its ability to solve real-world problems where each instance of the dataset may be associated with several class labels simultaneously, such as multimedia categorization or medical problems.

The first objective of this dissertation is to perform a thorough review of the state-of-the-art ensembles of multi-label classifiers (EMLCs). Its aim is twofold: 1) study state-of-the-art ensembles of multi-label classifiers and categorize them, proposing a novel taxonomy; and 2) perform an experimental study to give some tips and guidelines for selecting the method that performs the best according to …


The Application Of Synthetic Signals For ECG Beat Classification, Elliot Morgan Brown Sep 2019

Theses and Dissertations

A brief overview of electrocardiogram (ECG) properties and the characteristics of various cardiac conditions is given. Two different models are used to generate synthetic ECG signals. Domain knowledge is used to create synthetic examples of 16 different heart beat types with these models. Other techniques for synthesizing ECG signals are explored. Various machine learning models with different combinations of real and synthetic data are used to classify individual heart beats. The performance of the different methods and models is compared, and synthetic data is shown to be useful in beat classification.


Text Classification Of Installation Support Contract Topic Models For Category Management, William C. Sevier Mar 2018

Theses and Dissertations

The Air Force Installation Contracting Agency manages nearly 18 percent of total Air Force spend, equating to approximately 57 billion dollars. To improve strategic sourcing, the organization is beginning to categorize installation-support spend and assign accountable portfolio managers to respective spend categories. A critical task in this new strategic environment is the appropriate categorization of Air Force contracts into newly created, manageable spend categories. It has been recognized that current composite categories could be further divided into sub-categories by applying text analytics to the contract descriptions. Furthermore, upon establishing newly constructed categories, future contracts must be classified into these …


Classification Of High-Dimensional Data Based On Multiple Testing Methods, Chong Ma Jan 2018

Theses and Dissertations

Supervised and unsupervised classification are common topics in machine learning in both scientific and industrial fields, which usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory has a close connection to classical classification theory, which must be employed in a sophisticated way to achieve good performance in various contexts. The study aims to explore novel supervised classifiers and unsupervised classification approaches for functional data and high-dimensional data in genome study by using FDR, respectively. One work develops a novel classifier for functional data by casting the classification problem into a multiple testing task, which involves using …
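For readers unfamiliar with FDR control, the Benjamini-Hochberg step-up procedure is the usual textbook example. The sketch below shows that generic procedure only; it is not the classifier developed in the thesis, and the function name, the significance level, and the p-values are assumptions made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha (step-up procedure)."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Made-up p-values, one per tested hypothesis.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
rejected = benjamini_hochberg(p_values, alpha=0.05)
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its threshold, provided a larger p-value further down the ordering passes.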


Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand Aug 2017

Theses and Dissertations

Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning based systems have been the most successful on the NER task; however, they require correct annotations in large quantities for training. Annotating text manually is very labor intensive and also requires domain expertise. The purpose of this research is to reduce human annotation effort and to decrease the cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources like UMLS (Unified Medical Language System), which contain …


Classification Of Natural Phytoplankton Populations With Fluorescence Excitation-Based Imaging Multivariate Optical Computing, Shawna Kathleen Tazik Jan 2016

Theses and Dissertations

Phytoplankton account for the majority of the primary productivity in the ocean and contribute significantly to the global carbon cycle through photosynthesis. A quantitative characterization of phytoplankton cell size and taxonomic composition is essential for understanding marine biogeochemical cycles, quantifying carbon export, and for predicting the ocean’s response to future climate change. Our labs have developed a new instrument for this purpose that combines fluorescence excitation spectroscopy with an all-optical approach to multivariate statistics called multivariate optical computing (MOC). The instrument, known as the Shipboard Streak Imaging Multivariate Optical Computing (SSIMOC) photometer, is a simple filter photometer that images the …


Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith Apr 2015

Theses and Dissertations

As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of …


Ramp Loss SVM With L1-Norm Regularization, Eric Hess Jan 2014

Theses and Dissertations

The Support Vector Machine (SVM) classification method has recently gained popularity due to the ease of implementing non-linear separating surfaces. SVM is an optimization problem with two competing goals: minimizing misclassification on training data and maximizing a margin defined by the normal vector of a learned separating surface. We develop and implement new SVM models based on a previously conceived SVM with L_1-Norm regularization and ramp loss error terms. The goal is a new SVM model that is both robust to outliers, owing to the ramp loss, and easy to implement in open source and off the shelf mathematical programming …
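As a rough illustration of why a ramp loss is robust to outliers, the sketch below compares the usual hinge loss with a clipped version of it. The clipping level of 2 and all function names are assumptions made for illustration; the exact formulation used in the thesis is not given in this abstract.

```python
def hinge_loss(y, score):
    """Standard SVM hinge loss for a label y in {-1, +1} and a real-valued score."""
    return max(0.0, 1.0 - y * score)


def ramp_loss(y, score, cap=2.0):
    """Hinge loss clipped at `cap` (an assumed value), so a single point's penalty is bounded."""
    return min(cap, hinge_loss(y, score))


def l1_norm(weights):
    """L1 regularization term: the sum of absolute weight values."""
    return sum(abs(w) for w in weights)


# A badly misclassified point (an outlier) dominates the hinge loss
# but contributes at most `cap` to the ramp loss.
outlier_hinge = hinge_loss(+1, -10.0)
outlier_ramp = ramp_loss(+1, -10.0)
```

With the hinge loss the outlier's penalty grows without bound as its score moves further to the wrong side, whereas the clipped loss stops growing once the cap is reached.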


Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass Dec 2013

Theses and Dissertations

Version controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist in automatic page classification software. Utilizing two sample data sets and applying the aforementioned cluster analysis, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is Drug-Drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interacting types: advise, effect, …
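The two-stage design can be sketched as follows. The keyword rules standing in for trained classifiers, and all function names, are hypothetical placeholders; only the control flow (a binary filter first, type assignment second) reflects the abstract, and only the two interaction types named above are sketched.

```python
def stage_one_interacts(sentence):
    """Hypothetical stand-in for the binary classifier: does the drug pair interact?"""
    cues = ("should not", "avoid", "increases", "decreases")
    text = sentence.lower()
    return any(cue in text for cue in cues)


def stage_two_type(sentence):
    """Hypothetical stand-in for the type classifier (two of the types only)."""
    text = sentence.lower()
    if "should not" in text or "avoid" in text:
        return "advise"
    return "effect"


def classify_pair(sentence):
    """Run stage two only on pairs that stage one marks as interacting."""
    if not stage_one_interacts(sentence):
        return "none"
    return stage_two_type(sentence)
```

One plausible reading of the design is that filtering out the large non-interacting class first leaves the second stage with a less skewed set of examples to separate.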


Regularization Methods For Predicting An Ordinal Response Using Longitudinal High-Dimensional Genomic Data, Jiayi Hou Nov 2013

Theses and Dissertations

Ordinal scales are commonly used to measure health status and disease related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, which is a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasizing. Glasgow Coma Scale (GCS), which gives a reliable and objective assessment of conscious status of a patient, is an ordinal scaled measure. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in …


Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya Jul 2013

Theses and Dissertations

In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, is well suited to the inclusion of multiple physiologic …


Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack May 2013

Theses and Dissertations

The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as new fast classification algorithms that use geometrical properties of the underlying classification problems to efficiently obtain models describing training data. SphereSVM is based on combining a minimal enclosing ball approach, state-of-the-art nearest point problem solvers, and probabilistic techniques. The blending of the three speeds up the training phase of SVMs significantly and reaches similar (i.e., practically the same) accuracy as the other classification models over several large real data sets within the strict validation frame of a double (nested) …


Instrument And Method Development For Single-Cell Classification Using Fluorescence Imaging Multivariate Optical Computing, Joseph Swanstrom Jan 2013

Theses and Dissertations

Multivariate optical computing (MOC) is an all-optical approach of predictive spectroscopy that utilizes multivariate calibration and spectral pattern recognition techniques while operating in a simple filter photometer instrument, removing the need for expensive instrumentation and post-processing of spectral data. This is accomplished with specially designed interference filters called multivariate optical elements (MOEs). MOC can provide analytical solutions for applications requiring low cost, rugged, and simple to operate instrumentation for use in remote and hazardous environments such as open ocean waters. These instrument specifications are central for developing a method for classifying phytoplankton in their natural environment. Phytoplankton are photosynthetic single …


Fast Neural Network Algorithm For Solving Classification Tasks, Noor Albarakati Apr 2012

Theses and Dissertations

Classification is one of several applications in the neural network (NN) world. The multilayer perceptron (MLP) is a common neural network architecture used for classification tasks. It is famous for its error back propagation (EBP) algorithm, which opened a new way of solving classification problems given a set of empirical data. In the thesis, we performed experiments using three different NN structures in order to find the best MLP neural network structure for performing the nonlinear classification of multiclass data sets. The learning algorithm developed and used here is the batch EBP algorithm which uses all the data as a …


Contributions To K-Means Clustering And Regression Via Classification Algorithms, Raied Salman Apr 2012

Theses and Dissertations

The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold: first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering method, the k-means algorithm, is proposed, dubbed the k-means2 (k-means squared) algorithm, applicable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller …


Processing And Classification Of Physiological Signals Using Wavelet Transform And Machine Learning Algorithms, Abed Al-Raoof Bsoul Apr 2011

Theses and Dissertations

Over the last century, physiological signals have been broadly analyzed and processed not only to assess the function of the human physiology, but also to better diagnose illnesses or injuries and provide treatment options for patients. In particular, electrocardiogram (ECG), blood pressure (BP) and impedance are among the most important biomedical signals processed and analyzed. The majority of studies that utilize these signals attempt to diagnose important irregularities such as arrhythmia or blood loss by processing one of these signals. However, the relationship between them is not yet fully studied using computational methods. Therefore, a system that extracts and combines …


An Empirical Approach To Evaluating Sufficient Similarity: Utilization Of Euclidean Distance As A Similarity Measure, Scott Marshall May 2010

Theses and Dissertations

Individuals are exposed to chemical mixtures while carrying out everyday tasks, with unknown risk associated with exposure. Given the number of resulting mixtures it is not economically feasible to identify or characterize all possible mixtures. When complete dose-response data are not available on a (candidate) mixture of concern, EPA guidelines define a similar mixture based on chemical composition, component proportions and expert biological judgment (EPA, 1986, 2000). Current work in this literature is by Feder et al. (2009), evaluating sufficient similarity in exposure to disinfection by-products of water purification using multivariate statistical techniques and traditional hypothesis testing. The work of …
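As a small illustration of the distance named in the title, the sketch below computes the Euclidean distance between two vectors. The component proportions are hypothetical values chosen for the example, and no threshold for "sufficient similarity" is implied, since this abstract does not state one.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical component proportions for a reference and a candidate mixture.
reference = [0.5, 0.25, 0.25]
candidate = [0.5, 0.5, 0.0]
distance = euclidean_distance(reference, candidate)
```

A smaller distance means the two composition vectors are closer, which is the sense in which the measure can serve as a similarity score.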


Cluster And Classification Analysis Of Fossil Invertebrates Within The Bird Spring Formation, Arrow Canyon, Nevada: Implications For Relative Rise And Fall Of Sea-Level, Scott L. Morris Apr 2010

Theses and Dissertations

Carbonate strata preserve indicators of local marine environments through time. Such indicators often include microfossils that have relatively unique conditions under which they can survive, including light, nutrients, salinity, and especially water temperature. As such, microfossils are environmental proxies. When these microfossils are preserved in the rock record, they constitute key components of depositional facies. Spence et al. (2004, 2007) have proposed several approaches for determining the facies of a given stratigraphic succession based upon these proxies. Cluster analysis can be used to determine microfossil groups that represent specific environmental conditions. Identifying which microfossil groups exist through time can indicate …


Improving Neural Network Classification Training, Michael Edwin Rimer Sep 2007

Theses and Dissertations

The following work presents a new set of general methods for improving neural network accuracy on classification tasks, grouped under the label of classification-based methods. The central theme of these approaches is to provide problem representations and error functions that more directly improve classification accuracy than conventional learning and error functions. The CB1 algorithm attempts to maximize classification accuracy by selectively backpropagating error only on misclassified training patterns. CB2 incorporates a sliding error threshold to the CB1 algorithm, interpolating between the behavior of CB1 and standard error backpropagation as training progresses in order to avoid prematurely saturated network weights. CB3 …
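The idea behind CB1, updating only on misclassified patterns, can be sketched on a single-layer linear classifier. This is a deliberate simplification and an assumption on our part: in this reduced form the rule coincides with the classic perceptron update, it is not the multi-layer network of the thesis, and it does not attempt the sliding threshold of CB2. The function names, learning rate, and toy data are made up for illustration.

```python
def predict(weights, bias, x):
    """Return class 1 if the linear output is positive, else class 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_misclassified_only(samples, labels, epochs=20, lr=1.0):
    """Update weights only for patterns the current model gets wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            output = predict(weights, bias, x)
            if output == target:
                continue  # correctly classified: no error is propagated
            error = target - output
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Toy linearly separable data (logical AND), used only for illustration.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_misclassified_only(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
```

Once every pattern is classified correctly, no further updates occur, so the weights stop growing instead of being pushed toward ever larger magnitudes.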


IP Algorithm Applied To Proteomics Data, Christopher Lee Green Nov 2004

Theses and Dissertations

Mass spectrometry has been used extensively in recent years as a valuable tool in the study of proteomics. However, the data thus produced exhibits hyper-dimensionality. Reducing the dimensionality of the data often requires the imposition of many assumptions which can be harmful to subsequent analysis. The IP algorithm is a dimension reduction algorithm, similar in purpose to latent variable analysis. It is based on the principle of maximum entropy and therefore imposes a minimum number of assumptions on the data. Partial Least Squares (PLS) is an algorithm commonly used with proteomics data from mass spectrometry in order to reduce the …


Radial Complexity Estimation For Improved Generalization In Artificial Neural Networks, Lemuel R. Myers Jr. Sep 1998

Theses and Dissertations

When training an artificial neural network (ANN) for classification using backpropagation of error, the weights are usually updated by minimizing the sum-squared error on the training set. As training ensues, overtraining may be observed as the network begins to memorize the training data. This occurs because, as the magnitude of the weight vector, W, grows, the decision boundaries become overly complex in much the same way as a too-high-order polynomial approximation can overfit a data set in a regression problem. Since W grows during standard backpropagation, it is important to initialize the weights with consideration to the importance of …