Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Predicting Occurrence Of The Term Sarcopenia With Semi-Supervised Machine Learning, Kevin Flasch Dec 2021

Predicting Occurrence Of The Term Sarcopenia With Semi-Supervised Machine Learning, Kevin Flasch

Theses and Dissertations

Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult todefine and only recently assigned an official medical code, leading to many medical records lacking a coded diagnosis although the clinical note text may discuss it or symptoms of it. This thesis investigates the application of machine learning and natural language processing to analyze clinical note text to see how well the term ’sarcopenia’ can be predicted in clinical note text from records concerning the condition.

A variety of machine learning models combined with different features and text processingare tested against training data that mentions …


Analysis Of Music Genre Clustering Algorithms, Samuel Walter Stern Aug 2021

Analysis Of Music Genre Clustering Algorithms, Samuel Walter Stern

Theses and Dissertations

Classification and clustering of music genres has become an increasingly prevalent focusin recent years, prompting a push for research into relevant algorithms. The most successful algorithms have typically applied the Naive Bayes or k-Nearest Neighbors algorithms, or used Neural Networks to perform classification. This thesis seeks to investigate the use of unsupervised clustering algorithms such as K-Means or Hierarchical clustering, and establish their usefulness in comparison to or conjunction with established methods.


Multi-Label Classification Models For Heterogeneous Data: An Ensemble-Based Approach., Jose Maria Moyano Murillo Jan 2020

Multi-Label Classification Models For Heterogeneous Data: An Ensemble-Based Approach., Jose Maria Moyano Murillo

Theses and Dissertations

In recent years, the multi-label classification gained attention of the scientific community given its ability to solve real-world problems where each instance of the dataset may be associated with several class labels simultaneously, such as multimedia categorization or medical problems.

The first objective of this dissertation is to perform a thorough review of the state-of-the-art ensembles of multi-label classifiers (EMLCs). Its aim is twofold: 1) study state-of-the-art ensembles of multi-label classifiers and categorize them proposing a novel taxonomy; and 2) perform an experimental study to give some tips and guidelines to select the method that perform the best according to …


Text Classification Of Installation Support Contract Topic Models For Category Management, William C. Sevier Mar 2018

Text Classification Of Installation Support Contract Topic Models For Category Management, William C. Sevier

Theses and Dissertations

Air Force Installation Contracting Agency manages nearly 18 percent of total Air Force spend, equating to approximately 57 billion dollars. To improve strategic sourcing, the organization is beginning to categorize installation-support spend and assign accountable portfolio managers to respective spend categories. A critical task in this new strategic environment includes the appropriate categorization of Air Force contracts into newly created, manageable spend categories. It has been recognized that current composite categories have the opportunity to be further distinguished into sub-categories leveraging text analytics on the contract descriptions. Furthermore, upon establishing newly constructed categories, future contracts must be classified into these …


Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand Aug 2017

Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand

Theses and Dissertations

Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning based systems have been the most successful on NER task, however, they require correct annotations in large quantities for training. Annotating text manually is very labor intensive and also needs domain expertise. The purpose of this research is to reduce human annotation effort and to decrease cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources like UMLS (Unified Medical Language System), that contain …


Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith Apr 2015

Using Instance-Level Meta-Information To Facilitate A More Principled Approach To Machine Learning, Michael Reed Smith

Theses and Dissertations

As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of …


Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass Dec 2013

Data Mining Revision Controlled Document History Metadata For Automatic Classification, Dustin Maass

Theses and Dissertations

Version controlled documents provide a complete history of the changes to the document, including everything from what was changed to who made the change and much more. Through the use of cluster analysis and several sets of manipulated data, this research examines the revision history of Wikipedia in an attempt to find language-independent patterns that could assist in automatic page classification software. Utilizing two sample data sets and applying the aforementioned cluster analysis, no conclusive evidence was found that would indicate that such patterns exist. Our work on the software, however, does provide a foundation for more possible types of …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad

Theses and Dissertations

One of the critical causes of medical errors is Drug-Drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interacting types: advise, effect, …


Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya Jul 2013

Assessment And Prediction Of Cardiovascular Status During Cardiac Arrest Through Machine Learning And Dynamical Time-Series Analysis, Sharad Shandilya

Theses and Dissertations

In this work, new methods of feature extraction, feature selection, stochastic data characterization/modeling, variance reduction and measures for parametric discrimination are proposed. These methods have implications for data mining, machine learning, and information theory. A novel decision-support system is developed in order to guide intervention during cardiac arrest. The models are built upon knowledge extracted with signal-processing, non-linear dynamic and machine-learning methods. The proposed ECG characterization, combined with information extracted from PetCO2 signals, shows viability for decision-support in clinical settings. The approach, which focuses on integration of multiple features through machine learning techniques, suits well to inclusion of multiple physiologic …


Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack May 2013

Geometric Approach To Support Vector Machines Learning For Large Datasets, Robert Strack

Theses and Dissertations

The dissertation introduces Sphere Support Vector Machines (SphereSVM) and Minimal Norm Support Vector Machines (MNSVM) as the new fast classification algorithms that use geometrical properties of the underlying classification problems to efficiently obtain models describing training data. SphereSVM is based on combining minimal enclosing ball approach, state of the art nearest point problem solvers and probabilistic techniques. The blending of the three speeds up the training phase of SVMs significantly and reaches similar (i.e., practically the same) accuracy as the other classification models over several big and large real data sets within the strict validation frame of a double (nested) …


Fast Neural Network Algorithm For Solving Classification Tasks, Noor Albarakati Apr 2012

Fast Neural Network Algorithm For Solving Classification Tasks, Noor Albarakati

Theses and Dissertations

Classification is one-out-of several applications in the neural network (NN) world. Multilayer perceptron (MLP) is the common neural network architecture which is used for classification tasks. It is famous for its error back propagation (EBP) algorithm, which opened the new way for solving classification problems given a set of empirical data. In the thesis, we performed experiments by using three different NN structures in order to find the best MLP neural network structure for performing the nonlinear classification of multiclass data sets. A developed learning algorithm used here is the batch EBP algorithm which uses all the data as a …


Contributions To K-Means Clustering And Regression Via Classification Algorithms, Raied Salman Apr 2012

Contributions To K-Means Clustering And Regression Via Classification Algorithms, Raied Salman

Theses and Dissertations

The dissertation deals with clustering algorithms and transforming regression prob-lems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learn-ing environment for solving regression problems as classification tasks by using support vector machines (SVMs). An extension to the most popular unsupervised clustering meth-od, k-means algorithm, is proposed, dubbed k-means2 (k-means squared) algorithm, appli-cable to ultra large datasets. The main idea is based on using a small portion of the dataset in the first stage of the clustering. Thus, the centers of such a smaller …


Processing And Classification Of Physiological Signals Using Wavelet Transform And Machine Learning Algorithms, Abed Al-Raoof Bsoul Apr 2011

Processing And Classification Of Physiological Signals Using Wavelet Transform And Machine Learning Algorithms, Abed Al-Raoof Bsoul

Theses and Dissertations

Over the last century, physiological signals have been broadly analyzed and processed not only to assess the function of the human physiology, but also to better diagnose illnesses or injuries and provide treatment options for patients. In particular, Electrocardiogram (ECG), blood pressure (BP) and impedance are among the most important biomedical signals processed and analyzed. The majority of studies that utilize these signals attempt to diagnose important irregularities such as arrhythmia or blood loss by processing one of these signals. However, the relationship between them is not yet fully studied using computational methods. Therefore, a system that extract and combine …


Improving Neural Network Classification Training, Michael Edwin Rimer Sep 2007

Improving Neural Network Classification Training, Michael Edwin Rimer

Theses and Dissertations

The following work presents a new set of general methods for improving neural network accuracy on classification tasks, grouped under the label of classification-based methods. The central theme of these approaches is to provide problem representations and error functions that more directly improve classification accuracy than conventional learning and error functions. The CB1 algorithm attempts to maximize classification accuracy by selectively backpropagating error only on misclassified training patterns. CB2 incorporates a sliding error threshold to the CB1 algorithm, interpolating between the behavior of CB1 and standard error backpropagation as training progresses in order to avoid prematurely saturated network weights. CB3 …


Radial Complexity Estimation For Improved Generalization In Artificial Neural Networks, Lemuel R. Myers Jr. Sep 1998

Radial Complexity Estimation For Improved Generalization In Artificial Neural Networks, Lemuel R. Myers Jr.

Theses and Dissertations

When training an artificial neural network (ANN) for classification using backpropagation of error, the weights are usually updated by minimizing the sum-squared error on the training set. As training ensues, overtraining may be observed as the network begins to memorize the training data. This occurs because, as the magnitude of the weight vector, W, grows, the decision boundaries become overly complex in much the same way as a too-high order polynomial approximation can overfit a data set in a regression problem. Since w grows during standard backpropagation, it is important to initialize the weights with consideration to the importance of …