Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 29 of 29
Full-Text Articles in Physical Sciences and Mathematics
Comparative Study Of Supervised Classification Techniques With A Modified Knn Algorithm, Noah Owusu
Open Access Theses & Dissertations
The goal of classification is to develop a model that can be used to accurately assign new observations to labeled classes based on the patterns learned from the training data. The K-nearest Neighbors (KNN) algorithm is a popular and widely used algorithm for classification; however, its performance can be adversely affected by the presence of outliers in a dataset. In this study we modify the existing KNN algorithm to alleviate the effect of outliers in a dataset, thereby improving its performance. We compared the performances of the Modified KNN method and the Existing KNN algorithm …
Modeling The Probability Of A Successful Stolen Base Attempt In Major League Baseball, Cade Stanley
Senior Theses
In Major League Baseball (MLB), the outcome of a stolen base attempt has important implications. Success moves the runner closer to scoring, while failure records an out and removes the runner from the basepaths altogether. Therefore, it is important that the decision by a coach or player to steal a base is well-informed. In this thesis, I explore a statistical approach to making this decision. I train logistic regression and random forest models, using data about the game situation and about the runner, pitcher, and catcher involved in the stolen base attempt, to estimate the probability that a stolen base …
Analyzing Relationships With Machine Learning, Oscar Ko
Dissertations, Theses, and Capstone Projects
Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.
The dataset is from a Stanford University survey, “How Couples …
Bayesian Nonparametric Model For Functional Data Analysis, Tahmidul Islam
Theses and Dissertations
Functional data analysis (FDA) experienced a burst of growth after Ramsay and Silverman published their textbook in 1997. Functional data analysis interests researchers because of the challenges it adds to well-established multivariate analysis. Unlike finite dimensional random vectors, we visualize infinite dimensional random functions; for example, curves, images, brain scans, etc. A vast amount of literature has been dedicated to developing models for functional data. The ideas are mostly based on basis function representations and kernel-based nonparametric methods. In this dissertation, we propose a Bayesian treatment of nonparametric functional data analysis by introducing a Gaussian process (GP) over the space …
Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo
Dissertations
In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies.
First, to improve the prediction accuracy of learning …
A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley
Theses and Dissertations
According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older have Coronary Artery Disease in the United States. Early diagnosis is therefore of crucial importance to help prevent debilitating consequences, and principally death for many patients. In this study we use data containing gene expression values from peripheral blood samples in 198 non-diabetic patients, with the goal of developing an age and sex gene expression model for diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed forward …
Gradient Boosting For Survival Analysis With Applications In Oncology, Nam Phuong Nguyen
USF Tampa Graduate Theses and Dissertations
Cancer is one of the deadliest diseases the world has been fighting for decades. An enormous amount of research has been conducted, via a wide range of approaches, ranging from genetic analysis to mathematical modeling. Survival analysis is a well-established methodology frequently used to estimate the survival probability of a patient. Although there are a large number of methods for survival analysis, efficient exploration of a high-dimensional feature space has been challenging due to its computational cost and complexity. This thesis adapts the component-wise gradient boosting algorithms for cancer survival analysis, and also proposes a new …
Fractional Random Weighted Bootstrapping For Classification On Imbalanced Data With Ensemble Decision Tree Methods, Sean Charles Carter
USF Tampa Graduate Theses and Dissertations
Ensemble methods are commonly used for building predictive models for classification. Models that are unstable to perturbations in the training set, such as the decision tree, often see considerable reductions in error when grouped, using bootstrapped resamples of the training data to train many models. The non-parametric bootstrap, however, has limited efficacy when used on severely imbalanced data, especially when the number of observations of one or more classes is exceptionally small. We explore the fractional random weighted bootstrap, which randomly assigns fractional weights to observations, as an alternative resampling procedure in training machine learning ensembles, particularly decision tree …
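The fractional-weight idea named in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the weight distribution, so the sketch assumes weights drawn from a flat Dirichlet distribution scaled to sum to n, and the helper names `fractional_weights` and `weighted_mean` and the toy labels are hypothetical.

```python
import random


def fractional_weights(n, rng):
    """Draw n fractional weights that sum to n.

    Assumption: a flat Dirichlet distribution, generated by normalizing
    independent Exponential(1) draws. The abstract does not state which
    distribution the thesis uses.
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def weighted_mean(values, weights):
    """Weighted average of values under the given observation weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


rng = random.Random(0)

# Hypothetical imbalanced labels: 2 positives among 20 observations.
labels = [1, 1] + [0] * 18
weights = fractional_weights(len(labels), rng)

# Each observation receives a fractional weight instead of an integer
# resample count, so rare-class observations are not left out of a
# replicate the way they can be under the ordinary bootstrap.
positive_share = weighted_mean(labels, weights)
print(round(positive_share, 3))
```

In an ensemble, one such weight vector would be drawn per tree and supplied as per-observation weights to the learner, for example through the `sample_weight` argument that scikit-learn tree estimators accept in `fit`.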
Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa
Doctoral Dissertations
Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually has high false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …
Classification With Measurement Error In Covariates Or Response, With Application To Prostate Cancer Imaging Study, Kexin Luo
Electronic Thesis and Dissertation Repository
The research is motivated by the prostate cancer imaging study conducted at the University of Western Ontario to classify cancer status using multiple in-vivo images. The prostate cancer histological image and the in-vivo images are subject to misalignment in the co-registration procedure, which can be viewed as measurement error in covariates or response. We investigate methods to correct this problem.
The first proposed method corrects the predicted class probability when the data has misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability for the observed class label is adjusted …
Deep Learning Analysis Of Limit Order Book, Xin Xu
Arts & Sciences Electronic Theses and Dissertations
In this paper, we build a deep neural network for modeling spatial structure in the limit order book and predict the future best ask or best bid price based on ideas of (Sirignano 2016). We propose an intuitive data processing method to approximate data that is not available to us, based only on level I data that is more widely available. The model is based on the idea that there is local dependence for best ask or best bid price and sizes of related orders. First we use logistic regression to prove that this approach is reasonable. To show the advantages …
Multiclass Classification Using Support Vector Machines, Duleep Prasanna W. Rathgamage Don
Electronic Theses and Dissertations
In this thesis, we discuss different SVM methods for multiclass classification and introduce the Divide and Conquer Support Vector Machine (DCSVM) algorithm which relies on data sparsity in high dimensional space and performs a smart partitioning of the whole training data set into disjoint subsets that are easily separable. A single prediction performed between two partitions eliminates one or more classes in a single partition, leaving only a reduced number of candidate classes for subsequent steps. The algorithm continues recursively, reducing the number of classes at each step until a final binary decision is made between the last two classes …
Classification Of High-Dimensional Data Based On Multiple Testing Methods, Chong Ma
Theses and Dissertations
Supervised and unsupervised classification are common topics in machine learning in both scientific and industrial fields, which usually involve three tasks: prediction, exploration, and explanation. False discovery rate (FDR) theory has a close connection to classical classification theory, which must be employed in a sophisticated way to achieve good performance in various contexts. The study aims to explore novel supervised classifiers and unsupervised classification approaches for functional data and high-dimensional data in genome study by using FDR, respectively. One work develops a novel classifier for functional data by casting the classification problem into a multiple testing task, which involves using …
Data Analysis Methods Using Persistence Diagrams, Andrew Marchese
Doctoral Dissertations
In recent years, persistent homology techniques have been used to study data and dynamical systems. Using these techniques, information about the shape and geometry of the data and systems leads to important information regarding the periodicity, bistability, and chaos of the underlying systems. In this thesis, we study all aspects of the application of persistent homology to data analysis. In particular, we introduce a new distance on the space of persistence diagrams, and show that it is useful in detecting changes in geometry and topology, which is essential for the supervised learning problem. Moreover, we introduce a clustering framework directly …
Real-Time Classification Of Biomedical Signals, Parkinson’s Analytical Model, Abolfazl Saghafi
USF Tampa Graduate Theses and Dissertations
The reach of technological innovation continues to grow, changing all industries as it evolves. In healthcare, technology is increasingly playing a role in almost all processes, from patient registration to data monitoring, from lab tests to self-care tools. The increase in the amount and diversity of generated clinical data requires development of new technologies and procedures capable of integrating and analyzing the BIG generated information as well as providing support in their interpretation.
To that extent, this dissertation focuses on the analysis and processing of biomedical signals, specifically brain and heart signals, using advanced machine learning techniques. That is, the …
Statistical Methods For Assessing Individual Oocyte Viability Through Gene Expression Profiles, Michael O. Bishop
All Graduate Plan B and other Reports, Spring 1920 to Spring 2023
Utah State University, 2017
Major Professor: Dr. John R. Stevens
Department: Mathematics and Statistics
Oocytes are the precursor cells to the female gamete, or egg. While reproduction may vary from species to species, within humans and most domesticated animals, the oocyte maturation process is fairly similar. As an oocyte matures, there are various processes that take place, all of which have an effect on the viability of the individual oocyte. Barring outside damage that may come to the oocyte, one of the primary reasons …
A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis
Open Access Dissertations
Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …
Identification Of Slums In Mumbai, India: Unsupervised Classification Techniques, Frankie St. Amand
Thinking Matters Symposium Archive
Slums are contiguous settlements. Inhabitants lack access to safe water, sanitation and sewage infrastructure, secure housing tenure, uncrowded living space, and permanent, durable housing. Addressing these problematic trends begins with identifying contiguous settlements within Mumbai’s urban fabric. Classifications can be performed using satellite images and remote sensing techniques to yield accurate results. Through literature reviews, socio-cultural analysis, and examination of high resolution satellite imagery, this project aims to develop a systematic, accessible, and reproducible method of classifying Mumbai’s slums.
Regularization Methods For Predicting An Ordinal Response Using Longitudinal High-Dimensional Genomic Data, Jiayi Hou
Theses and Dissertations
Ordinal scales are commonly used to measure health status and disease related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, which is a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasizing. Glasgow Coma Scale (GCS), which gives a reliable and objective assessment of conscious status of a patient, is an ordinal scaled measure. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in …
Integrative Biomarker Identification And Classification Using High Throughput Assays, Pan Tong
Dissertations & Theses (Open Access)
It is well accepted that tumorigenesis is a multi-step procedure involving aberrant functioning of genes regulating cell proliferation, differentiation, apoptosis, genome stability, angiogenesis and motility. To obtain a full understanding of tumorigenesis, it is necessary to collect information on all aspects of cell activity. Recent advances in high throughput technologies allow biologists to generate massive amounts of data, more than might have been imagined decades ago. These advances have made it possible to launch comprehensive projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) which systematically characterize the molecular fingerprints of cancer cells using gene expression, methylation, copy number, microRNA and SNP microarrays …
Enhancement Of Random Forests Using Trees With Oblique Splits, Andrejus Parfionovas
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Statistical classification is widely used in many areas where there is a need to make a data-driven decision, or to classify complicated cases or objects. For instance: disease diagnostics (is a patient sick or healthy, based on the blood test results?); weather forecasting (will there be a storm tomorrow, based on today's atmospheric pressure, air temperature, and wind velocity?); speech recognition (what was said over the phone, based on the caller's voice level and articulation?); spam detection (can the unsolicited commercial e-mails be identified by their content?); and so on.
Classification trees …
Class Discovery And Prediction Of Tumor With Microarray Data, Bo Liu
All Graduate Theses, Dissertations, and Other Capstone Projects
Current microarray technology is able to take a single tissue sample to construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue. The objective is to develop a more systematic approach to cancer classification based on Affymetrix oligonucleotide microarrays. For this purpose, I studied published colon cancer microarray data. Colon cancer, with 655,000 deaths worldwide per year, has become the fourth most common form of cancer in the United States and the third leading cause of cancer-related death in the Western world. This research focuses on two areas: class discovery, …
An Empirical Approach To Evaluating Sufficient Similarity: Utilization Of Euclidean Distance As A Similarity Measure, Scott Marshall
Theses and Dissertations
Individuals are exposed to chemical mixtures while carrying out everyday tasks, with unknown risk associated with exposure. Given the number of resulting mixtures it is not economically feasible to identify or characterize all possible mixtures. When complete dose-response data are not available on a (candidate) mixture of concern, EPA guidelines define a similar mixture based on chemical composition, component proportions and expert biological judgment (EPA, 1986, 2000). Current work in this literature is by Feder et al. (2009), evaluating sufficient similarity in exposure to disinfection by-products of water purification using multivariate statistical techniques and traditional hypothesis testing. The work of …
Cluster And Classification Analysis Of Fossil Invertebrates Within The Bird Spring Formation, Arrow Canyon, Nevada: Implications For Relative Rise And Fall Of Sea-Level, Scott L. Morris
Theses and Dissertations
Carbonate strata preserve indicators of local marine environments through time. Such indicators often include microfossils that have relatively unique conditions under which they can survive, including light, nutrients, salinity, and especially water temperature. As such, microfossils are environmental proxies. When these microfossils are preserved in the rock record, they constitute key components of depositional facies. Spence et al. (2004, 2007) have proposed several approaches for determining the facies of a given stratigraphic succession based upon these proxies. Cluster analysis can be used to determine microfossil groups that represent specific environmental conditions. Identifying which microfossil groups exist through time can indicate …
Statistical Learning And Behrens-Fisher Distribution Methods For Heteroscedastic Data In Microarray Analysis, Nabin K. Manandhr-Shrestha
USF Tampa Graduate Theses and Dissertations
The aim of the present study is to identify the differentially expressed genes between two different conditions and apply it in predicting the class of new samples using the microarray data. Microarray data analysis poses many challenges to the statisticians because of its high dimensionality and small sample size, dubbed as "small n large p problem". Microarray data has been extensively studied by many statisticians and geneticists. Generally, it is said to follow a normal distribution with equal variances in two conditions, but it is not true in general. Since the number of replications is very small, …
Data Mining Methods For Malware Detection, Muazzam Siddiqui
Electronic Theses and Dissertations
This research investigates the use of data mining methods for malware (malicious programs) detection and proposes a framework as an alternative to the traditional signature detection methods. The traditional approaches using signatures to detect malicious programs fail for new and unknown malware, where signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed and processed several thousand malicious and clean programs to find out the best features and build models that can classify a given program into a malware or a clean class. Our research is closely related to information retrieval …
Special Classification Models For Lichens In The Pacific Northwest, Janeen Ardito
All Graduate Plan B and other Reports, Spring 1920 to Spring 2023
A common problem in ecological studies is that of determining where to look for rare species. This paper shows how statistical models, such as classification trees, may be used to assist in the design of probability-based surveys for rare species using information on more abundant species that are associated with the rare species. This model assisted approach to survey design involves first building models for the more abundant species. The models are then used to determine stratifications for the rare species that are associated with the more abundant species. The goal of this approach is to increase the number of …
IP Algorithm Applied To Proteomics Data, Christopher Lee Green
Theses and Dissertations
Mass spectrometry has been used extensively in recent years as a valuable tool in the study of proteomics. However, the data thus produced exhibits hyper-dimensionality. Reducing the dimensionality of the data often requires the imposition of many assumptions which can be harmful to subsequent analysis. The IP algorithm is a dimension reduction algorithm, similar in purpose to latent variable analysis. It is based on the principle of maximum entropy and therefore imposes a minimum number of assumptions on the data. Partial Least Squares (PLS) is an algorithm commonly used with proteomics data from mass spectrometry in order to reduce the …
Discriminant Function Analysis, Kuo Hsiung Su
All Graduate Plan B and other Reports, Spring 1920 to Spring 2023
The technique of discriminant function analysis was originated by R.A. Fisher and first applied by Barnard (1935). Two very useful summaries of the recent work in this technique can be found in Hodges (1950) and in Tatsuoka and Tiedeman (1954). The techniques have been used primarily in the fields of anthropology, psychology, biology, medicine, and education, and have only begun to be applied to other fields in recent years.
Classification and discriminant function analyses are two phases in the attempt to predict which of several populations an observation might be a member of, on the basis of multivariate measurements. Both …