Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

Application Of Urinary Metabolites For Cancer Detection, Qin Gao Jan 2019

Open Access Theses & Dissertations

Prostate cancer (PCa) is the third most common cause of male cancer mortality in the US. Early diagnosis and treatment of PCa will improve the quality of care and reduce mortality. The prostate-specific antigen (PSA) is commonly used in current PCa screening, but its limitations have resulted in an intense search for more reliable biomarkers. Studies have shown that dogs can differentiate PCa patients from negative controls by sniffing their urine. As the odor profiles are generated by volatile organic compounds (VOCs), the finding suggests that particular VOCs could be linked to PCa, PCa risk levels and other cancers. …


Time-Reflective Text Representations For Semantic Evolution Tracking And Trend Analytics, Roberto Camacho Barranco Jan 2019

Open Access Theses & Dissertations

The extraction of significant, relevant, and useful trends from massive document collections, such as a streaming newswire or scientific publications, is a challenging and significant problem in many different fields, including intelligence analysis, recommendation systems, and scientific research. However, techniques that tackle trend analytics of such large text corpora are limited because research that addresses the temporal nature of these publications is still in its early stages. In this work, we first show that it is possible to capture the evolution of a story (or trend) by connecting the dots between different documents in a text corpus. The observed results …


Modeling Correlated Data Via Copulas, Panfeng Liang Jan 2019

Open Access Theses & Dissertations

Copulas are widely used to model the dependency structure among components of multivariate data sets. Elliptical copulas, such as the Gaussian copula, are the most popular copulas in use, since many data sets follow elliptical or meta-elliptical distributions (Fang et al. (2002)). However, today's approaches and software packages require us to assume the specific category of the elliptical copula, such as Gaussian or Student's t, before estimating it. In this thesis, we propose a Bayesian method using Markov chain Monte Carlo (MCMC) methods to estimate the density function of elliptical copulas without specifying it is the copula …


Combination Of Resampling Based Lasso Feature Selection And Ensembles Of Regularized Regression Models, Abhijeet R. Patil Jan 2019

Open Access Theses & Dissertations

In high-dimensional data, the performance of various classifiers is largely dependent on the selection of important features. Most individual classifiers using existing feature selection (FS) methods do not perform well for highly correlated data. Obtaining important features using an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this research, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts the prediction accuracy with …


Robust Statistical Inference For The Gaussian Distribution, Andrews Tawiah Anum Jan 2019

Open Access Theses & Dissertations

The aim of robust statistics is to develop statistical procedures which are not unduly influenced by outliers or observations that are not representative of the underlying "true" data generating process. This thesis focuses on an estimator with this characteristic. The divergence function is introduced in Chapter 2 with the sole aim of taking the function f to be the univariate normal distribution and α ∈ [0, 1]. The estimator fails when we rely on the classic Newton's method to converge to the minimum of the density power divergence (MDPD) function. There is a tendency of such an estimator never to approach …


Confidence Intervals For The Expected P-Value, Emmanuel Kofi Abrefa Jan 2019

Open Access Theses & Dissertations

The p-value is widely used in many application fields. In common practice, a scientific finding is deemed statistically significant if its resultant p-value is less than a pre-specified significance level, for example α = 0.05, although many statistically significant results are not reproducible in new studies. A mix of misuse, abuse, misunderstanding, and misinterpretation has repeatedly provoked intense debate and controversy around the p-value over the years. Yet no reasonable solutions have been proposed. In this research, we make efforts to close the gap by advocating the use of confidence level for the expected p-value p0. This allows …


On The Performance Of Variable Selection And Classification Via Rank-Based Classifier, Md Showaib Rahman Sarker Jan 2019

Open Access Theses & Dissertations

In high-dimensional gene expression data analysis, the accuracy and reliability of cancer classification and the selection of important genes play a crucial role. To identify these important genes and predict future outcomes (tumor vs. non-tumor), various methods have been proposed in the literature. However, only a few of them take into account correlation patterns and grouping effects among the genes. In this article, we propose a rank-based modification of the popular penalized logistic regression procedure based on a combination of l1 and l2 penalties capable of handling possible correlation among genes in different groups. While the l1 penalty maintains sparsity, the …
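
As an illustration of what a combination of l1 and l2 penalties looks like, the sketch below computes the generic elastic-net-style penalty lam1 * sum(|b|) + lam2 * sum(b^2) for a coefficient vector. The function name and the parameters lam1 and lam2 are assumptions made for this example; the abstract does not state the thesis's exact parameterization or its rank-based loss, so this is not the author's implementation.

```python
def combined_penalty(coefs, lam1, lam2):
    """Generic l1 + l2 penalty on a coefficient vector (illustrative only).

    lam1 weights the l1 term (sum of absolute values, encourages sparsity);
    lam2 weights the l2 term (sum of squares, shrinks correlated
    coefficients together). Both names are hypothetical.
    """
    if lam1 < 0 or lam2 < 0:
        raise ValueError("penalty weights must be non-negative")
    l1_term = sum(abs(b) for b in coefs)
    l2_term = sum(b * b for b in coefs)
    return lam1 * l1_term + lam2 * l2_term


if __name__ == "__main__":
    # Three coefficients, one of them exactly zero (a "dropped" gene).
    print(combined_penalty([1.0, -2.0, 0.0], lam1=0.5, lam2=0.25))
```

In a penalized logistic regression, a term of this form would be added to the loss being minimized; setting lam2 to zero leaves a pure l1 (LASSO-type) penalty.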


Bayesian Analysis Of Variable-Stress Accelerated Life Testing, Richard Okine Jan 2019

Open Access Theses & Dissertations

Several authors have over the years studied the art of modeling data from accelerated life testing and making inferences from such data. In this study, we consider a continuously varying stress accelerated life testing procedure which is the limiting case of the multiple stress-level discussed by Doksum and Hóyland [1]. We derive the likelihood function for the life distribution of the continuously increasing stress accelerated life testing model and consequently the Fisher's Information Matrix. We propose a Bayesian analysis for this distribution using the Gibbs Sampling Procedure. We conduct simulation studies and real data analysis to demonstrate the efficiency of …


Forecasting Crashes, Credit Card Default, And Imputation Analysis On Missing Values By The Use Of Neural Networks, Jazmin Quezada Jan 2019

Open Access Theses & Dissertations

A neural network is a system of hardware and/or software patterned after the operation of neurons in the human brain. Neural networks, also called artificial neural networks, are a variety of deep learning technology, which also falls under the umbrella of artificial intelligence (AI). Recent studies show that the artificial neural network has the highest coefficient of determination (i.e., a measure of how well a model explains and predicts future outcomes) in comparison to K-nearest neighbor classifiers, logistic regression, discriminant analysis, the naive Bayesian classifier, and classification trees. In this work, the theoretical description of the neural network methodology …
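
For readers unfamiliar with the coefficient of determination mentioned above, the sketch below computes it in its standard form, R^2 = 1 - SS_res / SS_tot, from observed and predicted values. The function name and the toy numbers are assumptions for illustration and do not come from the thesis.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after using the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0]
    print(r_squared(observed, [1.0, 2.0, 3.0]))  # perfect predictions
    print(r_squared(observed, [2.0, 2.0, 2.0]))  # always predicting the mean
```

A value near 1 means the predictions account for most of the variation in the observed values, while a model that always predicts the mean scores 0.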