Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

2019

Articles 31 - 60 of 253

Full-Text Articles in Physical Sciences and Mathematics

Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku Aug 2019

Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs is time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …


Classification With Measurement Error In Covariates Or Response, With Application To Prostate Cancer Imaging Study, Kexin Luo Aug 2019

Electronic Thesis and Dissertation Repository

The research is motivated by the prostate cancer imaging study conducted at the University of Western Ontario to classify cancer status using multiple in-vivo images. The prostate cancer histological image and the in-vivo images are subject to misalignment in the co-registration procedure, which can be viewed as measurement error in covariates or response. We investigate methods to correct this problem.

The first proposed method corrects the predicted class probability when the data has misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability for the observed class label is adjusted …


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


GARCH Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir Aug 2019

Theses and Dissertations

This thesis conducts Value at Risk (VaR) and Expected Shortfall (ES) estimation using GARCH modeling and Bayesian Model Averaging (BMA). BMA considers multiple models weighted by some information criterion. Through BMA, this thesis finds that VaR and ES estimates can be improved through enhanced modeling of the data generation process.
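
The weighting step described above can be illustrated with a minimal sketch. It assumes the information criterion is BIC and uses the common approximation in which each model's weight is proportional to exp(-BIC/2); the abstract does not say which criterion the thesis uses. The function names and all numbers below are hypothetical and purely illustrative.

```python
import math


def bma_weights(bics):
    """Approximate model weights from BIC values (lower BIC = more weight).

    Assumption: weights proportional to exp(-BIC / 2), a standard
    approximation to posterior model probabilities under equal priors.
    """
    best = min(bics)
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_average(estimates, weights):
    """Weighted average of per-model estimates (e.g. VaR from each model)."""
    return sum(w * e for w, e in zip(weights, estimates))


# Made-up BIC values and 1% VaR estimates for three candidate GARCH-type models.
bics = [1002.0, 1000.0, 1006.0]
var_estimates = [-0.031, -0.028, -0.035]

weights = bma_weights(bics)
bma_var = bma_average(var_estimates, weights)
```

The averaged estimate always lies between the smallest and largest single-model estimates, and the model with the lowest BIC contributes most.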


Beta Regression Models For Repeated-Measures Data Analysis, Nicholas A. Hein Aug 2019

Theses & Dissertations

Bounded data often give rise to uncorrectable skew and heteroscedasticity. Bounded data are a relatively frequent occurrence in clinical and research settings. For example, in neuropsychology, most neurocognitive tests are bounded, and subjects are repeatedly measured over time. The statistician needs to choose a model that accounts for the correlated nature of the repeated measures. The Beta distribution is a natural choice for modeling bounded data. Currently, generalized linear mixed models (GLMM) and generalized estimating equations (GEE) are two methods that can be used to model Beta distributed data with repeated measures. However, GLMMs and GEEs have limitations, i.e., GLMMs …


Advances In Moment-Based Distributional Methodologies, Yishan Zang Aug 2019

Electronic Thesis and Dissertation Repository

This thesis comprises various results that rely on the moments of a distribution or the sample moments associated with a set of observations. Since a sample of size n is uniquely specified by its first n moments, it is pertinent to make use of sample moments for modeling, classification or inference purposes. Three density mixtures are approximated by adjusting in various ways an initial density approximation, referred to as a base density, by means of certain moment-based functions, and the accuracy of the resulting density approximants is compared. A similar study is carried out in the context of density estimation. Moreover, it …


Sample Size Calculation Of Clinical Trials With Correlated Outcomes, Dateng Li Aug 2019

Statistical Science Theses and Dissertations

In this thesis, we investigate sample size calculation for three kinds of clinical trials: (1) randomized controlled trials (RCTs) with longitudinal count outcomes; (2) cluster randomized trials (CRTs) with count outcomes; and (3) CRTs with multiple binary co-primary endpoints.


Effective Statistical Energy Function Based Protein Un/Structure Prediction, Avdesh Mishra Aug 2019

University of New Orleans Theses and Dissertations

Proteins are an important component of living organisms, composed of one or more polypeptide chains, each containing hundreds or even thousands of amino acids of 20 standard types. The structure of a protein from the sequence determines crucial functions of proteins such as initiating metabolic reactions, DNA replication, cell signaling, and transporting molecules. In the past, proteins were considered to always have a well-defined stable shape (structured proteins); however, it has recently been shown that there exist intrinsically disordered proteins (IDPs), which lack a fixed or ordered 3D structure, have dynamic characteristics, and therefore exist in multiple states. Based on …


Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu Aug 2019

Electronic Thesis and Dissertation Repository

Mark-recapture studies are often used to estimate the survival of individuals in a population and identify factors that affect survival in order to understand how the population might be affected by changing conditions. Factors that vary between individuals and over time, like body mass, present a challenge because they can only be observed when an individual is captured. Several models have been proposed to deal with the missing-covariate problem and commonly impose a logit link function which implies that the survival probability varies between 0 and 1. In this thesis I explore the estimability of four possible models when survival …


Split Credibility: A Two-Dimensional Semi-Linear Credibility Model, Jingbing Qiu Aug 2019

Electronic Thesis and Dissertation Repository

In the thesis, we introduce a two-dimensional semi-linear credibility model, which is an extension of the classical credibility or split credibility models used by practicing actuaries. Our model predicts the future expected losses of a policyholder by considering its historical primary and excess losses. The optimal split point is derived based on the mean squared error criterion. We show analytically when and why splitting a policyholder’s historical losses into primary and excess parts works. In addition, we derive formulas for estimating our model parameters nonparametrically. Finally, we show the application of our model through three examples.


Factors Associated With Eosinophilic Esophagitis In Nevada, Julia Lorraine Anderson Aug 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

Eosinophilic esophagitis (EoE) is a rare immune-mediated illness with symptoms that range from difficulty swallowing to food impaction of the esophagus. Most published studies have been documented among patients residing in cool regions with significant annual rainfall. No published studies to our knowledge have been performed examining the healthcare utilization trends of EoE in Nevada. Utilizing two unique databases, the factors associated with EoE healthcare utilization patterns in Nevada were examined. All analyses were performed in R version 3.5.1. This study included a demographic and regional analysis identifying risk factors associated with having an EoE healthcare visit in Nevada. Several …


Is Corequisite Developmental Math Effective At East Tennessee State University?, Christine Padden Aug 2019

Electronic Theses and Dissertations

This thesis looks at the corequisite developmental math program at East Tennessee State University (ETSU) and compares the effectiveness to the previous developmental math program by comparing the student outcomes in MATH 1530. MATH 1530 is a non-calculus based statistics and probability course that satisfies most majors’ general education math requirements. ETSU sees approximately 1,000 students a year pass through MATH 1530, which is around 6.7% of the total enrollment at ETSU [9]. We are interested in the last five years of the developmental math program before it was changed to corequisite developmental math and the first five years of corequisite …


Tuning Hyperparameters In Supervised Learning Models And Applications Of Statistical Learning In Genome-Wide Association Studies With Emphasis On Heritability, Jill F. Lundell Aug 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Machine learning is a buzzword that has inundated popular culture in the last few years. This is a term for a computer method that can automatically learn and improve from data instead of being explicitly programmed at every step. Investigations regarding the best way to create and use these methods are prevalent in research. Machine learning models can be difficult to create because models need to be tuned. This dissertation explores the characteristics of tuning three popular machine learning models and finds a way to automatically select a set of tuning parameters. This information was used to create an …


Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin Aug 2019

Graduate Theses and Dissertations

Tree ring chronology data is known to reflect regional climate due to the strong impact of rainfall and temperature. Therefore, tree ring data can be used to reconstruct historical climate in order to understand how climate changed in the past and make predictions about the future behavior of the climate. For simplicity, this research only considers the influence of precipitation on tree ring growth within the New England area. A total of 94 measurement sites are used to record tree ring width over 881 years and corresponding precipitation data are given at some locations for 121 years. We developed a …


Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava Aug 2019

Electronic Theses and Dissertations

The designing and determination of sample size are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, thus leading to better understanding of complex mechanisms underlying various biological processes. The variations in the biological data or technical approaches to data collection lead to heterogeneity for the samples under study. We critically worked on the issues of technical and biological heterogeneity. The quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from the problem of missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …


Effect Of Cross-Validation On The Output Of Multiple Testing Procedures, Josh Dallas Price Aug 2019

Graduate Theses and Dissertations

High dimensional data with sparsity is routinely observed in many scientific disciplines. Filtering out the signals embedded in noise is a canonical problem in such situations requiring multiple testing. The Benjamini-Hochberg procedure using False Discovery Rate control is the gold standard in large scale multiple testing. In Majumder et al. (2009) an internally cross-validated form of the procedure is used to avoid a costly replicate study and the complications that arise from population selection in such studies (i.e., extraneous variables). I implement this procedure and run extensive simulation studies under increasing levels of dependence among parameters and different data generating …
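
For readers unfamiliar with the baseline procedure named above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up rule. It is not the internally cross-validated variant studied in the thesis; the function name and the `alpha` default are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input p-value, where True means the
    corresponding hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject every hypothesis whose p-value is among the k_max smallest.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the sorted list meets its threshold.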


Novel Bayesian Methodology In Multivariate Problems., Debamita Kundu Aug 2019

Electronic Theses and Dissertations

This dissertation involves developing novel Bayesian methodology for multivariate problems. In particular, it focuses on two contexts: shrinkage based variable selection in multivariate regression and simultaneous covariance estimation of multiple groups. Both projects are centered around fully Bayesian inference schemes based on hierarchical modeling to capture context-specific features of the data and the development of computationally efficient estimation algorithms. Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) …


Spatio-Temporal Prediction Of Arkansas Gubernatorial Election, Michael Harris Aug 2019

Graduate Theses and Dissertations

Our goal is to create spatio-temporal models for predicting future gubernatorial elections. For a concrete example of how well our models work, we use past data to predict the 2018 Arkansas gubernatorial election and use the existing 2018 election data to check our models' predictive accuracy. Gubernatorial election data was collected from the Arkansas Secretary of State website while related covariate data was collected from the website for the Federal Reserve Bank of St. Louis. The data we collect is on the county level. For predictive purposes we fit multiple models to the data using Markov chain Monte Carlo and …


Prediction Of High School Graduation With Decision Trees, Andrea M. Lee Aug 2019

MSU Graduate Theses

As an educator for the past fourteen years, I have always looked at data to determine ways to help our students. Graduation status is one area of interest. I wanted to apply statistical methods to try to find early indicators of those students who may drop out, thus being able to provide early intervention to those students. With early intervention, we may be able to lower our dropout rate. While studying different methods of pattern recognition, I found that the decision tree method in machine learning was the best for the data that I had collected. Decision trees …


Development Of A Statistical Shape-Function Model Of The Implanted Knee For Real-Time Prediction Of Joint Mechanics, Kalin Gibbons Aug 2019

Boise State University Theses and Dissertations

Outcomes of total knee arthroplasty (TKA) are dependent on surgical technique, patient variability, and implant design. Non-optimal design or alignment choices may result in undesirable contact mechanics and joint kinematics, including poor joint alignment, instability, and reduced range of motion. Implant design and surgical alignment are modifiable factors with potential to improve patient outcomes, and there is a need for robust implant designs that can accommodate patient variability. Our objective was to develop a statistical shape-function model (SFM) of a posterior stabilized implant knee to instantaneously predict output mechanics in an efficient manner. Finite element methods were combined with Latin …


Probabilistic Models For Order-Picking Operations With Multiple In-The-Aisle Pick Positions, Jingming Liu Aug 2019

Graduate Theses and Dissertations

The development of probability density functions (pdfs) for travel time of a narrow aisle lift truck (NALT) and an automated storage and retrieval (AS/R) machine is the focus of the dissertation. The multiple in-the-aisle pick positions (MIAPP) order picking system can be modeled as an M/G/1 queueing problem in which storage and retrieval requests are the customers and the vehicle (NALT or AS/R machine) is the server. Service time is the sum of travel time and the deterministic time to pick up and deposit a pallet (TPD).

Our first contribution is the development of travel time pdfs for retrieval operations …


Association Of Copy Number Variations With Chronic Hepatitis B In Chinese Population, Fang Niu Aug 2019

Capstone Experience

With one third of the Hepatitis B virus (HBV) infection population of the world, chronic Hepatitis B (CHB) has become a top burden in China. CHB is a lifelong infection with HBV which can cause serious health problems, like cirrhosis, liver cancer or even death. HBV infection is known to result in various clinical conditions, ranging from asymptomatic HBV carrier status to chronic hepatitis and primary hepatocellular carcinoma. Several studies have shown that host genetic susceptibility could be an important factor that determines these various outcomes of HBV infection. Many Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations (CNVs) have been associated …


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019

Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare the performance of various survival modeling techniques using an R package "survsim". Model performance was assessed with varying sample sizes as small ($n5000$). For small and mild samples, the semi-parametric model outperforms or approximates the performance of the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Successful Shot Locations And Shot Types Used In NCAA Men’s Division I Basketball, Olivia D. Perrin Aug 2019

All NMU Master's Theses

The primary purpose of the current study was to investigate the effect of court location (distance and angle from basket) and shot types used on shot success in NCAA Men’s DI basketball during the 2017-18 season. A secondary purpose was to further expand the analysis based on two additional factors: player position (guard, forward, or center) and team ranking. All statistical analyses were completed in RStudio and three binomial logistic regression analyses were performed to evaluate factors that influence shot success: one for all two- and three-point shot attempts, one for only two-point attempts, and one for only …


Development Of A 1-Dimensional Data Assimilation To Determine Temperature And Relative Humidity Combining Raman Lidar Backscatter Measurements And A Reanalysis Model, Shayamila N. Mahagammulla Gamage Jul 2019

Electronic Thesis and Dissertation Repository

Water vapor is the most dominant greenhouse gas in Earth's atmosphere. It is highly variable and its variations strongly depend on changes in temperature. Atmospheric water vapor can be expressed as relative humidity (RH), the ratio of the partial pressure of water vapor in the mixture to the equilibrium vapor pressure of water over a flat surface of pure water at a given temperature. Liquid water can exist as super-cooled water for temperatures between 0 °C and -38 °C. Thus, RH can be measured either relative to water (RHw) or to ice (RHi). RHi measurements are important in the upper tropospheric region, …


One And Two-Step Estimation Of Time Variant Parameters And Nonparametric Quantiles, Bogdan Gadidov Jul 2019

Doctor of Data Science and Analytics Dissertations

This dissertation develops and discusses several one-step and two-step smoothing methods of time variant nonparametric quantiles and time variant parameters from probability models. First, we investigate and develop nonparametric techniques for measuring extreme quantiles. The method involves aggregating data by an explanatory variable such as time and smoothing the resulting data with a nonparametric method like kernel, local polynomial or spline smoothing. We demonstrate both in application and simulation that this two-step procedure of quantile estimation is superior to the parametric quantile regression. We then develop a one-step method which combines the strength of maximum likelihood estimation with a local …


Some Recent Developments On Pareto-Optimal Reinsurance, Wenjun Jiang Jul 2019

Electronic Thesis and Dissertation Repository

This thesis focuses on developing Pareto-optimal reinsurance policies that consider the interests of both the insurer and the reinsurer. The optimal insurance/reinsurance design has been extensively studied in actuarial science literature, while in early years most studies were concentrated on optimizing the insurer’s interests. However, as early as the 1960s, Borch argued that “an agreement which is quite attractive to one party may not be acceptable to its counterparty” and he pioneered the study on “fair” risk sharing between the insurer and the reinsurer. Quite recently, the question of how to strike a balance in risk sharing between an insurer and …


Estimation Of Association Between A Longitudinal Marker And Interval-Censored Progression Times, Naghmeh Daneshi Jul 2019

Dissertations and Theses

In longitudinal studies, we observe the subjects who are likely to progress to a new state during the study time. For example, in clinical trials the stage of a progressing disease is recorded at each follow-up visit. The primary goal is to estimate the relationship between the attributes and the subject's progression state. In such studies, some subjects complete all their follow-up visits and their progression states are observed without any missingness. However, others miss their follow-up visits and when they come back, they learn that they have progressed to a new state. In this case, not only are their …


Constraining The Oxygen Values Of The Late Cretaceous Western Interior Seaway Using Marine Bivalves, Camille H. Dwyer Jul 2019

Earth and Planetary Sciences ETDs

The Western Interior Seaway (WIS) remains an oceanographic enigma, including its circulation, similarity to the open ocean, and the fidelity of geochemical proxies to reconstruct paleoenvironments. Across the late Campanian and early Maastrichtian I test whether: 1) the WIS had unique δ¹⁸O (VPDB) compared to other marine settings, 2) increasing oceanographic restriction changed the stable isotope composition, and 3) biases, e.g., taxonomy or diagenesis, influenced stable isotope compositions. Results indicate distinct δ¹⁸O (VPDB) in the WIS compared to other marine settings. δ¹⁸O (VPDB) values were stable through time, suggesting insignificant oceanographic restriction and a …


A Deep Learning Approach To Uncertainty Quantification, Mst Afroja Akter Jul 2019

Mathematics & Statistics ETDs

In this thesis we consider ordinary differential equations (ODEs) with random parameters. We focus on Monte Carlo (MC) sampling for computing the statistics of some quantities of interest (QoIs) given by the solution of the ODE problems. We use the 4th order accurate Runge-Kutta (RK4) method as the deterministic ODE solver. We then develop a hybrid MC sampling method that combines RK4 with neural network models to efficiently compute the statistics of QoIs within a desired accuracy. We present several numerical examples to verify the accuracy and efficiency of the proposed hybrid method compared to classical MC sampling. The hybrid …
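
The classical baseline mentioned above (plain Monte Carlo sampling with an RK4 solver) can be sketched as follows. This is not the hybrid neural-network method of the thesis. The test problem y' = -k y with k drawn uniformly from [0.5, 1.5], the quantity of interest y(1), and all function names are assumptions chosen only because the exact mean is known in closed form.

```python
import math
import random


def rk4_solve(f, y0, t0, t1, n_steps):
    """Integrate y' = f(t, y) from t0 to t1 with the classical RK4 scheme."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def mc_mean_qoi(n_samples, seed=0):
    """Monte Carlo estimate of E[y(1)] for y' = -k*y, y(0) = 1, k ~ U(0.5, 1.5)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n_samples):
        k = rng.uniform(0.5, 1.5)  # one draw of the random parameter
        total += rk4_solve(lambda t, y, k=k: -k * y, 1.0, 0.0, 1.0, 20)
    return total / n_samples


estimate = mc_mean_qoi(2000)
# Closed-form mean of exp(-k) for k uniform on [0.5, 1.5].
exact = math.exp(-0.5) - math.exp(-1.5)
```

Every sample costs one full ODE solve, which is the expense a cheaper surrogate model would aim to reduce.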