Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 96

Full-Text Articles in Physical Sciences and Mathematics

Development Of An App For The Kalamazoo Nature Center, Ernest Au Dec 2023

Development Of An App For The Kalamazoo Nature Center, Ernest Au

Honors Theses

Kalamazoo Nature Center (KNC), recognized by its peers as one of the top nature centers in the country, is home to over 14 miles of hiking trails winding through woods, wetlands, and prairies. Numerous places and plots in KNC have an interesting and impressive history, besides being home to a variety of animals and hundreds of wildflowers and other plant life. To improve visitors' experience at KNC, we will design a software app through the senior capstone project in the Department of Computer Science at WMU. As the first step towards establishing a reference model …


Nonparametric Tests For Replicated Latin Squares, Joseph Yang Jun 2023

Nonparametric Tests For Replicated Latin Squares, Joseph Yang

Dissertations

Two classes of nonparametric procedures for a replicated Latin square design that test for both general and increasing alternatives are developed. The two classes of procedures are similar in the sense that both transform the data so that existing well-known tests for randomized complete block designs can be utilized. On the other hand, the two classes differ in the way that the data is transformed - one class essentially aggregates the data while the other class aligns the data. Within these contexts, the exact distributions and asymptotic distributions are discussed, when applicable. The exact distributions are easily computed using the …


Learning Finite Mixture Of Ising Graphical Models, Chong Gu Jun 2023

Learning Finite Mixture Of Ising Graphical Models, Chong Gu

Dissertations

The Ising model is valuable in examining complex interactions within a system, but its estimation is challenging. In this work, we propose penalized likelihood procedures to infer the conditional dependence structure when observed data come from heterogeneous sources in a high-dimensional setting. The proposed method can be efficiently implemented by taking advantage of coordinate ascent, minorization-maximization principles, and the EM algorithm. A BIC-type criterion is utilized for the selection of the tuning parameter in the penalized likelihood approaches. The effectiveness of the proposed method is supported by simulation studies and a real-world example.


Evaluating The Performance Of Estimators In Sem And Irt With Ordinal Variables, Bo Klauth Jun 2023

Evaluating The Performance Of Estimators In Sem And Irt With Ordinal Variables, Bo Klauth

Dissertations

In conducting confirmatory factor analysis with ordered response items, the literature suggests that when the number of responses is five and item skewness (IS) is approximately normal, researchers can employ maximum likelihood with robust standard errors (MLR). However, MLR can yield biased factor loadings (FL) and FL standard errors (FLSE) when the variables are ordinal. Other estimators are available. Unweighted least squares and weighted least squares with adjusted mean and variance (ULSMV and WLSMV) are known as estimators for CFA with ordinal variables (CFA-OV). Another estimator, marginal maximum likelihood (MML), is used in item response theory (IRT), specifically …


Functional Generalized Linear Mixed Models, Harmony Luce Jun 2023

Functional Generalized Linear Mixed Models, Harmony Luce

Dissertations

With the advancements in data collection technologies, researchers in various fields such as epidemiology, chemometrics, and environmental science face the challenge of obtaining useful information from more detailed, complex, and intricately structured data. Since existing methods are often not suitable for such data, new statistical methods are developed to accommodate the complicated data structures.

As a part of such efforts, this dissertation proposes Functional Generalized Linear Mixed Model (FGLMM), which extends classical generalized linear mixed models to include functional covariates. Functional Data Analysis (FDA) is a rapidly developing area of statistics for data which can be naturally viewed as smooth …


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …


Statistical Clustering Of Networks With Additional Information, Paul Atandoh Apr 2023

Statistical Clustering Of Networks With Additional Information, Paul Atandoh

Dissertations

As the online market grows rapidly, many companies and researchers are interested in analyzing product review datasets, which include ratings and text reviews. In the first project, we mainly focus on analyzing the text review data. In the current literature, it is common to use only text analysis tools to analyze review datasets. In our work, however, we propose a method that utilizes both a text analysis method, such as topic modeling, and a statistical network model to build a network among individuals and find interesting communities. We introduce a promising framework that incorporates a topic modeling technique to define the …


Active Vs Passive Investing, Garret Buchheit Dec 2022

Active Vs Passive Investing, Garret Buchheit

Honors Theses

With the increased popularity of passive investing, the long-term investment success of active management is being questioned more frequently. For this reason, this research seeks to find whether actively managed funds produce sufficient returns that cover the fees and management costs associated with them. A comparative analysis was made with 5401 actively managed U.S. mutual funds and several common market indices over three, five, and ten-year time spans ranging from 2012 to 2021. Additionally, an analysis was made comparing active and passive management in the recessionary period of 2007 to 2009. Finally, analysis was conducted on annual holdings turnover rates …


Mixture Of Functional Graphical Models, Qihai Liu Jun 2022

Mixture Of Functional Graphical Models, Qihai Liu

Dissertations

With the development of data collection technologies that use powerful monitoring devices and computational tools, many scientific fields are now obtaining more detailed and more intricately structured data, e.g., functional data. This leads to increasing challenges in extracting information from such large, complex data. Making use of these data to gain insight into complex phenomena requires characterizing the relationships among a large number of functional variables. Functional data analysis (FDA) is a rapidly developing area of statistics for data which can be naturally viewed as a smooth curve or function. It is a method that changes the frame of data …


Estimation Of Odds Ratio In 2 X 2 Contingency Tables With Small Cell Counts, Guohao Zhu Oct 2021

Estimation Of Odds Ratio In 2 X 2 Contingency Tables With Small Cell Counts, Guohao Zhu

Dissertations

This study focuses on properties of estimators of the odds ratio, or its logarithm, in 2x2 tables with small counts. The odds ratio represents the odds that an outcome of interest will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. Both parameters are often used to quantify the strength of association between two binary variables and are common measurements reported in case-control, cohort, and cross-sectional studies.

Because of their wide applicability, both parameters, the odds ratio and its logarithm, have been intensively studied in the literature. However, most …
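The odds ratio described in this abstract is computed from a 2x2 table as (a·d)/(b·c). The sketch below uses illustrative cell counts (not data from the dissertation) and the common Haldane-Anscombe continuity correction, one standard small-cell adjustment, which adds 0.5 to every cell:

```python
# Hypothetical 2x2 table; cell counts are illustrative only.
import math

a, b = 12, 3   # exposed:   events, non-events
c, d = 5, 10   # unexposed: events, non-events

def odds_ratio(a, b, c, d, correction=0.0):
    """Sample odds ratio, optionally with a continuity correction
    (e.g. 0.5 per cell, the Haldane-Anscombe adjustment)."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

or_raw = odds_ratio(a, b, c, d)        # (12*10)/(3*5) = 8.0
or_adj = odds_ratio(a, b, c, d, 0.5)   # corrected estimate
log_or = math.log(or_raw)              # log odds ratio

print(or_raw, round(or_adj, 3), round(log_or, 3))
```

With any zero cell the raw estimator is undefined or degenerate, which is exactly the small-count regime the dissertation studies; the correction keeps the estimate finite.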


On Simes’s Second Conjecture: An Extended Single-Step Simes Test Procedure For Multiple Testing, Matthew G. Hudson Dec 2020

On Simes’s Second Conjecture: An Extended Single-Step Simes Test Procedure For Multiple Testing, Matthew G. Hudson

Dissertations

One of the major concerns with multiple tests of significance is controlling the familywise error rate. Various methods have been developed to ensure that the false positive rate is maintained at some prespecified level, one of the most well known being the Bonferroni procedure. Simes presented an improved Bonferroni procedure for testing the global hypothesis that is more powerful and less conservative, especially with positively correlated tests. While Simes’s procedure is more powerful, it does not allow for making inferences on the individual hypotheses. However, the Simes procedure has since become the foundation of many p-value based multiple testing …
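The Simes global test mentioned above rejects the joint null at level alpha when any ordered p-value p(i) is at most i*alpha/n. A minimal sketch with made-up p-values (not from the dissertation):

```python
# Simes global test: reject H0 (all individual nulls true) at level
# alpha if p_(i) <= i * alpha / n for at least one ordered p-value.
def simes_global_test(pvalues, alpha=0.05):
    n = len(pvalues)
    p_sorted = sorted(pvalues)
    return any(p <= (i + 1) * alpha / n for i, p in enumerate(p_sorted))

# Bonferroni needs some p <= alpha/n = 0.0125 here and would not reject;
# Simes rejects because p_(2) = 0.02 <= 2 * 0.05 / 4 = 0.025.
print(simes_global_test([0.015, 0.02, 0.3, 0.8]))  # True
```

This illustrates the power gain over Bonferroni noted in the abstract; as the abstract also notes, this global test by itself does not identify which individual hypotheses to reject.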


Statistical Properties And Applications Of Press Statistic, Ida Marie Alcantara Jun 2020

Statistical Properties And Applications Of Press Statistic, Ida Marie Alcantara

Dissertations

The most popularly used statistic, R2, has a fundamental weakness in model building: it favors adding more predictors to the model because R2 can only increase. In effect, the additional predictors start fitting the noise in the data. Other criteria for selecting a regression model, such as adjusted R2, AIC, SBC, and Mallows' Cp, do not guarantee that the selected model will also make better predictions of future values. To avoid this, data scientists withhold a percentage of the data for validation purposes. The PRESS statistic does something similar by withholding each observation in calculating its own …
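For ordinary least squares, PRESS need not be computed by refitting n times: the leave-one-out residual equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage. A minimal sketch on simulated data (the data, not the identity, are made up here):

```python
# PRESS for OLS via the leverage shortcut:
# PRESS = sum_i (e_i / (1 - h_ii))^2
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])  # intercept + 1 predictor
y = 2 + 3 * X[:, 1] + rng.normal(size=30)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta                              # ordinary residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix
press = np.sum((resid / (1 - np.diag(H))) ** 2)   # leave-one-out SSE

print(round(press, 3))
```

A lower PRESS on a candidate model indicates better out-of-sample prediction, which is the model-selection use the abstract describes.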


Statistical Models For Correlated Data, Xiaomeng Niu Apr 2018

Statistical Models For Correlated Data, Xiaomeng Niu

Dissertations

Correlated data arise frequently in many studies where multiple response variables or repeatedly measured responses within subjects are correlated. My dissertation topic lies broadly in developing various statistical methodologies for correlated types of data such as longitudinal data, clustered data, and multivariate data.

Multiple response variables might be relevant within subjects. A univariate procedure fitting each response separately does not take into account the correlation among responses. To improve estimation efficiency for the regression parameter, this study proposes two estimation procedures by accommodating correlations among the response variables. The proposed procedures do not require knowledge of the true correlation structure …


Statistical Properties Of Population Stability Index, Bilal Yurdakul Apr 2018

Statistical Properties Of Population Stability Index, Bilal Yurdakul

Dissertations

Population stability is an important concept in model management. It is crucial to monitor whether the current population has changed from the population used during development of a model. For example, has the distribution of credit scores changed, and is the existing credit score model still valid? Population change may occur for many reasons: change in the economic environment, strategic change in the business, policy changes within the company, or changes in the regulatory environment.

The population stability index (PSI) is a statistic that measures how much a variable has shifted over time, and is used to monitor applicability of a statistical …
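The PSI is conventionally computed over binned proportions as the sum of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the expected (development) and actual (current) shares in bin i. A minimal sketch with illustrative proportions; the thresholds in the comment are common industry rules of thumb, not findings of the dissertation:

```python
# Population stability index over binned score proportions.
import math

expected = [0.10, 0.20, 0.40, 0.20, 0.10]   # development-sample shares
actual   = [0.08, 0.18, 0.38, 0.24, 0.12]   # current-sample shares

psi = sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Common rules of thumb: PSI < 0.10 stable, 0.10-0.25 some shift,
# > 0.25 significant shift warranting model review.
print(round(psi, 4))
```

Each term is nonnegative, since (a - e) and ln(a / e) always share a sign, so PSI is zero only when the two distributions match bin for bin.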


Denoising Large Neuroimage Mri Data Using Spatial Random Effect Models, Leonard Chukuma Johnson Apr 2018

Denoising Large Neuroimage Mri Data Using Spatial Random Effect Models, Leonard Chukuma Johnson

Dissertations

Spatial smoothing in Magnetic Resonance Imaging (MRI) involves applying a filter to remove high-frequency information and consequently improves the signal-to-noise ratio, which can greatly aid neurosurgeons in the pre-surgical planning stages of tumor resection. This immensely reduces the time spent on electrical stimulation mapping (ESM) prior to surgery. MRI's three-dimensional data provides voxel intensities with complex spatial relationships. The de facto standard spatial smoothing method, Gaussian kernel smoothing, is unsatisfactory since a uniform smoothing is done for the whole brain. Secondly, the kernel smoothing technique assumes normality for the voxel intensity, but there is ample evidence in current research that indicates …


Statistical And Clinical Equivalence Of Measurements, Puntipa Wanitjirattikal Dec 2017

Statistical And Clinical Equivalence Of Measurements, Puntipa Wanitjirattikal

Dissertations

This study proposes a test for statistical equivalence of two measurements. Typically, a new measurement process Υ is compared to an existing or standard measurement process Χ. We assume that Χ and Υ are measurements on the same scale. The paired t-test may be used to check for a significant difference between (Χ, Υ) pairs. However, the paired t-test is intended to detect shift-type relationships of the form Υ = Χ + δ1 and may have low power for scale-type relationships of the form Υ = δ2Χ.
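The contrast between shift-type and scale-type alternatives can be illustrated with a hand-rolled paired t statistic; the data below are made up for illustration:

```python
# Illustrative sketch: the paired t statistic is large for a near-constant
# shift Y = X + delta, but comparatively small for a scale relation
# Y = c*X, whose paired differences grow with X (inflating their variance).
import math

def paired_t(x, y):
    """Paired t statistic for H0: mean difference = 0."""
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)
    return mean / math.sqrt(var / n)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.05, -0.03, 0.02, -0.04, 0.01]
y_shift = [xi + 1.0 + e for xi, e in zip(x, noise)]  # shift-type relation
y_scale = [1.5 * xi for xi in x]                     # scale-type relation

print(round(paired_t(x, y_shift), 1), round(paired_t(x, y_scale), 2))
```

The shift-type pairs yield a far larger t statistic than the scale-type pairs, mirroring the power gap the abstract describes.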

We propose a test that has reasonable power to …


Diagnostics For Choosing Between Stratified Logrank And Stratified Wilcoxon, Jhoanne Marsh C. Gatpatan Dec 2017

Diagnostics For Choosing Between Stratified Logrank And Stratified Wilcoxon, Jhoanne Marsh C. Gatpatan

Dissertations

Martinez and Naranjo (2010) proposed a pretest for choosing between the Logrank and Wilcoxon tests in a two-sample case. However, in the presence of covariates, comparing two populations without adjusting for covariates would yield misleading results. In this study, we propose several pretests that will help the analyst decide between the stratified Logrank and stratified Wilcoxon tests in comparing two survival curves after covariates have been taken into account. The power performance of each adaptive test was evaluated through simulations under PH and non-PH cases.


Development Of Traditional And Rank-Based Algorithms For Linear Models With Autoregressive Errors And Multivariate Logistic Regression With Spatial Random Effects, Shaofeng Zhang Jun 2017

Development Of Traditional And Rank-Based Algorithms For Linear Models With Autoregressive Errors And Multivariate Logistic Regression With Spatial Random Effects, Shaofeng Zhang

Dissertations

Linear models are the most commonly used statistical methods in many disciplines. One of the model assumptions is that the error terms (residuals) are independent and identically distributed. This assumption is often violated and autoregressive error terms are often encountered by researchers. The most popular technique to deal with linear models with autoregressive errors is perhaps the autoregressive integrated moving average (ARIMA). Another common approach is generalized least squares, such as Cochrane-Orcutt estimation and Prais-Winsten estimation. However, these usually have poor behaviors when fitting small samples. To address this problem, a double bootstrap method was proposed by McKnight et al. …


Spatial Analysis Of Time Between Two Consecutive Dental And Two Consecutive Well-Child Visits For Foster Care Youth, Chenyang Shi Jun 2017

Spatial Analysis Of Time Between Two Consecutive Dental And Two Consecutive Well-Child Visits For Foster Care Youth, Chenyang Shi

Dissertations

Foster care youth are a medically vulnerable population. Poor dental health and irregular well-child visits may cause serious health-related issues, such as mental disorders, nutritional imbalance, tooth damage, etc. Michigan requires all youth in foster care to receive annual dental and well-child visits. Usually, the study of foster care well-child and dental visits includes two parts: time between two consecutive visits (gap time) and number of visits. For this study, a longitudinal-spatial model that has the flexibility to analyze the well-child/dental gap times and number of visits was developed. The longitudinal data (2009-2012) on Michigan foster care youth from 10 …


The Technological Revolution And Data Science, Leslie Walcott Apr 2017

The Technological Revolution And Data Science, Leslie Walcott

Honors Theses

What was once only depicted in science fiction is now a reality: computers are taking jobs from humans. As technology improves, automation is transforming the workplace. Some say a “fourth industrial revolution” is inevitable within the next ten years. In the industrial revolution, the jobs lost were those of unskilled laborers, such as coal miners, textile manufacturers, or cotton workers. There was no argument over whether a machine could do the jobs more efficiently--it was fact. The term technological unemployment means the loss of jobs caused by technological change. The headline, “Factory workers replaced by automation,” is not particularly startling …


Subgroup Analysis And Growth Curve Models For Longitudinal Data, Nichole Andrews Apr 2017

Subgroup Analysis And Growth Curve Models For Longitudinal Data, Nichole Andrews

Dissertations

In clinical trials and biomedical studies, treatments are compared to determine which one is effective against illness. Growth curve analysis can be beneficial in longitudinal biomedical studies, as we can evaluate the treatment effect on the response over time. The generalized growth curve model using polynomial regression is proposed for longitudinal data. An optimal degree for the polynomial is obtained using the BIQIF, an adaptation of the Bayesian information criterion. Quadratic inference functions are used to estimate the parameters of the model, which takes into account the fact that repeated measurements from the same subject are more likely to be …


Some Nonparametric Ordered Restricted Inference Problems In The Context Of A Statistical Education Study, Bradford M. Dykes Aug 2016

Some Nonparametric Ordered Restricted Inference Problems In The Context Of A Statistical Education Study, Bradford M. Dykes

Dissertations

Over the past 10 years, the Department of Statistics at Western Michigan University has developed a question generating system that can be used for creating multiple forms of exams, quizzes and homework for online and face-to-face use. This system can also be used to provide students with a form of instantaneous feedback. With the goal of analyzing how different levels of feedback in an online learning environment impacts students' performance on assignments, this study presents data collected on two semesters of students enrolled in three different meeting types (strictly online, typical face-to-face, and honors face-to-face) of an introductory Statistics course. …


Statistical Methodology For Data With Multiple Limits Of Detection, Robert M. Flikkema Jun 2016

Statistical Methodology For Data With Multiple Limits Of Detection, Robert M. Flikkema

Dissertations

Limitations of instruments used to collect continuous data sometimes lead to obtaining observations lower than a limit of detection. These observations are known as nondetects. They could be zeroes, or positive numbers, but they are too small to be recorded by a measuring device. Nondetects frequently occur in environmental data. Trace amounts of chemicals can exist in soil or groundwater and are undetectable by a machine reading. These observations pose a problem to researchers since the true values are unknown.

Simulations in the literature have led to inconsistent conclusions regarding what estimation technique to use with nondetect data when estimating …


Bivariate Negative Binomial Hurdle With Random Spatial Effects, Robert McNutt Apr 2016

Bivariate Negative Binomial Hurdle With Random Spatial Effects, Robert McNutt

Dissertations

Count data with excess zeros widely occur in ecology, epidemiology, marketing, and many other disciplines. Mixture distributions consisting of a point mass at zero and a separate discrete distribution are often employed in regression models to account for excessive zero observations in the data. While Poisson models are very popular for count data, Negative Binomial models provide greater flexibility due to their ability to account for overdispersion.

This research focuses on developing a method for analyzing bivariate count data with excess zeros collected over a lattice. A bivariate Zero-Inflated Negative Binomial Hurdle (ZINBH) regression model with spatial random effects is …


Empirical Evaluation Of Different Features Of Design In Confirmatory Factor Analysis, Deyab Almaleki Apr 2016

Empirical Evaluation Of Different Features Of Design In Confirmatory Factor Analysis, Deyab Almaleki

Dissertations

Factor analysis (FA) is the study of variance within a group. Within-subject variance (WSV) is affected by multiple features of a study context, such as the study's experimental design (ED) and sampling design (SD); thus anything that influences or changes variance may affect the conclusions related to FA.

The aim of this study was to provide empirical evaluation of the influence of different aspects of ED and SD on WSV in the context of FA in terms of model precision and model estimate stability. Four Monte Carlo population correlation matrices were hypothesized based on different communality magnitudes (high, moderate, low, …


Rank Based Procedures For Ordered Alternative Models, Yuanyuan Shao Dec 2015

Rank Based Procedures For Ordered Alternative Models, Yuanyuan Shao

Dissertations

The ordered alternatives in a one-way layout with k ordered treatment levels are appropriate for many applications, especially in psychology and medicine. There is extensive literature in this area, and many parametric and nonparametric approaches have been introduced. The Abelson-Tukey (AT) test is a frequently used parametric method. Its coefficients provide an ideal way of combining means for the purpose of detecting a monotonic relationship between the independent and dependent variables. The AT method, though, is not robust. Furthermore, our initial empirical studies show that it is not more powerful than the Jonckheere-Terpstra (JT) and the Hettmansperger-Norton (HN) nonparametric tests …


Failing To Replicate: Hypothesis Testing As A Crucial Key To Make Direct Replications More Credible And Predictable, Pedro Fernando Mateu Bullón May 2015

Failing To Replicate: Hypothesis Testing As A Crucial Key To Make Direct Replications More Credible And Predictable, Pedro Fernando Mateu Bullón

Dissertations

A theory cannot be fully validated unless the original results have been replicated, resulting in conclusion consistency. Replications are the strongest source to verify research findings and knowledge claims. Sciences such as medicine, chemistry, physics, genetics, and biology are considered successful because their knowledge claims are buttressed by a large set of replications of original studies. Unfortunately, in the social sciences many attempts to replicate fail, and thus there is a continuing need for replication studies to confirm facts, expand knowledge to gain new understanding, and verify hypotheses. Two plausible explanations for the failure to replicate in the social sciences could …


Three Essays On Panel Data Estimation, Alexander Houser May 2015

Three Essays On Panel Data Estimation, Alexander Houser

Dissertations

This work discusses various aspects of panel data estimation. In chapter one, an algorithm for semiparametric random effects estimation is proposed. The performance of bootstrap-based confidence intervals for the proposed estimators is examined and found reasonable. The algorithm is also applied to a set of U.S. state-level medical expenditure data to estimate the medical Engel curve. In the second chapter, the predictive performance of various parametric and semiparametric panel data estimators is compared on the same dataset of U.S. state-level medical expenditures, as well as out-of-sample forecast performance and bootstrap bias-corrected mean square errors of the …


Inference On Differences In K Means For Data With Excess Zeros And Detection Limits, Haolai Jiang Dec 2014

Inference On Differences In K Means For Data With Excess Zeros And Detection Limits, Haolai Jiang

Dissertations

Many data have excess zeros or unobservable values falling below a detection limit. For example, data on hospitalization costs incurred by members of a health insurance plan will have zeros for the percentage who did not get sick. Benzene exposure measurements on petroleum refinery workers have some exposures falling below the limit of detection. Traditional methods of inference like one-way ANOVA are not appropriate for analyzing such data since the point mass at zero violates typical distribution assumptions.

For testing for equality of means of k distributions, we propose a likelihood ratio test that accounts for excess zeros or …


Comparison Of Hazard, Odds And Risk Ratio In The Two-Sample Survival Problem, Benedict P. Dormitorio Aug 2014

Comparison Of Hazard, Odds And Risk Ratio In The Two-Sample Survival Problem, Benedict P. Dormitorio

Dissertations

Cox proportional hazards is the standard method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which only requires relative frequencies of events, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. When event frequencies are used instead of times-to-event, does it always result in a loss of power?

We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio (HR), (2) odds ratio (OR), and (3) risk ratio (RR). We use a variety of survival …
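With illustrative event frequencies (not data from the dissertation), the odds ratio and risk ratio compared above can be computed directly from a two-arm table; the two diverge noticeably when events are common:

```python
# Hypothetical event counts in two study arms; values are illustrative.
events_t, n_t = 30, 100     # treatment arm: events, sample size
events_c, n_c = 50, 100     # control arm:   events, sample size

# Risk ratio: ratio of event probabilities.
rr = (events_t / n_t) / (events_c / n_c)

# Odds ratio: ratio of event odds.
odds_t = events_t / (n_t - events_t)
odds_c = events_c / (n_c - events_c)
orr = odds_t / odds_c

print(round(rr, 3), round(orr, 3))
```

Here the RR is 0.6 while the OR is about 0.43; with rare events the two would nearly coincide, which is one reason power comparisons among HR-, OR-, and RR-based tests depend on the event frequency.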