Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis

PDF

Theses/Dissertations

Institution
Keyword
Publication Year
Publication

Articles 1 - 30 of 76

Full-Text Articles in Statistical Models

A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang May 2024

A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modification of the test statistics or correction of the test distribution to achieve precise results in finite samples. In previous studies, focused on designing corrections to the univariate Ljung-Box, a method that specifically adjusts the test rejection region has been the most successful of attaining the best Type I error rates. We adopt the same approach for the more complex, …


Imputation Strategies For Different Categories Of Missing Data, Karthik Chalumuri Jan 2024

Imputation Strategies For Different Categories Of Missing Data, Karthik Chalumuri

Honors Theses and Capstones

Addressing missing data in research is crucial for ensuring the reliability and validity of study findings, yet it remains a significant challenge. This study investigates the impact of missing data on research outcomes and explores the underutilization of existing tools for managing missingness, potentially leading to gaps in critical information with tangible implications for decision-making processes (Dziura et al.).

Focusing on the different categories of missing data—Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR)—this research examines various imputation strategies tailored to each category. Specifically, we compare the efficacy of several model-based imputation methods, …


The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin Dec 2023

The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin

Theses and Dissertations

This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographical variables, which …


Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako

Doctoral Dissertations

This dissertation is in the field of Nonparametric Derivative Estimation using
Penalized Splines. It is conducted in two parts. In the first part, we study the L2
convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et. al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approach zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Parameter Estimation For Normally Distributed Grouped Data And Clustering Single-Cell Rna Sequencing Data Via The Expectation-Maximization Algorithm, Zahra Aghahosseinalishirazi Sep 2023

Parameter Estimation For Normally Distributed Grouped Data And Clustering Single-Cell Rna Sequencing Data Via The Expectation-Maximization Algorithm, Zahra Aghahosseinalishirazi

Electronic Thesis and Dissertation Repository

The Expectation-Maximization (EM) algorithm is an iterative algorithm for finding the maximum likelihood estimates in problems involving missing data or latent variables. The EM algorithm can be applied to problems consisting of evidently incomplete data or missingness situations, such as truncated distributions, censored or grouped observations, and also to problems in which the missingness of the data is not natural or evident, such as mixed-effects models, mixture models, log-linear models, and latent variables. In Chapter 2 of this thesis, we apply the EM algorithm to grouped data, a problem in which incomplete data are evident. Nowadays, data confidentiality is of …


On Image Response Regression With High-Dimensional Data, Noah Fuerth Jun 2023

On Image Response Regression With High-Dimensional Data, Noah Fuerth

Major Papers

A recent issue in statistical analysis is modelling data when the effect variable

changes at different locations. This can be difficult to accomplish when the dimensions

of the covariates are very high, and when the domain of the varying coefficient

functions of predictors are not necessarily regular. This research paper will investigate

a method to overcome these challenges by approximating the varying coefficient

functions using bivariate splines. We do this by splitting the domain of the varying

coefficient functions into a number of triangles, and build the bivariate spline functions

based on this triangulation. This major paper will outline detailed …


Comparing Hierarchical Data Structures And Hierarchical Data Analysis, Halley Jeanne Dante, Robert Rovetti May 2023

Comparing Hierarchical Data Structures And Hierarchical Data Analysis, Halley Jeanne Dante, Robert Rovetti

Honors Thesis

Real world data is inherently noisy and data analysis can be especially complex when noise is compounded in hierarchical and multilevel data structures. Since such data structures can be described using multiple approaches, the way data is collapsed and grouped within these structures can influence its resulting interpretation and analyses. To avoid discrepancies in data collapsing and grouping, multiple statistical approaches have been developed specifically to analyze multilevel data structures. Examples of multilevel statistical models are the two-factor ANOVA and the general linear model with repeated-measures (GLM-RR) which is typically used in the context of looking at change over time. …


A Probabilistic Exploration Of Food Supplementation And Assistance, Logan Mattingly May 2023

A Probabilistic Exploration Of Food Supplementation And Assistance, Logan Mattingly

Honors College Theses

Food insecurity is a stark threat that grips our country and affects households throughout our country. Dietary insufficiency manifests itself in ways that affect health and public safety. According to researchers, individuals who suffer from food insecurity have a higher risk of aggression, anxiety, suicide ideation and depression. These problems tend to occur unequally distributed among those households with lower income. In this work, an exploratory analysis within these data sets will be performed to examine the socio-economic, biographical, nutritional, and geographical principal components of food insecurity among survey participants and how the US Supplemental Nutrition Assistance Program (SNAP) effects …


Multidimensional Investigation Of Tennessee’S Urban Forest, Jillian L. Gorrell May 2023

Multidimensional Investigation Of Tennessee’S Urban Forest, Jillian L. Gorrell

Doctoral Dissertations

Preserving existing trees in urban areas and properly cultivating urban forest conservation and management opportunities is valuable to the ever-growing urban environment and necessary for creating optimal experiences and educational tools to meet the needs of increasing urban populations. This dissertation contains studies investigating several facets of the urban forest, including environmental effects of deforestation and urbanization, tree equity, and urban forest facility management and accessibility. Community education and outreach at arboreta about the importance of the tree canopy can help promote environmental stewardship. A digital questionnaire was electronically distributed to representatives of arboreta certified through the Tennessee Division of …


Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Analyzing Relationships With Machine Learning, Oscar Ko

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


The Impact Of Subjective Risk Analysis On Real Estate Prices In The Nisqually Region Following The 2001 Nisqually Earthquake, Ryan Espedal Jan 2023

The Impact Of Subjective Risk Analysis On Real Estate Prices In The Nisqually Region Following The 2001 Nisqually Earthquake, Ryan Espedal

All Master's Theses

Earthquakes are an environmental hazard that pose great risks to communities almost every day. With earthquakes, the main cause of concern is physical destruction of property, however, there are also psychological effects that are researched and discussed much less. In 2001, the Nisqually area of western Washington experienced a substantial earthquake that produced minimal physical damage but caused a significant decrease in real estate prices. Studying single-family homes from 1986-2012, this research utilizes hedonic property models to measure the change in consumer’s subjective risk calculations with reference to real estate purchases after the Nisqually earthquake, measure the relationship between earthquake …


Modeling Growth And Stress Factors For Converted Silvopasture Systems In The Missouri Ozarks, Bailee N. Suedmeyer Jan 2023

Modeling Growth And Stress Factors For Converted Silvopasture Systems In The Missouri Ozarks, Bailee N. Suedmeyer

MSU Graduate Theses

Silvopasture systems are becoming increasingly popular among sustainable agriculture ranchers, due to the increase in knowledge of benefits to the cattle and ability to grow cool season grasses beneath the canopy. This project focuses on the forest crop aspect of silvopasture systems from monitoring of the health of the trees over time to recommendations for thinning management to keep it functioning as viable silvopasture. The study site consists of five acres of upland hardwood forest area in Southern Missouri with 18 monumented fixed area plots. Arial and ground data was collected at each plot throughout the growing season, along with …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin Aug 2021

Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin

Electronic Theses and Dissertations

In this work, we seek to develop a variable screening and selection method for Bayesian mixture models with longitudinal data. To develop this method, we consider data from the Health and Retirement Survey (HRS) conducted by University of Michigan. Considering yearly out-of-pocket expenditures as the longitudinal response variable, we consider a Bayesian mixture model with $K$ components. The data consist of a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find a subset of these that impact cluster membership. An initial mixture model without any cluster-level predictors is fit to the data through an MCMC …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang

Doctoral Dissertations

In the process of statistical modeling, the descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. Especially, for multivariate categorical data analysis, it is desirable to use the descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is higher. To this end, we propose a model-free method …


Evaluating The Efficiency Of Markov Chain Monte Carlo Algorithms, Thuy Scanlon Jul 2021

Evaluating The Efficiency Of Markov Chain Monte Carlo Algorithms, Thuy Scanlon

Graduate Theses and Dissertations

Markov chain Monte Carlo (MCMC) is a simulation technique that produces a Markov chain designed to converge to a stationary distribution. In Bayesian statistics, MCMC is used to obtain samples from a posterior distribution for inference. To ensure the accuracy of estimates using MCMC samples, the convergence to the stationary distribution of an MCMC algorithm has to be checked. As computation time is a resource, optimizing the efficiency of an MCMC algorithm in terms of effective sample size (ESS) per time unit is an important goal for statisticians. In this paper, we use simulation studies to demonstrate how the Gibbs …


Machine Learning Based Restaurant Sales Forecasting, Austin B. Schmidt May 2021

Machine Learning Based Restaurant Sales Forecasting, Austin B. Schmidt

University of New Orleans Theses and Dissertations

To encourage proper employee scheduling for managing crew load, restaurants have a need for accurate sales forecasting. We predict partitions of sales days, so each day is broken up into three sales periods: 10:00 AM-1:59 PM, 2:00 PM-5:59 PM, and 6:00 PM-10:00 PM. This study focuses on the middle timeslot, where sales forecasts should extend for one week. We gather three years of sales between 2016-2019 from a local restaurant, to generate a new dataset for researching sales forecasting methods.

Outlined are methodologies used when going from raw data to a workable dataset. We test many machine learning models on …


A Transdisciplinary Analysis Of Just Transition Pathways To 100% Renewable Electricity, Adewale Aremu Adesanya Jan 2021

A Transdisciplinary Analysis Of Just Transition Pathways To 100% Renewable Electricity, Adewale Aremu Adesanya

Dissertations, Master's Theses and Master's Reports

The transition to using clean, affordable, and reliable electrical energy is critical for enhancing human opportunities and capabilities. In the United States, many states and localities are engaging in this transition despite the lack of ambitious federal policy support. This research builds on the theoretical framework of the multilevel perspective (MLP) of sociotechnical transitions as well as the concept of energy justice to investigate potential pathways to 100 percent renewable energy (RE) for electricity provision in the U.S. This research seeks to answer the question: what are the technical, policy, and perceptual pathways, barriers, and opportunities for just transition to …


Statistical Methods In Genetic Studies, Cheng Gao Jan 2021

Statistical Methods In Genetic Studies, Cheng Gao

Dissertations, Master's Theses and Master's Reports

This dissertation includes three Chapters. A brief description of each chapter is organized as follows.

In Chapter 1, we proposed a new method, called MF-TOWmuT, for genome-wide association studies with multiple genetic variants and multiple phenotypes using family samples. MF-TOWmuT uses kinship matrix to account for sample relatedness. It is worth mentioning that in simulations, we considered hidden polygenic effects and varied the proportion of variance contributed by it to generate phenotypes. Simulation studies show that MF-TOWmuT can preserve the type I error rates and is more powerful than several existing methods in different simulation scenarios, MFTOWmuT is also quite …


Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao Jan 2021

Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao

Theses and Dissertations

Drug addiction can lead to many health-related problems and social concerns. Functional connectivity obtained from functional magnetic resonance imaging (fMRI) data promotes a variety of fundamental understandings in such association. Due to its complex correlation structure and large dimensionality, the modeling and analysis of the functional connectivity from neuroimage are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to identify functional connectivity and …


Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye Jan 2021

Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye

Theses and Dissertations--Statistics

Clinical trials are often used to assess drug efficacy and safety. Participants are sometimes pre-stratified into different groups by diagnostic tools. However, these diagnostic tools are fallible. The traditional method ignores this problem and assumes the diagnostic devices are perfect. This assumption will lead to inefficient and biased estimators. In this era of personalized medicine and measurement-based care, the issues of bias and efficiency are of paramount importance. Despite the prominence, only few researches evaluated the treatment effect in the presence of misclassifications in some special cases and most others focus on assessing the accuracy of the diagnostic devices. In …


Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’S War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman Jan 2021

Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’S War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman

Pitzer Senior Theses

This thesis investigates the unique interactions between pregnancy, substance involvement, and race as they relate to the War on Drugs and the hyper-incarceration of women. Using ordinary least square regression analyses and data from the Bureau of Justice Statistics’ 2016 Survey of Prison Inmates, I examine if (and how) pregnancy status, drug use, race, and their interactions influence two length of incarceration outcomes: sentence length and amount of time spent in jail between arrest and imprisonment. The results collectively indicate that pregnancy decreases length of incarceration outcomes for those offenders who are not substance-involved but not evenhandedly -- benefitting white …


Satellite-Based Phenology Analysis In Evaluating The Response Of Puerto Rico And The United States Virgin Islands' Tropical Forests To The 2017 Hurricanes, Melissa Collin Jan 2021

Satellite-Based Phenology Analysis In Evaluating The Response Of Puerto Rico And The United States Virgin Islands' Tropical Forests To The 2017 Hurricanes, Melissa Collin

Cal Poly Humboldt theses and projects

The functionality of tropical forest ecosystems and their productivity is highly related to the timing of phenological events. Understanding forest responses to major climate events is crucial for predicting the potential impacts of climate change. This research utilized Landsat satellite data and ground-based Forest Inventory and Analysis (FIA) plot data to investigate the dynamics of Puerto Rico and the U.S. Virgin Islands’ (PRVI) tropical forests after two major hurricanes in 2017. Analyzing these two datasets allowed for validation of the remote sensing methodology with field data and for the investigation of whether this is an appropriate approach for estimating forest …


Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake Dec 2020

Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake

Theses & Dissertations

Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020

Gene Set Testing By Distance Correlation, Sho-Hsien Su

Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights on some important biological processes. Gene set test is an important tool to study the differential expression of a gene set between two groups, e.g., cancer vs normal. The differential expression of a gene set could be due to the difference in mean, variability, or both. However, most existing gene set tests only target the mean difference but overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020

A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega

MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks from youngest to oldest that are found in Big Four Springs region are the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Data-Driven Investment Decisions In P2p Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Data-Driven Investment Decisions In P2p Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson

Doctoral Dissertations

Population level mortality data is often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero Jan 2020

Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero

Theses and Dissertations

Compositional data (CD) is mostly analyzed as relative data, using ratios of components, and log-ratio transformations to be able to use known multivariable statistical methods. Therefore, CD where some components equal zero represent a problem. Furthermore, when the data is measured longitudinally, observations are spatially related and appear to come from a mixture population, the analysis becomes highly complex. For this matter, a two-part model was proposed to deal with structural zeros in longitudinal CD using a mixed-effects model. Furthermore, the model has been extended to the case where the non-zero components of the vector might a two component mixture …