Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Multivariate Analysis

Articles 31 - 60 of 166

Full-Text Articles in Statistical Models

Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao Jan 2021

Theses and Dissertations

Drug addiction can lead to many health-related problems and social concerns. Functional connectivity obtained from functional magnetic resonance imaging (fMRI) data promotes a variety of fundamental understandings in such association. Due to its complex correlation structure and large dimensionality, the modeling and analysis of the functional connectivity from neuroimage are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to identify functional connectivity and …


Estimating And Testing Treatment Effects With Misclassified Multivariate Data, Zi Ye Jan 2021

Theses and Dissertations--Statistics

Clinical trials are often used to assess drug efficacy and safety. Participants are sometimes pre-stratified into different groups by diagnostic tools. However, these diagnostic tools are fallible. The traditional method ignores this problem and assumes the diagnostic devices are perfect. This assumption leads to inefficient and biased estimators. In this era of personalized medicine and measurement-based care, the issues of bias and efficiency are of paramount importance. Despite this prominence, only a few studies have evaluated the treatment effect in the presence of misclassification, and only in some special cases; most others focus on assessing the accuracy of the diagnostic devices. In …


Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’s War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman Jan 2021

Pitzer Senior Theses

This thesis investigates the unique interactions between pregnancy, substance involvement, and race as they relate to the War on Drugs and the hyper-incarceration of women. Using ordinary least squares regression analyses and data from the Bureau of Justice Statistics’ 2016 Survey of Prison Inmates, I examine if (and how) pregnancy status, drug use, race, and their interactions influence two length of incarceration outcomes: sentence length and amount of time spent in jail between arrest and imprisonment. The results collectively indicate that pregnancy decreases length of incarceration outcomes for those offenders who are not substance-involved but not evenhandedly -- benefitting white …


Satellite-Based Phenology Analysis In Evaluating The Response Of Puerto Rico And The United States Virgin Islands' Tropical Forests To The 2017 Hurricanes, Melissa Collin Jan 2021

Cal Poly Humboldt theses and projects

The functionality of tropical forest ecosystems and their productivity is highly related to the timing of phenological events. Understanding forest responses to major climate events is crucial for predicting the potential impacts of climate change. This research utilized Landsat satellite data and ground-based Forest Inventory and Analysis (FIA) plot data to investigate the dynamics of Puerto Rico and the U.S. Virgin Islands’ (PRVI) tropical forests after two major hurricanes in 2017. Analyzing these two datasets allowed for validation of the remote sensing methodology with field data and for the investigation of whether this is an appropriate approach for estimating forest …


Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To COVID-19 Data, Nirosha Rathnayake Dec 2020

Theses & Dissertations

Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020

Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights on some important biological processes. Gene set test is an important tool to study the differential expression of a gene set between two groups, e.g., cancer vs normal. The differential expression of a gene set could be due to the difference in mean, variability, or both. However, most existing gene set tests only target the mean difference but overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020

MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks found in the Big Four Springs region, from youngest to oldest, are the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach, Tanvi Arora, Rajat Chandna, Stacy Conant, Bivin Sadler, Robert Slater Apr 2020

SMU Data Science Review

In this paper, historical data from a wholesale alcoholic beverage distributor was used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasting can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is a challenging task due to the many unknowns that can impact sales, such as the weather and the state of the economy. While many studies focus effort on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor perspective. An ensemble approach was applied …


Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …


Interdependence Across Foreign Exchange Rate Markets- A Mixed Copula Approach, Richard Adjei-Boateng Apr 2020

Masters Theses & Specialist Projects

The purpose of this thesis is to study the dependence structure of exchange rate pairs using a mixture of copula as opposed to a single copula approach. Mixed copula models have the ability to generate dependence structures that do not belong to existing copula families. The flexibility in choosing component copulas in this mixture model aids the construction of a system that is simultaneously parsimonious and flexible enough to generate most dependence patterns in exchange rate data. Furthermore, the method of mixture copulas facilitates the separation of both the structure and degree of dependence, concepts that are respectively embodied in …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Doctoral Dissertations

Population level mortality data is often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Development Of Gaussian Learning Algorithms For Early Detection Of Alzheimer's Disease, Chen Fang Mar 2020

FIU Electronic Theses and Dissertations

Alzheimer’s disease (AD) is the most common form of dementia, affecting 10% of the population over the age of 65, and the growing costs of managing AD are estimated to be $259 billion, according to data reported in 2017 by the Alzheimer's Association. Moreover, with cognitive decline, the daily lives of the affected persons and their families are severely impacted. Taking advantage of the diagnosis of AD and its prodromal stage of mild cognitive impairment (MCI), an early treatment may help patients preserve the quality of life and slow the progression of the disease, even though the underlying disease cannot …


Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, Laroux Wallace Jan 2020

SMU Data Science Review

In this paper, we present a quantitative approach to model the manufacturer’s suggested retail price (MSRP) for children’s dollhouses and establish relationships among key features that contribute most to establishing MSRP. Determination of the MSRP is a critical step in how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time-consuming, requiring the focus of specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP during the …


Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero Jan 2020

Theses and Dissertations

Compositional data (CD) is mostly analyzed as relative data, using ratios of components and log-ratio transformations so that known multivariable statistical methods can be applied. Therefore, CD where some components equal zero represents a problem. Furthermore, when the data is measured longitudinally, observations are spatially related and appear to come from a mixture population, the analysis becomes highly complex. For this matter, a two-part model was proposed to deal with structural zeros in longitudinal CD using a mixed-effects model. Furthermore, the model has been extended to the case where the non-zero components of the vector might be a two-component mixture …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Classification Of Coronary Artery Disease In Non-Diabetic Patients Using Artificial Neural Networks, Demond Handley Oct 2019

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Analyzing Two-Year College Student Success Using Structural Equation Modeling, Jessica Taylor May 2019

Graduate Theses, Dissertations, and Capstones

The goal of this study is to more fully understand the scope of community college student success using the principles of mindset, engagement, and college readiness. Using structural equation modeling ensures this study is able to measure the combined effects these concepts have on student success, group differences, and the combined model of student success. Findings suggest student success can be significantly impacted by self-belief and mindset behaviors that can outweigh the initial effect of academically under-prepared students. Groups included in this study are non-traditional students, minority populations, first generation students, and Pell eligible students.


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson Feb 2019

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits by their nature are very difficult to predict. The current project draws upon electronic health records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …


Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le Jan 2019

Honors Theses

Harrington Sound, Bermuda, is a nearly enclosed lagoon acting as a subtropical/tropical, carbonate-rich basin in which carbonate sediments, reef patches, and carbonate-producing organisms accumulate. Here, one of the most important calcareous groups is the Foraminifera. Analyses of common benthic orders, including miliolids (Quinqueloculina and Triloculina spp.) and rotaliids (Homotrema rubrum, Elphidium spp., and Ammonia beccarii), are essential in understanding past and present environmental conditions affecting the island's coastal environment. These taxa have been studied previously; however, factors explaining their individual patterns of abundance in the Sound are not well detailed. The goal of this study is …


Transforms In Sufficient Dimension Reduction And Their Applications In High Dimensional Data, Jiaying Weng Jan 2019

Theses and Dissertations--Statistics

The big data era poses great challenges as well as opportunities for researchers to develop efficient statistical approaches to analyze massive data. Sufficient dimension reduction (SDR) is one such important tool in modern data analysis and has received extensive attention in both academia and industry.

In this dissertation, we introduce inverse regression estimators using Fourier transforms, which are superior to existing SDR methods in two respects: (1) they avoid slicing the response variable, and (2) they can be readily extended to the high-dimensional data problem. For the ultra-high dimensional problem, we investigate both eigenvalue decomposition and minimum …


Composite Nonparametric Tests In High Dimension, Alejandro G. Villasante Tezanos Jan 2019

Theses and Dissertations--Statistics

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means both the sample size (n) and dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low powers. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined …


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Essays On Mixture Models, Trevor R. Camper Jan 2019

Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the behavior that bivariate mixture models have on two statistical constructs: Cronbach's alpha and Spearman's rho. Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric issue known as insufficient …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …