Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Statistics and Probability


Correlation


Articles 1 - 30 of 31

Full-Text Articles in Physical Sciences and Mathematics

Significant Predictors Of Suicide Rates In The United States: A Multiple Regression Analysis, Alexa L. Darak, Gary Popoli May 2024


Undergraduate Research Journal for the Human Sciences

Inspired by Stack's (2021) research, this study investigated the influence of 19 variables on suicide rates across all 50 U.S. states. The variables included political party, gun ownership, registered guns, religion, alcohol consumption, state safety, depression, marriage, divorce, domestic violence, race, mean elevation, and region. Regression analyses revealed that gun ownership significantly impacts suicide rates, with stricter firearm laws correlating with lower suicide rates. Other crucial contributors to suicide risk were alcohol consumption, domestic violence, marital status, divorce, mean elevation, and political party affiliation. The five most statistically significant predictor variables were gun ownership, divorce rates, percentage of White individuals, …


Multicollinearity Applied Stepwise Stochastic Imputation: A Large Dataset Imputation Through Correlation‑Based Regression, Benjamin D. Leiby, Darryl K. Ahner Feb 2023


Faculty Publications

This paper presents a stochastic imputation approach for large datasets using a correlation selection methodology when preferred commercial packages struggle to iterate due to numerical problems. A variable range-based guard rail modification is proposed that benefits the convergence rate of data elements while simultaneously providing increased confidence in the plausibility of the imputations. A large country conflict dataset motivates the search to impute missing values well over a common threshold of 20% missingness. The Multicollinearity Applied Stepwise Stochastic imputation methodology (MASS-impute) capitalizes on correlation between variables within the dataset and uses model residuals to estimate unknown values. Examination of the …


Assessing Spurious Correlations In Big Search Data, Jesse T. Richman, Ryan J. Roberts Jan 2023


Political Science & Geography Faculty Publications

Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among probability distributions examined for random variables based upon gamma (1, 1) and Gaussian random …
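The risk described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes one short Gaussian target series and a large pool of independent Gaussian candidates, and reports the largest absolute Pearson correlation found by chance. The function names, sample sizes, and seed are illustrative assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_r(n_obs=20, n_candidates=1000, seed=0):
    """Largest |r| between one target series and many independent candidates.

    Every series is drawn independently (an assumption of this sketch),
    so any large correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(pearson_r(target, candidate)))
    return best


if __name__ == "__main__":
    print(f"largest |r| among independent series: {max_spurious_r():.2f}")
```

Searching more candidates, or using shorter series, tends to push the largest chance correlation higher, which is the multiple-comparisons effect the abstract warns about.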


Spline Modeling And Localized Mutual Information Monitoring Of Pairwise Associations In Animal Movement, Andrew Benjamin Whetten May 2022


Theses and Dissertations

… to a new era of remote sensing and geospatial analysis. In environmental science and conservation ecology, biotelemetric data recorded is often high-dimensional, spatially and/or temporally, and functional in nature, meaning that there is an underlying continuity to the biological process of interest. GPS-tracking of animal movement is commonly characterized by irregular time-recording of animal position, and the movement relationships between animals are prone to sudden change. In this dissertation, I propose a spline modeling approach for exploring interactions and time-dependent correlation between the movement of apex predators exhibiting territorial and territory-sharing behavior. A measure of localized mutual information (LMI) is …


Generalized Ratio-Cum-Product Estimator For Finite Population Mean Under Two-Phase Sampling Scheme, Gajendra Kumar Vishwakarma, Sayed Mohammed Zeeshan Jun 2021


Journal of Modern Applied Statistical Methods

A method to lower the MSE of a proposed estimator relative to the MSE of the linear regression estimator under a two-phase sampling scheme is developed. Estimators are developed to estimate the mean of the variate under study with the help of an auxiliary variate (which is unknown but can be accessed conveniently and economically). The mean square error equations are obtained for the proposed estimators. In addition, optimal sample sizes are obtained under the given cost function. A comparison study establishes the conditions under which the developed estimators are more effective than other estimators. The …


An Assessment Of Terrestrial Decapoda Diversity Across Three Ecological Zones In Mida Creek, Kenya, Reese Yount Apr 2021


Independent Study Project (ISP) Collection

Mangroves make up one of the most effective natural remedies for combating climate change today. They represent great commercial interest worldwide and yet are being degraded at an unsustainable rate. If successful mangrove conservation plans are to be implemented for our posterity, mangrove ecosystems need to be better understood at the community level. Mangrove crabs make up the most diverse and populous mangrove inhabitants. They are classified as ecosystem engineers and their potential for being used as bioindicators makes them integral to assessing mangrove health. Yet, their diversity and distribution patterns are not well understood. The aim of this study …


Taking Multiple Regression Analysis To Task: A Review Of Mindware: Tools For Smart Thinking, By Richard Nisbett (2015), Jason Makansi Jul 2019


Numeracy

Richard Nisbett. 2015. Mindware: Tools for Smart Thinking. (New York, NY: Farrar, Straus and Giroux). 336 pp. ISBN: 9780374536244

Nisbett, a psychologist, may not achieve his stated goal of teaching readers to “effortlessly” extend their common sense when it comes to quantitative analysis applied to everyday issues, but his critique of multiple regression analysis (MRA) in the middle chapters of Mindware is worth attention from, and contemplation by, the QL/QR and Numeracy community. While in at least one other source, Nisbett’s critique has been called a “crusade” against MRA, what he really advocates is that it not be used as …


Lack Of Vaccination Risks, Abigail Sebunia Jan 2019


Williams Honors College, Honors Research Projects

This paper is a study regarding how vaccination rates are related to the number of measles cases that occur in a particular state. First, I will review the history of vaccines and the motivations for the refusals of this medical procedure. In addition, I will examine the various regulations regarding vaccinations and which states allow non-medical exemptions for religious or personal reasons. Within my analysis, I will provide examples of recent outbreaks to represent the extent of this current dilemma. Furthermore, I will offer potential solutions to mitigate measles outbreaks and the science regarding Herd Immunity Thresholds (HIT) to limit …


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018


FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …


A Monte Carlo Study Of The Effects Of Variability And Outliers On The Linear Correlation Coefficient, Hussein Yousif Eledum Dec 2017


Journal of Modern Applied Statistical Methods

Monte Carlo simulations are used to investigate the effect of two factors, the amount of variability and an outlier, on the size of the Pearson correlation coefficient. Some simulation algorithms are developed, and two theorems for increasing or decreasing the amount of variability are suggested.
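A minimal sketch of this kind of experiment follows. It is not the paper's simulation algorithm: it assumes independent standard-normal pairs (true correlation zero) and a single hypothetical outlier at (10, 10), then compares the average Pearson coefficient with and without that point. All names and settings are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_r_with_and_without_outlier(n_obs=30, n_reps=200,
                                    outlier=(10.0, 10.0), seed=1):
    """Average r over replications, before and after adding one outlier.

    The clean data are independent by construction, so the clean average
    should sit near zero; the outlier location is an assumed example.
    """
    rng = random.Random(seed)
    total_clean = 0.0
    total_contaminated = 0.0
    for _ in range(n_reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        total_clean += pearson_r(x, y)
        total_contaminated += pearson_r(x + [outlier[0]], y + [outlier[1]])
    return total_clean / n_reps, total_contaminated / n_reps


if __name__ == "__main__":
    clean, contaminated = mean_r_with_and_without_outlier()
    print(f"mean r without outlier: {clean:.2f}")
    print(f"mean r with outlier:    {contaminated:.2f}")
```

Moving the outlier closer to the data cloud, or increasing `n_obs`, reduces its leverage on the coefficient.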


Annuity Product Valuation And Risk Measurement Under Correlated Financial And Longevity Risks, Soohong Park Aug 2017


Electronic Thesis and Dissertation Repository

Longevity risk is a non-diversifiable risk and regarded as a pressing socio-economic challenge of the century. Its accurate assessment and quantification are therefore critical to enable pension-fund companies to provide sustainable old-age security and maintain a resilient global insurance market. Fluctuations and a decreasing trend in mortality rates, which give rise to longevity risk, as well as the uncertainty in interest-rate dynamics constitute the two fundamental determinants in pricing and risk management of longevity-dependent products. We also note that historical data reveal some evidence of strong correlation between mortality and interest rates, which must be taken into account when modelling their …


Multivariate Rank Outlyingness And Correlation Effects, Olusola Samuel Makinde May 2017


Journal of Modern Applied Statistical Methods

The effect of correlation on multivariate rank outlyingness, a result of deviation of multivariate rank functions from the property of spherical symmetry, is examined. Possible affine invariant versions of this multivariate rank are surveyed, and the outlyingness of affine invariant and non-invariant spatial rank functions under general affine transformations is compared.


Control Charts For Mean For Non-Normally Correlated Data, J. R. Singh, Ab Latif Dar May 2017


Journal of Modern Applied Statistical Methods

Traditionally, quality control methodology is based on the assumption that serially-generated data are independent and normally distributed. On the basis of these assumptions the operating characteristic (OC) function of the control chart is derived after setting the control limits. But in practice, many of the basic industrial variables do not satisfy both assumptions, and hence one may doubt the validity of the inferences drawn from the control charts. In this paper the power of the control chart for the mean is examined when neither the independence nor the normality assumption is tenable. The OC function is calculated and …


Effect Of Correlations On Type 1 Error Rates Of Some Multivariate Normality Tests, Gbenga Sunday Solomon, Kayode Ayinde, Nurudeen Abiodun Alao Jan 2017


Conference on Applied Statistics in Agriculture

Normality of multivariate data is a prerequisite for valid and reliable inference with multivariate statistical data analysis methods. Tests developed to validate this assumption, including the Doornik-Hansen (DH), Shapiro-Francia (SF), Mardia Skewness (MS), Mardia Skewness for small sample (MSS) and Kurtosis (MK), Skewness (S) and Kurtosis (K), Shapiro-Wilk (SW), Royston (R), Desgagne-Micheaux (DM), Henze-Zirkler (HZ), Energy (E), Gel-Gastwirth (GG) and Bontemps-Meddahi (BM) tests, often lead to different conclusions. These differences can be misleading. Consequently, this paper examined the effect of correlations on the Type 1 error rates of multivariate tests of normality. Monte Carlo experiments were conducted …


A New Test For Correlation On Bivariate Nonnormal Distributions, Ping Wang, Ping Sa Nov 2016


Journal of Modern Applied Statistical Methods

A new method to conduct a right-tailed test for the correlation on bivariate non-normal distributions is proposed. The comparative simulation study shows that the new test controls the type I error rates well for all the distributions considered. An investigation of the power performance is also provided.


Recent Periods Of Financial Turbulence On The Russian Stock Market And Their Effect On Price Correlation And Value At Risk, Alexander Logoveev, Gregory Cherinko Apr 2015


Undergraduate Economic Review

The aim of this article is to observe and analyze the recent periods of financial turbulence on the Russian stock market and determine their influence on the correlation coefficients between asset prices and the Value at Risk measure for a portfolio. Our task was to describe the previously observed phenomenon of correlation enlargement during times of financial crises deemed in our research as separate Black Swans. Based on up-to-date financial data analysis we determined correlation trends that can be useful in risk management and applied the Value at Risk method.


An L-Moment Based Characterization Of The Family Of Dagum Distributions, Mohan D. Pant, Todd C. Headrick Sep 2013


Mohan Dev Pant

This paper introduces a method for simulating univariate and multivariate Dagum distributions through the method of L-moments and L-correlation. A method is developed for characterizing non-normal Dagum distributions with controlled degrees of L-skew, L-kurtosis, and L-correlations. The procedure can be applied in a variety of contexts such as statistical modeling (e.g., income distribution, personal wealth distributions, etc.) and Monte Carlo or simulation studies. Numerical examples are provided to demonstrate that L-moment-based Dagum distributions are superior to their conventional moment-based analogs in terms of estimation and distribution fitting. Evaluation of the proposed method also demonstrates that the estimates of L-skew, L-kurtosis, …


An L-Moment Based Characterization Of The Family Of Dagum Distributions, Mohan D. Pant, Todd C. Headrick Jan 2013


Todd Christopher Headrick

This paper introduces a method for simulating univariate and multivariate Dagum distributions through the method of 𝐿-moments and 𝐿-correlations. A method is developed for characterizing non-normal Dagum distributions with controlled degrees of 𝐿-skew, 𝐿-kurtosis, and 𝐿-correlations. The procedure can be applied in a variety of contexts such as statistical modeling (e.g., income distribution, personal wealth distributions, etc.) and Monte Carlo or simulation studies. Numerical examples are provided to demonstrate that 𝐿-moment-based Dagum distributions are superior to their conventional moment-based analogs in terms of estimation and distribution fitting. Evaluation of the proposed method also demonstrates that the estimates of 𝐿-skew, 𝐿-kurtosis, …


Covariance-Enhanced Discriminant Analysis, Peirong Xu, Ji Zhu, Lixing Zhu, Yi Li Jan 2013


The University of Michigan Department of Biostatistics Working Paper Series

Linear discriminant analysis (LDA), a classical method in pattern recognition and machine learning, has been widely used to characterize or separate multiple classes via linear combinations of features. However, the high-dimensionality of the high-throughput features obtained from modern biological experiments, for example, microarray or proteomics, defies traditional discriminant analysis techniques. The possible interfeature correlations present additional challenges and are often under-utilized in modeling. In this paper, by incorporating the possible inter-feature correlations, we propose a Covariance-Enhanced Discriminant Analysis (CEDA) method that simultaneously and consistently selects informative features and identifies the corresponding discriminable classes. We show that, under mild regularity conditions, …


Multivariate Generalized Poisson Distribution For Interference On Selected Non-Communicable Diseases In Lagos State, Nigeria, Adewara Johnson Ademola, Mbata Ugochuckwu Ahamefula Nov 2012


Journal of Modern Applied Statistical Methods

Multivariate Generalized Poisson Distribution (MGPD) models are applied to make inferences regarding non-communicable diseases (diabetes, hypertension, stroke and ulcer) in Lagos State, Nigeria. The generalized Poisson distribution is employed due to its usefulness in modeling count data in the presence of either over- or under-dispersion. Results show that the correlation between ulcer and stroke is not significant. Other pairwise comparisons of diseases are significant, thus implying that a patient who suffers from diabetes or stroke has a high propensity to also be hypertensive.


Spatial And Temporal Correlations Of Freeway Link Speeds: An Empirical Study, Piotr J. Rachtan Jan 2012


Masters Theses 1911 - February 2014

Congestion on roadways and a high level of uncertainty in traffic conditions are major considerations for trip planning. The purpose of this research is to investigate the characteristics and patterns of spatial and temporal correlations and also to detect other variables that affect correlation in a freeway setting. 5-minute speed aggregates from the Performance Measurement System (PeMS) database are obtained for two directions of an urban freeway – I-10 between Santa Monica and Los Angeles, California. Observations are for all non-holiday weekdays between January 1st and June 30th, 2010. Other variables include traffic flow, ramp locations, number of lanes and the …


Manova: Type I Error Rate Analysis, Christopher Dau Wei Ling Aug 2011


Statistics

No abstract provided.


Statistical Learning And Behrens-Fisher Distribution Methods For Heteroscedastic Data In Microarray Analysis, Nabin K. Manandhr-Shrestha Mar 2010


USF Tampa Graduate Theses and Dissertations

The aim of the present study is to identify the differentially expressed genes between two different conditions and apply it in predicting the class of new samples using the microarray data. Microarray data analysis poses many challenges to the statisticians because of its high dimensionality and small sample size, dubbed as "small n large p problem". Microarray data has been extensively studied by many statisticians and geneticists. Generally, it is said to follow a normal distribution with equal variances in two conditions, but it is not true in general. Since the number of replications is very small, …


Intermediate R Values For Use In The Fleishman Power Method, Julie M. Smith Nov 2009


Journal of Modern Applied Statistical Methods

Several intermediate r values are calculated at three different correlations for use in the Fleishman Power Method for generating correlated data from normal and non-normal populations.


Characterizing The Statistical Properties And Global Distribution Of Dansgaard-Oeschger Events, Andrea Michelle Thomas Mar 2009


Theses and Dissertations

Ice core records from Greenland have shown times of rapid warming during the most recent glacial period, called Dansgaard-Oeschger (D-O) events. D-O events are important to our understanding of both past climate systems and modern climate volatility. In this paper, we present new approaches for statistically evaluating the existence of cyclicity in D-O events and the possible lagged correlation between the Greenland and Antarctica temperature records. Specifically, we consider permutation testing and bootstrapping methodologies for assessing the cyclicity of D-O events and the correlation between the Greenland and Antarctica records. We find that there is not enough evidence to conclude …
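The permutation idea mentioned in this abstract can be sketched in its simplest form. This is not the thesis's procedure: it assumes exchangeable observations and no lag, whereas autocorrelated climate records would generally need a scheme that respects serial dependence (for example, block permutation). The function names and defaults are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation.

    y is shuffled relative to x to break any association; the p-value is
    the share of shuffles whose |r| is at least the observed |r|, with the
    usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = [float(i) for i in range(20)]
    y = [2.0 * v + 1.0 for v in x]  # perfectly linear toy data
    print(permutation_pvalue(x, y))
```

A lagged version would shift one series by a chosen number of steps before calling `permutation_pvalue`, trimming both series to the overlapping range.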


The Most Representative Composite Rank Ordering Of Multi-Attribute Objects By The Particle Swarm Optimization, Sudhanshu K. Mishra Jan 2009


Sudhanshu K Mishra

Rank-ordering of individuals or objects on multiple criteria has many important practical applications. A reasonably representative composite rank ordering of multi-attribute objects/individuals or multi-dimensional points is often obtained by the Principal Component Analysis, although much inferior but computationally convenient methods also are frequently used. However, such rank ordering – even the one based on the Principal Component Analysis – may not be optimal. This has been demonstrated by several numerical examples. To solve this problem, the Ordinal Principal Component Analysis was suggested some time back. However, this approach cannot deal with various types of alternative schemes of rank ordering, mainly …


The Correlation Coefficients, Rudy A. Gideon Nov 2007


Journal of Modern Applied Statistical Methods

A generalized method of defining and interpreting correlation coefficients is given. Seven correlation coefficients are defined — three for continuous data and four on the ranks of the data. A quick calculation of the rank based correlation coefficients using a 0-1 graph-matrix is shown. Examples and comparisons are given.


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005


U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


Comparing Correlated Parameter Estimates For Nonlinear Pet Model, J. Wu, A. Parkhurst, K. Eskridge, D. Travnicek, T. Brown-Brandi, R. Eigenberg, G. L. Hahn, J. Nienaber, T. Mader, D. Spiers Apr 2003


Conference on Applied Statistics in Agriculture

The nonlinear PET model based on Newton's law of cooling can be used to estimate body temperature in cattle, T_b, challenged by hot cyclic chamber temperatures, T_a. The PET model has four biologically meaningful parameters: K, the thermal constant; Δ, the difference between T_b and adjusted T_a; Υ, the proportion of variation in T_b comparable to variation in T_a; and T_b,ini, the initial body temperature. The two parameters Υ and Δ are highly correlated in the current version of the model. This study looks at other ways to parameterize …


Accounting For Non-Independent Observations In 2×2 Tables, With Application To Correcting For Family Clustering In Exposure-Risk Relationship Studies, Leslie A. Kalsih, Katherine A. Riester, Stuart J. Pocock Nov 2002


Journal of Modern Applied Statistical Methods

Participants in epidemiologic studies may not represent statistically independent observations. We consider modifications to conventional analyses of 2×2 tables, including Fisher’s exact test and confidence intervals, to account for correlated observations in this setting. An example is provided, assessing the robustness of conclusions from a published analysis.
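For reference, the conventional analysis that this paper modifies can be sketched as follows. The block shows only the standard two-sided Fisher's exact test for a 2×2 table, which assumes independent observations; the paper's correction for family clustering is not reproduced here. The function name and the table layout `[[a, b], [c, d]]` are assumptions of this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1 = a + b
    row2 = c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Probability that the top-left cell equals k given the margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

When observations are clustered, as with family members in the same study, this p-value tends to overstate the evidence, which is the motivation for the modifications the abstract describes.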