Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Articles 61 - 78 of 78

Full-Text Articles in Applied Statistics

Weighting Large Datasets With Complex Sampling Designs: Choosing The Appropriate Variance Estimation Method, Sara Mann, James Chowhan May 2011

Journal of Modern Applied Statistical Methods

Using the Canadian Workplace and Employee Survey (WES), three variance estimation methods for weighting large datasets with complex sampling designs are compared: simple final weighting, standard bootstrapping and mean bootstrapping. Using a logit analysis, it is shown that, depending on which weighting method is used, different predictor variables are significant. The potential lack of independence inherent in a multi-stage cluster sample design, as in the WES, results in a downward bias in the variance when conducting statistical inference (using the simple final weight), which in turn results in increased Type I errors. Bootstrap methods can account for the survey’s …
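
To illustrate why resampling at the cluster level matters for variance estimation, the following is a minimal sketch of a generic cluster bootstrap for a weighted mean. It is not the WES bootstrap-weight procedure or the mean bootstrap compared in the article; the function names, the (value, weight) data layout and the default of 500 replicates are assumptions made for illustration only.

```python
import random
import statistics


def weighted_mean(records):
    """Weighted mean of a list of (value, weight) pairs."""
    total_weight = sum(w for _, w in records)
    return sum(y * w for y, w in records) / total_weight


def cluster_bootstrap_variance(clusters, replicates=500, seed=0):
    """Variance of the weighted mean across cluster-level bootstrap replicates.

    clusters: list of clusters, each a list of (value, weight) pairs.
    Whole clusters are resampled with replacement, so dependence among
    units within a cluster is carried into every replicate.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        drawn = rng.choices(clusters, k=len(clusters))
        pooled = [record for cluster in drawn for record in cluster]
        estimates.append(weighted_mean(pooled))
    return statistics.variance(estimates)
```

Treating every unit as an independent draw ignores the within-cluster dependence that the abstract identifies as the source of the downward bias; resampling whole clusters is one common way to let that dependence show up in the variance estimate.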


Model Diagnostics For Proportional And Partial Proportional Odds Models, Ann A. O'Connell, Xing Liu May 2011

Journal of Modern Applied Statistical Methods

Although widely used to assist in evaluating the prediction quality of linear and logistic regression models, residual diagnostic techniques are not well developed for regression analyses where the outcome is treated as ordinal. The purpose of this article is to review methods of model diagnosis that may be useful in investigating model assumptions and in identifying unusual cases for PO and PPO models, and provide a corresponding application of these diagnostic methods to the prediction of proficiency in early literacy for children drawn from the kindergarten cohort of the Early Childhood Longitudinal Study (ECLS-K; NCES, 2000).


Using Finite Mixture Modeling To Deal With Systematic Measurement Error: A Case Study, Min Liu, Gregory R. Hancock, Jeffrey R. Harring May 2011

Journal of Modern Applied Statistical Methods

Conventional methods and analyses view measurement error as random. A scenario is presented where a variable was measured with systematic error. Mixture models with systematic parameter constraints were used to test hypotheses in the context of general linear models; this accommodated the heterogeneity arising due to systematic measurement error.


Logistic Regression Models For Higher Order Transition Probabilities Of Markov Chain For Analyzing The Occurrences Of Daily Rainfall Data, Narayan Chanra Sinha, M. Ataharul Islam, Kazi Saleh Ahamed May 2011

Journal of Modern Applied Statistical Methods

Logistic regression models for transition probabilities of higher order Markov models are developed for the sequence of chain dependent repeated observations. To identify the significance of these models and their parameters, a test procedure for a likelihood ratio criterion is developed. A method of model selection is suggested on the basis of AIC and BIC procedures. The proposed models and test procedures are applied to analyze the occurrences of daily rainfall data for selected stations in Bangladesh. Based on results from these models, the transition probabilities of the first order Markov model for temperature and humidity provided the most suitable option …


Type I Error Inflation Of The Separate-Variances Welch T Test With Very Small Sample Sizes When Assumptions Are Met, Albert K. Adusah, Gordon P. Brooks May 2011

Journal of Modern Applied Statistical Methods

This Monte Carlo study shows that the separate-variances Welch t test has inflated Type I error rates at very small sample sizes, especially when sample sizes are very small in one group and larger in the second group – even when all assumptions for the statistical test are met.
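
For reference, the separate-variances statistic and its Welch-Satterthwaite degrees of freedom can be sketched as below. This is the standard textbook form, not code from the study; the function name and the list-based interface are assumptions for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance of each group mean: unbiased sample variance divided by n.
    v_a = statistics.variance(sample_a) / n_a
    v_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(v_a + v_b)
    # Approximate degrees of freedom; usually not an integer.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df
```

With very small groups the denominators n - 1 are tiny, so the estimated degrees of freedom depend heavily on noisy variance estimates, which is the setting the study examines.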


Bias In Monte Carlo Simulations Due To Pseudo-Random Number Generator Initial Seed Selection, Jack C. Hill, Shlomo S. Sawilowsky May 2011

Journal of Modern Applied Statistical Methods

The choice of initial seed for a pseudo-random number generator can bias Monte Carlo simulations of the standard normal probability distribution function. Five generator designs were seeded with initial values from 10000HEX to 1FFFFHEX; estimates of the mean were calculated for each seed, the distribution of mean estimates was determined for each generator, and simulation histories were graphed for selected seeds.
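
The kind of experiment described can be sketched as follows: estimate the mean of a standard normal variate once per seed and compare the estimates. The sketch uses Python's built-in generator (a Mersenne Twister), which is not necessarily one of the five designs studied; the number of draws and the handful of seeds shown are assumptions for illustration.

```python
import random


def mean_estimate(seed, n_draws=10_000):
    """Monte Carlo estimate of the standard normal mean for one initial seed."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(n_draws)) / n_draws


# A few seeds from the start of the 0x10000..0x1FFFF range named in the abstract.
seeds = range(0x10000, 0x10000 + 8)
estimates = {seed: mean_estimate(seed) for seed in seeds}
for seed, estimate in estimates.items():
    print(f"{seed:#x}: {estimate:+.5f}")
```

Each estimate should be near zero; how widely the estimates scatter across seeds is the quantity of interest.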


One Is Not Enough: The Need For Multiple Respondents In Survey Research Of Organizations, Joseph L. Balloun, Hilton Barrett, Art Weinstein May 2011

Journal of Modern Applied Statistical Methods

The need for multiple respondents per organization in organizational survey research is supported. Leadership teams’ ratings of their implementations of market orientation are examined, along with learning orientation, entrepreneurial management, and organizational flexibility. Sixty diverse organizations, including not-for-profit organizations in education and healthcare as well as manufacturing and service businesses, were included. The major finding was the large rating variance within the leadership teams of each organization. The results are enlightening and have definite implications for improved design of survey research on organizations.


Maximum Likelihood Solution For The Linear Structural Relationship With Three Parameters Known, Androulla Michaeloudis May 2011

Journal of Modern Applied Statistical Methods

A maximum likelihood solution is obtained for the simple linear structural relation model where the underlying incidental distribution and one error variance are assumed known. Expressions for the asymptotic standard errors of the maximum likelihood estimates are obtained and these are verified using a simulation study.


Is Next Twelve Months Period Tumor Recurrence Free Under Restricted Rate Due To Medication? A Probabilistic Warning, Ramalingam Shanmugam May 2011

Journal of Modern Applied Statistical Methods

A methodology is formulated to analyze tumor recurrence data when its incidence rate is restricted due to medication. Analytic results are derived to make a probabilistic early warning of tumor recurrence free period of length τ; that is, the chance for a safe period of length τ is estimated. The captured data are length biased. Expressions are developed to extract and relate to counterparts of the non-length biased data. Three data sets are considered as illustrations: (1) patients who are given a placebo, (2) patients who are given the medicine pyridoxine and (3) patients who are given the medicine thiotepa.


The Likelihood Of Choosing The Borda-Winner With Partial Preference Rankings Of The Electorate, Ömer Eğecioğlu, Ayça Ebru Giritligil May 2011

Journal of Modern Applied Statistical Methods

Given that n voters report only the first r (1 ≤ r < m) ranks of their linear preference rankings over m alternatives, the likelihood of implementing the Borda outcome is investigated. The information contained in the first r ranks is aggregated through a Borda-like method, namely the r-Borda rule. Monte-Carlo simulations are run to detect changes in the likelihood of r-Borda winner(s) to coincide with the original Borda winner(s) as a function of m, n and r. The voters’ preferences are generated through the Impartial Anonymous and Neutral Culture Model, where both the names of the …
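
A minimal sketch of a truncated Borda count and a simulation of how often its winners match the full Borda winners is given below. Two things are assumptions rather than the article's method: the scoring (rank k, counted from 0, earns m - 1 - k points and unreported ranks earn 0), and the preference generator, which draws each voter's ranking uniformly at random instead of using the Impartial Anonymous and Neutral Culture Model. Function names are hypothetical.

```python
import random


def borda_winners(rankings, m, r=None):
    """Set of winners when only the first r ranks of each ranking are counted.

    rankings: list of rankings, each a list of alternatives 0..m-1, best first.
    r: number of reported ranks; None counts the full ranking (ordinary Borda).
    """
    r = m if r is None else r
    scores = [0] * m
    for ranking in rankings:
        for k, alternative in enumerate(ranking[:r]):
            scores[alternative] += m - 1 - k
    best = max(scores)
    return {alt for alt, score in enumerate(scores) if score == best}


def agreement_rate(m, n, r, trials=2000, seed=0):
    """Share of random profiles where truncated and full winner sets are equal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        profile = [rng.sample(range(m), m) for _ in range(n)]
        if borda_winners(profile, m, r) == borda_winners(profile, m):
            hits += 1
    return hits / trials
```

Under this scoring the last rank earns no points, so r = m - 1 always reproduces the full Borda winners; smaller r discards information and the agreement rate can fall below 1.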


Empirical Methods For Predicting Student Retention- A Summary From The Literature, Matt Bogard May 2011

Economics Faculty Publications

The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for the consideration of which explanatory variables to include in any model specification, as well as identifying possible data sources. The literature separates itself into two major camps including research related to the hypothesis testing …


Empirical Methods-A Review: With An Introduction To Data Mining And Machine Learning, Matt Bogard May 2011

Economics Faculty Publications

This presentation was part of a staff workshop focused on empirical methods and applied research. This includes a basic overview of regression with matrix algebra, maximum likelihood, inference, and model assumptions. Distinctions are made between paradigms related to classical statistical methods and algorithmic approaches. The presentation concludes with a brief discussion of generalization error, data partitioning, decision trees, and neural networks.


Counting The Impossible: Sampling And Modeling To Achieve A Large State Homeless Count, Jennifer L. Priestley, Jane Massey Apr 2011

Faculty Articles

Objective: Using inferential statistics, we develop estimates of the homeless population of a geographically large and economically diverse state -- Georgia.

Methods: Multiple independent data sources (2000 U.S. Census, the 2006 Georgia County Guide, Georgia Chamber of Commerce) were used to develop clusters of the 150 Georgia counties. These clusters were used as "strata" to then execute stratified sampling. Homeless counts were conducted within the sample counties, allowing for multiple regression models to be developed to generate predictions of homeless persons by county.

Results: In response to a mandate from the US Department of Housing and Urban Development, the State …


Cv, Lorán Chollete Jan 2011

Lorán Chollete

No abstract provided.


International Diversification: An Extreme Value Approach, Lorán Chollete, Victor De La Peña, Ching-Chih Lu Jan 2011

Lorán Chollete

No abstract provided.


Accurately Sized Test Statistics With Misspecified Conditional Homoskedasticity, Douglas Steigerwald, Jack Erb Dec 2010

Douglas G. Steigerwald

We study the finite-sample performance of test statistics in linear regression models where the error dependence is of unknown form. With an unknown dependence structure there is traditionally a trade-off between the maximum lag over which the correlation is estimated (the bandwidth) and the amount of heterogeneity in the process. When allowing for heterogeneity, through conditional heteroskedasticity, the correlation at far lags is generally omitted and the resultant inflation of the empirical size of test statistics has long been recognized. To allow for correlation at far lags we study test statistics constructed under the possibly misspecified assumption of conditional homoskedasticity. …


The Underground Economy Of Fake Antivirus Software, Douglas Steigerwald, Brett Stone-Gross, Ryan Abman, Richard Kemmerer, Christopher Kruegel, Giovanni Vigna Dec 2010

Douglas G. Steigerwald

Fake antivirus (AV) programs have been utilized to defraud millions of computer users into paying as much as one hundred dollars for a phony software license. As a result, fake AV software has evolved into one of the most lucrative criminal operations on the Internet. In this paper, we examine the operations of three large-scale fake AV businesses, lasting from three months to more than two years. More precisely, we present the results of our analysis on a trove of data obtained from several backend servers that the cybercriminals used to drive their scam operations. Our investigations reveal that these …


Bicycle Commuting In Melbourne During The 2000s Energy Crisis: A Semiparametric Analysis Of Intraday Volumes, Michael S. Smith, Goeran Kauermann Dec 2010

Michael Stanley Smith

Cycling is attracting renewed attention as a mode of transport in western urban environments, yet the determinants of usage are poorly understood. In this paper we investigate some of these using intraday bicycle volumes collected via induction loops located at ten bike paths in the city of Melbourne, Australia, between December 2005 and June 2008. The data are hourly counts at each location, with temporal and spatial disaggregation allowing for the impact of meteorology to be measured accurately for the first time. Moreover, during this period petrol prices varied dramatically and the data also provide a unique opportunity to assess …