Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

2015


Articles 1 - 30 of 159

Full-Text Articles in Applied Statistics

The Fraud Detection Triangle: A New Framework For Selecting Variables In Fraud Detection Research, Adrian Gepp, Kuldeep Kumar, Sukanto Bhattacharya Feb 2016

Kuldeep Kumar

The selection of explanatory (independent) variables is crucial to developing a fraud detection model. However, the selection process in prior financial statement fraud detection studies is not standardized. Furthermore, the categories of variables differ between studies. Consequently, the new Fraud Detection Triangle framework is proposed as an overall theory to assist in guiding the selection of variables for future fraud detection research. This new framework adapts and extends Cressey’s (1953) well-known and widely-used fraud triangle to make it more suited for use in fraud detection research. While the new framework was developed for financial statement fraud detection, it is more …


Applying Bayesian Machine Learning Methods To Theoretical Surface Science, Shane Carr Dec 2015

McKelvey School of Engineering Theses & Dissertations

Machine learning is a rapidly evolving field in computer science with increasingly many applications to other domains. In this thesis, I present a Bayesian machine learning approach to solving a problem in theoretical surface science: calculating the preferred active site on a catalyst surface for a given adsorbate molecule. I formulate the problem as a low-dimensional objective function. I show how the objective function can be approximated to within a certain confidence interval using just one iteration of the self-consistent field (SCF) loop in density functional theory (DFT). I then use Bayesian optimization to perform a global search for the solution. …


Characterizing The Statistical Distribution Of Organic Carbon And Extractable Phosphorus At A Regional Scale, John J. Brejda, David W. Meek, Douglas L. Karlen Dec 2015

Douglas L Karlen

Greater awareness of potential environmental problems has created the need to monitor total organic carbon (TOC) and extractable phosphorus (P) concentrations at a regional scale. The probability distribution of these soil properties can have a significant effect on the power of statistical tests and the quality of inferences applied to these properties. The objectives of this study were to: (1) evaluate the probability distribution of TOC and extractable P at the regional scale in three Major Land Resource Areas (MLRA), and (2) identify appropriate transformations that will result in a normal distribution. Both TOC and extractable P were non-normally distributed …


Calorimetry And Body Composition Research In Broilers And Broiler Breeders, Justina Victoria Caldas Cueva Dec 2015

Graduate Theses and Dissertations

Indirect calorimetry to study heat production (HP) and dual energy X-ray absorptiometry (DEXA) for body composition (BC) are powerful techniques to study the dynamics of energy and protein utilization in poultry. The first two chapters present the BC (dry matter, lean, protein, fat, bone mineral, calcium, and phosphorus) of modern broilers from 1 – 60 d of age analyzed by chemical analysis and DEXA. DEXA was validated for precision and standardized for positioning, and prediction equations were developed and validated for chickens under two different feeding levels. These equations are unique to the machine and software in use. Research in broilers …


Meta-Analysis Of Lapatinib Plus Capecitabine Versus Capecitabine In The Treatment Of Her2 Positive Breast Cancer, Lynda Smith Dec 2015

Culminating Projects in Applied Statistics

BACKGROUND:

Breast cancer is the most common type of cancer in women despite advances in research and detection methods. Approximately 25 to 30 percent of newly diagnosed cases of breast cancer will overexpress HER2, human epidermal growth factor receptor 2, and are at a greater risk for disease progression and poorer clinical outcomes. The traditional treatment is associated with irreversible cardiac dysfunction. An alternative treatment involving lapatinib plus capecitabine has been reported in some randomized controlled clinical trials comparing treatment outcomes. To quantify the effectiveness of lapatinib plus capecitabine combination therapy versus capecitabine monotherapy in treating metastatic breast cancer, a …


Probabilistic Graphical Modeling On Big Data, Ming-Hua Chung Dec 2015

Graduate Theses and Dissertations

The rise of Big Data in recent years brings many challenges to modern statistical analysis and modeling. In toxicogenomics, the advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done at a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past …


Hemodynamic Analysis Of Fast And Slow Aneurysm Occlusions By Flow Diversion In Rabbits, Bong Jae Chung, Fernando Mut, Ramanathan Kadirvel, Ravi Lingineni, David F. Kallmes, Juan R. Cebral Dec 2015

Department of Applied Mathematics and Statistics Faculty Scholarship and Creative Works

Purpose: To assess hemodynamic differences between aneurysms that occlude rapidly and those occluding in delayed fashion after flow diversion in rabbits. Methods: Thirty-six elastase-induced aneurysms in rabbits were treated with flow diverting devices. Aneurysm occlusion was assessed angiographically immediately before the animals were sacrificed at 1 (n=6), 2 (n=4), 4 (n=8), or 8 weeks (n=18) after treatment. The aneurysms were classified into a fast occlusion group if they were completely or near completely occluded at 4 weeks or earlier, and a slow occlusion group if they remained incompletely occluded at 8 weeks. The immediate post-treatment flow conditions in aneurysms of each …


Oriented Object Proposals, Shengfeng He, Rynson W. H. Lau Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we propose a new approach to generate oriented object proposals (OOPs) to reduce the detection error caused by various orientations of the object. To this end, we propose to efficiently locate object regions according to pixelwise object probability, rather than measuring the objectness from a set of sampled windows. We formulate the proposal generation problem as a generative probabilistic model such that object proposals of different shapes (i.e., sizes and orientations) can be produced by locating the local maximum likelihoods. The new approach has three main advantages. First, it helps the object detector handle objects of different …


Estimation Of Reliability In Multicomponent Stress-Strength Based On Generalized Rayleigh Distribution, Gadde Srinivasa Rao Nov 2015

Srinivasa Rao Gadde Dr.

A multicomponent system of k components is considered, with strengths following k independently and identically distributed random variables X1, X2, ..., Xk and with each component experiencing a random stress Y. The system is regarded as alive only if at least s out of k (s < k) strengths exceed the stress. The reliability of such a system is obtained when the strength and stress variates follow a generalized Rayleigh distribution with different shape parameters. Reliability is estimated using the maximum likelihood (ML) method of estimation in samples drawn from the strength and stress distributions; the reliability estimators are compared asymptotically. Monte-Carlo …
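As an illustration of the s-out-of-k stress-strength setup, here is a minimal sketch (not the authors' ML procedure) that samples from the generalized Rayleigh CDF F(x) = (1 - exp(-(lam*x)^2))^alpha by inversion and estimates the system reliability by plain Monte Carlo; the shape parameters in the usage example are made up.

```python
import math
import random

def rgr(alpha, lam, rng):
    """Inverse-CDF draw from a generalized Rayleigh (Burr type X)
    distribution with CDF F(x) = (1 - exp(-(lam*x)**2))**alpha."""
    u = rng.random()
    return math.sqrt(-math.log(1.0 - u ** (1.0 / alpha))) / lam

def reliability_s_of_k(s, k, alpha_strength, alpha_stress,
                       lam=1.0, n=50_000, seed=1):
    """Monte Carlo estimate of R(s,k) = P(at least s of the k
    component strengths exceed the common random stress Y)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        y = rgr(alpha_stress, lam, rng)
        exceed = sum(rgr(alpha_strength, lam, rng) > y for _ in range(k))
        if exceed >= s:
            hits += 1
    return hits / n
```

For example, reliability_s_of_k(2, 4, 1.5, 1.0) estimates the reliability of a 2-out-of-4 system whose strength shape parameter is 1.5 against a stress with shape parameter 1.0 and a common scale.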


Gis-Integrated Mathematical Modeling Of Social Phenomena At Macro- And Micro- Levels—A Multivariate Geographically-Weighted Regression Model For Identifying Locations Vulnerable To Hosting Terrorist Safe-Houses: France As Case Study, Elyktra Eisman Nov 2015

FIU Electronic Theses and Dissertations

Adaptability and invisibility are hallmarks of modern terrorism, and keeping pace with its dynamic nature presents a serious challenge for societies throughout the world. Innovations in computer science have incorporated applied mathematics to develop a wide array of predictive models to support the variety of approaches to counterterrorism. Predictive models are usually designed to forecast the location of attacks. Although this may protect individual structures or locations, it does not reduce the threat—it merely changes the target. While predictive models dedicated to events or social relationships receive much attention where the mathematical and social science communities intersect, models dedicated to …


Wind Power Capacity Value Metrics And Variability: A Study In New England, Frederick W. Letson Nov 2015

Doctoral Dissertations

Capacity value is the contribution of a power plant to the ability of the power system to meet high demand. As wind power penetration in New England, and worldwide, increases, so does the importance of identifying the capacity contribution made by wind power plants. It is critical to accurately characterize the capacity value of these wind power plants and the variability of the capacity value over the long term. This is important in order to avoid the cost of keeping extra power plants operational while still being able to cover the demand for power reliably. This capacity value calculation is …


Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang Nov 2015

Doctoral Dissertations

The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications of it, but typically without features of variable selection, and no public software is available for solving it. Here we propose a new algorithm to fit the single index varying coefficient model and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we …


Approaches For Detection Of Unstable Processes: A Comparative Study, Yerriswamy Wooluru, D. R. Swamy, P. Nagesh Nov 2015

Journal of Modern Applied Statistical Methods

A process is stable only when the parameters of the distribution of a process or product characteristic remain the same over time. Only a stable process has the ability to perform in a predictable manner over time. Statistical analysis of process data usually assumes that the data are obtained from a stable process. In the absence of control charts, the hypothesis of process stability is usually assessed by visual examination of the pattern in the run chart. In this paper, appropriate statistical approaches are adopted to detect instability in the process, and their performance is compared with that of a run chart of considerably shorter length …
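One simple formal alternative to visually examining a run chart is the standard Wald-Wolfowitz runs test about the median; this is a generic sketch, not necessarily one of the specific approaches compared in the paper.

```python
import math
import statistics

def runs_test_z(data):
    """Wald-Wolfowitz runs test about the median.

    A 'run' is a maximal stretch of consecutive points on the same
    side of the median; points equal to the median are dropped.
    Returns an approximate z statistic: a large negative value means
    too few runs (a shift or trend), a large positive value too many
    (oscillation); either pattern signals an unstable process.
    """
    med = statistics.median(data)
    side = [x > med for x in data if x != med]
    n1 = sum(side)                # points above the median
    n2 = len(side) - n1           # points below the median
    runs = 1 + sum(side[i] != side[i - 1] for i in range(1, len(side)))
    mean_r = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var_r = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (runs - mean_r) / math.sqrt(var_r)
```

A steadily trending sequence produces only two runs (strongly negative z), while a strictly alternating sequence produces the maximum number of runs (strongly positive z).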


Contrails: Causal Inference Using Propensity Scores, Dean S. Barron Nov 2015

Journal of Modern Applied Statistical Methods

Contrails are clouds caused by airplane exhausts, which geologists contend decrease daily temperature ranges on Earth. Following the 2001 World Trade Center attack, cancelled domestic flights triggered the first absence of contrails in decades. The resultant exceptional data enabled causal inference analysis by propensity score matching. The estimated contrail effect was 6.8981°F.
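The matching step can be sketched as follows; this is a toy, hypothetical illustration of propensity score matching with a single covariate and a hand-rolled logistic model, not the analysis in the paper.

```python
import math
import random

def fit_logistic(x, t, iters=300, lr=0.5):
    """Fit P(T=1 | x) = sigmoid(a + b*x) by gradient ascent on the
    average log-likelihood (one covariate, for illustration only)."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            ga += ti - p
            gb += (ti - p) * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def att_by_matching(x, t, y):
    """Average treatment effect on the treated, estimated by
    1-nearest-neighbour propensity score matching with replacement."""
    a, b = fit_logistic(x, t)
    ps = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    diffs = []
    for i, ti in enumerate(t):
        if ti == 1:
            j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
            diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)
```

Each treated unit (a day with contrails, say) is paired with the control unit whose estimated propensity is closest, and the mean within-pair outcome difference estimates the effect on the treated while adjusting for the confounder.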


The Bayes Factor For Case-Control Studies With Misclassified Data, Tzesan Lee Nov 2015

Journal of Modern Applied Statistical Methods

The question of how to test if collected data for a case-control study are misclassified was investigated. A mixed approach was employed to calculate the Bayes factor to assess the validity of the null hypothesis of no-misclassification. A real-world data set on the association between lung cancer and smoking status was used as an example to illustrate the proposed method.


Statistical Modeling Of Migration Attractiveness Of The Eu Member States, Tatiana Tikhomirova, Yulia Lebedeva Nov 2015

Journal of Modern Applied Statistical Methods

The relationship between the migration attractiveness of the European Union countries and their level of socio-economic development is investigated. An approach is proposed to identify the socio-economic characteristics that influence migration, by aggregating them and reducing their diversity, and to substantiate the cause-and-effect relationships of the studied phenomenon. A stable scheme for classifying countries according to their migration attractiveness is developed from the aggregate factors, and an econometric binary-choice model using panel data for 2008-2010 is then applied to quantify the impact of the designed aggregate factors on immigration and emigration.


Bayesian Analysis Under Progressively Censored Rayleigh Data, Gyan Prakash Nov 2015

Journal of Modern Applied Statistical Methods

The one-parameter Rayleigh model is considered as an underlying model for evaluating the properties of the Bayes estimator under Progressive Type-II right censored data. The One-Sample Bayes prediction bound length (OSBPBL) is also measured. Based on two different asymmetric loss functions, a comparative study of Bayes estimation is presented. A simulation study was used to evaluate their comparative properties.


An Empirical Study On Different Ranking Methods For Effective Data Classification, Ilangovan Sangaiah, A. Vincent Antony Kumar, Appavu Balamurugan Nov 2015

Journal of Modern Applied Statistical Methods

Ranking is an attribute selection technique used in the pre-processing phase to emphasize the most relevant attributes, making classification models simpler and easier to understand. It is a very important and central task for information retrieval, such as web search engines, recommendation systems, and advertisement systems. A comparison between eight ranking methods was conducted. Ten different learning algorithms (NaiveBayes, J48, SMO, JRIP, Decision table, RandomForest, Multilayerperceptron, Kstar) were used to test the accuracy. The ranking methods with different supervised learning algorithms give different results for balanced accuracy. It was shown that the selection of ranking methods could be …


Two Stage Robust Ridge Method In A Linear Regression Model, Adewale Folaranmi Lukman, Oyedeji Isola Osowole, Kayode Ayinde Nov 2015

Journal of Modern Applied Statistical Methods

Two Stage Robust Ridge Estimators based on the robust estimators M, MM, S, and LTS are examined in the presence of autocorrelation, multicollinearity, and outliers as alternatives to the Ordinary Least Squares (OLS) estimator. The estimator based on the S estimator performs best. Mean square error was used as the criterion for examining the performances of these estimators.


Semi-Parametric Non-Proportional Hazard Model With Time Varying Covariate, Kazeem A. Adeleke, Alfred A. Abiodun, R. A. Ipinyomi Nov 2015

Journal of Modern Applied Statistical Methods

The application of survival analysis has extended the importance of statistical methods for time-to-event data that incorporate time-dependent covariates. The Cox proportional hazards model is one such method that is widely used. An extension of the Cox model with time-dependent covariates is adopted when the proportionality assumption is violated. The purpose of this study is to validate the model assumption when the hazard rate varies with time. This approach is applied to model data on duration of infertility subject to a time-varying covariate. Validity is assessed by a set of simulation experiments, and results indicate that a non proportional …


Structural Properties Of Transmuted Weibull Distribution, Kaisar Ahmad, S. P. Ahmad, A. Ahmed Nov 2015

Journal of Modern Applied Statistical Methods

The transmuted Weibull distribution, and a related special case, is introduced. Estimates of parameters are obtained by using a new method of moments.


New Entropy Estimators With Smaller Root Mean Squared Error, Amer Ibrahim Al-Omari Nov 2015

Journal of Modern Applied Statistical Methods

New estimators of the entropy of a continuous random variable are suggested. The proposed estimators are investigated under simple random sampling (SRS), ranked set sampling (RSS), and double ranked set sampling (DRSS) methods. The estimators are compared with the Vasicek (1976) and Al-Omari (2014) entropy estimators theoretically and by simulation, in terms of root mean squared error (RMSE) and bias values. The results indicate that the suggested estimators have smaller RMSE and bias values than their competitors introduced by Vasicek (1976) and Al-Omari (2014).
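For reference, the Vasicek (1976) benchmark that such proposals are compared against is a spacing-based estimator built from the order statistics; a minimal sketch follows, where the window size m is a user-chosen tuning parameter.

```python
import math

def vasicek_entropy(sample, m):
    """Vasicek (1976) sample entropy estimator based on m-spacings
    of the order statistics:
        H = (1/n) * sum_i log( n * (x_(i+m) - x_(i-m)) / (2m) ),
    with the indices i+m and i-m clamped to the range [1, n]."""
    x = sorted(sample)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        hi = x[min(i + m, n) - 1]
        lo = x[max(i - m, 1) - 1]
        total += math.log(n * (hi - lo) / (2.0 * m))
    return total / n
```

Because the estimator is built from log-spacings, rescaling the sample by a factor c shifts the estimate by exactly log c, matching the behaviour of differential entropy.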


Caution For Software Use Of New Statistical Methods (R), Akiva J. Lorenz, Barry S. Markman, Shlomo Sawilowsky Nov 2015

Journal of Modern Applied Statistical Methods

Open source programming languages such as R allow statisticians to develop and rapidly disseminate advanced procedures, but sometimes at the expense of a proper vetting process. A new example is the least trimmed squares regression available in R’s lqs() in the MASS library. It produces pretty regression lines, particularly in the presence of outliers. However, this procedure lacks a defined standard error, and thus it should be avoided.


Inferences About The Skipped Correlation Coefficient: Dealing With Heteroscedasticity And Non-Normality, Rand Wilcox Nov 2015

Journal of Modern Applied Statistical Methods

A common goal is testing the hypothesis that Pearson’s correlation is zero and typically this is done based on Student’s T test. There are, however, several well-known concerns. First, Student’s T is sensitive to heteroscedasticity. That is, when it rejects, it is reasonable to conclude that there is dependence, but in terms of making a decision about the strength of the association, it is unsatisfactory. Second, Pearson’s correlation is not robust: it can poorly reflect the strength of the association. Even a single outlier can have a tremendous impact on the usual estimate of Pearson’s correlation, which can result in …


Resolving The Issue Of How Reliability Is Related To Statistical Power: Adhering To Mathematical Definitions, Donald W. Zimmerman, Bruno D. Zumbo Nov 2015

Journal of Modern Applied Statistical Methods

Reliability in classical test theory is a population-dependent concept, defined as a ratio of true-score variance and observed-score variance, where observed-score variance is a sum of true and error components. On the other hand, the power of a statistical significance test is a function of the total variance, irrespective of its decomposition into true and error components. For that reason, the reliability of a dependent variable is a function of the ratio of true-score variance and observed-score variance, whereas statistical power is a function of the sum of the same two variances. Controversies about how reliability is related to statistical …
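In standard classical test theory notation (an illustrative restatement, not the article's own equations), with observed score X = T + E and independent true and error components:

```latex
\rho_{XX'} \;=\; \frac{\sigma_T^2}{\sigma_X^2} \;=\; \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},
\qquad
\delta \;=\; \frac{\mu_1 - \mu_2}{\sigma_X}\sqrt{\frac{n}{2}}
       \;=\; \frac{\mu_1 - \mu_2}{\sqrt{\sigma_T^2 + \sigma_E^2}}\sqrt{\frac{n}{2}},
```

where delta is the noncentrality parameter of the equal-n two-sample t test. Holding the total variance fixed, any trade-off between true-score and error variance changes the reliability while leaving delta, and hence power, unchanged; this is the mathematical root of the controversy the authors address.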


In (Partial) Defense Of .05, Thomas R. Knapp Nov 2015

Journal of Modern Applied Statistical Methods

Researchers are frequently chided for choosing the .05 alpha level as the determiner of statistical significance (or non-significance). A partial justification is provided.


The Distribution Of The Inverse Square Root Transformed Error Component Of The Multiplicative Time Series Model, Bright F. Ajibade, Chinwe R. Nwosu, J. I. Mbegdu Nov 2015

Journal of Modern Applied Statistical Methods

The probability density function, mean, and variance of the inverse square-root transformed left-truncated N(1, σ²) error component e*_t (= 1/√e_t) of the multiplicative time series model were established. A comparison of key statistical properties of e*_t and e_t confirmed normality with mean 1 but with Var(e*_t) ≈ (1/4)Var(e_t) when σ ≤ 0.14. Hence σ ≤ 0.14 is the required condition for a successful transformation.


Front Matter, Jmasm Editors Nov 2015

Journal of Modern Applied Statistical Methods

.


Vol. 14, No. 2 (Full Issue), Jmasm Editors Nov 2015

Journal of Modern Applied Statistical Methods

.


Monte Carlo Comparison Of The Parameter Estimation Methods For The Two-Parameter Gumbel Distribution, Demet Aydin, Birdal Şenoğlu Nov 2015

Journal of Modern Applied Statistical Methods

The performances of seven different parameter estimation methods for the Gumbel distribution are compared with numerical simulations. The estimation methods used in this study are the method of moments (ME), the method of maximum likelihood (ML), the method of modified maximum likelihood (MML), the method of least squares (LS), the method of weighted least squares (WLS), the method of percentiles (PE), and the method of probability weighted moments (PWM). Performance of the estimators is compared with respect to their bias, MSE, and deficiency (Def) values via Monte Carlo simulation. The simulation study showed that the method of PWM was …
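Of the seven methods, the method of moments is the simplest to state; a sketch (using the standard Gumbel moment identities E[X] = mu + gamma*beta and Var[X] = (pi^2/6)*beta^2, where gamma is the Euler-Mascheroni constant, and not taken from the paper's code):

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_mom(sample):
    """Method-of-moments estimates (mu_hat, beta_hat) for the
    two-parameter Gumbel(mu, beta), solving
        mean     = mu + EULER_GAMMA * beta
        variance = (pi**2 / 6) * beta**2."""
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    beta_hat = s * math.sqrt(6.0) / math.pi
    mu_hat = xbar - EULER_GAMMA * beta_hat
    return mu_hat, beta_hat
```

A quick check: drawing by inversion, x = mu - beta * log(-log(u)) with u uniform on (0, 1), a large sample should recover mu and beta closely.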