Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons


Articles 1 - 25 of 25

Full-Text Articles in Applied Statistics

The Fraud Detection Triangle: A New Framework For Selecting Variables In Fraud Detection Research, Adrian Gepp, Kuldeep Kumar, Sukanto Bhattacharya Oct 2015


Adrian Gepp

The selection of explanatory (independent) variables is crucial to developing a fraud detection model. However, the selection process in prior financial statement fraud detection studies is not standardized. Furthermore, the categories of variables differ between studies. Consequently, the new Fraud Detection Triangle framework is proposed as an overall theory to assist in guiding the selection of variables for future fraud detection research. This new framework adapts and extends Cressey’s (1953) well-known and widely-used fraud triangle to make it more suited for use in fraud detection research. While the new framework was developed for financial statement fraud detection, it is more …


Marginal Structural Models: An Application To Incarceration And Marriage During Young Adulthood, Valerio Bacak, Edward Kennedy Jan 2015


Edward H. Kennedy

Advanced methods for panel data analysis are commonly used in research on family life and relationships, but the fundamental issue of simultaneous time-dependent confounding and mediation has received little attention. In this article the authors introduce inverse-probability-weighted estimation of marginal structural models, an approach to causal analysis that (unlike conventional regression modeling) appropriately adjusts for confounding variables on the causal pathway linking the treatment with the outcome. They discuss the need for marginal structural models in social science research and describe their estimation in detail. Substantively, the authors contribute to the ongoing debate on the effects of incarceration on marriage …
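The estimation approach the authors describe can be sketched on simulated data: each observation is weighted by the inverse probability of the treatment it actually received, so the weighted sample behaves as if treatment were unconfounded. All values and variable names below are illustrative, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated confounder, treatment, and outcome (values are illustrative).
x = rng.normal(size=n)                      # confounder on the causal pathway
p_treat = 1 / (1 + np.exp(-x))              # treatment probability depends on x
a = rng.binomial(1, p_treat)                # treatment actually received
y = 2.0 * a + x + rng.normal(size=n)        # true marginal effect = 2.0

# Naive comparison is confounded: treated units have higher x on average.
naive = y[a == 1].mean() - y[a == 0].mean()

# Stabilized inverse-probability weights: P(A = a) / P(A = a | X = x).
p_cond = np.where(a == 1, p_treat, 1 - p_treat)
p_marg = np.where(a == 1, a.mean(), 1 - a.mean())
w = p_marg / p_cond

# Weighted means mimic a population in which treatment is unconfounded.
eff = (np.average(y[a == 1], weights=w[a == 1])
       - np.average(y[a == 0], weights=w[a == 0]))
```

With the confounder on the causal pathway, the naive group comparison overstates the effect, while the weighted contrast recovers the true marginal effect of 2.0 up to simulation error.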


Predicting Financial Distress: A Comparison Of Survival Analysis And Decision Tree Techniques, Adrian Gepp, Kuldeep Kumar Dec 2014


Adrian Gepp

Financial distress and the consequent failure of a business are usually extremely costly and disruptive events. Statistical financial distress prediction models attempt to predict whether a business will experience financial distress in the future. Discriminant analysis and logistic regression have been the most popular approaches, but there are also many alternative cutting-edge data mining techniques that can be used. In this paper, a semi-parametric Cox survival analysis model and non-parametric CART decision trees have been applied to financial distress prediction and compared with each other as well as the most popular approaches. This …
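Neither the Cox model nor CART fits in a few lines, but the risk-set logic survival methods rest on can be illustrated with the nonparametric Kaplan-Meier product-limit estimator. The toy failure times below are invented, with event = 0 marking firms censored (still healthy) at last observation.

```python
import numpy as np

# Months until financial distress; event = 0 means censored (still healthy).
times  = np.array([3, 5, 5, 8, 12, 12, 15, 20])
events = np.array([1, 1, 0, 1,  1,  0,  1,  0])

# Kaplan-Meier product-limit estimator: at each event time, multiply the
# running survival probability by the fraction of at-risk firms surviving.
surv = 1.0
curve = {}
for t in np.unique(times[events == 1]):
    at_risk = np.sum(times >= t)                  # firms still under observation
    failed = np.sum((times == t) & (events == 1))
    surv *= 1 - failed / at_risk
    curve[int(t)] = surv
```

Censored firms contribute to the risk sets before they drop out but are never counted as failures; the semi-parametric Cox model handles censoring through the same risk-set construction.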


Exorcising The Evil Of Forum-Shopping, Kevin Clermont, Theodore Eisenberg Dec 2014


Kevin M. Clermont

Most of the business of litigation comprises pretrial disputes. A common and important dispute is over where adjudication should take place. Civil litigators deal with nearly as many change-of-venue motions as trials. The battle over venue often constitutes the critical issue in a case. The American way is to provide plaintiffs with a wide choice of venues for suit. But the American way has its drawbacks. To counter these drawbacks, an integral part of our court systems, and in particular the federal court system, is the scheme of transfer of venue "in the interest of justice." However, the leading evaluative …


How Employment-Discrimination Plaintiffs Fare In The Federal Courts Of Appeals, Kevin Clermont, Theodore Eisenberg, Stewart Schwab Dec 2014


Kevin M. Clermont

Employment-discrimination plaintiffs swim against the tide. Compared to the typical plaintiff, they win a lower proportion of cases during pretrial and after trial. Then, many of their successful cases are appealed. On appeal, they have a harder time upholding their successes, as well as reversing adverse outcomes. This tough story does not describe some tiny corner of the litigation world. Employment-discrimination cases constitute an increasing fraction of the federal civil docket, now reigning as the largest single category of cases at nearly 10 percent. In this article, we use official government data to describe the appellate phase of this …


Foreigners' Fate In America's Courts: Empirical Legal Research, Kevin Clermont, Theodore Eisenberg Dec 2014


Kevin M. Clermont

This article revisits the controversy regarding how foreigners fare in U.S. courts. The available data, if taken in a sufficiently big sample from numerous case categories and a range of years, indicate that foreigners have fared better in the federal courts than their domestic counterparts have fared. Thus, the data offer no support for the existence of xenophobic bias in U.S. courts. Nor do they establish xenophilia, of course. What the data do show is that case selection drives the outcomes for foreigners. Foreigners’ aversion to U.S. forums can elevate the foreigners’ success rates, when measured as a percentage of …


Judicial Politics, Death Penalty Appeals, And Case Selection: An Empirical Study, John Blume, Theodore Eisenberg Dec 2014


John H. Blume

Several studies try to explain case outcomes based on the politics of judicial selection methods. Scholars usually hypothesize that judges selected by partisan popular elections are subject to greater political pressure in deciding cases than are other judges. No class of cases seems more amenable to such analysis than death penalty cases. No study, however, accounts both for judicial politics and case selection, the process through which cases are selected for death penalty litigation. Yet, the case selection process cannot be ignored because it yields a set of cases for adjudication that is far from a random selection of cases. …


A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya Jul 2014


Kuldeep Kumar

No abstract provided.


Link Spamming Wikipedia For Profit, Andrew West, Jian Chang, Krishna Venkatasubramanian, Oleg Sokolsky, Insup Lee Jun 2014


Oleg Sokolsky

Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers to entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, a particular abuse is examined: link spamming, the addition of promotional or otherwise inappropriate hyperlinks.

Our analysis focuses on the "wiki" model and the collaborative encyclopedia, Wikipedia, in particular. A principal goal of spammers is to maximize *exposure*, the quantity of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly …


Generating A Dynamic Synthetic Population – Using An Age-Structured Two-Sex Model For Household Dynamics, Mohammad-Reza Namazi-Rad, Payam Mokhtarian, Pascal Perez Apr 2014


Payam Mokhtarian

Generating a reliable computer-simulated synthetic population is necessary for knowledge processing and decision-making analysis in agent-based systems in order to measure, interpret and describe each target area and the human activity patterns within it. In this paper, both synthetic reconstruction (SR) and combinatorial optimisation (CO) techniques are discussed for generating a reliable synthetic population for a certain geographic region (in Australia) using aggregated- and disaggregated-level information available for such an area. A CO algorithm using the quadratic function of population estimators is presented in this paper in order to generate a synthetic population while considering a two-fold nested structure for …


Models For Improving Patient Throughput And Waiting At Hospital Emergency Departments, Jomon Aliyas Paul, Lin Li Apr 2014


Jomon Aliyas Paul

Background: Overcrowding diminishes Emergency Department (ED) care delivery capabilities.

Objectives: We developed a generic methodology to investigate the causes of overcrowding and to identify strategies to resolve them, and applied it in the ED of a hospital participating in the study.

Methods: We utilized Discrete Event Simulation (DES) to capture the complex ED operations. Using DES results, we developed parametric models for checking the effectiveness and quantifying the potential gains from various improvement alternatives. We performed a follow-up study to compare the outcomes before and after the model recommendations were put into effect at the hospital participating …
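A full DES model of an emergency department is far richer than a few lines, but the core logic, patients queueing for a busy resource, can be sketched as a single-server queue via the Lindley recursion. The arrival and service rates below are invented for illustration, not taken from the study.

```python
import random

random.seed(0)
ARRIVAL_RATE = 1.0       # patient arrivals per unit time (assumed)
SERVICE_RATE = 1.25      # patients treated per unit time, one server (assumed)
N_PATIENTS = 10_000

t = 0.0                  # arrival clock
server_free = 0.0        # time at which the server next becomes idle
waits = []
for _ in range(N_PATIENTS):
    t += random.expovariate(ARRIVAL_RATE)        # exponential interarrival time
    start = max(t, server_free)                  # wait if the server is busy
    waits.append(start - t)
    server_free = start + random.expovariate(SERVICE_RATE)

mean_wait = sum(waits) / len(waits)
```

Extending such a skeleton with multiple servers, triage priorities, and empirical service-time distributions is what a full ED simulation adds; the parametric models the authors fit to DES output then let improvement alternatives be compared without rerunning the simulation each time.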


A Probabilistic Predictive Model For Residential Mobility In Australia, Mohammad-Reza Namazi-Rad, Nagesh Shukla, Albert Munoz, Payam Mokhtarian, Jun Ma Mar 2014


Payam Mokhtarian

Household relocation modelling is an integral part of the planning process as residential movements influence the demand for community facilities and services. The Department of Families, Housing, Community Services and Indigenous Affairs (FaHCSIA) created the Household, Income and Labour Dynamics in Australia (HILDA) program to collect reliable longitudinal data on family and household dynamics. Socio-demographic information (such as general health situation and well-being, lifestyle changes, residential mobility, income and welfare dynamics, and labour market dynamics) is collected from the sampled individuals and households. The data shows that approximately 17% of Australian households and 13% of couple families in the HILDA sample …


Comparison Of Methods For Estimating The Effect Of Salvage Therapy In Prostate Cancer When Treatment Is Given By Indication., Jeremy Taylor, Jincheng Shen, Edward Kennedy, Lu Wang, Douglas Schaubel Dec 2013


Edward H. Kennedy

For patients who were previously treated for prostate cancer, salvage hormone therapy is frequently given when the longitudinal marker prostate-specific antigen begins to rise during follow-up. Because the treatment is given by indication, estimating the effect of the hormone therapy is challenging. In a previous paper we described two methods for estimating the treatment effect, called two-stage and sequential stratification. The two-stage method involves modeling the longitudinal and survival data. The sequential stratification method involves contrasts within matched sets of people, where each matched set includes people who did and did not receive hormone therapy. In this paper, we evaluate …


Recognition And Resolution Of 'Comprehension Uncertainty' In Ai, Sukanto Bhattacharya, Kuldeep Kumar Jun 2013


Kuldeep Kumar

Handling uncertainty is an important component of most intelligent behaviour, so uncertainty resolution is a key step in the design of an artificially intelligent decision system (Clark, 1990). Like other aspects of intelligent systems design, the aspect of uncertainty resolution is also typically sought to be handled by emulating natural intelligence (Halpern, 2003; Ball and Christensen, 2009). In this regard, a number of computational uncertainty resolution approaches have been proposed and tested by Artificial Intelligence (AI) researchers over the past several decades since the birth of AI as a scientific discipline in the early 1950s, following the publication of Alan Turing's landmark …


Instrumental Variable Analyses: Exploiting Natural Randomness To Understand Causal Mechanisms, Theodore Iwashyna, Edward Kennedy Dec 2012


Edward H. Kennedy

Instrumental variable analysis is a technique commonly used in the social sciences to provide evidence that a treatment causes an outcome, as contrasted with evidence that a treatment is merely associated with differences in an outcome. To extract such strong evidence from observational data, instrumental variable analysis exploits situations where some degree of randomness affects how patients are selected for a treatment. An instrumental variable is a characteristic of the world that leads some people to be more likely to get the specific treatment we want to study but does not otherwise change those patients' outcomes. This seminar explains, in nonmathematical …
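The core calculation is the classic Wald estimator: the effect of the instrument on the outcome, scaled by the effect of the instrument on treatment uptake. A simulated sketch (all values illustrative, not from the seminar):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

u = rng.normal(size=n)                 # unobserved confounder
z = rng.binomial(1, 0.5, size=n)       # instrument: nudges treatment only
t = (0.5 * z + u + rng.normal(size=n) > 0.5).astype(int)   # treatment taken
y = 1.5 * t + u + rng.normal(size=n)   # true treatment effect is 1.5

# Naive comparison is confounded by u; the Wald/IV ratio is not.
naive = y[t == 1].mean() - y[t == 0].mean()
iv = ((y[z == 1].mean() - y[z == 0].mean())
      / (t[z == 1].mean() - t[z == 0].mean()))
```

Because z is randomized and affects y only through t, the ratio isolates the causal effect among those whose treatment the instrument shifts, while the naive contrast absorbs the confounding from u.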


Improved Cardiovascular Risk Prediction Using Nonparametric Regression And Electronic Health Record Data, Edward Kennedy, Wyndy Wiitala, Rodney Hayward, Jeremy Sussman Dec 2012


Edward H. Kennedy

Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research exists on whether analyzing internal EHR data using flexible, adaptive statistical methods could improve clinical risk prediction. Extensive implementation of EHR in the Veterans Health Administration provides an opportunity for exploration. Our objective was to compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data. Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). …


Time Series, Unit Roots, And Cointegration: An Introduction, Lonnie K. Stevans Dec 2012


Lonnie K. Stevans

The econometric literature on unit roots took off after the publication of the paper by Nelson and Plosser (1982) that argued that most macroeconomic series have unit roots and that this is important for the analysis of macroeconomic policy. Yule (1926) suggested that regressions based on trending time series data can be spurious. This problem of spurious correlation was further pursued by Granger and Newbold (1974) and this also led to the development of the concept of cointegration (lack of cointegration implies spurious regression). The pathbreaking paper by Granger (1981), first presented at a conference at the University of Florida …
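Both phenomena can be reproduced in a few lines: regressing one independent random walk on another leaves residuals that are themselves nonstationary (spurious regression), whereas a genuinely cointegrated pair leaves stationary residuals (the first step of the Engle-Granger procedure). The simulation below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Two INDEPENDENT random walks: each series has a unit root.
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

# Spurious regression: OLS of y on x; the residuals wander like a random walk.
resid_spur = y - np.polyval(np.polyfit(x, y, 1), x)

# Cointegrated pair: z shares x's stochastic trend, so residuals are stationary.
z = 2 * x + rng.normal(size=n)
resid_coint = z - np.polyval(np.polyfit(x, z, 1), x)
```

In practice one applies a unit-root test (e.g. augmented Dickey-Fuller) to the residuals rather than eyeballing their spread.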


Managing Clustered Data Using Hierarchical Linear Modeling, Russell Warne Apr 2012


Russell T Warne

Researchers in nutrition often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistical methods share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence assumption and lead to correct analysis of data, yet it is rarely used in nutrition research. The purpose of this viewpoint is to illustrate the benefits of hierarchical linear modeling within a nutrition research context.


Targeted Maximum Likelihood Estimation Of Natural Direct Effects, Wenjing Zheng, Mark Van Der Laan Jan 2012


Wenjing Zheng

In many causal inference problems, one is interested in the direct causal effect of an exposure on an outcome of interest that is not mediated by certain intermediate variables. Robins and Greenland (1992) and Pearl (2001) formalized the definition of two types of direct effects (natural and controlled) under the counterfactual framework. The efficient scores (under a nonparametric model) for the various natural effect parameters and their general robustness conditions, as well as an estimating equation based estimator using the efficient score, are provided in Tchetgen Tchetgen and Shpitser (2011b). In this article, we apply the targeted maximum likelihood framework …


A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya Dec 2011


Adrian Gepp

No abstract provided.


Beyond Multiple Regression: Using Commonality Analysis To Better Understand R2 Results, Russell Warne Sep 2011


Russell T Warne

Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has shortcomings: it can fail to detect suppressor variables, and it assigns somewhat arbitrary values to the structure coefficients of correlated independent variables. Commonality analysis (heretofore rarely used in gifted education research) is a statistical method that partitions the explained variance of a dependent variable into nonoverlapping parts according to the independent variable(s) that are related to each portion. This Methodological Brief includes an example of commonality analysis and equations for researchers who wish to conduct their …
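For two predictors, the partition described here needs only the R² values from the full and single-predictor models; each predictor's unique contribution and their shared (common) portion then follow by subtraction. The data and variable names below are illustrative, not from the brief.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Two correlated predictors and an outcome (toy data).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def r_squared(y, *predictors):
    """R-squared from an OLS fit with intercept."""
    X = np.column_stack((np.ones(len(y)),) + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

r2_full = r_squared(y, x1, x2)
r2_1, r2_2 = r_squared(y, x1), r_squared(y, x2)

unique_1 = r2_full - r2_2            # variance explained only by x1
unique_2 = r2_full - r2_1            # variance explained only by x2
common = r2_1 + r2_2 - r2_full       # variance x1 and x2 share
```

The three components are nonoverlapping and sum exactly to the full-model R²; with k predictors the partition grows to 2^k − 1 components.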


Imputation Procedures For American Community Survey Group Quarters Small Area Estimation, Chandra Erdman, Chaitra Nagaraja Dec 2009


Chaitra H Nagaraja

No abstract provided.


The Effect Of Salvage Therapy On Survival In A Longitudinal Study With Treatment By Indication, Edward Kennedy, Jeremy Taylor, Douglas Schaubel, Scott Williams Dec 2009


Edward H. Kennedy

We consider using observational data to estimate the effect of a treatment on disease recurrence, when the decision to initiate treatment is based on longitudinal factors associated with the risk of recurrence. The effect of salvage androgen deprivation therapy (SADT) on the risk of recurrence of prostate cancer is inadequately described by the existing literature. Furthermore, standard Cox regression yields biased estimates of the effect of SADT, since it is necessary to adjust for prostate-specific antigen (PSA), which is a time-dependent confounder and an intermediate variable. In this paper, we describe and compare two methods which appropriately adjust for PSA …


Significant Figures, Tony Badrick, Peter Hickman Jul 2008


Tony Badrick

For consistency of reporting, the same number of significant figures should be used for results and reference intervals. The choice of the reporting interval should be based on analytical imprecision (measurement uncertainty).
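One simple way to operationalise this (a rule of thumb of our own, not a prescription from the authors): derive the reporting interval from the analytical standard deviation and round both results and reference limits to it.

```python
import math

def reporting_interval(analytical_sd):
    """Largest power of ten not exceeding the analytical SD (rule of thumb)."""
    return 10 ** math.floor(math.log10(analytical_sd))

def report(value, analytical_sd):
    """Round a result so it is reported no finer than its imprecision."""
    step = reporting_interval(analytical_sd)
    return round(value / step) * step
```

For example, with an analytical SD of 4 a result of 142.7 would be reported as 143, and with an SD of 0.4 a result of 5.237 would be reported as 5.2; reference limits would be rounded with the same step.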


Model Development Techniques And Evaluation Methods For Prediction And Classification Of Consumer Risk In The Credit Industry, Jennifer Priestley, Satish Nargundkar Dec 2003


Jennifer L. Priestley

In this chapter, we examine and compare the most prevalent modeling techniques in the credit industry, Linear Discriminant Analysis, Logistic Analysis and the emerging technique of Neural Network modeling. K-S Tests and Classification Rates are typically used in the industry to measure the success in predictive classification. We examine those two methods and a third, ROC Curves, to determine if the method of evaluation has an influence on the perceived performance of the modeling technique. We found that each modeling technique has its own strengths, and a determination of the “best” depends upon the evaluation method utilized and the costs …
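The K-S statistic and ROC AUC the chapter compares can both be computed from a single ranking of scores; a simplified sketch (ignoring score ties):

```python
import numpy as np

def ks_and_auc(scores, labels):
    """K-S statistic and ROC AUC for binary labels (1 = bad, higher = riskier)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores)                  # rank from riskiest down
    y = labels[order]
    tpr = np.cumsum(y == 1) / np.sum(y == 1)     # cumulative bad-capture rate
    fpr = np.cumsum(y == 0) / np.sum(y == 0)     # cumulative good-capture rate
    ks = float(np.max(np.abs(tpr - fpr)))        # max separation of the curves
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
    return ks, auc
```

A model that separates goods from bads perfectly yields KS = 1 and AUC = 1. KS summarizes the single best cutoff, while AUC averages performance over all cutoffs, which is one reason the choice of evaluation method can change which model looks best.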