Physical Sciences and Mathematics | Open Access Articles

A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya

Kuldeep Kumar

No abstract provided.

NBR2 Errata And Comments, Joseph Hilbe

Joseph M Hilbe

Errata and Comments for Negative Binomial Regression, 2nd edition

Time Series, Unit Roots, And Cointegration: An Introduction, Lonnie K. Stevans

Lonnie K. Stevans

The econometric literature on unit roots took off after the publication of the paper by Nelson and Plosser (1982) that argued that most macroeconomic series have unit roots and that this is important for the analysis of macroeconomic policy. Yule (1926) suggested that regressions based on trending time series data can be spurious. This problem of spurious correlation was further pursued by Granger and Newbold (1974) and this also led to the development of the concept of cointegration (lack of cointegration implies spurious regression). The pathbreaking paper by Granger (1981), first presented at a conference at the University of Florida …

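The spurious-regression problem traced above to Yule (1926) and Granger and Newbold (1974) can be illustrated with a short simulation. This sketch is not taken from the article. It assumes Gaussian random walks of length 100, 500 replications, and a |t| > 1.96 cutoff, and all function names are hypothetical. Because the two series are generated independently, a correctly sized test would reject about 5% of the time.

```python
import math
import random


def random_walk(n, rng):
    """Cumulative sum of i.i.d. N(0, 1) shocks: an I(1) series with a unit root."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_slope_t(x, y):
    """Classical t-statistic for the slope b in the OLS regression y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return b / se_b


def spurious_rejection_rate(n_obs=100, n_reps=500, seed=1):
    """Share of regressions between two independent random walks whose
    slope looks 'significant' at the nominal 5% level (|t| > 1.96)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        x = random_walk(n_obs, rng)
        y = random_walk(n_obs, rng)
        if abs(ols_slope_t(x, y)) > 1.96:
            hits += 1
    return hits / n_reps


print(spurious_rejection_rate())
```

With these settings the returned share is typically far above 0.05, which is the symptom of spurious regression that cointegration analysis is meant to guard against.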

Capacity Coefficient Variations, Joseph W. Houpt, Andrew Heathcote, Ami Eidels, Nathan Medeiros-Ward, Jason Watson, David Strayer

Joseph W. Houpt

The capacity coefficient has become an increasingly popular measure of efficiency under changes in workload. It has been used in applications ranging from psychophysical detection tasks to complex cognitive tasks, as well as in addressing questions in social and clinical psychology. The basic formulation compares response times to each stimulus property (or task) in isolation to response times with all stimulus properties (or tasks) at the same time. A number of variations on the basic capacity coefficient have been used, both in the experimental design and in the calculations, and many more are possible. Here we outline the theoretical reasons …

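In its basic OR (first-terminating) form, the capacity coefficient is usually written C_OR(t) = H_AB(t) / [H_A(t) + H_B(t)], where H(t) = -log S(t) is the cumulative hazard of the response times and S(t) is the survivor function. A value of 1 matches an unlimited-capacity, independent parallel model; values above or below 1 indicate super or limited capacity. The sketch below is a minimal estimate from empirical survivor functions, assuming a two-property OR design; it does not cover the variations the article discusses, and the function names are hypothetical.

```python
import math
import random


def empirical_survivor(rts, t):
    """Proportion of response times strictly greater than t."""
    return sum(rt > t for rt in rts) / len(rts)


def cumulative_hazard(rts, t):
    """H(t) = -log S(t), using the empirical survivor function."""
    s = empirical_survivor(rts, t)
    if s <= 0.0:
        raise ValueError("empirical survivor is zero at t; choose a smaller t")
    return -math.log(s)


def capacity_or(rt_a, rt_b, rt_ab, t):
    """OR capacity coefficient C_OR(t) = H_AB(t) / (H_A(t) + H_B(t)).

    rt_a, rt_b : response times with only property (or task) A, or only B
    rt_ab      : response times with both present
    t must be late enough that some single-property responses have finished,
    otherwise the denominator is zero.
    """
    return cumulative_hazard(rt_ab, t) / (
        cumulative_hazard(rt_a, t) + cumulative_hazard(rt_b, t)
    )


# Usage: an unlimited-capacity, independent parallel race. The redundant
# condition is the faster of two independent exponential channels.
rng = random.Random(7)
n = 20000
rt_a = [rng.expovariate(1.0) for _ in range(n)]
rt_b = [rng.expovariate(1.0) for _ in range(n)]
rt_ab = [min(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(n)]
print(capacity_or(rt_a, rt_b, rt_ab, 0.5))  # should be close to 1
```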

General Recognition Theory Extended To Include Response Times: Predictions For A Class Of Parallel Systems, Joseph W. Houpt, James T. Townsend, Noah H. Silbert

Joseph W. Houpt

No abstract provided.

International Astrostatistics Association, Joseph Hilbe

Joseph M Hilbe

Overview of the history, purpose, Council and officers of the International Astrostatistics Association (IAA)

A Study Of Data Editing In Foreign Countries And Of Multivariate Outlier Detection Using A Contaminated Normal Distribution Model (Masayoshi Takahashi; Selective Editing), Masayoshi Takahashi

Masayoshi Takahashi

No abstract provided.

Big Data And The Future, Sherri Rose

Sherri Rose

No abstract provided.

Bayesian Approaches To Assessing Architecture And Stopping Rule, Joseph W. Houpt, A. Heathcote, A. Eidels, J. T. Townsend

Joseph W. Houpt

Much of scientific psychology and cognitive science can be viewed as a search to understand the mechanisms and dynamics of perception, thought and action. Two processing attributes of particular interest to psychologists are the architecture, or temporal relationships between sub-processes of the system, and the stopping rule, which dictates how many of the sub-processes must be completed for the system to finish. The Survivor Interaction Contrast (SIC) is a powerful tool for assessing the architecture and stopping rule of a mental process model. Thus far, statistical analysis of the SIC has been limited to null-hypothesis significance tests. In this talk …

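For a double-factorial design with low (L) and high (H) salience on each of two factors, the SIC is SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)], where S is the survivor function of the response times. The sketch below computes the plain, non-Bayesian SIC from empirical survivor functions; the Bayesian analysis proposed in the talk is not reproduced, and the function names are hypothetical. Under the standard assumption of selective influence, a parallel first-terminating model predicts SIC(t) ≥ 0 for all t, which the simulated usage checks.

```python
import random


def empirical_survivor(rts, t):
    """Proportion of response times strictly greater than t."""
    return sum(rt > t for rt in rts) / len(rts)


def sic(rt_ll, rt_lh, rt_hl, rt_hh, t):
    """Survivor interaction contrast at time t:
    SIC(t) = [S_LL(t) - S_LH(t)] - [S_HL(t) - S_HH(t)]."""
    return (empirical_survivor(rt_ll, t) - empirical_survivor(rt_lh, t)) - (
        empirical_survivor(rt_hl, t) - empirical_survivor(rt_hh, t)
    )


def simulate_parallel_or(rate_a, rate_b, n, rng):
    """Parallel first-terminating model: RT is the faster of two independent
    exponential channels; a higher rate stands for higher salience."""
    return [min(rng.expovariate(rate_a), rng.expovariate(rate_b)) for _ in range(n)]


rng = random.Random(11)
low, high, n = 1.0, 3.0, 20000
rt_ll = simulate_parallel_or(low, low, n, rng)
rt_lh = simulate_parallel_or(low, high, n, rng)
rt_hl = simulate_parallel_or(high, low, n, rng)
rt_hh = simulate_parallel_or(high, high, n, rng)
print(sic(rt_ll, rt_lh, rt_hl, rt_hh, 0.5))
```

For these rates the theoretical value at t = 0.5 is (e^(-0.5) - e^(-1.5))^2 ≈ 0.147, so the simulated estimate should be clearly positive.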

GLME3 Ado And Do Files, Joseph Hilbe

Joseph M Hilbe

GLME3 ado and do files (116 in total)

GLME3 Data And Ado/Do Files, Joseph Hilbe

Joseph M Hilbe

A listing of the data sets, Stata software commands, and do files in the GLME3 book

Loss Function Based Ranking In Two-Stage, Hierarchical Models, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway

Rongheng Lin

Several authors have studied the performance of optimal, squared error loss (SEL) estimated ranks. Though these are effective, in many applications interest focuses on identifying the relatively good (e.g., in the upper 10%) or relatively poor performers. We construct loss functions that address this goal and evaluate candidate rank estimates, some of which optimize specific loss functions. We study performance for a fully parametric hierarchical model with a Gaussian prior and Gaussian sampling distributions, evaluating performance for several loss functions. Results show that though SEL-optimal ranks and percentiles do not specifically focus on classifying with respect to a percentile cut …

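Under squared error loss on the ranks, the optimal estimate of each unit's rank is its posterior expected rank. Given posterior draws of the unit-specific parameters (for example from MCMC), both that estimate and the posterior probability of falling in the upper 10% can be computed directly. This is a minimal sketch: the function names, the convention that rank 1 is the smallest value, and the input format (one list of unit values per draw) are assumptions, and the percentile-targeted loss functions constructed in the article are not implemented.

```python
def ranks_within_draw(values):
    """Rank units within one posterior draw (1 = smallest value; no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def posterior_mean_ranks(draws):
    """E[R_k | data] estimated by averaging ranks over draws; these minimise
    posterior expected squared error loss on the ranks."""
    totals = [0.0] * len(draws[0])
    for draw in draws:
        for k, r in enumerate(ranks_within_draw(draw)):
            totals[k] += r
    return [t / len(draws) for t in totals]


def prob_top_fraction(draws, gamma=0.10):
    """Posterior probability that each unit's rank lies in the top gamma fraction."""
    n_units = len(draws[0])
    cutoff = (1 - gamma) * n_units
    counts = [0] * n_units
    for draw in draws:
        for k, r in enumerate(ranks_within_draw(draw)):
            if r > cutoff:
                counts[k] += 1
    return [c / len(draws) for c in counts]


# Usage: two posterior draws for four hypothetical units.
draws = [[0.1, 0.5, 0.9, 0.3], [0.2, 0.8, 0.4, 0.1]]
print(posterior_mean_ranks(draws))
print(prob_top_fraction(draws, gamma=0.25))
```

Reverse the sort in ranks_within_draw if larger values indicate better performance.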

Ranking USRDS Provider-Specific SMRs From 1998-2001, Rongheng Lin, Thomas A. Louis, Susan M. Paddock, Greg Ridgeway

Rongheng Lin

Provider profiling (ranking, "league tables") is prevalent in health services research. Similarly, comparing educational institutions and identifying differentially expressed genes depend on ranking. Effective ranking procedures must be structured by a hierarchical (Bayesian) model and guided by a ranking-specific loss function; however, even optimal methods can perform poorly, and estimates must be accompanied by uncertainty assessments. We use the 1998-2001 Standardized Mortality Ratio (SMR) data from the United States Renal Data System (USRDS) as a platform to identify issues and approaches. Our analyses extend Liu et al. (2004) by combining evidence over multiple years via an AR(1) model; by considering estimates …

General Recognition Theory Extended To Include Response Times: Predictions For A Class Of Parallel Systems, James T. Townsend, Joseph W. Houpt, Noah H. Silbert

Joseph W. Houpt

General Recognition Theory (GRT; Ashby & Townsend, 1986) is a multidimensional theory of classification. Originally developed to study various types of perceptual independence, it has also been widely employed in diverse cognitive venues, such as categorization. The initial theory and applications have been static, that is, lacking a time variable and focusing on patterns of responses, such as confusion matrices. Ashby (1989) proposed a parallel, dynamic stochastic version of GRT with application to perceptual independence based on discrete linear systems theory with imposed noise. The current study again focuses on cognitive/perceptual independence within an identification classification paradigm. We extend stochastic …

Statistical Methods For Proteomic Biomarker Discovery Based On Feature Extraction Or Functional Modeling Approaches, Jeffrey S. Morris

Jeffrey S. Morris

In recent years, developments in molecular biotechnology have increased the promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high-dimensional functional and image data that present many analytical challenges, which must be addressed properly if comparative proteomics studies are to yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational …

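The abstract lists multiple testing among the analytical challenges. One widely used way to handle it is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate at a chosen level q. The sketch below shows that procedure only; it is not presented as the method the review recommends, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for k, i in enumerate(order, start=1):
        if pvalues[i] <= k * q / m:
            k_max = k
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Because the procedure steps up from the largest p-value, a set such as [0.04, 0.04, 0.04, 0.04] is rejected in full at q = 0.05 even though 0.04 exceeds the smallest threshold q/m.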

Integrative Bayesian Analysis Of High-Dimensional Multi-Platform Genomics Data, Wenting Wang, Veerabhadran Baladandayuthapani, Jeffrey S. Morris, Bradley M. Broom, Ganiraju C. Manyam, Kim-Anh Do

Jeffrey S. Morris

Motivation: Analyzing data from multi-platform genomics experiments combined with patients’ clinical outcomes helps us understand the complex biological processes that characterize a disease, as well as how these processes relate to the development of the disease. Current integration approaches that treat the data are limited in that they do not consider the fundamental biological relationships that exist among the data from different platforms.

Statistical Model: We propose an integrative Bayesian analysis of genomics data (iBAG) framework for identifying important genes/biomarkers that are associated with clinical outcome. This framework uses a hierarchical modeling technique to combine the data obtained from multiple platforms …

Proportional Mean Residual Life Model For Right-Censored Length-Biased Data, Gary Kwun Chuen Chan, Ying Qing Chen, Chongzhi Di

Chongzhi Di

To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length-biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to the so-called “induced informative censoring” in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes and Dasu (1990) for analysis of censored length-biased survival data. Several nonstandard data structures, …

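The proportional mean residual life model of Oakes and Dasu (1990) relates the mean residual life m(t | Z) = E[T - t | T > t, Z] to covariates multiplicatively, m(t | Z) = m_0(t) exp(β'Z). The sketch below only illustrates these quantities: it computes an empirical mean residual life from complete, unbiased event times and applies the proportional form. It deliberately ignores censoring and length bias, which are the actual subject of the article, so it is not an estimator for that setting. Function names are hypothetical.

```python
import math


def empirical_mrl(times, t):
    """Empirical mean residual life m(t) = E[T - t | T > t] from fully observed
    (uncensored, not length-biased) event times."""
    residuals = [x - t for x in times if x > t]
    if not residuals:
        raise ValueError("no event times exceed t")
    return sum(residuals) / len(residuals)


def proportional_mrl(m0_t, beta, z):
    """Proportional MRL model: m(t | z) = m0(t) * exp(beta' z)."""
    return m0_t * math.exp(sum(b * zj for b, zj in zip(beta, z)))


# Usage: baseline MRL at t = 2, then scaled for a covariate with beta = log 2.
m0 = empirical_mrl([1.0, 2.0, 3.0, 4.0], 2.0)
print(m0, proportional_mrl(m0, [math.log(2.0)], [1.0]))
```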

Comparing The Cohort Design And The Nested Case-Control Design In The Presence Of Both Time-Invariant And Time-Dependent Treatment And Competing Risks: Bias And Precision, Peter C. Austin

Peter Austin

Purpose: Observational studies using electronic administrative health care databases are often used to estimate the effects of treatments and exposures. Traditionally, a cohort design has been used to estimate these effects, but increasingly studies are using a nested case-control (NCC) design. The relative statistical efficiency of these two designs has not been examined in detail.

Methods: We used Monte Carlo simulations to compare these two designs in terms of the bias and precision of effect estimates. We examined three different settings: (A): treatment occurred at baseline and there was a single outcome of interest; (B): treatment was time-varying and there …

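In a nested case-control design, controls for each case are sampled from the cohort members still at risk at the case's event time (risk-set, or incidence-density, sampling). This is a minimal sketch assuming untied follow-up times, a fixed number m of controls per case, and a hypothetical function name; the time-dependent treatment and competing risks examined in the study are not modeled.

```python
import random


def nested_case_control(times, events, m, seed=0):
    """Risk-set sampling for a nested case-control design.

    times[i]  : follow-up time for subject i
    events[i] : 1 if subject i had the outcome at times[i], else 0
    m         : number of controls to sample per case

    Returns (case_index, [control_indices]) for each case. Controls are drawn
    without replacement from subjects whose follow-up extends beyond the case's
    event time, so a subject may serve as a control for several cases and may
    later become a case itself.
    """
    rng = random.Random(seed)
    matched_sets = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue
        risk_set = [j for j, t_j in enumerate(times) if t_j > t_i]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched_sets.append((i, controls))
    return matched_sets


print(nested_case_control([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 0, 1, 0, 0, 1], m=2))
```

The last case has no one left at risk, so its matched set is empty; real analyses typically handle such sets explicitly.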

Using Ensemble-Based Methods For Directly Estimating Causal Effects: An Investigation Of Tree-Based G-Computation, Peter C. Austin

Peter Austin

Researchers are increasingly using observational or nonrandomized data to estimate causal treatment effects. Essential to the production of high-quality evidence is the ability to reduce or minimize the confounding that frequently occurs in observational studies. When using the potential outcome framework to define causal treatment effects, one requires the potential outcome under each possible treatment. However, only the outcome under the actual treatment received is observed, whereas the potential outcomes under the other treatments are considered missing data. Some authors have proposed that parametric regression models be used to estimate potential outcomes. In this study, we examined the use of …

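In g-computation, an outcome model fitted to the observed data is used to predict every subject's outcome under each treatment, and the average difference between the two predictions estimates the average treatment effect. The article investigates tree-based ensemble outcome models; to keep this sketch short and dependency-free, it substitutes a saturated stratum-means model for a single discrete covariate. That substitution, the function name, and the (x, a, y) row format are assumptions, not the article's method.

```python
from collections import defaultdict


def g_computation_ate(rows):
    """Average treatment effect via g-computation with a saturated
    (stratum-means) outcome model.

    rows: iterable of (x, a, y) with x a discrete covariate stratum,
          a the treatment (0 or 1), and y the outcome.
    """
    rows = list(rows)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, a, y in rows:
        sums[(x, a)] += y
        counts[(x, a)] += 1

    def predict(x, a):
        # Fitted outcome model: mean outcome among subjects with stratum x and treatment a.
        if counts[(x, a)] == 0:
            raise ValueError(f"no subjects with x={x!r}, a={a}; cannot predict")
        return sums[(x, a)] / counts[(x, a)]

    # Predict each subject's potential outcomes under a=1 and a=0, then average the difference.
    diffs = [predict(x, 1) - predict(x, 0) for x, _, _ in rows]
    return sum(diffs) / len(diffs)


print(g_computation_ate([(0, 1, 3), (0, 0, 1), (1, 1, 5), (1, 0, 4)]))
```

Replacing predict with a fitted regression tree or tree ensemble gives the tree-based variant the abstract refers to; the averaging step is unchanged.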

Regression Trees For Predicting Mortality In Patients With Cardiovascular Disease: What Improvement Is Achieved By Using Ensemble-Based Methods?, Peter C. Austin

Peter Austin

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1991-2001 and …

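Bootstrap aggregation (bagging) fits one tree per bootstrap resample of the data and averages their predictions; for a binary outcome such as 30-day mortality, the average of leaf means acts as an estimated probability. The sketch below simplifies heavily: each tree is a depth-1 stump on a single numeric predictor, and random forests and boosted trees, which the article also evaluates, are not shown. Function names and the default of 50 trees are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Depth-1 regression tree on one numeric feature: choose the split that
    minimises the total squared error around the two leaf means."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    if best is None:  # all x identical: predict the overall mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit a stump to each bootstrap resample and average the predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(model(x) for model in models) / len(models)


# Usage: a toy binary outcome that switches from 0 to 1 as x increases.
predict = bagged_stumps(list(range(20)), [0] * 10 + [1] * 10)
print(predict(0), predict(19))
```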

Generating Survival Times To Simulate Cox Proportional Hazards Models With Time-Varying Covariates, Peter C. Austin

Peter Austin

Simulations and Monte Carlo methods serve an important role in modern statistical research. They allow for an examination of the performance of statistical procedures in settings in which analytic and mathematical derivations may not be feasible. A key element in any statistical simulation is the existence of an appropriate data-generating process: one must be able to simulate data from a specified statistical model. We describe data-generating processes for the Cox proportional hazards model with time-varying covariates when event times follow an exponential, Weibull, or Gompertz distribution. We consider three types of time-varying covariates: first, a dichotomous time-varying covariate that can …

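A standard route to such data-generating processes is to invert the cumulative hazard: draw u ~ Uniform(0, 1) and solve H(T) = -log u for T. The sketch below applies that idea to one case only, an exponential baseline hazard λ (lam) and a dichotomous covariate that switches from 0 to 1 at a known time t0 with log hazard ratio β (beta). The Weibull and Gompertz baselines and the other covariate types in the article are not covered, and the function name is hypothetical.

```python
import math
import random


def survival_time_switching_covariate(u, lam, beta, t0):
    """Event time for an exponential baseline hazard lam and a binary
    covariate equal to 0 before t0 and 1 from t0 onwards.

    Cumulative hazard:
        H(t) = lam * t                                  for t < t0
        H(t) = lam * t0 + lam * exp(beta) * (t - t0)    for t >= t0
    Solving H(T) = -log(u) gives T.
    """
    target = -math.log(u)
    if target < lam * t0:
        return target / lam  # event occurs before the covariate switches
    return t0 + (target - lam * t0) / (lam * math.exp(beta))


# Usage: 1 - random() lies in (0, 1], which keeps log(u) finite.
rng = random.Random(3)
times = [survival_time_switching_covariate(1.0 - rng.random(), 0.1, math.log(2.0), 5.0)
         for _ in range(5)]
print(times)
```

With beta = 0 the formula reduces to -log(u) / lam, the usual exponential draw, which is a quick sanity check on the inversion.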

A Comparative Analysis Of Decision Trees Vis-À-Vis Other Computational Data Mining Techniques In Automotive Insurance Fraud Detection, Adrian Gepp, Kuldeep Kumar, J Holton Wilson, Sukanto Bhattacharya

Adrian Gepp

No abstract provided.

Modeling Dependence Using Skew T Copulas: Bayesian Inference And Applications, Michael S. Smith, Quan Gan, Robert Kohn

Michael Stanley Smith

[THIS IS AN AUGUST 2010 REVISION THAT REPLACES ALL PREVIOUS VERSIONS.]

We construct a copula from the skew t distribution of Sahu, Dey & Branco (2003). This copula can capture asymmetric and extreme dependence between variables, and is one of the few copulas that can do so and still be used in high dimensions effectively. However, it is difficult to estimate the copula model by maximum likelihood when the multivariate dimension is high, or when some or all of the marginal distributions are discrete-valued, or when the parameters in the marginal distributions and copula are estimated jointly. We therefore propose …

Estimation Of Copula Models With Discrete Margins Via Bayesian Data Augmentation, Michael S. Smith, Mohamad A. Khaled

Michael Stanley Smith

Estimation of copula models with discrete margins is known to be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with latent variables, and computing inference using the resulting augmented posterior. To evaluate this we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis-Hastings step with a proposal that is close to its target distribution, the other generates them one at a time. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas …

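With latent-variable augmentation, each discrete observation y_j is paired with a continuous latent value constrained to the interval its margin implies, u_j in (F_j(y_j - 1), F_j(y_j)]. The sketch below illustrates a one-at-a-time update for the simplest elliptical case only: a bivariate Gaussian copula with correlation rho and Poisson margins whose means are treated as known, working on the Gaussian scale z_j = Φ^(-1)(u_j). It is a Gibbs-type step under those assumptions, not the authors' general sampler (which also covers non-elliptical copulas and the block Metropolis-Hastings scheme), and all names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps inv_cdf arguments strictly inside (0, 1)


def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean); 0 for k < 0."""
    if k < 0:
        return 0.0
    term = total = math.exp(-mean)
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(total, 1.0)


def latent_bounds(y, mean):
    """Gaussian-scale interval for the latent z implied by observation y:
    Phi(z) must lie in (F(y - 1), F(y)]."""
    lo_u, hi_u = poisson_cdf(y - 1, mean), poisson_cdf(y, mean)
    lo = -math.inf if lo_u <= 0.0 else STD_NORMAL.inv_cdf(lo_u)
    hi = math.inf if hi_u >= 1.0 else STD_NORMAL.inv_cdf(hi_u)
    return lo, hi


def sample_truncated_normal(mu, sigma, lo, hi, rng):
    """Inverse-CDF draw from N(mu, sigma^2) restricted to (lo, hi).
    Simple but numerically weak far in the tails."""
    dist = NormalDist(mu, sigma)
    a = 0.0 if lo == -math.inf else dist.cdf(lo)
    b = 1.0 if hi == math.inf else dist.cdf(hi)
    u = min(max(a + (b - a) * rng.random(), EPS), 1.0 - EPS)
    return dist.inv_cdf(u)


def update_latents(y1, y2, z1, z2, rho, mean1, mean2, rng):
    """One sweep generating each latent in turn from its conditional
    N(rho * z_other, 1 - rho^2), truncated to its bounds (requires |rho| < 1)."""
    sd = math.sqrt(1.0 - rho ** 2)
    z1 = sample_truncated_normal(rho * z2, sd, *latent_bounds(y1, mean1), rng)
    z2 = sample_truncated_normal(rho * z1, sd, *latent_bounds(y2, mean2), rng)
    return z1, z2


# Usage: observations y1 = 2 (mean 1.5) and y2 = 0 (mean 1.0), rho = 0.5.
rng = random.Random(5)
z1, z2 = 0.0, 0.0
for _ in range(200):
    z1, z2 = update_latents(2, 0, z1, z2, 0.5, 1.5, 1.0, rng)
print(z1, z2)
```

In a full sampler this sweep would alternate with updates of rho and the marginal parameters.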
