Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 23 of 23

Full-Text Articles in Physical Sciences and Mathematics

Statistical Models For Predicting College Success, Yelen Nunez Nov 2013

Yelen Nunez

Colleges base their admission decisions on a number of factors to determine which applicants have the potential to succeed. This study utilized data for students who graduated from Florida International University between 2006 and 2012. Two models were developed (one using SAT as the principal explanatory variable and the other using ACT as the principal explanatory variable) to predict college success, measured using the student’s college grade point average at graduation. Some of the other factors that were used to make these predictions were high school performance, socioeconomic status, major, gender, and ethnicity. The model using ACT had a higher …


Create A Simple Predictive Analytics Classification Model In Java With Weka, James Howard Nov 2013

James Howard

Get an overview of the Weka classification engine and learn how to create a simple classifier for programmatic use. Understand how to store and load models, manipulate them, and use them to evaluate data. Consider applications and implementation strategies suitable for the enterprise environment so you can turn a collection of training data into a functioning model for real-time prediction.


Hierarchical Vector Auto-Regressive Models And Their Applications To Multi-Subject Effective Connectivity, Cristina Gorrostieta, Mark Fiecas, Hernando Ombao, Erin Burke, Steven Cramer Oct 2013

Mark Fiecas

Vector auto-regressive (VAR) models typically form the basis for constructing directed graphical models for investigating connectivity in a brain network with brain regions of interest (ROIs) as nodes. There are limitations in the standard VAR models. The number of parameters in the VAR model increases quadratically with the number of ROIs and linearly with the order of the model; with so many parameters, the model can pose serious estimation problems. Moreover, when applied to imaging data, the standard VAR model does not account for variability in the connectivity structure across all subjects. In this paper, …
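
The parameter growth described in this abstract can be illustrated with a short sketch. It assumes the textbook VAR(p) form, in which each of the p lags contributes one d × d coefficient matrix for d ROIs, so the count below covers autoregressive coefficients only (no intercepts or noise covariance terms). The function name is hypothetical and is not taken from the paper.

```python
def var_coefficient_count(num_rois: int, order: int) -> int:
    """Number of autoregressive coefficients in a standard VAR(order) model.

    Assumes one (num_rois x num_rois) coefficient matrix per lag, so the
    count is quadratic in the number of ROIs and linear in the model order.
    """
    if num_rois < 1 or order < 1:
        raise ValueError("num_rois and order must be positive integers")
    return order * num_rois ** 2


if __name__ == "__main__":
    # Doubling the number of ROIs quadruples the count;
    # doubling the order only doubles it.
    for d, p in [(10, 1), (20, 1), (10, 2), (90, 2)]:
        print(f"ROIs={d:3d} order={p}: {var_coefficient_count(d, p)} coefficients")
```

Even a modest atlas of 90 regions at order 2 already implies 16,200 coefficients per subject under this counting, which is the kind of estimation burden the abstract refers to.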


Counting The Impossible: Sampling And Modeling To Achieve A Large State Homeless Count, Jennifer L. Priestley, Jane Massey Oct 2013

Jennifer L. Priestley

Objective: Using inferential statistics, we develop estimates of the homeless population of a geographically large and economically diverse state -- Georgia.

Methods: Multiple independent data sources (2000 U.S. Census, the 2006 Georgia County Guide, Georgia Chamber of Commerce) were used to develop clusters of the 150 Georgia counties. These clusters were used as "strata" to then execute stratified sampling. Homeless counts were conducted within the sample counties, allowing for multiple regression models to be developed to generate predictions of homeless persons by county.

Results: In response to a mandate from the US Department of Housing and Urban Development, the State …


Active Prescription Drug Safety Surveillance: Exploring OMOP 2011-2012 Experiments, Susan Gruber, James M. Robins Oct 2013

Susan Gruber

The Observational Medical Outcomes Partnership (OMOP), a consortium of pharmaceutical, FDA, and academic researchers, focuses on developing and evaluating electronic records-based methods for enhancing post-market drug safety surveillance. The OMOP 2011-2012 experiment consists of applying variants of seven analysis methods to five different EMR or claims databases to estimate the increase (decrease) in risk associated with drug-outcome pairs whose causal association has been previously established, and serves as a gold standard for comparison. Variants of each method can produce very different effect estimates, sometimes at odds with the gold standard. We explore the reasons behind this heterogeneity, and in doing …


Reference Interval Studies: What Is The Maximum Number Of Samples Recommended?, Robert Hawkins, Tony Badrick Sep 2013

Tony Badrick

Background: Little attention has been paid to the maximum number of specimens for reference interval calculation, i.e., the number of specimens beyond which there is no further benefit in reference interval calculation. We present a model for the estimation of the maximum number of specimens for reference interval studies based on setting the 90% confidence interval of the reference limits to be equal to the analyte reporting interval. Methods: Equations describing the bounds on the upper and lower 90% confidence intervals for logarithmically transformed and untransformed data were derived and applied to determine the maximum number of specimens required to …


A Study Of Non-Central Skew T Distributions And Their Applications In Data Analysis And Change Point Detection, Abeer Hasan Jul 2013

Abeer Hasan

Over the past three decades there has been growing interest in searching for distribution families that are suitable for analyzing skewed data with excess kurtosis. The search started with numerous papers on the skew normal distribution. Multivariate t distributions started to attract attention shortly after the development of the multivariate skew normal distribution. Many researchers proposed alternative methods to generalize the univariate t distribution to the multivariate case. Recently, skew t distributions have started to become popular in research. Skew t distributions provide more flexibility and a better ability to accommodate long-tailed data than skew normal distributions.
In this dissertation, a new …


A Mathematical Model For Estimation Of Fibre, Abhijit Bhattacharya, Kuldeep Kumar Jun 2013

Kuldeep Kumar

Yield estimates of fibre in jute plants (Capsularis) are usually obtained on the basis of random samples of plants. These estimates are required by the government for planning and policy formulation. Because of time and resource constraints, it is often difficult to compute yield estimates from large samples. In this paper, an attempt has been made to propose a method based on Gaussian quadrature to estimate the fibre yield from smaller samples. Identification of plants comprising a smaller sample and corresponding weights to be assigned to the yield of plants included in the smaller …


Recognition And Resolution Of 'Comprehension Uncertainty' In AI, Sukanto Bhattacharya, Kuldeep Kumar Jun 2013

Kuldeep Kumar

Handling uncertainty is an important component of most intelligent behaviour – so uncertainty resolution is a key step in the design of an artificially intelligent decision system (Clark, 1990). Like other aspects of intelligent systems design, the aspect of uncertainty resolution is also typically sought to be handled by emulating natural intelligence (Halpern, 2003; Ball and Christensen, 2009). In this regard, a number of computational uncertainty resolution approaches have been proposed and tested by Artificial Intelligence (AI) researchers over the past several decades since the birth of AI as a scientific discipline in the early 1950s post-publication of Alan Turing's landmark …


Business Failure Prediction Using Statistical Techniques: A Review, Adrian Gepp, Kuldeep Kumar Jun 2013

Adrian Gepp

Accurate business failure prediction models would be extremely valuable to many industry sectors, particularly in financial investment and lending. The potential value of such models has been recently emphasised by the extremely costly failure of high profile businesses in both Australia and overseas, such as HIH (Australia) and Enron (USA). Consequently, there has been a significant increase in interest in business failure prediction from both industry and academia. Statistical business failure prediction models attempt to predict the failure or success of a business. Discriminant and logit analyses are the most popular approaches, and there are also a large number of …


Quantitative Interpretation Of A Genetic Model Of Carcinogenesis Using Computer Simulations, Donghai Dai, Brandon Beck, Xiaofang Wang, Cory Howk, Yi Li Apr 2013

Donghai Dai

The genetic model of tumorigenesis by Vogelstein et al. (V theory) and the molecular definition of cancer hallmarks by Hanahan and Weinberg (W theory) represent two of the most comprehensive and systemic understandings of cancer. Here, we develop a mathematical model that quantitatively interprets these seminal cancer theories, starting from a set of equations describing the short life cycle of an individual cell in uterine epithelium during tissue regeneration. The process of malignant transformation of an individual cell is followed and the tissue (or tumor) is described as a composite of individual cells in order to quantitatively account for intra-tumor …


Instrumental Variable Analyses: Exploiting Natural Randomness To Understand Causal Mechanisms, Theodore Iwashyna, Edward Kennedy Dec 2012

Edward H. Kennedy

Instrumental variable analysis is a technique commonly used in the social sciences to provide evidence that a treatment causes an outcome, as contrasted with evidence that a treatment is merely associated with differences in an outcome. To extract such strong evidence from observational data, instrumental variable analysis exploits situations where some degree of randomness affects how patients are selected for a treatment. An instrumental variable is a characteristic of the world that leads some people to be more likely to get the specific treatment we want to study but does not otherwise change those patients’ outcomes. This seminar explains, in nonmathematical …
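
The idea in this abstract can be made concrete with a small simulation. The following is a minimal sketch on synthetic data, not material from the seminar: the data-generating process, the true effect of 2.0, and all function names are assumptions chosen for illustration. A binary instrument `z` nudges people toward treatment `d` but affects the outcome `y` only through `d`, while an unobserved confounder `u` drives both treatment and outcome. The simple ratio (Wald) estimator is used as one common instrumental variable estimator.

```python
import random


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (z, d, y) rows with an unobserved confounder u (all synthetic)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0      # instrument: as good as random
        u = rng.gauss(0.0, 1.0)                 # unobserved confounder
        d = 1 if z + u > 0.5 else 0             # treatment depends on z and u
        y = true_effect * d + 1.5 * u + rng.gauss(0.0, 1.0)  # z enters only via d
        rows.append((z, d, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_estimate(rows):
    """Treated-minus-untreated difference in mean outcome (confounded by u)."""
    treated = [y for _, d, y in rows if d == 1]
    untreated = [y for _, d, y in rows if d == 0]
    return mean(treated) - mean(untreated)


def wald_estimate(rows):
    """Ratio of the instrument's effect on y to its effect on d."""
    y1 = mean([y for z, _, y in rows if z == 1])
    y0 = mean([y for z, _, y in rows if z == 0])
    d1 = mean([d for z, d, _ in rows if z == 1])
    d0 = mean([d for z, d, _ in rows if z == 0])
    return (y1 - y0) / (d1 - d0)


if __name__ == "__main__":
    rows = simulate()
    print("naive estimate:", round(naive_estimate(rows), 2))
    print("IV (Wald) estimate:", round(wald_estimate(rows), 2))
```

In this setup the naive comparison overstates the effect because people with high `u` are both more likely to be treated and more likely to have high outcomes, whereas the ratio estimator uses only the variation in treatment induced by the instrument.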


Improved Cardiovascular Risk Prediction Using Nonparametric Regression And Electronic Health Record Data, Edward Kennedy, Wyndy Wiitala, Rodney Hayward, Jeremy Sussman Dec 2012

Edward H. Kennedy

Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research exists on whether analyzing internal EHR data using flexible, adaptive statistical methods could improve clinical risk prediction. Extensive implementation of EHR in the Veterans Health Administration provides an opportunity for exploration. Our objective was to compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data. Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). …


Theory And Methods For Gini Coefficients Partitioned By Quantile Range, Chaitra Nagaraja Dec 2012

Chaitra H Nagaraja

The Gini coefficient is frequently used to measure inequality in populations. However, it is possible that inequality levels may change over time differently for disparate subgroups which cannot be detected with population-level estimates only. Therefore, it may be informative to examine inequality separately for these segments. The case where the population is split into two segments based on non-overlapping quantile ranges is examined. Asymptotic theory is derived and practical methods to estimate standard errors and construct confidence intervals using resampling methods are developed. An application to per capita income across census tracts using American Community Survey data is considered.
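
To make the two-segment idea concrete, here is a minimal sketch. It uses the standard rank-based formula for the sample Gini coefficient and simply computes it separately below and above a chosen quantile. The function names, the cut-point rule, and the example incomes are assumptions for illustration; the paper's own estimator, asymptotic theory, and resampling procedures are not reproduced here.

```python
def gini(values):
    """Sample Gini coefficient of non-negative values (rank-based formula)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def split_by_quantile(values, q):
    """Split sorted values into the lower q fraction and the remaining upper part."""
    xs = sorted(values)
    cut = int(q * len(xs))
    return xs[:cut], xs[cut:]


if __name__ == "__main__":
    incomes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical per capita incomes
    lower, upper = split_by_quantile(incomes, 0.5)
    print("overall Gini:", round(gini(incomes), 3))
    print("lower-half Gini:", round(gini(lower), 3))
    print("upper-half Gini:", round(gini(upper), 3))
```

In this toy example the single very large income sits in the upper segment, so inequality within the upper half is much higher than within the lower half, a difference that the population-level figure alone would not reveal.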


A Comparison Of Periodic Autoregressive And Dynamic Factor Models In Intraday Energy Demand Forecasting, Thomas Mestekemper, Goeran Kauermann, Michael Smith Dec 2012

Michael Stanley Smith

We suggest a new approach for forecasting energy demand at an intraday resolution. Demand in each intraday period is modeled using semiparametric regression smoothing to account for calendar and weather components. Residual serial dependence is captured by one of two multivariate stationary time series models, with dimension equal to the number of intraday periods. These are a periodic autoregression and a dynamic factor model. We show the benefits of our approach in the forecasting of district heating demand in a steam network in Germany and aggregate electricity demand in the state of Victoria, Australia. In both studies, accounting for weather …


Constructing And Evaluating An Autoregressive House Price Index, Chaitra Nagaraja, Lawrence Brown Dec 2012

Chaitra H Nagaraja

No abstract provided.


Connecting Big Data With Big Decisions: Ideas For Synthesizing Analytics And Decision Analysis, Jeffrey Keisler Dec 2012

Jeffrey Keisler

This paper describes an approach to connect decision analysis models with outputs of analytic methods applied to various types of big data. Decision analysis models focus on issues of concern to a decision maker and incorporate use of a range of methods and axioms to develop insights about what the decision maker should do. In particular, decision analysis models typically use subjective judgments from the decision maker to describe beliefs about the likelihood of events and the desirability of outcomes. In order for human judgments to be improved by the availability of large amounts of data and processing power, it …


Asymptotic Behavior Of A T Test Robust To Cluster Heterogeneity, Douglas G. Steigerwald Dec 2012

Douglas G. Steigerwald

We study the behavior of a cluster-robust t statistic and make two principal contributions. First, we relax the restriction of previous asymptotic theory that clusters have identical size, and establish that the cluster-robust t statistic continues to have a Gaussian asymptotic null distribution. Second, we determine how variation in cluster sizes, together with other sources of cluster heterogeneity, affects the behavior of the test statistic. To do so, we determine the sample specific measure of cluster heterogeneity that governs this behavior and show that the measure depends on how three quantities vary over clusters: cluster size, the cluster specific error …


Bayesian Approaches To Copula Modelling, Michael S. Smith Dec 2012

Michael Stanley Smith

Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, to introduce copula models and aspects of copula theory that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean, copulas constructed …


Aging Population Scenarios: An Australian Experience, Chris Lloyd Dec 2012

Chris J. Lloyd

One element of the analysis of adaptive clinical trials is combining the evidence from several (often two) stages. When the endpoint is binary, standard single stage test statistics do not control size well. Yet the combined test might not be valid if the single stage tests are not. The purpose of this paper is to numerically and theoretically examine the extent to which combining basic test statistics mitigates or magnifies the size violation of the final test.


Romans 1:18-2:29: A Stylometric Reconsideration, Keith L. Yoder Dec 2012

Keith L. Yoder

Here I use the tools of multivariate data analysis to reconsider the proposal that Romans 1:18-2:29 was not originally composed by Paul. I examine the distributions of the 35 most frequent words in the New Testament epistolary Greek text, using Correspondence Analysis, Cluster Analysis, and Linear Discriminant Analysis. These tests jointly reveal a distinct statistical demarcation between Romans 1:18-2:29 and the undisputed Pauline letters, as well as differentiation between the undisputed Paulines and all the other letters of the New Testament. Data analysis thus supports the proposal that Romans 1:18-2:29 is a non-Pauline text.

Note of 12 September 2018: This …


A Case-Control Study Of Physical Activity Patterns And Risk Of Non-Fatal Myocardial Infarction, Jian Gong, Hannia Campos, Mark Fiecas, Stephen McGarvey, Robert Goldberg, Caroline Richardson, Ana Baylin Dec 2012

Mark Fiecas

Background: The interactive effects of different types of physical activity on cardiovascular disease (CVD) risk have not been fully considered in previous studies. We aimed to identify physical activity patterns that take into account combinations of physical activities and examine the association between derived physical activity patterns and risk of acute myocardial infarction (AMI). Methods: We examined the relationship between physical activity patterns, identified by principal component analysis (PCA), and AMI risk in a case-control study of myocardial infarction in Costa Rica (N=4172), 1994-2004. The component scores derived from PCA and total METS were used in natural cubic spline models …


Quantifying Temporal Correlations: A Test-Retest Evaluation Of Functional Connectivity In Resting-State fMRI, Mark Fiecas, Hernando Ombao, Dan Van Lunen, Richard Baumgartner, Alexandre Coimbra, Dai Feng Dec 2012

Mark Fiecas

There have been many interpretations of functional connectivity and proposed measures of temporal correlations between BOLD signals across different brain areas. These interpretations stem from the many studies on functional connectivity using resting-state fMRI data that have emerged in recent years. However, not all of these studies used the same metrics for quantifying the temporal correlations between brain regions. In this paper, we use a public-domain test–retest resting-state fMRI data set to perform a systematic investigation of the stability of the metrics that are often used in resting-state functional connectivity (FC) studies. The fMRI data set was collected across three different …