Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 23 of 23
Full-Text Articles in Physical Sciences and Mathematics
Statistical Models For Predicting College Success, Yelen Nunez
Colleges base their admission decisions on a number of factors to determine which applicants have the potential to succeed. This study utilized data for students who graduated from Florida International University between 2006 and 2012. Two models were developed (one using SAT and the other using ACT as the principal explanatory variable) to predict college success, measured by the student’s college grade point average at graduation. Other factors used to make these predictions included high school performance, socioeconomic status, major, gender, and ethnicity. The model using ACT had a higher …
Create A Simple Predictive Analytics Classification Model In Java With Weka, James Howard
Get an overview of the Weka classification engine and learn how to create a simple classifier for programmatic use. Understand how to store and load models, manipulate them, and use them to evaluate data. Consider applications and implementation strategies suitable for the enterprise environment, so you can turn a collection of training data into a functioning model for real-time prediction.
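The train/persist/reload/score cycle this abstract describes for Weka can be sketched in a few lines. The article itself works with Weka's Java API; the sketch below is a language-neutral stand-in using Python's standard library, with a toy `NearestCentroid` classifier invented for illustration (it is not a Weka class):

```python
import pickle
from statistics import mean

# Toy nearest-centroid classifier standing in for a Weka classifier.
# All names here are illustrative, not part of the Weka API.
class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {
            c: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == c])]
            for c in set(y)
        }
        return self

    def predict(self, x):
        dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
        return min(self.centroids, key=lambda c: dist(x, self.centroids[c]))

# Train on a tiny made-up data set.
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
y = ["low", "low", "high", "high"]
model = NearestCentroid().fit(X, y)

# Persist and reload the model, then score a new observation.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
print(restored.predict([4.9, 5.1]))  # -> high
```

In Weka the analogous persistence step is typically done with `weka.core.SerializationHelper`, which writes a trained `Classifier` to disk and reads it back for real-time scoring.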
Hierarchical Vector Auto-Regressive Models And Their Applications To Multi-Subject Effective Connectivity, Cristina Gorrostieta, Mark Fiecas, Hernando Ombao, Erin Burke, Steven Cramer
Counting The Impossible: Sampling And Modeling To Achieve A Large State Homeless Count, Jennifer L. Priestley, Jane Massey
Objective: Using inferential statistics, we develop estimates of the homeless population of a geographically large and economically diverse state -- Georgia.
Methods: Multiple independent data sources (the 2000 U.S. Census, the 2006 Georgia County Guide, and the Georgia Chamber of Commerce) were used to develop clusters of the 150 Georgia counties. These clusters were used as "strata" to then execute stratified sampling. Homeless counts were conducted within the sampled counties, allowing multiple regression models to be developed to generate predictions of homeless persons by county.
Results: In response to a mandate from the US Department of Housing and Urban Development, the State …
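The stratified design described above (cluster counties into strata, count within a sample, expand to a statewide estimate) can be sketched with stdlib Python. The strata and counts below are invented for illustration; they are not the Georgia study's data:

```python
import random
random.seed(7)

# Hypothetical strata of counties with made-up homeless counts per county.
strata = {
    "urban": [120, 135, 150, 110, 140],
    "suburban": [40, 55, 35, 50],
    "rural": [5, 8, 12, 6, 9, 7],
}

# Sample a few counties per stratum and expand the sample mean to a
# stratum total, mirroring the stratified estimator in the abstract.
estimate = 0.0
for name, counties in strata.items():
    sample = random.sample(counties, 3)
    stratum_mean = sum(sample) / len(sample)
    estimate += stratum_mean * len(counties)  # expansion weight = stratum size

print(round(estimate))
```

Stratifying first means each sample county only has to represent counties that resemble it, which is why the design tolerates such a small sample per stratum.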
Active Prescription Drug Safety Surveillance: Exploring Omop 2011-2012 Experiments, Susan Gruber, James M. Robins
The Observational Medical Outcomes Partnership (OMOP), a consortium of pharmaceutical, FDA, and academic researchers, focuses on developing and evaluating electronic records-based methods for enhancing post-market drug safety surveillance. The OMOP 2011-2012 experiment consists of applying variants of seven analysis methods to five different EMR or claims databases to estimate the increase (or decrease) in risk associated with drug-outcome pairs whose causal associations have been previously established; these pairs serve as a gold standard for comparison. Variants of each method can produce very different effect estimates, sometimes at odds with the gold standard. We explore the reasons behind this heterogeneity, and in doing …
Reference Interval Studies: What Is The Maximum Number Of Samples Recommended?, Robert Hawkins, Tony Badrick
Background: Little attention has been paid to the maximum number of specimens for reference interval calculation, i.e., the number of specimens beyond which there is no further benefit in reference interval calculation. We present a model for the estimation of the maximum number of specimens for reference interval studies based on setting the 90% confidence interval of the reference limits to be equal to the analyte reporting interval. Methods: Equations describing the bounds on the upper and lower 90% confidence intervals for logarithmically transformed and untransformed data were derived and applied to determine the maximum number of specimens required to …
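The style of calculation the abstract describes can be illustrated for the parametric Gaussian case: find the n at which the 90% confidence interval of a reference limit shrinks to the analyte reporting interval. This is a reconstruction from standard reference-limit theory, not the paper's own model, and every input value below is a made-up example:

```python
import math

z_ref = 1.96   # 97.5th-percentile upper reference limit
z_ci = 1.645   # multiplier for a 90% confidence interval
sigma = 0.5    # assumed SD of the analyte (example value, mmol/L)
report = 0.1   # reporting interval, i.e. result resolution (mmol/L)

# For Gaussian data, Var(xbar + z*s) ~= sigma^2 * (1 + z^2/2) / n, so the
# 90% CI width of the limit is 2*z_ci*sigma*sqrt((1 + z_ref^2/2)/n).
# Setting that width equal to the reporting interval and solving for n:
n_max = (2 * z_ci * sigma) ** 2 * (1 + z_ref ** 2 / 2) / report ** 2
print(math.ceil(n_max))
```

Beyond this n, the confidence interval of the reference limit is already narrower than the resolution at which results are reported, so extra specimens buy nothing.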
A Study Of Non-Central Skew T Distributions And Their Applications In Data Analysis And Change Point Detection, Abeer Hasan
A Mathematical Model For Estimation Of Fibre, Abhijit Bhattacharya, Kuldeep Kumar
Yield estimates of fibre in jute plants (capsularis) are usually obtained on the basis of random samples of plants. These estimates are required by the government for planning and policy formulation. Due to time and resource constraints, it often becomes difficult to compute yield estimates from large samples. In this paper, we propose a method based on Gaussian quadrature to estimate the fibre yield from smaller samples. Identification of the plants comprising a smaller sample and the corresponding weights to be assigned to the yield of plants included in the smaller …
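The core idea (a few well-chosen sample points with quadrature weights can reproduce a full-sample average) can be shown with a 3-point Gauss-Legendre rule. The yield curve below is a hypothetical smooth function of a plant covariate scaled to [-1, 1], invented for the example:

```python
import math

# Hypothetical fibre-yield curve over a standardized plant covariate.
def yield_fn(x):
    return 2.0 + 1.5 * x + 0.8 * x ** 2

# 3-point Gauss-Legendre rule on [-1, 1]: exact for polynomials up to
# degree 5, so three weighted "plants" recover the exact mean yield.
nodes = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
weights = [5 / 9, 8 / 9, 5 / 9]

# Divide by the interval length (2) to turn the integral into a mean.
mean_yield = sum(w * yield_fn(x) for w, x in zip(weights, nodes)) / 2
print(round(mean_yield, 4))
```

For smooth yield curves, the quadrature estimate from three plants matches the exact population mean, which is the motivation for sampling at quadrature nodes rather than at random.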
Recognition And Resolution Of 'Comprehension Uncertainty' In Ai, Sukanto Bhattacharya, Kuldeep Kumar
Handling uncertainty is an important component of most intelligent behaviour – so uncertainty resolution is a key step in the design of an artificially intelligent decision system (Clark, 1990). Like other aspects of intelligent systems design, uncertainty resolution is typically handled by emulating natural intelligence (Halpern, 2003; Ball and Christensen, 2009). In this regard, a number of computational uncertainty resolution approaches have been proposed and tested by Artificial Intelligence (AI) researchers over the several decades since the birth of AI as a scientific discipline in the early 1950s, following the publication of Alan Turing's landmark …
Business Failure Prediction Using Statistical Techniques: A Review, Adrian Gepp, Kuldeep Kumar
Accurate business failure prediction models would be extremely valuable to many industry sectors, particularly financial investment and lending. The potential value of such models has recently been emphasised by the extremely costly failure of high-profile businesses in both Australia and overseas, such as HIH (Australia) and Enron (USA). Consequently, there has been a significant increase in interest in business failure prediction from both industry and academia. Statistical business failure prediction models attempt to predict the failure or success of a business. Discriminant and logit analyses are the most popular approaches, and there are also a large number of …
Quantitative Interpretation Of A Genetic Model Of Carcinogenesis Using Computer Simulations, Donghai Dai, Brandon Beck, Xiaofang Wang, Cory Howk, Yi Li
The genetic model of tumorigenesis by Vogelstein et al. (V theory) and the molecular definition of cancer hallmarks by Hanahan and Weinberg (W theory) represent two of the most comprehensive and systemic understandings of cancer. Here, we develop a mathematical model that quantitatively interprets these seminal cancer theories, starting from a set of equations describing the short life cycle of an individual cell in uterine epithelium during tissue regeneration. The process of malignant transformation of an individual cell is followed and the tissue (or tumor) is described as a composite of individual cells in order to quantitatively account for intra-tumor …
Instrumental Variable Analyses: Exploiting Natural Randomness To Understand Causal Mechanisms, Theodore Iwashyna, Edward Kennedy
Instrumental variable analysis is a technique commonly used in the social sciences to provide evidence that a treatment causes an outcome, as contrasted with evidence that a treatment is merely associated with differences in an outcome. To extract such strong evidence from observational data, instrumental variable analysis exploits situations where some degree of randomness affects how patients are selected for a treatment. An instrumental variable is a characteristic of the world that leads some people to be more likely to get the specific treatment we want to study but does not otherwise change those patients’ outcomes. This seminar explains, in nonmathematical …
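The logic in this abstract can be shown with a small simulation: an instrument Z nudges treatment uptake T but affects the outcome Y only through T, so the Wald ratio (change in Y per change in T induced by Z) recovers the causal effect even when a hidden confounder biases the naive comparison. All numbers below are made up for illustration:

```python
import random
random.seed(1)

n = 20000
data = []
for _ in range(n):
    z = random.random() < 0.5              # instrument, e.g. proximity to a clinic
    u = random.gauss(0, 1)                 # unmeasured confounder
    # Treatment uptake rises with Z, but very sick patients (high u) are
    # always treated, confounding the naive T-Y comparison.
    t = (random.random() < (0.6 if z else 0.3)) or u > 1.5
    y = 2.0 * t + 1.0 * u + random.gauss(0, 1)   # true effect of T is 2
    data.append((z, t, y))

def mean(vals):
    return sum(vals) / len(vals)

# Wald (instrumental variable) estimator: dY/dZ divided by dT/dZ.
y1 = mean([y for z, t, y in data if z])
y0 = mean([y for z, t, y in data if not z])
t1 = mean([t for z, t, y in data if z])
t0 = mean([t for z, t, y in data if not z])
iv_effect = (y1 - y0) / (t1 - t0)
print(round(iv_effect, 2))  # close to the true effect of 2
```

Because Z is randomized independently of u, dividing the outcome difference by the uptake difference cancels the confounding that would contaminate a direct treated-vs-untreated comparison.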
Improved Cardiovascular Risk Prediction Using Nonparametric Regression And Electronic Health Record Data, Edward Kennedy, Wyndy Wiitala, Rodney Hayward, Jeremy Sussman
Use of the electronic health record (EHR) is expected to increase rapidly in the near future, yet little research exists on whether analyzing internal EHR data using flexible, adaptive statistical methods could improve clinical risk prediction. Extensive implementation of EHR in the Veterans Health Administration provides an opportunity for exploration. Our objective was to compare the performance of various approaches for predicting risk of cerebrovascular and cardiovascular (CCV) death, using traditional risk predictors versus more comprehensive EHR data. Regression methods outperformed the Framingham risk score, even with the same predictors (AUC increased from 71% to 73% and calibration also improved). …
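The AUC figure quoted in this abstract is the probability that a randomly chosen case is scored above a randomly chosen non-case, and it can be computed directly from that definition. The scores and labels below are toy values for illustration:

```python
# AUC (concordance) from its pairwise definition: fraction of case/non-case
# pairs ranked correctly, counting ties as half.
def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,    1,   0,   0,   0,   0]
print(auc(scores, labels))  # -> 0.875
```

A model with AUC 0.73 versus 0.71, as in the abstract, ranks a randomly chosen CCV death above a randomly chosen survivor two percentage points more often.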
Theory And Methods For Gini Coefficients Partitioned By Quantile Range, Chaitra Nagaraja
The Gini coefficient is frequently used to measure inequality in populations. However, it is possible that inequality levels may change over time differently for disparate subgroups which cannot be detected with population-level estimates only. Therefore, it may be informative to examine inequality separately for these segments. The case where the population is split into two segments based on non-overlapping quantile ranges is examined. Asymptotic theory is derived and practical methods to estimate standard errors and construct confidence intervals using resampling methods are developed. An application to per capita income across census tracts using American Community Survey data is considered.
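The segment-wise analysis the abstract describes can be illustrated by computing the Gini coefficient for a whole population and separately for two quantile-range segments. The incomes below are invented example values:

```python
# Gini coefficient via the sorted-data identity
#   G = sum_i (2i - n + 1) * x_(i) / (n * sum_i x_i),  i = 0..n-1,
# which is algebraically equivalent to the mean-absolute-difference form.
def gini(x):
    x = sorted(x)
    n = len(x)
    total = sum(x)
    return sum((2 * i - n + 1) * xi for i, xi in enumerate(x)) / (n * total)

incomes = [12, 15, 18, 20, 22, 30, 35, 50, 80, 200]
split = sorted(incomes)
lower, upper = split[: len(split) // 2], split[len(split) // 2:]

print(round(gini(incomes), 3))                       # population-level inequality
print(round(gini(lower), 3), round(gini(upper), 3))  # inequality within each half
```

The two segment-level coefficients can move differently over time even when the population-level coefficient is flat, which is the motivation for the partitioned analysis.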
A Comparison Of Periodic Autoregressive And Dynamic Factor Models In Intraday Energy Demand Forecasting, Thomas Mestekemper, Goeran Kauermann, Michael Smith
We suggest a new approach for forecasting energy demand at an intraday resolution. Demand in each intraday period is modeled using semiparametric regression smoothing to account for calendar and weather components. Residual serial dependence is captured by one of two multivariate stationary time series models, with dimension equal to the number of intraday periods. These are a periodic autoregression and a dynamic factor model. We show the benefits of our approach in the forecasting of district heating demand in a steam network in Germany and aggregate electricity demand in the state of Victoria, Australia. In both studies, accounting for weather …
Constructing And Evaluating An Autoregressive House Price Index, Chaitra Nagaraja, Lawrence Brown
No abstract provided.
Connecting Big Data With Big Decisions: Ideas For Synthesizing Analytics And Decision Analysis, Jeffrey Keisler
This paper describes an approach to connect decision analysis models with outputs of analytic methods applied to various types of big data. Decision analysis models focus on issues of concern to a decision maker and incorporate use of a range of methods and axioms to develop insights about what the decision maker should do. In particular, decision analysis models typically use subjective judgments from the decision maker to describe beliefs about the likelihood of events and the desirability of outcomes. In order for human judgments to be improved by the availability of large amounts of data and processing power, it …
Asymptotic Behavior Of A T Test Robust To Cluster Heterogeneity, Douglas G. Steigerwald
We study the behavior of a cluster-robust t statistic and make two principal contributions. First, we relax the restriction of previous asymptotic theory that clusters have identical size, and establish that the cluster-robust t statistic continues to have a Gaussian asymptotic null distribution. Second, we determine how variation in cluster sizes, together with other sources of cluster heterogeneity, affects the behavior of the test statistic. To do so, we determine the sample-specific measure of cluster heterogeneity that governs this behavior and show that the measure depends on how three quantities vary over clusters: cluster size, the cluster specific error …
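The cluster-robust t statistic under heterogeneous cluster sizes can be sketched for a no-intercept regression, where the sandwich variance reduces to scalar arithmetic. The simulated data and cluster sizes below are illustrative, not the paper's:

```python
import random
random.seed(3)

# Simulate clusters of deliberately unequal size, each with its own
# cluster-level error component.
clusters = []
for size in [5, 10, 20, 40, 80]:
    shock = random.gauss(0, 1)                     # shared within-cluster error
    obs = []
    for _ in range(size):
        x = random.gauss(0, 1)
        y = 1.0 * x + shock + random.gauss(0, 1)   # true slope is 1
        obs.append((x, y))
    clusters.append(obs)

allobs = [p for c in clusters for p in c]
sxx = sum(x * x for x, _ in allobs)
beta = sum(x * y for x, y in allobs) / sxx         # OLS slope, no intercept

# Cluster-robust ("sandwich") variance: sum over clusters of the squared
# within-cluster score, so within-cluster error correlation is respected.
meat = sum(sum(x * (y - beta * x) for x, y in c) ** 2 for c in clusters)
se = meat ** 0.5 / sxx
t_stat = beta / se
print(round(beta, 2), round(t_stat, 2))
```

Summing the score cluster by cluster, rather than observation by observation, is what makes the standard error robust to the shared `shock` term, whatever the mix of cluster sizes.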
Bayesian Approaches To Copula Modelling, Michael S. Smith
Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, to introduce copula models and aspects of copula theory that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean, copulas constructed …
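The separation at the heart of copula modelling (dependence supplied by the copula, margins chosen freely) can be shown by sampling from a bivariate Gaussian copula with exponential margins. The correlation value and margins below are illustrative choices, not from the article:

```python
import math
import random
random.seed(42)

rho = 0.8  # dependence parameter of the Gaussian copula (example value)

def sample_pair():
    # Correlated standard normals via the 2-d Cholesky construction.
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    # Probability integral transform: normals -> dependent uniforms.
    u1 = 0.5 * (1 + math.erf(z1 / math.sqrt(2)))
    u2 = 0.5 * (1 + math.erf(z2 / math.sqrt(2)))
    # Inverse-CDF step: impose Exponential(1) margins on both coordinates.
    return -math.log(1 - u1), -math.log(1 - u2)

pairs = [sample_pair() for _ in range(5000)]
mx = sum(a for a, _ in pairs) / len(pairs)
my = sum(b for _, b in pairs) / len(pairs)
cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
print(round(mx, 2), round(cov, 2))  # means near 1, clearly positive dependence
```

Swapping `-math.log(1 - u)` for any other quantile function changes the margins without touching the dependence structure, which is exactly the modelling flexibility the abstract highlights.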
Aging Population Scenarios: An Australian Experience, Chris Lloyd
One element of the analysis of adaptive clinical trials is combining the evidence from several (often two) stages. When the endpoint is binary, standard single-stage test statistics do not control size well, and the combined test might not be valid if the single-stage tests are not. The purpose of this paper is to examine, numerically and theoretically, the extent to which combining basic test statistics mitigates or magnifies the size violation of the final test.
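A standard way to combine stage-wise evidence, which may or may not be the rule studied in this paper, is the weighted inverse-normal combination: convert each stage's p-value to a z-score, average with square-root weights, and convert back. A stdlib sketch with example p-values:

```python
import math

def z_from_p(p):
    # Upper-tail standard normal quantile via bisection (stdlib only):
    # returns z with P(Z > z) = p.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < 1 - p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def combined_p(p1, p2, w1=0.5):
    # Inverse-normal rule: z = sqrt(w1)*z1 + sqrt(1-w1)*z2, with the stage
    # weights w1, 1-w1 fixed before the trial starts.
    z = math.sqrt(w1) * z_from_p(p1) + math.sqrt(1 - w1) * z_from_p(p2)
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(combined_p(0.04, 0.03), 4))
```

The rule is exact only when each stage-wise p-value is itself valid (uniform under the null), which is precisely the premise the abstract questions for binary endpoints.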
Romans 1:18-2:29: A Stylometric Reconsideration, Keith L. Yoder
A Case-Control Study Of Physical Activity Patterns And Risk Of Non-Fatal Myocardial Infarction, Jian Gong, Hannia Campos, Mark Fiecas, Stephen Mcgarvey, Robert Goldberg, Caroline Richardson, Ana Baylin
Background The interactive effects of different types of physical activity on cardiovascular disease (CVD) risk have not been fully considered in previous studies. We aimed to identify physical activity patterns that take into account combinations of physical activities and examine the association between derived physical activity patterns and risk of acute myocardial infarction (AMI). Methods We examined the relationship between physical activity patterns, identified by principal component analysis (PCA), and AMI risk in a case-control study of myocardial infarction in Costa Rica (N=4172), 1994-2004. The component scores derived from PCA and total METS were used in natural cubic spline models …
Quantifying Temporal Correlations: A Test-Retest Evaluation Of Functional Connectivity In Resting-State Fmri, Mark Fiecas, Hernando Ombao, Dan Van Lunen, Richard Baumgartner, Alexandre Coimbra, Dai Feng
Many interpretations of functional connectivity, and many measures of the temporal correlations between BOLD signals across different brain areas, have been proposed. These interpretations stem from the many studies of functional connectivity using resting-state fMRI data that have emerged in recent years. However, not all of these studies used the same metrics for quantifying the temporal correlations between brain regions. In this paper, we use a public-domain test–retest resting-state fMRI data set to perform a systematic investigation of the stability of the metrics that are often used in resting-state functional connectivity (FC) studies. The fMRI data set was collected across three different …