Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Articles 1 - 30 of 119

Full-Text Articles in Physical Sciences and Mathematics

Rank-Based Test Procedures For Interaction In The Two-Way Layout With One Observation Per Cell, Brad Hartlaub May 2017

Brad Hartlaub

New aligned-rank test procedures for the composite null hypothesis of no interaction effects (without placing restrictions on the two main effects) against appropriate composite general alternatives are developed for the standard two-way layout with a single observation per cell. Relative power performances of the two new aligned-rank procedures and existing tests due to Tukey (1949) and de Kroon & van der Laan (1981) are examined via Monte Carlo simulation. Extensive power studies conducted on the 5×6 and 5×9 two-way layouts with one observation per cell show superior performance of the new procedures for a variety of interaction effects. Simulated critical values for …
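
As a minimal illustration of the alignment step such procedures build on, here is a sketch in R with simulated data; the aligned-rank step is the generic idea and Tukey's (1949) one-degree-of-freedom test is included for comparison, but these are not the authors' new statistics:

    # Simulated 5x6 two-way layout with one observation per cell
    set.seed(1)
    a <- 5; b <- 6
    y <- outer(rnorm(a), rnorm(b), "+") + matrix(rnorm(a * b), a, b)

    # Align: remove estimated main effects by double-centering
    aligned <- sweep(sweep(y, 1, rowMeans(y)), 2, colMeans(y)) + mean(y)

    # Rank the aligned cells; aligned-rank interaction statistics
    # are built from these ranks
    r <- matrix(rank(aligned), a, b)

    # Tukey's (1949) one-degree-of-freedom test for non-additivity
    ri <- rowMeans(y) - mean(y); cj <- colMeans(y) - mean(y)
    ss.n <- sum(outer(ri, cj) * aligned)^2 / (sum(ri^2) * sum(cj^2))
    F.tukey <- ss.n / ((sum(aligned^2) - ss.n) / (a * b - a - b))
    pf(F.tukey, 1, a * b - a - b, lower.tail = FALSE)   # p-value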


Macroconstants Of Development: A New Benchmark For The Strategic Development Of Advanced Countries And Firms, Andrey Bystrov, Vyacheslav Yusim, Tamilla Curtis Mar 2016

Dr. Tamilla Curtis

This research proposes a new indicator of countries’ development called “macroconstants of development”. The literature review indicates that the concept of “macroconstants of development” is currently used in neither the theory nor the practice of industrial policy. Longitudinal data on total GDP, GDP per capita and their derivatives were examined for most countries of the world, and the statistical information was analyzed using econometric methods.

Based on an analysis of statistical data characterizing the development of large, technologically advanced countries under ordinary conditions, it was found that the average acceleration …
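
The “average acceleration” referred to above can be read as the second difference of a GDP series; a minimal sketch with made-up numbers (not the authors' data):

    # Hypothetical GDP per capita series (made-up numbers)
    gdp <- c(30000, 30900, 31850, 32800, 33900, 34950)

    diff(gdp)                          # "velocity": growth per period
    diff(gdp, differences = 2)         # "acceleration": change in growth
    mean(diff(gdp, differences = 2))   # a crude average acceleration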


The Fraud Detection Triangle: A New Framework For Selecting Variables In Fraud Detection Research, Adrian Gepp, Kuldeep Kumar, Sukanto Bhattacharya Feb 2016

Kuldeep Kumar

The selection of explanatory (independent) variables is crucial to developing a fraud detection model. However, the selection process in prior financial statement fraud detection studies is not standardized. Furthermore, the categories of variables differ between studies. Consequently, the new Fraud Detection Triangle framework is proposed as an overall theory to assist in guiding the selection of variables for future fraud detection research. This new framework adapts and extends Cressey’s (1953) well-known and widely-used fraud triangle to make it more suited for use in fraud detection research. While the new framework was developed for financial statement fraud detection, it is more …


Predicting Financial Distress: A Comparison Of Survival Analysis And Decision Tree Techniques, Adrian Gepp, Kuldeep Kumar Feb 2016

Adrian Gepp

Financial distress, and the consequent failure of a business, is usually an extremely costly and disruptive event. Statistical financial distress prediction models attempt to predict whether a business will experience financial distress in the future. Discriminant analysis and logistic regression have been the most popular approaches, but there is also a large number of alternative cutting-edge data mining techniques that can be used. In this paper, a semi-parametric Cox survival analysis model and non-parametric CART decision trees have been applied to financial distress prediction and compared with each other as well as the most popular approaches. This …
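
A minimal sketch of the two techniques being compared, assuming simulated firm data and the standard survival and rpart packages; the paper's actual variables and settings are not given above:

    library(survival)   # semi-parametric Cox model
    library(rpart)      # non-parametric CART trees

    # Simulated firm data: two financial ratios, time to distress,
    # and an event indicator (1 = distress observed, 0 = censored)
    set.seed(42)
    n <- 200
    leverage <- runif(n); liquidity <- runif(n)
    d <- data.frame(
      leverage, liquidity,
      time  = rexp(n, rate = exp(1.5 * leverage - liquidity) / 10),
      event = rbinom(n, 1, 0.8)
    )

    # Cox proportional hazards model of time to distress
    cox.fit <- coxph(Surv(time, event) ~ leverage + liquidity, data = d)

    # CART tree classifying distress within 5 periods
    d$distress <- factor(d$event == 1 & d$time <= 5)
    tree.fit <- rpart(distress ~ leverage + liquidity, data = d,
                      method = "class")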


Knowledge Is Free (And Statistics Should Be Too): An Introduction To Using “R” In The Classroom, Brendan Morse Dec 2015

Brendan J. Morse

This workshop will give attendees a hands-on introduction to using R, a freely available statistics software package. The focus of the discussion will be on using R in the classroom for basic mathematical and statistical operations that would be covered in an introductory or intermediate statistics course. The benefit of using R in the classroom is that students can download and install the program (and any add-on package) on their personal computers (Mac or PC) for free and work on statistics projects outside of class. Additionally, we will explore an add-on package that creates a point-and-click interface similar to SPSS …
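
A few of the basic classroom operations the workshop describes, in base R; the point-and-click add-on is not named above, so Rcmdr below is an assumption, offered as one widely used example:

    # Descriptive statistics and a one-sample t-test in base R
    scores <- c(72, 85, 91, 68, 77, 88, 95, 80)
    mean(scores); sd(scores); median(scores)
    hist(scores)               # quick visualization
    t.test(scores, mu = 75)    # test against a hypothesized mean of 75

    # One widely used point-and-click front end (an assumption; the
    # workshop's exact package is not named above):
    # install.packages("Rcmdr"); library(Rcmdr)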


The Gambler's Fallacy: A Test Of Football-Betting Market Efficiency, Ladd Kochman, Ravija Badarinathi Jul 2015

Ladd Kochman

Imaginary wagers placed during the 2006-2010 seasons on college football teams that were expected to beat the point spread after losing two straight games both on the field and against the spread produced a wins-to-bets ratio that was statistically nonrandom but not profitable. However, when that rule was limited to the major-conference schools, a significantly profitable W/B ratio emerged that challenges the efficiency of a competitive market.
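
A sketch of how such a wins-to-bets ratio can be tested, with hypothetical counts (the paper's actual figures are not given above); the 52.4% break-even rate assumes standard 11-for-10 pricing, an assumption rather than a detail from the paper:

    # Hypothetical record (the paper's actual counts are not given above)
    wins <- 285; bets <- 520

    # Nonrandomness: does the win rate differ from 50%?
    binom.test(wins, bets, p = 0.5)

    # Profitability: under 11-for-10 pricing a bettor must win about
    # 52.4% (11/21) of wagers to break even -- an assumption, not a
    # detail from the paper
    binom.test(wins, bets, p = 11/21, alternative = "greater")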


Dogs No Longer Man's Best Friend: A Test Of Football Market Efficiency, Ladd Kochman Jul 2015

Ladd Kochman

The outcomes of wagers on underdogs in the National Football League for the 2003-2007 seasons indicated that what had been anomalous behavior no longer existed. The failure of underdogs to beat the spread in profitable or nonrandom fashion supports the argument that competitive markets are efficient and undermines the proposition that behavioral finance can illuminate exploitable betting patterns.


Revisiting The Streaking Teams Phenomenon: A Note, Ladd Kochman, Randy Goodwin Jul 2015

Ladd Kochman

In an effort to learn if systematic misperceptions by market participants can undermine efficient prices and create regular profit opportunities, Camerer (1989) and Brown and Sauer (1993) investigated whether participants in the basketball-betting market overbet streaking (or "hot") teams. The purpose of this note is to determine whether streaking teams - both hot and cold - in college football alter point spreads to an exploitable degree. The pointwise outcomes of college football teams following 2-, 3-, 4-, 5-, 6-, 7-, 8-, and 9-game streaks during the 1996-2000 seasons were examined. Streaks in the aggregate produced only breakeven results when used to predict the outcomes of …


Baseball Attendance And Outcome Uncertainty: A Note, Ladd Kochman, Ravija Badarinathi Jul 2015

Ladd Kochman

Recent claims that spiraling players' salaries doom the demand for Major League Baseball (MLB) make studies like Knowles et al. (1992) especially timely and useful. By following the lead of past writers--most notably Quirk and El Hodiri (1974)--Knowles et al. proxied the demand for MLB with game attendance and (like Quirk and El Hodiri) reported that attendance is maximized when the home team is slightly favored.
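
A sketch of the "attendance is maximized when the home team is slightly favored" finding: fit a quadratic in the home team's win probability and locate its vertex. The data below are simulated, not Knowles et al.'s:

    # Simulated games: attendance peaks near a 0.6 home-win probability
    set.seed(7)
    p   <- runif(300, 0.2, 0.8)                 # home team win probability
    att <- 25000 - 80000 * (p - 0.6)^2 + rnorm(300, sd = 2000)

    fit <- lm(att ~ p + I(p^2))                 # quadratic attendance model
    b <- coef(fit)
    unname(-b["p"] / (2 * b["I(p^2)"]))         # attendance-maximizing p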


Making Sense Out Of Big Data - Popular Machine Learning Tools In Business Analytics, Kuldeep Kumar, Sukanto Bhattacharya Apr 2015

Kuldeep Kumar

'Big data' is the new buzzword in academic as well as industry circles. Laney (2001) came up with the three Vs that characterize big data - volume, velocity and variety. When talking about big data, one is usually referring to a huge volume, in terabytes rather than gigabytes, that is captured either across a cross-section, or across time, or, more likely, across both, i.e. as a panel. However, it is the sheer size of the data set that puts big data in an entirely different category, requiring a special set of analytical tools and approaches for extracting information and also data …


Extreme Rainfall Frequencies Over The Kennedy Space Center Complex, Adam Schnapp, John Lanicci Apr 2015

John M Lanicci

A study of extreme rainfall frequencies over the NASA Kennedy Space Center complex was accomplished using a high-density rainfall dataset from the Tropical Rainfall Measuring Mission’s observational network archive. Data from the network were gridded and analyzed to produce rainfall accumulation estimates for various return periods over the complex ranging from 1 to 100 years. Results of the analysis show that the rainfall accumulations for the 100-year return period are typically around 315 mm and 433 mm for 24-hour and 72-hour durations, respectively. These 100-year event estimates are consistent with those calculated from a longer-period archive at Titusville. Because the …
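
A sketch of how return-period accumulations like these can be estimated, assuming annual-maximum data and a simple method-of-moments Gumbel fit; the study's gridded network analysis is more elaborate than this:

    # 30 simulated annual maxima of 24-hour rainfall (mm)
    set.seed(3)
    ann.max <- 120 + 40 * (-log(-log(runif(30))))

    # Method-of-moments Gumbel fit
    beta <- sd(ann.max) * sqrt(6) / pi
    mu   <- mean(ann.max) - 0.5772 * beta

    # T-year return levels: Gumbel quantiles at 1 - 1/T
    T <- c(2, 10, 50, 100)
    mu - beta * log(-log(1 - 1 / T))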


Rapid Door To Balloon Time In The Treatment Of Acute ST-Elevation Myocardial Infarction Meaningfully Reduces Overall Hospital Stay, Amit N. Nanavati Md, Nainesh Patel Md, Bruce Feldman Do, J Patrick Kleaveland Md, Orlando E. Rivera Rn, David A. Cox Md Apr 2015

Nainesh C Patel MD

No abstract provided.


Teaching Of Biostatistics And Epidemiology In Medical Schools: How Do We Fare Compared With Developed Countries, Vijay Tiwari, Kuldeep Kumar, Sherin Raj Mar 2015

Kuldeep Kumar

Background: Biostatistics is taught in almost all medical schools at the undergraduate and postgraduate levels as a core course and is a prerequisite to epidemiology, public health and evidence-based medicine. However, it has to be taught differently in medical schools than it is to students studying for an MSc (Biostatistics) or in a university statistics department. Objectives: (1) To review the experience of teaching biostatistics in medical schools in India and compare it with the experience abroad; (2) to determine how best the curriculum can be designed to meet the needs of medical students …


Dependency-Topic-Affects-Sentiment-Lda Model For Sentiment Analysis, Shunshun Yin, Jun Han, Yu Huang, Kuldeep Kumar Mar 2015

Kuldeep Kumar

Sentiment analysis tends to use automated approaches to mine the sentiment information expressed in text, such as reviews, blogs and forum discussions. As most traditional approaches for sentiment analysis are based on supervised learning models and need many labeled corpora as their training data, which are not always easily obtained, various unsupervised models based on Latent Dirichlet Allocation (LDA) have been proposed for sentiment classification. In this paper, we propose a novel probabilistic modeling framework based on LDA, called Dependency-Topic-Affects-Sentiment-LDA (DTAS) model, which drops the "bag of words" assumption and assumes that the topics of sentences in a document form …


Valuing Initial Intellectual Capital Contribution In New Ventures - A Short Technical Note, Peter Blood, Kuldeep Kumar, Sukanto Bhattacharya Mar 2015

Kuldeep Kumar

In this short research note, we add to the existing technical literature on venture valuations. We posit and numerically demonstrate a simple technique of valuing intellectual contribution to a new venture in the form of initial know-how. Such valuation is essential in many practical venture valuation situations where the sources of the intellectual and cash contributions are separate, thus necessitating a rational model for a fair apportioning of equity.
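
One simple proportional apportioning rule, with hypothetical numbers; this illustrates the kind of calculation involved, not necessarily the note's exact model:

    # Hypothetical contributions: know-how valued at $300k (e.g., by
    # replacement cost), investor cash of $700k
    ip.value <- 300000
    cash     <- 700000

    ip.value / (ip.value + cash)   # founder's equity share: 0.30
    cash     / (ip.value + cash)   # investor's equity share: 0.70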


Strategy Formation For Higher Education Institutions Using System Dynamics Modelling, Mridula Sahay, Kuldeep Kumar Mar 2015

Kuldeep Kumar

System Dynamics is a modeling technique used to understand the behavior of a complex system over time. It is particularly useful in long-term forecasting when several variables are interrelated. System dynamics models differ from statistical models in that they not only provide forecasts and control but also offer explanations and an understanding of the relationships between the dependent variable and numerous exogenous and endogenous variables. This research paper focuses on strategy formation for quality improvement in Higher Education Institutions (HEIs) using system dynamics models. Most HEIs in developing countries are taking a strong …
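
A toy stock-flow model in the system dynamics style, using the deSolve package; the variables, parameters, and equations below are invented for illustration and are not the paper's model:

    library(deSolve)

    # Invented model: enrollment E grows with perceived quality Q;
    # Q adjusts toward a target but is dragged down by student load
    model <- function(t, state, parms) {
      with(as.list(c(state, parms)), {
        dE <- a * Q - d * E                 # enrollment inflow/outflow
        dQ <- r * (Qtarget - Q) - k * E     # quality dynamics
        list(c(dE, dQ))
      })
    }
    out <- ode(y = c(E = 1000, Q = 0.7),
               times = seq(0, 20, by = 0.5), func = model,
               parms = c(a = 50, d = 0.05, r = 0.3,
                         Qtarget = 0.9, k = 1e-4))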


On The Interpretation Of Multi-Year Estimates Of The American Community Survey As Period Estimates, Chaitra Nagaraja, Tucker Mcelroy Dec 2014

Chaitra H Nagaraja

The rolling sample methodology of the American Community Survey introduces temporal distortions, resulting in Multi-Year Estimates that measure aggregate activity over three or five years. This paper introduces a novel, nonparametric method for quantifying the impact of viewing multi-year estimates as functions of single-year estimates belonging to the same time span. The method is based on examining the changes to confidence interval coverage. As an application of primary interest, the interpretation of a multi-year estimate as the simple average of single-year estimates is a viewpoint that underpins the published estimates of sampling variability. Therefore it is vital to ascertain the …


Financial Statement Fraud Detection Using Supervised Learning Methods (Ph.D. Dissertation), Adrian Gepp Dec 2014

Adrian Gepp

No abstract provided.


Promoting Similarity Of Model Sparsity Structures In Integrative Analysis Of Cancer Genetic Data, Shuangge Ma Dec 2014

Shuangge Ma

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis utilizes information across multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified …


1. Estimation Of Stress-Strength Reliability From Truncated Type-I Generalized Logistic Distribution, Srinivasa Rao Gadde Dec 2014

Dr. Srinivasa Rao Gadde

No abstract provided.


Cutoff: A Spatio-Temporal Imputation Method, Lingbing Feng, Gen Nowak, Terry O'Neill, Alan H. Welsh Nov 2014

Terry O'Neill

Missing values occur frequently in many different statistical applications and need to be dealt with carefully, especially when the data are collected spatio-temporally. We propose a method called CUTOFF imputation that utilizes the spatio-temporal nature of the data to accurately and efficiently impute missing values. The main feature of this method is that the estimate of a missing value is produced by incorporating similar observed temporal information from the value’s nearest spatial neighbors. Extensions to this method are also developed to expand the method’s ability to accommodate other data generating processes. We develop a cross-validation procedure that optimally chooses parameters …
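
A sketch in the spirit of spatio-temporal nearest-neighbor imputation (not the authors' exact CUTOFF algorithm): fill a missing (site, time) cell from the k nearest spatial neighbors' values at the same time point:

    # Site-by-time data with one missing cell
    set.seed(9)
    n.site <- 10; n.time <- 50
    coords <- matrix(runif(n.site * 2), ncol = 2)        # site locations
    X <- matrix(rnorm(n.site * n.time), n.site, n.time)
    X[3, 17] <- NA

    # Impute from the k nearest spatial neighbours at the same time
    impute.knn <- function(X, coords, i, j, k = 3) {
      d <- sqrt(colSums((t(coords) - coords[i, ])^2))    # distances to site i
      nbr <- order(d)[-1][1:k]                           # nearest, excluding i
      mean(X[nbr, j], na.rm = TRUE)
    }
    X[3, 17] <- impute.knn(X, coords, i = 3, j = 17)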


A Quantitative Paleoecological Approach To High-Resolution Cyclic And Event Stratigraphy: The Upper Ordovician Miamitown Shale In The Type Cincinnatian, Benjamin Dattilo Jul 2014

Benjamin F. Dattilo

No abstract provided.


The Lower Ordovician Fillmore Formation Of Western Utah: Storm-Dominated Sedimentation On A Passive Margin, Benjamin Dattilo Jul 2014

Benjamin F. Dattilo

No abstract provided.


The Hot Hand In Number Matching Lottery Games, Mintaek Lee Apr 2014

Mintaek Lee

This project examines how many respondents are influenced by the Hot Hand Fallacy and the Law of Small Numbers when playing a number matching lottery game. Three sets of previous winning numbers for a lottery game were provided to participants, and they were asked to estimate the likelihood of each number being drawn in the next drawing. Responses from people with less statistical background were more distorted from the theoretical probability than responses from people with more statistical background. The majority of the responses were sufficient to verify the initial hypothesis. A linear regression model was also constructed to predict the …


Repeat Sales House Price Index Methodology, Chaitra Nagaraja, Lawrence Brown, Susan Wachter Dec 2013

Chaitra H Nagaraja

No abstract provided.


Business Statistics In Practice, Bruce Bowerman, Julie Schermer, Andrew Johnson, Richard O'Connell, Emily Murphree Dec 2013

Andrew M. Johnson

No abstract provided.


Measures For The Degree Of Overlap Of Gene Signatures And Applications To Tcga, Shuangge Ma Dec 2013

Shuangge Ma

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple studies on the same cancer type/outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis …
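
The raw-count measure criticized above, next to one standard size-adjusted alternative (the Jaccard index; whether this is the measure the paper proposes is not stated above):

    # Two hypothetical gene signatures
    sig1 <- c("TP53", "BRCA1", "EGFR", "MYC", "KRAS")
    sig2 <- c("TP53", "EGFR", "PTEN", "RB1")

    length(intersect(sig1, sig2))                              # raw count: 2
    length(intersect(sig1, sig2)) / length(union(sig1, sig2))  # Jaccard: 2/7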


Bias In Estimating The Causal Hazard Ratio Using Two-Stage Instrumental Variable Methods, Fei Wan, Dylan S. Small, Justin E. Bekelman, Nandita Mitra Dec 2013

Fei Wan

Two-stage instrumental variable methods are commonly used to determine the causal effects of treatments on survival in the presence of measured and unmeasured confounding. Two-stage residual inclusion (2SRI) has been the method of choice over two-stage predictor substitution (2SPS) in clinical studies. We directly compare the bias in the causal hazard ratio estimated by these two methods. Under a principal stratification framework, we derive a closed-form solution for the asymptotic bias of the causal hazard ratio among compliers for both the 2SPS and 2SRI methods when survival time follows the Weibull distribution with random censoring. When there …
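
A simulation sketch of the two estimators being compared, assuming the survival package; the data-generating values are invented, and the censoring flag is a simplification:

    library(survival)

    # Simulated data: instrument z, unmeasured confounder u, treatment
    # x, Weibull survival with true hazard ratio exp(0.7) for x
    set.seed(11)
    n <- 2000
    z <- rbinom(n, 1, 0.5)
    u <- rnorm(n)
    x <- rbinom(n, 1, plogis(-0.5 + 1.5 * z + u))
    time  <- rweibull(n, shape = 1.5,
                      scale = exp(-(0.7 * x + 0.5 * u) / 1.5))
    event <- rbinom(n, 1, 0.9)        # simplified random censoring flag

    stage1 <- glm(x ~ z, family = binomial)   # first stage
    xhat <- fitted(stage1); res <- x - xhat

    coxph(Surv(time, event) ~ xhat)       # 2SPS: predictor substitution
    coxph(Surv(time, event) ~ x + res)    # 2SRI: residual inclusion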


Integrative Analysis Of High-Throughput Cancer Studies With Contrasted Penalization, Shuangge Ma Oct 2013

Shuangge Ma

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by …


Reference Interval Studies: What Is The Maximum Number Of Samples Recommended?, Robert Hawkins, Tony Badrick Sep 2013

Tony Badrick

Background: Little attention has been paid to the maximum number of specimens for reference interval calculation, i.e., the number of specimens beyond which there is no further benefit in reference interval calculation. We present a model for the estimation of the maximum number of specimens for reference interval studies based on setting the 90% confidence interval of the reference limits to be equal to the analyte reporting interval. Methods: Equations describing the bounds on the upper and lower 90% confidence intervals for logarithmically transformed and untransformed data were derived and applied to determine the maximum number of specimens required to …
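
A sketch of the criterion for normally distributed data: set the 90% confidence interval width of the estimated upper reference limit (mean + 1.96 SD) equal to the reporting interval and solve for n. The variance approximation below is a textbook result for an estimated normal quantile; the paper's exact derivation may differ:

    # Normal-theory sketch: the estimated upper limit xbar + 1.96*s has
    # SE approx. sigma * sqrt((1 + 1.96^2 / 2) / n); set its 90% CI
    # width equal to the reporting interval delta and solve for n
    max.n <- function(sigma, delta) {
      se1 <- sigma * sqrt(1 + 1.96^2 / 2)           # SE scale factor
      ceiling((2 * qnorm(0.95) * se1 / delta)^2)
    }
    max.n(sigma = 0.5, delta = 0.1)   # e.g., SD 0.5, reporting step 0.1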