Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Journal

2018

Articles 1 - 30 of 68

Full-Text Articles in Physical Sciences and Mathematics

A Proficient Two-Stage Stratified Randomized Response Strategy, Tanveer A. Tarray, Housila P. Singh Dec 2018

Journal of Modern Applied Statistical Methods

A stratified randomized response model is proposed, based on the improved two-stage randomized response strategy of R. Singh, Singh, Mangat, and Tracy (1995). It has an optimal allocation and a large gain in precision. Conditions are obtained under which the proposed model is more efficient than the models of R. Singh et al. (1995) and H. P. Singh and Tarray (2015). Numerical illustrations are also given in support of the present study.
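
As background for readers new to randomized response, the sketch below illustrates the classical single-stage Warner (1965) design, not the two-stage stratified model proposed in the article. Each respondent answers a statement about having the sensitive trait with probability p and the complementary statement otherwise, so the observed share of "yes" answers is λ = pπ + (1 − p)(1 − π) and π can be estimated as (λ̂ − (1 − p)) / (2p − 1). The function names, the simulated population, and the values π = 0.3 and p = 0.7 are assumptions chosen for illustration.

```python
import random


def warner_estimate(responses, p):
    """Estimate the sensitive proportion from yes/no answers (Warner design)."""
    if p == 0.5:
        raise ValueError("p must differ from 0.5, otherwise pi is not identifiable")
    lam = sum(responses) / len(responses)  # observed share of "yes" answers
    return (lam - (1 - p)) / (2 * p - 1)


def simulate_responses(n, pi, p, rng):
    """Simulate n truthful answers; pi and p are illustrative assumptions."""
    answers = []
    for _ in range(n):
        has_trait = rng.random() < pi
        asked_about_trait = rng.random() < p
        # "Yes" means the respondent agrees with the statement they were dealt.
        answers.append(has_trait if asked_about_trait else not has_trait)
    return answers


rng = random.Random(0)
answers = simulate_responses(200_000, pi=0.3, p=0.7, rng=rng)
pi_hat = warner_estimate(answers, p=0.7)
print(round(pi_hat, 3))
```

With a large simulated sample the estimate should land close to the assumed π; the interviewer never learns which statement any individual respondent answered.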


Extended Method For Several Dichotomous Covariates To Estimate The Instantaneous Risk Function Of The Aalen Additive Model, Luciane Teixeira Passos Giarola, Mario Javier Ferrua Vivanco, Marcelo Angelo Cirillo, Fortunato Silva Menezes Dec 2018

Journal of Modern Applied Statistical Methods

The instantaneous risk function of Aalen’s model is estimated considering dichotomous covariates, using parametric accumulated risk functions to smooth Aalen’s cumulative risk by grouping the individuals into sets named parcels. This methodology can be used for data with dichotomous covariates.


Simple Unbalanced Ranked Set Sampling For Mean Estimation Of Response Variable Of Developmental Programs, Girish Chandra, Dinesh S. Bhoj, Rajiv Pandey Dec 2018

Journal of Modern Applied Statistical Methods

An unbalanced ranked set sampling (RSS) procedure on the skewed survey variable is proposed to estimate the population mean of a response variable from the area of developmental programs which are generally implemented under different phases. It is based on the unbalanced RSS under linear impacts of the program and is compared with the estimators based on simple random sampling (SRS) and balanced RSS. It is shown that the relative precision of the proposed estimator is higher than those of the estimators based on SRS and balanced RSS for three chosen skewed distributions of survey variables.
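
The comparison baseline mentioned above, balanced RSS versus SRS, can be sketched with a small simulation. This is not the unbalanced procedure proposed in the article: it assumes balanced RSS with set size 3, perfect ranking on the variable itself, and an exponential(1) survey variable as one example of a skewed distribution. All names and settings are illustrative.

```python
import random
import statistics


def srs_mean(m, draw):
    """Mean of a simple random sample of size m."""
    return statistics.fmean([draw() for _ in range(m)])


def balanced_rss_mean(m, draw):
    """Mean of a balanced ranked set sample with set size m (one cycle).

    For i = 0..m-1, draw a fresh set of m units, rank it, and keep only
    the i-th smallest unit. Ranking is assumed to be error-free.
    """
    picks = []
    for i in range(m):
        ranked = sorted(draw() for _ in range(m))
        picks.append(ranked[i])
    return statistics.fmean(picks)


rng = random.Random(1)
draw = lambda: rng.expovariate(1.0)  # skewed population with mean 1 (assumption)

reps = 5000
srs = [srs_mean(3, draw) for _ in range(reps)]
rss = [balanced_rss_mean(3, draw) for _ in range(reps)]

# Relative precision: variance of the SRS mean over variance of the RSS mean.
relative_precision = statistics.variance(srs) / statistics.variance(rss)
print(round(relative_precision, 2))
```

Both estimators use three measured units per replication, so a relative precision above 1 reflects the gain from ranking rather than from a larger sample.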


Transient Solution Of An M/M/1 Retrial Queue With Reneging From Orbit, A. Azhagappan, E. Veeramani, W. Monica, K. Sonabharathi Dec 2018

Applications and Applied Mathematics: An International Journal (AAM)

In this paper, the transient behavior of an M/M/1 retrial queueing model is analyzed in which customers in the orbit may renege. There is no waiting room in the system for arrivals. If the server is busy when an arrival occurs, the arriving customer joins the waiting group, known as the orbit, and retries for service. If the server is idle when an arrival occurs (either from outside the queueing system or from the orbit), the arrival is served immediately and leaves the system. Each individual customer in the orbit, …


Batch Arrival Bulk Service Queue With Unreliable Server, Second Optional Service, Two Different Vacations And Restricted Admissibility Policy, G. Ayyappan, R. Supraja Dec 2018

Applications and Applied Mathematics: An International Journal (AAM)

This paper is concerned with a batch arrival queue with an additional second optional service for a batch of customers with a dissimilar service rate, where the idea of restricted admissibility of arriving batches of customers is also introduced. The server may take two different vacations: (i) Emergency vacation: during service, the server may go on vacation in response to an emergency call, and after completion of the vacation the server continues the remaining service to the batch of customers. (ii) Bernoulli vacation: after completion of the first essential or second optional service, the server may take a vacation or may remain in the system to serve …


Some New Discretization Methods With Application In Reliability, Gholamhossein Yari, Zahra Tondpour Dec 2018

Applications and Applied Mathematics: An International Journal (AAM)

Deriving discrete analogues of continuous distributions (discretization) has drawn the attention of researchers in recent decades. Discretization plays a key role in modeling lifetime data because, in the real world, most lifetime data are continuous in nature but discrete in observation. In this paper, we introduce three new two-stage composite discretization methods to meet the need of fitting discrete-time reliability and survival data sets. All three proposed methods consist of two stages: in the first stage, a new continuous random variable is constructed from the underlying continuous random variable, and so based on maintaining hazard rate …


An M^X/G(a,b)/1 Queue With Breakdown And Delay Time To Two Phase Repair Under Multiple Vacation, G. Ayyappan, M. Nirmala Dec 2018

Applications and Applied Mathematics: An International Journal (AAM)

In this paper, we consider an M^X/G(a,b)/1 queue with active breakdown and delay time to two-phase repair under a multiple vacation policy. Batches of customers arrive according to a compound Poisson process. The server serves customers according to the “General Bulk Service Rule” (GBSR), and the service time follows a general (arbitrary) distribution. The server is unreliable and may break down at any instant. When a breakdown occurs, service is suspended and the server waits for the repair to start; this waiting time is called the “delay time” and is assumed to follow general …


Analysis Of Batch Arrival Bulk Service Queue With Multiple Vacation Closedown Essential And Optional Repair, G. Ayyappan, T. Deepa Dec 2018

Applications and Applied Mathematics: An International Journal (AAM)

The objective of this paper is to analyze a batch arrival bulk service queueing model with multiple vacation, closedown, essential and optional repair. Whenever the queue size is less than , the server starts closedown and then goes to multiple vacation. This process continues until at least customer is waiting in the queue. Breakdown may occur with probability when the server is busy. After finishing a batch of service, if the server gets breakdown with a probability , the server will be sent for repair. After the completion of the first essential repair, the server is sent to the second optional repair with probability …


The Impact Of Sample Size In Cross-Classified Multiple Membership Multilevel Models, Hyewon Chung, Jiseon Kim, Ryoungsun Park, Hyeonjeong Jean Nov 2018

Journal of Modern Applied Statistical Methods

A simulation study was conducted to examine parameter recovery in a cross-classified multiple membership multilevel model. No substantial relative bias was identified for the fixed effect or level-one variance component estimates. However, the level-two cross-classification multiple membership factor variance components were substantially biased with relatively fewer groups.


Probabilities Involving Standard Trirectangular Tetrahedral Dice Rolls, Rulon Olmstead, Doneliezer Baize Oct 2018

Rose-Hulman Undergraduate Mathematics Journal

The goal is to be able to calculate probabilities involving irregular shaped dice rolls. Here it is attempted to model the probabilities of rolling standard tri-rectangular tetrahedral dice on a hard surface, such as a table top. The vertices and edges of a tetrahedron were projected onto the surface of a sphere centered at the center of mass of the tetrahedron. By calculating the surface areas bounded by the resultant geodesics, baseline probabilities were achieved. Using a 3D printer, dice were constructed of uniform density and the results of rolling them were recorded. After calculating the corresponding confidence intervals, the …


Using Cyclical Components To Improve The Forecasts Of The Stock Market And Macroeconomic Variables, Kenneth R. Szulczyk, Shibley Sadique Oct 2018

Journal of Modern Applied Statistical Methods

Economic variables such as stock market indices, interest rates, and national output measures contain cyclical components. Forecasting methods excluding these cyclical components yield inaccurate out-of-sample forecasts. Accordingly, a three-stage procedure is developed to estimate a vector autoregression (VAR) with cyclical components. A Monte Carlo simulation shows the procedure estimates the parameters accurately. Subsequently, a VAR with cyclical components improves the root-mean-square error of out-of-sample forecasts by 50% for a stock market model with macroeconomic variables.


The Optimal Risk Of Estimator Of Conditional Distribution Function In A Model Of Heteroscedastic Regression With Weakly Dependent Observations, Farkhad Abdikalikov, Laura Reymova Sep 2018

Bulletin of National University of Uzbekistan: Mathematics and Natural Sciences

This paper is devoted to the estimation of the conditional distribution function in a heteroscedastic regression model in which the responses are α-mixing random variables. An expression for the mean square deviation of the estimator and the optimal window width sequence are found.


Comparison Of Multiple Imputation Methods For Categorical Survey Items With High Missing Rates: Application To The Family Life, Activity, Sun, Health And Eating (FLASHE) Study, Benmei Liu, Erin Hennessy, April Oh, Laura A. Dwyer, Linda Nebeling Sep 2018

Journal of Modern Applied Statistical Methods

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hotdeck, were examined and compared for imputing categorical variables with high missing rates from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.


Dealing With Sensitive Quantitative Variables: A Comparison Of Sampling Designs For The Procedure Of Gupta And Thornton, Carlos Narciso Bouza Herrera, Prayas Sharma Sep 2018

Journal of Modern Applied Statistical Methods

The use of randomized response procedures allows diminishing the number of non-responses and increasing the accuracy of the responses. A new sampling strategy is developed where the reports are scrambled using the procedure of Gupta and Thornton. The estimator of the mean as well as the errors are developed for the Rao-Hartley-Cochran and Ranked Sets Sampling designs. The proposals are compared with the original model based on the use of simple random sampling.


Bayesian And Semi-Bayesian Estimation Of The Parameters Of Generalized Inverse Weibull Distribution, Kamaljit Kaur, Kalpana K. Mahajan, Sangeeta Arora Sep 2018

Journal of Modern Applied Statistical Methods

Bayesian and semi-Bayesian estimators of parameters of the generalized inverse Weibull distribution are obtained using Jeffreys’ prior and informative prior under specific assumptions of loss function. Using simulation, the relative efficiency of the proposed estimators is obtained under different set-ups. A real life example is also given.


Overcoming Small Data Limitations In Heart Disease Prediction By Using Surrogate Data, Alfeo Sabay, Laurie Harris, Vivek Bejugama, Karen Jaceldo-Siegl Aug 2018

SMU Data Science Review

In this paper, we present a heart disease prediction use case showing how synthetic data can be used to address privacy concerns and overcome constraints inherent in small medical research data sets. While advanced machine learning algorithms, such as neural networks models, can be implemented to improve prediction accuracy, these require very large data sets which are often not available in medical or clinical research. We examine the use of surrogate data sets comprised of synthetic observations for modeling heart disease prediction. We generate surrogate data, based on the characteristics of original observations, and compare prediction accuracy results achieved from …


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler Aug 2018

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable …


Predicting National Basketball Association Success: A Machine Learning Approach, Adarsh Kannan, Brian Kolovich, Brandon Lawrence, Sohail Rafiqi Aug 2018

SMU Data Science Review

In this paper, we present a machine learning based approach to projecting the success of National Basketball Association (NBA) draft prospects. With the proliferation of data, analytics have increasingly become a critical component in the assessment of professional and collegiate basketball players. We leverage player biometric data, college statistics, draft selection order, and positional breakdown as modelling features in our prediction algorithms. We found that a player's draft pick and their college statistics are the best predictors of their longevity in the National Basketball Association.


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John Aug 2018

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationships between demographic characteristics and financial burden were examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age, …


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


Cryptocurrency Price Prediction Using Tweet Volumes And Sentiment Analysis, Jethin Abraham, Daniel Higdon, John Nelson, Juan Ibarra Aug 2018

SMU Data Science Review

In this paper, we present a method for predicting changes in Bitcoin and Ethereum prices utilizing Twitter data and Google Trends data. Bitcoin and Ethereum, the two largest cryptocurrencies in terms of market capitalization, represent over $160 billion in combined value. However, both Bitcoin and Ethereum have experienced significant price swings in both daily and long-term valuations. Twitter is increasingly used as a news source influencing purchase decisions by informing users of the currency and its increasing popularity. As a result, quickly understanding the impact of tweets on price direction can provide a purchasing and selling advantage to …


A Math Research Project Inspired By Twin Motherhood, Tiffany N. Kolba Jul 2018

Journal of Humanistic Mathematics

The phenomenon of twins, triplets, quadruplets, and other higher order multiples has fascinated humans for centuries and has even captured the attention of mathematicians who have sought to model the probabilities of multiple births. However, there has not been extensive research into the phenomenon of polyovulation, which is one of the biological mechanisms that produces multiple births. In this paper, I describe how my own experience becoming a mother to twins led me on a quest to better understand the scientific processes going on inside my own body and motivated me to conduct research on polyovulation frequencies. An overview of …


A Distance Based Method For Solving Multi-Objective Optimization Problems, Murshid Kamal, Syed Aqib Jalil, Syed Mohd Muneeb, Irfan Ali Jul 2018

Journal of Modern Applied Statistical Methods

A new model for the weighted method of goal programming is proposed based on minimizing the distances from the ideal objectives to the feasible objective space. It provides the best compromise solution for Multi Objective Linear Programming Problems (MOLPP). The proposed model tackles MOLPP by solving a series of single objective sub-problems, where the objectives are transformed into constraints. The compromise solution so obtained may be improved by defining priorities in terms of weights. A criterion is also proposed for deciding the best compromise solution. Applications of the algorithm are discussed for transportation and assignment problems involving multiple and conflicting objectives. …


Goalie Analytics: Statistical Evaluation Of Context-Specific Goalie Performance Measures In The National Hockey League, Marc Naples, Logan Gage, Amy Nussbaum Jul 2018

SMU Data Science Review

In this paper, we attempt to improve upon the classic formulation of save percentage in the NHL by controlling for the context of the shots and by using measures other than save percentage. In particular, we find save percentage to be both a weakly repeatable skill and a weak predictor of future performance, and we seek other goalie performance calculations that are more robust. To do so, we use three primary tests of intra-season consistency, intra-season predictability, and inter-season consistency, and extend the analysis to disentangle team effects on goalie statistics. We find that there are multiple ways to improve upon classic save …


Fuel Flow Reduction Impact Analysis Of Drag Reducing Film Applied To Aircraft Wings, Damon Resnick, Chris Donlan, Nimish Sakalle, Cody Pinkerman Jul 2018

SMU Data Science Review

In this paper, we present an analysis of flight data in order to determine whether the application of the Edge Aerodynamix Conformal Vortex Generator (CVG), applied to the wings of aircraft, reduces fuel flow during cruising conditions of flight. The CVG is a special treatment and film applied to the wings of an aircraft to protect the wings and reduce the non-laminar flow of air around the wings during flight. It is thought that by reducing the non-laminar flow or vortices around and directly behind the wings that an aircraft will move more smoothly through the air and provide a …


Data Center Application Security: Lateral Movement Detection Of Malware Using Behavioral Models, Harinder Pal Singh Bhasin, Elizabeth Ramsdell, Albert Alva, Rajiv Sreedhar, Medha Bhadkamkar Jul 2018

SMU Data Science Review

Data center security traditionally is implemented at the external network access points, i.e., the perimeter of the data center network, and focuses on preventing malicious software from entering the data center. However, these defenses do not cover all possible entry points for malicious software, and they are not 100% effective at preventing infiltration through the connection points. Therefore, security is required within the data center to detect malicious software activity including its lateral movement within the data center. In this paper, we present a machine learning-based network traffic analysis approach to detect the lateral movement of malicious software within the …


Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie Mcgee, Andrew Hardin Jul 2018

SMU Data Science Review

In this paper we introduce the technical components, the biology, and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols used in laboratories to obtain this data could benefit from simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted …


Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And SAS Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis Jul 2018

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analyses along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.


Predicting Game Day Outcomes In National Football League Games, Josh Klein, Anna Frowein, Chris Irwin Jul 2018

SMU Data Science Review

In this paper, we present a model for predicting the game day outcomes of National Football League games. 3 of the most popular sources for game day predictions are analyzed for comparison. Player data and outcomes from previous games are used, but we also incorporate several weather factors into our models. Over 1,700 games were incorporated and 3 separate models are created using simple regression, principal component analysis, and a recursive model. We also discuss the ethicality of using data science techniques by individuals with the knowledge in order to gain an advantage over a population lacking this specialized training.


Estimation Of Finite Population Mean By Using Minimum And Maximum Values In Stratified Random Sampling, Umer Daraz, Javid Shabbir, Hina Khan Jul 2018

Journal of Modern Applied Statistical Methods

In this paper we suggest an improved class of ratio-type estimators for estimating the finite population mean when the minimum and maximum values of the auxiliary variable are known. The properties of the suggested class of estimators in terms of bias and mean square error are obtained up to the first order of approximation. Two data sets are used for efficiency comparisons.
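
For orientation, the sketch below shows the classical ratio estimator of a population mean, which ratio-type estimators build on. It is not the suggested class from the article: it uses no minimum or maximum information and no stratification. The function name and the toy data are assumptions made for illustration.

```python
def ratio_estimate(y, x, x_pop_mean):
    """Classical ratio estimator: sample mean of y scaled by X-bar / x-bar.

    y, x        paired sample values of the study and auxiliary variables
    x_pop_mean  known population mean of the auxiliary variable
    """
    if len(y) != len(x) or not y:
        raise ValueError("y and x must be non-empty and of equal length")
    y_bar = sum(y) / len(y)
    x_bar = sum(x) / len(x)
    return y_bar * x_pop_mean / x_bar


# Toy sample in which y is exactly twice x (illustrative data only).
estimate = ratio_estimate(y=[2, 4, 6], x=[1, 2, 3], x_pop_mean=2.5)
print(estimate)  # 5.0
```

Because the sample mean of x (2.0) is below the known population mean (2.5), the estimator scales the sample mean of y upward from 4.0 to 5.0.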