Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

10,945 Full-Text Articles | 16,014 Authors | 2,527,700 Downloads | 222 Institutions

All Articles in Statistics and Probability

Neural Shrubs: Using Neural Networks To Improve Decision Trees, Kyle Caudle, Randy Hoover, Aaron Alphonsus 2019 SDSMT

SDSU Data Science Symposium

Decision trees are commonly used in machine learning to predict either a categorical or a continuous response variable. Once the tree partitions the space, the response is determined either by majority vote (classification trees) or by averaging the response values (regression trees). This research builds a standard regression tree and then, instead of averaging the responses, trains a neural network to determine the response value. We have found that our approach typically increases the predictive capability of the decision tree. We have 2 demonstrations of this approach that we wish to present as a poster ...
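The idea in this abstract can be sketched in a few lines. This is only one possible reading of it, not the authors' implementation: a depth-1 regression tree is fitted, and each leaf then gets its own fitted model in place of the leaf mean. To stay dependency-free, a least-squares line stands in for the neural network; the function names (`best_split`, `fit_line`, `fit_shrub`, `predict`) and the toy data are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Threshold of a depth-1 regression tree: minimise the total SSE of both leaves."""
    pairs = sorted(zip(xs, ys))
    best_t, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def fit_line(xs, ys):
    """Least-squares line y = a + b*x (assumed stand-in for the per-leaf network)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def fit_shrub(xs, ys):
    """Partition with the tree, then store both a mean and a fitted model per leaf."""
    t = best_split(xs, ys)
    leaves = {}
    for side, keep in (("left", lambda x: x <= t), ("right", lambda x: x > t)):
        lx = [x for x in xs if keep(x)]
        ly = [y for x, y in zip(xs, ys) if keep(x)]
        leaves[side] = {"mean": mean(ly), "line": fit_line(lx, ly)}
    return t, leaves

def predict(model, x, use_leaf_model=True):
    """Predict with the leaf mean (plain tree) or with the leaf's fitted model."""
    t, leaves = model
    leaf = leaves["left" if x <= t else "right"]
    if not use_leaf_model:
        return leaf["mean"]
    a, b = leaf["line"]
    return a + b * x

# Toy data: slope 1 on the left region, slope 2 on the right region.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 1.0, 2.0, 3.0, 10.0, 12.0, 14.0, 16.0]
model = fit_shrub(xs, ys)
tree_error = sum((predict(model, x, False) - y) ** 2 for x, y in zip(xs, ys))
shrub_error = sum((predict(model, x, True) - y) ** 2 for x, y in zip(xs, ys))
```

On this toy data the leaf means leave residual error inside each region, while the per-leaf fit captures the within-leaf trend. Whether a neural network gives a similar gain on real data is the question the poster addresses.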


Multi-Linear Algebraic Eigendecompositions And Their Application In Data Science, Randy Hoover, Kyle Caudle Dr., Karen Braman Dr. 2019 SDSMT

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable among these have been the Canonical Decompositions/Parallel Factors (commonly referred to as CP) and Tucker decompositions (commonly regarded as a high order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal ...


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson 2019 Sanford Health

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits are by their nature very difficult to predict. The current project draws upon electronic health records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, HDL ...


Volleyball Overhead Swing Volume And Injury Frequency Over The Course Of A Season, Heather Wolfe, Katherine Poole, Alejandro G. Villasante Tezanos, Robert A. English, Timothy L. Uhl 2019 Lincoln University

Statistics Faculty Publications

Background: Overuse injuries are common in volleyball; however, few studies exist that quantify the workload of a volleyball athlete in a season. The relationship between workload and shoulder injury has not been extensively studied in women's collegiate volleyball athletes.

Hypothesis/Purpose: This study aims to quantify shoulder workloads by counting overhead swings during practice and matches. The purpose of the current study is to provide a complete depiction of typical overhead swings, serves, and hits that occur in both practices and matches. The primary hypothesis was that significantly more swings would occur in practices than in matches. The secondary ...


One-Dimensional Excited Random Walk With Unboundedly Many Excitations Per Site, Omar Chakhtoun 2019 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

We study a discrete-time excited random walk on the integer lattice, requiring a tail decay estimate on the number of excitations per site, and extend the existing framework, methods, and results to a wider class of excited random walks.

We give criteria for recurrence versus transience and for ballisticity versus zero linear speed, completely classify limit laws in the transient regime, and establish functional limit laws in the recurrent regime.


Local Lagged Adapted Generalized Method Of Moments-An Innovative Estimation And Forecasting Approach And Its Applications, Olusegun M. Otunuga 2019 Marshall University

Olusegun Michael Otunuga

In this work, an attempt is made to apply the Local Lagged Adapted Generalized Method of Moments (LLGMM) to estimate state and parameters in stochastic differential dynamic models. The development of LLGMM is motivated by parameter and state estimation problems in continuous-time nonlinear and non-stationary stochastic dynamic model validation problems in biological, chemical, engineering, energy commodity markets, financial, medical, physical and social sciences. The byproducts of this innovative approach (LLGMM) are the balance between model specification and model prescription of continuous-time dynamic process and the development of discrete-time interconnected dynamic model of local sample mean and variance statistic process (DTIDMLSMVSP ...


Level Crossing Simulation Of A Queueing Model, Zhanxuan Ding 2019 University of Windsor

Major Papers

Simulation of the level crossing method will be used to find approximations of the distribution of the workload for several queueing models. In particular, three different types of queueing models, with different methods of handling workload bound thresholds, will be considered. Simulation applied to workload bound thresholds is new work.


Infinite Sums, Products, And Urn Models, Yiyan Ni 2019 University of Windsor

Major Papers

This paper considers an urn and its evolution in discrete time steps. The urn initially has two different colored balls (blue and red). We discuss different cases where k blue balls (k = 1, 2, 3, ... ) will be added (or removed) at every step if a blue ball is withdrawn, based on the goal of eventually withdrawing a red ball, with probability P(R eventually). We compute the probability of eventually withdrawing a red ball with two different methods: one using infinite sums and the other using infinite products. One advantage of this is that we can obtain P(R eventually) in a complex ...
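The two routes to P(R eventually) can be illustrated with exact fractions. This is a minimal sketch under assumptions not stated in the abstract: each draw is uniform over the balls in the urn, the drawn blue ball is returned, and k blue balls are then added (or removed, for negative k). The function name `red_within` and its parameters are hypothetical. The sum form adds up the probabilities that the first red appears at draw 1, 2, ...; the product form takes one minus the probability that every draw so far was blue.

```python
from fractions import Fraction

def red_within(n_draws, blue=1, red=1, k=1):
    """Return (sum_form, product_form) for P(a red ball is drawn within n_draws)."""
    all_blue = Fraction(1)   # probability that every draw so far was blue
    total = Fraction(0)      # running sum of P(first red occurs at this draw)
    b = blue
    for _ in range(n_draws):
        total += all_blue * Fraction(red, b + red)   # first red at this draw
        all_blue *= Fraction(b, b + red)             # yet another blue draw
        b = max(b + k, 0)                            # urn update after a blue draw
    return total, 1 - all_blue

# Starting from one blue and one red ball and adding one blue per blue draw,
# the chance of having seen red within n draws is n / (n + 1).
sum_form, product_form = red_within(9)
```

Both forms agree at every finite horizon because the sum telescopes into the product. Under these assumptions the limit as the number of draws grows is P(R eventually), and it depends on how quickly the blue count increases.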


An Evaluation Of Training Size Impact On Validation Accuracy For Optimized Convolutional Neural Networks, Jostein Barry-Straume, Adam Tschannen, Daniel W. Engels, Edward Fine 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present an evaluation of training size impact on validation accuracy for an optimized Convolutional Neural Network (CNN). CNNs are currently the state-of-the-art architecture for object classification tasks. We used Amazon’s machine learning ecosystem to train and test 648 models to find the optimal hyperparameters with which to apply a CNN to the Fashion-MNIST (Modified National Institute of Standards and Technology) dataset. We were able to realize a validation accuracy of 90% by using only 40% of the original data. We found that hidden layers appear to have had zero impact on validation accuracy, whereas the ...


Comparisons Of Performance Between Quantum And Classical Machine Learning, Christopher Havenstein, Damarcus Thomas, Swami Chandrasekaran 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present a performance comparison of machine learning algorithms executed on traditional and quantum computers. Quantum computing has the potential to achieve incredible results for certain types of problems, and we explore whether it can be applied to machine learning. First, we identified quantum machine learning algorithms that had reproducible code and classical machine learning counterparts. Then, we found relevant data sets with which we tested the performance of the comparable quantum and classical machine learning algorithms. We evaluated performance with algorithm execution time and accuracy. We found that quantum variational support vector machines in some cases had higher ...


Political Profiling Using Feature Engineering And NLP, Chiranjeevi Mallavarapu, Ramya Mandava, Sabitri KC, Ginger M. Holt 2019 Southern Methodist University

SMU Data Science Review

Public surveys are predominantly used when forecasting election outcomes. While the approach has had significant successes, the surveys have had their failures as well, especially when it comes to accuracy and reliability. As a result, it becomes challenging for political parties to spend their campaign budgets in a manner that facilitates the growth of a favorable and verifiable public opinion. Consequently, it is critical that a more accurate methodology for predicting election outcomes be developed. In this paper, we present an evaluation of the impact of utilizing dynamic public data on predicting the outcome of elections. Our model yielded a ...


Pedestrian Safety -- Fundamental To A Walkable City, Joshua Herrera, Patrick McDevitt, Preeti Swaminathan, Raghuram Srinivas 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present a method to identify urban areas with a higher likelihood of pedestrian safety-related events. Pedestrian safety-related events are pedestrian-vehicle interactions that result in fatalities, injuries, accidents without injury, or near-misses between pedestrians and vehicles. To develop a solution to this problem of identifying likely event locations, we assemble data, primarily from the City of Cincinnati and Hamilton County, that include safety reports from a five-year period, geographic information for these events, a citizen survey of pedestrian-reported concerns, non-emergency requests for service for any cause in the city, property values and public transportation ...


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater 2019 Southern Methodist University

SMU Data Science Review

Forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide ...


Improving Gas Well Economics With Intelligent Plunger Lift Optimization Techniques, Atsu Atakpa, Emmanuel Farrugia, Ryan Tyree, Daniel W. Engels, Charles Sparks 2019 Southern Methodist University

SMU Data Science Review

In this paper, we present an approach to reducing bottom hole plunger dwell time for artificial lift systems. Lift systems are used in a process to remove contaminants from a natural gas well. A plunger is a mechanical device used to deliquefy natural gas wells by removing contaminants in the form of water, oil, wax, and sand from the wellbore. These contaminants decrease bottom-hole pressure which in turn hampers gas production by forming a physical barrier within the well tubing. As the plunger descends through the well it emits sounds which are recorded at the surface by an echo-meter that ...


Gene Co-Expression Networks Analysis Reveal Novel Molecular Endotypes In Alpha-1 Antitrypsin Deficiency, Jen-hwa Chu, Wenlan Zang 2019 Yale University

Yale Day of Data

Rationale: Alpha-1 antitrypsin deficiency (AATD) is a genetic condition that predisposes to early onset pulmonary emphysema and airways obstruction. The exact mechanism through which AATD leads to lung disease is incompletely understood.

Objectives: To investigate the effect of AAT genotype and augmentation therapy on the bronchoalveolar lavage (BAL) and peripheral blood mononuclear cell (PBMC) transcriptomes, while examining the link between gene expression profiles and clinical features of AATD.

Methods: We performed RNA-Seq on RNA extracted from BAL and PBMC on samples obtained from 89 AATD patients enrolled in the Genomic Research in Alpha-1 Antitrypsin Deficiency and Sarcoidosis (GRADS) study. Differential gene ...


Non-Invasive Analysis Of The Sputum Transcriptome Discriminates Clinical Phenotypes Of Asthma, Xiting Yan 2019 Yale University School of Medicine

Yale Day of Data

Whole-transcriptome gene expression profiles in the sputum and circulation from 100 asthma patients were measured using the Affymetrix HuGene 1.0ST arrays. Unsupervised clustering analysis based on pathways from KEGG was used to identify TEA clusters of patients from the sputum gene expression profiles. The identified TEA clusters have significantly different pre-bronchodilator FEV1, bronchodilator responsiveness, exhaled nitric oxide levels, history of hospitalization for asthma and history of intubation. Evaluation of TEA clusters in children from the Asthma BRIDGE cohort confirmed the identified differences in intubation and hospitalization. Furthermore, evaluation of the TH2 gene signatures suggested a much lower prevalence ...


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan 2019 Yale University School of Public Health

Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both ...


A Latent Spatial Piecewise Exponential Model For Interval-Censored Disease Surveillance Data With Time-Varying Covariates And Misclassification, Yaxuan Sun, Chong Wang, William Q. Meeker, Max Morris, Marisa L. Rotolo, Jeffery Zimmerman 2019 Iowa State University

Veterinary Diagnostic and Production Animal Medicine Publications

Understanding the dynamics of disease spread is critical to achieving effective animal disease surveillance. A major challenge in modeling disease spread is the fact that the true disease status cannot be known with certainty due to the imperfect diagnostic sensitivity and specificity of the tests used to generate the disease surveillance data. Other challenges in modeling such data include interval censoring, relating disease spread to distance between units, and incorporating time-varying covariates, which are the unobserved disease statuses. We propose a latent spatial piecewise exponential model (PEX) with misclassification of events to address the challenges in modeling such disease surveillance ...


Collaborative Efforts To Forecast Seasonal Influenza In The United States, 2015–2016, Craig J. McGowan, Jarad Niemi, Nehemias Ulloa, Katie Will, et al. 2019 Centers for Disease Control and Prevention

Statistics Publications

Since 2013, the Centers for Disease Control and Prevention (CDC) has hosted an annual influenza season forecasting challenge. The 2015–2016 challenge consisted of weekly probabilistic forecasts of multiple targets, including fourteen models submitted by eleven teams. Forecast skill was evaluated using a modified logarithmic score. We averaged submitted forecasts into a mean ensemble model and compared them against predictions based on historical trends. Forecast skill was highest for seasonal peak intensity and short-term forecasts, while forecast skill for timing of season onset and peak week was generally low. Higher forecast skill was associated with team participation in previous influenza ...
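The scoring workflow described in this abstract can be sketched as follows. It assumes each model reports a probability for every bin of a target (for example, candidate peak weeks). The abstract does not spell out how the logarithmic score was modified, so the `window` parameter (credit for bins near the observed one) and the `floor` value are hypothetical illustrations, as are the function names and the toy forecasts.

```python
import math

def mean_ensemble(forecasts):
    """Average the bin probabilities across models (all forecasts share the same bins)."""
    n = len(forecasts)
    return [sum(column) / n for column in zip(*forecasts)]

def log_score(probs, true_bin, window=0, floor=-10.0):
    """Log of the probability assigned to the observed bin.

    window > 0 also counts neighbouring bins (an assumed form of modification);
    floor caps the penalty when almost no probability was assigned.
    """
    lo = max(0, true_bin - window)
    hi = min(len(probs), true_bin + window + 1)
    p = sum(probs[lo:hi])
    return max(math.log(p), floor) if p > 0 else floor

# Three hypothetical models forecasting over four bins; bin 1 is what was observed.
forecasts = [
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.20, 0.70, 0.10],
]
ensemble = mean_ensemble(forecasts)
model_scores = [log_score(f, true_bin=1) for f in forecasts]
ensemble_score = log_score(ensemble, true_bin=1)
```

A score closer to zero indicates more probability placed on the observed outcome. In this toy case the ensemble scores better than two of the three models but worse than the best one.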


Toward Using High-Frequency Coastal Radars For Calibration Of S-AIS Based Ocean Vessel Tracking Models, Ben Freidrich 2019 Wilfrid Laurier University

Theses and Dissertations (Comprehensive)

Most of the world relies on ships for transportation, shipping, and tourism. Automatic Identification System (AIS) messages are transmitted from ships and provide a wealth of positional data on these open ocean vessels. This data is being utilized to determine the optimal path for ships, as well as to predict where a ship may be going in the near future. It has only been in the past decade that AIS signals have been easily received with satellites (S-AIS), so there have been few studies that look at using available information and pairing it with the new abundance of ship ...

