2019

Articles 1 - 25 of 25

Full-Text Articles in Other Statistics and Probability

Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used to help identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with a propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …
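
As a rough illustration of the RFM idea mentioned in this abstract, the sketch below computes recency, frequency, and monetary value per customer from a list of purchases and flags customers whose recency exceeds a threshold. The function names, the sample orders, and the single recency-based churn rule are all assumptions made for illustration; the paper's own thresholds and model are not described in the listing.

```python
from datetime import date


def rfm_scores(purchases, as_of):
    """Return {customer: (recency_days, frequency, monetary)}.

    purchases: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: the reference date used to measure recency.
    """
    summary = {}
    for customer, when, amount in purchases:
        last, count, total = summary.get(customer, (when, 0, 0.0))
        summary[customer] = (max(last, when), count + 1, total + amount)
    return {
        customer: ((as_of - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


def flag_churners(rfm, recency_threshold_days):
    """Hypothetical rule: a customer is flagged when recency exceeds the threshold."""
    return sorted(
        customer
        for customer, (recency, _, _) in rfm.items()
        if recency > recency_threshold_days
    )


# Made-up purchase history, for illustration only.
orders = [
    ("A", date(2019, 1, 10), 120.0),
    ("A", date(2019, 6, 1), 80.0),
    ("B", date(2018, 3, 5), 500.0),
]
rfm = rfm_scores(orders, as_of=date(2019, 7, 1))
churners = flag_churners(rfm, recency_threshold_days=365)
```

In practice the threshold would likely differ per customer group, as the abstract suggests; a single global threshold is used here only to keep the sketch short.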


Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Rating scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


Implications Of The Modifiable Areal Unit Problem For Wildfire Analyses, Timothy P. Nagle-McNaughton, Xi Gong, Jose A. Constantine Dec 2019

Geography and Environmental Studies Faculty Publications

Wildfires pose a danger to both ecologies and communities. To this end, many large-scale analyses of wildfire patterns and behavior rely on the aggregation of point data to polygons, typically those based on distinct disparate ecological areas. However, the sizes, shapes, and orientations of the polygons to which data are aggregated are not neutral factors in the resulting analysis. The influence of the aggregation polygons on calculated results is known as the modifiable areal unit problem (MAUP), which is well-documented in the spatial statistics literature. Despite the documentation of the MAUP, relatively few wildfire studies consider the effects of the MAUP …


On The Sparre-Andersen Risk Models, Ruixi Zhang Oct 2019

Electronic Thesis and Dissertation Repository

This thesis develops several strategies for calculating ruin-related quantities for a variety of extended risk models. We focus on the Sparre-Andersen risk model, also known as the renewal risk model. The idea of an arbitrary distribution for the waiting time between claim payments arose in the 1950s from collective risk theory and has received many extensions and modifications in recent years. Our goal is to tackle model assumptions that are either too relaxed for traditional methods to apply, or so complicated that elaborate algebraic tools are needed to obtain explicit solutions.

In Chapter 2, we consider a Lévy risk process and …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Is Corequisite Developmental Math Effective At East Tennessee State University?, Christine Padden Aug 2019

Electronic Theses and Dissertations

This thesis looks at the corequisite developmental math program at East Tennessee State University (ETSU) and compares its effectiveness to that of the previous developmental math program by comparing student outcomes in MATH 1530. MATH 1530 is a non-calculus-based statistics and probability course that satisfies most majors’ general education math requirements. ETSU sees approximately 1,000 students a year pass through MATH 1530, which is around 6.7% of the total enrollment at ETSU[9]. We are interested in the last five years of the developmental math program before it was changed to corequisite developmental math and the first five years of corequisite …


Mathematics Versus Statistics, Mindy B. Capaldi Jul 2019

Journal of Humanistic Mathematics

Mathematics and statistics are both important and useful subjects, but the former has maintained prominence in the American education system. On the other hand, statistics is more prevalent in daily life and is an increasingly marketable subject to know. This article gives a personal history of one mathematician’s bumpy road to learning and teaching statistics. Additionally, arguments for how and why to include statistics in the K-12 and college curricula are provided.


Some Recent Developments On Pareto-Optimal Reinsurance, Wenjun Jiang Jul 2019

Electronic Thesis and Dissertation Repository

This thesis focuses on developing Pareto-optimal reinsurance policies that consider the interests of both the insurer and the reinsurer. Optimal insurance/reinsurance design has been extensively studied in the actuarial science literature, although in early years most studies concentrated on optimizing the insurer’s interests. However, as early as the 1960s, Borch argued that “an agreement which is quite attractive to one party may not be acceptable to its counterparty” and he pioneered the study of “fair” risk sharing between the insurer and the reinsurer. Quite recently, the question of how to strike a balance in risk sharing between an insurer and …


Probabilistic Modeling Of Personalized Drug Combinations From Integrated Chemical Screen And Molecular Data In Sarcoma, Noah E. Berlow, Rishi Rikhi, Mathew Geltzeiler, Jinu Abraham, Matthew N. Svalina, Lara E. Davis, Erin Wise, Maria Mancini, Jonathan Noujaim, Atiya Mansoor, Michael J. Quist, Kevin L. Matlock, Martin W. Goros, Brian S. Hernandez, Yee C. Doung, Khin Thway, Tomohide Tsukahara, Jun Nishio, Elaine T. Huang, Susan Airhart, Carol J. Bult, Regina Gandour-Edwards, Robert G. Maki, Robin L. Jones, Joel E. Michalek, Milan Milovancev, Souparno Ghosh, Ranadip Pal, Charles Keller Jun 2019

Department of Statistics: Faculty Publications

Background: Cancer patients with advanced disease routinely exhaust available clinical regimens and lack actionable genomic medicine results, leaving a large patient population without effective treatment options when their disease inevitably progresses. To address the unmet clinical need for evidence-based therapy assignment when standard clinical approaches have failed, we have developed a probabilistic computational modeling approach which integrates molecular sequencing data with functional assay data to develop patient-specific combination cancer treatments. Methods: Tissue taken from a murine model of alveolar rhabdomyosarcoma was used to perform single agent drug screening and DNA/RNA sequencing experiments; results integrated via our computational modeling approach identified …


CS + Sociology: Using Big Data To Identify And Understand Educational Inequality In America (1), Joseph Cleary, Elin Waring Jun 2019

Open Educational Resources

This is the first of two lessons/labs for teaching and learning computer science and sociology. Either can be used on its own, or they can be used in sequence, in which case this one should be used first.

Students will develop CS skills and behaviors including but not limited to: learning what an API is, learning how to access and utilize data on an API, and developing their R coding skills and knowledge. Students will also learn basic, but important, sociological principles such as how poverty is related to educational opportunities in America. Although prior knowledge of CS and sociology …


Development Of A School Boredom Proneness Scale For Children, Taylor Carrington May 2019

Educational Specialist, 2009-2019

One common phrase heard from students is, “I’m bored.” However, there is no real understanding of what this actually means. In this study, elementary-age students were asked to respond to a newly developed School Boredom Proneness Scale (SBPS) including questions relating to a five-factor model of boredom. Students were also asked to rate how often they become bored at school and how bored they seem compared to classmates. In addition to student responses, parents and teachers were asked to rate how bored they thought the student was, and teachers were additionally asked to rate students’ level of work completion. The …


Valuation And Risk Management Of Some Longevity And P&C Insurance Products, Yixing Zhao Apr 2019

Electronic Thesis and Dissertation Repository

Numerous insurance products linked to risky assets have emerged rapidly in the last couple of decades. These products have option-embedded features and typically involve at least two risk factors, namely interest and mortality risks. The need for models to capture risk factors' behaviours accurately is enormous and critical for insurance companies. The primary objective of this thesis is to develop pricing and hedging frameworks for option-embedded longevity products addressing correlated risk factors. Various methods are employed to facilitate the computation of prices and risk measures of longevity products including those with maturity benefits. Furthermore, in order to be prepared for …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Informative Group Testing For Multiplex Assays, Christopher R. Bilder, Joshua M. Tebbs, Christopher S. McMahan Mar 2019

Department of Statistics: Faculty Publications

Infectious disease testing frequently takes advantage of two tools–group testing and multiplex assays–to make testing timely and cost effective. Until the work of Tebbs et al. (2013) and Hou et al. (2017), there was no research available to understand how best to apply these tools simultaneously. This recent work focused on applications where each individual is considered to be identical in terms of the probability of disease. However, risk-factor information, such as past behavior and presence of symptoms, is very often available on each individual to allow one to estimate individual-specific probabilities. The purpose of our paper is to propose …


Functional Random Forest With Applications In Dose-Response Predictions, Raziur Rahman, Saugato Rahman Dhruba, Souparno Ghosh, Ranadip Pal Feb 2019

Department of Statistics: Faculty Publications

Drug sensitivity prediction for individual tumors is a significant challenge in personalized medicine. Current modeling approaches consider prediction of a single metric of the drug response curve such as AUC or IC50. However, the single summary metric of a dose-response curve fails to provide the entire drug sensitivity profile which can be used to design the optimal dose for a patient. In this article, we assess the problem of predicting the complete dose-response curve based on genetic characterizations. We propose an enhancement to the popular ensemble-based Random Forests approach that can directly predict the entire functional profile of …


Health Risk Tolerance As A Key Determinant Of (Un)Willingness To Behavior Change: Conceptualization And Scale Development, Hyoyeun Jun, Yan Jin Jan 2019

International Crisis and Risk Communication Conference

Following an earlier study that tested determinants of risk tolerance affecting information sharing, this study was conducted as a second step to develop a scale for risk tolerance. First, the study used qualitative methods, including in-depth interviews and focus groups, to capture how members of the public describe situations in which they tolerate a risk even though they know the recommended behavior for relieving it. Second, the study surveyed a sample of 1,000 members of the U.S. public using a questionnaire built from the items generated in the qualitative steps.


Cost-Effective Surveillance For Infectious Diseases Through Specimen Pooling And Multiplex Assays, Christopher Bilder, Joshua Tebbs, Christopher McMahan Jan 2019

Department of Statistics: Faculty Publications

To develop specimen pooling algorithms that reduce the number of tests needed to test individuals for infectious diseases with multiplex assays.
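
To give a sense of why pooling can reduce the number of tests, the sketch below uses the classic two-stage (Dorfman) scheme for a single disease: test one pooled specimen, and retest each member individually only if the pool is positive. This is a simpler stand-in, not the multiplex algorithms developed by the authors, and it assumes a perfectly accurate assay and independent individuals with a common prevalence. The function names are hypothetical.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected tests per individual under two-stage pooling.

    Assumes a perfect assay and independent individuals who share
    the same probability of disease (prevalence).
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    p_pool_negative = (1.0 - prevalence) ** pool_size
    # One pooled test shared by the group, plus one retest per person
    # whenever the pool tests positive.
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence, max_pool_size=50):
    """Pool size in [2, max_pool_size] that minimizes the expected tests."""
    return min(
        range(2, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


tests_at_ten = expected_tests_per_person(0.01, 10)
optimal_size = best_pool_size(0.01)
```

Under these assumptions, a value below 1.0 means pooling needs fewer tests on average than testing everyone individually; the benefit shrinks as prevalence rises.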


Genomic Prediction Using Canopy Coverage Image And Genotypic Information In Soybean Via A Hybrid Model, Reka Howard, Diego Jarquin Jan 2019

Department of Statistics: Faculty Publications

Prediction techniques are important in plant breeding as they provide a tool for selection that is more efficient and economical than traditional phenotypic and pedigree-based selection. Conventional genomic prediction models use molecular marker information to predict the phenotype. With the development of new phenomics techniques, we have the opportunity to collect image data on the plants and to extend the traditional genomic prediction models by incorporating a diverse set of information collected on the plants. In our research, we developed a hybrid matrix model that incorporates molecular marker and canopy coverage information as a weighted linear combination to predict …


Post-ER Stress Biogenesis Of Golgi Is Governed By Giantin, Cole P. Frisbie, Alexander Y. Lushnikov, Alexey V. Krasnoslobodtsev, Jean-Jack Riethoven, Jennifer L. Clarke, Elena I. Stepchenkova, Armen Petrosyan Jan 2019

Department of Statistics: Faculty Publications

Background: The Golgi apparatus undergoes disorganization in response to stress, but it is able to restore compact and perinuclear structure under recovery. This self-organization mechanism is significant for cellular homeostasis, but remains mostly elusive, as does the role of giantin, the largest Golgi matrix dimeric protein. Methods: In HeLa and different prostate cancer cells, we used the model of cellular stress induced by Brefeldin A (BFA). The conformational structure of giantin was assessed by proximity ligation assay and atomic force microscopy. The post-BFA distribution of Golgi resident enzymes was examined by 3D SIM high-resolution microscopy. Results: We detected that giantin …


Recursive Model For Dose-Time Responses In Pharmacological Studies, Saugato Rahman Dhruba, Aminur Rahman, Raziur Rahman, Souparno Ghosh, Ranadip Pal Jan 2019

Department of Statistics: Faculty Publications

Background: Clinical studies often track dose-response curves of subjects over time. One can easily model the dose-response curve at each time point with the Hill equation, but such a model fails to capture the temporal evolution of the curves. On the other hand, one can use the Gompertz equation to model the temporal behavior at each dose without capturing the evolution of time curves across dosage.

Results: In this article, we propose a parametric model for dose-time responses that approximately follows the Gompertz law in time and the Hill equation across dose. We derive a recursion relation for dose-response curves over time capturing the …


Using Neural Networks To Classify Discrete Circular Probability Distributions, Madelyn Gaumer Jan 2019

HMC Senior Theses

Given the rise in the application of neural networks to all sorts of interesting problems, it seems natural to apply them to statistical tests. This senior thesis studies whether neural networks built to classify discrete circular probability distributions can outperform a class of well-known statistical tests for uniformity for discrete circular data that includes the Rayleigh test, the Watson test, and the Ajne test. Each neural network used is relatively small with no more than 3 layers: an input layer taking in discrete data sets on a circle, a hidden layer, and an output …
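
For readers unfamiliar with the baseline tests named in this abstract, the sketch below shows one common form of the Rayleigh test for circular uniformity: compute the mean resultant length of the angles, form the statistic Z = n·R̄², and use exp(−Z) as a first-order large-sample approximation to the p-value. The function name and the sample data are assumptions for illustration, and the thesis may use a different variant or a correction suited to discrete data.

```python
import math


def rayleigh_test(angles):
    """Rayleigh test of uniformity for angles given in radians.

    Returns (mean_resultant_length, z_statistic, approximate_p_value).
    The p-value uses the first-order approximation exp(-z), which is
    only reasonable for moderately large samples.
    """
    n = len(angles)
    if n == 0:
        raise ValueError("at least one angle is required")
    mean_cos = sum(math.cos(a) for a in angles) / n
    mean_sin = sum(math.sin(a) for a in angles) / n
    r_bar = math.hypot(mean_cos, mean_sin)
    z = n * r_bar ** 2
    return r_bar, z, math.exp(-z)


# Four equally spaced directions: no preferred direction, so R-bar is near 0.
spread = rayleigh_test([0.0, math.pi / 2, math.pi, 3 * math.pi / 2])

# Twenty identical directions: maximally concentrated, so R-bar is near 1.
concentrated = rayleigh_test([0.1] * 20)
```

A small approximate p-value suggests the angles are concentrated around one direction rather than spread uniformly around the circle.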


The Dark Sky Character Of Archaeological Landscapes: Cultural Meaning And Conservation Strategies, Frank Prendergast Jan 2019

Book/Book Chapter

This paper presents the first ever study of light pollution at selected Irish prehistoric archaeological landscapes. The concepts of cosmology and landscape are first briefly described and followed by a summary of early human settlement of the island. Building on this, the extant corpus of early prehistoric megalithic burial tombs is illustrated to show their contrasting distribution patterns and typology. Analysis of tomb locations using nearest-neighbour statistical methods reveals evidence of intentional clustering. Further geo-statistical analysis identifies the geographical locations and the density ranking of these nucleated clusters - a feature especially evident in the passage tomb tradition on this …


Design Of Experiment And Analysis Techniques For Fuel Consumption Data Using Heavy-Duty Diesel Vehicles And On-Road Testing, Sarah Ann Mills Jan 2019

Graduate Theses, Dissertations, and Problem Reports

Chassis dynamometer and on-road testing are usually employed to test vehicle operation. Testing on a chassis dynamometer reduces data variability compared to on-road testing due to the controlled environment, but it does not account for other important variables that affect real-world vehicle operation. This study used on-road testing to investigate the differences between two test fuels under real-world conditions. Three heavy-duty diesel vehicles were driven on different routes for a period of three months. Each vehicle was instrumented with flow meters to gather fuel consumption data, which was then compared to the fuel rate broadcast by the engine control unit …


Global Warming Statistical Analysis, Jared Skinner Jan 2019

Williams Honors College, Honors Research Projects

This paper will investigate global warming and its effects on natural disasters. I will review the historic movements of climate change and activism, as well as the current discussions surrounding global warming. Secondly, I will examine various datasets, paying attention to the severity and frequency of specific natural disasters. I will then touch briefly on the topic of catastrophe modeling as it relates to the increased risk and losses associated with the discussed natural disasters and how those put the problem of global warming in a framework which financial and government institutions can grasp. I will also be analyzing economic …


Regression Tree Construction For Reinforcement Learning Problems With A General Action Space, Anthony S. Bush Jr Jan 2019

Electronic Theses and Dissertations

Part of the implementation of Reinforcement Learning is constructing a regression of values against states and actions and using that regression model to optimize over actions for a given state. One such common regression technique is that of a decision tree; or in the case of continuous input, a regression tree. In such a case, we fix the states and optimize over actions; however, standard regression trees do not easily optimize over a subset of the input variables\cite{Card1993}. The technique we propose in this thesis is a hybrid of regression trees and kernel regression. First, a regression tree splits over …