Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

1,250 Full-Text Articles 2,102 Authors 352,857 Downloads 123 Institutions

All Articles in Statistical Models

1,250 full-text articles. Page 1 of 43.

Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda 2020 Southern Methodist University

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes when the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible ...


Statistical Inference Of Adaptation At Multiple Genomic Scales Using Supervised Classification And A Hidden Markov Model, Lauren A. Sugden 2020 Duquesne University

Biology and Medicine Through Mathematics Conference

No abstract provided.


Decision Tree For Predicting The Party Of Legislators, Afsana Mimi 2020 CUNY New York City College of Technology

Publications and Research

The motivation for this project is to identify legislators who frequently voted against their party on roll call votes, using the Office of the Clerk, U.S. House of Representatives data sets collected in 2018 and 2019. We construct a model to predict the parties of legislators based on their votes. The method we used is a decision tree, a data mining technique. Python was used to collect raw data from the internet, SAS was used to clean the data, and all other calculations and graphical presentations were performed in R.


Ragweed And Sagebrush Pollen Can Distinguish Between Vegetation Types At Broad Spatial Scales, Hannah M. Carroll, Alan D. Wanamaker, Lynn G. Clark, Brian J. Wilsey 2020 Iowa State University

Ecology, Evolution and Organismal Biology Publications

Patterns of vegetation distribution at regional to subcontinental scales can inform understanding of climate. Delineating ecoregion boundaries over geologic time is complicated by the difficulty of distinguishing between prairie types at broad spatial scales using the pollen record. Pollen ratios are sometimes employed to distinguish between vegetation types, although their applicability is often limited to a geographic range. The Neotoma Paleoecology Database offers an unparalleled opportunity to synthesize a large number of pollen datasets. Ambrosia (ragweed) is a genus of mesic‐adapted species sensitive to summer moisture. Artemisia (sagebrush, wormwood, mugwort) is a genus of dry‐mesic‐adapted species resilient ...


Modeling Movement: A Machine-Learning Approach To Track Migration Routes After Displacement, Ethan Harrison 2020 William & Mary

Undergraduate Honors Theses

Over the past decade, the number of individuals internally displaced by conflict (IDPs) has reached unprecedented levels. Humanitarian actors and first responders face persistent information gaps in meeting the needs of these populations. Specifically, they face challenges in understanding where and how IDPs move after they are displaced, which is necessary to locate them in conflict-affected situations and provide them with life-saving assistance. In this paper, I propose a framework, using established machine-learning methods, to forecast the migration routes of these displaced populations (Chapter 1). In a case study of displacement in Yemen, my models predict 80% of IDPs' migration routes ...


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen 2020 Utah State University

All Graduate Theses and Dissertations

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades because many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting on unseen data. Since finance and ...
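The abstract does not name the decision rules it builds on. One widely used example in this literature is the tick rule, which signs a trade as a buy when its price is above the last different price and as a sell when it is below. The minimal sketch below assumes only a list of trade prices; `tick_rule` is a hypothetical helper, not code from the thesis.

```python
from typing import List, Optional


def tick_rule(prices: List[float]) -> List[Optional[int]]:
    """Sign trades with the tick rule: +1 = buy, -1 = sell, None = unknown.

    A trade above the previous price is an uptick (buy) and one below it is a
    downtick (sell). A trade at the same price as the previous one (zero tick)
    inherits the most recent nonzero sign, which is equivalent to comparing
    against the last *different* price.
    """
    signs: List[Optional[int]] = []
    last_sign: Optional[int] = None  # no sign is known before the first tick
    for i, price in enumerate(prices):
        if i > 0 and price > prices[i - 1]:
            last_sign = 1
        elif i > 0 and price < prices[i - 1]:
            last_sign = -1
        signs.append(last_sign)
    return signs


print(tick_rule([10.0, 10.1, 10.1, 10.0, 10.0, 10.2]))
# [None, 1, 1, -1, -1, 1]
```

The first trade stays `None` because there is no earlier price to compare against.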


Predicting Disease Progression Using Deep Recurrent Neural Networks And Longitudinal Electronic Health Record Data, Seunghwan Kim 2020 Washington University in St. Louis

Engineering and Applied Science Theses & Dissertations

Electronic Health Records (EHR) are widely adopted throughout healthcare systems and collect and store longitudinal data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. Temporality is inherent in EHR data; however, traditional classification models are limited in this context by the cross-sectional nature of their training and prediction processes. Finding temporal patterns in EHR is ...


484— Modeling Social Distancing Methods And Their Effectiveness In Combating The Spread Of Ebola, Rachel Fair 2020 SUNY Geneseo

GREAT Day

Ebola Virus Disease (EVD) is a rare but severe disease that is transmitted among humans through direct contact with, and close proximity to, infected bodily fluids. From 2014 to 2016, West Africa experienced the largest Ebola outbreak ever recorded, which infected over 28,000 people and killed over 11,000. Although the symptoms of EVD are treatable, the disease can be extremely deadly, with an average of 50% of EVD cases resulting in fatality. In areas where healthcare is scarce and vaccinations are not readily available, the practices of social distancing and self-quarantining have been shown to be highly effective in combating the spread of ...
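The abstract is truncated before the model is described. As a hedged illustration only, a basic SIR model can represent social distancing as a proportional cut in the transmission rate. The `distancing` parameter, the parameter values, and the forward-Euler integration are assumptions; published Ebola models often add compartments (for example, exposed individuals or funeral transmission) that this sketch omits.

```python
def sir_attack_rate(beta: float, gamma: float, distancing: float,
                    i0: float = 1e-4, days: int = 365, dt: float = 0.1) -> float:
    """Fraction of the population ever infected in a basic SIR model.

    `distancing` (0 to 1) is a hypothetical parameter that scales the
    transmission rate: beta_eff = beta * (1 - distancing).
    The ODEs are integrated with forward Euler steps of size dt (in days).
    """
    s, i = 1.0 - i0, i0
    beta_eff = beta * (1.0 - distancing)
    for _ in range(round(days / dt)):
        new_infections = beta_eff * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
    # S started at 1 - i0, so 1 - s counts the initial cases plus everyone
    # who has been infected since.
    return 1.0 - s


# Illustrative values: beta = 0.3/day, gamma = 0.1/day (basic reproduction number 3).
for d in (0.0, 0.3, 0.7):
    print(d, round(sir_attack_rate(0.3, 0.1, d), 3))
```

Once distancing pushes beta_eff / gamma below 1, the outbreak dies out instead of spreading through most of the population.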


483— Effectiveness Of MMR Vaccination In Orthodox Jewish Neighborhoods, Meenu Mundackal 2020 SUNY Geneseo

GREAT Day

Measles is a highly contagious disease in which large outbreaks arise through direct contact between susceptible (unvaccinated) and infectious individuals. Many Orthodox Jewish neighborhoods were affected by measles in 2018-2019. To quantify the vaccination effort in this susceptible population, a retrospective analysis of the NYC and Rockland County populations was carried out using a differential equations model. A subsequent model, known as a realistically structured network model, studied only the NYC population in relation to typical household size. Vaccination strategies were applied to three cohorts: unvaccinated family members, members with 1 prior MMR dose, and members with 2 prior MMR doses. The ...


Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach, Tanvi Arora, Rajat Chandna, Stacy Conant, Bivin Sadler, Robert Slater 2020 Southern Methodist University

SMU Data Science Review

In this paper, historical data from a wholesale alcoholic beverage distributor were used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasting can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is a challenging task due to the many unknowns that can impact sales, such as the weather and the state of the economy. While many studies focus on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor's perspective. An ensemble approach was applied ...


An Analysis Of Dredge Efficiency For Surfclam And Ocean Quahog Commercial Dredges, Leanne Poussard 2020 The University of Southern Mississippi

Master's Theses

Between 1997 and 2011, the National Marine Fisheries Service conducted 50 depletion experiments to estimate survey gear efficiency and stock density for Atlantic surfclam (Spisula solidissima) and ocean quahog (Arctica islandica) populations using commercial hydraulic dredges. The Patch Model was formulated to estimate gear efficiency and organism density from the data. The range of estimated efficiencies is substantial, leading to uncertainty in the application of these estimates in stock assessment. Analysis of depletion experiment simulations showed that uncertainty in the estimates of gear efficiency from depletion experiments was reduced by higher numbers of dredge tows per experiment, more tow overlap ...


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice 2020 University of South Carolina

Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from the player's recruiting profile. The response variable is binary: drafted (1) or undrafted (0). The independent variables, including height, weight, position, hometown, recruiting grade, and other socioeconomic factors based on the player's high school, were collected by scraping recruiting websites. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees ...


Population Modeling Of Tumor Growth Curves And The Reduced Gompertz Model Improve Prediction Of The Age Of Experimental Tumors, Cristina Vaghi, Anne Rodallec, Raphaelle Fanciullin, Joseph Ciccolini, Jonathan P. Mochel, Michalis Mastri, Clair Poignard, John M. L. Ebos, Sebastien Benzekry 2020 Inria Bordeaux Sud-Ouest

Biomedical Sciences Publications

Tumor growth curves are classically modeled by means of ordinary differential equations. In analyzing the Gompertz model, several studies have reported a striking correlation between the two parameters of the model, which could be used to reduce the dimensionality and improve predictive power. We analyzed tumor growth kinetics within the statistical framework of nonlinear mixed-effects modeling (the population approach). This allowed the simultaneous modeling of tumor dynamics and inter-animal variability. Experimental data comprised three animal models of breast and lung cancers, with 833 measurements in 94 animals. Candidate models of tumor growth included the exponential, logistic and Gompertz models. The exponential and ...
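For reference, one common parametrization of the Gompertz model is dV/dt = (alpha - beta ln(V/V0)) V with V(0) = V0, whose closed-form solution is V(t) = V0 exp((alpha/beta)(1 - exp(-beta t))). The sketch below compares that closed form with a crude numerical integration. The parametrization and parameter values are assumptions, and the paper's reduced Gompertz model (which ties the two parameters together) is not reproduced here.

```python
import math


def gompertz_volume(t: float, v0: float, alpha: float, beta: float) -> float:
    """Closed-form Gompertz curve V(t) = V0 * exp((alpha / beta) * (1 - exp(-beta * t))).

    Solves dV/dt = (alpha - beta * ln(V / V0)) * V with V(0) = V0; the volume
    approaches the plateau V0 * exp(alpha / beta) as t grows.
    """
    return v0 * math.exp((alpha / beta) * (1.0 - math.exp(-beta * t)))


def gompertz_euler(t_end: float, v0: float, alpha: float, beta: float,
                   dt: float = 1e-3) -> float:
    """Integrate the same ODE with forward Euler steps, as a cross-check."""
    v = v0
    for _ in range(round(t_end / dt)):
        v += (alpha - beta * math.log(v / v0)) * v * dt
    return v


# Illustrative parameter values only (time in days, volume in arbitrary units).
for t in (0.0, 10.0, 50.0):
    print(t, gompertz_volume(t, 1.0, 0.5, 0.1), gompertz_euler(t, 1.0, 0.5, 0.1))
```

The growth rate alpha * exp(-beta t) decays over time, which is what produces the characteristic Gompertz plateau.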


Do Workers Discriminate Against Their Out-Group Employers? Evidence From The Gig Economy, Sher Afghan Asad, Ritwik Banerjee, Joydeep Bhattacharya 2020 Iowa State University

Economics Working Papers

We study possible worker-to-employer discrimination manifested via social preferences in an online labor market. Specifically, we ask, do workers exhibit positive social preferences for an out-race employer relative to an otherwise-identical, own-race one? We run a well-powered, model-based experiment wherein we recruit 6,000 workers from Amazon’s M-Turk platform for a real-effort task and randomly (and unobtrusively) reveal to them the racial identity of their non-fictitious employer. Strikingly, we find strong evidence of race-based altruism – white workers, even when they do not benefit personally, work relatively harder to generate more income for black employers. Self-declared white Republicans and Independents ...


Measuring Localization Confidence For Quantifying Accuracy And Heterogeneity In Single-Molecule Super-Resolution Microscopy, Hesam Mazidi, Tianben Ding, Arye Nehorai, Matthew D. Lew 2020 Washington University in St. Louis

Electrical & Systems Engineering Publications and Presentations

We present a computational method, termed Wasserstein-induced flux (WIF), to robustly quantify the accuracy of individual localizations within a single-molecule localization microscopy (SMLM) dataset without ground-truth knowledge of the sample. WIF relies on the observation that accurate localizations are stable with respect to an arbitrary computational perturbation. Inspired by optimal transport theory, we measure the stability of individual localizations and develop an efficient optimization algorithm to compute WIF. We demonstrate the advantage of WIF in accurately quantifying imaging artifacts in high-density reconstruction of a tubulin network. WIF represents an advance in quantifying systematic errors with unknown and complex distributions ...


A Two-Stage Hybrid Model By Using Artificial Neural Networks As Feature Construction Algorithms, Yan Wang, Sherry Ni, Brian Stone 2020 Kennesaw State University

Grey Literature from PhD Candidates

We propose a two-stage hybrid approach that uses neural networks as feature construction algorithms for bankcard response classification. In the first stage, the hybrid model uses a very simple neural network structure to construct new features; in the second stage, the newly created features are used as additional input variables in logistic regression. The model is compared with the traditional one-stage model in credit customer response classification. The proposed two-stage model is observed to outperform the one-stage model in terms of accuracy, the area under the ROC curve, and the KS statistic. By creating new ...
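The KS statistic used to compare these models is commonly defined as the largest gap between the empirical score distributions of the two classes, which equals the maximum of TPR - FPR over all score thresholds. A minimal sketch under that definition follows; `ks_statistic` is a hypothetical helper, and the abstract does not say how the authors computed it.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of negatives (0) and positives (1).

    Equivalent to the maximum over thresholds t of |TPR(t) - FPR(t)| when
    scores above t are predicted positive.
    """
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # scores, so checking every distinct score is enough.
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_neg - cdf_pos))
    return best


print(ks_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.5
```

A value of 1 means the scores separate the classes perfectly; 0 means the two score distributions are indistinguishable.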


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone 2020 Kennesaw State University

Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its almost always desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model that integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. In the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model ...
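CHAID chooses splits using chi-square tests of independence between a predictor's categories and the response. The sketch below computes only the Pearson chi-square statistic for one contingency table. It is not the full CHAID procedure (category merging, p-value adjustment, recursive splitting), and the example counts are hypothetical.

```python
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table.

    Rows are categories of a predictor and columns are response classes. Each
    expected count is row_total * column_total / grand_total under
    independence; assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: responders / non-responders in three income bands.
print(pearson_chi_square([[30, 70], [45, 55], [60, 40]]))
```

Larger values indicate a stronger association between the predictor and the response; CHAID compares such statistics (via p-values) across candidate splits.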


A XGBoost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni 2020 Kennesaw State University

Grey Literature from PhD Candidates

This paper explores models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are considered simultaneously during model training. The five most commonly used FS methods, namely weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and the Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on model performance are investigated by the Wilcoxon Signed Rank ...


Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni 2020 Kennesaw State University

Grey Literature from PhD Candidates

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. The area under the receiver operating characteristic curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods, random oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT), are implemented. Two ensembling techniques, including Bagging and Boosting, are ...
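Of the four resampling methods listed, SMOTE is the least obvious: it creates synthetic minority examples by interpolating between a minority point and one of its k nearest minority-class neighbours. The simplified sketch below (random base points, brute-force neighbour search) illustrates the idea; `smote` is a hypothetical helper, not the implementation used in the paper, which the abstract does not specify.

```python
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority-class points by SMOTE-style interpolation.

    Simplified: each synthetic point starts from a randomly chosen minority
    point, picks one of its k nearest minority neighbours (brute-force squared
    Euclidean distance), and moves a random fraction of the way towards it.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [j for j in range(len(minority)) if j != base_idx]
        neighbours = sorted(others, key=lambda j: sq_dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
for point in smote(minority, n_new=3, k=2):
    print(point)
```

Because each synthetic point lies on a segment between two minority points, SMOTE adds new points inside the minority region rather than duplicating existing rows, which is how it differs from random oversampling (ROS).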


Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, LaRoux Wallace 2020 Southern Methodist University

SMU Data Science Review

In this paper, we present a quantitative approach to modeling the manufacturer's suggested retail price (MSRP) for children's dollhouses and establish relationships among the key features that contribute most to establishing MSRP. Determining the MSRP is critical to how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time-consuming, requiring specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP ...


Digital Commons powered by bepress