Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

2020

Articles 31 - 60 of 84

Full-Text Articles in Statistical Models

Constitutive Model Of Lateral Unloading Creep Of Soft Soil Under Excess Pore Water Pressure, Wei Huang, Kejun Wen, Xiaojia Deng, Junjie Li, Zhijian Jiang, Yang Li, Lin Li, Farshad Amini Jun 2020

Civil and Architectural Engineering Faculty Research

Presented in this paper is a study on the lateral unloading creep tests under different excess pore water pressures. The marine sedimentary soft soil in Shenzhen, China, was selected in this study. The results show that the excess pore water pressure plays a significant role in enhancing the unloading creep of soft soil. Higher excess pore water pressure brings more obvious creep deformation of soft soil and lower ultimate failure load. Meanwhile, the viscoelastic and the viscoplastic modulus of soft soil were found to exponentially decline with creep time. A modified merchant model and a combined model of the modified …


Edge-Cloud IoT Data Analytics: Intelligence At The Edge With Deep Learning, Ananda Mohon M. Ghosh May 2020

Electronic Thesis and Dissertation Repository

Rapid growth in numbers of connected devices, including sensors, mobile, wearable, and other Internet of Things (IoT) devices, is creating an explosion of data that are moving across the network. To carry out machine learning (ML), IoT data are typically transferred to the cloud or another centralized system for storage and processing; however, this causes latencies and increases network traffic. Edge computing has the potential to remedy those issues by moving computation closer to the network edge and data sources. On the other hand, edge computing is limited in terms of computational power and thus is not well suited for …


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …


Statistical Inference Of Adaptation At Multiple Genomic Scales Using Supervised Classification And A Hidden Markov Model, Lauren A. Sugden May 2020

Biology and Medicine Through Mathematics Conference

No abstract provided.


The Primary Volatile Composition Of Comet C/2015 ER61 (PANSTARRS), Aaron Butler May 2020

Theses

In the outer edges of the solar system exist two regions: the Kuiper belt and the Oort cloud. These two regions contain large numbers of icy bodies (comets) orbiting the Sun. Comets located within the Oort cloud and Kuiper belt contain an ancient codex to the solar system's contents, before the formation of our solar system. Presented are near-infrared, high-resolution (λ/Δλ ~40000) data obtained from the immersion-grating echelle spectrograph iSHELL at the 3m NASA Infrared Telescope Facility (IRTF) on Maunakea, Hawaii, of the Oort cloud comet C/2015 ER61 (PANSTARRS). Observations took place on April 15 and 17 in 2017 while …


Predictive Modeling Of Asynchronous Event Sequence Data, Jin Shang May 2020

LSU Doctoral Dissertations

Large volumes of temporal event data, such as online check-ins and electronic records of hospital admissions, are becoming increasingly available in a wide variety of applications including healthcare analytics, smart cities, and social network analysis. Those temporal events are often asynchronous and interdependent, and exhibit self-exciting properties. For example, in a patient's diagnosis events, an elevated risk exists for a patient who has recently been at risk. Machine learning that leverages event sequence data can improve the prediction accuracy of future events and provide valuable services. For example, in e-commerce and network traffic diagnosis, the analysis of user activities can be …


Modeling Species Distribution And Habitat Suitability Of American Ginseng (Panax quinquefolius) In Virginia, Jacob D. J. Peters May 2020

Masters Theses, 2020-current

American ginseng (Panax quinquefolius) is a well-known and sought-after medicinal plant native to North America that is facing increased threat of extinction due to overharvesting, herbivory, and habitat loss. Species distribution and habitat suitability models may be valuable to landowners interested in sustainable harvest or to institutions interested in the conservation and restoration of the species. With unequal sampling efforts across a region of interest, it is likely that some locations with appropriate habitat may be misrepresented in model predictions. This study refined a state-derived species distribution model for ginseng through increased sampling effort across the Cumberland Plateau …


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting on unseen data. Since finance and economics …


Analyzing Competitive Balance In Professional Sport, Kevin Alwell May 2020

Honors Scholar Theses

In this paper we review several measures to statistically analyze competitive balance and report which leagues have a wider variance of performance among their competitors. Each league seeks to maintain high levels of parity, making matches and the overall season more unpredictable and appealing to the general audience. Here we quantify competitive advantage across major sports leagues using several statistical methods so that leagues can optimize their revenue.
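
One widely used way to put a number on parity is the ratio of the observed spread in win percentages to the spread expected if every game were a coin flip (often called the Noll-Scully ratio). The sketch below is an illustration only: the abstract does not say which measures the paper reviews, and the function name and sample records are hypothetical.

```python
import math
import statistics


def noll_scully_ratio(win_pcts, games_per_team):
    """Observed SD of team win percentages divided by the idealized SD.

    The idealized SD, 0.5 / sqrt(games_per_team), is the spread expected
    when every game is an independent 50/50 contest. Values near 1 suggest
    a balanced league; larger values suggest less parity.
    """
    if games_per_team <= 0:
        raise ValueError("games_per_team must be positive")
    observed_sd = statistics.pstdev(win_pcts)  # population SD across teams
    ideal_sd = 0.5 / math.sqrt(games_per_team)
    return observed_sd / ideal_sd


# Hypothetical two-team league playing 16 games each.
example_ratio = noll_scully_ratio([0.75, 0.25], games_per_team=16)
```

Because the idealized spread shrinks as the season gets longer, the ratio allows a rough comparison between leagues that play different numbers of games.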


An Analysis Of Dredge Efficiency For Surfclam And Ocean Quahog Commercial Dredges, Leanne Poussard May 2020

Master's Theses

Between 1997 and 2011, the National Marine Fisheries Service conducted 50 depletion experiments to estimate survey gear efficiency and stock density for Atlantic surfclam (Spisula solidissima) and ocean quahog (Arctica islandica) populations using commercial hydraulic dredges. The Patch Model was formulated to estimate gear efficiency and organism density from the data. The range of efficiencies estimated is substantial, leading to uncertainty in the application of these estimates in stock assessment. Analysis of depletion experiment simulations showed that uncertainty in the estimates of gear efficiency from depletion experiments was reduced by higher numbers of dredge tows per experiment, more tow overlap …


Predicting Disease Progression Using Deep Recurrent Neural Networks And Longitudinal Electronic Health Record Data, Seunghwan Kim May 2020

McKelvey School of Engineering Theses & Dissertations

Electronic Health Records (EHR) are widely adopted and used throughout healthcare systems and are able to collect and store longitudinal information data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. Temporality in EHR is innately present given the nature of these data, however, and traditional classification models are limited in this context by the cross-sectional nature of training and prediction processes. Finding temporal patterns in EHR is …


Point Process Modelling Of Objects In The Star Formation Complexes Of The M33 Galaxy, Dayi Li Apr 2020

Electronic Thesis and Dissertation Repository

In this thesis, Gibbs point process (GPP) models are constructed to study the spatial distribution of objects in the star formation complexes of the M33 galaxy. The GPP models circumvent the limitations of the two-point correlation function employed in the current astronomy literature by naturally accounting for the inhomogeneous distribution of these objects. The spatial distribution of these objects serves as a sensitive probe in understanding the star formation process, which is crucial in understanding the formation of galaxies and the Universe. The objects under study include the CO filament structure, giant molecular clouds (GMCs) and young stellar cluster candidates …


483— Effectiveness Of MMR Vaccination In Orthodox Jewish Neighborhoods, Meenu Mundackal Apr 2020

GREAT Day Posters

Measles is a highly contagious disease, where large outbreaks arise by direct contact between susceptible (unvaccinated) and infectious individuals. Many Orthodox Jewish neighborhoods were affected by measles from 2018-2019. To quantify the vaccination effort on this susceptible population, a retrospective analysis was used to study the NYC and Rockland County populations using a differential equations model. A subsequent model, known as a realistically-structured network model, studied only the NYC population, in relation to typical household size. Vaccination strategies were applied to three cohorts: unvaccinated family members, members with 1 prior MMR dose, and members with 2 prior MMR doses. The …


484— Modeling Social Distancing Methods And Their Effectiveness In Combating The Spread Of Ebola, Rachel Fair Apr 2020

GREAT Day Posters

Ebola Virus Disease (EVD) is a rare but severe disease that is transmitted among humans through direct contact with, and close proximity to, infected bodily fluids. From 2014-16, West Africa experienced the largest Ebola outbreak ever recorded, infecting over 28,000 people, and killing over 11,000. Although the symptoms of EVD are treatable, the disease can be extremely deadly, with an average of 50% of EVD cases resulting in fatality. In areas where healthcare is scarce and vaccinations are not readily available, the practices of social distancing and self-quarantining have been shown to be highly effective in combating the spread of EVD. To …


Demand Forecasting In Wholesale Alcohol Distribution: An Ensemble Approach, Tanvi Arora, Rajat Chandna, Stacy Conant, Bivin Sadler, Robert Slater Apr 2020

SMU Data Science Review

In this paper, historical data from a wholesale alcoholic beverage distributor was used to forecast sales demand. Demand forecasting is a vital part of the sale and distribution of many goods. Accurate forecasting can be used to optimize inventory, improve cash flow, and enhance customer service. However, demand forecasting is a challenging task due to the many unknowns that can impact sales, such as the weather and the state of the economy. While many studies focus effort on modeling consumer demand and endpoint retail sales, this study focused on demand forecasting from the distributor perspective. An ensemble approach was applied …


Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020

Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …


Interdependence Across Foreign Exchange Rate Markets- A Mixed Copula Approach, Richard Adjei-Boateng Apr 2020

Masters Theses & Specialist Projects

The purpose of this thesis is to study the dependence structure of exchange rate pairs using a mixture of copula as opposed to a single copula approach. Mixed copula models have the ability to generate dependence structures that do not belong to existing copula families. The flexibility in choosing component copulas in this mixture model aids the construction of a system that is simultaneously parsimonious and flexible enough to generate most dependence patterns in exchange rate data. Furthermore, the method of mixture copulas facilitates the separation of both the structure and degree of dependence, concepts that are respectively embodied in …


A Monte Carlo Analysis Of Standard Error-Based Methods For Computing Confidence Intervals, Elayna Wichert Apr 2020

Masters Theses & Specialist Projects

The objective of this study is to empirically test existing techniques to calculate the likely range of values for a Classical Test Theory true score given an observed score. The traditional method for forming these confidence intervals has used the standard error of measurement (SEM) as the basis for this confidence interval. An alternate equation, the standard error of estimate (SEE), has been recommended in place of the SEM for this purpose, yet it remains overlooked in the field of psychometrics. It is important that the correct equation be used in various applications in personnel psychology. Monte Carlo analyses were …
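
The two interval types contrasted above can be sketched with the standard Classical Test Theory formulas: SEM = SD × sqrt(1 − r) centered on the observed score, and SEE = SD × sqrt(r × (1 − r)) centered on the regression-estimated true score. This is a minimal illustration under those textbook definitions, not the simulation design used in the thesis; the function names and the example test statistics are hypothetical.

```python
import math


def sem_interval(observed, sd, reliability, z=1.96):
    """Interval centered on the observed score, using the SEM."""
    sem = sd * math.sqrt(1.0 - reliability)
    return observed - z * sem, observed + z * sem


def see_interval(observed, mean, sd, reliability, z=1.96):
    """Interval centered on the estimated true score, using the SEE."""
    # Regress the observed score toward the group mean.
    estimated_true = mean + reliability * (observed - mean)
    see = sd * math.sqrt(reliability * (1.0 - reliability))
    return estimated_true - z * see, estimated_true + z * see


# Hypothetical test: mean 100, SD 15, reliability 0.90, observed score 120.
sem_low, sem_high = sem_interval(120, sd=15, reliability=0.90)
see_low, see_high = see_interval(120, mean=100, sd=15, reliability=0.90)
```

With these numbers the SEE interval is both narrower and shifted toward the mean, which is the practical difference between the two equations.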


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020

Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables were collected by scraping data from the recruiting websites including height, weight, position, hometown, recruiting grade and other socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020

Doctoral Dissertations

Population level mortality data is often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Development Of Gaussian Learning Algorithms For Early Detection Of Alzheimer's Disease, Chen Fang Mar 2020

FIU Electronic Theses and Dissertations

Alzheimer’s disease (AD) is the most common form of dementia, affecting 10% of the population over the age of 65, and the growing costs of managing AD are estimated to be $259 billion, according to data reported in 2017 by the Alzheimer's Association. Moreover, with cognitive decline, the daily lives of the affected persons and their families are severely impacted. Taking advantage of the diagnosis of AD and its prodromal stage of mild cognitive impairment (MCI), an early treatment may help patients preserve the quality of life and slow the progression of the disease, even though the underlying disease cannot …


Inferences For Weibull-Gamma Distribution In Presence Of Partially Accelerated Life Test, Mahmoud Mansour, M A W Mahmoud Prof., Rashad El-Sagheer Mar 2020

Basic Science Engineering

In this paper, the aim is to consider point and interval estimation for the parameters of the Weibull-Gamma distribution (WGD) using a progressively Type-II censored (PROG-II-C) sample under the step-stress partially accelerated life test (SSPALT) model. The maximum likelihood (ML), Bayes, and four parametric bootstrap methods are used to obtain the point estimates for the distribution parameters and the acceleration factor. Furthermore, the approximate confidence intervals (ACIs), four bootstrap confidence intervals and credible intervals of the estimators have been obtained. The results of Bayes estimators are computed under the squared error loss (SEL) function using Markov Chain Monte Carlo (MCMC) …


Measuring Localization Confidence For Quantifying Accuracy And Heterogeneity In Single-Molecule Super-Resolution Microscopy, Hesam Mazidi, Tianben Ding, Arye Nehorai, Matthew D. Lew Feb 2020

Electrical & Systems Engineering Publications and Presentations

We present a computational method, termed Wasserstein-induced flux (WIF), to robustly quantify the accuracy of individual localizations within a single-molecule localization microscopy (SMLM) dataset without ground-truth knowledge of the sample. WIF relies on the observation that accurate localizations are stable with respect to an arbitrary computational perturbation. Inspired by optimal transport theory, we measure the stability of individual localizations and develop an efficient optimization algorithm to compute WIF. We demonstrate the advantage of WIF in accurately quantifying imaging artifacts in high-density reconstruction of a tubulin network. WIF represents an advance in quantifying systematic errors with unknown and complex distributions, …


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its almost always desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …


A Two-Stage Hybrid Model By Using Artificial Neural Networks As Feature Construction Algorithms, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

We propose a two-stage hybrid approach with neural networks as the new feature construction algorithms for bankcard response classifications. The hybrid model uses a very simple neural network structure as the new feature construction tool in the first stage, then the newly created features are used as the additional input variables in logistic regression in the second stage. The model is compared with the traditional one-stage model in credit customer response classification. It is observed that the proposed two-stage model outperforms the one-stage model in terms of accuracy, the area under the ROC curve, and KS statistic. By creating new …


Predicting Class-Imbalanced Business Risk Using Resampling, Regularization, And Model Ensembling Algorithms, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

We aim at developing and improving the imbalanced business risk modeling via jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies including random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods including random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT) are implemented. Two ensembling techniques, including Bagging and Boosting, are …


A XGBoost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank …


Quantitative Model For Setting Manufacturer's Suggested Retail Price, Peter Byrd, Jonathan Knowles, Dmitry Andreev, Jacob Turner, Brian Mente, Laroux Wallace Jan 2020

SMU Data Science Review

In this paper, we present a quantitative approach to model the manufacturer’s suggested retail price (MSRP) for children’s dollhouses and establish relationships among key features that contribute most to establishing MSRP. Determination of the MSRP is a critical step in how consumers respond with their wallets when purchasing an item. KidKraft, a global leader in toys and juvenile products, sets MSRP subjectively using product experts. The process is arduous and time-consuming, requiring the focus of specialized resources and knowledge of the interaction between key attributes and their impact on consumer value. An accurate prediction of MSRP during the …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


Power Analysis On A Pilot Study Of The Caloric Intake Of Children Helping Prepare Meals Versus Children Not, Danielle Clifford Jan 2020

Student Research Poster Presentations 2020

The purpose of this analysis is to determine the sample size needed for a study that will be used to discover if there is a difference in the caloric intake of children who help with meal preparation and children who do not help with meal preparation.
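
A sample-size calculation of this kind can be sketched with the usual normal approximation for comparing two group means: n per group = 2 × ((z for 1 − α/2) + (z for the target power))² / d², where d is the standardized difference in mean caloric intake. The sketch below assumes a two-sided test with equal group sizes; the effect sizes are hypothetical and are not taken from the pilot study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation.

    effect_size is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for target power
    return math.ceil(2.0 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical medium and large standardized differences.
n_medium = n_per_group(0.5)
n_large = n_per_group(0.8)
```

An exact calculation based on the t distribution typically asks for one or two more participants per group than this approximation.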