Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology

2018

Institution
Keyword
Publication
Publication Type

Articles 1 - 26 of 26

Full-Text Articles in Applied Statistics

Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma Nov 2018

Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma

Electronic Thesis and Dissertation Repository

When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a …


Resource Assessment Report Temperate Demersal Elasmobranch Resource Of Western Australia, Matias Braccini, Nick Blay, S. A. Hesp, Brett Molony Nov 2018

Resource Assessment Report Temperate Demersal Elasmobranch Resource Of Western Australia, Matias Braccini, Nick Blay, S. A. Hesp, Brett Molony

Fisheries research reports

This document provides a cumulative description and assessment of the TDER and all of the fishing activities (i.e. fisheries / fishing sectors) affecting this resource in WA. Future Resource Assessment Reports will assess the Statewide Sharks and Rays Resource. The report is focused on the temperate indicator species (whiskery, gummy, dusky and sandbar sharks) used to assess the suites of demersal sharks and rays that comprise this resource. These species are primarily captured by demersal gillnets used in the TDGDLF that operate in the West Coast and South Coast Bioregions. For the North Coast bioregion, no commercial fishing for sharks …


Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data, Caroline Kusiak Oct 2018

Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data, Caroline Kusiak

Masters Theses

Dengue fever affects over 390 million people annually worldwide and is of particu- lar concern in Southeast Asia where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to Public Health officials, however many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve dis- ease predictions in settings with severely …


Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao Oct 2018

Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao

Major Papers

In this major paper, we use high-dimensional models to analyze macroeconomic data which is in influenced by the break point. In particular, we consider to detect the break point and study the changes of the number of factors and the factor loadings with the structural instability.

Concretely, we propose two factor models which explain the processes of pre- and post- break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least square function and calculate the estimators of the numbers of pre- and post- break factors …


Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri Oct 2018

Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri

Publications and Research

Water risk management is a ubiquitous challenge faced by stakeholders in the water or agricultural sector. We present a methodological framework for forecasting water storage requirements and present an application of this methodology to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors used to forecast water stress were selected based on an exhaustive search method that evaluates for highest ranked probability skill score and lowest root-mean-squared error in a leave-one-out cross-validation mode. Adaptive forecasts were made in the years …


Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

Yelp’S Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …


Generalizing Multistage Partition Procedures For Two-Parameter Exponential Populations, Rui Wang Aug 2018

Generalizing Multistage Partition Procedures For Two-Parameter Exponential Populations, Rui Wang

University of New Orleans Theses and Dissertations

ANOVA analysis is a classic tool for multiple comparisons and has been widely used in numerous disciplines due to its simplicity and convenience. The ANOVA procedure is designed to test if a number of different populations are all different. This is followed by usual multiple comparison tests to rank the populations. However, the probability of selecting the best population via ANOVA procedure does not guarantee the probability to be larger than some desired prespecified level. This lack of desirability of the ANOVA procedure was overcome by researchers in early 1950's by designing experiments with the goal of selecting the best …


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve a basis for making higher order inferences in metabolomics, …


Data Scientist’S Analysis Toolbox: Comparison Of Python, R, And Sas Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis Jul 2018

Data Scientist’S Analysis Toolbox: Comparison Of Python, R, And Sas Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analysis along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data support guidance on the correct tool to use for common situations in the field of Data Science.


Australian Herring And West Australian Salmon Scientific Workshop Report, October 2017, Department Of Primary Industries And Regional Development, Western Australia Jul 2018

Australian Herring And West Australian Salmon Scientific Workshop Report, October 2017, Department Of Primary Industries And Regional Development, Western Australia

Fisheries research reports

No abstract provided.


Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen May 2018

Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen

Electronic Theses and Dissertations

The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution from a given sample data set for a statistic of interest. Generally, the results are good for location parameters such as population mean, median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a …


Assessing The Ordinality Of Response Bias With Item Response Models: A Case Study Using The Phq-9, Venessa N. Singhroy May 2018

Assessing The Ordinality Of Response Bias With Item Response Models: A Case Study Using The Phq-9, Venessa N. Singhroy

Dissertations, Theses, and Capstone Projects

Improper scale usage in psychological and clinical assessment is an important problem. If respondents do not use the scales in a consistent manner, the reliability of a composite is likely to be attenuated. This is particularly problematic when particular items are singled out for special treatment or when subscales are of interest, not just a total score. This study used both non-parametric and parametric item response theory (IRT) methods to gain further insight into the validity of the PHQ-9, a dual purpose instrument that assesses the severity of depressive symptoms using nine Likert-scale items and allows the investigator to establish …


Analysis Challenges For High Dimensional Data, Bangxin Zhao Apr 2018

Analysis Challenges For High Dimensional Data, Bangxin Zhao

Electronic Thesis and Dissertation Repository

In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measure and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on the estimator we develop a new screening technique termed Dynamic Tilted Current Correlation Screening (DTCCS) for high dimensional variables screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method takes the popular used sure independent screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as its special cases.

Two methods …


Using Random Forests To Describe Equity In Higher Education: A Critical Quantitative Analysis Of Utah’S Postsecondary Pipelines, Tyler Mcdaniel Apr 2018

Using Random Forests To Describe Equity In Higher Education: A Critical Quantitative Analysis Of Utah’S Postsecondary Pipelines, Tyler Mcdaniel

Butler Journal of Undergraduate Research

The following work examines the Random Forest (RF) algorithm as a tool for predicting student outcomes and interrogating the equity of postsecondary education pipelines. The RF model, created using longitudinal data of 41,303 students from Utah's 2008 high school graduation cohort, is compared to logistic and linear models, which are commonly used to predict college access and success. Substantially, this work finds High School GPA to be the best predictor of postsecondary GPA, whereas commonly used ACT and AP test scores are not nearly as important. Each model identified several demographic disparities in higher education access, most significantly the effects …


Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia Apr 2018

Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia

Statistical Science Theses and Dissertations

This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.

Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single spe- cific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level …


Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff Mar 2018

Sabermetrics - Statistical Modeling Of Run Creation And Prevention In Baseball, Parker Chernoff

FIU Electronic Theses and Dissertations

The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and also used for cross validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team’s most important goal: winning.

Each model fit strongly and collinearity amongst offensive predictors was considered using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per …


On Some Ridge Regression Estimators For Logistic Regression Models, Ulyana P. Williams Mar 2018

On Some Ridge Regression Estimators For Logistic Regression Models, Ulyana P. Williams

FIU Electronic Theses and Dissertations

The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As a performance criterion, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monto Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of …


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018

On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar

FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …


Multivariate Spectral Analysis Of Crism Data To Characterize The Composition Of Mawrth Vallis, Melissa Luna Mar 2018

Multivariate Spectral Analysis Of Crism Data To Characterize The Composition Of Mawrth Vallis, Melissa Luna

Melissa Luna

No abstract provided.


Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam Mar 2018

Advances In Semi-Nonparametric Density Estimation And Shrinkage Regression, Hossein Zareamoghaddam

Electronic Thesis and Dissertation Repository

This thesis advocates the use of shrinkage and penalty techniques for estimating the parameters of a regression model that comprises both parametric and nonparametric components and develops semi-nonparametric density estimation methodologies that are applicable in a regression context.

First, a moment-based approach whereby a univariate or bivariate density function is approximated by means of a suitable initial density function that is adjusted by a linear combination of orthogonal polynomials is introduced. Such adjustments are shown to be mathematically equivalent to making use of standard polynomials in one or two variables. Once extended to apply to density estimation, in which case …


Building A Better Risk Prevention Model, Steven Hornyak Mar 2018

Building A Better Risk Prevention Model, Steven Hornyak

National Youth Advocacy and Resilience Conference

This presentation chronicles the work of Houston County Schools in developing a risk prevention model built on more than ten years of longitudinal student data. In its second year of implementation, Houston At-Risk Profiles (HARP), has proven effective in identifying those students most in need of support and linking them to interventions and supports that lead to improved outcomes and significantly reduces the risk of failure.


Advances In The Modeling Of Heavy-Tailed Distributions, Sang Jin Kang Jan 2018

Advances In The Modeling Of Heavy-Tailed Distributions, Sang Jin Kang

Electronic Thesis and Dissertation Repository

Several advances are proposed in connection with the approximation and estimation of heavy-tailed distributions, some of which also apply to other types of distributions. It is first explained that on initially applying the Esscher transform to heavy-tailed density functions such as the Pareto, Student-t and Cauchy densities, one can utilize a moment-based technique whereby the tilted density functions are expressed as the product of a base density function and a polynomial adjustment. Alternatively, density approximants can be secured by appropriately truncating the distributions or mapping them onto compact supports. The validity of these approaches is corroborated by simulation studies. …


The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie Jan 2018

The Family Of Conditional Penalized Methods With Their Application In Sufficient Variable Selection, Jin Xie

Theses and Dissertations--Statistics

When scientists know in advance that some features (variables) are important in modeling a data, then these important features should be kept in the model. How can we utilize this prior information to effectively find other important features? This dissertation is to provide a solution, using such prior information. We propose the Conditional Adaptive Lasso (CAL) estimates to exploit this knowledge. By choosing a meaningful conditioning set, namely the prior information, CAL shows better performance in both variable selection and model estimation. We also propose Sufficient Conditional Adaptive Lasso Variable Screening (SCAL-VS) and Conditioning Set Sufficient Conditional Adaptive Lasso Variable …


Campus Climate Sexual Assault Survey (2015) Analysis, Felicia Rosin Jan 2018

Campus Climate Sexual Assault Survey (2015) Analysis, Felicia Rosin

Williams Honors College, Honors Research Projects

The issue of sexual assault has garnered widespread attention in recent years, as is evident by the growing number of high-profile cases and mainstream social movements. With this increasingly bright spotlight, it is no surprise that The University of Akron has interest in improving the sexual violence education programs offered to students. In 2015, the university conducted a survey to gather information on the campus climate surrounding sexual assault. This analysis dives into a deeper analysis of the data gathered in an attempt to pinpoint areas that require the university’s attention. The analysis covers topics identified by Dean of Students …


Joint Analysis Of Multiple Phenotypes In Association Studies, Xiaoyu Liang Jan 2018

Joint Analysis Of Multiple Phenotypes In Association Studies, Xiaoyu Liang

Dissertations, Master's Theses and Master's Reports

Genome-wide association studies (GWAS) have become a very effective research tool to identify genetic variants of underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, in general, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than the univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. …


Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh Jan 2018

Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh

Electronic Theses and Dissertations

Three new generalized distributions developed via completing risk, gamma generator, Marshall-Olkin generator and exponentiation techniques are proposed and studied. Structural properties including quantile functions, hazard rate functions, moment, conditional moments, mean deviations, R\'enyi entropy, distribution of order statistics and maximum likelihood estimates are presented. Monte Carlo simulation is employed to examine the performance of the proposed distributions. Applications of the generalized distributions to real lifetime data are presented to illustrate the usefulness of the models.