Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Applied Statistics

2017

Articles 1 - 20 of 20

Full-Text Articles in Statistical Methodology

Statistical Analysis Of Momentum In Basketball, Mackenzi Stump Dec 2017

Honors Projects

The “hot hand” in sports has been debated for as long as sports have been around. The debate concerns whether streaks and slumps in sports are true phenomena or simply perceptions in the mind of the viewer. This statistical analysis of momentum in basketball examines the distribution of time between scoring events for the BGSU Women’s Basketball team from 2011-2017. We discuss how the distribution of time between scoring events changes with game factors such as location of the game, game outcome, and several other factors. If scoring events during a game were always randomly distributed, or …


Data-Adaptive Kernel Support Vector Machine, Xin Liu Nov 2017

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Following the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges …


A Cross-Sectional Exploration Of Household Financial Reactions And Homebuyer Awareness Of Registered Sex Offenders In A Rural, Suburban, And Urban County., John Charles Navarro Aug 2017

Electronic Theses and Dissertations

As stigmatized persons, registered sex offenders betoken instability in communities. Depressed home sale values are associated with the presence of registered sex offenders even though the public is largely unaware of their presence. Using a spatial multilevel approach, the current study examines how registered sex offenders influence sale values of homes sold in 2015 in three U.S. counties (rural, suburban, and urban) located in Illinois and Kentucky, within the social disorganization framework. Homebuyers were surveyed to examine whether awareness of local registered sex offenders and the homebuyer’s community type operate as moderators between home selling …


Methods For Scalar-On-Function Regression, Philip T. Reiss, Jeff Goldsmith, Han Lin Shang, R. Todd Ogden Jul 2017

Philip T. Reiss

Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.


Statistical Methods On Risk Management Of Extreme Events, Zijing Zhang Jul 2017

Doctoral Dissertations

The goal of the dissertation is the investigation of financial risk analysis methodologies, using schemes for extreme value modeling as well as techniques from copula modeling. Extreme value theory is concerned with probabilistic and statistical questions related to unusual behavior or rare events. The subject has a rich mathematical theory and also a long tradition of applications in a variety of areas. We are interested in its application in risk management, with a focus on estimating and forecasting the Value-at-Risk of financial time series data. Extremal data are inherently scarce, thus making inference challenging. In order to obtain …


Marketing The Mountain State: A Large N Study Of User Engagement On Twitter, Kirk Richardson Jun 2017

Capstone Projects – Politics and Government

Much of the evolving research on the use of social media in destination marketing emphasizes how information diffusion influences the reputational image of place. The present study uses Twitter data to focus on the relative differences in user engagement across discrete account types. Specifically, this is done to examine how the official destination marketing organization of Montana—the Montana Office of Tourism (MTOT)—performs relative to other account types. Several regression analyses conducted on Twitter data associated with an ongoing MTOT place branding campaign reveal that tweets sent from ‘official’ accounts are more likely to be retweeted, and are estimated to receive …


A Comparison Of Some Confidence Intervals For Estimating The Kurtosis Parameter, Guensley Jerome Jun 2017

FIU Electronic Theses and Dissertations

Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2, and b2. This thesis addressed the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap …
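To make the three estimators concrete, here is a minimal Python sketch. The abstract names g2, G2, and b2 but does not give their formulas, so the definitions below are an assumption: they follow the conventional sample excess-kurtosis formulas, and the function name `kurtosis_estimators` is hypothetical.

```python
def kurtosis_estimators(xs):
    """Return (g2, G2, b2) excess-kurtosis estimates for a sample.

    Assumed conventional definitions (not taken from the thesis):
      g2 = m4 / m2^2 - 3, with m2 and m4 the biased central moments
      G2 = (n - 1) / ((n - 2)(n - 3)) * ((n + 1) * g2 + 6)
      b2 = m4 / s^4 - 3, with s^2 the unbiased sample variance
    """
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # biased second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # biased fourth central moment
    if m2 == 0:
        raise ValueError("sample has zero variance")
    g2 = m4 / m2 ** 2 - 3
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    # s^4 = (n / (n - 1))^2 * m2^2, hence the ((n - 1) / n)^2 factor.
    b2 = ((n - 1) / n) ** 2 * (m4 / m2 ** 2) - 3
    return g2, G2, b2


if __name__ == "__main__":
    print(kurtosis_estimators([1, 2, 3, 4, 5]))
```

The three values differ only through sample-size corrections, so they converge as n grows; the differences matter mainly for the small samples a simulation study would include.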


Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons May 2017

Student Scholar Symposium Abstracts and Posters

After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour, turned into something bigger than I ever could have imagined. In just over a year, I have gained over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, and GilmoreNews.

I created a dataset of qualitative and quantitative outcomes from my …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Electronic Theses and Dissertations

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors, which makes protein identification challenging in the field of proteomics. These errors include wrong calibration of the instrument, instrument distortion, and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers May 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines, including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast, the Random Survival Forests method does not make the proportional hazards assumption and has the flexibility to model survivor curves that are of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available …


Telephone Polls And PPS Sampling: A Potential Boon To The Polling Industry, Jade Mckay Burt May 2017

Undergraduate Honors Capstone Projects

In the wake of the 2016 election, the polling industry has no shortage of critics. While these are difficult times for the industry as a whole, there are exciting innovations happening that will serve to benefit and revitalize the industry for years. One of these innovations is Probability Proportional to Size (PPS) sampling. I will elaborate on what PPS sampling is and provide a mathematical foundation for its use in polling. I also discuss some of the myriad issues plaguing the polling industry and then show how PPS sampling can be used to remedy many of …
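As a rough illustration of the idea behind PPS sampling, the sketch below draws units with probability proportional to a size measure and then estimates a population total by weighting each sampled value by the inverse of its selection probability (the Hansen-Hurwitz estimator for sampling with replacement). This is a generic textbook construction, not the procedure from the capstone project; the function names and the toy data are hypothetical.

```python
import random


def pps_sample(sizes, n, rng=None):
    """Draw n unit indices with replacement, with probability proportional to size."""
    rng = rng or random.Random()
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def hansen_hurwitz_total(sample, sizes, values):
    """Estimate the population total of `values` from a PPS-with-replacement sample.

    Each draw i contributes values[i] / p_i, where p_i = sizes[i] / sum(sizes)
    is the probability of selecting unit i on a single draw.
    """
    total_size = sum(sizes)
    return sum(values[i] * total_size / sizes[i] for i in sample) / len(sample)


if __name__ == "__main__":
    # Hypothetical example: four regions with known sizes and an outcome per region.
    sizes = [10, 20, 30, 40]
    values = [1, 2, 3, 4]  # exactly proportional to size here
    sample = pps_sample(sizes, n=3, rng=random.Random(0))
    print(sample, hansen_hurwitz_total(sample, sizes, values))
```

When the outcome is exactly proportional to the size measure, as in the toy data, every draw yields the same weighted value, so the estimate equals the true total regardless of which units are drawn. That is the intuition for why a well-chosen size measure reduces variance.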


Further Advances For The Sequential Multiple Assignment Randomized Trial (SMART), Tianjiao Dai Feb 2017

Dissertations & Theses (Open Access)

Tianjiao Dai, M.S.

Advisory Professor: Sanjay Shete, Ph.D.

Sequential multiple assignment randomized trial (SMART) designs have been developed in recent years for studying adaptive interventions. In my Ph.D. study, I mainly investigate how to further improve SMART designs and optimize the interventions for each individual in the trial. My dissertation has focused on two topics of SMART designs.

1) Developing a novel SMART design that can reduce the cost and side effects associated with the interventions and proposing the corresponding analytic methods. I have developed a time-varying SMART design in …


What’s Brewing? A Statistics Education Discovery Project, Marla A. Sole, Sharon L. Weinberg Jan 2017

Publications and Research

We believe that students learn best, are actively engaged, and are genuinely interested when working on real-world problems. This can be done by giving students the opportunity to work collaboratively on projects that investigate authentic, familiar problems. This article shares one such project that was used in an introductory statistics course. We describe the steps taken to investigate why customers are charged more for iced coffee than hot coffee, which included collecting data and using descriptive and inferential statistical analysis. Interspersed throughout the article, we describe strategies that can help teachers implement the project and scaffold material to assist students …


Approximate Bayesian Computation In Forensic Science, Jessie H. Hendricks Jan 2017

The Journal of Undergraduate Research

Forensic evidence is often an important factor in criminal investigations. Analyzing evidence in an objective way involves the use of statistics. However, many evidence types (e.g., glass fragments, fingerprints, shoe impressions) are very complex. This makes the use of statistical methods, such as model selection in Bayesian inference, extremely difficult.

Approximate Bayesian Computation is an algorithmic method in Bayesian analysis that can be used for model selection. It is especially useful because it can be used to assign a Bayes Factor without the need to directly evaluate the exact likelihood function - a difficult task for complex data. Several criticisms …
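The likelihood-free idea can be sketched with a rejection-sampling version of ABC for model selection. The example is deliberately a toy and is not drawn from the article: it compares a "fair coin" model against an "unknown bias" model with a uniform prior, uses the number of heads as the summary statistic, and approximates the Bayes factor by the ratio of acceptance counts under equal prior model probabilities. All names and settings are hypothetical.

```python
import random


def simulate_heads(p, n_tosses, rng):
    """Simulate n_tosses coin flips with heads probability p; return the head count."""
    return sum(rng.random() < p for _ in range(n_tosses))


def abc_bayes_factor(observed_heads, n_tosses, n_draws=20000, tolerance=0, seed=0):
    """Approximate the Bayes factor (unknown bias vs. fair) by rejection ABC.

    Each iteration picks a model with equal prior probability, draws its
    parameter from the prior, simulates data, and keeps the draw only if the
    simulated summary is within `tolerance` of the observed one. No likelihood
    is ever evaluated.
    """
    rng = random.Random(seed)
    accepted = {"fair": 0, "unknown_bias": 0}
    for _ in range(n_draws):
        model = rng.choice(["fair", "unknown_bias"])
        p = 0.5 if model == "fair" else rng.random()  # Uniform(0, 1) prior
        simulated = simulate_heads(p, n_tosses, rng)
        if abs(simulated - observed_heads) <= tolerance:
            accepted[model] += 1
    if accepted["fair"] == 0:
        return float("inf") if accepted["unknown_bias"] else float("nan")
    return accepted["unknown_bias"] / accepted["fair"]


if __name__ == "__main__":
    print(abc_bayes_factor(observed_heads=9, n_tosses=10))
```

For this toy problem the exact Bayes factor is available in closed form, which makes it a convenient check on the approximation; for complex evidence types, the simulator and the summary statistic are the hard design choices.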


Machine Learning On Statistical Manifold, Bo Zhang Jan 2017

HMC Senior Theses

This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use the statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate the statistical-distance-based clustering algorithms …


Comparing The Structural Components Variance Estimator And U-Statistics Variance Estimator When Assessing The Difference Between Correlated AUCs With Finite Samples, Anna L. Bosse Jan 2017

Theses and Dissertations

Introduction: The structural components variance estimator proposed by DeLong et al. (1988) is a popular approach used when comparing two correlated AUCs. However, this variance estimator is biased and could be problematic with small sample sizes.

Methods: A U-statistics based variance estimator approach is presented and compared with the structural components variance estimator through a large-scale simulation study under different finite-sample size configurations.

Results: The U-statistics variance estimator was unbiased for the true variance of the difference between correlated AUCs regardless of the sample size and had lower RMSE than the structural components variance estimator, providing better type 1 error …
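For orientation, the structural components approach can be sketched as follows. This is my reading of the DeLong et al. (1988) estimator for two markers measured on the same subjects, written with the standard library only; the U-statistics estimator from the thesis is not shown because the abstract does not specify it. Function names and the example scores are hypothetical.

```python
def psi(x, y):
    """Mann-Whitney kernel: 1 if the case score exceeds the control score, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def components(cases, controls):
    """Return (AUC, V10, V01): the AUC and its per-subject structural components."""
    m, n = len(cases), len(controls)
    v10 = [sum(psi(x, y) for y in controls) / n for x in cases]
    v01 = [sum(psi(x, y) for x in cases) / m for y in controls]
    return sum(v10) / m, v10, v01


def sample_variance(values):
    """Unbiased sample variance."""
    k = len(values)
    mean = sum(values) / k
    return sum((v - mean) ** 2 for v in values) / (k - 1)


def delong_difference(cases1, controls1, cases2, controls2):
    """Return (AUC1 - AUC2, estimated variance of the difference).

    Lists must be aligned by subject: cases1[i] and cases2[i] are the two
    marker values for the same case, and likewise for controls.
    """
    auc1, v10_1, v01_1 = components(cases1, controls1)
    auc2, v10_2, v01_2 = components(cases2, controls2)
    m, n = len(cases1), len(controls1)
    # The variance of the difference of components equals L S L^T with L = (1, -1).
    d10 = [a - b for a, b in zip(v10_1, v10_2)]
    d01 = [a - b for a, b in zip(v01_1, v01_2)]
    return auc1 - auc2, sample_variance(d10) / m + sample_variance(d01) / n


if __name__ == "__main__":
    print(delong_difference([0.9, 0.8, 0.7], [0.1, 0.2, 0.3],
                            [0.6, 0.4, 0.9], [0.5, 0.3, 0.7]))
```

Working with per-subject differences of the components avoids building the full covariance matrices while giving the same variance for the contrast between the two AUCs.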


On The Equivalence Between Bayesian And Frequentist Nonparametric Hypothesis Testing, Qiuchen Hai Jan 2017

Dissertations, Master's Theses and Master's Reports

Testing hypotheses about a population parameter is one of the most fundamental tasks in the empirical sciences and is often conducted using parametric tests (e.g., the t-test and F-test), which assume that the samples come from normally distributed populations. When the normality assumption is violated, nonparametric tests are employed as alternatives for making statistical inference. In recent years, the Bayesian versions of parametric tests have been well studied in the literature, whereas the Bayesian versions of nonparametric tests are quite scant (for an exception, see Yuan and Johnson (2008)), mainly …


Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan Jan 2017

Theses and Dissertations--Statistics

We introduce a new class of measures for testing independence between two random vectors, which uses the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed, and their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented.

We propose a two-stage sufficient variable selection method based on the new index to deal with large p small n data. The method does not require model specification and especially focuses …


Nonparametric Compound Estimation, Derivative Estimation, And Change Point Detection, Sisheng Liu Jan 2017

Theses and Dissertations--Statistics

First, we reviewed some popular nonparametric regression methods from the past several decades. Then we extended the compound estimation (Charnigo and Srinivasan [2011]) to adapt to random design points and heteroskedasticity and proposed a modified Cp criterion for tuning parameter selection. Moreover, we developed a DCp criterion for the tuning parameter selection problem in general nonparametric derivative estimation. This extends the GCp criterion in Charnigo, Hall and Srinivasan [2011] to random design points and heteroskedasticity. Next, we proposed a change point detection method via compound estimation for both the fixed design and random design cases; adaptation to heteroskedasticity was considered for the method. …


Penalized Nonparametric Scalar-On-Function Regression Via Principal Coordinates, Philip T. Reiss, David L. Miller, Pei-Shien Wu, Wen-Yu Hua Dec 2016

Philip T. Reiss

A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. The core idea is to regress the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, the proposed …