Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

2017

Full-Text Articles in Statistical Methodology

Statistical Analysis Of Momentum In Basketball, Mackenzi Stump Dec 2017

Honors Projects

The “hot hand” in sports has been debated for as long as sports have been around. The debate involves whether streaks and slumps in sports are true phenomena or just simply perceptions in the mind of the human viewer. This statistical analysis of momentum in basketball analyzes the distribution of time between scoring events for the BGSU Women’s Basketball team from 2011-2017. We discuss how the distribution of time between scoring events changes with normal game factors such as location of the game, game outcome, and several other factors. If scoring events during a game were always randomly distributed, or …


Bayesian Model For Detection Of Outliers In Linear Regression With Application To Longitudinal Data, Zahraa Al-Sharea Dec 2017

Graduate Theses and Dissertations

Outlier detection is one of the most important challenges with many present-day applications. Outliers can occur due to uncertainty in data generating mechanisms or due to an error in data recording/processing. Outliers can drastically change the study's results and make predictions less reliable. Detecting outliers in longitudinal studies is quite challenging because this kind of study works with observations that change over time. Therefore, the same subject can produce an outlier at one point in time yet produce regular observations at all other time points. Bayesian hierarchical modeling assigns parameters that can quantify whether each observation is an outlier …


Variational Bayes Estimation Of Discrete-Margined Copula Models With Application To Time Series, Ruben Loaiza-Maya, Michael S. Smith Nov 2017

Michael Stanley Smith

We propose a new variational Bayes estimator for high-dimensional copulas with discrete, or a combination of discrete and continuous, margins. The method is based on a variational approximation to a tractable augmented posterior, and is faster than previous likelihood-based approaches. We use it to estimate drawable vine copulas for univariate and multivariate Markov ordinal and mixed time series. These have dimension $rT$, where $T$ is the number of observations and $r$ is the number of series, and are difficult to estimate using previous methods. 
The vine pair-copulas are carefully selected to allow for heteroskedasticity, which is a feature of most ordinal …


Data-Adaptive Kernel Support Vector Machine, Xin Liu Nov 2017

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Following the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges …


Constructing A Confidence Interval For The Fraction Who Benefit From Treatment, Using Randomized Trial Data, Emily J. Huang, Ethan X. Fang, Daniel F. Hanley, Michael Rosenblum Oct 2017

Johns Hopkins University, Dept. of Biostatistics Working Papers

The fraction who benefit from treatment is the proportion of patients whose potential outcome under treatment is better than that under control. Inference on this parameter is challenging since it is only partially identifiable, even in our context of a randomized trial. We propose a new method for constructing a confidence interval for the fraction, when the outcome is ordinal or binary. Our confidence interval procedure is pointwise consistent. It does not require any assumptions about the joint distribution of the potential outcomes, although it has the flexibility to incorporate various user-defined assumptions. Unlike existing confidence interval methods for partially …


On The Estimation Of Penetrance In The Presence Of Competing Risks With Family Data, Daniel Prawira Oct 2017

Electronic Thesis and Dissertation Repository

In family studies, we are interested in estimating the penetrance function of the event of interest in the presence of competing risks. Failure to account for competing risks may lead to bias in the estimation of the penetrance function. In this thesis, three statistical challenges are addressed: clustering, missing data, and competing risks. We propose a cause-specific model with shared frailty and ascertainment correction to account for clustering and competing risks along with the ascertainment of families into the study. Multiple imputation is used to account for missing data. The simulation study showed good performance of our proposed model in estimating the …


Comparison Of Adaptive Randomized Trial Designs For Time-To-Event Outcomes That Expand Versus Restrict Enrollment Criteria, To Test Non-Inferiority, Josh Betz, Jon Arni Steingrimsson, Tianchen Qian, Michael Rosenblum Sep 2017

Johns Hopkins University, Dept. of Biostatistics Working Papers

Adaptive enrichment designs involve preplanned rules for modifying patient enrollment criteria based on data accrued in an ongoing trial. These designs may be useful when it is suspected that a subpopulation, e.g., defined by a biomarker or risk score measured at baseline, may benefit more from treatment than the complementary subpopulation. We compare two types of such designs, for the case of two subpopulations that partition the overall population. The first type starts by enrolling the subpopulation where it is suspected the new treatment is most likely to work, and then may expand inclusion criteria if there is early evidence …


Tree-Based Regression For Interval-Valued Data, Chih-Ching Yeh Aug 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Regression methods for interval-valued data have been increasingly studied in recent years. Most existing work focuses on linear models, yet many problems in practice are nonlinear in nature, so the development of nonlinear regression tools for interval-valued data is crucial. In this project, we propose a tree-based regression method for interval-valued data, which is applicable to both linear and nonlinear problems. Unlike linear regression models that usually require additional constraints to ensure positivity of the predicted interval length, the proposed method estimates the regression function in a nonparametric way, so the …


Statistical Computational Topology And Geometry For Understanding Data, Joshua Lee Mike Aug 2017

Doctoral Dissertations

Here we describe three projects involving data analysis which focus on engaging statistics with the geometry and/or topology of the data.

The first project involves the development and implementation of kernel density estimation for persistence diagrams. These kernel densities consider neighborhoods for every feature in the center diagram and give to each feature an independent, orthogonal direction. The creation of kernel densities in this realm yields a (previously unavailable) full characterization of the (random) geometry of a dataspace or data distribution.

In the second project, cohomology is used to guide a search for kidney exchange cycles within a kidney paired …


A Tail-Based Test For Differential Expression Analysis And Pathway Analysis In RNA-Sequencing Data, Jiong Chen Aug 2017

Dissertations & Theses (Open Access)

RNA sequencing data have been abundantly generated in biomedical research for biomarker discovery and pathway analysis. Such data at the exon-level are usually heavily tailed and correlated. Conventional statistical tests based on the mean or median difference for differential expression likely suffer from low power when the between-group difference occurs mostly in the upper or lower tail of the distribution of gene expression. We propose a tail-based test to make comparisons between groups in terms of a specific distribution area rather than a single location. The proposed test, which is derived from quantile regression, adjusts for covariates and accounts for …


Novel Bayesian Adaptive Clinical Trial Designs In Early Phases, Haitao Pan Aug 2017

Dissertations & Theses (Open Access)

Early phase, or phase I and phase II, trials are the first step in testing new medicines that have been developed in the lab. The main goal of phase I clinical trials is to establish the recommended dose of new drugs for phase II trials. For cytotoxic drugs, the goal is to find the maximum tolerated dose (MTD). The guiding principle for dose escalation in phase I trials is to avoid exposing too many patients to subtherapeutic doses while preserving safety and maintaining rapid accrual. Therefore, dose escalation methods, especially Bayesian designs, are recommended for use in phase I …


A Cross-Sectional Exploration Of Household Financial Reactions And Homebuyer Awareness Of Registered Sex Offenders In A Rural, Suburban, And Urban County., John Charles Navarro Aug 2017

Electronic Theses and Dissertations

As stigmatized persons, registered sex offenders betoken instability in communities. Depressed home sale values are associated with the presence of registered sex offenders even though the public is largely unaware of the presence of registered sex offenders. Using a spatial multilevel approach, the current study examines how registered sex offenders influence the sale values of homes sold in 2015 in three U.S. counties (rural, suburban, and urban) located in Illinois and Kentucky within the social disorganization framework. Homebuyers were surveyed to examine whether awareness of local registered sex offenders and the homebuyer’s community type operate as moderators between home selling …


Methods For Scalar-On-Function Regression, Philip T. Reiss, Jeff Goldsmith, Han Lin Shang, R. Todd Ogden Jul 2017

Philip T. Reiss

Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.


Testing The Independence Hypothesis Of Accepted Mutations For Pairs Of Adjacent Amino Acids In Protein Sequences, Jyotsna Ramanan, Peter Revesz Jul 2017

School of Computing: Faculty Publications

Evolutionary studies usually assume that the genetic mutations are independent of each other. However, that does not imply that the observed mutations are independent of each other because it is possible that when a nucleotide is mutated, then it may be biologically beneficial if an adjacent nucleotide mutates too. With a number of decoded genes currently available in various genome libraries and online databases, it is now possible to have a large-scale computer-based study to test whether the independence assumption holds for pairs of adjacent amino acids. Hence the independence question also arises for pairs of adjacent amino acids within …


Statistical Methods On Risk Management Of Extreme Events, Zijing Zhang Jul 2017

Doctoral Dissertations

The goal of the dissertation is the investigation of financial risk analysis methodologies, using the schemes for extreme value modeling as well as techniques from copula modeling. Extreme value theory is concerned with probabilistic and statistical questions related to unusual behavior or rare events. The subject has a rich mathematical theory and also a long tradition of applications in a variety of areas. We are interested in its application in risk management, with a focus on estimating and forecasting the Value-at-Risk of financial time series data. Extremal data are inherently scarce, thus making inference challenging. In order to obtain …


Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu Jul 2017

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …


Marketing The Mountain State: A Large N Study Of User Engagement On Twitter, Kirk Richardson Jun 2017

Capstone Projects – Politics and Government

Much of the evolving research on the use of social media in destination marketing emphasizes how information diffusion influences the reputational image of place. The present study uses Twitter data to focus on the relative differences in user engagement across discrete account types. Specifically, this is done to examine how the official destination marketing organization of Montana—the Montana Office of Tourism (MTOT)—performs relative to other account types. Several regression analyses conducted on Twitter data associated with an ongoing MTOT place branding campaign reveal that tweets sent from ‘official’ accounts are more likely to be retweeted, and are estimated to receive …


A Comparison Of Some Confidence Intervals For Estimating The Kurtosis Parameter, Guensley Jerome Jun 2017

FIU Electronic Theses and Dissertations

Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2 and b2. This thesis addresses the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals by determining the average width and the probability of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap …
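
The abstract above names the estimators without defining them. As a rough illustration only, the sketch below uses the excess-kurtosis forms commonly attributed to Joanes and Gill (1998) for g2, G2 and b2, and a plain percentile bootstrap interval. Both choices are assumptions made for this example; the thesis may define the estimators or build its intervals differently, and the function names are hypothetical.

```python
import random


def central_moment(data, k):
    """k-th sample central moment with divisor n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def g2(data):
    """Moment estimator of excess kurtosis: m4 / m2^2 - 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


def G2(data):
    """Small-sample adjusted estimator (requires n > 3)."""
    n = len(data)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2(data) + 6.0)


def b2(data):
    """Estimator using the unbiased variance s^2: m4 / s^4 - 3."""
    n = len(data)
    return ((n - 1) / n) ** 2 * (g2(data) + 3.0) - 3.0


def bootstrap_percentile_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed, simple bootstrap variant)."""
    if len(set(data)) < 2:
        raise ValueError("data must contain at least two distinct values")
    rng = random.Random(seed)
    n = len(data)
    stats = []
    while len(stats) < n_boot:
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if len(set(resample)) < 2:
            continue  # skip zero-variance resamples, where kurtosis is undefined
        stats.append(estimator(resample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_percentile_ci(data, G2)` on a list of observations returns a `(lower, upper)` pair; interval width and coverage can then be averaged over simulated data sets in the way the abstract describes.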


Propensity Score Analysis With Matching Weights, Liang Li Jun 2017

Liang Li

The propensity score analysis is one of the most widely used methods for studying the causal treatment effect in observational studies. This paper studies treatment effect estimation with the method of matching weights. This method resembles propensity score matching but offers a number of new features including efficient estimation, rigorous variance calculation, simple asymptotics, statistical tests of balance, clearly identified target population with optimal sampling property, and no need for choosing matching algorithm and caliper size. In addition, we propose the mirror histogram as a useful tool for graphically displaying balance. The method also shares some features of the inverse …
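
The abstract does not state the weight formula. A minimal sketch follows, assuming the usual matching weight min(e, 1 - e) / (Z e + (1 - Z)(1 - e)), where e is the estimated propensity score and Z the treatment indicator. The function names are hypothetical, and the propensity scores are taken as given rather than estimated here.

```python
def matching_weights(propensity, treated):
    """Matching weight per subject: min(e, 1 - e) divided by the
    probability of the treatment the subject actually received."""
    weights = []
    for e, z in zip(propensity, treated):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        prob_received = e if z == 1 else 1.0 - e
        weights.append(min(e, 1.0 - e) / prob_received)
    return weights


def weighted_mean_difference(outcome, treated, weights):
    """Difference of weighted outcome means, treated minus control."""
    def weighted_mean(group):
        num = sum(w * y for y, z, w in zip(outcome, treated, weights) if z == group)
        den = sum(w for z, w in zip(treated, weights) if z == group)
        return num / den
    return weighted_mean(1) - weighted_mean(0)
```

Under this formula, subjects with propensity near 0.5 keep a weight close to 1 in both arms, while subjects whose received treatment was very likely are down-weighted, which is what makes the weighted sample resemble a matched one.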


Incorporating Place And Space: A Hierarchical Spatial Approach To Exploring Preventable Congestive Heart Failure Hospitalizations In New York City, Rachael Weiss Riley Jun 2017

Dissertations and Theses

Background: Faced with rising medical care costs, increasing prevalence, and widening health disparities, preventing congestive heart failure (CHF) hospitalizations is a central public health concern. Despite evidence of geographical clustering in preventable CHF admissions, there is a lack of research designed to examine spatial patterning of CHF and the local area neighborhood determinants that contribute to this variability. This study sought to assess and evaluate the importance of both space and place in analyzing preventable CHF hospitalizations and readmissions by applying appropriate statistical techniques, clarifying the assumptions inherent in each method, and interpreting the findings within the context of existing …


Mechanistic Mathematical Models: An Underused Platform For HPV Research, Marc Ryser, Patti Gravitt, Evan R. Myers Jun 2017

Global Health Faculty Publications

Health economic modeling has become an invaluable methodology for the design and evaluation of clinical and public health interventions against the human papillomavirus (HPV) and associated diseases. At the same time, relatively little attention has been paid to a different yet complementary class of models, namely that of mechanistic mathematical models. The primary focus of mechanistic mathematical models is to better understand the intricate biologic mechanisms and dynamics of disease. Inspired by a long and successful history of mechanistic modeling in other biomedical fields, we highlight several areas of HPV research where mechanistic models have the potential to advance the …


Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons May 2017

Student Scholar Symposium Abstracts and Posters

After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour turned into something bigger than I ever could have imagined. In just over a year I have over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, & GilmoreNews.

I created a dataset of qualitative and quantitative outcomes from my …


On Post-Selection Confidence Intervals In Linear Regression, Xinwei Zhang May 2017

Arts & Sciences Electronic Theses and Dissertations

The general goal of this thesis is to investigate some issues in post-selection inference, which arises in the setting where statistical inference is carried out after a data-driven model selection step. In this setting, the classical inference theory, which requires a fixed a priori model, becomes invalid since the selected model is the result of a random event. Hence, the common practice in applied research of ignoring the model selection step when building confidence intervals will result in misleading or even false conclusions. In this thesis, specifically, we first discuss some examples to show how the classical inference theory loses …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Electronic Theses and Dissertations

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors thus making protein identification challenging in the field of proteomics. Some of these errors include wrong calibration of the instrument, instrument distortion and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers May 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast, the Random Survival Forests method does not make the proportional hazards assumption and has the flexibility to model survivor curves that are of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available …


Telephone Polls And PPS Sampling: A Potential Boon To The Polling Industry, Jade Mckay Burt May 2017

Undergraduate Honors Capstone Projects

In the wake of the 2016 election, the polling industry has no shortage of critics. While these are difficult times for the industry as a whole, there are exciting innovations happening that will serve to benefit and revitalize the industry for years. One of these innovations is Probability Proportional to Size (PPS) sampling. I will elaborate on what PPS sampling is and provide a mathematical foundation for its use in polling. I also discuss some of the myriad issues plaguing the polling industry and then show how PPS sampling can be used to remedy many of …
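
To make the idea of PPS sampling concrete, here is a minimal sketch of PPS sampling with replacement together with the Hansen-Hurwitz estimator of a population total. This is a textbook variant chosen for illustration, not necessarily the design developed in the project; the function names and the example data are hypothetical.

```python
import random


def pps_sample(sizes, n, seed=0):
    """Draw n unit indices with replacement, each with probability
    proportional to that unit's size measure."""
    rng = random.Random(seed)
    return rng.choices(range(len(sizes)), weights=sizes, k=n)


def hansen_hurwitz_total(y, sizes, sample):
    """Estimate the population total of y as the mean of y_i / p_i over
    the draws, where p_i = size_i / sum(sizes) is the draw probability."""
    total_size = sum(sizes)
    return sum(y[i] / (sizes[i] / total_size) for i in sample) / len(sample)
```

When the study variable is exactly proportional to the size measure, every draw yields the same value of y_i / p_i, so the estimator reproduces the true total with zero variance. That is the intuition behind choosing a size measure that tracks the quantity being polled.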


Trend And Return Level Of Extreme Snow Events In New York City, Mintaek Lee May 2017

Boise State University Theses and Dissertations

A major winter storm brought up to 42 inches of snow to parts of the Mid-Atlantic and Northeast United States on January 22-24, 2016. The blizzard affected about 102.8 million people, killed at least 55, and caused economic losses in the range of $500 million to $3 billion. This thesis studies two important aspects of extreme snow events: maximum snowfall and maximum snow depth. We apply extreme value methods to extreme snowfall and snow depth data from the New York City area to examine if there are any significant linear …
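
The title refers to return levels, which the truncated abstract does not define. As a sketch under the assumption that annual maxima follow a generalized extreme value (GEV) distribution with location mu, scale sigma and shape xi, the N-year return level is the quantile exceeded with probability p = 1/N in any given year. The thesis may use a different extreme value model; the parameter values in the usage note are made up.

```python
import math


def gev_return_level(mu, sigma, xi, p):
    """Level exceeded with probability p per period under a GEV model."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, used here to check the return level."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))
```

For example, `gev_return_level(10.0, 4.0, 0.1, 1 / 100)` gives the 100-year level for those hypothetical parameters, and `gev_cdf` evaluated at that level returns 0.99.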


Error Costs, Legal Standards Of Proof And Statistical Significance, Michelle Burtis, Jonah B. Gelbach, Bruce H. Kobayashi Apr 2017

All Faculty Scholarship

The relationship between legal standards of proof and thresholds of statistical significance is a well-known and well-studied phenomenon in the academic literature. Moreover, the distinction between the two has been recognized in law. For example, in Matrixx v. Siracusano, the Court unanimously rejected the petitioner’s argument that the issue of materiality in a securities class action can be defined by the presence or absence of a statistically significant effect. However, in other contexts, thresholds based on fixed significance levels imported from academic settings continue to be used as a legal standard of proof. Our positive analysis demonstrates how a …


Estimating Autoantibody Signatures To Detect Autoimmune Disease Patient Subsets, Zhenke Wu, Livia Casciola-Rosen, Ami A. Shah, Antony Rosen, Scott L. Zeger Apr 2017

Johns Hopkins University, Dept. of Biostatistics Working Papers

Autoimmune diseases are characterized by highly specific immune responses against molecules in self-tissues. Different autoimmune diseases are characterized by distinct immune responses, making autoantibodies useful for diagnosis and prediction. In many diseases, the targets of autoantibodies are incompletely defined. Although the technologies for autoantibody discovery have advanced dramatically over the past decade, each of these techniques generates hundreds of possibilities, which are onerous and expensive to validate. We set out to establish a method to greatly simplify autoantibody discovery, using a pre-filtering step to define subgroups with similar specificities based on migration of labeled, immunoprecipitated proteins on sodium dodecyl sulfate …


Contributions To Statistical Testing, Prediction, And Modeling, John C. Pesko Mar 2017

Mathematics & Statistics ETDs

1. "Parametric Bootstrap (PB) and Objective Bayesian (OB) Testing with Applications to Heteroscedastic ANOVA": For one-way heteroscedastic ANOVA, we show a close relationship between the PB and OB approaches to significance testing, demonstrating the conditions for which the two approaches are equivalent. Using a simulation study, PB and OB performance is compared to a test based on the predictive distribution as well as the unweighted test of Akritas & Papadatos (2004). We extend this work to the RCBD with subsampling model, and prove a repeated sampling property and large sample property for general OB significance testing.

2. "Early Identification of …