Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Regularization Methods For Predicting An Ordinal Response Using Longitudinal High-Dimensional Genomic Data, Jiayi Hou Nov 2013

Theses and Dissertations

Ordinal scales are commonly used to measure health status and disease-related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasis, and the Glasgow Coma Scale (GCS), an ordinal measure that provides a reliable and objective assessment of a patient's level of consciousness. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in …


Review And Extension For The O’Brien Fleming Multiple Testing Procedure, Hanan Hammouri Nov 2013

Theses and Dissertations

O'Brien and Fleming (1979) proposed a straightforward and useful multiple testing procedure (a group sequential testing procedure) for comparing two treatments in clinical trials where subject responses are dichotomous (e.g., success or failure). O'Brien and Fleming stated that their group sequential testing procedure has the same Type I error rate and power as a fixed one-stage chi-square test, but offers the opportunity to terminate the trial early when one treatment is clearly performing better than the other. We studied and tested the O'Brien and Fleming procedure, specifically correcting the originally proposed critical values. Furthermore, we updated the O’Brien …
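
As a rough illustration of the group sequential idea (not the corrected critical values derived in this thesis), here is a sketch that calibrates an O'Brien-Fleming-style boundary, |Z_k| >= c * sqrt(K/k) at the k-th of K equally spaced looks, by Monte Carlo so that the overall two-sided Type I error equals alpha. All parameter values are illustrative.

```python
import numpy as np

# Monte Carlo calibration of an O'Brien-Fleming-style boundary:
# reject at look k (of K equally spaced looks) when
# |Z_k| >= c * sqrt(K / k), with c chosen so the overall two-sided
# Type I error is alpha.  Illustrative only -- the thesis corrects
# the originally proposed critical values analytically.

rng = np.random.default_rng(0)
K, alpha, n_sim = 4, 0.05, 100_000

# Z_k after k equal-size groups is a scaled cumulative sum of
# independent N(0, 1) increments.
increments = rng.standard_normal((n_sim, K))
z = np.cumsum(increments, axis=1) / np.sqrt(np.arange(1, K + 1))

def crossing_prob(c):
    bounds = c * np.sqrt(K / np.arange(1, K + 1))
    return np.mean(np.any(np.abs(z) >= bounds, axis=1))

# Bisect on c until the overall crossing probability equals alpha.
lo, hi = 1.0, 4.0
for _ in range(40):
    mid = (lo + hi) / 2
    if crossing_prob(mid) > alpha:
        lo = mid   # boundary too low, error rate too high: raise c
    else:
        hi = mid
print(f"calibrated c ~ {hi:.3f}")  # ~2.02 for K = 4 (fixed-sample z: 1.96)
```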


Response Adaptive Design Using Auxiliary And Primary Outcomes, Shuxian Sinks Nov 2013

Theses and Dissertations

Response-adaptive designs intend to allocate more patients to better treatments without undermining the validity or the integrity of the trial. The immediacy of the primary response (e.g., death, remission) determines the efficiency of a response-adaptive design, which often requires outcomes to be observed quickly or immediately. This presents difficulties for survival studies, which may require long durations to observe the primary endpoint. Therefore, we introduce auxiliary endpoints to assist the adaptation together with the primary endpoint, where an auxiliary endpoint is generally defined as any measurement that is positively associated with the primary endpoint. Our proposed design (referred to …
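
The design itself is only summarized above, so the following is a hypothetical sketch of the general idea: a play-the-winner-flavored allocation in which an immediately observed auxiliary endpoint contributes, with reduced weight, to the running success-rate estimates while the delayed primary endpoint is pending. The weighting scheme and all parameters are invented for illustration, not taken from the dissertation.

```python
import numpy as np

# Hypothetical play-the-winner-style allocation.  While the primary
# outcome is delayed, an immediately observed auxiliary outcome
# (positively associated with the primary) updates the running
# success-rate estimates with reduced weight.  All names, weights,
# and rates are invented for illustration.

rng = np.random.default_rng(1)
p_primary = [0.4, 0.6]        # true primary success rates, arms 0 and 1
n_patients, delay = 200, 20   # primary outcome observed 'delay' patients later

successes = np.ones(2)        # pseudo-counts (Beta(1, 1)-style start)
trials = np.full(2, 2.0)
arms, outcomes = [], []

for i in range(n_patients):
    # Allocate with probability proportional to estimated success rates.
    rates = successes / trials
    arm = rng.choice(2, p=rates / rates.sum())
    arms.append(arm)
    outcomes.append(rng.random() < p_primary[arm])

    # Auxiliary endpoint: an immediate, noisy proxy of the primary.
    aux = outcomes[-1] if rng.random() < 0.8 else not outcomes[-1]
    successes[arm] += 0.5 * aux   # down-weighted auxiliary information
    trials[arm] += 0.5

    # Delayed primary endpoint arrives for an earlier patient.
    if i >= delay:
        j = i - delay
        successes[arms[j]] += outcomes[j]
        trials[arms[j]] += 1.0

print("fraction allocated to the better arm:",
      np.mean(np.array(arms) == 1))
```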


The Effects Of School Type On Kindergarten Reading Achievement: Comparing Multiple Regression To Propensity Score Matching, Farrin Denise Bridgewater Aug 2013

Theses and Dissertations

BACKGROUND: Students taught at private schools by and large attain higher marks on reading achievement tests than do students taught at public schools. This difference is further compounded by race, socioeconomic status, and reading ability at kindergarten entry.

PURPOSE: The goal of this nonexperimental study was to investigate whether students in the two school types differ in reading achievement when they are compared on the same confounding variables (i.e., race, SES, and reading scores at kindergarten entry).

METHODS: Propensity score matching, a method used to estimate causal treatment effects, was used to analyze the original sample of 12,250 …
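
As context for the method, here is a minimal sketch of propensity score matching on simulated data, assuming a logistic-regression propensity model and greedy 1:1 nearest-neighbor matching; the variables and data are illustrative, not the sample analyzed in the thesis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal propensity score matching sketch on simulated data: school
# type (1 = private) plays the role of "treatment"; the three columns
# of X stand in for race, SES, and entry reading score.

rng = np.random.default_rng(2)
n = 2000
X = rng.standard_normal((n, 3))
treated = rng.random(n) < 1 / (1 + np.exp(-(X @ np.array([0.5, 0.8, 0.6]))))
y = X @ np.array([0.3, 0.4, 0.9]) + 0.2 * treated + rng.standard_normal(n)

# Step 1: estimate propensity scores P(treated | X).
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbor matching on the propensity score.
t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
used, pairs = set(), []
for i in t_idx[: len(c_idx)]:
    j = min((k for k in c_idx if k not in used), key=lambda k: abs(ps[i] - ps[k]))
    used.add(j)
    pairs.append((i, j))

# Step 3: average treated-minus-control difference over matched pairs.
att = np.mean([y[i] - y[j] for i, j in pairs])
print(f"matched estimate of the effect ~ {att:.2f} (true value 0.2)")
```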


The Estimation And Evaluation Of Optimal Thresholds For Two Sequential Testing Strategies, Amber R. Wilk Jul 2013

Theses and Dissertations

Many continuous medical tests rely on a threshold for diagnosis. There are two sequential testing strategies of interest: Believe the Positive (BP) and Believe the Negative (BN). BP classifies a patient as positive if the first test is greater than a threshold θ1, or if the first test is negative and the second test is greater than θ2. BN classifies a patient as positive if the first test is greater than a threshold θ3 and the second test is greater than θ4. Threshold pairs θ = (θ1, θ2) or (θ3, θ4), depending on strategy, are defined as optimal if they maximize …
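
The two decision rules are simple to state in code. The sketch below evaluates sensitivity and specificity of BP and BN on simulated test scores; the thresholds and score distributions are illustrative, and the optimality criterion studied in the dissertation is not implemented here.

```python
import numpy as np

# The two decision rules, evaluated on simulated test scores.

def believe_the_positive(t1, t2, theta1, theta2):
    # Positive if test 1 exceeds theta1, OR test 1 is negative
    # (<= theta1) and test 2 exceeds theta2.
    return (t1 > theta1) | ((t1 <= theta1) & (t2 > theta2))

def believe_the_negative(t1, t2, theta3, theta4):
    # Positive only if BOTH tests exceed their thresholds.
    return (t1 > theta3) & (t2 > theta4)

rng = np.random.default_rng(3)
disease = rng.random(10_000) < 0.3
t1 = rng.normal(loc=1.0 * disease, scale=1.0)  # diseased scores shifted up
t2 = rng.normal(loc=1.2 * disease, scale=1.0)

for name, positive in [("BP", believe_the_positive(t1, t2, 0.5, 0.5)),
                       ("BN", believe_the_negative(t1, t2, 0.0, 0.0))]:
    sens = np.mean(positive[disease])
    spec = np.mean(~positive[~disease])
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```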


Choosing The Cut Point For A Restricted Mean In Survival Analysis, A Data Driven Method, Emily H. Sheldon Apr 2013

Theses and Dissertations

Survival analysis generally uses the median survival time as a common summary statistic. While the median possesses the desirable characteristic of being unbiased, there are times when it is not the best statistic to describe the data at hand. Royston and Parmar (2011) argue that the restricted mean survival time should be the summary statistic used when the proportional hazards assumption is in doubt. Work on restricted means dates back to 1949, when J.O. Irwin developed a calculation for the standard error of the restricted mean using Greenwood’s formula. Since then, the development of the restricted mean has …
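
For concreteness, a minimal sketch of the restricted mean itself: the area under the Kaplan-Meier curve up to a cut point t*. The data and the cut point are illustrative; the thesis concerns a data-driven choice of the cut point, which this sketch does not attempt.

```python
import numpy as np

# Restricted mean survival time: the area under the Kaplan-Meier
# curve up to a cut point t_star.

def km_rmst(time, event, t_star):
    order = np.argsort(time)
    time, event = time[order], event[order]
    at_risk = len(time)
    s, last_t, area = 1.0, 0.0, 0.0
    for t, d in zip(time, event):
        if t > t_star:
            break
        area += s * (t - last_t)   # rectangle under the current step
        if d:                      # KM curve drops only at observed events
            s *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        last_t = t
    return area + s * (t_star - last_t)   # final partial rectangle

rng = np.random.default_rng(4)
time = rng.exponential(scale=10.0, size=300)
event = rng.random(300) < 0.7   # ~30% of observations censored
print(f"RMST up to t* = 15: {km_rmst(time, event, 15.0):.2f}")
```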


Detecting And Correcting Batch Effects In High-Throughput Genomic Experiments, Sarah Reese Apr 2013

Theses and Dissertations

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present guided PCA (gPCA), an extension of principal components analysis that quantifies the existence of batch effects. We describe a …
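
Based on this description, here is a simplified sketch of the guided-PCA idea: compare the variance captured along a direction guided by batch membership against the ordinary first principal component, with a permutation p-value from shuffled batch labels. This follows the summary above; details of the published gPCA statistic may differ.

```python
import numpy as np

# Simplified guided-PCA sketch: the test statistic compares variance
# along a direction guided by batch membership with variance along
# the ordinary first principal component; batch labels are permuted
# for a p-value.  Details of the published gPCA statistic may differ.

def first_pc(M):
    # First right singular vector: the direction of maximal variance.
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

def gpca_stat(X, batch):
    Xc = X - X.mean(axis=0)
    Y = np.eye(batch.max() + 1)[batch]   # n x b batch indicator matrix
    v_guided = first_pc(Y.T @ Xc)        # direction driven by batch means
    v_unguided = first_pc(Xc)
    return np.var(Xc @ v_guided) / np.var(Xc @ v_unguided)

rng = np.random.default_rng(5)
n, p = 60, 500
batch = np.repeat([0, 1, 2], n // 3)
X = rng.standard_normal((n, p)) + 0.8 * rng.standard_normal((3, p))[batch]

delta = gpca_stat(X, batch)
perms = np.array([gpca_stat(X, rng.permutation(batch)) for _ in range(200)])
print(f"delta = {delta:.3f}, permutation p = {np.mean(perms >= delta):.3f}")
```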


Characterization Of A Weighted Quantile Score Approach For Highly Correlated Data In Risk Analysis Scenarios, Caroline Carrico Mar 2013

Theses and Dissertations

In risk evaluation, the effect of mixtures of environmental chemicals on a common adverse outcome is of interest. However, due to the high dimensionality and the inherent correlations among chemicals that occur together, traditional methods (e.g., ordinary linear or logistic regression) are unsuitable. We extend and characterize a weighted quantile score (WQS) approach to estimating an index for a set of highly correlated components. In the case of environmental chemicals, we use the WQS to identify “bad actors” and estimate body burden. The accuracy of the WQS was evaluated through extensive simulation studies in terms of validity (ability of the WQS …
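
A minimal sketch of the index construction, under assumptions: components are scored into quartiles (0-3), and non-negative weights summing to one are estimated so the weighted index best predicts a continuous outcome. The simulated data and the simple one-stage fit are illustrative; the estimation scheme in the dissertation is more elaborate.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

# WQS index sketch: components are scored into quartiles (0-3) and
# non-negative weights summing to one are estimated so the weighted
# index best predicts the outcome.  Simulated, illustrative data.

rng = np.random.default_rng(6)
n, c = 500, 6
cov = 0.6 + 0.4 * np.eye(c)                        # highly correlated components
Z = rng.multivariate_normal(np.zeros(c), cov, size=n)
q = np.floor(4 * (rankdata(Z, axis=0) - 1) / n)    # quartile scores 0..3
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])  # three "bad actors"
y = q @ true_w + rng.standard_normal(n)

def loss(w):
    index = q @ w
    beta = np.polyfit(index, y, 1)                 # regress outcome on the index
    return np.mean((y - np.polyval(beta, index)) ** 2)

res = minimize(loss, np.full(c, 1 / c), method="SLSQP",
               bounds=[(0.0, 1.0)] * c,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
print("estimated weights:", np.round(res.x, 2))
```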


Accounting For Model Uncertainty In Linear Mixed-Effects Models, Adam Sima Feb 2013

Theses and Dissertations

Standard statistical decision-making tools, such as inference, confidence intervals, and forecasting, are contingent on the assumption that the statistical model used in the analysis is the true model. In linear mixed-effects models, ignoring model uncertainty results in an underestimation of the residual variance, contributing to hypothesis tests with larger-than-nominal Type I error rates and confidence intervals with smaller-than-nominal coverage probabilities. A novel utilization of the generalized degrees of freedom developed by Zhang et al. (2012) is used to adjust the estimate of the residual variance for model uncertainty. Additionally, the general global linear approximation is extended to …
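
For intuition, a sketch of the underlying generalized-degrees-of-freedom idea (Ye, 1998), which the cited work builds on: perturb the response with small Gaussian noise, rerun the entire modeling procedure (including any data-driven selection), and measure the sensitivity of the fitted values to the perturbations. This toy uses ordinary regression with predictor selection, not the linear mixed-effects setting of the dissertation.

```python
import numpy as np

# Generalized degrees of freedom (Ye, 1998): perturb the response,
# rerun the whole modeling procedure (including data-driven
# selection), and sum the sensitivities of the fitted values.

rng = np.random.default_rng(7)
n, tau, reps = 100, 0.5, 300
X = rng.standard_normal((n, 5))
y = 1.0 * X[:, 0] + rng.standard_normal(n)

def fit_with_selection(resp):
    # Data-driven step: keep the single most correlated predictor.
    best = max(range(5), key=lambda j: abs(np.corrcoef(X[:, j], resp)[0, 1]))
    Xb = np.column_stack([np.ones(n), X[:, best]])
    return Xb @ np.linalg.lstsq(Xb, resp, rcond=None)[0]

deltas = tau * rng.standard_normal((reps, n))
fits = np.array([fit_with_selection(y + d) for d in deltas])

# GDF = sum_i cov(fitted_i, delta_i) / tau^2; selection inflates it
# beyond the nominal two parameters of the chosen model.
gdf = np.sum(np.mean(deltas * (fits - fits.mean(axis=0)), axis=0)) / tau**2
print(f"GDF ~ {gdf:.1f} (nominal parameter count: 2)")
```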


Models And Software Development For Interval-Censored Data, Chun Pan Jan 2013

Theses and Dissertations

Interval-censored time-to-event data occur naturally in studies of diseases where the symptoms are not directly observable and periodic clinical examinations are required for detection. Due to the lack of well-established procedures, interval-censored data have conventionally been treated as right-censored data; however, this introduces bias from the outset. This dissertation focuses on methodological research and software development for interval-censored data. Specifically, it consists of three projects. The first project is to create an R package for regression analysis and survival curve estimation of interval-censored data based on several published papers by our research team. In the second project, a Bayesian …


Estimation And Q-Matrix Validation For Diagnostic Classification Models, Yuling Feng Jan 2013

Theses and Dissertations

Diagnostic classification models (DCMs) are structured latent class models widely discussed in the field of psychometrics. They model subjects' underlying attribute patterns and classify subjects into unobservable groups based on their mastery of the attributes required to answer the items correctly. The effective implementation of DCMs depends on correct specification of a Q-matrix, a binary matrix linking attribute patterns to items. Current literature on assessing the appropriateness of Q-matrix specifications has focused on validation methods for the deterministic-input, noisy-and-gate (DINA) model. The goal of this study is to develop general Q-matrix validation methods that can be applied to a …
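
As background, a minimal sketch of the DINA model that this validation literature centers on: an examinee answers item j correctly with probability 1 - s_j when they master every attribute the Q-matrix requires for that item, and with guessing probability g_j otherwise. The Q-matrix and parameter values below are illustrative.

```python
import numpy as np

# DINA model sketch: eta_j = 1 iff the examinee masters every
# attribute the Q-matrix requires for item j; a correct answer then
# occurs with probability 1 - s_j (no slip), otherwise g_j (guess).

Q = np.array([[1, 0, 0],    # item 1 requires attribute 1
              [0, 1, 1],    # item 2 requires attributes 2 and 3
              [1, 1, 0]])   # item 3 requires attributes 1 and 2
slip = np.array([0.10, 0.20, 0.15])
guess = np.array([0.20, 0.10, 0.25])

def p_correct(alpha):
    eta = np.all(alpha >= Q, axis=1)     # mastery of all required attributes
    return np.where(eta, 1 - slip, guess)

alpha = np.array([1, 1, 0])              # masters attributes 1 and 2 only
print(np.round(p_correct(alpha), 2))     # [0.9, 0.1, 0.85]: item 2 -> guessing
```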


The Complete Plus-Minus: A Case Study Of The Columbus Blue Jackets, Nathan Spagnola Jan 2013

Theses and Dissertations

A new hockey statistic termed the Complete Plus-Minus (CPM) was created to calculate the abilities of hockey players in the National Hockey League (NHL). This new statistic was used to analyze the Columbus Blue Jackets for the 2011-2012 season. The CPM for the Blue Jackets was created using two logistic regressions that modeled a goal being scored for and against the Blue Jackets. The responses were whether a goal was scored for or against the team, while events on the ice served as the predictors in the model. It was found that the team's poor performance was due to a weak …
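
A toy version of the setup described, assuming hypothetical players and simulated goals: a logistic regression with on-ice indicators as predictors and goal for (1) versus goal against (0) as the response, whose coefficients act as a regression-adjusted plus-minus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy version of the described setup: each row is a goal, predictors
# indicate which skaters were on the ice, and the response is goal
# for (1) versus goal against (0).  Players and data are simulated.

rng = np.random.default_rng(8)
n_goals, n_players = 400, 10
on_ice = rng.random((n_goals, n_players)) < 0.5   # skaters on ice per goal
strength = np.linspace(-1.0, 1.0, n_players)      # latent player quality
goal_for = rng.random(n_goals) < 1 / (1 + np.exp(-(on_ice @ strength)))

model = LogisticRegression().fit(on_ice, goal_for)
# Positive coefficients mean goals tend to be scored FOR the team
# while that player is on the ice -- a regression-adjusted plus-minus.
for player, coef in enumerate(model.coef_[0]):
    print(f"player {player}: {coef:+.2f}")
```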


Protein Identification Using Bayesian Stochastic Search, Christina Nicole Lewis Jan 2013

Theses and Dissertations

Current methods for protein identification in tandem mass spectrometry (MS/MS) involve database searches or de novo peptide sequencing, with database searches being the standard. Issues arise with database searches, however, when the species is not in the database. Shortcomings of de novo peptide sequencing and database searches include chemical noise, overly complex fragments, and incomplete b and y ion sequences. Here we present a Bayesian approach to identifying peptides. Our model uses prior information about the average relative abundances of bond cleavages and the prior probability of any particular amino acid sequence. The proposed likelihood function is composed of two …


Heaped Data In Count Models, Tammy Harris Jan 2013

Theses and Dissertations

Heaped data result when subjects who recall the frequency of events prefer to report from a limited set of rounded responses or preferred digits rather than exact counts. These rounded responses and digit preferences (also referred to as data coarsening) can be characterized by reported frequencies (or counts) favoring multiples of 20, counts ending in 0 or 5, or a preference for reporting an even number over an odd number or vice versa. This mixture of values is a type of measurement error (a pattern of misreporting) that can lead to biased estimation and imprecision in discrete quantitative data. Sometimes …
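
A small simulation makes the heaping pattern concrete: true counts are Poisson, but some subjects report the nearest multiple of 5, and the strongest heapers report the nearest multiple of 20. All proportions are illustrative.

```python
import numpy as np

# Heaping simulation: true counts are Poisson, but some subjects
# report the nearest multiple of 5, and the strongest heapers report
# the nearest multiple of 20.

rng = np.random.default_rng(9)
n = 5000
true = rng.poisson(lam=23, size=n)

u = rng.random(n)
reported = true.copy()
reported[u < 0.3] = (5 * np.round(true[u < 0.3] / 5)).astype(int)
reported[u < 0.1] = (20 * np.round(true[u < 0.1] / 20)).astype(int)

# Reported frequencies spike at rounded values relative to the truth.
for v in (15, 20, 25):
    print(f"count {v:2d}: true freq {np.mean(true == v):.3f}, "
          f"reported freq {np.mean(reported == v):.3f}")
```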


Advanced Methodology Developments In Mixture Cure Models, Chao Cai Jan 2013

Theses and Dissertations

Modern medical treatments have substantially improved cure rates for many chronic diseases and have generated increasing interest in appropriate statistical models for handling survival data with non-negligible cure fractions. Mixture cure models are designed for such data sets; they assume that the studied population is a mixture of cured and uncured subjects. In this dissertation, I develop two R programs named smcure and NPHMC. The first program aims to facilitate estimation of two popular mixture cure models: the proportional hazards (PH) mixture cure model and the accelerated failure time (AFT) mixture cure model. The second program focuses on designing …
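
For reference, a sketch of the mixture cure model's population survival function, S(t) = pi + (1 - pi) * S_u(t), where pi is the cured fraction and S_u(t) is the survival function of the uncured. The exponential form for S_u is purely illustrative; smcure fits PH and AFT mixture cure models to data.

```python
import numpy as np

# Population survival under a mixture cure model:
#   S(t) = pi + (1 - pi) * S_u(t),
# where pi is the cured fraction and S_u is the survival function of
# the uncured.  The exponential S_u is purely illustrative.

def mixture_cure_survival(t, pi, rate):
    s_uncured = np.exp(-rate * t)   # exponential survival for the uncured
    return pi + (1 - pi) * s_uncured

t = np.linspace(0.0, 20.0, 5)
for pi in (0.0, 0.3):
    print(f"pi = {pi}:", np.round(mixture_cure_survival(t, pi, 0.25), 3))
# With pi > 0 the curve plateaus at pi instead of dropping to zero.
```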


A New Method For The Comparison Of Survival Distributions, Jaymie Shanahan Jan 2013

Theses and Dissertations

The assessment of overall homogeneity of time-to-event curves is a key element of survival analysis in biomedical research. The commonly used testing methods, e.g., the log-rank test, Wilcoxon test, and Kolmogorov-Smirnov test, may suffer a significant loss of statistical power under certain circumstances. In this thesis we replicate a testing method (Lin & Xu, 2009) that is robust for comparing the overall homogeneity of survival curves, based on the absolute difference of the area under the survival curves using a normal approximation by Greenwood's formula, and we propose a new weight component for their test statistic. The weight component …


Permutation Testing For Covariance Matrices, With Applications In Shape Analysis, Blake Cassidy Hill Jan 2013

Theses and Dissertations

In many applications, it is of interest to compare covariance structures. In this work, we propose hypothesis tests for comparing covariance matrices for data in different groups, especially in shape analysis. The main motivation for the work is comparing covariance matrices of the sizes and shapes of damaged versus undamaged DNA molecules. A practical motivation for analyzing the differences between these DNA covariance matrices is to compare the variation between the two groups in situations where the molecules are undergoing repair. The testing methods proposed in this dissertation consist of three types of permutation testing methods for differences in covariance structures. …
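
A minimal sketch of one such permutation test, under assumptions: the statistic is the Frobenius norm of the difference between the two sample covariance matrices, and group labels are permuted to build the null distribution. The specific statistics and the shape-analysis setting studied in the dissertation are not reproduced here.

```python
import numpy as np

# Permutation test sketch for equality of two covariance matrices:
# statistic = Frobenius norm of the difference between group sample
# covariances; group labels are permuted for the null distribution.

rng = np.random.default_rng(10)
n1, n2, p = 40, 40, 5
A = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n1)
B = rng.multivariate_normal(np.zeros(p), np.diag([3.0, 1, 1, 1, 1]), size=n2)

def stat(X, Y):
    return np.linalg.norm(np.cov(X.T) - np.cov(Y.T), ord="fro")

observed = stat(A, B)
pooled = np.vstack([A, B])
null = []
for _ in range(999):
    idx = rng.permutation(n1 + n2)
    null.append(stat(pooled[idx[:n1]], pooled[idx[n1:]]))
p_value = (1 + np.sum(np.array(null) >= observed)) / (1 + len(null))
print(f"observed = {observed:.2f}, permutation p = {p_value:.3f}")
```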


Modeling Mixed Unfolding/Monotone Dichotomous Item Exams, Na Yang Jan 2013

Theses and Dissertations

Item response theory (IRT) is widely applied to analyze educational and psychological assessments. Readily available IRT implementations allow for two common types of models: monotone models used for dominance scales (Guttman 1950; Rasch 1960/1980; Birnbaum 1968; Mokken 1971) and unfolding models used for proximity scales (Coombs, 1964; Andrich, 1996; Roberts, Donoghue and Laughlin, 2000).

When an exam contains items following both types of models, there is currently no method to distinguish the item types, estimate their characteristics, or estimate the examinee characteristics. Thus, there is no existing methodology to simultaneously analyze items like "At a minimum, I am in favor …
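
To make the two item types concrete, a sketch contrasting a monotone (2PL-style) item response function, which increases with the latent trait, against a simple unfolding (proximity) function, which peaks near the item location. The unfolding form here is a generic squared-distance curve, not a specific published model.

```python
import numpy as np

# Monotone vs. unfolding item response functions.  The monotone item
# is 2PL-style; the unfolding item uses a generic squared-distance
# form so the contrast between the two item types is visible.

def monotone_item(theta, a=1.5, b=0.0):
    # Dominance scale: more of the trait -> higher endorsement probability.
    return 1 / (1 + np.exp(-a * (theta - b)))

def unfolding_item(theta, b=0.0, width=1.0):
    # Proximity scale: endorsement peaks when theta is near location b.
    return np.exp(-((theta - b) / width) ** 2)

theta = np.array([-2.0, 0.0, 2.0])
print("monotone :", np.round(monotone_item(theta), 2))   # increasing in theta
print("unfolding:", np.round(unfolding_item(theta), 2))  # peaked at theta = b
```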