Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 31

Full-Text Articles in Physical Sciences and Mathematics

Nonparametric Methods For Analysis And Sizing Of Cluster Randomization Trials With Baseline Measurements, Chengchun Yu Sep 2023

Electronic Thesis and Dissertation Repository

Cluster randomization trials are popular in situations where the intervention needs to be implemented at the cluster level, where logistical, financial, and/or ethical reasons dictate randomization at the cluster level, or where contamination must be minimized. It is very common for cluster trials to take measurements before randomization and again at follow-up, resulting in a clustered pretest-posttest design. For continuous outcomes, the cluster-adjusted analysis of covariance approach can be used to adjust for accidental bias and improve efficiency. However, a direct application of this method is nonsensical if the measures are incompatible with an interval scale, yet …


Multiple Endpoints In Randomized Controlled Trials: A Review And An Illustration Of The Global Test, Lindsay Cameron Apr 2023

Electronic Thesis and Dissertation Repository

A randomized controlled trial is often used to provide high quality evidence regarding treatment interventions. Due to the complex nature of many diseases, trials usually select multiple primary outcomes to capture the efficacy of the interventions. In this thesis, we conducted a literature search to determine the prevalence of the different types of multiple outcomes that have been used in randomized controlled trials. We also reviewed the corresponding statistical methods used to deal with such outcomes. In addition, we described the benefits of using global tests as a statistical method when there are multiple primary outcomes in order to answer …


Regression-Based Methods For Dynamic Treatment Regimes With Mismeasured Covariates Or Misclassified Response, Dan Liu Sep 2022

Electronic Thesis and Dissertation Repository

The statistical study of dynamic treatment regimes (DTRs) focuses on estimating sequential treatment decision rules tailored to patient-level information across multiple stages of intervention. Regression-based methods in DTR have been studied in the literature with a critical assumption that all the observed variables are precisely measured. However, this assumption is often violated in many applications. One example is the STAR*D study, in which the patient's depressive score is subject to measurement error. In this thesis, we explore problems in the context of DTR with measurement error or misclassification considered in the observed data.

The first project deals with covariate measurement …


Flexible Modelling Of Time-Dependent Covariate Effects With Correlated Competing Risks: Application To Hereditary Breast And Ovarian Cancer Families, Seungwoo Lee Apr 2022

Electronic Thesis and Dissertation Repository

This thesis aims to develop a flexible approach for modelling time-dependent covariate effects on event risk using B-splines in the presence of correlated competing risks. The performance of the proposed model was evaluated via simulation in terms of the bias and precision of the estimation of the parameters and penetrance functions. In addition, we extended the concordance index to account for time-dependent effects and competing events simultaneously and demonstrated its inference procedures. We applied our proposed methods to data arising from the BRCA1 mutation families from the breast cancer family registry to evaluate the time-dependent effects of mammographic screening and …


Addressing Bias In Non-Experimental Studies Assessing Treatment Outcomes In Prostate Cancer, David E. Guy Jun 2021

Electronic Thesis and Dissertation Repository

We evaluated the ability of matching techniques to balance baseline characteristics between treatment groups using non-experimental data. We identified a set of balance diagnostics that assessed key differences in baseline covariates with potential for confounding. These diagnostics were used in a novel systematic approach to developing and evaluating models for use in propensity score matching that optimized balance and data retention. We then compared the performance of propensity score and coarsened exact matching strategies in optimizing balance and data retention, using non-experimental data from a pan-Canadian prostate cancer database. Both matching techniques balanced baseline covariates adequately and retained approximately 70% …


Sample Size Formulas For Estimating Areas Under The Receiver Operating Characteristic Curves With Precision And Assurance, Grace Lu Jun 2021

Electronic Thesis and Dissertation Repository

The area under the receiver operating characteristic curve (AUC) is commonly used to quantify the discriminative ability of tests with ordinal or continuous test data. When planning a study to evaluate a new test, it is important to determine a minimum sample size required to achieve a prespecified precision of estimating AUC. However, conventional sample size formulas do not consider the probability of achieving a prespecified precision, resulting in underestimation of sample sizes. To incorporate the assurance probability, asymptotic sample size formulas were derived using different variance estimators for AUC in this thesis. The precision of AUC estimations was quantified …


Sample Size Formulas For Estimating Risk Ratios With The Modified Poisson Model For Binary Outcomes, Zhenni Xue Feb 2021

Electronic Thesis and Dissertation Repository

Sample size estimation is usually the first step in planning a research study. Too small a study cannot adequately address the objectives, while too large a study may waste resources or be unethical. For binary outcomes, several sample size estimation methods are available based on logistic regression models, which focus on odds ratios. In prospective studies, risk ratios are preferable for ease of interpretation and communication. In this thesis, we compared the power difference between the logistic regression model and the modified Poisson regression model via simulation studies. We then proposed sample size estimation formulas based on the modified Poisson regression …


Classification-Based Method For Estimating Dynamic Treatment Regimes, Junwei Shen Aug 2020

Electronic Thesis and Dissertation Repository

Dynamic treatment regimes are sequential decision rules dictating how to individualize treatments to patients based on evolving treatment and covariate history. In this thesis, we investigate two methods of estimating dynamic treatment regimes. The first method extends outcome weighted learning from two treatments to multiple treatments and allows for negative treatment outcomes. We show that, under two different sets of assumptions, Fisher consistency can be maintained. The second method estimates treatment rules by a neural classification tree. A weighted squared loss function is defined to approximate the indicator function to maintain the smoothness. A method of tree reconstruction and pruning is …


Classification With Measurement Error In Covariates Or Response, With Application To Prostate Cancer Imaging Study, Kexin Luo Aug 2019

Electronic Thesis and Dissertation Repository

The research is motivated by the prostate cancer imaging study conducted at the University of Western Ontario to classify cancer status using multiple in-vivo images. The prostate cancer histological image and the in-vivo images are subject to misalignment in the co-registration procedure, which can be viewed as measurement error in covariates or response. We investigate methods to correct this problem.

The first proposed method corrects the predicted class probability when the data has misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability for the observed class label is adjusted …


Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu Aug 2019

Electronic Thesis and Dissertation Repository

Mark-recapture studies are often used to estimate the survival of individuals in a population and identify factors that affect survival in order to understand how the population might be affected by changing conditions. Factors that vary between individuals and over time, like body mass, present a challenge because they can only be observed when an individual is captured. Several models have been proposed to deal with the missing-covariate problem and commonly impose a logit link function which implies that the survival probability varies between 0 and 1. In this thesis I explore the estimability of four possible models when survival …


Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma Nov 2018

Electronic Thesis and Dissertation Repository

When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a …


Statistical Tools For Assessment Of Spatial Properties Of Mutations Observed Under The Microarray Platform, Bin Luo Sep 2018

Electronic Thesis and Dissertation Repository

Mutations are alterations of the DNA nucleotide sequence of the genome. Analyses of spatial properties of mutations are critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. The studies in this thesis focus on two types of mutations: point mutations, i.e., single nucleotide polymorphism (SNP) genotype differences, and mutations in segments, i.e., copy number variations (CNVs). The microarray platform, such as the Mouse Diversity Genotyping Array (MDGA), detects these mutations genome-wide with lower cost compared to whole genome sequencing, and thus is considered for suitability as a screening tool for large populations. Yet it provides observation …


Analysis Challenges For High Dimensional Data, Bangxin Zhao Apr 2018

Electronic Thesis and Dissertation Repository

In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures, and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method includes the widely used sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases.

Two methods …


Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu Apr 2018

Electronic Thesis and Dissertation Repository

ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms were developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy minimization-based Bipad that can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data only analyzed a small number of top peaks with the highest signal strengths, biasing their resultant position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, disabling the accurate modelling of binding behavior of dimeric TFs.

This thesis presents a novel …


Data-Adaptive Kernel Support Vector Machine, Xin Liu Nov 2017

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Following the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges …


On The Estimation Of Penetrance In The Presence Of Competing Risks With Family Data, Daniel Prawira Oct 2017

Electronic Thesis and Dissertation Repository

In family studies, we are interested in estimating the penetrance function of the event of interest in the presence of competing risks. Failure to account for competing risks may lead to bias in the estimation of the penetrance function. In this thesis, three statistical challenges are addressed: clustering, missing data, and competing risks. We proposed the cause-specific model with shared frailty and ascertainment correction to account for clustering and competing risks along with ascertainment of families into the study. Multiple imputation is used to account for missing data. The simulation study showed good performance of our proposed model in estimating the …


Confidence Interval Estimation Of Cumulative Incidence For Clustered Competing Risks, Atul Sivaswamy Feb 2017

Electronic Thesis and Dissertation Repository

In a cluster randomized trial studying a primary outcome, patients are sometimes exposed to competing events. These are risks that alter the probability of the primary outcome occurring. Traditional methods of estimating the cumulative incidence for an outcome and its associated confidence interval under competing risks do not account for the effect of clustering. This may cause incorrect estimation of confidence intervals because outcomes among patients from the same center are correlated. This thesis compared six nonparametric methods of confidence interval construction for cumulative incidence, four of which account for the clustering effect, under competing risks via a simulation study. Over the …


Joint Modelling In Liver Transplantation, Elizabeth M. Renouf Jun 2016

Electronic Thesis and Dissertation Repository

In the setting of liver transplantation, clinical trials and transplant registries regularly collect repeated measurements of clinical biomarkers which may be strongly associated with a time-to-event such as graft failure or disease recurrence. Multiple time-to-event outcomes are routinely collected. However, joint models are rarely used. This thesis will describe important considerations for joint modelling in the setting of liver transplantation. We will focus on transplant registry data from the United States. We develop a new tool for joint modelling in the context where a critical health event can be tracked in the longitudinal biomarker and often presents as a non-linear …


A Link Between Paediatric Asthma And Obesity: Are They Caused By The Same Environmental Conditions?, Phylicia Gonsalves May 2016

Electronic Thesis and Dissertation Repository

The highly associated paediatric conditions of asthma and overweight have seen dramatic increases over the past few decades. This thesis explored air pollution exposure as a potential underlying mechanism of co-morbid asthma and overweight among adolescents aged 12 to 18 years. Data from the Canadian Community Health Survey were merged with a database containing estimates of air pollution as assessed by particulate matter ≤ 2.5 microns (PM2.5) concentrations at the postal code centroid in southwestern Ontario. Logistic regression was used to conduct the analysis. Adolescents were more likely to be overweight as PM2.5 concentrations increased. There was …


On The Estimation Of Intracluster Correlation For Time-To-Event Outcomes In Cluster Randomized Trials, Sumeet Kalia Aug 2015

Electronic Thesis and Dissertation Repository

Cluster randomized trials (CRTs) involve the random assignment of intact social units rather than independent subjects to intervention groups. Time-to-event outcomes often are endpoints in CRTs where the intracluster correlation coefficient (ICC) serves as a descriptive parameter to assess the similarity among outcomes in a cluster. However, estimating the ICC in CRTs with time-to-event outcomes is a challenge due to the presence of censored observations. The ICC is estimated for two CRTs using the censoring indicators and observed outcomes.

A simulation study explores the effect of administrative censoring on estimating the ICC. Results show that the ICC estimators derived from …


Healthy And Unhealthy Statistics: Examining The Impact Of Erroneous Statistical Analyses In Health-Related Research, Britney Allen Aug 2015

Electronic Thesis and Dissertation Repository

Sound statistical analyses are essential to the advancement of medicine. Although certainly not always the case, far too many publications are based on weak or inappropriate statistical methodology, leading to questionable results. Statistical reporting guidelines and standards for research are being introduced which should help curb this problem. Wide recognition of the need for statistical methodologies aligned with research questions and study designs, and the impact when this is not the case, would help prevent this problem. In this thesis, I illustrate the consequences of erroneous statistical analyses on data from an observational study on Multiple Sclerosis and I investigate …


Statistical Methods For The Analysis Of RNA Sequencing Data, Man-Kee Maggie Chu Mar 2014

Electronic Thesis and Dissertation Repository

The next-generation sequencing technology, RNA-sequencing (RNA-seq), has gained popularity over traditional microarrays in transcriptome analyses. Statistical methods used for gene expression analyses with these two technologies are different because the array-based technology measures intensities using continuous distributions, whereas RNA-seq provides absolute quantification of gene expression using counts of reads. There is a need for reliable statistical methods to exploit the information from the rapidly evolving sequencing technologies, and limited work has been done on expression analysis of time-course RNA-seq data. In this dissertation, we propose a model-based clustering method for identifying gene expression patterns in time-course RNA-seq data. …


Flexible Partially Linear Single Index Regression Models For Multivariate Survival Data, Na Lei Dec 2013

Electronic Thesis and Dissertation Repository

Survival regression models usually assume that covariate effects have a linear form. In many circumstances, however, the assumption of linearity may be violated. The present work addresses this limitation by adding nonlinear covariate effects to survival models. Nonlinear covariates are handled using a single index structure, which allows high-dimensional nonlinear effects to be reduced to a scalar term. The nonlinear single index approach is applied to modeling of survival data with multivariate responses, in three popular models: the proportional hazards (PH) model, the proportional odds (PO) model, and the generalized transformation model. Another extension of the PH and PO model …


A New Diagnostic Test For Regression, Yun Shi Apr 2013

Electronic Thesis and Dissertation Repository

A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing whether residuals that are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, “Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach but no actual applications were …


Modeling Sequential Event Times Using Family Data, Balakumar Swaminathan Aug 2012

Electronic Thesis and Dissertation Repository

In genetic epidemiology, families harboring certain genetic mutations are predisposed to successive cancers in their lifetime. This thesis aims to provide reliable estimates of relative risk and age-dependent cumulative risks (penetrance) associated with the mutated gene for successive cancers. We develop a statistical framework for modeling sequential event times arising from family data. A shared frailty model is employed to incorporate the dependence between the two event times. Because families are ascertained through non-random sampling, an ascertainment-corrected retrospective likelihood approach is proposed to account for the non-ignorable sampling design. Simulation studies demonstrate that our proposed method provides unbiased and reliable …


Simultaneous Confidence Intervals For Risk Ratios In The Many-To-One Comparisons Of Proportions, Jungwon Shin Jul 2012

Electronic Thesis and Dissertation Repository

For many-to-one comparisons of independent binomial proportions using their ratios, we propose the MOVER approach generalizing Fieller's theorem to a ratio of proportions by obtaining variance estimates in the neighbourhood of confidence limits for each proportion. We review two existing methods of inverting Wald and score test statistics and compare their performance with the proposed MOVER approach with score limits and Jeffreys limits for single proportions. As an appropriate multiplicity adjustment incorporating correlations between risk ratios, a Dunnett critical value is computed assuming a common, constant correlation of 0.5 instead of plugging in sample correlation coefficients. The simulation results suggest …


Heterogeneity Issues In The Meta-Analysis Of Cluster Randomization Trials., Shun Fu Chen May 2012

Electronic Thesis and Dissertation Repository

An increasing number of systematic reviews summarize results from cluster randomization trials. Applying existing meta-analysis methods to such trials is problematic because responses of subjects within clusters are likely correlated. The aim of this thesis is to evaluate heterogeneity in the context of fixed effects models providing guidance for conducting a meta-analysis of such trials. The approaches include the adjusted Q statistic, adjusted heterogeneity variance estimators and their corresponding confidence intervals and adjusted measures of heterogeneity and their corresponding confidence intervals. Attention is limited to meta-analyses of completely randomized trials having a binary outcome. An analytic expression for power of …


Confidence Intervals For Comparison Of The Squared Multiple Correlation Coefficients Of Non-Nested Models, Li Tan Jr. Feb 2012

Electronic Thesis and Dissertation Repository

Multiple linear regression analysis is used widely to evaluate how an outcome or response variable is related to a set of predictors. Once a final model is specified, the interpretation of predictors can be achieved by assessing the relative importance of predictors.

A common approach to predictor importance is to compare the increase in squared multiple correlation for a given model when one predictor is added to the increase when another predictor is added to the same model.

This thesis proposes asymmetric confidence intervals for a difference between two correlated squared multiple correlation coefficients of non-nested models. These new procedures are developed by …


Confidence Interval Estimation For Continuous Outcomes In Cluster Randomization Trials, Julia Taleban Apr 2011

Electronic Thesis and Dissertation Repository

Cluster randomization trials are experiments where intact social units (e.g. hospitals, schools, communities, and families) are randomized to the arms of the trial rather than individuals. The popularity of this design among health researchers is partially due to reduced contamination of treatment effects and convenience. However, the advantages of cluster randomization trials come with a price. Due to the dependence of individuals within a cluster, cluster randomization trials suffer reduced statistical efficiency and often require a complex analysis of study outcomes.

The primary purpose of this thesis is to propose new confidence intervals for effect measures commonly of interest for …


Cost-Efficient Variable Selection Using Branching LARS, Li Hua Yue Nov 2010

Electronic Thesis and Dissertation Repository

Variable selection is a difficult problem in statistical model building. Identification of cost-efficient diagnostic factors is very important to health researchers, but most variable selection methods do not take into account the cost of collecting data for the predictors. The trade-off between statistical significance and the cost of collecting data for the statistical model is our focus. A Branching LARS (BLARS) procedure has been developed that can select and estimate the important predictors to build a model that is not only good at prediction but also cost-efficient. The BLARS method is an extension of the LARS variable selection method to incorporate …