Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Measurement error (3)
- Cluster randomization trials (2)
- Confidence intervals (2)
- Dynamic treatment regimes (2)
- Misclassification (2)
- Missing data (2)
- Power (2)
- Sample size (2)
- Accelerated failure time model (1)
- Adolescents (1)
- Air pollution (1)
- Ascertainment (1)
- Assurance probability (1)
- Asthma (1)
- B-spline (1)
- BLARS (1)
- Bayesian inference (1)
- Bias assessment (1)
- Bias reduction (1)
- Binary data (1)
- Binary outcome (1)
- Binding Sites (1)
- Bonferroni (1)
- Branch and bound (1)
- Breast and ovarian cancers (1)
- Brier score (1)
- Canadian Community Health Survey (1)
- Change-point (1)
- Classification (1)
- Classification; Data-adaptive kernel SVMs; Imaging data; Multi-class classifier; Predictive Model; Support vector machine. (1)
Articles 1 - 30 of 31
Full-Text Articles in Physical Sciences and Mathematics
Nonparametric Methods For Analysis And Sizing Of Cluster Randomization Trials With Baseline Measurements, Chengchun Yu
Electronic Thesis and Dissertation Repository
Cluster randomization trials are popular in situations where the intervention needs to be implemented at the cluster level, where logistical, financial, or ethical reasons dictate randomization at the cluster level, or where contamination must be minimized. It is very common for cluster trials to take measurements before randomization and again at follow-up, resulting in a clustered pretest-posttest design. For continuous outcomes, the cluster-adjusted analysis of covariance approach can be used to adjust for accidental bias and improve efficiency. However, a direct application of this method is nonsensical if the measures are incompatible with an interval scale, yet …
Multiple Endpoints In Randomized Controlled Trials: A Review And An Illustration Of The Global Test, Lindsay Cameron
Electronic Thesis and Dissertation Repository
A randomized controlled trial is often used to provide high quality evidence regarding treatment interventions. Due to the complex nature of many diseases, trials usually select multiple primary outcomes to capture the efficacy of the interventions. In this thesis, we conducted a literature search to determine the prevalence of the different types of multiple outcomes that have been used in randomized controlled trials. We also reviewed the corresponding statistical methods used to deal with such outcomes. In addition, we described the benefits of using global tests as a statistical method when there are multiple primary outcomes in order to answer …
Regression-Based Methods For Dynamic Treatment Regimes With Mismeasured Covariates Or Misclassified Response, Dan Liu
Electronic Thesis and Dissertation Repository
The statistical study of dynamic treatment regimes (DTRs) focuses on estimating sequential treatment decision rules tailored to patient-level information across multiple stages of intervention. Regression-based methods in DTR have been studied in the literature with a critical assumption that all the observed variables are precisely measured. However, this assumption is often violated in many applications. One example is the STAR*D study, in which the patient's depressive score is subject to measurement error. In this thesis, we explore problems in the context of DTR with measurement error or misclassification considered in the observed data.
The first project deals with covariate measurement …
Flexible Modelling Of Time-Dependent Covariate Effects With Correlated Competing Risks: Application To Hereditary Breast And Ovarian Cancer Families, Seungwoo Lee
Electronic Thesis and Dissertation Repository
This thesis aims to develop a flexible approach for modelling time-dependent covariate effects on event risk using B-splines in the presence of correlated competing risks. The performance of the proposed model was evaluated via simulation in terms of the bias and precision of the estimation of the parameters and penetrance functions. In addition, we extended the concordance index to account for time-dependent effects and competing events simultaneously and demonstrated its inference procedures. We applied our proposed methods to data arising from BRCA1 mutation families in the Breast Cancer Family Registry to evaluate the time-dependent effects of mammographic screening and …
Addressing Bias In Non-Experimental Studies Assessing Treatment Outcomes In Prostate Cancer, David E. Guy
Electronic Thesis and Dissertation Repository
We evaluated the ability of matching techniques to balance baseline characteristics between treatment groups using non-experimental data. We identified a set of balance diagnostics that assessed key differences in baseline covariates with potential for confounding. These diagnostics were used in a novel systematic approach to developing and evaluating models for use in propensity score matching that optimized balance and data retention. We then compared the performance of propensity score and coarsened exact matching strategies in optimizing balance and data retention, using non-experimental data from a pan-Canadian prostate cancer database. Both matching techniques balanced baseline covariates adequately and retained approximately 70% …
Sample Size Formulas For Estimating Areas Under The Receiver Operating Characteristic Curves With Precision And Assurance, Grace Lu
Electronic Thesis and Dissertation Repository
The area under the receiver operating characteristic curve (AUC) is commonly used to quantify the discriminative ability of tests with ordinal or continuous test data. When planning a study to evaluate a new test, it is important to determine a minimum sample size required to achieve a prespecified precision of estimating AUC. However, conventional sample size formulas do not consider the probability of achieving a prespecified precision, resulting in underestimation of sample sizes. To incorporate the assurance probability, asymptotic sample size formulas were derived using different variance estimators for AUC in this thesis. The precision of AUC estimations was quantified …
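The conventional precision-driven calculation that this thesis improves upon can be sketched as follows: using the Hanley-McNeil variance approximation for an estimated AUC, search for the smallest per-group sample size whose 95% confidence-interval half-width falls below a target. This is a hypothetical sketch of the baseline formula only; it deliberately ignores the assurance probability that the thesis incorporates, which is exactly why it tends to underestimate sample sizes.

```python
import math

def hanley_mcneil_var(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the variance of an estimated AUC,
    given n_pos diseased and n_neg non-diseased subjects."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    return (auc * (1 - auc)
            + (n_pos - 1) * (q1 - auc**2)
            + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)

def n_per_group(auc, half_width):
    """Smallest equal group size whose 95% CI half-width for the AUC
    is at most half_width (precision only, no assurance probability)."""
    z = 1.959963984540054  # 97.5th standard-normal quantile
    n = 2
    while z * math.sqrt(hanley_mcneil_var(auc, n, n)) > half_width:
        n += 1
    return n
```

For example, targeting a half-width of 0.05 around an anticipated AUC of 0.80 gives roughly 150 subjects per group; halving the target half-width roughly quadruples the requirement.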
Sample Size Formulas For Estimating Risk Ratios With The Modified Poisson Model For Binary Outcomes, Zhenni Xue
Electronic Thesis and Dissertation Repository
Sample size estimation is usually the first step in planning a research study. Too small a study cannot adequately address the objectives, while too large a study may waste resources or be unethical. For binary outcomes, several sample size estimation methods are available based on logistic regression models, which focus on odds ratios. In prospective studies, risk ratios are preferable for ease of interpretation and communication. In this thesis, we compared the power difference between the logistic regression model and the modified Poisson regression model via simulation studies. We then proposed sample size estimation formulas based on the modified Poisson regression …
Classification-Based Method For Estimating Dynamic Treatment Regimes, Junwei Shen
Electronic Thesis and Dissertation Repository
Dynamic treatment regimes are sequential decision rules dictating how to individualize treatment for patients based on evolving treatment and covariate history. In this thesis, we investigate two methods of estimating dynamic treatment regimes. The first method extends outcome weighted learning from two treatments to multiple treatments and allows for negative treatment outcomes. We show that, under two different sets of assumptions, Fisher consistency is maintained. The second method estimates treatment rules with a neural classification tree. A weighted squared loss function is defined to approximate the indicator function while maintaining smoothness. A method of tree reconstruction and pruning is …
Classification With Measurement Error In Covariates Or Response, With Application To Prostate Cancer Imaging Study, Kexin Luo
Electronic Thesis and Dissertation Repository
The research is motivated by the prostate cancer imaging study conducted at the University of Western Ontario to classify cancer status using multiple in-vivo images. The prostate cancer histological image and the in-vivo images are subject to misalignment in the co-registration procedure, which can be viewed as measurement error in covariates or response. We investigate methods to correct this problem.
The first proposed method corrects the predicted class probability when the data has misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability for the observed class label is adjusted …
Exploring The Estimability Of Mark-Recapture Models With Individual, Time-Varying Covariates Using The Scaled Logit Link Function, Jiaqi Mu
Electronic Thesis and Dissertation Repository
Mark-recapture studies are often used to estimate the survival of individuals in a population and identify factors that affect survival in order to understand how the population might be affected by changing conditions. Factors that vary between individuals and over time, like body mass, present a challenge because they can only be observed when an individual is captured. Several models have been proposed to deal with the missing-covariate problem and commonly impose a logit link function which implies that the survival probability varies between 0 and 1. In this thesis I explore the estimability of four possible models when survival …
Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma
Electronic Thesis and Dissertation Repository
When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a …
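A minimal local linear smoother makes the bias issue concrete: the fit at each point solves a kernel-weighted least-squares problem, and its systematic deviation from the true curve grows with the bandwidth and the curvature of the target function. This is an illustrative sketch on simulated data, not the pointwise bias confidence intervals the dissertation proposes.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear (degree-1 LPR) fit at x0 with a Gaussian kernel
    of bandwidth h; returns the estimated regression function value."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]  # intercept = fitted value at x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 300)

grid = np.linspace(0.05, 0.95, 19)
fit = np.array([local_linear(g, x, y, h=0.05) for g in grid])
```

Increasing `h` smooths away noise (lower variance) but flattens the peaks of the sine curve (higher bias); that trade-off is exactly what the MSE decomposition in the text captures.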
Statistical Tools For Assessment Of Spatial Properties Of Mutations Observed Under The Microarray Platform, Bin Luo
Electronic Thesis and Dissertation Repository
Mutations are alterations of the DNA nucleotide sequence of the genome. Analyses of spatial properties of mutations are critical for understanding certain mutational mechanisms relevant to genetic disease, diversity, and evolution. The studies in this thesis focus on two types of mutations: point mutations, i.e., single nucleotide polymorphism (SNP) genotype differences, and mutations in segments, i.e., copy number variations (CNVs). The microarray platform, such as the Mouse Diversity Genotyping Array (MDGA), detects these mutations genome-wide at a lower cost than whole-genome sequencing, and is thus considered as a candidate screening tool for large populations. Yet it provides observation …
Analysis Challenges For High Dimensional Data, Bangxin Zhao
Electronic Thesis and Dissertation Repository
In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method includes the popular sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as special cases.
Two methods …
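Sure independence screening, the special case noted above, is simple to state: rank predictors by the absolute value of their marginal correlation with the response and keep the top d. The sketch below uses simulated data with two truly active predictors; DTCCS itself is iterative and is not reproduced here.

```python
import numpy as np

def sis(X, y, d):
    """Sure independence screening: return the indices of the d
    predictors with the largest absolute marginal correlation
    with the response."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / len(y)           # |marginal correlations|
    return np.argsort(corr)[::-1][:d]

rng = np.random.default_rng(1)
n, p = 200, 1000                    # far more predictors than samples
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

keep = sis(X, y, d=20)              # screened-down candidate set
```

With strong signals, the two active predictors (columns 0 and 3) survive the screen with high probability; a refined method is then run on the reduced set.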
Computational Modelling Of Human Transcriptional Regulation By An Information Theory-Based Approach, Ruipeng Lu
Electronic Thesis and Dissertation Repository
ChIP-seq experiments can identify the genome-wide binding site motifs of a transcription factor (TF) and determine its sequence specificity. Multiple algorithms have been developed to derive TF binding site (TFBS) motifs from ChIP-seq data, including the entropy-minimization-based Bipad, which can derive both contiguous and bipartite motifs. Prior studies applying these algorithms to ChIP-seq data analyzed only a small number of top peaks with the highest signal strengths, biasing the resulting position weight matrices (PWMs) towards consensus-like, strong binding sites; nor did they derive bipartite motifs, preventing accurate modelling of the binding behaviour of dimeric TFs.
This thesis presents a novel …
Data-Adaptive Kernel Support Vector Machine, Xin Liu
Electronic Thesis and Dissertation Repository
In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Following the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to the data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and used as the classifier. This process enlarges …
On The Estimation Of Penetrance In The Presence Of Competing Risks With Family Data, Daniel Prawira
Electronic Thesis and Dissertation Repository
In family studies, we are interested in estimating the penetrance function of the event of interest in the presence of competing risks. Failure to account for competing risks may lead to bias in the estimation of the penetrance function. In this thesis, three statistical challenges are addressed: clustering, missing data, and competing risks. We propose a cause-specific model with shared frailty and an ascertainment correction to account for clustering and competing risks, along with the ascertainment of families into the study. Multiple imputation is used to account for missing data. A simulation study showed good performance of our proposed model in estimating the …
Confidence Interval Estimation Of Cumulative Incidence For Clustered Competing Risks, Atul Sivaswamy
Electronic Thesis and Dissertation Repository
In a cluster randomized trial studying a primary outcome, patients are sometimes exposed to competing events: risks that alter the probability of the primary outcome occurring. Traditional methods of estimating the cumulative incidence of an outcome and its associated confidence interval under competing risks do not account for the effect of clustering. This may lead to incorrect confidence interval estimation because outcomes among patients from the same center are correlated. This thesis compared, via simulation study, six nonparametric methods of confidence interval construction for cumulative incidence under competing risks, four of which account for the clustering effect. Over the …
Joint Modelling In Liver Transplantation, Elizabeth M. Renouf
Electronic Thesis and Dissertation Repository
In the setting of liver transplantation, clinical trials and transplant registries regularly collect repeated measurements of clinical biomarkers which may be strongly associated with a time-to-event such as graft failure or disease recurrence. Multiple time-to-event outcomes are routinely collected. However, joint models are rarely used. This thesis will describe important considerations for joint modelling in the setting of liver transplantation. We will focus on transplant registry data from the United States. We develop a new tool for joint modelling in the context where a critical health event can be tracked in the longitudinal biomarker and often presents as a non-linear …
A Link Between Paediatric Asthma And Obesity: Are They Caused By The Same Environmental Conditions?, Phylicia Gonsalves
Electronic Thesis and Dissertation Repository
The highly associated paediatric conditions of asthma and overweight have seen dramatic increases over the past few decades. This thesis explored air pollution exposure as a potential underlying mechanism of co-morbid asthma and overweight among adolescents aged 12 to 18 years. Data from the Canadian Community Health Survey were merged with a database containing estimates of air pollution as assessed by particulate matter ≤ 2.5 microns (PM2.5) concentrations at the postal code centroid in southwestern Ontario. Logistic regression was used to conduct the analysis. Adolescents were more likely to be overweight as PM2.5 concentrations increased. There was …
On The Estimation Of Intracluster Correlation For Time-To-Event Outcomes In Cluster Randomized Trials, Sumeet Kalia
Electronic Thesis and Dissertation Repository
Cluster randomized trials (CRTs) involve the random assignment of intact social units rather than independent subjects to intervention groups. Time-to-event outcomes are often endpoints in CRTs, where the intracluster correlation coefficient (ICC) serves as a descriptive parameter for assessing the similarity among outcomes within a cluster. However, estimating the ICC in CRTs with time-to-event outcomes is a challenge due to the presence of censored observations. The ICC is estimated for two CRTs using the censoring indicators and observed outcomes.
A simulation study explores the effect of administrative censoring on estimating the ICC. Results show that the ICC estimators derived from …
Healthy And Unhealthy Statistics: Examining The Impact Of Erroneous Statistical Analyses In Health-Related Research, Britney Allen
Electronic Thesis and Dissertation Repository
Sound statistical analyses are essential to the advancement of medicine. Although certainly not always the case, far too many publications are based on weak or inappropriate statistical methodology, leading to questionable results. Statistical reporting guidelines and standards for research are being introduced which should help curb this problem. Wide recognition of the need for statistical methodologies aligned with research questions and study designs, and the impact when this is not the case, would help prevent this problem. In this thesis, I illustrate the consequences of erroneous statistical analyses on data from an observational study on Multiple Sclerosis and I investigate …
Statistical Methods For The Analysis Of Rna Sequencing Data, Man-Kee Maggie Chu
Electronic Thesis and Dissertation Repository
The next-generation sequencing technology RNA-sequencing (RNA-seq) has gained popularity over traditional microarrays in transcriptome analyses. Statistical methods used for gene expression analyses differ between these two technologies because the array-based technology measures intensities using continuous distributions, whereas RNA-seq provides absolute quantification of gene expression using read counts. Reliable statistical methods are needed to exploit the information from the rapidly evolving sequencing technologies, and limited work has been done on expression analysis of time-course RNA-seq data. In this dissertation, we propose a model-based clustering method for identifying gene expression patterns in time-course RNA-seq data. …
Flexible Partially Linear Single Index Regression Models For Multivariate Survival Data, Na Lei
Electronic Thesis and Dissertation Repository
Survival regression models usually assume that covariate effects have a linear form. In many circumstances, however, the assumption of linearity may be violated. The present work addresses this limitation by adding nonlinear covariate effects to survival models. Nonlinear covariates are handled using a single index structure, which allows high-dimensional nonlinear effects to be reduced to a scalar term. The nonlinear single index approach is applied to modeling of survival data with multivariate responses, in three popular models: the proportional hazards (PH) model, the proportional odds (PO) model, and the generalized transformation model. Another extension of the PH and PO model …
A New Diagnostic Test For Regression, Yun Shi
Electronic Thesis and Dissertation Repository
A new diagnostic test for regression and generalized linear models is discussed. The test is based on testing whether residuals that are close together in the linear space of one of the covariates are correlated. This is a generalization of the famous problem of spurious correlation in time series regression. A full model-building approach for the case of regression was developed in Mahdi (2011, Ph.D. Thesis, Western University, ”Diagnostic Checking, Time Series and Regression”) using an iterative generalized least squares algorithm. Simulation experiments were reported that demonstrate the validity and utility of this approach, but no actual applications were …
Modeling Sequential Event Times Using Family Data, Balakumar Swaminathan
Electronic Thesis and Dissertation Repository
In genetic epidemiology, families harboring certain genetic mutations are predisposed to successive cancers in their lifetime. This thesis aims to provide reliable estimates of relative risk and age-dependent cumulative risks (penetrance) associated with the mutated gene for successive cancers. We develop a statistical framework for modeling sequential event times arising from family data. A shared frailty model is employed to incorporate the dependence between the two event times. Because families are ascertained through non-random sampling, an ascertainment-corrected retrospective likelihood approach is proposed to account for the non-ignorable sampling design. Simulation studies demonstrate that our proposed method provides unbiased and reliable …
Simultaneous Confidence Intervals For Risk Ratios In The Many-To-One Comparisons Of Proportions, Jungwon Shin
Electronic Thesis and Dissertation Repository
For many-to-one comparisons of independent binomial proportions using their ratios, we propose the MOVER approach generalizing Fieller's theorem to a ratio of proportions by obtaining variance estimates in the neighbourhood of confidence limits for each proportion. We review two existing methods of inverting Wald and score test statistics and compare their performance with the proposed MOVER approach with score limits and Jeffreys limits for single proportions. As an appropriate multiplicity adjustment incorporating correlations between risk ratios, a Dunnett critical value is computed assuming a common, constant correlation of 0.5 instead of plugging in sample correlation coefficients. The simulation results suggest …
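The MOVER construction for a single risk ratio can be sketched as follows: compute Wilson score limits for each proportion, then combine them with the Fieller-type MOVER-R formula, which recovers variance estimates near the confidence limits rather than at the point estimates. This is a hypothetical sketch for one comparison only; the thesis's many-to-one setting additionally applies a Dunnett-type multiplicity adjustment not shown here.

```python
import math

Z = 1.959963984540054  # 97.5th standard-normal quantile (95% CI)

def wilson(x, n):
    """Wilson score interval for a binomial proportion x/n."""
    p = x / n
    centre = (x + Z**2 / 2) / (n + Z**2)
    hw = Z * math.sqrt(p * (1 - p) / n + Z**2 / (4 * n**2)) / (1 + Z**2 / n)
    return centre - hw, centre + hw

def mover_rr(x1, n1, x2, n2):
    """MOVER-R confidence interval for the risk ratio (x1/n1)/(x2/n2),
    built from single-proportion Wilson limits."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1)
    l2, u2 = wilson(x2, n2)
    lower = (p1 * p2 - math.sqrt((p1 * p2) ** 2
             - l1 * u2 * (2 * p1 - l1) * (2 * p2 - u2))) / (u2 * (2 * p2 - u2))
    upper = (p1 * p2 + math.sqrt((p1 * p2) ** 2
             - u1 * l2 * (2 * p1 - u1) * (2 * p2 - l2))) / (l2 * (2 * p2 - l2))
    return lower, upper
```

Because the variance is recovered near the limits, the resulting interval respects the skewness of the sampling distribution of a ratio, unlike a plain Wald interval on the log scale with small counts.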
Heterogeneity Issues In The Meta-Analysis Of Cluster Randomization Trials., Shun Fu Chen
Electronic Thesis and Dissertation Repository
An increasing number of systematic reviews summarize results from cluster randomization trials. Applying existing meta-analysis methods to such trials is problematic because responses of subjects within clusters are likely correlated. The aim of this thesis is to evaluate heterogeneity in the context of fixed effects models, providing guidance for conducting a meta-analysis of such trials. The approaches include the adjusted Q statistic, adjusted heterogeneity variance estimators and their corresponding confidence intervals, and adjusted measures of heterogeneity and their corresponding confidence intervals. Attention is limited to meta-analyses of completely randomized trials having a binary outcome. An analytic expression for power of …
Confidence Intervals For Comparison Of The Squared Multiple Correlation Coefficients Of Non-Nested Models, Li Tan Jr.
Electronic Thesis and Dissertation Repository
Multiple linear regression analysis is used widely to evaluate how an outcome or response variable is related to a set of predictors. Once a final model is specified, the interpretation of predictors can be achieved by assessing the relative importance of predictors.
A common approach to predictor importance is to compare the increase in squared multiple correlation for a given model when one predictor is added to the increase when another predictor is added to the same model.
This thesis proposes asymmetric confidence intervals for a difference between two correlated squared multiple correlation coefficients of non-nested models. These new procedures are developed by …
Confidence Interval Estimation For Continuous Outcomes In Cluster Randomization Trials, Julia Taleban
Electronic Thesis and Dissertation Repository
Cluster randomization trials are experiments where intact social units (e.g. hospitals, schools, communities, and families) are randomized to the arms of the trial rather than individuals. The popularity of this design among health researchers is partially due to reduced contamination of treatment effects and convenience. However, the advantages of cluster randomization trials come with a price. Due to the dependence of individuals within a cluster, cluster randomization trials suffer reduced statistical efficiency and often require a complex analysis of study outcomes.
The primary purpose of this thesis is to propose new confidence intervals for effect measures commonly of interest for …
Cost-Efficient Variable Selection Using Branching Lars, Li Hua Yue
Electronic Thesis and Dissertation Repository
Variable selection is a difficult problem in statistical model building. Identification of cost-efficient diagnostic factors is very important to health researchers, but most variable selection methods do not take into account the cost of collecting data for the predictors. The trade-off between statistical significance and the cost of collecting data for the statistical model is our focus. A Branching LARS (BLARS) procedure has been developed that can select and estimate the important predictors to build a model that is not only good at prediction but also cost-efficient. The BLARS method is an extension of the LARS variable selection method to incorporate …