Full-Text Articles in Entire DC Network
Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang
Statistical Science Theses and Dissertations
In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.
The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …
Bayesian Statistical Modeling Of Spatially Resolved Transcriptomics Data, Xi Jiang
Statistical Science Theses and Dissertations
Spatially resolved transcriptomics (SRT) quantifies expression levels at different spatial locations, providing a new and powerful tool for gaining novel biological insights. As experimental technologies improve in both capacity and efficiency, demand is growing for new analytical methodologies.
One question in SRT data analysis is how to identify genes whose expression exhibits spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon a geostatistical model with a Gaussian process, which could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types …
A Comparison Of Confidence Intervals In State Space Models, Jinyu Du
Statistical Science Theses and Dissertations
This thesis develops general procedures for constructing confidence intervals (CIs) of the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in SSMs is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and general time-invariant SSMs. We show that for a simple local level model, both the …
Optimal Experimental Planning Of Reliability Experiments Based On Coherent Systems, Yang Yu
Statistical Science Theses and Dissertations
In industrial engineering and manufacturing, assessing the reliability of a product or system is an important topic. Life-testing and reliability experiments are commonly used reliability assessment methods to gain sound knowledge about product or system lifetime distributions. Usually, a sample of items of interest is subjected to stresses and environmental conditions that characterize the normal operating conditions. During the life-test, successive times to failure are recorded and lifetime data are collected. Life-testing is useful in many industrial environments, including the automobile, materials, telecommunications, and electronics industries.
There are different kinds of life-testing experiments that can be applied for different purposes. …
Contributions To Causal Inference In Observational Studies, Jenny Park, Daniel F. Heitjan, Christy Boling Turer
Statistical Science Theses and Dissertations
The electronic health record (EHR) is a digital version of the patient chart. All clinically relevant patient information can be accessed from the EHR by professionals involved in the patient’s care. For researchers, the EHR is a rich, convenient source for data to address a vast range of medical research questions.
In observational studies with EHR data, it is common to define the treatment/exposure status as a binary indicator reflecting whether the patient was documented to receive a particular medication or procedure. The outcome can be any type of information on patient status documented in the EHR after the treatment has …
Empirical Likelihood Ratio Tests For Homogeneity Of Distributions Of Component Lifetimes From System Lifetime Data With Known System Structures, Jingjing Qu
Statistical Science Theses and Dissertations
In system reliability, practitioners may be interested in testing the homogeneity of the component lifetime distributions based on system lifetimes from multiple data sources for various reasons, such as identifying the component supplier that provides the most reliable components.
In the first part of the dissertation, we develop distribution-free hypothesis testing procedures for the homogeneity of the component lifetime distributions based on system lifetime data when the system structures are known. Several nonparametric test statistics based on the empirical likelihood method are proposed for testing the homogeneity of two or more component lifetime distributions. The computational approaches to obtain the …
Development Of Bayesian Hierarchical Methods Involving Meta-Analysis, Jackson Barth
Statistical Science Theses and Dissertations
When conducting statistical analysis in the Bayesian paradigm, the most critical decision made by the researcher is the identification of a prior distribution for a parameter. Despite the mathematical soundness of the Bayesian approach, a wrongly specified prior can lead to biased and incorrect results. To avoid this, prior distributions should be based on real data, which are easily accessible in the "big data" era. This dissertation explores two applications of Bayesian hierarchical modelling that incorporate information obtained from a meta-analysis.
The first of these applications is in the normalization of genomics data, specifically for nanostring nCounter datasets. A meta-analysis …
Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile
Statistical Science Theses and Dissertations
Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the other two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …
Influence Diagnostics For Generalized Estimating Equations Applied To Correlated Categorical Data, Louis Vazquez
Statistical Science Theses and Dissertations
Influence diagnostics in regression analysis allow analysts to identify observations that have a strong influence on model fitted probabilities and parameter estimates. The most common influence diagnostics, such as Cook’s Distance for linear regression, are based on a deletion approach where the results of a model with and without observations of interest are compared. Here, deletion-based influence diagnostics are proposed for generalized estimating equations (GEE) for correlated, or clustered, nominal multinomial responses. The proposed influence diagnostics focus on GEEs with the baseline-category logit link function and a local odds ratio parameterization of the association structure. Formulas for both observation- and …
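The deletion approach mentioned in the abstract above can be illustrated in its simplest setting. The following is a minimal sketch of Cook's distance for simple linear regression, computed literally by refitting the line with each observation left out. It is not the GEE diagnostic proposed in the dissertation; the function names `fit_line` and `cooks_distances` and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def cooks_distances(x, y):
    """Cook's distance for each observation via explicit deletion and refitting."""
    n, p = len(x), 2  # p = number of regression coefficients (intercept, slope)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    # Residual variance estimate from the full-data fit.
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        # Refit without observation i.
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared change in all n fitted values caused by the deletion.
        shift = sum((fi - (a0 + a1 * xi)) ** 2 for fi, xi in zip(fitted, x))
        distances.append(shift / (p * s2))
    return distances


# Hypothetical data: the last point has a far larger x value than the rest.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.8, 6.0]
d = cooks_distances(x, y)
```

Refitting for every observation is done here only to mirror the with-and-without comparison; for linear regression the same quantity has a closed form in terms of residuals and leverages.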
Bayesian Methods For Random-Effects Meta-Analysis Of Rare Binary Events In Biomedical Research, Ming Zhang
Statistical Science Theses and Dissertations
Rare binary events data arise frequently in medical research. Due to lack of statistical power in individual studies involving such data, meta-analysis has become an increasingly important tool for combining results from multiple independent studies. However, traditional meta-analysis methods often report severely biased estimates in such rare-event settings. Moreover, many rely on models assuming a pre-specified direction for variability between control and treatment groups for mathematical convenience, which may be violated in practice. In Chapter 1, based on a flexible random-effects model that removes the assumption about the direction, we propose new Bayesian procedures for estimating and testing the overall …
Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong
Statistical Science Theses and Dissertations
The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …
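For readers unfamiliar with the RMST, it is the expected value of min(T, τ), which equals the area under the survival curve from 0 to a chosen horizon τ. The sketch below estimates it from ordinary right-censored data by integrating the Kaplan-Meier step function. It does not handle the left-truncated setting that the dissertation addresses, and the function names `kaplan_meier` and `rmst` are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times  : observed times
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last (possibly partial) step up to the horizon tau.
    area += prev_s * (tau - prev_t)
    return area
```

With no censoring, this reduces to the sample mean of min(T, τ), which is a convenient sanity check.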
Dynamic Prediction For Alternating Recurrent Events Using A Semiparametric Joint Frailty Model, Jaehyeon Yun
Statistical Science Theses and Dissertations
Alternating recurrent events data arise commonly in health research; examples include hospital admissions and discharges of diabetes patients; exacerbations and remissions of chronic bronchitis; and quitting and restarting smoking. Recent work has involved formulating and estimating joint models for the recurrent event times considering non-negligible event durations. However, prediction models for transition between recurrent events are lacking. We consider the development and evaluation of methods for predicting future events within these models. Specifically, we propose a tool for dynamically predicting transition between alternating recurrent events in real time. Under a flexible joint frailty model, we derive the predictive probability of …
Compositional Datasets And The Nested Dirichlet Distribution, Bianca Luedeker
Statistical Science Theses and Dissertations
Compositional data is a type of multivariate data where each component of a vector is sandwiched between 0 and 1 and the sum of the components is 1. For example, the proportion of time that each of 7 mice spends in one of four quadrants of a circular water maze is between 0 and 1, and the total proportion of time spent in the maze is 1. If there are two sets of mice, one set of normal mice and one set of cognitively impaired mice, the experiment has a two-sample design. Such data is frequently analyzed incorrectly by comparing …
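As a small illustration of the definition above, the sketch below turns raw times spent in four maze quadrants into a composition by dividing each by the total (often called the closure operation). The function name `closure` and the example times are hypothetical and are not taken from the dissertation.

```python
def closure(values):
    """Rescale non-negative values so that they sum to 1 (a composition)."""
    if any(v < 0 for v in values):
        raise ValueError("components must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("at least one component must be positive")
    return [v / total for v in values]


# Hypothetical seconds that one mouse spent in each of four quadrants.
seconds = [30.0, 15.0, 10.0, 5.0]
proportions = closure(seconds)
```

Because the components are forced to sum to 1, they cannot vary independently, which is why methods that treat each component separately can mislead.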
Differential Methods In Modern Biological Data Analysis, Micah Thornton
Statistical Science Theses and Dissertations
Analysis of biological data for differentiation of organisms/cells within and across species or even the same organism is important to a wide variety of applications. This work considers three different biological data sets at the genome, proteome, and epigenome levels: respectively, DNA sequences, glycosylation data, and DNA methylation. We explore some statistical modeling approaches for handling these modern datasets, and provide a relevant set of experiments for explanation and illustration.
First, genomic Fourier coefficients, which capture information about the harmonics of genetic sequences in terms of nucleotide pattern recurrence, are investigated as summary metrics for medium sized virus genomes from …
Exact Inference For Meta-Analysis Of Rare Events And Its Application In Human Genetics, Yanqiu Shao
Statistical Science Theses and Dissertations
Meta-analysis is a statistical approach that integrates data from multiple studies. By aggregating information, it enhances the power to detect the effects of interest and provides an estimate of the effect size with both accuracy and precision. Both fixed-effect and random-effect models are developed and widely used in biomedical research including clinical trials and genomic studies. In the case of rare events data, conventional meta-analysis methods that rely on large sample approximation may not be able to make reliable inferences. There have been various approaches proposed to deal with this situation, in particular, rare binary adverse events in clinical studies. …
Ultra-High Dimensional Bayesian Variable Selection With Lasso-Type Priors, Can Xu
Statistical Science Theses and Dissertations
With the rapid development of new data collection and acquisition techniques, high-dimensional data have emerged from various fields. Consequently, new variable selection methods, especially for ultra-high dimensional problems, are in demand.
The first part of this dissertation focuses on developing a new Bayesian variable selection method for a differential expression analysis using raw NanoString nCounter data. The medium-throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as remarkable applicability to ubiquitous formalin-fixed paraffin-embedded (FFPE) tissue samples. Based on RCRnorm developed for normalizing NanoString nCounter …
Modified Degradation Process Models And Statistical Methods For Assessing Robustness And Reliability Of Complex Networks, Yuzhou Chen
Statistical Science Theses and Dissertations
In this thesis, we develop a novel stochastic modeling approach based on multiple interdependent topological measures of complex networks. The key engine behind our approach is to evaluate the dynamics of multiple network motifs as descriptors of the underlying network topology. Under a framework of the gamma degradation model, we develop a formal statistical framework for the analysis of reliability and robustness of a single complex network as well as for assessing differences in reliability properties exhibited by two different networks. We validate the proposed methodology with Monte Carlo simulation studies and illustrate the utility of the proposed approach by …
Bayesian Statistical Modeling Of Metagenomics Sequencing Data, Shuang Jiang
Statistical Science Theses and Dissertations
Microbiome count data are high-dimensional and usually suffer from uneven sampling depth, over-dispersion, and zero-inflation. In this thesis, we develop specialized analytical models for analyzing such count data. In Chapter 2, I develop a bi-level Bayesian hierarchical framework for microbiome differential abundance analysis. The bottom level is a multivariate count-generating process that links the observed counts to their latent normalized abundances. The top level is a mixture of Gaussian distributions with a feature selection scheme for differential abundance analysis. A simulation study on both simulated and synthetic data is conducted. A colorectal cancer case study demonstrates that a resulting diagnostic …
Estimation Of Parameters Of Gamma And Generalized Gamma Distributions Based On Censored Experimental Data, Xiangwen Shang
Statistical Science Theses and Dissertations
In time-to-event data analysis, censoring is one of the unique features that restricts our ability to observe the time-to-events and poses difficulties for statistical analysis. Censoring occurs when the exact time-to-event cannot be observed for some or all observations. In this thesis, we study the parameter estimation methods for a two-parameter gamma distribution and a three-parameter generalized gamma distribution based on different kinds of censored data arising from life-testing experiments.
We first study the parameter estimation of a three-parameter generalized gamma distribution based on left-truncated and right-censored data. It is well known that the maximum likelihood estimates of the parameters …
Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang
Statistical Science Theses and Dissertations
This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.
In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …
Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha
Statistical Science Theses and Dissertations
Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility is low …
Integrating Different Data Sources For Estimation Of Total With Unknown Population Size, Zhaoce Liu
Statistical Science Theses and Dissertations
Probability sampling has served as the gold standard in survey practice for many decades. However, as many new data collection methods become available, it is possible to improve the quality and efficiency of traditional survey practices by integrating different sample sources. Web-based surveys from so-called opt-in panels are one type of nonprobability sample that has become popular in recent years. They often come with large sample sizes to yield efficient estimates, but selection bias may compromise the generalizability of results to the broader population.
Our motivating example is a survey conducted by the National Marine Fisheries Service (NMFS), which collects data to …
Statistical Modeling Of High-Throughput Sequencing Data And Spatially Resolved Transcriptomic Data, Shen Yin
Statistical Science Theses and Dissertations
Recent studies have shown that RNA sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from Formalin-Fixed Paraffin-Embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data. In Chapters 1 and 2, we propose a new normalization method, labeled MIXnorm, and its simplified version SMIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and models expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we develop a nested EM algorithm, in which closed-form …
Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu
Statistical Science Theses and Dissertations
In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.
In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy between …
Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen
Statistical Science Theses and Dissertations
Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …
Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda
Statistical Science Theses and Dissertations
For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …
Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen
Statistical Science Theses and Dissertations
In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes and missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second is conducted for the missing at random (MAR) assumption in likelihood inference; the third is conducted for a novel assumption, the "sixth assumption", proposed for the robustness of the instrumental variable estimand in causal inference.
Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And Rss-Structured Cluster Randomized Studies, Chiyu Zhang
Statistical Science Theses and Dissertations
This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.
Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or …
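To make the heterogeneity parameter concrete, the sketch below computes Cochran's Q and the conventional DerSimonian-Laird moment estimate of the between-study variance from per-study effect estimates and their within-study variances. This is a standard textbook estimator shown for orientation only; it is not claimed to be among the methods compared in the dissertation, and large-sample estimators of this kind can behave poorly for rare events. The function name and the return keys are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Cochran's Q, DerSimonian-Laird tau^2, and pooled effect estimates.

    effects   : per-study effect estimates (e.g. log odds ratios)
    variances : per-study within-study variances (all positive)
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least two studies")
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    # Moment estimator of the between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mu_random = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return {"Q": q, "tau2": tau2, "mu_fixed": mu_fixed, "mu_random": mu_random}
```

When the studies agree closely, Q falls below its degrees of freedom, the truncation sets the variance estimate to zero, and the random-effects estimate coincides with the fixed-effect one.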
Sample Size Calculation Of Clinical Trials With Correlated Outcomes, Dateng Li
Statistical Science Theses and Dissertations
In this thesis, we investigate sample size calculation for three kinds of clinical trials: (1) randomized controlled trials (RCTs) with longitudinal count outcomes; (2) cluster randomized trials (CRTs) with count outcomes; (3) CRTs with multiple binary co-primary endpoints.
Clinical Trial Design And Analysis, Shuang Li
Statistical Science Theses and Dissertations
Clinical trials are experiments conducted on humans to compare the effects of certain interventions. In early-stage trials, a small number of patients are enrolled to get preliminary information on safety and efficacy. In late-stage trials, a larger number of patients are randomized to further confirm efficacy and safety.
In Chapter 2, we propose a family of designs for phase I oncology trials. In these trials, oncologists assign patients to a range of dose levels to find the dose that gives the highest acceptable rate of dose-limiting toxicities, which will be the recommended dose for phase II trials. Our proposed …