Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 30 of 49

Full-Text Articles in Physical Sciences and Mathematics

Statistical Methods For Meta-Analysis In Large-Scale Genomic Experiments, Wimarsha Thathsarani Jayanetti Dec 2022


Mathematics & Statistics Theses & Dissertations

Recent developments in high-throughput genomic assays have opened up the possibility of testing hundreds and thousands of genes simultaneously. With the availability of a vast number of public databases, researchers tend to combine genomic analysis results from multiple studies in the form of a meta-analysis. Meta-analysis methods can be broadly classified into two main categories. The first approach is to combine the statistical significance (p-values) of the genes from each individual study, and the second approach is to combine the statistical estimates (effect sizes) from the individual studies. In this dissertation, we will discuss how adherence to the standard null …
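To make the first category above concrete, here is a minimal sketch of combining per-study p-values for a single gene. It uses Fisher's method, a standard textbook technique chosen purely for illustration; the abstract does not say which combination rule the dissertation studies, and the function name `fisher_combined_pvalue` is hypothetical. The sketch assumes the studies are independent.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    Illustrative only: the dissertation's own method is not specified
    in the abstract. Returns (test statistic, combined p-value).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # Under the null, -2 * sum(log p_i) is chi-square with 2k degrees of freedom.
    statistic = -2.0 * sum(math.log(p) for p in pvalues)

    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values for one gene from three studies.
    stat, combined = fisher_combined_pvalue([0.04, 0.10, 0.03])
    print(f"statistic={stat:.3f}, combined p-value={combined:.4f}")
```

With a single study the combined p-value equals the input p-value, which is a quick sanity check on the closed-form tail probability.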


A Copula Model Approach To Identify The Differential Gene Expression, Prasansha Liyanaarachchi Dec 2021


Mathematics & Statistics Theses & Dissertations

Deoxyribonucleic acid, more commonly known as DNA, is a complex double helix-shaped molecule present in all living organisms and hosts thousands of genes. However, only a few genes exhibit differential expression and play a vital role in a particular disease such as breast cancer. Microarray technology is one of the modern technologies developed to study these gene expressions. There are two major microarray technologies available for expression analysis: Spotted cDNA array and oligonucleotide array. The focus of our research is the statistical analysis of data that arises from the spotted cDNA microarray. Numerous models have been proposed in the literature …


Inference And Estimation In Change Point Models For Censored Data, Kristine Gierz Dec 2020


Mathematics & Statistics Theses & Dissertations

In general, the change point problem considers inference of a change in distribution for a set of time-ordered observations. This has applications in a large variety of fields and can also apply to survival data. With improvements to medical diagnoses and treatments, incidences and mortality rates have changed. However, the most commonly used analysis methods do not account for such distributional changes. In survival analysis, change point problems can concern a shift in a distribution for a set of time-ordered observations, potentially under censoring or truncation.

In this dissertation, we first propose a sequential testing approach for detecting multiple change …


D-Vine Pair-Copula Models For Longitudinal Binary Data, Huihui Lin Aug 2020


Mathematics & Statistics Theses & Dissertations

Dependent longitudinal binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. A popular method for analyzing such data is the multivariate probit (MP) model. The motivation for this dissertation stems from the fact that the MP model can fail even when the binary correlations are within the feasible range. The reason is that the underlying correlation matrix of the latent variables in the MP model may not be positive definite. In this dissertation, we study alternatives that are based on D-vine pair-copula models. We consider both the serial dependence modeled by the first order autoregressive (AR(1)) and …


Copula-Based Zero-Inflated Count Time Series Models, Mohammed Sulaiman Alqawba Jul 2019


Mathematics & Statistics Theses & Dissertations

Count time series data are observed in several applied disciplines such as in environmental science, biostatistics, economics, public health, and finance. In some cases, a specific count, say zero, may occur more often than usual. Additionally, serial dependence might be found among these counts if they are recorded over time. Overlooking the frequent occurrence of zeros and the serial dependence could lead to false inference. In this dissertation, we propose two classes of copula-based time series models for zero-inflated counts with the presence of covariates. Zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and zero-inflated Conway-Maxwell-Poisson (ZICMP) distributed marginals of the …


Spatio-Temporal Cluster Detection And Local Moran Statistics Of Point Processes, Jennifer L. Matthews Apr 2019


Mathematics & Statistics Theses & Dissertations

Moran's index is a statistic that measures spatial dependence, quantifying the degree of dispersion or clustering of point processes and events in some location/area. Recognizing that a single Moran's index may not give a sufficient summary of the spatial autocorrelation measure, a local indicator of spatial association (LISA) has gained popularity. Accordingly, we propose extending LISAs to time after partitioning the area and computing a Moran-type statistic for each subarea. Patterns between the local neighbors are unveiled that would not otherwise be apparent. We consider the measures of Moran statistics while incorporating a time factor under simulated multilevel Palm distribution, …
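As a rough illustration of the global Moran's index mentioned above (not the local or spatio-temporal statistics developed in the dissertation), the sketch below computes I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The function name `morans_i` and the binary neighbor weights in the example are assumptions made for the illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for observations `values` and a square weight matrix.

    weights[i][j] is the spatial weight between areas i and j
    (the diagonal is expected to be zero).
    """
    n = len(values)
    if n == 0 or any(len(row) != n for row in weights) or len(weights) != n:
        raise ValueError("weights must be an n x n matrix matching values")

    mean = sum(values) / n
    dev = [v - mean for v in values]

    total_weight = sum(sum(row) for row in weights)
    spread = sum(d * d for d in dev)
    if total_weight == 0 or spread == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    # Weighted cross-products of deviations between every pair of areas.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * (cross / spread)


if __name__ == "__main__":
    # Four areas on a line; each area neighbors the one next to it.
    line_weights = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(morans_i([1, 2, 3, 4], line_weights))    # smooth trend: positive
    print(morans_i([1, -1, 1, -1], line_weights))  # alternating: negative
```

On this toy layout the smooth trend gives a value of about 0.33 (clustering of similar values), while the alternating pattern gives -1 (dispersion).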


Extended Poisson Models For Count Data With Inflated Frequencies, Monika Arora Jul 2018


Mathematics & Statistics Theses & Dissertations

Count data often exhibit inflated counts for zero. There are numerous papers in the literature that show how to fit Poisson regression models that account for the zero inflation. However, in many situations the frequencies of zero and of some other value k tend to be higher than the Poisson model can fit appropriately. Recently, Sheth-Chandra (2011) and Lin and Tsai (2012) introduced a mixture model to account for the inflated frequencies of zero and k. In this dissertation, we study basic properties of this mixture model and parameter estimation for grouped and ungrouped data. Using stochastic representation we show …


Approximation Of Quantiles Of Rank Test Statistics Using Almost Sure Limit Theorems, Mark Ledbetter Jan 2018


Mathematics & Statistics Theses & Dissertations

There are many problems in statistics where the analysis is based on asymptotic distributions. In some cases, the asymptotic distribution is in an open form or is intractable. One possible solution is the logarithmic quantile estimation (LQE) method introduced by Thangavelu (2005) for rank tests and Fridline (2010) for the correlation coefficient. LQE is derived from an almost sure version of the central limit theorem using the results of Berkes and Csaki (2001), and it estimates the quantiles of a test statistic using only the data. To date, LQE has been used in only a few applications. We extend the …


Methods For Analyzing Attribute-Level Best-Worst Discrete Choice Experiments, Amanda Faye Working Oct 2017


Mathematics & Statistics Theses & Dissertations

Discrete choice experiments (DCEs) have applications in many areas such as social sciences, economics, transportation research, health systems, and clinical decisions, to mention a few. Usually discrete choice models (DCMs) focus on predicting the product choice; however, these models do not provide information about which attributes of the products impact consumers’ choices the most. Today, it is common to record the best and worst features of a product (or profile), also called attribute levels, and the goal is to investigate and build models for estimation of attribute and attribute-level impacts on consumer behavior. Attribute-level best-worst DCEs provide insight into …


Analysis Of Dependent Discrete Choices Using Gaussian Copula, Arjun Poddar Jul 2016

Mathematics & Statistics Theses & Dissertations

A popular tool for analyzing product choices of consumers is the well-known conditional logit discrete choice model. Originally publicized by McFadden (1974), this model assumes that the random components of the underlying latent utility functions of the consumers follow independent Gumbel distributions. However, in practice the independence assumption may be violated and a more reasonable model should account for the dependence of the utilities. In this dissertation we use the Gaussian copula with compound symmetric and autoregressive of order one correlation matrices to construct a general multivariate model for the joint distribution of the utilities. The induced correlations on the …


Supervised Classification Using Copula And Mixture Copula, Sumen Sen Jul 2015


Mathematics & Statistics Theses & Dissertations

Statistical classification is a field of study that has developed significantly since the 1960s. This research has a vast area of applications. For example, pattern recognition has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. The classical discrimination rule assumes normality. However, in many situations this assumption is questionable. In fact, for some data, the pattern vector is a mixture of discrete and continuous random variables. In this dissertation, we use copula densities to model class conditional distributions. Such types of densities are useful when the marginal densities of a pattern vector are not normally …


Zero-Inflated Models To Identify Transcription Factor Binding Sites In ChIP-Seq Experiments, Sameera Dhananjaya Viswakula Apr 2015

Mathematics & Statistics Theses & Dissertations

It is essential to determine the protein-DNA binding sites to understand many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard in locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial model (NB), a zero-inflated Poisson model (ZIP) and a zero-inflated negative binomial model (ZINB) for peak-calling. …


Bivariate Doubly Inflated Poisson And Related Regression Models, Pooja Sengupta Jul 2014


Mathematics & Statistics Theses & Dissertations

Count data are common in observational scientific investigations, and in many instances, such as twin or crossover studies, the data consist of dependent bivariate counts. An appropriate model for such data is the bivariate Poisson distribution given in Kocherlakota and Kocherlakota (2001). However, for situations where an inflated count of (0, 0) occurs, Lee et al. (2009) proposed the zero-inflated bivariate Poisson distribution, which accounts for the inflated count. In this research, we introduce and study a bivariate distribution that accounts for an inflated count of the (k, k) cell for some k>0, in addition to the …


Modelling Locally Changing Variance Structured Time Series Data By Using Breakpoints Bootstrap Filtering, Rajan Lamichhane Jul 2013


Mathematics & Statistics Theses & Dissertations

Stochastic processes have applications in many areas such as oceanography and engineering. Special classes of such processes deal with time series of sparse data. Studies in such cases focus on analysis, construction, and prediction in parametric models. Here, we assume several non-linear time series with additive noise components, and the model fitting is proposed in two stages. The first stage identifies the density using information from all the clusters, without specifying any prior knowledge of the underlying distribution function of the time series. The effect of covariates is controlled by fitting the linear regression model with serially correlated errors. In …


Analysis Of Continuous Longitudinal Data With ARMA(1, 1) And Antedependence Correlation Structures, Sirisha Mushti Apr 2013

Mathematics & Statistics Theses & Dissertations

Longitudinal or repeated measure data are common in biomedical and clinical trials. These data are often collected on individuals at scheduled times resulting in dependent responses. Inference methods for studying the behavior of responses over time as well as methods to study the association with certain risk factors or covariates taking into account the dependencies are of great importance. In this research we focus our study on the analysis of continuous longitudinal data. To model the dependencies of the responses over time, we consider appropriate correlation structures generated by the stationary and non-stationary time-series models. We develop new estimation procedures …


A Statistical Model To Determine Multiple Binding Sites Of A Transcription Factor On DNA Using ChIP-Seq Data, Rasika Jayatillake Jul 2012

Mathematics & Statistics Theses & Dissertations

Protein-DNA interaction is vital to many biological processes in cells such as cell division, embryo development and regulating gene expression. Chromatin Immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology that can reveal protein binding sites in the genome with superior accuracy. Although many methods have been proposed to find binding sites from ChIP-seq data, they can find only one binding site within a short region of the genome. In this study we introduce a statistical model to identify multiple binding sites of a transcription factor within a short region of the genome using the ChIP-seq data. Mapped sequence …


Analysis Of Discrete Choice Probit Models With Structured Correlation Matrices, Bhaskara Ravi Jan 2012


Mathematics & Statistics Theses & Dissertations

Discrete choice models are very popular in economics, and the conditional logit model, introduced in a seminal paper by McFadden (1974), is the most widely used model for analyzing consumer choice behavior. This model is based on the assumption that the unobserved factors, which determine the consumer choices, are independent and follow a Gumbel distribution, widely known as the Independence of Irrelevant Alternatives (IIA) assumption. Alternative models that relax the IIA assumption are the Generalized Extreme Value (GEV) models, which allow dependency between unobserved factors. However, GEV models do not incorporate all dependency patterns, other choice behaviors such as …


The Doubly Inflated Poisson And Related Regression Models, Manasi Sheth-Chandra Jan 2011


Mathematics & Statistics Theses & Dissertations

Most real-life count data consist of some values that are more frequent than allowed by the common parametric families of distributions. For data with only excess zeros, Lambert (1992), in a seminal paper, introduced the Zero-Inflated Poisson (ZIP) model, a mixture model that accounts for the inflated zeros. In this thesis, two Doubly Inflated Poisson (DIP) probability models, DIP(p, λ) and DIP(p1, p2, λ), are discussed for situations where there is another inflated value k > 0 besides the inflated zeros. The distributional properties such as identifiability, moments, and conditional probabilities …
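For readers unfamiliar with inflated count models, the sketch below shows one plausible way to write a doubly inflated Poisson probability mass function as a three-part mixture: extra mass p1 at zero, extra mass p2 at k, and an ordinary Poisson(λ) component with weight 1 - p1 - p2. This parameterization is an assumption for illustration; the abstract does not give the thesis's exact definition, and the function name `dip_pmf` is hypothetical. Setting p2 = 0 recovers the familiar zero-inflated Poisson form.

```python
import math


def dip_pmf(y, k, p1, p2, lam):
    """P(Y = y) under an assumed doubly inflated Poisson mixture.

    Assumed form (not taken from the thesis):
      with probability p1 the count is 0,
      with probability p2 the count is k (k > 0),
      otherwise the count is drawn from Poisson(lam).
    """
    if y < 0 or k <= 0:
        raise ValueError("y must be non-negative and k must be positive")
    if p1 < 0 or p2 < 0 or p1 + p2 > 1 or lam <= 0:
        raise ValueError("need p1, p2 >= 0, p1 + p2 <= 1 and lam > 0")

    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    mass = (1.0 - p1 - p2) * poisson
    if y == 0:
        mass += p1  # extra mass at zero
    if y == k:
        mass += p2  # extra mass at the second inflated value
    return mass


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for the demonstration.
    for y in range(6):
        print(y, round(dip_pmf(y, k=3, p1=0.2, p2=0.1, lam=2.0), 4))
```

The printed probabilities show visible bumps at 0 and at k = 3 relative to a plain Poisson with the same λ.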


Modeling And Analysis Of Repeated Ordinal Data Using Copula Based Likelihoods And Estimating Equation Methods, Raghavendra Rao Kurada Jan 2011


Mathematics & Statistics Theses & Dissertations

Repeated or longitudinal ordinal data occur in many fields such as biology, epidemiology, and finance. These data normally are analyzed using both likelihood and non-likelihood methods. The first part of this dissertation discusses the multivariate ordered probit model, which is a likelihood method based on latent variables. We show that this latent variable model belongs to a very general class of copula models. We use the copula representation for the multivariate ordered probit model to obtain maximum likelihood estimates of the parameters. We apply the methodology in the analysis of real-life data examples.

Though likelihood methods are preferable, there …


A Study Of Relationships Between Family Members Using Familial Correlations, Corinne Wilson Jul 2010


Mathematics & Statistics Theses & Dissertations

Familial correlations measure the resemblance between family members and are used in many fields of study including epidemiology, genetics, heredity, and psychology. Here, an analysis of familial correlations where male and female children of the same family can have different correlations in the unequal family size case is presented. First, three likelihood based tests, namely the likelihood ratio test, Rao score test, and Wald test, and two more asymptotic tests which use Srivastava's estimator of the intraclass correlation coefficient are considered to test the null hypothesis of equality of the intraclass correlation coefficients when families have unequal numbers of children. …


Semi-Parametric Likelihood Functions For Bivariate Survival Data, S. H. Sathish Indika Jul 2010


Mathematics & Statistics Theses & Dissertations

Because of the numerous applications, characterization of multivariate survival distributions is still a growing area of research. The aim of this thesis is to investigate a joint probability distribution that can be derived for modeling nonnegative related random variables. We restrict the marginals to a specified lifetime distribution, while proposing a linear relationship between them with an unknown (error) random variable that we completely characterize. The distributions are all of positive supports, but one class has a positive probability of simultaneous occurrence. In that sense, we capture the absolutely continuous case, and the Marshall-Olkin type with a positive probability of …


Rao's Quadratic Entropy And Some New Applications, Yueqin Zhao Apr 2010


Mathematics & Statistics Theses & Dissertations

Many problems in statistical inference are formulated as testing the diversity of populations. The entropy functions measure the similarity of a distribution function to the uniform distribution and hence can be used as a measure of diversity. Rao (1982a) proposed the concept of quadratic entropy. Its concavity property makes the decomposition similar to ANOVA for categorical data feasible. In this thesis, after reviewing the properties and providing a modification to quadratic entropy, various applications of quadratic entropy are explored. First, analysis of quadratic entropy with the suggested modification to analyze the contingency table data is explored. Then its application to …
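As a small illustration of the quantity discussed above, quadratic entropy is commonly written as Q(p) = sum_ij d_ij p_i p_j, where p is a vector of category proportions and d_ij is a dissimilarity between categories i and j. The sketch below implements that standard form only; the modification proposed in the thesis is not shown, and the function name `quadratic_entropy` and the example dissimilarities are assumptions.

```python
def quadratic_entropy(proportions, dissimilarity):
    """Quadratic entropy: sum over i, j of d[i][j] * p[i] * p[j].

    `proportions` must be non-negative and sum to one;
    `dissimilarity` is a square matrix with a zero diagonal.
    """
    n = len(proportions)
    if len(dissimilarity) != n or any(len(row) != n for row in dissimilarity):
        raise ValueError("dissimilarity must be an n x n matrix")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be non-negative and sum to 1")

    return sum(
        dissimilarity[i][j] * proportions[i] * proportions[j]
        for i in range(n)
        for j in range(n)
    )


if __name__ == "__main__":
    # With d[i][j] = 1 for i != j, the measure reduces to 1 - sum(p_i^2).
    zero_one = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(quadratic_entropy([0.5, 0.3, 0.2], zero_one))
    print(quadratic_entropy([1.0, 0.0, 0.0], zero_one))  # no diversity
```

A population concentrated in one category has zero diversity under any dissimilarity with a zero diagonal, which is a useful sanity check.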


Canonical Correlation Analysis For Longitudinal Data, Raymond Mccollum Jan 2010


Mathematics & Statistics Theses & Dissertations

Data (multivariate data) on two sets of vectors commonly occur in applications. Statistical analysis of these data is usually done using a canonical correlation analysis (CCA). Occurrence of these data at multiple occasions or conditions leads to longitudinal multivariate data for a CCA. We address the problem of canonical correlation analysis on longitudinal data when the data have a Kronecker product covariance structure. Using structured correlation matrices we model the dependency of repeatedly observed data. Recent work of Srivastava, Nahtman, and von Rosen (2008) developed an iterative algorithm to determine the maximum likelihood estimate of the Kronecker product covariance structure …


Analysis Of Models For Longitudinal And Clustered Binary Data, Weiming Yang Jan 2010


Mathematics & Statistics Theses & Dissertations

This dissertation deals with modeling and statistical analysis of longitudinal and clustered binary data. Such data consist of observations on a dichotomous response variable generated from multiple time or cluster points that exhibit either decaying correlation or equi-correlated dependence. The current literature addresses modeling the dependence using an appropriate correlation structure, but ignores the feasible bounds on the correlation parameter imposed by the marginal means.

The first part of this dissertation deals with two multivariate probability models, the first order Markov chain model and the multivariate probit model, that adhere to the feasible bounds on the correlation. For both the …


Canonical Correlation And Correspondence Analysis Of Longitudinal Data, Jayesh Srivastava Apr 2007


Mathematics & Statistics Theses & Dissertations

Assessing the relationship between two sets of multivariate vectors is an important problem in statistics. Canonical correlation coefficients are used to study these relationships. Canonical correlation analysis (CCA) is a general multivariate method that is mainly used to study relationships when both sets of variables are quantitative. When the variables are qualitative (categorical), a technique called correspondence analysis (CA) is used. Canonical correspondence analysis (CCPA) is used to deal with the case when one set of variables is categorical and the other set is quantitative. By exploiting the interrelationships between these three techniques we first provide a theoretical basis for …


Modeling And Efficient Estimation Of Intra-Family Correlations, Roy Sabo Jan 2007


Mathematics & Statistics Theses & Dissertations

Familial data occur when observations are taken on multiple members of the same family. Due to relationships between these members, both genetic and by cohabitation, their response variables will likely exhibit some form of dependence. Most of the existing literature models this dependence with an equicorrelated structure. This structure is appropriate when the dependencies between family members are similar, such as in genetic studies, but not in cases where we expect the dependencies to differ, such as behavioral comparisons across different age groups. In this dissertation we first discuss an alternative structure based upon first-order autoregressive correlation. Specifically we create …


Efficient Unbiased Estimating Equations For Analyzing Structured Correlation Matrices, Yihao Deng Jul 2006


Mathematics & Statistics Theses & Dissertations

Analysis of dependent continuous and discrete data has become an active area of research. For normal data, correlations fully quantify the dependence. Historically, the maximum likelihood method has been very successful in estimating the correlations, and the unbiased estimating equation approach has become a popular alternative when there may be a departure from normality. In this thesis we show that the optimal unbiased estimating equation coincides with the likelihood equations for normal data. We then introduce a general class of weighted unbiased estimating equations to estimate parameters in a structured correlation matrix. We derive expressions for asymptotic covariance of the estimates, …


Estimating Familial Correlations Using A Kotz Type Density, Amal Helu Jul 2006


Mathematics & Statistics Theses & Dissertations

Two familial correlations often used to study the resemblance between family members are the sib-sib correlation (ρss) and the mom-sib or parent-sib correlation (ρps). Since their introduction early in the last century by Galton, Fisher and others, many improved estimators of these correlations have been suggested in the literature. Several moment based estimators as well as the maximum likelihood estimators under the assumption of multivariate normality have been extensively studied and compared by various authors. However, the performance of these estimators when the data are not from a multivariate normal distribution is poor. In this …


Statistical Analysis Of Longitudinal And Multivariate Discrete Data, Deepak Mav Apr 2005


Mathematics & Statistics Theses & Dissertations

Correlated multivariate Poisson and binary variables occur naturally in medical, biological and epidemiological longitudinal studies. Modeling and simulating such variables is difficult because the correlations are restricted by the marginal means via Fréchet bounds in a complicated way. In this dissertation we will first discuss partially specified models and methods for estimating the regression and correlation parameters. We derive the asymptotic distributions of these parameter estimates. Using simulations based on extensions of the algorithm due to Sim (1993, Journal of Statistical Computation and Simulation, 47, pp. 1–10), we study the performance of these estimates using infeasibility, coverage probabilities of the …


Estimation Of Parameters In Replicated Time Series Regression Models, Genming Shi Jul 2003


Mathematics & Statistics Theses & Dissertations

The time series regression model has been widely studied in the literature by several authors. However, statistical analysis of replicated time series regression models has received little attention. In this thesis, we study the application of quasi-least squares, a relatively new method, to estimate the parameters in replicated time series models with general ARMA(p, q) correlation structure. We also study several established methods for estimating the parameters in those models, including the maximum likelihood, method of moments, and the GEE method. Asymptotic comparisons of the methods are made by fixing the number of repeated measurements in each series, and …