Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons



Full-Text Articles in Physical Sciences and Mathematics

Statistical Methods For Meta-Analysis In Large-Scale Genomic Experiments, Wimarsha Thathsarani Jayanetti Dec 2022


Mathematics & Statistics Theses & Dissertations

Recent developments in high-throughput genomic assays have opened up the possibility of testing hundreds and thousands of genes simultaneously. With the availability of a vast number of public databases, researchers tend to combine genomic analysis results from multiple studies in the form of a meta-analysis. Meta-analysis methods can be broadly classified into two main categories. The first approach is to combine the statistical significance (p-values) of the genes from each individual study, and the second approach is to combine the statistical estimates (effect sizes) from the individual studies. In this dissertation, we will discuss how adherence to the standard null …
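The two categories named in this abstract can be illustrated with a minimal sketch. The abstract does not say which combination rules the dissertation uses, so the choices here are assumptions: Fisher's method stands in for p-value combination and an inverse-variance fixed-effect average stands in for effect-size combination. The function names are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene using Fisher's method."""
    k = len(pvalues)
    # Fisher's statistic is X = -2 * sum(log p_i); under the null it is
    # chi-square distributed with 2k degrees of freedom. We keep X / 2.
    half_x = -sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the chi-square upper tail is closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    return math.exp(-half_x) * sum(
        half_x ** j / math.factorial(j) for j in range(k)
    )


def fixed_effect_estimate(effects, variances):
    """Pool per-study effect sizes with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    # The variance of the pooled estimate is the reciprocal of the total weight.
    return pooled, 1.0 / total
```

With a single study, `fisher_combined_pvalue` returns the input p-value unchanged, which is a quick sanity check on the closed-form tail.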


A Copula Model Approach To Identify The Differential Gene Expression, Prasansha Liyanaarachchi Dec 2021


Mathematics & Statistics Theses & Dissertations

Deoxyribonucleic acid, more commonly known as DNA, is a complex double helix-shaped molecule present in all living organisms and hosts thousands of genes. However, only a few genes exhibit differential expression and play a vital role in a particular disease such as breast cancer. Microarray technology is one of the modern technologies developed to study these gene expressions. There are two major microarray technologies available for expression analysis: Spotted cDNA array and oligonucleotide array. The focus of our research is the statistical analysis of data that arises from the spotted cDNA microarray. Numerous models have been proposed in the literature …


Copula-Based Zero-Inflated Count Time Series Models, Mohammed Sulaiman Alqawba Jul 2019


Mathematics & Statistics Theses & Dissertations

Count time series data are observed in several applied disciplines such as in environmental science, biostatistics, economics, public health, and finance. In some cases, a specific count, say zero, may occur more often than usual. Additionally, serial dependence might be found among these counts if they are recorded over time. Overlooking the frequent occurrence of zeros and the serial dependence could lead to false inference. In this dissertation, we propose two classes of copula-based time series models for zero-inflated counts with the presence of covariates. Zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and zero-inflated Conway-Maxwell-Poisson (ZICMP) distributed marginals of the …


Spatio-Temporal Cluster Detection And Local Moran Statistics Of Point Processes, Jennifer L. Matthews Apr 2019


Mathematics & Statistics Theses & Dissertations

Moran's index is a statistic that measures spatial dependence, quantifying the degree of dispersion or clustering of point processes and events in some location/area. Recognizing that a single Moran's index may not give a sufficient summary of the spatial autocorrelation measure, a local indicator of spatial association (LISA) has gained popularity. Accordingly, we propose extending LISAs to time after partitioning the area and computing a Moran-type statistic for each subarea. Patterns between the local neighbors are unveiled that would not otherwise be apparent. We consider the measures of Moran statistics while incorporating a time factor under simulated multilevel Palm distribution, …
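As a point of reference for the statistics named in this abstract, the following sketch computes the textbook global Moran's index and one common form of the local Moran statistic for a fixed spatial weight matrix. It does not reproduce the spatio-temporal extension proposed in the dissertation; the function names, the toy data, and the adjacency weights are illustrative assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n sites with weights w[i][j]."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of sites.
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in z)


def local_morans_i(values, weights):
    """Local Moran statistic I_i = (z_i / m2) * sum_j w_ij * z_j for each site."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second sample moment of the deviations
    return [
        z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n))
        for i in range(n)
    ]


# Toy example: four sites on a line, neighbours share an edge (assumed layout).
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
global_i = morans_i(values, weights)
local_i = local_morans_i(values, weights)
```

Under this convention the local values sum to the global index times the total weight, so each local value shows how much a site and its neighbours contribute to the overall autocorrelation.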


Zero-Inflated Models To Identify Transcription Factor Binding Sites In ChIP-Seq Experiments, Sameera Dhananjaya Viswakula Apr 2015


Mathematics & Statistics Theses & Dissertations

It is essential to determine the protein-DNA binding sites to understand many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard in locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial model (NB), a zero-inflated Poisson model (ZIP) and a zero-inflated negative binomial model (ZINB) for peak-calling. …


Analysis Of Continuous Longitudinal Data With ARMA(1, 1) And Antedependence Correlation Structures, Sirisha Mushti Apr 2013


Mathematics & Statistics Theses & Dissertations

Longitudinal or repeated measure data are common in biomedical and clinical trials. These data are often collected on individuals at scheduled times resulting in dependent responses. Inference methods for studying the behavior of responses over time as well as methods to study the association with certain risk factors or covariates taking into account the dependencies are of great importance. In this research we focus our study on the analysis of continuous longitudinal data. To model the dependencies of the responses over time, we consider appropriate correlation structures generated by the stationary and non-stationary time-series models. We develop new estimation procedures …


A Statistical Model To Determine Multiple Binding Sites Of A Transcription Factor On DNA Using ChIP-Seq Data, Rasika Jayatillake Jul 2012


Mathematics & Statistics Theses & Dissertations

Protein-DNA interaction is vital to many biological processes in cells such as cell division, embryo development and regulating gene expression. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology that can reveal protein binding sites in the genome with superior accuracy. Although many methods have been proposed to find binding sites for ChIP-seq data, they can find only one binding site within a short region of the genome. In this study we introduce a statistical model to identify multiple binding sites of a transcription factor within a short region of the genome using the ChIP-seq data. Mapped sequence …


Analysis Of Discrete Choice Probit Models With Structured Correlation Matrices, Bhaskara Ravi Jan 2012


Mathematics & Statistics Theses & Dissertations

Discrete choice models are very popular in economics, and the conditional logit model, introduced in a seminal paper by McFadden (1974), is the most widely used model for analyzing consumer choice behavior. This model is based on the assumption that the unobserved factors, which determine the consumer choices, are independent and follow a Gumbel distribution, widely known as the Independence of Irrelevant Alternatives (IIA) assumption. Alternate models that relax the IIA assumption are the Generalized Extreme Value (GEV) models, which allow dependency between unobserved factors. However, GEV models do not incorporate all dependency patterns, other choice behaviors such as …
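The IIA property mentioned in this abstract can be seen directly from the conditional logit choice probabilities, P(i) = exp(V_i) / Σ_j exp(V_j). The sketch below is only an illustration of that standard formula; the utility values and the function name are assumptions, not taken from the dissertation.

```python
import math


def logit_choice_probabilities(utilities):
    """Conditional logit probabilities for one decision maker's alternatives."""
    # Subtracting the maximum utility avoids overflow without changing the result.
    top = max(utilities)
    exp_u = [math.exp(u - top) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


# Hypothetical systematic utilities for two alternatives, then a third is added.
two = logit_choice_probabilities([1.0, 0.5])
three = logit_choice_probabilities([1.0, 0.5, 2.0])

# IIA: the odds between the first two alternatives ignore the third one.
ratio_two = two[0] / two[1]
ratio_three = three[0] / three[1]
```

Both ratios equal exp(1.0 - 0.5), which is the restriction that models with correlated unobserved factors are designed to relax.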


The Doubly Inflated Poisson And Related Regression Models, Manasi Sheth-Chandra Jan 2011


Mathematics & Statistics Theses & Dissertations

Most real-life count data contain some values that are more frequent than allowed by the common parametric families of distributions. For data with only excess zeros, Lambert (1992), in a seminal paper, introduced the Zero-Inflated Poisson (ZIP) model, which is a mixture model that accounts for the inflated zeros. In this thesis, two Doubly Inflated Poisson (DIP) probability models, DIP(p, λ) and DIP(p1, p2, λ), are discussed for situations where there is another inflated value k > 0 besides the inflated zeros. The distributional properties such as identifiability, moments, and conditional probabilities …
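A short sketch may help make the mixture idea concrete. The ZIP probability function below follows the standard form: a point mass at zero mixed with a Poisson distribution. The doubly inflated version is only a plausible reading of DIP(p1, p2, λ), with extra mass p1 at zero and p2 at k; the abstract does not give the thesis's exact parameterization, so treat `dip_pmf` and its arguments as assumptions.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of observing the count y with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, p, lam):
    """Zero-inflated Poisson: extra probability p placed on zero."""
    base = (1.0 - p) * poisson_pmf(y, lam)
    return base + p if y == 0 else base


def dip_pmf(y, p1, p2, lam, k):
    """Assumed doubly inflated form: extra mass p1 at 0 and p2 at k > 0."""
    base = (1.0 - p1 - p2) * poisson_pmf(y, lam)
    if y == 0:
        return base + p1
    if y == k:
        return base + p2
    return base
```

Setting `p2 = 0` in `dip_pmf` recovers `zip_pmf`, which is one way to check that the assumed form nests the zero-inflated model.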


Rao's Quadratic Entropy And Some New Applications, Yueqin Zhao Apr 2010


Mathematics & Statistics Theses & Dissertations

Many problems in statistical inference are formulated as testing the diversity of populations. The entropy functions measure the similarity of a distribution function to the uniform distribution and hence can be used as a measure of diversity. Rao (1982a) proposed the concept of quadratic entropy. Its concavity property makes the decomposition similar to ANOVA for categorical data feasible. In this thesis, after reviewing the properties and providing a modification to quadratic entropy, various applications of quadratic entropy are explored. First, analysis of quadratic entropy with the suggested modification to analyze the contingency table data is explored. Then its application to …


Analysis Of Models For Longitudinal And Clustered Binary Data, Weiming Yang Jan 2010


Mathematics & Statistics Theses & Dissertations

This dissertation deals with modeling and statistical analysis of longitudinal and clustered binary data. Such data consist of observations on a dichotomous response variable generated from multiple time or cluster points that exhibit either decaying correlation or equi-correlated dependence. The current literature addresses modeling the dependence using an appropriate correlation structure, but ignores the feasible bounds on the correlation parameter imposed by the marginal means.

The first part of this dissertation deals with two multivariate probability models, the first order Markov chain model and the multivariate probit model, that adhere to the feasible bounds on the correlation. For both the …


Canonical Correlation Analysis For Longitudinal Data, Raymond Mccollum Jan 2010


Mathematics & Statistics Theses & Dissertations

Data (multivariate data) on two sets of vectors commonly occur in applications. Statistical analysis of these data is usually done using a canonical correlation analysis (CCA). Occurrence of these data at multiple occasions or conditions leads to longitudinal multivariate data for a CCA. We address the problem of canonical correlation analysis on longitudinal data when the data have a Kronecker product covariance structure. Using structured correlation matrices we model the dependency of repeatedly observed data. Recent work of Srivastava, Nahtman, and von Rosen (2008) developed an iterative algorithm to determine the maximum likelihood estimate of the Kronecker product covariance structure …


Canonical Correlation And Correspondence Analysis Of Longitudinal Data, Jayesh Srivastava Apr 2007


Mathematics & Statistics Theses & Dissertations

Assessing the relationship between two sets of multivariate vectors is an important problem in statistics. Canonical correlation coefficients are used to study these relationships. Canonical correlation analysis (CCA) is a general multivariate method that is mainly used to study relationships when both sets of variables are quantitative. When the variables are qualitative (categorical), a technique called correspondence analysis (CA) is used. Canonical correspondence analysis (CCPA) is used to deal with the case when one set of variables is categorical and the other set is quantitative. By exploiting the interrelationships between these three techniques we first provide a theoretical basis for …


Modeling And Efficient Estimation Of Intra-Family Correlations, Roy Sabo Jan 2007


Mathematics & Statistics Theses & Dissertations

Familial data occur when observations are taken on multiple members of the same family. Due to relationships between these members, both genetic and by cohabitation, their response variables will likely exhibit some form of dependence. Most of the existing literature models this dependence with an equicorrelated structure. This structure is appropriate when the dependencies between family members are similar, such as in genetic studies, but not in cases where we expect the dependencies to differ, such as behavioral comparisons across different age groups. In this dissertation we first discuss an alternative structure based upon first-order autoregressive correlation. Specifically we create …


Efficient Unbiased Estimating Equations For Analyzing Structured Correlation Matrices, Yihao Deng Jul 2006


Mathematics & Statistics Theses & Dissertations

Analysis of dependent continuous and discrete data has become an active area of research. For normal data, correlations fully quantify the dependence. Historically, the maximum likelihood method has been very successful in estimating the correlations, and the unbiased estimating equation approach has become a popular alternative when there may be a departure from normality. In this thesis we show that the optimal unbiased estimating equation coincides with the likelihood equations for normal data. We then introduce a general class of weighted unbiased estimating equations to estimate parameters in a structured correlation matrix. We derive expressions for asymptotic covariance of the estimates, …


Mark-Recapture Creel Survey And Survival Models, Shampa Saha Jul 1997


Mathematics & Statistics Theses & Dissertations

In this dissertation, we consider a model based approach to the estimation of exploitation rate of a fish population by combining mark-recapture procedures with a creel survey. We also consider the analysis of a proportional hazards survival model for randomly censored observations, known as the Koziol-Green model. The model assumes that the lifetime survivor function is a power of the censored time survivor function.

In Chapter 2, we introduce the model based approach to the estimation of the exploitation rate of a fish population by combining mark-recapture procedures with a creel survey. We assume that in the beginning of a …


Analysis Of Repeated Measures Data Under Circular Covariance, Andrew Montgomery Hartley Jan 1997


Mathematics & Statistics Theses & Dissertations

Circular covariance is important in modelling phenomena in epidemiological, communications and numerous physical contexts. We introduce and develop a variety of methods which make it a more versatile tool. First, we present two classes of estimators for use in the presence of missing observations. Using simulations, we show that the mean squared errors of the estimators of one of these classes are smaller than those of the Maximum Likelihood (ML) estimators under certain conditions. Next, we propose and discuss a parsimonious, autoregressive type of circular covariance structure which involves only two parameters. We specify ML and other types of estimators …


Estimation In A Marked Poisson Error Recapture Model Of Software Reliability, Rajan Gupta Jan 1991


Mathematics & Statistics Theses & Dissertations

Nayak's (1988) model for the detection, removal, and recapture of the errors in a computer program is extended to a larger family of models in which the probabilities that the successive programs produce errors are described by the tail probabilities of a discrete distribution on the positive integers. Confidence limits are derived for the probability that the final program produces errors. A comparison of the asymptotic variances of parameter estimates given by the error recapture and by the repetitive-run procedure of Nagel, Scholz, and Skrivan (1982) is made to determine which of these procedures uses the test time more efficiently.


The Truncated Cauchy Distribution: Estimation Of Parameters And Application To Stock Returns, Paul G. Staneski Apr 1990


Mathematics & Statistics Theses & Dissertations

The problem addressed in this dissertation is the existence and estimation of the parameters of a truncated Cauchy distribution. It is known that when a number of distributions with infinite support are truncated to a finite interval, the maximum likelihood estimator of the scale parameter fails to exist with positive probability. In particular, necessary and sufficient conditions which give rise to instances of non-existence have been found for the exponential (Deemer and Votaw (1955)), gamma (Broeder (1955), Hegde and Dahiya (1989)), Weibull (Mittal and Dahiya (1989)) and normal distribution (Barndorff-Nielsen (1978), Mittal and Dahiya (1987), Hegde and Dahiya (1989)). …


Software Reliability Models, Syed Afzal Hossain Jul 1989


Mathematics & Statistics Theses & Dissertations

The problem considered here is the building of a Non-homogeneous Poisson Process (NHPP) model. Currently existing popular NHPP models like the Goel-Okumoto (G-O) and Yamada et al. models suffer from the drawback that the probability density function of the inter-failure times is an improper density function. This is because the event of no failure in (0, ∞] is allowed in these models. In real life situations we cannot draw sample(s) from such a population and also none of the moments of inter-failure times exist. Therefore, these models are unsuitable for modelling real software error data. On the other hand if the density …
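The drawback described in this abstract can be checked numerically for the Goel-Okumoto model, whose mean value function is commonly written m(t) = a(1 - exp(-bt)). For an NHPP the probability of no failure in (0, t] is exp(-m(t)), which tends to exp(-a) > 0 rather than to zero. The sketch below assumes that standard parameterization; the parameter values and function names are illustrative only.

```python
import math


def go_mean_value(t, a, b):
    """Goel-Okumoto mean value function: expected number of failures by time t."""
    return a * (1.0 - math.exp(-b * t))


def first_failure_survival(t, a, b):
    """P(no failure in (0, t]) for an NHPP with the G-O mean value function."""
    return math.exp(-go_mean_value(t, a, b))


# Illustrative parameters: a is the expected total number of failures,
# b is the per-fault detection rate (assumed values).
a, b = 2.0, 0.1
survival_far_out = first_failure_survival(1e6, a, b)
mass_missing_from_density = math.exp(-a)
```

Because `survival_far_out` stays at about exp(-a) instead of falling to zero, the time to first failure has a density that integrates to less than one, which is the improperness the abstract refers to.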


Optimal Row-Column Designs For Correlated Errors And Nested Row-Column Designs For Uncorrelated Errors, Nizam Uddin Apr 1989


Mathematics & Statistics Theses & Dissertations

In this dissertation the design problems are considered in the row-column setting for second order autonormal errors when the treatment effects are estimated by generalized least squares, and in the nested row-column setting for uncorrelated errors when the treatment effects are estimated by ordinary least squares. In the former case, universal optimality conditions are derived separately for designs in the plane and on the torus using more general linear models than those considered elsewhere in the literature. Examples of universally optimum planar designs are given, and a method is developed for the construction of optimum and near optimum designs, that …


Large Deviation Local Limit Theorems For Ratio Statistics, Sanjeev V. Sabnis Jul 1987


Mathematics & Statistics Theses & Dissertations

Let $\{T_n, n \ge 1\}$ be an arbitrary sequence of non-lattice random variables and $\{S_n, n \ge 1\}$ be another sequence of positive non-lattice random variables. Let the two sequences be independent. Let $\phi_{1n}$ and $\phi_{2n}$ be the moment generating functions of $\{T_n, n \ge 1\}$ and $\{S_n, n \ge 1\}$ respectively. Let $\{a_n\}$ be a sequence of real numbers such that $a_n \to \infty$.


Statistical Calibration Theory, James John Mckeon Apr 1985


Mathematics & Statistics Theses & Dissertations

A calibration method substitutes for measurements, $X_i$, that are accurate but impractical or costly, a set of measurements, $Y_i$, that are less accurate but simpler or less costly. There are two general types of calibration methods. In the classical approach, once the calibration sample is drawn, the estimate of the X value for a given unit is found without any consideration of the distribution of X values for the other units to be measured. This corresponds best to the literal meaning of the word "calibration". Maximum likelihood estimation is the statistical formulation of the classical approach.

The second approach …