Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Purdue University

Open Access Dissertations

2014

Full-Text Articles in Physical Sciences and Mathematics

Probabilistic Uncertainty Quantification And Experiment Design For Nonlinear Models: Applications In Systems Biology, Vu Cao Duy Thien Dinh Oct 2014

Open Access Dissertations

Despite the ever-increasing interest in understanding biology at the system level, several factors hinder studies and analyses of biological systems. First, unlike systems from other applied fields whose parameters can be effectively identified, biological systems are usually unidentifiable, even in the ideal case when all possible system outputs are known with high accuracy. Second, the presence of multivariate bifurcations often drives the system into behaviors that are completely different in nature. In such cases, system outputs (as functions of parameters/inputs) are usually discontinuous or have sharp transitions across domains with different behaviors. Finally, models from systems biology …
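Unidentifiability can be seen in a deliberately tiny, hypothetical model that does not come from the dissertation. The output is y(t) = exp(-k1 k2 t), so only the product k1 k2 affects what is observed. Two different parameter pairs with the same product therefore give exactly the same output at every time point, and no amount of measurement accuracy can separate k1 from k2.

```python
import math

def output(k1: float, k2: float, times: list[float]) -> list[float]:
    """Hypothetical toy model: y(t) = exp(-k1 * k2 * t).

    Only the product k1 * k2 enters the output, so k1 and k2 cannot be
    recovered separately, however accurately y(t) is measured.
    """
    return [math.exp(-k1 * k2 * t) for t in times]

times = [0.0, 0.5, 1.0, 2.0, 4.0]
y_a = output(2.0, 3.0, times)  # k1 * k2 = 6
y_b = output(1.5, 4.0, times)  # different parameters, same product 6
print(max(abs(a - b) for a, b in zip(y_a, y_b)))  # prints 0.0
```

Real systems-biology models hide such parameter combinations inside nonlinear ODEs, where they are much harder to spot.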


On The Occurrences Of Motifs In Recursive Trees, With Applications To Random Structures, Mohan Gopaladesikan Oct 2014

Open Access Dissertations

In this dissertation we study three problems related to motifs and recursive trees. In the first problem we consider a collection of uncorrelated motifs and their occurrences on the fringe of random recursive trees. We compute the exact mean and variance of the multivariate random vector of the counts of occurrences of the motifs. We further use the Cramér-Wold device and the contraction method to show an asymptotic convergence in distribution to a multivariate normal random variable with this mean and variance.

The second problem we study is that of the probability that a collection of motifs (of the …
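The simplest fringe motif is a single leaf. It gives a small Monte Carlo check on the kind of quantity described above. In a random recursive tree, node k attaches to a parent chosen uniformly from the earlier nodes. A new node attaches to one of the L_n current leaves with probability L_n/n, and in that case the leaf count does not change. This gives E[L_{n+1}] = E[L_n] + 1 - E[L_n]/n, so E[L_n] = n/2 for n ≥ 2.

The sketch below simulates trees and compares the sample mean of the leaf count with n/2. The seed, tree size and number of replications are arbitrary assumptions. The general motifs, exact variances and normal limit from the dissertation are not reproduced here.

```python
import random

def random_recursive_tree(n: int, rng: random.Random) -> list[int]:
    """Return parent[k] for nodes 0..n-1; node 0 is the root (parent -1).

    Node k attaches to a parent chosen uniformly from nodes 0..k-1,
    the defining growth rule of a random recursive tree.
    """
    parent = [-1] * n
    for k in range(1, n):
        parent[k] = rng.randrange(k)
    return parent

def count_leaves(parent: list[int]) -> int:
    """Count nodes without children (the single-node fringe motif)."""
    has_child = [False] * len(parent)
    for p in parent[1:]:
        has_child[p] = True
    return sum(not c for c in has_child)

rng = random.Random(2014)
n, reps = 50, 5000
mean_leaves = sum(count_leaves(random_recursive_tree(n, rng)) for _ in range(reps)) / reps
print(mean_leaves, n / 2)  # the sample mean should be close to 25.0
```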


Divide And Recombine: Autoregressive Models And Stl+, Xiang Han Oct 2014

Open Access Dissertations

In this thesis, multiple methods are proposed and applied to the Akamai CIDR time series data. The Akamai network is one of the world's largest distributed-computing platforms, with more than 250,000 servers in more than 80 countries. It is responsible for 15-20 percent of all web traffic. We obtained 110 GB of raw CIDR data collected on the Akamai network over an 18-month period, from November 2011 to April 2013.

The Seasonal-Trend Decomposition procedure based on loess (STL+) is used to model the CIDR series. Motivated by the CIDR series analysis, we propose a general prediction-based model selection …
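Seasonal-trend decomposition splits a series into trend, seasonal and remainder components. The sketch below uses the much simpler *classical* additive decomposition as a stand-in, so it is not STL+:

* the trend is a centered moving average over one period;
* the seasonal component is the per-phase mean of the detrended series.

STL instead estimates both components with iterated loess smoothing. The odd period and the synthetic weekly pattern are assumptions chosen for illustration.

```python
def classical_decompose(y: list[float], period: int):
    """Additive decomposition y = trend + seasonal + remainder.

    Simplified stand-in for STL: the trend is a centered moving average of
    length `period` (assumed odd) instead of a loess smoother.
    """
    if period % 2 == 0:
        raise ValueError("this sketch assumes an odd period")
    n, half = len(y), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full periods")
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(y[t - half:t + half + 1]) / period
    # Average the detrended values at each seasonal phase.
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    phase_means = [s / c for s, c in zip(sums, counts)]
    # Center the seasonal pattern so it sums to zero over one period.
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    remainder = [y[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, remainder

# Synthetic daily series: linear trend plus a weekly pattern that sums to zero.
season = [3.0, -1.0, -2.0, 0.0, 1.0, 2.0, -3.0]
y = [10.0 + 0.5 * t + season[t % 7] for t in range(70)]
trend, seasonal, remainder = classical_decompose(y, 7)
print(seasonal[:7])  # recovers the weekly pattern up to rounding
```

An odd period is used so that the centered window is symmetric. For an even period, a 2×period moving average is the usual classical fix.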


Spatial Marked Point Processes: Models And Inferences, Yen-Ning E Huang Oct 2014

Open Access Dissertations

A spatial marked point process describes the locations of randomly distributed events in a region, with a mark attached to each observed point. The availability of spatiotemporal data is increasing, and many spatiotemporal models have been studied with applications in a wide range of disciplines. Spatial marked point processes extend to spatiotemporal marked point processes when a time component is taken into account. In general, the marks can be quantitative or categorical variables. Independence between points and marks is a convenient assumption, but it may not hold in practice. Tests for independence between points and marks have been proposed previously, …
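One generic way to test whether marks are independent of locations is a Monte Carlo *random labelling* test. Under the null hypothesis the marks are exchangeable over the fixed points, so shuffling them simulates the null distribution of any test statistic. This sketch is not one of the tests referred to above. The statistic is an assumption chosen for simplicity: the mean absolute mark difference between each point and its nearest neighbour. Unusually small values suggest that nearby points carry similar marks.

```python
import math
import random

def nearest_neighbours(points):
    """Index of each point's nearest neighbour (brute force, O(n^2))."""
    nn = []
    for i, (xi, yi) in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.hypot(points[k][0] - xi, points[k][1] - yi))
        nn.append(j)
    return nn

def statistic(marks, nn):
    """Mean |m_i - m_nn(i)| over all points."""
    return sum(abs(marks[i] - marks[j]) for i, j in enumerate(nn)) / len(nn)

def random_labelling_test(points, marks, n_perm=199, seed=0):
    """Monte Carlo p-value for H0: marks are independent of locations."""
    rng = random.Random(seed)
    nn = nearest_neighbours(points)  # locations stay fixed, so compute once
    observed = statistic(marks, nn)
    shuffled = list(marks)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled, nn) <= observed:
            as_extreme += 1
    return (as_extreme + 1) / (n_perm + 1)

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(100)]
dep_marks = [10 * x + rng.gauss(0, 0.1) for x, _ in pts]  # marks depend on location
p_dep = random_labelling_test(pts, dep_marks)
print(p_dep)  # expected to be very small
```

Other statistics slot into the same permutation scheme, for example mark correlation functions at several distances.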


The Tessera D&R Computational Environment: Designed Experiments For R-Hadoop Performance And Bitcoin Analysis, Jianfu Li Oct 2014

Open Access Dissertations

D&R is a statistical framework that enables feasible and practical analysis of large complex data. The analyst selects a division method to divide the data into subsets, applies an analytic method to each subset independently with no communication among subsets, and selects a recombination method that is applied to the outputs across subsets to form a result of the analytic method for the entire data. The computational tasking of D&R is nearly embarrassingly parallel, so D&R can readily exploit distributed, parallel computational environments such as our D&R computational environment, Tessera.

In …
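The three D&R choices described above (division, per-subset analytic method, recombination) can be sketched in a few lines. This is not Tessera, which runs on R and Hadoop. Every specific choice here is hypothetical:

* contiguous-block division;
* a simple least-squares line fit as the analytic method;
* a subset-size-weighted average of the coefficients as the recombination.

Because each fit reads only its own subset, the `map` step is the part that could be distributed.

```python
from statistics import mean

def divide(data, n_subsets):
    """Division method (hypothetical): contiguous blocks of roughly equal size."""
    size = -(-len(data) // n_subsets)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def ols_fit(subset):
    """Analytic method: least-squares fit y = a + b x on one subset."""
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def recombine(results, weights):
    """Recombination method: weighted average of per-subset coefficients."""
    total = sum(weights)
    a = sum(w * r[0] for r, w in zip(results, weights)) / total
    b = sum(w * r[1] for r, w in zip(results, weights)) / total
    return a, b

# Synthetic data: y = 1 + 2x plus a small alternating perturbation.
data = [(x, 1.0 + 2.0 * x + ((-1) ** x) * 0.1) for x in range(1000)]
subsets = divide(data, 8)
# No communication between subsets: this map could run on parallel workers.
results = list(map(ols_fit, subsets))
a_hat, b_hat = recombine(results, [len(s) for s in subsets])
print(round(a_hat, 3), round(b_hat, 3))  # close to the all-data fit (1, 2)
```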


Modeling Spatial Covariance Functions, Inkyung Choi Jul 2014

Open Access Dissertations

Covariance modeling plays a key role in spatial data analysis, as it provides important information about the dependence structure of underlying processes and determines the performance of spatial prediction. Various parametric models have been developed to accommodate the idiosyncratic features of a given dataset. However, parametric models may impose unjustified restrictions on the covariance structure, and the procedure for choosing a specific model is often ad hoc. In the first part of the dissertation, a new nonparametric covariance model that avoids the choice of a parametric form is proposed. The estimator is obtained via a nonparametric approximation of completely monotone …
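Completely monotone functions are nonnegative mixtures of exponentials (Bernstein's theorem). That suggests one assumed way to build a flexible covariance without picking a parametric family:

C(h) = Σ_k w_k exp(-θ_k h), with w_k ≥ 0.

Each exp(-θ ||x - y||) term is a valid isotropic covariance in any dimension, so the mixture is valid as well. The sketch builds such a covariance on a hypothetical grid of rates with hand-picked weights. It then checks positive definiteness through a Cholesky factorization. It does not reproduce the dissertation's estimator, and the step that fits the weights to data is omitted.

```python
import math
import random

def mixture_exponential_cov(h, weights, rates):
    """C(h) = sum_k w_k * exp(-theta_k * h), with w_k >= 0 and theta_k > 0."""
    return sum(w * math.exp(-t * h) for w, t in zip(weights, rates))

def cholesky(a):
    """Plain Cholesky factorization; raises ValueError if `a` is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

rng = random.Random(7)
locs = [(rng.random(), rng.random()) for _ in range(30)]
weights, rates = [0.5, 0.3, 0.2], [1.0, 5.0, 20.0]  # hypothetical grid and weights
K = [[mixture_exponential_cov(math.dist(p, q), weights, rates) for q in locs]
     for p in locs]
L = cholesky(K)  # succeeds because K is positive definite
print(L[0][0] ** 2)  # approximately C(0) = sum(weights) = 1.0
```

In a fitted version, the weights on a fine grid of rates would be estimated under the constraint w_k ≥ 0. That constraint is what keeps every fitted covariance valid.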


Identification Of Genomic Factors Using Family-Based Association Studies, Libo Wang Jan 2014

Open Access Dissertations

Genome-wide association studies have become increasingly popular and important for detecting genetic associations with complex traits. However, it is well known that spurious associations can arise from statistical analyses that do not properly account for the genetic relatedness of samples. Many methods have been proposed to guard against these spurious associations. Here we focus on multi-locus association studies of quantitative traits and case-control status, and propose algorithms that take the genetic relatedness of samples into account to address possible confounding. As supervised dimension reduction methods, these algorithms perform well in association studies with a large number of biomarkers but a relatively small …
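Relatedness among samples is commonly quantified with a standardized genomic relationship matrix (GRM) computed from marker genotypes. Mixed-model association methods then use it to absorb the confounding described above. The sketch only shows how such a matrix can be computed. It is an assumption for illustration, not the algorithm proposed in the dissertation. The toy genotypes are invented, and individuals 0 and 1 are identical, as monozygotic twins would be.

```python
def genomic_relationship_matrix(genotypes):
    """Standardized GRM from minor-allele counts (0, 1, 2).

    A[j][k] = (1/M) * sum_m (g_jm - 2 p_m)(g_km - 2 p_m) / (2 p_m (1 - p_m)),
    where p_m is the sample allele frequency of marker m.
    """
    n, M = len(genotypes), len(genotypes[0])
    freqs = [sum(g[m] for g in genotypes) / (2 * n) for m in range(M)]
    # Skip monomorphic markers, whose variance 2p(1-p) is zero.
    keep = [m for m in range(M) if 0.0 < freqs[m] < 1.0]
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            s = sum((genotypes[j][m] - 2 * freqs[m]) * (genotypes[k][m] - 2 * freqs[m])
                    / (2 * freqs[m] * (1 - freqs[m])) for m in keep)
            A[j][k] = A[k][j] = s / len(keep)
    return A

genos = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # identical to individual 0 (e.g. monozygotic twins)
    [2, 1, 0, 0, 1, 1],
    [1, 0, 1, 2, 2, 0],
]
A = genomic_relationship_matrix(genos)
print([round(v, 3) for v in A[0]])  # twins 0 and 1 share the largest entry
```

With real data the matrix is built from many thousands of markers, so its entries approximate pedigree-based relatedness much more closely than this six-marker toy.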