Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016

Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …


Group Transformation And Identification With Kernel Methods And Big Data Mixed Logistic Regression, Chao Pan Dec 2016

Open Access Dissertations

Exploratory Data Analysis (EDA) is a crucial step in the life cycle of data analysis. Exploring data with effective methods reveals the main characteristics of the data and provides guidance for model building. The goal of this thesis is to develop effective and efficient methods for data exploration in the regression setting.

First, we propose to use optimal group transformations as a general approach for exploring the relationship between predictor variables X and the response Y. This approach can be considered an automatic procedure to identify the best characteristic of P(Y|X) under which the relationship …


Characterizing The Effects Of Repetitive Head Trauma In Female Soccer Athletes For Prevention Of Mild Traumatic Brain Injury, Diana Otero Svaldi Dec 2016

Open Access Dissertations

As participation in women’s soccer continues to grow and the longevity of female athletes’ careers continues to increase, prevention of mild traumatic brain injury (mTBI) in women’s soccer has become a major concern for female athletes, as the long-term risks associated with a history of mTBI are well documented. Among women’s sports, soccer exhibits the highest concussion rates, on par with those of men’s football at the collegiate level. Head impact monitoring technology has revealed that “concussive hits” occurring directly before symptomatic injury are not predictive of mTBI, suggesting that the cumulative effect of repetitive head impacts experienced by collision sport athletes should be …


Computational Environment For Modeling And Analysing Network Traffic Behaviour Using The Divide And Recombine Framework, Ashrith Barthur Dec 2016

Open Access Dissertations

There are two essential goals of this research. The first goal is to design and construct a computational environment that is used for studying large and complex datasets in the cybersecurity domain. The second goal is to analyse the Spamhaus blacklist query dataset, which includes uncovering the properties of blacklisted hosts and understanding the nature of blacklisted hosts over time.

The analytical environment enables deep analysis of very large and complex datasets by exploiting the divide and recombine framework. The capability to analyse data in depth enables one to go beyond just summary statistics in research. This deep analysis is …


Functional Regression Models In The Frame Work Of Reproducing Kernel Hilbert Space, Simeng Qu Dec 2016

Open Access Dissertations

The aim of this thesis is to systematically investigate some functional regression models for accurately quantifying the effect of functional predictors. In particular, three functional models are studied: functional linear regression model, functional Cox model, and function-on-scalar model. Both theoretical properties and numerical algorithms are studied in depth. The new models find broad applications in many areas.

For the functional linear regression model, the focus is on testing the nullity of the slope function, and a generalized likelihood ratio test based on an easily implementable data-driven estimate is proposed. The quality of the test is measured by the minimal distance between …


Divide And Recombined For Large Complex Data: Nonparametric-Regression Modelling Of Spatial And Seasonal-Temporal Time Series, Xiaosu Tong Dec 2016

Open Access Dissertations

In the first chapter of this dissertation, I briefly introduce one type of nonparametric regression method, namely local polynomial regression, followed by emphasis on one specific application of loess to time series decomposition, called Seasonal Trend Loess (STL). The chapter closes with an introduction to the D&R (Divide and Recombined) statistical framework. Data can be divided into subsets, and a statistical analysis method is applied to each subset. This is an embarrassingly parallel procedure since there is no communication between subsets. Then the analysis results for each subset are combined to form the final analysis outcome for the …
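The divide, apply, and recombine steps described in this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the per-subset method is ordinary least squares rather than the loess or STL fits the dissertation discusses, and recombination is done by simple averaging, which is only one possible choice.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one subset; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def divide(xs, ys, n_subsets):
    """Divide step: deal observations round-robin into n_subsets subsets."""
    return [(xs[i::n_subsets], ys[i::n_subsets]) for i in range(n_subsets)]


def recombine(estimates):
    """Recombine step (assumed here): average the per-subset coefficients."""
    intercepts, slopes = zip(*estimates)
    return mean(intercepts), mean(slopes)


def divide_and_recombine(xs, ys, n_subsets=4):
    subsets = divide(xs, ys, n_subsets)
    # Each fit reads only its own subset, so these calls need no
    # communication and could be dispatched to separate workers.
    estimates = [fit_line(sx, sy) for sx, sy in subsets]
    return recombine(estimates)
```

Because the per-subset fits share no state, the list comprehension in `divide_and_recombine` is the part that a parallel backend would distribute.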


Controlling For Confounding Network Properties In Hypothesis Testing And Anomaly Detection, Timothy La Fond Aug 2016

Open Access Dissertations

An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that are dependent on confounding factors like the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that face …


Learning From Data: Plant Breeding Applications Of Machine Learning, Alencar Xavier Aug 2016

Open Access Dissertations

Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis at a completely different level that is challenging from practical and theoretical standpoints. Intelligent decision-making relies on our capability of extracting from data useful information that may help us to achieve our goals more efficiently. Many plant breeders, agronomists and geneticists perform analyses without knowing relevant underlying assumptions, strengths or pitfalls of the employed methods. The study endeavors to assess statistical learning properties and plant breeding applications of supervised and unsupervised machine learning …


Extreme-Strike And Small-Time Asymptotics For Gaussian Stochastic Volatility Models, Xin Zhang Aug 2016

Open Access Dissertations

This dissertation studies the asymptotic behavior of implied volatility. For extreme strike, we consider a stochastic volatility asset price model in which the volatility is the absolute value of a continuous Gaussian process with arbitrary prescribed mean and covariance. By exhibiting a Karhunen-Loève expansion for the integrated variance, and using sharp estimates of the density of a general second-chaos variable, we derive asymptotics for the asset price density for large or small values of the variable, and study the wing behavior of the implied volatility in these models. Our main result provides explicit expressions for the first …


The Design And Statistical Analysis Of Single-Cell RNA-Sequencing Experiments, Faye H. Zheng Aug 2016

Open Access Dissertations

Next-generation DNA- and RNA-sequencing (RNA-seq) technologies have expanded rapidly in both throughput and accuracy within the last decade. The momentum continues as emerging techniques become increasingly capable of profiling molecular content at the level of individual cells. One goal of this research is to put forward best practices in the design of single-cell RNA-sequencing (scRNA-seq) experiments, specifically as it relates to choices regarding the trade-off between sequencing depth and sample size. In addition to general guidelines, an interactive tool is presented to aid researchers in making experiment-specific decisions that are informed by real data and practical constraints. Further, a new …


Model-Free Variable Screening, Sparse Regression Analysis And Other Applications With Optimal Transformations, Qiming Huang Aug 2016

Open Access Dissertations

Variable screening and variable selection methods play important roles in modeling high dimensional data. Variable screening is the process of filtering out irrelevant variables, with the aim of reducing the dimensionality from ultrahigh to high while retaining all important variables. Variable selection is the process of selecting a subset of relevant variables for use in model construction. The main theme of this thesis is to develop variable screening and variable selection methods for high dimensional data analysis. In particular, we will present two relevant methods for variable screening and selection under a unified framework based on optimal transformations.
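To make the idea of screening concrete, here is a minimal Python sketch that ranks predictors by a marginal score and keeps the top few. It is a generic stand-in, not the method of this thesis: the score used is the absolute Pearson correlation, whereas the abstract describes a framework based on optimal transformations. The names `pearson` and `screen` and the dictionary-of-columns layout are assumptions made for illustration.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return 0.0
    return suv / sqrt(suu * svv)


def screen(columns, y, keep):
    """Return the names of the `keep` predictors with the largest |correlation| with y.

    columns: dict mapping a predictor name to its list of observed values.
    """
    scores = {name: abs(pearson(col, y)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]
```

A screening step like this only reduces the candidate set; a selection method would then be run on the surviving predictors.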

In the …


Maximum Empirical Likelihood Estimation In U-Statistics Based General Estimating Equations, Lingnan Li Aug 2016

Open Access Dissertations

In the first part of this thesis, we study maximum empirical likelihood estimates (MELE's) in U-statistics based general estimating equations (UGEE's). Our main technical tool is the jackknife empirical likelihood (JEL) approach. We give the local uniform asymptotic normality condition for the log-JEL for UGEE's. We derive the estimating equations for finding MELE's and establish their asymptotic normality. We obtain easy MELE's which have less computational burden than the usual MELE's and can be easily implemented using existing software. We investigate the use of side information of the data to improve efficiency. We show that the MELE's are fully efficient, and …
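The jackknife step underlying JEL can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the thesis's estimator: it only computes jackknife pseudo-values for a degree-two U-statistic, to which an empirical likelihood would then be applied as if they were independent observations. The function names and the choice of the sample-variance kernel are hypothetical.

```python
from itertools import combinations


def u_statistic(sample, kernel):
    """Average of a symmetric two-argument kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def jackknife_pseudo_values(sample, kernel):
    """Pseudo-values V_i = n*U_n - (n-1)*U_{n-1}^{(-i)} for a list of n >= 3 points."""
    n = len(sample)
    full = u_statistic(sample, kernel)
    pseudo = []
    for i in range(n):
        # Recompute the U-statistic with the i-th observation left out.
        leave_one_out = sample[:i] + sample[i + 1:]
        pseudo.append(n * full - (n - 1) * u_statistic(leave_one_out, kernel))
    return pseudo


def variance_kernel(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return (a - b) ** 2 / 2
```

For a U-statistic, the mean of the pseudo-values equals the full-sample statistic, which is a convenient sanity check on an implementation.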


Is Metabolism Goal-Directed? Investigating The Validity Of Modeling Biological Systems With Cybernetic Control Via Omic Data, Frank T. Devilbiss Apr 2016

Open Access Dissertations

Cybernetic models are uniquely juxtaposed to other metabolic modeling frameworks in that they describe the time-dependent regulation of cellular reactions in terms of dynamic "metabolic goals." This approach contrasts starkly with purely mechanistic descriptions of metabolic regulation which seek to explain metabolic processes in high resolution — a clearly daunting undertaking. Over a span of three decades, cybernetic models have been used to predict metabolic phenomena ranging from resource consumption in mixed-substrate environments to intracellular reaction fluxes of intricate metabolic networks. While the cybernetic approach has been validated in its utility for the prediction of metabolic phenomena, its central feature, …


A Flexible And Versatile Framework For Statistical Design And Analysis Of Quantitative Mass Spectrometry-Based Proteomic Experiments, Meena Choi Feb 2016

Open Access Dissertations

Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and the throughput of the MS measurements and of the signal processing tools have increased dramatically. However, many existing statistical tools and workflows have not kept pace with the technological development. Therefore, there is a need for flexible statistical tools, which reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results.

We propose a family of linear mixed effects models, and a split-plot view of …