Open Access. Powered by Scholars. Published by Universities.®

Multivariate Analysis Commons



Articles 1 - 30 of 54

Full-Text Articles in Multivariate Analysis

Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024


Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature from variables extracted from the superconductor's chemical formula. Combined with stepwise variable selection, the regression model yields a good predictive model with low prediction error (MSE). Variables extracted from atomic radius, valence, atomic mass, and thermal conductivity contributed most to the predictive model.
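The fit-then-select workflow the abstract describes can be sketched with ordinary least squares and forward stepwise selection scored by validation MSE. This is a minimal illustration on synthetic data; the feature setup is made up and does not use the thesis's superconductivity variables.

```python
# Hypothetical sketch: forward stepwise selection for OLS, scored by
# validation MSE. The data are synthetic; only features 0 and 3 matter.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=n)

X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

def mse_for(cols):
    """Fit OLS on the training split using columns `cols`; return validation MSE."""
    A = np.column_stack([np.ones(len(X_tr))] + [X_tr[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    B = np.column_stack([np.ones(len(X_va))] + [X_va[:, c] for c in cols])
    return float(np.mean((y_va - B @ beta) ** 2))

selected, best = [], mse_for([])
while True:
    candidates = [c for c in range(p) if c not in selected]
    if not candidates:
        break
    scores = {c: mse_for(selected + [c]) for c in candidates}
    c_best = min(scores, key=scores.get)
    if scores[c_best] >= best:      # stop when no candidate improves MSE
        break
    selected.append(c_best)
    best = scores[c_best]

print(sorted(selected), round(best, 4))
```

With this setup the truly informative features are recovered and the validation MSE drops to roughly the noise level.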


The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin Dec 2023


Theses and Dissertations

This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographic variables, which …


Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023


Doctoral Dissertations

This dissertation is in the field of Nonparametric Derivative Estimation using Penalized Splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. (2000) showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022


Information Systems Undergraduate Honors Theses

Since the advent of computers, data scientists have engineered devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over, though many people did not understand its utility. Flash forward 30 years, and we cannot live without our connection to the internet. Early adopters called this the internet of information: individuals posted blogs for others to read, in what became known as Web 1.0. As the web progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this became known as Web 2.0. As Dr. …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021


Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses for subsequent explanatory modeling and facilitating the selection of potential variables for subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods either rely on strong parametric assumptions or become infeasible when the data dimension is high. To this end, we propose a model-free method …
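A classical, assumption-light way to summarize association between two categorical variables is Cramér's V, computed from a contingency table. This is a standard baseline shown for orientation only, not the model-free method the dissertation proposes; the table values are made up.

```python
# Cramér's V from a 2x2 contingency table (illustrative counts).
import math

table = [
    [30, 10],   # rows: levels of variable A
    [10, 30],   # columns: levels of variable B
]
n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]

# Pearson chi-square statistic against the independence model.
chi2 = sum(
    (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
    for i in range(len(table)) for j in range(len(table[0]))
)
k = min(len(table), len(table[0]))
v = math.sqrt(chi2 / (n * (k - 1)))   # Cramér's V, in [0, 1]
print(round(v, 3))                    # → 0.5
```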


Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake Dec 2020


Theses & Dissertations

Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …


Theory Of Principal Components For Applications In Exploratory Crime Analysis And Clustering, Daniel Silva Jan 2020


All Graduate Theses, Dissertations, and Other Capstone Projects

The purpose of this paper is to develop the theory of principal components analysis succinctly from the fundamentals of matrix algebra and multivariate statistics. Principal components analysis is sometimes used as a descriptive technique to explain the variance-covariance or correlation structure of a dataset. However, most often, it is used as a dimensionality reduction technique to visualize a high dimensional dataset in a lower dimensional space. Principal components analysis accomplishes this by using the first few principal components, provided that they account for a substantial proportion of variation in the original dataset. In the same way, the first few principal …
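The dimension-reduction use of principal components described above can be sketched in a few lines: center the data, eigendecompose the sample covariance, and keep the leading components. The synthetic data below are illustrative.

```python
# Minimal PCA via eigendecomposition of the sample covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
# 300 observations in 3 dimensions; most variance lies along one direction.
z = rng.normal(size=300)
X = np.column_stack([z, 0.5 * z + 0.05 * rng.normal(size=300),
                     0.05 * rng.normal(size=300)])

Xc = X - X.mean(axis=0)                    # center the data
S = np.cov(Xc, rowvar=False)               # sample covariance matrix
vals, vecs = np.linalg.eigh(S)             # eigenvalues in ascending order
order = np.argsort(vals)[::-1]             # sort descending
vals, vecs = vals[order], vecs[:, order]

explained = vals / vals.sum()              # proportion of variance explained
scores = Xc @ vecs[:, :1]                  # project onto the first PC
print(round(float(explained[0]), 3))
```

Here the first principal component alone accounts for nearly all of the variance, which is exactly the situation in which a low-dimensional projection is a faithful summary.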


Generalized Matrix Decomposition Regression: Estimation And Inference For Two-Way Structured Data, Yue Wang, Ali Shojaie, Tim Randolph, Jing Ma Dec 2019


UW Biostatistics Working Paper Series

Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. …


Statistical Inference For Networks Of High-Dimensional Point Processes, Xu Wang, Mladen Kolar, Ali Shojaie Dec 2019


UW Biostatistics Working Paper Series

Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics for integrated stochastic processes, which summarizes the entire history of the process. We apply this …


Optimal Design For A Causal Structure, Zaher Kmail Aug 2019


Department of Statistics: Dissertations, Theses, and Student Work

Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019


COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes that provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019


Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
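The intransitive (rock-scissors-paper) relationships discussed above can be counted directly in a round-robin tournament by checking every triple of items, in the spirit of Kendall and Smith's (1940) circular triads. The toy results below are hypothetical.

```python
# Counting cyclic (intransitive) triads in a round-robin tournament.
from itertools import combinations

# Hypothetical 4-team round robin containing one cycle: 0 beats 1,
# 1 beats 2, 2 beats 0; team 3 loses to everyone.
results = {(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)}
n = 4

def beats(i, j):
    return (i, j) in results

cyclic = 0
for a, b, c in combinations(range(n), 3):
    # A triad is cyclic iff each item beats exactly one of the other two.
    wins = [beats(a, b) + beats(a, c),
            beats(b, a) + beats(b, c),
            beats(c, a) + beats(c, b)]
    if wins == [1, 1, 1]:
        cyclic += 1
print(cyclic)   # → 1
```

A transitive tournament has zero cyclic triads; each cycle like Warriors-Rockets-Spurs adds one.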


Regression Analysis For Ordinal Outcomes In Matched Study Design: Applications To Alzheimer's Disease Studies, Elizabeth Austin Jul 2018


Masters Theses

Alzheimer's Disease (AD) affects nearly 5.4 million Americans as of 2016 and is the most common form of dementia. The disease is characterized by the presence of neurofibrillary tangles and amyloid plaques [1]. The amount of plaque is measured post-mortem by Braak stage. It is known that AD is positively associated with hypercholesterolemia [16]. As statins are the most widely used cholesterol-lowering drugs, there may be associations between statin use and AD. We hypothesize that those who use statins, specifically lipophilic statins, are more likely to have a low Braak stage in post-mortem analysis.

In order to address this hypothesis, …


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018


FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
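The stabilizing effect of the ridge penalty under collinearity can be seen in the linear closed form (XᵀX + λI)⁻¹Xᵀy. This is only the linear-model illustration of the idea; the thesis studies ridge estimators for Poisson regression, which have no such closed form. The data below are synthetic.

```python
# Ridge regression under near-collinearity (linear closed form, illustrative).
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimator; lam = 0 recovers least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)
b_ridge = ridge(X, y, 1.0)

# Collinearity makes the individual OLS coefficients unstable; the ridge
# penalty pulls the estimates back toward the well-identified direction.
print(np.round(b_ols, 2), np.round(b_ridge, 2))
```

Only the sum of the two coefficients is well identified here; ridge shrinks the poorly identified difference direction hard, so both estimates land near 1.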


Modelling The Common Risk Among Equities Using A New Time Series Model, Jingjia Chu Feb 2018


Electronic Thesis and Dissertation Repository

A new additive-structure multivariate GARCH model is proposed in which the dynamic changes in the conditional correlation between stocks are aggregated through a common risk term. The observable sequence is divided into two parts, a common risk term and an individual risk term, each following a GARCH-type structure. The conditional volatility of each stock is the sum of these two conditional variance terms. The conditional volatilities of all stocks can spike together, because a sudden peak in the common volatility signals a system-wide shock.

We provide sufficient conditions for strict stationarity …
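The building block of the additive structure above is a univariate GARCH-type recursion. A minimal GARCH(1,1) simulation, with illustrative parameters chosen to satisfy the usual stationarity condition α + β < 1, looks like this:

```python
# Simulating a univariate GARCH(1,1) volatility path (illustrative parameters).
import random

random.seed(0)
omega, alpha, beta = 0.05, 0.1, 0.85       # alpha + beta < 1: stationary
sigma2 = omega / (1 - alpha - beta)        # start at the unconditional variance
returns, variances = [], []
for _ in range(1000):
    eps = random.gauss(0.0, 1.0)
    r = sigma2 ** 0.5 * eps                # return given current volatility
    variances.append(sigma2)
    returns.append(r)
    sigma2 = omega + alpha * r * r + beta * sigma2   # GARCH(1,1) update

mean_var = sum(variances) / len(variances)
print(round(mean_var, 2))  # hovers near omega / (1 - alpha - beta) = 1
```

In the model described above, each stock's conditional variance would be the sum of two such recursions, one shared (common risk) and one stock-specific.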


Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan Jan 2017


Theses and Dissertations--Statistics

We introduce a new class of measures for testing independence between two random vectors, based on the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed; their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented.

We propose a two-stage sufficient variable selection method based on the new index to deal with large-p-small-n data. The method does not require model specification and especially focuses …
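A well-known member of the same characteristic-function family of dependence measures is distance covariance (Székely et al.), sketched here as a stand-in for the proposed index: its sample version is zero (up to sampling error) exactly when the two variables are independent, and it detects nonlinear dependence that correlation misses.

```python
# Sample (squared) distance covariance via double-centered distance matrices.
import numpy as np

def dcov2(x, y):
    """Squared sample distance covariance of 1-d samples x and y."""
    a = np.abs(x[:, None] - x[None, :])    # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each matrix (subtract row/column means, add grand mean).
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return float((A * B).mean())

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y_dep = x ** 2 + 0.1 * rng.normal(size=500)   # dependent yet uncorrelated with x
y_ind = rng.normal(size=500)                  # independent of x

print(round(dcov2(x, y_dep), 3), round(dcov2(x, y_ind), 4))
```

The dependent pair yields a clearly larger value than the independent pair even though the Pearson correlation of x and x² is essentially zero.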


Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016


UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


Dimension Reduction And Variable Selection, Hossein Moradi Rekabdarkolaee Jan 2016


Theses and Dissertations

High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have been taking place in high-dimensional data analysis, driven primarily by a wide range of applications in many fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high dimensional data analysis. Sufficient dimension reduction provides a way to find the reduced space of the original space without a parametric model. This method has been widely applied in many scientific fields such as genetics, brain imaging analysis, econometrics, environmental sciences, etc. …


Shrinkage Estimation For Multivariate Hidden Markov Mixture Models, Mark Fiecas, Jürgen Franke, Rainer Von Sachs, Joseph Tadjuidje Dec 2015


Mark Fiecas

Motivated by a market environment that changes over time, we consider high-dimensional data, such as financial returns, generated by a hidden Markov model that allows switching between different regimes or states. To obtain more stable estimates of the covariance matrices of the different states, potentially driven by a number of observations that is small compared to the dimension, we apply shrinkage and combine it with an EM-type algorithm. This approach yields more stable estimates of the covariance matrices, which allows for improved reconstruction of the hidden Markov chain. In addition to a simulation study and the …


Bootstrapping Vs. Asymptotic Theory In Property And Casualty Loss Reserving, Andrew J. Difronzo Jr. Apr 2015


Honors Projects in Mathematics

One of the key functions of a property and casualty (P&C) insurance company is loss reserving, which calculates how much money the company should retain in order to pay out future claims. Most P&C insurance companies use non-stochastic (non-random) methods to estimate these future liabilities. However, future loss data can also be projected using generalized linear models (GLMs) and stochastic simulation. Two simulation methods that will be the focus of this project are: bootstrapping methodology, which resamples the original loss data (creating pseudo-data in the process) and fits the GLM parameters based on the new data to estimate the sampling …
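The bootstrap idea the project applies can be sketched in its simplest form: resample the observed losses with replacement to build pseudo-data, and use the resampled statistics to approximate a sampling distribution. The project's actual method bootstraps GLM residuals on a loss triangle; the loss figures below are made up, and the statistic here is just the mean loss.

```python
# Minimal bootstrap: percentile interval for the mean of hypothetical losses.
import random

random.seed(1)
losses = [120, 95, 210, 150, 80, 300, 175, 130, 90, 250]
n_boot = 2000

boot_means = []
for _ in range(n_boot):
    # Resample with replacement to create pseudo-data, then recompute.
    resample = [random.choice(losses) for _ in losses]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lo = boot_means[int(0.025 * n_boot)]       # 2.5th percentile
hi = boot_means[int(0.975 * n_boot)]       # 97.5th percentile
print(round(lo, 1), round(hi, 1))          # a 95% percentile interval
```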


Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs Dec 2013


Mark Fiecas

Time series data obtained from neurophysiological signals are often high-dimensional, and the length of the time series is often short relative to the number of dimensions. Thus, it is difficult or sometimes impossible to compute statistics that are based on the spectral density matrix because these matrices are numerically unstable. In this work, we discuss the importance of regularization for spectral analysis of high-dimensional time series and propose shrinkage estimation for estimating high-dimensional spectral density matrices. The shrinkage estimator is derived from a penalized log-likelihood, and the optimal penalty parameter has a closed-form solution, which can be estimated using the …
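Shrinkage in its simplest form pulls an ill-conditioned sample matrix toward a scaled identity target, (1 − w)S + w·μI. The paper does this for spectral density matrices with a data-driven penalty; the sketch below applies the same idea to a sample covariance matrix with a fixed, illustrative weight.

```python
# Linear shrinkage of a sample covariance toward a scaled identity target.
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 15                              # few samples relative to dimension
X = rng.normal(size=(n, p))

S = np.cov(X, rowvar=False)                # ill-conditioned sample covariance
mu = np.trace(S) / p                       # average eigenvalue: target scale
w = 0.5                                    # illustrative shrinkage weight
S_shrunk = (1 - w) * S + w * mu * np.eye(p)

# Shrinkage bounds the eigenvalues away from zero, stabilizing inversion.
print(round(float(np.linalg.eigvalsh(S).min()), 4),
      round(float(np.linalg.eigvalsh(S_shrunk).min()), 4))
```

Because S and S_shrunk share eigenvectors here, each eigenvalue is simply averaged with μ, which lifts the smallest ones.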


A Study On The Correlation Of Bivariate And Trivariate Normal Models, Maria Del Pilar Orjuela Nov 2013


FIU Electronic Theses and Dissertations

Suppose two or more variables are jointly normally distributed. If there is a common relationship between these variables, it is important to quantify it by a parameter called the correlation coefficient, which measures its strength; this parameter can be used to develop a predictive equation and, ultimately, to draw testable conclusions about the parent population.

This research focused on the correlation coefficient ρ for the bivariate and trivariate normal distributions when equal variances and equal covariances are assumed. In particular, we derived the Maximum Likelihood Estimators (MLEs) of the distribution parameters, assuming all of them are unknown, and …


Multiple Testing Of Local Maxima For Detection Of Peaks In Chip-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer Aug 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei Jul 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi Jul 2011


COBRA Preprint Series

Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to have a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
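The multiplicative updates algorithm mentioned above alternately rescales W and H so that the Frobenius reconstruction error ‖V − WH‖ never increases while both factors stay nonnegative (Lee and Seung's updates). The matrix sizes and rank below are illustrative.

```python
# NMF by multiplicative updates: V ~ WH with W, H >= 0 elementwise.
import numpy as np

rng = np.random.default_rng(5)
V = rng.random((30, 20))                   # nonnegative data matrix
r = 5                                      # chosen factorization rank
W = rng.random((30, r))
H = rng.random((r, 20))

eps = 1e-9                                 # guard against division by zero
err_start = np.linalg.norm(V - W @ H)
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for H
    W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
err_end = np.linalg.norm(V - W @ H)

print(err_start > err_end, bool((W >= 0).all() and (H >= 0).all()))
```

Because the updates only multiply by nonnegative ratios, nonnegativity is preserved automatically, which is what gives NMF its additive, parts-based character.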


Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1d, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler Jul 2011


Harvard University Biostatistics Working Paper Series

No abstract provided.


Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale Jun 2011


Johns Hopkins University, Dept. of Biostatistics Working Papers

Biomedical signals can arise from one or many sources, including the heart, brain, and endocrine systems. Multiple sources pose a challenge to researchers, as the signals may be contaminated with artifacts and noise. Biomedical time series signals include the electroencephalogram (EEG), electrocardiogram (ECG), etc. The morphology of the cardiac signal is very important in most ECG-based diagnostics. Diagnosis based on visual observation of recorded ECG, EEG, etc., may not be accurate. To achieve better understanding, PCA (Principal Component Analysis) and ICA algorithms help in analyzing ECG signals. The immense scope in the field of biomedical signal processing, Independent Component Analysis ( …


Mixture Of Factor Analyzers With Information Criteria And The Genetic Algorithm, Esra Turan Aug 2010


Doctoral Dissertations

In this dissertation, we have developed and combined several statistical techniques in Bayesian factor analysis (BAYFA) and mixture of factor analyzers (MFA) to overcome the shortcomings of these existing methods. Information criteria are brought into the context of the BAYFA model as a decision rule for choosing the number of factors m, along with the Press and Shigemasu method, Gibbs sampling, and Iterated Conditional Modes deterministic optimization. Because of the sensitivity of BAYFA to prior information on the factor pattern structure, the prior factor pattern structure is learned directly from the given sample observations adaptively using the Sparse Root algorithm. …


A Statistical Framework For The Analysis Of Chip-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles Nov 2009


Sunduz Keles

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide studies of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.

We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …


The Effect Of Correlation In False Discovery Rate Estimation, Armin Schwartzman, Xihong Lin Jul 2009


Harvard University Biostatistics Working Paper Series

No abstract provided.