Open Access. Powered by Scholars. Published by Universities.®
- Institution
- COBRA (34)
- Selected Works (3)
- University of Massachusetts Amherst (3)
- Florida International University (2)
- Bryant University (1)
- Florida Institute of Technology (1)
- Minnesota State University, Mankato (1)
- Southern Methodist University (1)
- University of Arkansas, Fayetteville (1)
- University of Central Florida (1)
- University of Kentucky (1)
- University of Nebraska - Lincoln (1)
- University of Nebraska Medical Center (1)
- University of Tennessee, Knoxville (1)
- Virginia Commonwealth University (1)
- Western University (1)
- Keyword
- Prediction (5)
- Classification (4)
- Bootstrap (3)
- Cluster analysis (3)
- Gene expression (3)
- Genetics (3)
- Multiple hypothesis testing (3)
- Adjusted p-value (2)
- Censored data (2)
- Comparative genomic hybridization (2)
- Correlation (2)
- Cross-validation (2)
- Density estimation (2)
- Distance (2)
- High-dimensional Time Series (2)
- High-dimensional inference (2)
- Linear regression (2)
- Loss function (2)
- Microarray (2)
- Model selection (2)
- Multivariate outcome (2)
- Null distribution (2)
- Partitioning (2)
- Permutation (2)
- Poisson (2)
- Power (2)
- Regression trees (2)
- Rejection region (2)
- Resampling (2)
- Statistics (2)
- Publication Year
- Publication
- U.C. Berkeley Division of Biostatistics Working Paper Series (11)
- Harvard University Biostatistics Working Paper Series (9)
- UW Biostatistics Working Paper Series (9)
- COBRA Preprint Series (4)
- Doctoral Dissertations (3)
- FIU Electronic Theses and Dissertations (2)
- Mark Fiecas (2)
- Theses and Dissertations (2)
- All Graduate Theses, Dissertations, and Other Capstone Projects (1)
- Data Science and Data Mining (1)
- Department of Statistics: Dissertations, Theses, and Student Work (1)
- Electronic Thesis and Dissertation Repository (1)
- Honors Projects in Mathematics (1)
- Information Systems Undergraduate Honors Theses (1)
- Johns Hopkins University, Dept. of Biostatistics Working Papers (1)
- Masters Theses (1)
- Statistical Science Theses and Dissertations (1)
- Sunduz Keles (1)
- Theses & Dissertations (1)
- Theses and Dissertations--Statistics (1)
- Publication Type
Articles 1 - 30 of 54
Full-Text Articles in Multivariate Analysis
Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe
Data Science and Data Mining
This project estimates a regression model to predict the superconducting critical temperature from variables extracted from the superconductor's chemical formula. The regression model, combined with stepwise variable selection, yields a good predictive model with low prediction error (MSE). Variables extracted from atomic radius, valence, atomic mass, and thermal conductivity contributed most to the predictive model.
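A forward stepwise selection of the kind described above can be sketched in a few lines. The data, feature layout, and stopping threshold below are synthetic stand-ins for illustration, not the superconductivity dataset or the paper's exact procedure:

```python
import numpy as np

# Sketch of forward stepwise selection by in-sample MSE on synthetic data.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
# Only features 0, 2, and 3 truly drive the response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

def fit_mse(cols):
    """Least-squares fit on the chosen columns; returns in-sample MSE."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid / n)

selected, remaining = [], list(range(p))
best_mse = fit_mse(selected)                      # intercept-only baseline
while remaining:
    scores = {j: fit_mse(selected + [j]) for j in remaining}
    j_best = min(scores, key=scores.get)
    if scores[j_best] >= best_mse - 1e-2:         # stop when no real improvement
        break
    selected.append(j_best)
    remaining.remove(j_best)
    best_mse = scores[j_best]

print(sorted(selected), round(best_mse, 3))
```

The greedy loop adds whichever candidate most reduces the MSE and stops once the improvement falls below a tolerance, which is the essential mechanic of forward stepwise selection.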
The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin
Theses and Dissertations
This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographic variables, which …
Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako
Doctoral Dissertations
This dissertation is in the field of Nonparametric Derivative Estimation using Penalized Splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …
How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar
Information Systems Undergraduate Honors Theses
Since the founding of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over even though many people did not understand its utility. Flash forward 30 years, and we cannot live without our connection to the internet. The early internet of information, with individuals posting blogs for others to read, was known as Web 1.0. As platforms became social, allowing individuals in different areas to communicate and engage with each other, this became known as Web 2.0. As Dr. …
Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang
Doctoral Dissertations
In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses for subsequent explanatory modeling and facilitating the selection of potential variables for subsequent predictive modeling. For multivariate categorical data analysis in particular, it is desirable to use descriptive modeling methods to uncover and summarize the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this setting either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …
Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake
Theses & Dissertations
Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains such as a metropolitan area, district, county, or state. Direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but large sample sizes are not always realistic when considering small geographical regions. Meanwhile, high-dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation that incorporates rich auxiliary information at the individual and area levels. Thus, it is critical …
Theory Of Principal Components For Applications In Exploratory Crime Analysis And Clustering, Daniel Silva
All Graduate Theses, Dissertations, and Other Capstone Projects
The purpose of this paper is to develop the theory of principal components analysis succinctly from the fundamentals of matrix algebra and multivariate statistics. Principal components analysis is sometimes used as a descriptive technique to explain the variance-covariance or correlation structure of a dataset. However, most often, it is used as a dimensionality reduction technique to visualize a high dimensional dataset in a lower dimensional space. Principal components analysis accomplishes this by using the first few principal components, provided that they account for a substantial proportion of variation in the original dataset. In the same way, the first few principal …
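The mechanics described above can be illustrated with a minimal PCA sketch via the covariance eigendecomposition. The dataset here is synthetic, not the crime data analyzed in the paper:

```python
import numpy as np

# Minimal PCA: eigendecomposition of the sample covariance matrix,
# keeping the first few components that explain most of the variance.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))                 # 2 true underlying factors
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)                            # center each variable
cov = Xc.T @ Xc / (len(Xc) - 1)                    # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues, ascending
order = np.argsort(eigvals)[::-1]                  # reorder descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()                # proportion of variance
scores = Xc @ eigvecs[:, :2]                       # project onto first 2 PCs
print(np.round(explained, 3))
```

Because the data were generated from two latent factors, the first two principal components account for nearly all of the variance, which is exactly the situation in which PCA-based dimension reduction is justified.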
Generalized Matrix Decomposition Regression: Estimation And Inference For Two-Way Structured Data, Yue Wang, Ali Shojaie, Tim Randolph, Jing Ma
UW Biostatistics Working Paper Series
Analysis of two-way structured data, i.e., data with structures among both variables and samples, is becoming increasingly common in ecology, biology and neuroscience. Classical dimension-reduction tools, such as the singular value decomposition (SVD), may perform poorly for two-way structured data. The generalized matrix decomposition (GMD, Allen et al., 2014) extends the SVD to two-way structured data and thus constructs singular vectors that account for both structures. While the GMD is a useful dimension-reduction tool for exploratory analysis of two-way structured data, it is unsupervised and cannot be used to assess the association between such data and an outcome of interest. …
Statistical Inference For Networks Of High-Dimensional Point Processes, Xu Wang, Mladen Kolar, Ali Shojaie
UW Biostatistics Working Paper Series
Fueled in part by recent applications in neuroscience, high-dimensional Hawkes processes have become a popular tool for modeling the network of interactions among multivariate point process data. While evaluating the uncertainty of the network estimates is critical in scientific applications, existing methodological and theoretical work has focused only on estimation. To bridge this gap, this paper proposes a high-dimensional statistical inference procedure with theoretical guarantees for multivariate Hawkes processes. Key to this inference procedure is a new concentration inequality on the first- and second-order statistics of integrated stochastic processes, which summarize the entire history of the process. We apply this …
Optimal Design For A Causal Structure, Zaher Kmail
Department of Statistics: Dissertations, Theses, and Student Work
Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.
Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …
Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan
COBRA Preprint Series
One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, which provide insight into the disease process. With rapid developments in high-throughput genomic technologies over the past two decades, the scientific community can monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …
Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane
Statistical Science Theses and Dissertations
If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?
We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
Regression Analysis For Ordinal Outcomes In Matched Study Design: Applications To Alzheimer's Disease Studies, Elizabeth Austin
Masters Theses
Alzheimer's Disease (AD) affected nearly 5.4 million Americans as of 2016 and is the most common form of dementia. The disease is characterized by the presence of neurofibrillary tangles and amyloid plaques [1]. The amount of plaque is measured by Braak stage, post-mortem. AD is known to be positively associated with hypercholesterolemia [16]. As statins are the most widely used cholesterol-lowering drugs, there may be an association between statin use and AD. We hypothesize that those who use statins, specifically lipophilic statins, are more likely to have a low Braak stage in post-mortem analysis.
In order to address this hypothesis, …
On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar
FIU Electronic Theses and Dissertations
Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
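A ridge-penalized Poisson regression of the general kind studied here can be sketched as follows. The simulated correlated design and the Newton-based fitting routine are this sketch's assumptions, not the specific estimators compared in the thesis:

```python
import numpy as np

# Ridge-penalized Poisson regression fitted by Newton's method on a
# synthetic design with two highly correlated predictors.
rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)          # highly correlated pair
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([0.2, 0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

def poisson_ridge(X, y, lam, iters=50):
    """Newton iterations for the ridge-penalized Poisson log-likelihood."""
    beta = np.zeros(X.shape[1])
    P = lam * np.eye(X.shape[1])
    P[0, 0] = 0.0                                   # leave intercept unpenalized
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu) - P @ beta            # penalized score
        hess = X.T @ (mu[:, None] * X) + P          # negative penalized Hessian
        beta = beta + np.linalg.solve(hess, grad)
    return beta

b_ml = poisson_ridge(X, y, lam=0.0)                # maximum likelihood (lam = 0)
b_ridge = poisson_ridge(X, y, lam=5.0)             # shrunk slope coefficients
print(np.round(b_ml, 3), np.round(b_ridge, 3))
```

With correlated predictors the ML slopes are unstable; the penalty shrinks them toward zero, which is the trade-off the MSE/MAPE comparisons in the thesis evaluate.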
Modelling The Common Risk Among Equities Using A New Time Series Model, Jingjia Chu
Electronic Thesis and Dissertation Repository
A new additive structure for a multivariate GARCH model is proposed, in which the dynamic changes of the conditional correlation between stocks are aggregated through a common risk term. The observable sequence is divided into two parts, a common risk term and an individual risk term, each following a GARCH-type structure. The conditional volatility of each stock is the sum of these two conditional variance terms. The conditional volatilities of all stocks can spike together, because a sudden peak in the common volatility signals a system shock.
We provide sufficient conditions for strict stationarity …
Informational Index And Its Applications In High Dimensional Data, Qingcong Yuan
Theses and Dissertations--Statistics
We introduce a new class of measures for testing independence between two random vectors, based on the expected difference of conditional and marginal characteristic functions. By choosing a particular weight function in the class, we propose a new index for measuring independence and study its properties. Two empirical versions are developed; their properties, asymptotics, connections with existing measures, and applications are discussed. Implementation and Monte Carlo results are also presented.
We propose a two-stage sufficient variable selection method based on the new index to deal with large-p, small-n data. The method does not require model specification and especially focuses …
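The proposed index itself is not reproduced here; as a related, well-known characteristic-function-based measure of independence between random vectors, the sample distance covariance of Székely et al. can be sketched as:

```python
import numpy as np

# Sample distance covariance (V-statistic form): zero in expectation only
# under independence, so larger values signal dependence.
rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=(n, 2))
y_dep = x[:, :1] ** 2 + 0.1 * rng.normal(size=(n, 1))   # depends on x nonlinearly
y_ind = rng.normal(size=(n, 1))                          # independent of x

def dcov(x, y):
    """Sample distance covariance via double-centered distance matrices."""
    def centered(a):
        d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=2)
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    return float((A * B).mean()) ** 0.5

print(dcov(x, y_dep), dcov(x, y_ind))
```

Because the statistic detects nonlinear dependence, the quadratic relationship yields a visibly larger value than the independent pair, the same qualitative behavior a characteristic-function-based index targets.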
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, appropriately accounted for all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of five different models to reproduce the …
Dimension Reduction And Variable Selection, Hossein Moradi Rekabdarkolaee
Theses and Dissertations
High-dimensional data are becoming increasingly available as data collection technology advances. Over the last decade, significant developments have been taking place in high-dimensional data analysis, driven primarily by a wide range of applications in many fields such as genomics, signal processing, and environmental studies. Statistical techniques such as dimension reduction and variable selection play important roles in high dimensional data analysis. Sufficient dimension reduction provides a way to find the reduced space of the original space without a parametric model. This method has been widely applied in many scientific fields such as genetics, brain imaging analysis, econometrics, environmental sciences, etc. …
Shrinkage Estimation For Multivariate Hidden Markov Mixture Models, Mark Fiecas, Jürgen Franke, Rainer Von Sachs, Joseph Tadjuidje
Mark Fiecas
Bootstrapping Vs. Asymptotic Theory In Property And Casualty Loss Reserving, Andrew J. Difronzo Jr.
Honors Projects in Mathematics
One of the key functions of a property and casualty (P&C) insurance company is loss reserving, which calculates how much money the company should retain in order to pay out future claims. Most P&C insurance companies use non-stochastic (non-random) methods to estimate these future liabilities. However, future loss data can also be projected using generalized linear models (GLMs) and stochastic simulation. Two simulation methods that will be the focus of this project are: bootstrapping methodology, which resamples the original loss data (creating pseudo-data in the process) and fits the GLM parameters based on the new data to estimate the sampling …
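The contrast between the two approaches can be illustrated on toy loss data; the figures below are invented for the sketch, not actual P&C losses or the project's GLM-based reserving procedure:

```python
import random
import statistics

# Compare a normal-theory (asymptotic) interval with a bootstrap
# percentile interval for the mean of skewed loss data.
random.seed(3)
losses = [random.expovariate(1 / 1000) for _ in range(200)]  # skewed losses

mean = statistics.fmean(losses)
se = statistics.stdev(losses) / len(losses) ** 0.5
asymptotic = (mean - 1.96 * se, mean + 1.96 * se)            # normal theory

boot_means = []
for _ in range(2000):
    sample = random.choices(losses, k=len(losses))           # resample with replacement
    boot_means.append(statistics.fmean(sample))              # pseudo-data statistic
boot_means.sort()
bootstrap = (boot_means[49], boot_means[1949])               # ~2.5% / 97.5% percentiles

print(asymptotic, bootstrap)
```

For skewed losses the bootstrap interval can be asymmetric around the mean while the asymptotic interval is symmetric by construction, which is exactly the kind of difference a reserving comparison would examine.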
Spectral Density Shrinkage For High-Dimensional Time Series, Mark Fiecas, Rainer Von Sachs
Mark Fiecas
A Study On The Correlation Of Bivariate And Trivariate Normal Models, Maria Del Pilar Orjuela
FIU Electronic Theses and Dissertations
Suppose two or more variables are jointly normally distributed. If there is a common relationship between these variables, it is important to quantify it by a parameter called the correlation coefficient, which measures its strength; this parameter can be used to develop a predictive equation and, ultimately, to draw testable conclusions about the parent population.
This research focused on the correlation coefficient ρ for the bivariate and trivariate normal distributions when equal variances and equal covariances are assumed. In particular, we derived the maximum likelihood estimators (MLEs) of the distribution parameters, assuming all of them are unknown, and …
Multiple Testing Of Local Maxima For Detection Of Peaks In Chip-Seq Data, Armin Schwartzman, Andrew Jaffe, Yulia Gavrilov, Clifford A. Meyer
Harvard University Biostatistics Working Paper Series
No abstract provided.
On The Covariate-Adjusted Estimation For An Overall Treatment Difference With Data From A Randomized Comparative Clinical Trial, Lu Tian, Tianxi Cai, Lihui Zhao, L. J. Wei
Harvard University Biostatistics Working Paper Series
No abstract provided.
A Unified Approach To Non-Negative Matrix Factorization And Probabilistic Latent Semantic Indexing, Karthik Devarajan, Guoli Wang, Nader Ebrahimi
COBRA Preprint Series
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two matrices, W and H, each with nonnegative entries, V ~ WH. NMF has been shown to yield a unique parts-based, sparse representation of the data. The nonnegativity constraints in NMF allow only additive combinations of the data, which enables it to learn parts that have distinct physical representations in reality. In the last few years, NMF has been successfully applied in a variety of areas such as natural language processing, information retrieval, image processing, speech recognition …
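The multiplicative-updates algorithm mentioned above, in its standard Lee-Seung form for the Frobenius objective, can be sketched on a synthetic nonnegative matrix (dimensions and data are invented for illustration):

```python
import numpy as np

# Lee-Seung multiplicative updates for NMF: V ~ W @ H with all entries
# nonnegative; each update multiplies by a nonnegative ratio, so
# nonnegativity is preserved automatically.
rng = np.random.default_rng(4)
r = 3
V = rng.random((20, 3)) @ rng.random((3, 15))       # nonnegative rank-3 matrix
W = rng.random((20, r)) + 0.1
H = rng.random((r, 15)) + 0.1
eps = 1e-9                                          # guard against divide-by-zero

for _ in range(1000):
    H *= (W.T @ V) / (W.T @ W @ H + eps)            # update H
    W *= (V @ H.T) / (W @ H @ H.T + eps)            # update W

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(round(err, 4))
```

Since V was built as an exact nonnegative rank-3 product, the relative reconstruction error drops close to zero, while W and H stay entrywise nonnegative throughout.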
Multiple Testing Of Local Maxima For Detection Of Unimodal Peaks In 1d, Armin Schwartzman, Yulia Gavrilov, Robert J. Adler
Harvard University Biostatistics Working Paper Series
No abstract provided.
Component Extraction Of Complex Biomedical Signal And Performance Analysis Based On Different Algorithm, Hemant Pasusangai Kasturiwale
Johns Hopkins University, Dept. of Biostatistics Working Papers
Biomedical signals can arise from one or many sources, including the heart, brain, and endocrine systems. Multiple sources pose a challenge to researchers, since the signals may be contaminated with artifacts and noise. Biomedical time series signals include the electroencephalogram (EEG) and electrocardiogram (ECG). The morphology of the cardiac signal is very important in most ECG-based diagnostics. Diagnosis based on visual observation of recorded ECG or EEG traces may not be accurate. To achieve better understanding, PCA (Principal Component Analysis) and ICA (Independent Component Analysis) algorithms help in analyzing ECG signals. The immense scope in the field of biomedical signal processing for Independent Component Analysis …
Mixture Of Factor Analyzers With Information Criteria And The Genetic Algorithm, Esra Turan
Doctoral Dissertations
In this dissertation, we have developed and combined several statistical techniques in Bayesian factor analysis (BAYFA) and mixture of factor analyzers (MFA) to overcome the shortcomings of these existing methods. Information criteria are brought into the context of the BAYFA model as a decision rule for choosing the number of factors m, along with the Press and Shigemasu method, Gibbs sampling, and Iterated Conditional Modes deterministic optimization. Because of the sensitivity of BAYFA to prior information on the factor pattern structure, the prior factor pattern structure is learned directly and adaptively from the given sample observations using the Sparse Root algorithm. …
A Statistical Framework For The Analysis Of Chip-Seq Data, Pei Fen Kuan, Dongjun Chung, Guangjin Pan, James A. Thomson, Ron Stewart, Sunduz Keles
Sunduz Keles
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data.
We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and …
The Effect Of Correlation In False Discovery Rate Estimation, Armin Schwartzman, Xihong Lin
Harvard University Biostatistics Working Paper Series
No abstract provided.