Testing Hypotheses Of Covariance Structure In Multivariate Data, 2018 NOVA University of Lisbon
Testing Hypotheses Of Covariance Structure In Multivariate Data, Miguel Fonseca, Arkadiusz Koziol, Roman Zmyslony
Electronic Journal of Linear Algebra
In this paper, a new approach is given for testing hypotheses on the structure of covariance matrices in double multivariate data. It is proved that the ratio of the positive and negative parts of the best unbiased estimator (BUE) provides an F-test for independence of block variables in double multivariate models.
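As a hedged illustration of the test described above (a sketch under assumptions standard in this literature, not the paper's exact derivation): if the BUE of the relevant covariance parameter splits into independent positive and negative parts that are scaled chi-square under the null, their normalized ratio is F-distributed.

```latex
% Sketch, assuming the BUE decomposes as \hat{\theta} = \hat{\theta}^{+} - \hat{\theta}^{-}
% with the parts independent and, under H_0 (independence of blocks),
% \hat{\theta}^{+} \sim c\,\chi^2_{f_1} and \hat{\theta}^{-} \sim c\,\chi^2_{f_2}:
F = \frac{\hat{\theta}^{+}/f_1}{\hat{\theta}^{-}/f_2} \sim F_{f_1,\, f_2}
\quad \text{under } H_0 .
```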
A Comparison Of R, Sas, And Python Implementations Of Random Forests, 2018 Utah State University
A Comparison Of R, Sas, And Python Implementations Of Random Forests, Breckell Soifua
All Graduate Plan B and other Reports
The Random Forest method is a useful machine learning tool developed by Leo Breiman. There are many existing implementations across different programming languages, the most popular of which exist in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. This comparison was done on a variety of real and simulated data with different classification difficulty levels, numbers of predictors, and sample sizes. The comparison shows unexpectedly different results between the three implementations.
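As a companion note to the abstract above: a minimal sketch of one arm of such a comparison, timing and scoring scikit-learn's RandomForestClassifier in Python on simulated data. The dataset, parameter choices, and metrics here are illustrative assumptions; the R and SAS runs in the paper follow the same pattern but are not reproduced.

```python
# Minimal sketch: accuracy, variable importance, and timing for the
# Python (scikit-learn) random forest; a stand-in for the paper's
# fuller R / SAS / Python comparison.
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Simulated data standing in for the paper's real and simulated sets.
X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_tr, y_tr)
elapsed = time.perf_counter() - start

print(f"test accuracy: {rf.score(X_te, y_te):.3f}")
print(f"fit time (s):  {elapsed:.2f}")
print("most important features:", np.argsort(rf.feature_importances_)[-3:])
```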
Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, 2018 Southern Methodist University
Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie McGee, Andrew Hardin
SMU Data Science Review
In this paper we introduce the technical components, the biology, and the data science involved in the use of microarray technology in biological and clinical research. We discuss how the laborious experimental protocols involved in obtaining these data could benefit from simulations of the data. We discuss the approach used in the simulation engine. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted ...
Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And Sas Performance, 2018 Southern Methodist University
Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And Sas Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis
SMU Data Science Review
A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analysis along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.
Combining Academics And Social Engagement: A Major-Specific Early Alert Method To Counter Student Attrition In Science, Technology, Engineering, And Mathematics, Andrew J. Sage, Cinzia Cervato, Ulrike Genschel, Craig Ogilvie
Geological and Atmospheric Sciences Publications
Students are most likely to leave science, technology, engineering, and mathematics (STEM) majors during their first year of college. We developed an analytic approach using random forests to identify at-risk students. This method is deployable midway through the first semester and accounts for academic preparation, early engagement in university life, and performance on midterm exams. By accounting for cognitive and noncognitive factors, our method achieves stronger predictive performance than would be possible using cognitive or noncognitive factors alone. We show that it is more difficult to predict whether students will leave STEM than whether they will leave the institution. More ...
Improving Shewhart Control Chart Performance In The Presence Of Measurement Error Using Multiple Measurements And Two-Stage Sampling, 2018 Auburn University Montgomery
Improving Shewhart Control Chart Performance In The Presence Of Measurement Error Using Multiple Measurements And Two-Stage Sampling, Kenneth W. Linna
Journal of International & Interdisciplinary Business Research
The usual Shewhart control chart efficiently detects large shifts in the mean of a quality characteristic and has been extensively studied in the literature. Most proposed alternatives to the Shewhart chart aim either to improve signal performance for smaller mean shifts or to reduce the sampling effort required to detect a larger shift. Measurement error has been shown in the literature to reduce the power to detect process shifts. The combination of multiple measurements and two-stage sampling is considered here as a strategy both for regaining power lost due to measurement error and for specifically tuning the charts for shifts ...
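A companion sketch of the textbook mechanism behind the abstract above: measurement error inflates the observed variance from sigma_p^2 to sigma_p^2 + sigma_m^2, and averaging k repeated measurements per unit shrinks the error term to sigma_m^2 / k, partially restoring the chart's power. The function below computes the standard 3-sigma Shewhart signal probability under that model; the specific two-stage scheme studied in the paper is not reproduced.

```python
# Sketch of the standard result behind the paper: averaging k repeated
# measurements per unit recovers detection power lost to measurement
# error. The 3-sigma Shewhart signal probability is textbook.
from math import sqrt

from scipy.stats import norm

def signal_prob(delta, n, sigma_p=1.0, sigma_m=0.5, k=1):
    """P(subgroup mean falls outside 3-sigma limits) after a shift of
    delta process standard deviations, with k measurements per unit."""
    sigma_obs = sqrt(sigma_p**2 + sigma_m**2 / k)
    z = delta * sigma_p * sqrt(n) / sigma_obs   # standardized shift
    return norm.sf(3 - z) + norm.cdf(-3 - z)

for k in (1, 2, 5):
    print(f"k={k}: power = {signal_prob(delta=1.0, n=5, k=k):.3f}")
```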
An Empirical Analysis Of Climatic, Geographic, And Cultural Determinants Of International Tourism, Ethan Straus
Each year, billions of people visit different countries all around the world. For many of those countries, tourism is their primary industry, providing millions of jobs and dollars in revenue. It is expected that by 2020 total International Tourism Receipts will reach 2 trillion US dollars annually. Currently, tourism employs an estimated 200 million people around the world. With the continued progression of climate change, the tourism industry is facing a newfound threat. Global temperatures and the sea level are both expected to rise significantly by the end of the century. Additionally, the Intergovernmental Panel on Climate Change has ...
Inversion Copulas From Nonlinear State Space Models With An Application To Inflation Forecasting, 2018 Melbourne Business School
Inversion Copulas From Nonlinear State Space Models With An Application To Inflation Forecasting, Michael S. Smith, Worapree Ole Maneesoonthorn
Michael Stanley Smith
Spatio-Temporal Dynamics Of Atlantic Cod Bycatch In The Maine Lobster Fishery And Its Impacts On Stock Assessment, Robert E. Boenish
Electronic Theses and Dissertations
Among the most iconic fish species in the world, the Atlantic cod (Gadus morhua, hereafter cod) has been a mainstay in the North Atlantic for centuries. While many global fish stocks have come under increased pressure with the advent of new, more efficient fishing technology in the mid-20th century, exceptional pressure has been placed on this prized gadoid. Bycatch, the unintended catch of organisms, is one of the biggest issues in global fisheries. Following the failed recovery of cod in the Gulf of Maine (GoM), attention has turned to possible sources of unaccounted catch. Among the most prominent is ...
Discrete Ranked Set Sampling, 2018 Southern Methodist University
Discrete Ranked Set Sampling, Heng Cui
Statistical Science Theses and Dissertations
Ranked set sampling (RSS) is an efficient data collection framework compared to simple random sampling (SRS). It is widely used in various application areas such as agriculture, the environment, sociology, and medicine, especially in situations where measurement is expensive but ranking is less costly. Most past research in RSS has focused on situations where the underlying distribution is continuous. However, it is not unusual to have a discrete data generation mechanism. Estimating statistical functionals is challenging, as ties may truly exist in discrete RSS. In this thesis, we started with estimating the cumulative distribution function (CDF) in discrete RSS. We proposed two ...
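As an illustrative companion to the abstract above, a minimal sketch of drawing a ranked set sample from a discrete parent (Poisson, chosen here as an assumption) and evaluating the naive plug-in CDF estimator; the thesis's proposed estimators are not reproduced.

```python
# Sketch: one cycle of ranked set sampling uses k sets of k units;
# the i-th judgment order statistic is taken from the i-th set.
# With a discrete parent (here Poisson), ties can genuinely occur.
import numpy as np

rng = np.random.default_rng(0)

def rss_sample(k=4, cycles=50):
    out = []
    for _ in range(cycles):
        for i in range(k):
            row = np.sort(rng.poisson(lam=3.0, size=k))
            out.append(row[i])          # i-th order statistic
    return np.asarray(out)

sample = rss_sample()
# Naive plug-in estimate of the CDF at t = 3:
print("F_hat(3) =", np.mean(sample <= 3))
```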
Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, 2018 Stephen F Austin State University
Evaluation Of Using The Bootstrap Procedure To Estimate The Population Variance, Nghia Trong Nguyen
Electronic Theses and Dissertations
The bootstrap procedure is widely used in nonparametric statistics to generate an empirical sampling distribution for a statistic of interest from a given sample data set. Generally, the results are good for location parameters such as the population mean and median, and even for estimating a population correlation. However, the results for a population variance, which is a spread parameter, are not as good due to the resampling nature of the bootstrap method. Bootstrap samples are constructed using sampling with replacement; consequently, groups of observations with zero variance manifest in these samples. As a result, a bootstrap variance estimator will carry a ...
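A short companion sketch demonstrating the downward bias described above, using a plain nonparametric bootstrap of the variance on simulated data (sample size and distribution are illustrative choices).

```python
# Sketch: the plug-in variance of a bootstrap resample is biased low,
# since resampling with replacement duplicates observations.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=2.0, size=30)   # true variance = 4

B = 5000
boot_vars = [np.var(rng.choice(x, size=x.size, replace=True))
             for _ in range(B)]

print("sample variance (ddof=1):", np.var(x, ddof=1))
print("mean bootstrap variance: ", np.mean(boot_vars))  # tends lower
```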
Analysis Challenges For High Dimensional Data, 2018 The University of Western Ontario
Analysis Challenges For High Dimensional Data, Bangxin Zhao
Electronic Thesis and Dissertation Repository
In this thesis, we propose new methodologies targeting the areas of high-dimensional variable screening, influence measures, and post-selection inference. We propose a new estimator for the correlation between the response and high-dimensional predictor variables, and based on this estimator we develop a new screening technique, termed Dynamic Tilted Current Correlation Screening (DTCCS), for high-dimensional variable screening. DTCCS is capable of picking up the relevant predictor variables within a finite number of steps. The DTCCS method takes the popularly used sure independence screening (SIS) method and the high-dimensional ordinary least squares projection (HOLP) approach as its special cases.
Two methods ...
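Since the abstract names sure independence screening (SIS) as a special case, a minimal sketch of SIS may help fix ideas: rank predictors by absolute marginal correlation with the response and keep the top d. The data, dimensions, and cutoff below are illustrative assumptions; DTCCS itself is not reproduced.

```python
# Sketch of sure independence screening (SIS): keep the d predictors
# with the largest absolute marginal correlation with the response.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                               # 5 truly relevant predictors
y = X @ beta + rng.standard_normal(n)

# Marginal correlations between each column of X and y.
Xc = (X - X.mean(0)) / X.std(0)
yc = (y - y.mean()) / y.std()
corr = np.abs(Xc.T @ yc) / n

d = 20                                       # screening submodel size
keep = np.argsort(corr)[-d:]
print("true signals recovered:", sorted(set(keep) & set(range(5))))
```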
Initial Evidence Of Construct Validity Of Data From A Self-Assessment Instrument Of Technological Pedagogical Content Knowledge (Tpack) In 2-Year Public College Faculty In Texas, 2018 University of Texas at Tyler
Initial Evidence Of Construct Validity Of Data From A Self-Assessment Instrument Of Technological Pedagogical Content Knowledge (Tpack) In 2-Year Public College Faculty In Texas, Kristin C. Scott
Human Resource Development Theses and Dissertations
Technological pedagogical content knowledge (TPACK) has been studied in K-12 faculty in the U.S. and around the world using survey methodology. Very few studies of TPACK in post-secondary faculty have been conducted, and no peer-reviewed studies of U.S. post-secondary faculty have been published to date. The present study is the first reliability and validity study of data from a TPACK survey conducted with a large sample of U.S. post-secondary faculty. The professorate of 2-year public college faculty in Texas will help their institutions meet the goals of the state’s higher education strategic plan, 60x30TX. In ...
Using Random Forests To Describe Equity In Higher Education: A Critical Quantitative Analysis Of Utah’s Postsecondary Pipelines, Tyler McDaniel
Butler Journal of Undergraduate Research
The following work examines the Random Forest (RF) algorithm as a tool for predicting student outcomes and interrogating the equity of postsecondary education pipelines. The RF model, created using longitudinal data on 41,303 students from Utah's 2008 high school graduation cohort, is compared to logistic and linear models, which are commonly used to predict college access and success. Substantively, this work finds high school GPA to be the best predictor of postsecondary GPA, whereas the commonly used ACT and AP test scores are not nearly as important. Each model identified several demographic disparities in higher education access, most significantly ...
The Devil You Don’t Know: A Spatial Analysis Of Crime At Newark’s Prudential Center On Hockey Game Days, 2018 Institute for Security and Crime Science - University of Waikato
The Devil You Don’t Know: A Spatial Analysis Of Crime At Newark’s Prudential Center On Hockey Game Days, Justin Kurland, Eric Piza
Journal of Sport Safety and Security
Inspired by empirical research on spatial crime patterns in and around sports venues in the United Kingdom, this paper sought to measure the criminogenic extent of 216 hockey games that took place at the Prudential Center in Newark, NJ between 2007 and 2016. Do games generate patterns of crime in the areas beyond the arena, and if so, for what types of crime and how far? Police-recorded data for Newark are examined using a variety of exploratory methods and non-parametric permutation tests to visualize differences in crime patterns between game and non-game days across all of Newark and the downtown area. Change ...
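A companion sketch of the kind of label-permutation test the abstract describes: shuffle game/non-game labels across days and compare the observed difference in mean daily crime counts to the permutation distribution. The simulated counts below stand in for the police-recorded Newark data.

```python
# Sketch of a label-permutation test for a game-day effect on daily
# crime counts near the arena (simulated counts are a stand-in).
import numpy as np

rng = np.random.default_rng(3)
game = rng.poisson(6.0, size=216)      # counts on 216 game days
nongame = rng.poisson(5.0, size=800)   # counts on non-game days

obs = game.mean() - nongame.mean()
pooled = np.concatenate([game, nongame])

B = 10000
null = np.empty(B)
for b in range(B):
    perm = rng.permutation(pooled)
    null[b] = perm[:game.size].mean() - perm[game.size:].mean()

p = np.mean(null >= obs)               # one-sided p-value
print(f"observed diff = {obs:.2f}, permutation p = {p:.4f}")
```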
Developing Statistical Methods For Data From Platforms Measuring Gene Expression, 2018 Southern Methodist University
Developing Statistical Methods For Data From Platforms Measuring Gene Expression, Gaoxiang Jia
Statistical Science Theses and Dissertations
This research contains two topics: (1) PBNPA: a permutation-based non-parametric analysis of CRISPR screen data; (2) RCRnorm: an integrated system of random-coefficient hierarchical regression models for normalizing NanoString nCounter data from FFPE samples.
Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single specific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level ...
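The abstract does not spell out the permutation scheme, so the following is a generic sketch of gene-level permutation p-values, permuting sgRNA-level scores across genes, offered as an illustrative assumption rather than the published PBNPA procedure.

```python
# Generic sketch of a gene-level permutation p-value: permute
# sgRNA-level scores across genes and compare each gene's mean score
# to the permutation null. Illustrative only, not the PBNPA algorithm.
import numpy as np

rng = np.random.default_rng(4)
n_genes, guides_per_gene = 200, 4
scores = rng.standard_normal((n_genes, guides_per_gene))
scores[0] += 1.5                        # one gene with a real effect

obs = scores.mean(axis=1)

B = 2000
flat = scores.ravel()
null = np.empty((B, n_genes))
for b in range(B):
    perm = rng.permutation(flat).reshape(n_genes, guides_per_gene)
    null[b] = perm.mean(axis=1)

pvals = (null >= obs).mean(axis=0)      # one-sided, per gene
print("p-value for the spiked gene:", pvals[0])
```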
The Influence Of A Proposed Margin Criterion On The Accuracy Of Parallel Analysis In Conditions Engendering Underextraction, 2018 Western Kentucky University
The Influence Of A Proposed Margin Criterion On The Accuracy Of Parallel Analysis In Conditions Engendering Underextraction, Justin M. Jones
Masters Theses & Specialist Projects
One of the most important decisions to make when performing an exploratory factor or principal component analysis regards the number of factors to retain. Parallel analysis is considered to be the best course of action in these circumstances as it consistently outperforms other factor extraction methods (Zwick & Velicer, 1986). Even so, parallel analysis could benefit from further research and refinement to improve its accuracy. Characteristics such as factor loadings, correlations between factors, and number of variables per factor all have been shown to adversely impact the effectiveness of parallel analysis as a means of identifying the number of factors (Pearson ...
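As a companion to the abstract above, a minimal sketch of classical parallel analysis (Horn, 1965): retain components whose observed eigenvalue exceeds the mean eigenvalue from random data of the same dimensions. The thesis's proposed margin criterion is not reproduced; the toy two-factor data below are an illustrative assumption.

```python
# Sketch of classical parallel analysis: compare observed eigenvalues
# of the correlation matrix to the mean eigenvalues of random normal
# data with the same n and p.
import numpy as np

rng = np.random.default_rng(5)

def eigvals_of_corr(data):
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

n, p = 300, 12
# Toy two-factor data: shared latent factors plus noise.
F = rng.standard_normal((n, 2))
load = rng.uniform(0.5, 0.9, size=(2, p))
X = F @ load + rng.standard_normal((n, p))

obs = eigvals_of_corr(X)
rand = np.mean([eigvals_of_corr(rng.standard_normal((n, p)))
                for _ in range(200)], axis=0)

print("retain:", int(np.sum(obs > rand)), "factors")
```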
Robust Estimation Of The Average Treatment Effect In Alzheimer's Disease Clinical Trials, 2018 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Robust Estimation Of The Average Treatment Effect In Alzheimer's Disease Clinical Trials, Michael Rosenblum, Aidan McDermont, Elizabeth Colantuoni
Johns Hopkins University, Dept. of Biostatistics Working Papers
The primary analysis of Alzheimer's disease clinical trials often involves a mixed-model repeated measures (MMRM) approach. We consider another estimator of the average treatment effect, called targeted minimum loss based estimation (TMLE). This estimator is more robust to violations of assumptions about missing data than MMRM.
We compare TMLE versus MMRM by analyzing data from a completed Alzheimer's disease trial and by simulation studies. The simulations involved different missing data distributions, where loss to follow-up at a given visit could depend on baseline variables, treatment assignment, and the outcome measured at previous visits. The TMLE generally ...
Multivariate Spectral Analysis Of Crism Data To Characterize The Composition Of Mawrth Vallis, 2018 Wesleyan University
Multivariate Spectral Analysis Of Crism Data To Characterize The Composition Of Mawrth Vallis, Melissa Luna
No abstract provided.
Incorporating Historical Models With Adaptive Bayesian Updates, 2018 The University Of Michigan
Incorporating Historical Models With Adaptive Bayesian Updates, Philip S. Boonstra, Ryan P. Barbaro
The University of Michigan Department of Biostatistics Working Paper Series
This paper considers Bayesian approaches for incorporating information from a historical model into a current analysis when the historical model includes only a subset of covariates currently of interest. The statistical challenge is two-fold. First, the parameters in the nested historical model are not generally equal to their counterparts in the larger current model, neither in value nor interpretation. Second, because the historical information will not be equally informative for all parameters in the current analysis, additional regularization may be required beyond that provided by the historical information. We propose several novel extensions of the so-called power prior that adaptively ...
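For context on the abstract above: the extensions build on the power prior of Ibrahim and Chen, whose standard fixed-weight form is sketched below; the adaptive versions proposed in the paper are not reproduced here.

```latex
% Standard power prior with fixed discounting weight a_0 \in [0, 1]:
% historical data D_0 enter through their likelihood raised to a_0.
\pi(\theta \mid D_0, a_0) \propto L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
\qquad
\pi(\theta \mid D, D_0, a_0) \propto L(\theta \mid D)\,
L(\theta \mid D_0)^{a_0}\, \pi_0(\theta).
```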