Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

2021

Statistics

Articles 1 - 24 of 24

Full-Text Articles in Physical Sciences and Mathematics

A Brief Treatise On Bayesian Inverse Regression., Debashis Chatterjee Dr. Dec 2021

Doctoral Theses

Inverse problems, where in a broad sense the task is to learn from the noisy response about some unknown function, usually represented as the argument of some known functional form, have received wide attention in the general scientific disciplines. However, apart from the class of traditional inverse problems, there exists another class of inverse problems, which qualifies as a more authentic class of inverse problems but unfortunately has not received as much attention. In a nutshell, the other class of inverse problems can be described as the problem of predicting the covariates corresponding to given responses and the rest of the data. …


Exploring Improvements To The Convergence Of Reconstructing Historical Destructive Earthquakes, Kameron Lightheart Nov 2021

Theses and Dissertations

Determining risk to human populations due to natural disasters has been a topic of interest in the STEM fields for centuries. Earthquakes and the tsunamis they cause are of particular interest due to their repetition cycles. These cycles can last hundreds of years, but we have had modern measuring instruments for only the last century or so, which makes analysis difficult. In this document, we explore ways to improve upon an existing method for reconstructing earthquakes from historical accounts of tsunamis. This method was designed and implemented by Jared P Whitehead's research group over the last 5 years. The issue …


Some Nonparametric Hybrid Predictive Models: Asymptotic Properties And Applications., Tanujit Chakraborty Dr. Nov 2021

Doctoral Theses

Prediction problems like classification, regression, and time series forecasting have always attracted both statisticians and computer scientists worldwide to take up the challenges of data science and the implementation of complicated models using modern computing facilities. But most traditional statistical and machine learning models assume the available data to be well-behaved in terms of the presence of a full set of essential features, equal size of classes, stationary data structures in all data instances, etc. Practical data sets from the domain of business analytics, process and quality control, software reliability, and macroeconomics, to name a few, suffer from various …


The Classification Of Basket Neural Cells In The Mammalian Neocortex, Sreya Pudi Oct 2021

Senior Theses

Basket neuronal cells of the mammalian neocortex have been classically categorized into two or more groups. Originally, it was thought that the large and small types are the naturally occurring groups that emerge for reasons related to neurobiological function and anatomical position. Later, a study based on anatomical and physiological features of these neurons introduced a third type, the net basket cell, which is intermediate in size compared to the large and small types. In this study, multivariate analysis was used to test the hypothesis that the large and small types are morphologically distinct groups. The results of …


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki Jun 2021

Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams establishing a 21% greater return on investment than teams that are not women-led, and young women largely outperforming men in math according to a 2015 study, there are only three Fortune 500 companies led by women, and women make up only 10% of internet entrepreneurs. Research generates hundreds of articles, infographics, …


Statistical Analysis Of 2017-18 Premier League Match Statistics Using A Regression Analysis In R, Bergen Campbell May 2021

Undergraduate Theses and Capstone Projects

This thesis analyzes the correlation between a team’s statistics and the success of its performances, and develops a predictive model that can be used to forecast final season results for that team. Data from the 2017-2018 Premier League season are gathered and broken down within R to highlight which factors and variables contribute most to the success or downfall of a team. A multiple linear regression model and a stepwise selection process are then used to include any factors that are significant in predicting match results.

The predictions about the 17-18 season results based on the model …


Machine Learning With Topological Data Analysis, Ephraim Robert Love May 2021

Doctoral Dissertations

Topological Data Analysis (TDA) is a relatively new focus in the fields of statistics and machine learning. Methods of exploiting the geometry of data, such as clustering, have proven theoretically and empirically invaluable. TDA provides a general framework within which to study topological invariants (shapes) of data, which are more robust to noise and can recover information on higher-dimensional features than are immediately apparent in the data. A common tool for conducting TDA is persistent homology, which measures the significance of these invariants. Persistent homology has prominent realizations in methods of data visualization, statistics, and machine learning. Extending ML with …


The Effect Of Initial Conditions On The Weather Research And Forecasting Model, Aaron D. Baker May 2021

Electronic Theses and Dissertations

Modeling our atmosphere and determining forecasts using numerical methods has been a challenge since the early 20th century. Most models use a complex dynamical system of equations that proves difficult to solve by hand because it is chaotic by nature. When computer systems became more widely adopted and available, approximating the solution of these equations numerically became easier as computational power increased. This advancement in computing has led to numerous weather models being created and implemented across the world. However, the challenge of approximating these solutions accurately still exists, as each model has a varying set of equations and variables to …


On Tests Of Independence Among Multiple Random Vectors Of Arbitrary Dimensions., Angshuman Roy Dr. Apr 2021

Doctoral Theses

Measures of dependence among several random vectors and associated tests of independence play a major role in different statistical applications. Blind source separation or independent component analysis (see, e.g., Hyvärinen et al., 2001; Shen et al., 2009), feature selection and feature extraction (see, e.g., Li et al., 2012), detection of serial correlation in time series (see, e.g., Ghoudi et al., 2001) and finding the causal relationships among the variables (see, e.g., Chakraborty and Zhang, 2019) are some examples of their widespread applications. Tests of independence have vast applications in other areas of science as well. For instance, to characterize the …


Does Defense Actually Win Championships? Using Statistics To Examine One Of The Greatest Stereotypes In Sports, Thomas Burkett Apr 2021

Senior Theses

A common saying in sports is that “defense wins championships.” However, the past decade of play in the modern NBA has seen a rise in, and a focus on, offensive efficiency and 3-pointers. This thesis tests whether defense can truly predict a championship-winning team in today’s NBA through two-sample hypothesis testing and multiple logistic regression models. The results show that both defensive and offensive statistics were significant predictors of championship teams, meaning that a balanced team, rather than one specialized in defense alone, is a more accurate predictor of championship success.


The Wargaming Commodity Course Of Action Automated Analysis Method, William T. Deberry Mar 2021

Theses and Dissertations

This research presents the Wargaming Commodity Course of Action Automated Analysis Method (WCCAAM), a novel approach to assist wargame commanders in developing and analyzing courses of action (COAs) through semi-automation of the Military Decision Making Process (MDMP). MDMP is a seven-step iterative method that commanders and mission partners follow to build an operational course of action to achieve strategic objectives. MDMP requires time, resources, and coordination – all competing items the commander weighs to make the optimal decision. WCCAAM receives the MDMP's Mission Analysis phase as input, converts the wargame into a directed graph, processes a multi-commodity flow algorithm on …


Clustering Web Users By Mouse Movement To Detect Bots And Botnet Attacks, Justin L. Morgan Mar 2021

Master's Theses

Efficiently and accurately detecting the presence of web bots has proven to be a challenging problem for website administrators. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach in recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning-based bots bypassing …


Essays In Social Choice Theory., Dipjyoti Majumdar Dr. Feb 2021

Doctoral Theses

The purpose of this thesis is to explore some issues in social choice theory and decision theory. Social choice theory provides the theoretical foundations for the field of public choice and welfare economics. It tries to bring together normative aspects, like perspective value judgements, and positive aspects, like strategic considerations. The second feature, which is our focus, is closely related to the problem of providing appropriate incentives to agents, an issue of prime importance in economics. Consider, for example, a set of agents who must elect one among a set of candidates. These candidates may be physical agents …


Genetics Of Pediatric Musculoskeletal Disorders, Lilian Antunes Jan 2021

Arts & Sciences Electronic Theses and Dissertations

Pediatric musculoskeletal disorders are an extremely broad category of diseases that are often inherited. While individually rare, collectively these disorders are common, affecting around 3% of live births in the US. Despite the mounting clinical and molecular evidence for a genetic etiology, the cause for many patients with pediatric musculoskeletal disorders remains largely unknown. Major challenges in rare pediatric diseases include recruiting large numbers of patients and determining the significance and functional impacts of variants associated with disease within individuals or families. Whole exome sequencing (WES) is a powerful tool to identify coding variants that are associated with rare pediatric …


Machine Learning Morphisms: A Framework For Designing And Analyzing Machine Learning Workflows, Applied To Separability, Error Bounds, And 30-Day Hospital Readmissions, Eric Zenon Cawi Jan 2021

McKelvey School of Engineering Theses & Dissertations

A machine learning workflow is the sequence of tasks necessary to implement a machine learning application, including data collection, preprocessing, feature engineering, exploratory analysis, and model training/selection. In this dissertation we propose the Machine Learning Morphism (MLM) as a mathematical framework to describe the tasks in a workflow. The MLM is a tuple consisting of: Input Space, Output Space, Learning Morphism, Parameter Prior, Empirical Risk Function. This contains the information necessary to learn the parameters of the learning morphism, which represents a workflow task. In chapter 1, we give a short review of typical tasks present in a workflow, as …


Energy And Greenhouse Gas Savings For LEED-Certified U.S. Office Buildings Using Weighted Regression, Tian Liang Jan 2021

Honors Papers

In this study, we examined the energy consumption and greenhouse gas emission performance of LEED-certified office buildings. We obtained the 2016 energy consumption and greenhouse gas emission data for 4002 office buildings from nine major US cities, including 522 buildings that we identified as LEED-certified. We discovered that LEED buildings drew a significantly larger percentage of their energy from electricity. We also discovered that the locations and ages of buildings have a significant effect on their performance. We removed the effect of locations and building ages using weighted regression. Our results showed that LEED office buildings used 11% less site energy, 9% …


Evaluation Of The Effect Of The Clinical-Decision-Support Systems On Diabetes Management: A Multivariate Meta-Analysis Comparison With Univariate Meta-Analysis, Abdelfattah Elbarsha Jan 2021

Electronic Theses and Dissertations

The advantage of using meta-analysis lies in its ability to provide a quantitative summary of the findings from multiple studies. The first aim of this dissertation was to conduct a simulation study in order to understand which factors (sample size, between-study correlation, and percent of missing data) have a significant effect on meta-analysis estimates and whether using univariate or multivariate meta-analysis would produce different estimates.

The second goal of this study was to evaluate the effect of clinical decision support systems (CDSS) on diabetes care management by conducting three separate univariate meta-analyses and one multivariate meta-analysis. CDSS are health information …


The Combined Impact Of Continuous And Ordinal Auxiliary Variables On Missing Data Imputation In SEM, Salina Wu Whitaker Jan 2021

Electronic Theses and Dissertations

“Modern” methods of addressing missing data using full-information maximum-likelihood (FIML) have become mainstays in SEM analyses. FIML allows the inclusion of auxiliary variables which carry information that is related to missing values and can reduce bias in parameter estimates. Past research has illustrated the benefits of auxiliary variable inclusion under different missingness conditions (MCAR and MNAR; e.g., Enders, 2008), missingness proportions (e.g., Collins et al., 2001), and although limited, missingness patterns (e.g., Yoo, 2009) in FIML analyses. While past studies have focused on the effects of either continuous or ordinal auxiliary variables, no study has included both types in their …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and the analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case; the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Statistical Modeling Of HPC Performance Variability And Communication, Jered B. Dominguez-Trujillo Jan 2021

Computer Science ETDs

Understanding the performance of parallel and distributed programs remains a focal point in determining how compute systems can be optimized to achieve exascale performance. Lightweight statistical models allow developers to both characterize and predict performance trade-offs, especially as HPC systems become more heterogeneous with many-core CPUs and GPUs. This thesis presents a lightweight statistical approach to modeling performance variation, which leverages extreme value theory by focusing on the maximum length of distributed workload intervals. This approach was implemented in MPI and evaluated on several HPC systems and workloads. I then present a performance model of partitioned communication which also uses …


Computational Simulation And Analysis Of Neuroplasticity, Madison E. Yancey Jan 2021

Browse all Theses and Dissertations

Homeostatic synaptic plasticity is the process by which neurons alter their activity in response to changes in network activity. Neuroscientists attempting to understand homeostatic synaptic plasticity have developed three different mathematical methods to analyze collections of event recordings from neurons acting as a proxy for neuronal activity. These collections of events are from control data and treatment data, referring to the treatment of neuron cultures with pharmacological agents that augment or inhibit network activity. If the distribution of control events can be functionally mapped to the distribution of treatment events, a better understanding of the biological processes underlying homeostatic synaptic …


Assessing And Forecasting Chlorophyll Abundances In Minnesota Lake Using Remote Sensing And Statistical Approaches, Ben Von Korff Jan 2021

All Graduate Theses, Dissertations, and Other Capstone Projects

Harmful algae blooms (HABs) can negatively impact water quality and lake aesthetics, and can harm human and animal health. However, monitoring for HABs is rare in Minnesota. Detecting blooms, which can vary spatially and may only be present briefly, is challenging, so expanding monitoring in Minnesota would require the use of new and cost-efficient technologies. Unmanned aerial vehicles (UAVs) were used for bloom mapping using RGB and near-infrared imagery. Real-time monitoring was conducted in Bass Lake, in Faribault County, MN, using trail cameras. Time series forecasting was conducted with high-frequency chlorophyll-a data from a water quality sonde. Normalized …


Improving The Data Quality In Gravitation-Wave Detectors By Mitigating Transient Noise Artifacts, Kentaro Mogushi Jan 2021

Doctoral Dissertations

“The existence of gravitational waves (GWs), small perturbations in spacetime produced by accelerating massive objects, was first predicted in 1916 as solutions of Einstein’s Theory of General Relativity (Einstein, 1916). Detecting and analyzing GWs produced by sources allows us to probe astrophysical phenomena.

The era of GW astronomy began with the first direct detection of the coalescence of a binary black hole in 2015 by the collaboration of the advanced Laser Interferometer Gravitational-wave Observatory (LIGO) (Aasi et al., 2015) and advanced Virgo (Abbott et al., 2016a). Since 2015, LIGO-Virgo has detected about 50 confident transient events of GW signals (Abbott et …


Bickel-Rosenblatt Test Based On Tilted Estimation For Autoregressive Models & Deep Merged Survival Analysis On Cancer Study Using Multiple Types Of Bioinformatic Data, Yan Su Jan 2021

Browse all Theses and Dissertations

This dissertation includes two topics: the Bickel-Rosenblatt test based on tilted density estimation for autoregressive models, and deep merged survival analysis on cancer study using multiple types of bioinformatic data. In the first topic, we consider the goodness-of-fit test for the error density of linear and nonlinear autoregressive models using tilted kernel density estimation based on residuals. The Bickel-Rosenblatt test statistic is based on the integrated square error of the non-parametric error density estimate and a smoothed version of the parametric fit of the density. It is shown that the new type of Bickel-Rosenblatt test statistic behaves asymptotically the same as …