Open Access. Powered by Scholars. Published by Universities.®

Statistical Methodology Commons

Articles 1 - 30 of 954

Full-Text Articles in Statistical Methodology

Measuring Clinical Weight Loss In Young Children With Severe Obesity: Comparison Of Outcomes Using zBMI, Modified zBMI, And Percent Of 95th Percentile, Carolyn Bates May 2019

Research Days

No abstract provided.


Characterizing The Tails Of Degree Distributions In Real-World Networks, Anna Broido May 2019

Applied Mathematics Graduate Theses & Dissertations

This is a thesis about how to characterize the statistical structure of the tails of degree distributions of real-world networks. The primary contribution is a statistical test of the prevalence of scale-free structure in real-world networks. A central claim in modern network science is that real-world networks are typically "scale free," meaning that the fraction of nodes with degree $k$ follows a power law, decaying like $k^{-\alpha}$, often with $2 < \alpha < 3$. However, empirical evidence for this belief derives from a relatively small number of real-world networks. In the first section, we test the universality of scale-free structure by applying state-of-the-art statistical tools to a large corpus of nearly 1000 network data sets drawn from social, biological, technological, and informational sources. We fit the power-law model to each degree distribution, test its statistical plausibility, and compare it via a likelihood ratio test to alternative, non-scale-free models, e.g., the log-normal. Across domains, we find that scale-free networks are rare, with only 4% exhibiting the strongest-possible evidence of scale-free structure and 52% exhibiting the weakest-possible evidence. Furthermore, evidence of scale-free structure is not uniformly distributed across sources: social networks are at best weakly scale free, while a handful of technological and biological networks can be called strongly scale free. These results undermine the universality of scale-free networks and reveal that real-world networks exhibit a rich structural diversity that will likely require new ideas and mechanisms to explain. A core methodological component of addressing the ubiquity of scale-free structure in real-world networks is an ability to fit a power law to the degree distribution.
In the second section, we numerically evaluate and compare, using both synthetic data with known structure and real-world data with unknown structure, two statistically principled methods for estimating the tail parameters for power-law distributions, showing that in practice, a method based on extreme value theory and a sophisticated bootstrap and the more commonly used method based on an empirical minimization approach exhibit similar accuracy.
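
To illustrate what fitting a power law to a distribution tail can involve, the sketch below computes the continuous maximum-likelihood estimate of the exponent for a fixed lower cutoff. It is not the thesis's procedure: the function names, the fixed `x_min`, and the continuous approximation to discrete degrees are all assumptions made here, and the methods compared in the thesis also estimate the cutoff itself.

```python
import math
import random


def fit_power_law_alpha(values, x_min):
    """Continuous MLE of alpha for p(x) proportional to x^(-alpha), x >= x_min.

    Hypothetical helper: x_min is taken as given rather than estimated.
    Returns the estimate and its asymptotic standard error.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    alpha = 1.0 + len(tail) / log_sum
    std_err = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, std_err


def sample_power_law(n, alpha, x_min, rng):
    """Draw n synthetic values by inverse-transform sampling (for checking the fit)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
    print(fit_power_law_alpha(data, x_min=1.0))
```

On synthetic data with known structure, the estimate should land close to the exponent used to generate the sample, which is the kind of check the abstract describes.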


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt May 2019

Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the results are biased. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as MICE in R, a statistical software package. Using real data, we compare four common imputation methods, in the MICE package in R, at different levels of ...


Large Scale Dynamical Model Of Macrophage/HIV Interactions, Sean T. Bresnahan, Matthew M. Froid Mar 2019

Student Research and Creative Activity Fair

Properties emerge from the dynamics of large-scale molecular networks that are not discernible at the individual gene or protein level. Mathematical models - such as probabilistic Boolean networks - of molecular systems offer a deeper insight into how these emergent properties arise. Here, we introduce a non-linear, deterministic Boolean model of protein, gene, and chemical interactions in human macrophage cells during HIV infection. Our model is composed of 713 nodes with 1583 interactions between nodes and is responsive to 38 different inputs including signaling molecules, bacteria, viruses, and HIV viral particles. Additionally, the model accurately simulates the dynamics of over 50 different ...


Neural Shrubs: Using Neural Networks To Improve Decision Trees, Kyle Caudle, Randy Hoover, Aaron Alphonsus Feb 2019

SDSU Data Science Symposium

Decision trees are a method commonly used in machine learning to predict either a categorical or a continuous response variable. Once the tree partitions the space, the response is determined either by majority vote (classification trees) or by averaging the response values (regression trees). This research builds a standard regression tree and then, instead of averaging the responses, trains a neural network to determine the response value. We have found that our approach typically increases the predictive capability of the decision tree. We have 2 demonstrations of this approach that we wish to present as a poster ...


Multi-Linear Algebraic Eigendecompositions And Their Application In Data Science, Randy Hoover, Kyle Caudle Dr., Karen Braman Dr. Feb 2019

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable have been the Canonical Decompositions/Parallel Factors (commonly referred to as CP) and Tucker decompositions (commonly regarded as a high order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal ...


A Novel Pathway-Based Distance Score Enhances Assessment Of Disease Heterogeneity In Gene Expression, Yunqing Liu, Xiting Yan Jan 2019

Yale Day of Data

Distance-based unsupervised clustering of gene expression data is commonly used to identify heterogeneity in biologic samples. However, high noise levels in gene expression data and the relatively high correlation between genes are often encountered, so traditional distances such as Euclidean distance may not be effective at discriminating the biological differences between samples. In this study, we developed a novel computational method to assess the biological differences based on pathways by assuming that ontologically defined biological pathways in biologically similar samples have similar behavior. Application of this distance score results in more accurate, robust, and biologically meaningful clustering results in both ...


Application Of Bradford’s Law Of Scattering On Research Publication In Astronomy & Astrophysics Of India, Satish Kumar, Senthilkumar R. Dec 2018

Library Philosophy and Practice (e-journal)

The present study examines the application of Bradford’s law of scattering to research articles published in the field of Astronomy & Astrophysics by Indian scientists during 1988-2017. The bibliographic data were retrieved from the Web of Science (WoS) bibliographic database for different periods of time. A total of 18,877 journal articles were published by Indian scientists in the field of Astronomy & Astrophysics during 1988-2017; these were retrieved and analyzed separately for blocks of 10 years as well as for the consolidated 30 years. The core journal of the field was identified. The Bradford law ...
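
As a rough illustration of the kind of partition Bradford’s law describes, the sketch below ranks journals by article count and splits them into three zones that each yield about one third of the articles; under the law, the number of journals per zone grows roughly as 1 : n : n². The function name and the toy counts are hypothetical and are not taken from the study.

```python
def bradford_zones(article_counts):
    """Split journals, ranked by productivity, into three zones of similar article yield.

    Hypothetical helper: returns the number of journals in the core zone
    and in the two successive zones.
    """
    ranked = sorted(article_counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no articles to partition")
    zone_sizes = [0, 0, 0]
    zone = 0
    cumulative = 0
    for count in ranked:
        # Advance to the next zone once the articles already assigned
        # fill the current zone's one-third share.
        while zone < 2 and cumulative * 3 >= total * (zone + 1):
            zone += 1
        zone_sizes[zone] += 1
        cumulative += count
    return zone_sizes


if __name__ == "__main__":
    # Toy data: 1 journal with 30 articles, 3 with 10, then 9 small journals.
    counts = [30, 10, 10, 10, 4, 4, 4, 4, 4, 4, 2, 2, 2]
    print(bradford_zones(counts))
```

With the toy counts, each zone holds 30 articles and the journal counts per zone follow the 1 : 3 : 9 pattern the law predicts for n = 3.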


Different Estimation Methods For The Basic Independent Component Analysis Model, Zhenyi An Dec 2018

Arts & Sciences Electronic Theses and Dissertations

Inspired by the classic cocktail-party problem, the basic Independent Component Analysis (ICA) model was created. What distinguishes ICA from other kinds of analysis is the intrinsic non-Gaussian assumption about the data. Several approaches have been proposed based on maximizing the non-Gaussianity of the data, which is measured by kurtosis, mutual information, and others. With each estimation, we need to optimize functions of expectations of non-quadratic functions, since this gives access to the higher-order statistics of the non-Gaussian part of the data. In this thesis, our goal is to review one of the most efficient estimation methods ...


Anisotropic Kernel Smoothing For Change-Point Data With An Analysis Of Fire Spread Rate Variability, John Ronald James Thompson Nov 2018

Electronic Thesis and Dissertation Repository

Wildland fires are natural disturbances that enable the renewal of forests. However, these fires also place public safety and property at risk. Understanding forest fire spread in any region of Canada is critical to promoting forest health, and protecting human life and infrastructure. In 2014, Ontario updated its Wildland Fire Management Strategy, moving away from “zone-based” decision making to “appropriate response” decision making. This new strategy calls for an assessment of the risks and benefits of every wildland fire reported in the province. My research places the emphasis on the knowledge and understanding of fire spread rates and their variabilities ...


Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma Nov 2018

Electronic Thesis and Dissertation Repository

When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a ...
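
The role of the bandwidth and of boundary effects can be seen in a small sketch of kernel smoothing. The code below is an illustration only, assuming a Gaussian kernel and hypothetical function names; it is not the dissertation's method. It contrasts a local constant fit with a local linear fit (degree-1 LPR) at a point `x0`.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_constant_fit(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around x0 (degree-0 local polynomial)."""
    w = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)


def local_linear_fit(x0, xs, ys, bandwidth):
    """Intercept of a kernel-weighted least squares line centred at x0."""
    w = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    denom = s0 * s2 - s1 * s1
    if denom == 0:
        raise ValueError("weighted design is singular; try a larger bandwidth")
    # Solve the 2x2 weighted normal equations for the intercept.
    return (s2 * t0 - s1 * t1) / denom


if __name__ == "__main__":
    xs = [i / 10 for i in range(11)]
    ys = [2 * x + 1 for x in xs]  # noiseless line, so any error is pure bias
    print(local_constant_fit(0.0, xs, ys, 0.2), local_linear_fit(0.0, xs, ys, 0.2))
```

On a noiseless line, the local constant fit is pulled upward at the left boundary because all neighbouring points lie to one side, while the local linear fit reproduces the line; this is one simple way to see the boundary bias the abstract mentions.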


Analysis Of Covariance (ANCOVA) In Randomized Trials: More Precision, Less Conditional Bias, And Valid Confidence Intervals, Without Model Assumptions, Bingkai Wang, Elizabeth Ogburn, Michael Rosenblum Oct 2018

Johns Hopkins University, Dept. of Biostatistics Working Papers

“Covariate adjustment” in the randomized trial context refers to an estimator of the average treatment effect that adjusts for chance imbalances between study arms in baseline variables (called “covariates”). The baseline variables could include, e.g., age, sex, disease severity, and biomarkers. According to two surveys of clinical trial reports, there is confusion about the statistical properties of covariate adjustment. We focus on the ANCOVA estimator, which involves fitting a linear model for the outcome given the treatment arm and baseline variables, and trials with equal probability of assignment to treatment and control. We prove the following new (to the ...
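
A minimal sketch of the ANCOVA point estimate may help make the idea of adjusting for chance imbalance concrete. It assumes a single baseline covariate and a hypothetical function name; it shows only the point estimate and none of the paper's results on precision or confidence intervals. With one covariate, the least squares coefficient on treatment equals the difference in mean outcomes minus the pooled within-arm slope times the difference in mean covariate values.

```python
from statistics import mean


def ancova_effect(treatment, covariate, outcome):
    """ANCOVA estimate of the average treatment effect with one baseline covariate.

    Equals the coefficient on `treatment` (coded 0/1) in an ordinary least
    squares fit of outcome ~ 1 + treatment + covariate.
    Returns (adjusted estimate, unadjusted difference in means).
    """
    groups = {0: {"x": [], "y": []}, 1: {"x": [], "y": []}}
    for a, x, y in zip(treatment, covariate, outcome):
        groups[a]["x"].append(x)
        groups[a]["y"].append(y)

    sxy = 0.0
    sxx = 0.0
    means = {}
    for a, g in groups.items():
        if not g["x"]:
            raise ValueError("both arms need at least one participant")
        x_bar, y_bar = mean(g["x"]), mean(g["y"])
        means[a] = (x_bar, y_bar)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(g["x"], g["y"]))
        sxx += sum((x - x_bar) ** 2 for x in g["x"])
    if sxx == 0:
        raise ValueError("covariate has no within-arm variation")

    slope = sxy / sxx                        # pooled within-arm slope
    unadjusted = means[1][1] - means[0][1]   # difference in mean outcomes
    imbalance = means[1][0] - means[0][0]    # chance imbalance in the covariate
    return unadjusted - slope * imbalance, unadjusted


if __name__ == "__main__":
    arm = [0, 0, 0, 1, 1, 1]
    age = [1.0, 2.0, 3.0, 2.0, 3.0, 5.0]
    y = [1 + 2 * a + 3 * x for a, x in zip(arm, age)]
    print(ancova_effect(arm, age, y))
```

In the toy data the treated arm happens to have larger covariate values, so the unadjusted difference overstates the effect, while the adjusted estimate removes the part explained by the imbalance.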


Estimation In High-Dimensional Factor Models With Structural Instabilities, Wen Gao Oct 2018

Major Papers

In this major paper, we use high-dimensional models to analyze macroeconomic data which is influenced by the break point. In particular, we consider detecting the break point and study the changes in the number of factors and the factor loadings under structural instability.

Concretely, we propose two factor models which explain the processes of the pre- and post-break periods. Then, we consider the break point as known or unknown. In both situations, we derive the shrinkage estimators by minimizing the penalized least squares function and calculate the estimators of the numbers of pre- and post-break factors ...


Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri Oct 2018

Publications and Research

Water risk management is a ubiquitous challenge faced by stakeholders in the water or agricultural sector. We present a methodological framework for forecasting water storage requirements and present an application of this methodology to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors used to forecast water stress were selected based on an exhaustive search method that evaluates for highest ranked probability skill score and lowest root-mean-squared error in a leave-one-out cross-validation mode. Adaptive forecasts were made in the years ...


Minimizing The Perceived Financial Burden Due To Cancer, Hassan Azhar, Zoheb Allam, Gino Varghese, Daniel W. Engels, Sajiny John Aug 2018

SMU Data Science Review

In this paper, we present a regression model that predicts the perceived financial burden that a cancer patient experiences in the treatment and management of the disease. Cancer patients do not fully understand the burden associated with the cost of cancer, and their lack of understanding can increase the difficulties associated with living with the disease, in particular coping with the cost. The relationship between demographic characteristics and financial burden was examined in order to better understand the characteristics of a cancer patient and their burden, while all subsets regression was used to determine the best predictors of financial burden. Age ...


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that ...


Testing Hypotheses Of Covariance Structure In Multivariate Data, Miguel Fonseca, Arkadiusz Koziol, Roman Zmyslony Aug 2018

Electronic Journal of Linear Algebra

This paper gives a new approach for testing hypotheses on the structure of covariance matrices in double multivariate data. It is proved that the ratio of the positive and negative parts of best unbiased estimators (BUE) provides an F-test for independence of block variables in double multivariate models.


Robust Inference For The Stepped Wedge Design, James P. Hughes, Patrick J. Heagerty, Fan Xia, Yuqi Ren Aug 2018

UW Biostatistics Working Paper Series

Based on a permutation argument, we derive a closed form expression for an estimate of the treatment effect, along with its standard error, in a stepped wedge design. We show that these estimates are robust to misspecification of both the mean and covariance structure of the underlying data-generating mechanism, thereby providing a robust approach to inference for the treatment effect in stepped wedge designs. We use simulations to evaluate the type I error and power of the proposed estimate and to compare the performance of the proposed estimate to the optimal estimate when the correct model specification is known. The ...


The U.S. Census Bureau Adopts Differential Privacy, John M. Abowd Aug 2018

Labor Dynamics Institute

The U.S. Census Bureau announced, via its Scientific Advisory Committee, that it would protect the publications of the 2018 End-to-End Census Test (E2E) using differential privacy. The E2E test is a dress rehearsal for the 2020 Census, the constitutionally mandated enumeration of the population used to reapportion the House of Representatives and redraw every legislative district in the country. Systems that perform successfully in the E2E test are then used in the production of the 2020 Census.
Motivation: The Census Bureau conducted internal research that confirmed that the statistical disclosure limitation systems used for the 2000 and 2010 Censuses ...


Bayesian Analytical Approaches For Metabolomics : A Novel Method For Molecular Structure-Informed Metabolite Interaction Modeling, A Novel Diagnostic Model For Differentiating Myocardial Infarction Type, And Approaches For Compound Identification Given Mass Spectrometry Data., Patrick J. Trainor Aug 2018

Electronic Theses and Dissertations

Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been utilized for the discovery of biomarkers of disease and phenotypic states. In spite of recent technological advances in the analytical platforms utilized in metabolomics and the proliferation of tools for the analysis of metabolomics data, significant challenges in metabolomics data analyses remain. In this dissertation, we present three of these challenges and Bayesian methodological solutions for each. In the first part we develop a new methodology to serve as a basis for making higher order inferences in metabolomics ...


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal Aug 2018

Electronic Theses and Dissertations

This dissertation consists of three projects and can be categorized in two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed effect hurdle model with a nested random effect structure to model the count for primary care providers and understand their spatial and temporal variation. This study further enables us to identify the health professional shortage areas and the possible impacting factors. In the second project, I have unified popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal ...


A Comparison Of R, SAS, And Python Implementations Of Random Forests, Breckell Soifua Aug 2018

All Graduate Plan B and other Reports

The Random Forest method is a useful machine learning tool developed by Leo Breiman. There are many existing implementations across different programming languages, the most popular of which exist in R, SAS, and Python. In this paper, we conduct a comprehensive comparison of these implementations with regard to accuracy, variable importance measurements, and timing. This comparison was done on a variety of real and simulated data with different classification difficulty levels, numbers of predictors, and sample sizes. The comparison shows unexpectedly different results between the three implementations.


Bayesian Sparse Propensity Score Estimation For Unit Nonresponse, Hejian Sang, Gyuhyeong Goh, Jae Kwang Kim Jul 2018

Statistics Preprints

Nonresponse weighting adjustment using propensity score is a popular method for handling unit nonresponse. However, including all available auxiliary variables into the propensity model can lead to inefficient and inconsistent estimation, especially with high-dimensional covariates. In this paper, a new Bayesian method using the Spike-and-Slab prior is proposed for sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. Instead of doing model selection and parameter estimation separately as in many frequentist methods, the proposed method simultaneously selects the sparse response probability model and provides consistent parameter ...


Predictions Generated From A Simulation Engine For Gene Expression Micro-Arrays For Use In Research Laboratories, Gopinath R. Mavankal, John Blevins, Dominique Edwards, Monnie McGee, Andrew Hardin Jul 2018

SMU Data Science Review

In this paper we introduce the technical components, the biology and data science involved in the use of microarray technology in biological and clinical research. We discuss how laborious experimental protocols involved in obtaining this data used in laboratories could benefit from using simulations of the data. We discuss the approach used in the simulation engine from [7]. We use this simulation engine to generate a prediction tool in Power BI, a Microsoft business intelligence tool for analytics and data visualization [22]. This tool could be used in any laboratory using micro-arrays to improve experimental design by comparing how predicted ...


Data Scientist’s Analysis Toolbox: Comparison Of Python, R, And SAS Performance, Jim Brittain, Mariana Cendon, Jennifer Nizzi, John Pleis Jul 2018

SMU Data Science Review

A quantitative analysis will be performed on experiments utilizing three different tools used for Data Science. The analysis will include replication of analysis along with comparisons of code length, output, and results. Qualitative data will supplement the quantitative findings. The conclusion will provide data-supported guidance on the correct tool to use for common situations in the field of Data Science.


Hierarchical Bayesian Data Fusion Using Autoencoders, Yevgeniy Vladimirovich Reznichenko Jul 2018

Master's Theses (2009 -)

In this thesis, a novel method for tracker fusion is proposed and evaluated for vision-based tracking. This work combines three distinct popular techniques into a recursive Bayesian estimation algorithm. First, semi-supervised learning approaches are used to partition data and to train a deep neural network that is capable of capturing normal visual tracking operation and is able to detect anomalous data. We compare various methods by examining their respective receiver operating characteristic (ROC) curves, which represent the trade-off between specificity and sensitivity for various detection threshold levels. Next, we incorporate the trained neural networks into an existing data ...
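
The threshold trade-off that a ROC curve summarizes can be sketched in a few lines. The code below is an illustration with hypothetical function names and toy scores, not the thesis's evaluation code: it sweeps a detection threshold over anomaly scores and records the false positive rate (1 minus specificity) against the true positive rate (sensitivity).

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate), one per threshold.

    `labels` are 1 for anomalous and 0 for normal; a case is flagged when
    its score is at or above the threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]
    labels = [1, 0, 1, 0]
    pts = roc_points(scores, labels)
    print(pts, area_under_curve(pts))
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of lower specificity; the area under the curve gives one single-number summary for comparing detectors.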


Combining Academics And Social Engagement: A Major-Specific Early Alert Method To Counter Student Attrition In Science, Technology, Engineering, And Mathematics, Andrew J. Sage, Cinzia Cervato, Ulrike Genschel, Craig Ogilvie Jun 2018

Geological and Atmospheric Sciences Publications

Students are most likely to leave science, technology, engineering, and mathematics (STEM) majors during their first year of college. We developed an analytic approach using random forests to identify at-risk students. This method is deployable midway through the first semester and accounts for academic preparation, early engagement in university life, and performance on midterm exams. By accounting for cognitive and noncognitive factors, our method achieves stronger predictive performance than would be possible using cognitive or noncognitive factors alone. We show that it is more difficult to predict whether students will leave STEM than whether they will leave the institution. More ...


Improving Shewhart Control Chart Performance In The Presence Of Measurement Error Using Multiple Measurements And Two-Stage Sampling, Kenneth W. Linna Jun 2018

Journal of International & Interdisciplinary Business Research

The usual Shewhart control chart efficiently detects large shifts in the mean of a quality characteristic and has been extensively studied in the literature. Most proposed alternatives to the Shewhart chart aim either to improve the signal performance for smaller mean shifts or to reduce the sampling effort required to detect a larger shift. Measurement error has been shown in the literature to result in reduced power to detect process shifts. The combination of multiple measurements and two-stage sampling is considered here as a strategy for both regaining power lost due to measurement error and specifically tuning the charts for shifts ...


An Empirical Analysis Of Climatic, Geographic, And Cultural Determinants Of International Tourism, Ethan Straus Jun 2018

Honors Theses

Each year, billions of people visit different countries all around the world. For many of those countries, tourism is their primary industry, leading to millions of jobs and dollars in revenue. It is expected that by 2020 total International Tourism Receipts will reach 2 trillion US dollars annually. Currently, tourism employs an estimated 200 million people around the world. With the continued progression of climate change, the tourism industry is facing a newfound threat. Global temperatures and the sea level are both expected to rise significantly by the end of the century. Additionally, the Intergovernmental Panel on Climate Change has ...


A 3D Characteristics Database Of Land Engraved Areas With Known Subclass, Entni Lin Jun 2018

Student Theses

Subclass characteristics on bullets may mislead firearm examiners when they rely on traditional 2D images. In order to provide indelible examples for training and help avoid identification errors, 3D topography surface maps and statistical methods of pattern recognition are applied to toolmarks on bullets containing known subclass characteristics. This research was conducted by collecting 3D topography surface map data from land engraved areas of bullets fired through known barrels. This data was processed and used to train the statistical algorithms to predict their origin. The results from the algorithm are compared with the “right answers” (i.e. correct IDs) of ...