Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Decision Based Learning Course Design & Implementation For Introductory Statistics, Austin Heath Aug 2021

Undergraduate Honors Theses

Researchers in multiple industries (biomedicine, engineering, etc.) cite the selection of an appropriate statistical test as a common problem. Experts draw on a framework of conceptual and procedural knowledge to decide which statistical method to use and when. Students likewise struggle to determine the correct statistical method for a given research question, because they lack the opportunity to practice recognizing the features of each research question that signal to experts which method is most appropriate. “Decision Based Learning” (DBL) is a teaching method designed to help teachers and students address this struggle. In this …


Intermountain West Lichen Dna Reference Library, Brian Colgrove Jun 2021

Undergraduate Honors Theses

Accurate estimates of biodiversity can play crucial roles in monitoring ecological health. BYU’s Lichen Air Quality Biomonitoring program (Wright) represents one of the largest biomonitoring programs in the nation. Recently, DNA metabarcoding approaches have shown promise in streamlining lichen biodiversity inventories. However, to date, lichen diversity of western North America is poorly represented in available DNA reference libraries, preventing biologists and land managers from using DNA barcoding to identify unknown specimens. To solve this problem, I have developed a DNA reference library for over 500 species occurring in the Intermountain West region. Using bioinformatic and statistical tools, I created a …


An Actuarial Approach To Personal Injury Protection Severity, Jason Colgrove Mar 2020

Undergraduate Honors Theses

Insurance companies examine the risk of financial losses for their policyholders as a way to accurately price insurance policies. Within the automobile insurance sector, the frequency of crashes and the associated liabilities began to increase in late 2013, after having declined for close to a decade. This research focuses on variables that may be correlated with this change and could lead to a better understanding of it. To embark on this task, we teamed up with the Society of Actuaries, the Casualty Actuarial Society, and the American Property Casualty Insurance Association to obtain data regarding frequency, severity, …


Examining Multimorbidities Using Association Rule Learning, Kaylee Dudley Jun 2018

Undergraduate Honors Theses

All insurance companies, regardless of the kind of insurance they offer, do their best to predict the future by comparing current to historical information. Any statistically significant correlation, regardless of expectations and hidden factors, can help to actuarially model future behavior. Using deidentified data from over 6 million health insurance policies over one year, we looked for any significant groupings of medical issues. The medical issues are defined based on the commercial “Episode Treatment Groups” (ETGs) classification, and our claims contain 347 different ETGs. We performed different kinds of analysis, including Bayesian posterior cluster analysis, k-means cluster analysis, and association …
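
As a rough illustration of the association-rule idea, not the thesis's actual analysis, the sketch below mines pairwise rules from per-policy sets of ETG codes using the standard support and confidence measures; the policy sets, code names, and thresholds are made up.

```python
from itertools import combinations
from collections import Counter

# Hypothetical input: one set of ETG codes per policy (a stand-in for real claims data).
policies = [
    {"ETG_101", "ETG_205"},
    {"ETG_101", "ETG_205", "ETG_347"},
    {"ETG_101", "ETG_347"},
    {"ETG_205"},
]

n = len(policies)
single = Counter(code for p in policies for code in p)
pair = Counter(frozenset(c) for p in policies for c in combinations(sorted(p), 2))

# A rule A -> B is reported when its support and confidence clear chosen thresholds.
min_support, min_confidence = 0.25, 0.5
for ab, count in pair.items():
    a, b = tuple(ab)
    support = count / n
    for lhs, rhs in ((a, b), (b, a)):
        confidence = count / single[lhs]
        if support >= min_support and confidence >= min_confidence:
            print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```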


An Integrated Screening And Optimization Strategy, Nathaniel Jackson Rohbock Jul 2012

Theses and Dissertations

Within statistical methods, design of experiments (DOE) is well suited to making good inferences from a minimal amount of data. Two types of designs within DOE are screening designs and optimization designs. Traditionally, the two have been kept separate by a gap between the objectives of each design and the methods available; in practice, however, they are frequently connected by sequential experimentation. In fact, from the genesis of a project, the experimenter often knows that both designs will be necessary to accomplish the project's objectives. Due to advances in the understanding of experimental designs with complex aliasing …


An Applied Investigation Of Gaussian Markov Random Fields, Jessica Lyn Olsen Jun 2012

Theses and Dissertations

Recently, Bayesian methods have become central to modern statistics, particularly through their ability to incorporate hierarchical models. Correlated data, such as the data found in spatial and temporal applications, have benefited greatly from the development and application of Bayesian statistics. One particular application of Bayesian modeling is the Gaussian Markov Random Field (GMRF). GMRFs have proven to be very useful in providing a framework for correlated data. I will demonstrate the power of GMRFs by applying this method to two sets of data: a set of temporal data involving car accidents in the UK and a set of spatial …
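
A minimal sketch of the kind of smoothing a GMRF prior provides, assuming a first-order random-walk (RW1) prior, a Gaussian likelihood, and synthetic data; the thesis's actual models for the UK accident and spatial data are richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
truth = np.sin(np.linspace(0, 3 * np.pi, T))          # hypothetical smooth signal
y = truth + rng.normal(scale=0.5, size=T)             # noisy observations

# RW1 precision matrix: penalizes squared differences between neighboring
# time points. D is the (T-1) x T first-differencing matrix.
D = np.diff(np.eye(T), axis=0)
kappa = 50.0                                          # prior smoothing precision (assumed)
Q = kappa * D.T @ D                                   # GMRF prior precision (rank-deficient)

# With a Gaussian likelihood of precision tau, the posterior is again a GMRF:
#   (Q + tau*I) mu_post = tau * y
tau = 1.0 / 0.5**2
mu_post = np.linalg.solve(Q + tau * np.eye(T), tau * y)
print("posterior RMSE:", np.sqrt(np.mean((mu_post - truth) ** 2)))
```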


Xprime-Em: Eliciting Expert Prior Information For Motif Exploration Using The Expectation-Maximization Algorithm, Wei Zhou Jun 2012

Theses and Dissertations

Understanding the possible mechanisms of gene transcription regulation is a primary challenge for current molecular biologists. Identifying transcription factor binding sites (TFBSs), also called DNA motifs, is an important step in understanding these mechanisms. Furthermore, many human diseases are attributed to mutations in TFBSs, which makes identifying those DNA motifs significant for disease treatment. Uncertainty and variations in specific nucleotides of TFBSs present difficulties for DNA motif searching. In this project, we present an algorithm, XPRIME-EM (Eliciting EXpert PRior Information for Motif Exploration using the Expectation-Maximization Algorithm), which can discover known and de novo (unknown) DNA motifs simultaneously from a …
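
The following is a compact EM sketch for motif discovery under the much simpler OOPS assumption (exactly one motif occurrence per sequence, uniform background), not XPRIME-EM itself, which additionally elicits expert prior information; the toy sequences, motif width, and pseudocount are arbitrary.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    return np.array([[b == c for c in BASES] for b in seq], dtype=float)

def em_motif(seqs, w, iters=50, seed=0):
    """OOPS EM: one motif of width w per sequence; background is iid uniform."""
    rng = np.random.default_rng(seed)
    X = [one_hot(s) for s in seqs]
    pwm = rng.dirichlet(np.ones(4), size=w)           # initial position weight matrix
    for _ in range(iters):
        counts = np.zeros((w, 4))
        for x in X:
            starts = len(x) - w + 1
            # E-step: posterior over motif start positions (under a uniform
            # background, the background terms are constant across positions).
            loglik = np.array([np.sum(x[s:s+w] * np.log(pwm)) for s in range(starts)])
            post = np.exp(loglik - loglik.max())
            post /= post.sum()
            # M-step accumulation: expected base counts at each motif column.
            for s, p in enumerate(post):
                counts += p * x[s:s+w]
        pwm = (counts + 0.1) / (counts + 0.1).sum(axis=1, keepdims=True)
    return pwm

seqs = ["ACGTACGTTTTT", "TTTACGTACGAA", "GGACGTACGGGG"]  # toy sequences (assumed)
print(em_motif(seqs, w=8).round(2))
```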


Estimation Of The Effects Of Parental Measures On Child Aggression Using Structural Equation Modeling, Jordan Daniel Pyper Jun 2012

Theses and Dissertations

A child's parents are the primary source of knowledge and learned behaviors for developing children, and the benefits or repercussions of certain parental practices can be long lasting. Although parenting practices affect behavioral outcomes for children, families tend to be diverse in their circumstances and needs. Research attempting to ascertain cause and effect relationships between parental influences and child behavior can be difficult due to the complex nature of family dynamics and the intricacies of real life. Structural equation modeling (SEM) is an appropriate method for this research as it is able to account for the complicated nature of child-parent …


Support Vector Machines For Classification And Imputation, Spencer David Rogers May 2012

Theses and Dissertations

Support vector machines (SVMs) are a powerful tool for classification problems. SVMs were developed only in the last 20 years, as cheap and abundant computing power became available. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here, support vector machines are applied to a classic data set from the machine learning literature, and their out-of-sample misclassification rates are compared to those of other classification methods. Finally, an algorithm that uses support vector machines to impute missing categorical data is proposed and its performance is demonstrated under three different scenarios …
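
A brief scikit-learn sketch of both uses, with the iris data standing in for whichever classic data set the thesis analyzes, and a deliberately naive single-column imputation scheme:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Classification: fit an RBF-kernel SVM and report the out-of-sample error.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("out-of-sample misclassification rate:", 1 - clf.score(X_te, y_te))

# Imputation: treat a categorical column with missing entries as the label,
# train an SVM on the complete rows, and predict the missing ones.
missing = np.random.default_rng(0).random(len(y)) < 0.2   # artificial missingness
imputed = y.copy()
imputed[missing] = SVC().fit(X[~missing], y[~missing]).predict(X[missing])
print("imputed", missing.sum(), "missing category values")
```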


Species Identification And Strain Attribution With Unassembled Sequencing Data, Owen Eric Francis Apr 2012

Theses and Dissertations

Emerging sequencing approaches have revolutionized the way we can collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality, and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in …
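
As a toy version of the read-level calculation (not the thesis's framework, which also models sequence and mapping quality), the sketch below applies Bayes' rule to match/mismatch counts for a single read aligned against a hypothetical three-genome database; all numbers are invented.

```python
import numpy as np

# Hypothetical reference database with a uniform prior over genomes.
genomes = ["agent_A", "agent_B", "background"]
prior = np.array([1 / 3] * 3)

def read_posterior(matches, mismatches, per_base_error=0.01):
    """Posterior over genomes for one aligned read, from match/mismatch counts
    against each genome under a simple per-base error model."""
    loglik = (matches * np.log(1 - per_base_error)
              + mismatches * np.log(per_base_error))
    log_post = np.log(prior) + loglik
    post = np.exp(log_post - log_post.max())       # normalize in log space
    return post / post.sum()

# One 100-bp read aligned against each genome (mismatch counts are made up).
mism = np.array([2, 9, 25])
print(dict(zip(genomes, read_posterior(100 - mism, mism).round(4))))
```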


Hitters Vs. Pitchers: A Comparison Of Fantasy Baseball Player Performances Using Hierarchical Bayesian Models, Scott D. Huddleston Apr 2012

Theses and Dissertations

In recent years, fantasy baseball has seen an explosion in popularity. Major League Baseball, with its long, storied history and the enormous quantity of data available, naturally lends itself to the modern-day recreational activity known as fantasy baseball. Fantasy baseball is a game in which participants manage an imaginary roster of real players and compete against one another using those players' real-life statistics to score points. Early forms of fantasy baseball began in the early 1960s, but beginning in the 1990s, the game was revolutionized by the advent of powerful computers and the Internet. The data used in this …


Using An Experimental Mixture Design To Identify Experimental Regions With High Probability Of Creating A Homogeneous Monolithic Column Capable Of Flow, Charles C. Willden Apr 2012

Theses and Dissertations

Graduate students in the Brigham Young University Chemistry Department are working to develop a filtering device that can be used to separate substances into their constituent parts. The device consists of a monomer and water mixture that is polymerized into a monolith inside of a capillary. The ideal monolith is completely solid with interconnected pores that are small enough to cause the constituent parts to pass through the capillary at different rates, effectively separating the substance. Although the end objective is to minimize pore sizes, it is necessary to first identify an experimental region where any combination of input variables …


Bayesian Pollution Source Apportionment Incorporating Multiple Simultaneous Measurements, Jonathan Casey Christensen Mar 2012

Theses and Dissertations

We describe a method to estimate pollution profiles and contribution levels for distinct prominent pollution sources in a region based on daily pollutant concentration measurements from multiple measurement stations over a period of time. In an extension of existing work, we will estimate common source profiles but distinct contribution levels based on measurements from each station. In addition, we will explore the possibility of extending existing work to allow adjustments for synoptic regimes—large scale weather patterns which may affect the amount of pollution measured from individual sources as well as for particular pollutants. For both extensions we propose Bayesian methods …


Predicting Maximal Oxygen Consumption (Vo2max) Levels In Adolescents, Brent A. Shepherd Mar 2012

Theses and Dissertations

Maximal oxygen consumption (VO2max) is considered by many to be the best overall measure of an individual's cardiovascular health. Collecting the measurement, however, requires subjecting an individual to prolonged periods of intense exercise until their maximal level is reached, the point at which their body uses no additional oxygen from the air despite increased exercise intensity. Collecting VO2max data also requires expensive equipment and causes great subject discomfort. Because of these inherent difficulties, the measurement is often avoided despite its usefulness. In this research, we propose a set of Bayesian hierarchical models to predict VO2max levels in adolescents, …


The Effect Of Smoking On Tuberculosis Incidence In Burdened Countries, Natalie Noel Ellison Mar 2012

Theses and Dissertations

It is estimated that one third of the world's population is infected with tuberculosis. Though once thought a "dead" disease, tuberculosis is very much alive. The rise of drug-resistant strains of tuberculosis and TB-HIV coinfection have made tuberculosis an even greater worldwide threat. While HIV, poverty, and public health infrastructure are historically assumed to affect the burden of tuberculosis, recent research has sought to add smoking to this list. This analysis combines data from multiple sources in order to determine whether smoking is a statistically significant factor in predicting the number of incident tuberculosis cases in a country. …
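
A minimal sketch of the kind of count regression such an analysis might use, on simulated country-level data with a log-population offset so the model describes per-capita incidence; the variable names, effect sizes, and data are all invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical country-level data; the real analysis merges several sources.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "population": rng.integers(1_000_000, 80_000_000, size=40),
    "smoking_rate": rng.uniform(0.05, 0.45, size=40),
    "hiv_rate": rng.uniform(0.001, 0.15, size=40),
})
rate = 0.0002 * np.exp(2.0 * df.smoking_rate + 4.0 * df.hiv_rate)
df["tb_cases"] = rng.poisson(rate * df.population)

# Poisson regression with log(population) as an offset; the smoking
# coefficient's p-value addresses the significance question.
X = sm.add_constant(df[["smoking_rate", "hiv_rate"]])
fit = sm.GLM(df.tb_cases, X, family=sm.families.Poisson(),
             offset=np.log(df.population)).fit()
print(fit.summary().tables[1])
```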


Screening Designs That Minimize Model Dependence, Kenneth P. Fairchild Dec 2011

Theses and Dissertations

When approaching a new research problem, we often use screening designs to determine which factors are worth exploring in more detail. Before exploring a problem, we don't know which factors are important. When examining a large number of factors, it is likely that only a handful are significant and that even fewer two-factor interactions will be significant. If there are important interactions, it is likely that they are connected with the handful of significant main effects. Since we don't know beforehand which factors are significant, we want to choose a design that gives us the highest probability a priori of …


Assessing The Effect Of Wal-Mart In Rural Utah Areas, Angela Nelson Jul 2011

Theses and Dissertations

Walmart and other “big box” stores seek to expand in rural markets, possibly due to cheap land and lack of zoning laws. In August 2000, Walmart opened a store in Ephraim, a small rural town in central Utah. It is of interest to understand how Walmart's entrance into the local market changes the sales tax revenue base for Ephraim and for the surrounding municipalities. It is thought that small “Mom and Pop” stores go out of business because they cannot compete with Walmart's prices, leading to a decrease in variety, selection, convenience, and most importantly, sales tax revenue base in …


An Introduction To Bayesian Methodology Via Winbugs And Proc Mcmc, Heidi Lula Lindsey Jul 2011

Theses and Dissertations

Bayesian statistical methods have long been computationally out of reach because the analysis often requires integration of high-dimensional functions. Recent advancements in computational tools to apply Markov chain Monte Carlo (MCMC) methods are making Bayesian data analysis accessible to all statisticians. Two such computer tools are WinBUGS and SAS® 9.2's PROC MCMC. Bayesian methodology will be introduced through discussion of fourteen statistical examples with code and computer output to demonstrate the power of these computational tools in a wide variety of settings.
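
For readers unfamiliar with what these tools automate, here is a minimal random-walk Metropolis sampler for a toy one-parameter posterior; WinBUGS and PROC MCMC construct samplers like this (and better ones) directly from a model specification. The toy data are invented.

```python
import numpy as np

def metropolis(logpost, init, steps=10_000, scale=0.5, seed=0):
    """Random-walk Metropolis: propose a local move, accept it with
    probability min(1, posterior ratio), otherwise stay put."""
    rng = np.random.default_rng(seed)
    x, lp = init, logpost(init)
    draws = np.empty(steps)
    for i in range(steps):
        prop = x + rng.normal(scale=scale)
        lp_prop = logpost(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        draws[i] = x
    return draws

# Toy posterior: binomial likelihood (7 successes in 10 trials) with a flat
# prior on the log-odds t.
y, n = 7, 10
logpost = lambda t: y * t - n * np.log1p(np.exp(t))
draws = metropolis(logpost, init=0.0)
print("posterior mean success probability:",
      (1 / (1 + np.exp(-draws[2000:]))).mean())   # discard burn-in
```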


Hierarchical Probit Models For Ordinal Ratings Data, Allison M. Butler Jun 2011

Theses and Dissertations

University students often complete evaluations of their courses and instructors. The evaluation tool typically contains questions about the course and the instructor on an ordinal Likert scale. We assess instructor effectiveness while adjusting for known confounders. We present a probit regression model with a latent variable to measure the instructor effectiveness accounting for student specific covariates, such as student grade in the course, high school and university GPA, and ACT score.
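
One standard way to implement such a latent-variable probit is an Albert-Chib style Gibbs sampler. The sketch below handles the binary case with a flat prior and simulated data (an ordinal version would add estimated cutpoints); it illustrates the general technique, not the thesis's code.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, iters=2000, seed=0):
    """Gibbs sampler for binary probit: each response is tied to a latent
    normal variable z, which makes both conditional updates conjugate."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    XtX_inv = np.linalg.inv(X.T @ X)      # flat prior on beta for simplicity
    keep = np.empty((iters, p))
    for it in range(iters):
        mu = X @ beta
        # z | beta, y: normal truncated below/above 0 according to the label.
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # beta | z: ordinary normal linear-model update.
        beta = rng.multivariate_normal(XtX_inv @ X.T @ z, XtX_inv)
        keep[it] = beta
    return keep

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + one covariate
y = (X @ np.array([-0.5, 1.0]) + rng.normal(size=200) > 0).astype(int)
print(probit_gibbs(X, y)[500:].mean(axis=0))   # posterior means after burn-in
```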


A Bayesian Approach To Missile Reliability, Taylor Hardison Redd Jun 2011

Theses and Dissertations

Each year, billions of dollars are spent on missiles and munitions by the United States government. A dependable method for estimating the reliability of these missiles is therefore vital. Such a method must take into account the age of the missile, the reliability of its different components, and the impact of different launch phases on reliability. It must also estimate missile performance under a variety of test conditions, or modalities. Bayesian logistic regression is utilized to make these estimates accurately. This project presents both previously proposed methods and ways to …


Adaptive Threat Detector Testing Using Bayesian Gaussian Process Models, Bradley Thomas Ferguson May 2011

Theses and Dissertations

Detection of biological and chemical threats is an important consideration in modern national defense policy. Much of the testing and evaluation of threat detection technologies is performed without appropriate uncertainty quantification. This paper proposes an approach to analyzing the effect of threat concentration on the probability of detecting chemical and biological threats. The approach uses a probit semi-parametric formulation between threat concentration level and the probability of instrument detection. It also utilizes a Bayesian adaptive design to determine the threat concentrations at which tests should be performed. The approach offers unique advantages, namely, the flexibility to model non-monotone curves …
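
As a heavily simplified caricature of the adaptive idea (the thesis's semi-parametric Bayesian design is far more sophisticated), assume a parametric probit curve with current point estimates a and b; an adaptive design can then place the next test where the per-trial Fisher information about the curve is largest. All numbers below are placeholders.

```python
import numpy as np
from scipy.stats import norm

# Probit dose-response: P(detect | concentration c) = Phi(a + b * log10(c)).
a, b = -2.0, 1.5    # stand-ins for current posterior estimates

def fisher_info(c):
    """Per-trial Fisher information about the curve at concentration c;
    an adaptive design tests next where this is largest."""
    eta = a + b * np.log10(c)
    p = norm.cdf(eta)
    return norm.pdf(eta) ** 2 / (p * (1 - p))

candidates = np.logspace(0, 4, 50)          # candidate test concentrations
best = candidates[np.argmax(fisher_info(candidates))]
print(f"next test concentration: {best:.1f}")
```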


Variable Selection And Parameter Estimation Using A Continuous And Differentiable Approximation To The L0 Penalty Function, Douglas Nielsen Vanderwerken Mar 2011

Theses and Dissertations

L0 penalized likelihood procedures like Mallows' Cp, AIC, and BIC directly penalize the number of variables included in a regression model. This is a straightforward approach to the problem of overfitting, and these methods are now part of every statistician's repertoire. However, these procedures have been shown to sometimes produce unstable parameter estimates as a result of the L0 penalty's discontinuity at zero. One proposed alternative, seamless-L0 (SELO), utilizes a continuous penalty function that mimics L0 and allows for stable estimates. Like other similar methods (e.g. LASSO and SCAD), SELO produces sparse solutions because the penalty function is …
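
For concreteness, the SELO penalty as given by Dicker, Huang, and Lin is continuous, differentiable away from zero, and approaches the L0 penalty lam * 1{beta != 0} as its tuning constant tau shrinks. A small sketch, with illustrative constants:

```python
import numpy as np

def selo(beta, lam=1.0, tau=0.01):
    """Seamless-L0 penalty (form per Dicker, Huang, and Lin): equals 0 at
    beta = 0 and approaches lam for |beta| much larger than tau."""
    b = np.abs(beta)
    return (lam / np.log(2.0)) * np.log(b / (b + tau) + 1.0)

beta = np.linspace(-2, 2, 5)
print(selo(beta).round(3))   # near lam for |beta| >> tau, exactly 0 at beta = 0
```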


Hierarchical Bayesian Methods For Evaluation Of Traffic Project Efficacy, Andrew Nolan Olsen Mar 2011

Theses and Dissertations

A main objective of Departments of Transportation is to improve the safety of the roadways over which they have jurisdiction. Safety projects, such as cable barriers and raised medians, are utilized to reduce both crash frequency and crash severity. The efficacy of these projects must be evaluated in order to use resources in the best way possible. Five models are proposed for the evaluation of traffic projects: (1) a Bayesian Poisson regression model; (2) a hierarchical Poisson regression model building on model (1) by adding hyperpriors; (3) a similar model correcting for overdispersion; (4) a dynamic linear model; and (5) …


Utilizing Universal Probability Of Expression Code (Upc) To Identify Disrupted Pathways In Cancer Samples, Michelle Rachel Withers Mar 2011

Theses and Dissertations

Understanding the role of deregulated biological pathways in cancer samples has the potential to improve cancer treatment, making it more effective by selecting treatments that reverse the biological cause of the cancer. One of the challenges with pathway analysis is identifying a deregulated pathway in a given sample. This project develops the Universal Probability of Expression Code (UPC), a profile of a single deregulated biological pathway, and projects it into a cancer cell to determine if it is present. One of the benefits of this method is that rather than use information from a single over-expressed gene, it pro- …


Parameter Estimation For The Two-Parameter Weibull Distribution, Mark A. Nielsen Mar 2011

Theses and Dissertations

The Weibull distribution, an extreme value distribution, is frequently used to model survival, reliability, wind speed, and other data. One reason for this is its flexibility; it can mimic various distributions like the exponential or normal. The two-parameter Weibull has a shape (γ) and a scale (β) parameter. Parameter estimation for the Weibull has involved an ongoing search for efficient, unbiased, minimum-variance estimators. Through data analysis and simulation studies, the following three methods of estimation will be discussed and compared: maximum likelihood estimation (MLE), method of moments estimation (MME), and median rank regression (MRR). The analysis of wind speed data from …
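
A minimal sketch of the MLE computation on simulated wind-speed-like data, using scipy with the location parameter pinned at zero so that only the two Weibull parameters are estimated; the true parameter values are arbitrary.

```python
import math
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
gamma_true, beta_true = 2.0, 8.0    # shape, scale (assumed for simulation)
speeds = weibull_min.rvs(gamma_true, scale=beta_true, size=500, random_state=rng)

# Maximum likelihood fit with the location fixed at zero (floc=0), so only
# the shape and scale of the two-parameter Weibull are estimated.
gamma_hat, loc, beta_hat = weibull_min.fit(speeds, floc=0)
print(f"MLE: shape={gamma_hat:.2f}, scale={beta_hat:.2f}")

# Sanity check: the fitted Weibull mean, beta * Gamma(1 + 1/gamma), should be
# close to the sample mean (matching moments is the idea behind MME).
print("fitted mean:", beta_hat * math.gamma(1 + 1 / gamma_hat),
      "sample mean:", speeds.mean())
```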


Assessment Of Acgh Clustering Methodologies, Serena F. Baker Oct 2010

Theses and Dissertations

Array comparative genomic hybridization (aCGH) is a technique for identifying duplications and deletions of DNA at specific locations across a genome. Potential objectives of aCGH analysis are the identification of (1) altered regions for a given subject, (2) altered regions across a set of individuals, and (3) clinically relevant clusters of hybridizations. aCGH analysis can be particularly useful when it identifies previously unknown clusters with clinical relevance. This project focuses on the assessment of existing aCGH clustering methodologies. Three methodologies are considered: hierarchical clustering, weighted clustering of called aCGH data, and clustering based on probabilistic recurrent regions of alteration within …
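
Of the three methodologies, plain hierarchical clustering is the easiest to sketch. Below, Ward linkage groups simulated hybridization profiles that share an amplified region; the matrix dimensions and effect size are invented.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical stand-in for an aCGH matrix: rows are hybridizations (samples),
# columns are genomic probes, entries are copy-number log-ratios.
rng = np.random.default_rng(0)
acgh = rng.normal(size=(20, 500))
acgh[:10, 40:60] += 1.5       # a shared amplified region in half the samples

# Ward linkage on Euclidean distances between hybridization profiles; cutting
# the tree at k=2 should recover the two built-in groups.
tree = linkage(acgh, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```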


Application Of Convex Methods To Identification Of Fuzzy Subpopulations, Ryan Lee Eliason Sep 2010

Theses and Dissertations

In large observational studies, data are often highly multivariate with many discrete and continuous variables measured on each observational unit. One often derives subpopulations to facilitate analysis. Traditional approaches suggest modeling such subpopulations with a compilation of interaction effects. However, when many interaction effects define each subpopulation, it becomes easier to model membership in a subpopulation rather than numerous interactions. In many cases, subjects are not complete members of a subpopulation but rather partial members of multiple subpopulations. Grade of Membership scores preserve the integrity of this partial membership. By generalizing an analytical chemistry concept related to chromatography-mass spectrometry, we …


Cluster And Classification Analysis Of Fossil Invertebrates Within The Bird Spring Formation, Arrow Canyon, Nevada: Implications For Relative Rise And Fall Of Sea-Level, Scott L. Morris Apr 2010

Theses and Dissertations

Carbonate strata preserve indicators of local marine environments through time. Such indicators often include microfossils that have relatively unique conditions under which they can survive, including light, nutrients, salinity, and especially water temperature. As such, microfossils are environmental proxies. When these microfossils are preserved in the rock record, they constitute key components of depositional facies. Spence et al. (2004, 2007) have proposed several approaches for determining the facies of a given stratigraphic succession based upon these proxies. Cluster analysis can be used to determine microfossil groups that represent specific environmental conditions. Identifying which microfossil groups exist through time can indicate …


Parameter Estimation In Linear-Linear Segmented Regression, Erika Lyn Hernandez Apr 2010

Theses and Dissertations

Segmented regression is a type of nonlinear regression that allows differing functional forms to be fit over different ranges of the explanatory variable. This paper considers the simple segmented regression case of two linear segments that are constrained to meet, often called the linear-linear model. Parameter estimation in the case where the joinpoint between the regimes is unknown can be tricky. Using a simulation study, four estimators for the parameters of the linear-linear model are evaluated. The bias and mean squared error of the estimators are considered under differing parameter combinations and sample sizes. Parameters estimated in the model are …
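
A common estimation strategy for the unknown joinpoint is to profile it over a grid: for each candidate joinpoint, the remaining parameters enter linearly and can be fit by least squares. A sketch on simulated data (this is one generic estimator, not necessarily one of the four the thesis evaluates):

```python
import numpy as np

def fit_linear_linear(x, y, grid):
    """For each candidate joinpoint c, fit the constrained two-segment model
    y = b0 + b1*x + b2*max(x - c, 0) by least squares (the broken-stick basis
    forces the segments to meet at c); keep the c with the smallest RSS."""
    best = (np.inf, None, None)
    for c in grid:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        best = min(best, (resid @ resid, c, coef), key=lambda t: t[0])
    return best

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 120))
y = 1.0 + 0.5 * x + 2.0 * np.maximum(x - 6.0, 0) + rng.normal(scale=0.4, size=120)
rss, joinpoint, coef = fit_linear_linear(x, y, grid=np.linspace(1, 9, 161))
print(f"joinpoint: {joinpoint:.2f}, slopes: {coef[1]:.2f}, {coef[1] + coef[2]:.2f}")
```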


Extensions Of Nearest Shrunken Centroid Method For Classification, Tomohiko Funai Mar 2010

Theses and Dissertations

Stylometry assumes that the essence of an author's individual style can be captured using a number of quantitative criteria, such as the relative frequencies of noncontextual words (e.g., or, the, and). Several statistical methodologies have been developed for authorship analysis. Jockers et al. (2009) apply Nearest Shrunken Centroid (NSC) classification, a methodology that has shown promise in DNA microarray analysis, to the authorship analysis of the Book of Mormon. Schaalje et al. (2010) develop an extended NSC classification to remedy the problem of a missing author. Dabney (2005) and Koppel et al. (2009) suggest other modifications of NSC. This paper …
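
For reference, scikit-learn's NearestCentroid exposes the shrinkage idea directly: class centroids are shrunk toward the overall centroid, zeroing out uninformative features (noncontextual words in stylometry, genes in microarrays). A small sketch on a stand-in data set; the threshold values are arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

# A standard benchmark stands in for a word-frequency matrix; rows would be
# texts and columns noncontextual-word frequencies in the stylometry setting.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Larger shrink_threshold values shrink centroids harder, discarding more
# features; None disables shrinkage (plain nearest-centroid classification).
for threshold in (None, 0.2, 2.0):
    clf = NearestCentroid(shrink_threshold=threshold).fit(X_tr, y_tr)
    print(f"shrink_threshold={threshold}: accuracy={clf.score(X_te, y_te):.3f}")
```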