Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

An Efficient Methodology For Learning Bayesian Networks, Emmanuel Owusu Asante-Asamani Aug 2012

Theses and Dissertations

Statistics from the National Cancer Institute indicate that 1 in 8 women will develop breast cancer in their lifetime. Researchers have developed numerous statistical models to predict breast cancer risk; however, physicians are hesitant to use these models because of disparities in the predictions they produce. In an effort to reduce these disparities, we use Bayesian networks to capture the joint distribution of risk factors, and simulate artificial patient populations (clinical avatars) for interrogating the existing risk prediction models. The challenge in this effort has been to produce a Bayesian network whose dependencies agree with literature and are good estimates …


A Comparison Of Methods Of Analysis To Control For Confounding In A Cohort Study Of A Dietary Intervention, Esinhart Hali Jul 2012

Theses and Dissertations

Comparing samples from different populations can be biased by confounding. There are several statistical methods that can be used to control for confounding. These include multiple linear regression, propensity score matching, propensity score/logit of propensity score as a single covariate in a linear regression model, stratified analysis using propensity score quintiles, and weighted analysis using propensity scores or trimmed scores. The data were from two studies of a dietary intervention (FIBERR and RNP). The outcome variable was change from baseline to one month for eight outcome measures: fat, fiber, and fruits/vegetables behavior, fat, fiber, and fruits/vegetables intentions, fat and fruits/vegetables …


An Integrated Screening And Optimization Strategy, Nathaniel Jackson Rohbock Jul 2012

Theses and Dissertations

Within statistical methods, design of experiments (DOE) is well suited to make good inference from a minimal amount of data. Two types of designs within DOE are screening designs and optimization designs. Traditionally, these approaches have been necessarily separated by a gap between the objectives of each design and the methods available. Despite being so separated, in practice these designs are frequently connected by sequential experimentation. In fact, from the genesis of a project, the experimenter often knows that both designs will be necessary to accomplish his objectives. Due to advances in the understanding of experimental designs with complex aliasing …


The Effect Of Baseline Cluster Stratification On The Power Of Pre-Post Analysis, Fengjiao Hu Jul 2012

Theses and Dissertations

The purpose of this study is to check whether the power of detecting the effect of intervention versus control in a pre- and post-study can be increased by using a stratified randomized controlled design. A stratified randomized controlled design with two study arms and two time points, where strata are determined by clustering on baseline outcomes of the primary measure, is considered. A modified hierarchical clustering algorithm is developed which guarantees optimality and requires each cluster to have at least one subject per study arm. The power is calculated based on simulated bivariate normally distributed primary measures with mixture …


Does Pair-Matching On Ordered Baseline Measures Increase Power: A Simulation Study, Yan Jin Jul 2012

Theses and Dissertations

It has been shown that pair-matching on an ordered baseline with normally distributed measures reduces the variance of the estimated treatment effect (Park and Johnson, 2006). The main objective of this study is to examine if pair-matching improves the power when the distribution is a mixture of two normal distributions. Multiple scenarios with a combination of different sample sizes and parameters are simulated. The power curves are provided for three cases, with and without matching, as follows: analysis of post-intervention data only, adding baseline as a covariate, and classic pre-post comparison. The study shows that the additional variance reduction provided …


An Applied Investigation Of Gaussian Markov Random Fields, Jessica Lyn Olsen Jun 2012

Theses and Dissertations

Recently, Bayesian methods have become central to modern statistics, particularly because of their ability to incorporate hierarchical models. In particular, correlated data, such as the data found in spatial and temporal applications, have benefited greatly from the development and application of Bayesian statistics. One particular application of Bayesian modeling is Gaussian Markov Random Fields. These methods have proven to be very useful in providing a framework for correlated data. I will demonstrate the power of GMRFs by applying this method to two sets of data: a set of temporal data involving car accidents in the UK and a set of spatial …


XPRIME-EM: Eliciting Expert Prior Information For Motif Exploration Using The Expectation-Maximization Algorithm, Wei Zhou Jun 2012

Theses and Dissertations

Understanding the possible mechanisms of gene transcription regulation is a primary challenge for current molecular biologists. Identifying transcription factor binding sites (TFBSs), also called DNA motifs, is an important step in understanding these mechanisms. Furthermore, many human diseases are attributed to mutations in TFBSs, which makes identifying those DNA motifs significant for disease treatment. Uncertainty and variations in specific nucleotides of TFBSs present difficulties for DNA motif searching. In this project, we present an algorithm, XPRIME-EM (Eliciting EXpert PRior Information for Motif Exploration using the Expectation-Maximization Algorithm), which can discover known and de novo (unknown) DNA motifs simultaneously from a …


Estimation Of The Effects Of Parental Measures On Child Aggression Using Structural Equation Modeling, Jordan Daniel Pyper Jun 2012

Theses and Dissertations

A child's parents are the primary source of knowledge and learned behaviors for developing children, and the benefits or repercussions of certain parental practices can be long lasting. Although parenting practices affect behavioral outcomes for children, families tend to be diverse in their circumstances and needs. Research attempting to ascertain cause and effect relationships between parental influences and child behavior can be difficult due to the complex nature of family dynamics and the intricacies of real life. Structural equation modeling (SEM) is an appropriate method for this research as it is able to account for the complicated nature of child-parent …


Unbiased Estimation For The Contextual Effect Of Duration Of Adolescent Height Growth On Adulthood Obesity And Health Outcomes Via Hierarchical Linear And Nonlinear Models, Robert Carrico May 2012

Theses and Dissertations

This dissertation has multiple aims in studying hierarchical linear models in biomedical data analysis. In Chapter 1, the novel idea of studying the durations of adolescent growth spurts as a predictor of adulthood obesity is defined, established, and illustrated. The concept of contextual effects modeling is introduced in this first section as we study the secular trend of adulthood obesity and how this trend is mitigated by the durations of individual adolescent growth spurts and the secular average length of adolescent growth spurts. It is found that individuals with longer periods of fast height growth in adolescence are more prone to …


Support Vector Machines For Classification And Imputation, Spencer David Rogers May 2012

Theses and Dissertations

Support vector machines (SVMs) are a powerful tool for classification problems. SVMs have only been developed in the last 20 years with the availability of cheap and abundant computing power. SVMs are a non-statistical approach and make no assumptions about the distribution of the data. Here support vector machines are applied to a classic data set from the machine learning literature and the out-of-sample misclassification rates are compared to other classification methods. Finally, an algorithm for using support vector machines to address the difficulty in imputing missing categorical data is proposed and its performance is demonstrated under three different scenarios …


Species Identification And Strain Attribution With Unassembled Sequencing Data, Owen Eric Francis Apr 2012

Theses and Dissertations

Emerging sequencing approaches have revolutionized the way we can collect DNA sequence data for applications in bioforensics and biosurveillance. In this research, we present an approach to construct a database of known biological agents and use this database to develop a statistical framework to analyze raw reads from next-generation sequence data for species identification and strain attribution. Our method capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in …


Hitters Vs. Pitchers: A Comparison Of Fantasy Baseball Player Performances Using Hierarchical Bayesian Models, Scott D. Huddleston Apr 2012

Theses and Dissertations

In recent years, fantasy baseball has seen an explosion in popularity. Major League Baseball, with its long, storied history and the enormous quantity of data available, naturally lends itself to the modern-day recreational activity known as fantasy baseball. Fantasy baseball is a game in which participants manage an imaginary roster of real players and compete against one another using those players' real-life statistics to score points. Early forms of fantasy baseball began in the early 1960s, but beginning in the 1990s, the sport was revolutionized due to the advent of powerful computers and the Internet. The data used in this …


Using An Experimental Mixture Design To Identify Experimental Regions With High Probability Of Creating A Homogeneous Monolithic Column Capable Of Flow, Charles C. Willden Apr 2012

Theses and Dissertations

Graduate students in the Brigham Young University Chemistry Department are working to develop a filtering device that can be used to separate substances into their constituent parts. The device consists of a monomer and water mixture that is polymerized into a monolith inside of a capillary. The ideal monolith is completely solid with interconnected pores that are small enough to cause the constituent parts to pass through the capillary at different rates, effectively separating the substance. Although the end objective is to minimize pore sizes, it is necessary to first identify an experimental region where any combination of input variables …


A Dempster-Shafer Method For Multi-Sensor Fusion, Bethany G. Foley Mar 2012

Theses and Dissertations

The Dempster-Shafer Theory, a generalization of the Bayesian theory, is based on the idea of belief and as such can handle ignorance. When all of the required information is available, many data fusion methods provide a solid approach. Yet, most do not have a good way of dealing with ignorance. In the absence of information, these methods must then make assumptions about the sensor data. However, the real data may not fit well within the assumed model. Consequently, the results are often unsatisfactory and inconsistent. The Dempster-Shafer Theory is not hindered by incomplete models or by the lack of prior …


Computer Aided Multi-Data Fusion Dismount Modeling, Juan L. Morales Mar 2012

Theses and Dissertations

Recent research efforts strive to address the growing need for dismount surveillance, dismount tracking and characterization. Current work in this area utilizes hyperspectral and multispectral imaging systems to exploit spectral properties in order to detect areas of exposed skin and clothing characteristics. Because of their large bandwidth and high resolution, hyperspectral imaging systems have great ability to characterize and detect dismounts. A multi-data dismount modeling system that supports the development and manipulation of dismount models is a necessity. This thesis demonstrates a computer aided multi-data fused dismount model, which facilitates studies of dismount detection, characterization and identification. The system is created …


Covariance Analysis Of Vision Aided Navigation By Bootstrapping, Andrew L. Relyea Mar 2012

Theses and Dissertations

Inertial Navigation System (INS) aiding using bearing measurements taken over time of stationary ground features is investigated. A cross-country flight, in two- and three-dimensional space, is considered, as well as a vertical drop in three-dimensional space. The objective is to quantify the temporal development of the uncertainty in the navigation states of an aircraft INS which is aided by taking bearing measurements of ground objects which have been geolocated using ownship position. It is shown that during wings-level flight at constant speed and a fixed altitude, an aircraft that tracks ground objects and over time sequentially …


Using Multiattribute Utility Copulas In Support Of UAV Search And Destroy Operations, Beau A. Nunnally Mar 2012

Theses and Dissertations

The multiattribute utility copula is an emerging form of utility function used by decision analysts to study decisions with dependent attributes. Failure to properly address attribute dependence may cause errors in selecting the optimal policy. This research examines two scenarios of interest to the modern warfighter. The first scenario employs a utility copula to determine the type, quantity, and altitude of UAVs to be sent to strike a stationary target. The second scenario employs a utility copula to examine the impact of attribute dependence on the optimal routing of UAVs in a contested operational environment when performing a search and …


Bayesian Pollution Source Apportionment Incorporating Multiple Simultaneous Measurements, Jonathan Casey Christensen Mar 2012

Theses and Dissertations

We describe a method to estimate pollution profiles and contribution levels for distinct prominent pollution sources in a region based on daily pollutant concentration measurements from multiple measurement stations over a period of time. In an extension of existing work, we will estimate common source profiles but distinct contribution levels based on measurements from each station. In addition, we will explore the possibility of extending existing work to allow adjustments for synoptic regimes, large-scale weather patterns which may affect the amount of pollution measured from individual sources as well as for particular pollutants. For both extensions we propose Bayesian methods …


Predicting Maximal Oxygen Consumption (VO2max) Levels In Adolescents, Brent A. Shepherd Mar 2012

Theses and Dissertations

Maximal oxygen consumption (VO2max) is considered by many to be the best overall measure of an individual's cardiovascular health. Collecting the measurement, however, requires subjecting an individual to prolonged periods of intense exercise until their maximal level, the point at which their body uses no additional oxygen from the air despite increased exercise intensity, is reached. Collecting VO2max data also requires expensive equipment and great subject discomfort to get accurate results. Because of this inherent difficulty, it is often avoided despite its usefulness. In this research, we propose a set of Bayesian hierarchical models to predict VO2max levels in adolescents, …


The Effect Of Smoking On Tuberculosis Incidence In Burdened Countries, Natalie Noel Ellison Mar 2012

Theses and Dissertations

It is estimated that one third of the world's population is infected with tuberculosis. Though once thought a "dead" disease, tuberculosis is very much alive. The rise of drug-resistant strains of tuberculosis and TB-HIV coinfection have made tuberculosis an even greater worldwide threat. While HIV, poverty, and public health infrastructure are historically assumed to affect the burden of tuberculosis, recent research has sought to add smoking to this list. This analysis involves combining data from multiple sources in order to determine if smoking is a statistically significant factor in predicting the number of incident tuberculosis cases in a country. …


Statistical Methods For Normalization And Analysis Of High-Throughput Genomic Data, Tobias Guennel Jan 2012

Theses and Dissertations

High-throughput genomic datasets obtained from microarray or sequencing studies have revolutionized the field of molecular biology over the last decade. The complexity of these new technologies also poses new challenges to statisticians to separate biologically relevant information from technical noise. Two methods are introduced that address important issues with normalization of array comparative genomic hybridization (aCGH) microarrays and the analysis of RNA sequencing (RNA-Seq) studies. Many studies investigating copy number aberrations at the DNA level for cancer and genetic studies use comparative genomic hybridization (CGH) on oligo arrays. However, aCGH data often suffer from low signal-to-noise ratios resulting …