2017

Full-Text Articles in Other Statistics and Probability

Making Models With Bayes, Pilar Olid Dec 2017

Electronic Theses, Projects, and Dissertations

Bayesian statistics is an important approach to modern statistical analyses. It allows us to use our prior knowledge of the unknown parameters to construct a model for our data set. The foundation of Bayesian analysis is Bayes' Rule, which in its proportional form indicates that the posterior is proportional to the prior times the likelihood. We will demonstrate how we can apply Bayesian statistical techniques to fit a linear regression model and a hierarchical linear regression model to a data set. We will show how to apply different distributions to Bayesian analyses and how the use of a prior affects …
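The proportional form of Bayes' Rule mentioned in this abstract can be written out as below. The symbols are assumptions chosen for illustration, not notation taken from the thesis: $\theta$ denotes the unknown parameters and $y$ the observed data.

```latex
% Bayes' Rule: posterior = likelihood x prior / evidence.
% The evidence p(y) does not depend on theta, which gives the proportional form.
\[
  p(\theta \mid y)
  = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
  \;\propto\; p(y \mid \theta)\, p(\theta)
\]
```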


Open Source Artificial Intelligence In A Biological/Ecological Context, Trevor Grant Oct 2017

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Discrete Stochastic Modeling For First-Year Biology Students, Dmitry Kondrashov Oct 2017

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Investigating The Student Enrollment Decision At WKU, Alec Brown Sep 2017

Mahurin Honors College Capstone Experience/Thesis Projects

The purpose of this research is to investigate the relationships between the enrollment decision of first-time, first-year students admitted to Western Kentucky University and the amount of financial aid awarded, as well as demographic information. The Division of Enrollment Management provided a SAS dataset containing various information about all WKU students admitted in 2013, 2014, and 2015. Additionally, information about the 2016 class of admitted students was provided. The data has been analyzed in SAS Enterprise Miner. We performed analysis using decision tree modeling and logistic regression modeling. Results of these two procedures indicated the importance of credit hours earned …


Heterogeneity Aware Random Forest For Drug Sensitivity Prediction, Raziur Rahman, Kevin Matlock, Souparno Ghosh, Ranadip Pal Sep 2017

Department of Statistics: Faculty Publications

Samples collected in pharmacogenomics databases typically belong to various cancer types. For designing a drug sensitivity predictive model from such a database, a natural question arises whether a model trained on diverse inter-tumor heterogeneous samples will perform similarly to a predictive model that takes into consideration the heterogeneity of the samples in model training and prediction. We explore this hypothesis and observe that ensemble model predictions obtained when cancer type is known outperform predictions when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate the heterogeneity idea …


Imputation For Random Forests, Joshua Young Aug 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

This project introduces two new methods for imputation of missing data in random forests. The new methods are compared against other frequently used imputation methods, including those used in the randomForest package in R. To test the effectiveness of these methods, missing data are imputed into datasets that contain two missing data mechanisms including missing at random and missing completely at random. After imputation, random forests are run on the data and accuracies for the predictions are obtained. Speed is an important aspect in computing; the speeds for all the tested methods are also compared.

One of the new methods …


A Comparison Of Some Confidence Intervals For Estimating The Kurtosis Parameter, Guensley Jerome Jun 2017

FIU Electronic Theses and Dissertations

Several methods have been proposed to estimate the kurtosis of a distribution. The three common estimators are g2, G2, and b2. This thesis addressed the performance of these estimators by comparing them under the same simulation environments and conditions. The performance of these estimators is compared through confidence intervals by determining the average width and probabilities of capturing the kurtosis parameter of a distribution. We considered and compared classical and non-parametric methods in constructing these intervals. The classical method assumes normality to construct the confidence intervals, while the non-parametric methods rely on bootstrap techniques. The bootstrap …


On The Analysis Of The SIR Epidemic Model For Small Networks: An Application In Hospital Settings, Martin Lopez-Garcia May 2017

Biology and Medicine Through Mathematics Conference

No abstract provided.


Can Cone Signals In The Wild Be Predicted From The Past?, David H. Foster, Iván Marín-Franch May 2017

MODVIS Workshop

In the natural world, the past is usually a good guide to the future. If light from the sun and sky is blue earlier in the day and yellow now, then it is likely to be more yellow later, as the sun's elevation decreases. But is the light reflected from a scene into the eye as predictable as the light incident upon the scene, especially when lighting changes are not just spectral but include changes in local shadows and mutual reflections? The aim of this work was to test the predictability of cone photoreceptor signals in the wild over the …


Gilmore Girls And Instagram: A Statistical Look At The Popularity Of The Television Show Through The Lens Of An Instagram Page, Brittany Simmons May 2017

Student Scholar Symposium Abstracts and Posters

After going on the Warner Brothers Tour in December of 2015, I created a Gilmore Girls Instagram account. This account, which started off as a way for me to create edits of the show and post my photos from the tour, turned into something bigger than I ever could have imagined. In just over a year I have gained over 55,000 followers. I post content including revival news, merchandise, and edits of the show that have been featured in Entertainment Weekly, Bustle, E! News, People Magazine, Yahoo News, & GilmoreNews.

I created a dataset of qualitative and quantitative outcomes from my …


Spatially Explicit Population Estimates Of The Florida Black Bear, Jacob Michael Humm May 2017

Masters Theses

The Florida black bear (Ursus americanus floridanus) is currently comprised of 7 isolated subpopulations: Apalachicola, Eglin, Osceola, Ocala/St. Johns, Chassahowitzka, Highlands/Glades, and Big Cypress. The last statewide assessment of Florida black bear population dynamics was conducted by Simek et al. (2005) using traditional capture-mark-recapture methods. The subspecies was removed from Florida’s List of State Threatened Species in 2012 contingent upon the formulation of a management plan that would maintain viable subpopulations of black bears in suitable habitat. Accurate population estimates for each of the remaining black bear subpopulations in Florida were needed to achieve the management goals of …


The Value Of A Win: Analysis Of Playoff Structures, Matthew Orsi Apr 2017

Honors Projects in Mathematics

The purpose of this Senior Capstone project is to analyze the distinctions between existing playoff systems. In particular, we are looking to analyze the differences between the standard single-elimination tournament (which the NCAA has used since the inception of the tournament) and other potential options: double-elimination and multiple game series. Professional leagues such as Major League Baseball and the National Basketball Association use multiple game series for their playoffs. This project will use probability theory and simulation to determine the likelihood of different seeds winning a championship as well as the expected number of victories by seed in each …
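As a rough illustration of how probability theory and simulation can compare a single game with a multiple game series, the sketch below computes the chance that a team wins a best-of-n series. It is not the project's code: the function names, the fixed per-game win probability `p`, and the assumption that games are independent are all hypothetical simplifications.

```python
import random
from math import comb


def series_win_prob(p: float, n_games: int) -> float:
    """Exact probability of winning a best-of-n_games series.

    Assumes (for illustration) independent games with a constant
    per-game win probability p, and an odd number of games.
    """
    if n_games % 2 == 0:
        raise ValueError("n_games must be odd")
    needed = n_games // 2 + 1  # wins required to clinch the series
    # Equivalent to playing all n games and winning at least `needed` of them.
    return sum(
        comb(n_games, k) * p**k * (1 - p) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )


def simulate_series(p: float, n_games: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, stopping each series once it is clinched."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    needed = n_games // 2 + 1
    series_wins = 0
    for _ in range(trials):
        wins = losses = 0
        while wins < needed and losses < needed:
            if rng.random() < p:
                wins += 1
            else:
                losses += 1
        series_wins += wins == needed
    return series_wins / trials


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(series_win_prob(0.6, n), 4), round(simulate_series(0.6, n), 4))
```

Under these assumptions, a longer series favors the stronger team more than a single game does, which is one way a playoff format changes the value of a seed.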


Statistically Analyzing Assembly Line Processing Times Through Incorporation Of Product Variation, Kyle Rehr, Matthew Farr Mar 2017

Scholars Week

Timing methods and performance metrics are important in the heavily industrialized world we live in. Industrial plants use metrics to measure quality of production, help make decisions, and drive the strategy of the organization. However, many factors must be considered when measuring performance based on a metric; of these, we will analyze the importance of product variation. We will analyze assembly line timings, while controlling for product variance, to show how much differences between products affect one’s ability to predict performance. In addition, we will be analyzing the current “statistical” methods used by an industrial …


Maximum Likelihood Estimation Of Parameters In Exponential Power Distribution With Upper Record Values, Tianchen Zhi Mar 2017

FIU Electronic Theses and Dissertations

The exponential power (EP) distribution is an important distribution that is used in survival analysis and is related to the asymmetrical EP distribution. Many researchers have discussed statistical inference about the parameters of the EP distribution using i.i.d. random samples. However, sometimes the available data might contain only record values, or it is more convenient for researchers to collect record values. We aim to resolve this problem. We estimated two parameters of the EP distribution by MLE using upper record values. In a simulation study, we used the bias and MSE of the estimators to study the efficiency of the proposed estimation method. …


Assessing The Impact Of Retreat Mechanisms In A Simple Antarctic Ice Sheet Model Using Bayesian Calibration, Kelsey L. Ruckert, Gary Shaffer, David Pollard, Yawen Guan, Tony E. Wong, Chris E. Forest, Klaus Keller Jan 2017

Department of Statistics: Faculty Publications

The response of the Antarctic ice sheet (AIS) to changing climate forcings is an important driver of sea-level changes. Anthropogenic climate change may drive a sizeable AIS tipping point response with subsequent increases in coastal flooding risks. Many studies analyzing flood risks use simple models to project the future responses of AIS and its sea-level contributions. These analyses have provided important new insights, but they are often silent on the effects of potentially important processes such as Marine Ice Sheet Instability (MISI) or Marine Ice Cliff Instability (MICI). These approximations can be well justified and result in more parsimonious and …


What’s Brewing? A Statistics Education Discovery Project, Marla A. Sole, Sharon L. Weinberg Jan 2017

Publications and Research

We believe that students learn best, are actively engaged, and are genuinely interested when working on real-world problems. This can be done by giving students the opportunity to work collaboratively on projects that investigate authentic, familiar problems. This article shares one such project that was used in an introductory statistics course. We describe the steps taken to investigate why customers are charged more for iced coffee than hot coffee, which included collecting data and using descriptive and inferential statistical analysis. Interspersed throughout the article, we describe strategies that can help teachers implement the project and scaffold material to assist students …


Detecting Discordance Enrichment Among A Series Of Two-Sample Genome-Wide Expression Data Sets, Yinglei Lai, Fanni Zhang, Tapan Nayak, Reza Modarres, Norman H. Lee, Timothy A. McCaffrey Jan 2017

Epidemiology Faculty Publications

Background

With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest.

Methods

In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient with linearly increased parameter space when …


Perennial Warm-Season Grasses For Producing Biofuel And Enhancing Soil Properties: An Alternative To Corn Residue Removal, Humberto Blanco-Canqui, Robert B. Mitchell, Virginia L. Jin, Marty R. Schmer, Kent M. Eskridge Jan 2017

Department of Statistics: Faculty Publications

Removal of corn (Zea mays L.) residues at high rates for biofuel and other off-farm uses may negatively impact soil and the environment in the long term. Biomass removal from perennial warm-season grasses (WSGs) grown in marginally-productive lands could be an alternative to corn residue removal as biofuel feedstocks while controlling water and wind erosion, sequestering carbon (C), cycling water and nutrients, and enhancing other soil ecosystem services. We compared wind and water erosion potential, soil compaction, soil hydraulic properties, soil organic C (SOC), and soil fertility between biomass removal from WSGs and corn residue removal from rainfed no-till …


Impact Of Menthol Smoking On Nicotine Dependence For Diverse Racial/Ethnic Groups Of Daily Smokers, Julia N. Soulakova, Ryan R. Danczak Jan 2017

Department of Statistics: Faculty Publications

Introduction: The aims of this study were to evaluate whether menthol smoking and race/ethnicity are associated with nicotine dependence in daily smokers. Methods: The study used two subsamples of U.S. daily smokers who responded to the 2010–2011 Tobacco Use Supplement to the Current Population Survey. The larger subsample consisted of 18,849 non-Hispanic White (NHW), non-Hispanic Black (NHB), and Hispanic (HISP) smokers. The smaller subsample consisted of 1112 non-Hispanic American Indian/Alaska Native (AIAN), non-Hispanic Asian (ASIAN), non-Hispanic Hawaiian/Pacific Islander (HPI), and non-Hispanic Multiracial (MULT) smokers. Results: For larger (smaller) groups the rates were 45% (33%) for heavy smoking (16+ cig/day), 59% …


Evaluating Current Practices In Shelf Life Estimation, Robert Capen, David Christopher, Patrick Forenzo, Kim Huynh-Ba, David Leblond, Oscar Liu, John O'Neill, Nate Patterson, Michelle Quinlan, Radhika Rajagopalan, James Schwenke, Walter W. Stroup Jan 2017

Department of Statistics: Faculty Publications

The current International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) methods for determining the supported shelf life of a drug product, described in ICH guidance documents Q1A and Q1E, are evaluated in this paper. To support this evaluation, an industry data set is used which is comprised of 26 individual stability batches of a common drug product where most batches are measured over a 24 month storage period. Using randomly sampled sets of 3 or 6 batches from the industry data set, the current ICH methods are assessed from three perspectives. First, the distributional properties …


Generalized Confidence Intervals Compatible With The Min Test For Simultaneous Comparisons Of One Subpopulation To Several Other Subpopulations, Julia N. Soulakova Jan 2017

Department of Statistics: Faculty Publications

A problem where one subpopulation is compared to several other subpopulations in terms of means, with the goal of estimating the smallest difference between the means, commonly arises in biology, medicine, and many other scientific fields. A generalization of the Strassburger, Bretz and Hochberg (2004) approach for two comparisons is presented for cases with three or more comparisons. The method allows constructing an interval estimator for the smallest mean difference, which is compatible with the Min test. An application to a fluency-disorder study is illustrated. Simulations confirmed adequate probability coverage for normally distributed outcomes for a number of designs.


Increasing Genomic-Enabled Prediction Accuracy By Modeling Genotype × Environment Interactions In Kansas Wheat, Diego Jarquin, Cristiano Lemas Da Silva, R. Chris Gaynor, Jesse Poland, Allan Fritz, Reka Howard, Sarah Battenfield, José Crossa Jan 2017

Department of Statistics: Faculty Publications

Wheat (Triticum aestivum L.) breeding programs test experimental lines in multiple locations over multiple years to get an accurate assessment of grain yield and yield stability. Selections in early generations of the breeding pipeline are based on information from only one or a few locations, and thus materials are advanced with little knowledge of the genotype × environment interaction (G × E) effects. Later, large trials are conducted in several locations to assess the performance of more advanced lines across environments. Genomic selection (GS) models that include G × E covariates allow us to borrow information not only from related …


Application Of Response Surface Methods To Determine Conditions For Optimal Genomic Prediction, Reka Howard, Alicia L. Carriquiry, William D. Beavis Jan 2017

Department of Statistics: Faculty Publications

An epistatic genetic architecture can have a significant impact on prediction accuracies of genomic prediction (GP) methods. Machine learning methods predict traits comprised of epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). Possible values for these factors and the …


Trans-Ancestry Fine Mapping And Molecular Assays Identify Regulatory Variants At The ANGPTL8 HDL-C GWAS Locus, Maren E. Cannon, Qing Duan, Ying Wu, Monica Zeynalzadeh, Zheng Xu, Antti J. Kangas, Pasi Soininen, Mika Ala-Korpela, Mete Civelek, Aldons J. Lusis, Johanna Kuusisto, Francis S. Collins, Michael Boehnke, Hua Tang, Markku Laakso, Yun Li, Karen L. Mohlke Jan 2017

Department of Statistics: Faculty Publications

Recent genome-wide association studies (GWAS) have identified variants associated with high-density lipoprotein cholesterol (HDL-C) located in or near the ANGPTL8 gene. Given the extensive sharing of GWAS loci across populations, we hypothesized that at least one shared variant at this locus affects HDL-C. The HDL-C–associated variants are coincident with expression quantitative trait loci for ANGPTL8 and DOCK6 in subcutaneous adipose tissue; however, only ANGPTL8 expression levels are associated with HDL-C levels. We identified a 400-bp promoter region of ANGPTL8 and enhancer regions within 5 kb that contribute to regulating expression in liver and adipose. To identify variants functionally responsible for …


A Bayes Interpretation Of Stacking For M-Complete And M-Open Settings, Tri Le, Bertrand S. Clarke Jan 2017

Department of Statistics: Faculty Publications

In M-open problems where no true model can be conceptualized, it is common to back off from modeling and merely seek good prediction. Even in M-complete problems, taking a predictive approach can be very useful. Stacking is a model averaging procedure that gives a composite predictor by combining individual predictors from a list of models using weights that optimize a cross validation criterion. We show that the stacking weights also asymptotically minimize a posterior expected loss. Hence we formally provide a Bayesian justification for cross-validation. Often the weights are constrained to be positive and sum to one. For greater generality, …


Selection Portfolio: Applying Modern Portfolio Theory To Personnel Selection, Eric Leingang Jan 2017

All Graduate Theses, Dissertations, and Other Capstone Projects

Modern Portfolio Theory (MPT) is a framework for building a portfolio of risky assets such that the ratio of risk to return is minimized. While this theory has been used in the field of financial economics for over sixty years, the method has not yet been applied to compensatory personnel selection. A common method for personnel selection is multiple regression to maximize the predicted performance of the selected group given a cut-off score on the predictor(s). Recognizing that maximizing the performance of the selected group is not the only consideration, and that, for many jobs and organizations, the outcomes of …


Greek New Testament For Data Analysis, Keith L. Yoder Dec 2016

Keith L. Yoder

Updated: 18 December 2017
This Excel data file (compatible with Excel 2007 and later versions) is an extract of my working Greek New Testament database, which I use for statistical and data analysis. It originated in the early 2000s from UBS3 data files in beta code I obtained from CCAT, and has since been evolving through countless changes and corrections. A flat-file table display such as Excel 2007+ is the format best suited to Autofilter and VBA applications, without involving a more complex XML format. The file itself may be opened with Excel 2007 or later versions, or with the …