Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Electronic Theses and Dissertations

2017

Articles 1 - 30 of 36

Full-Text Articles in Statistics and Probability

Cellulose Nanofiber-Reinforced Impact Modified Polypropylene: Assessing Material Properties From Fused Layer Modeling And Injection Molding Processing, Jordan Elliott Sanders Dec 2017

The purpose of this research was to investigate the use of cellulose nanofibers (CNF) compounded into an impact modified polypropylene (IMPP) matrix. An IMPP was used because it shrinks less than a PP homopolymer during fused layer modeling (FLM) processing. An assessment of material properties from FLM, an additive manufacturing (AM) method, and injection molding (IM) was conducted. Results showed that material property measurements in neat PP were statistically similar between IM and FLM for density, strain at yield, and flexural stiffness. Additionally, PP plus the coupling agent maleic anhydride (MA) showed statistically similar results in comparison of IM and …


Sample Size Calculations And Normalization Methods For Rna-Seq Data., Xiaohong Li Dec 2017

High-throughput RNA sequencing (RNA-seq) has become the preferred choice for transcriptomics and gene expression studies. With the rapid growth of RNA-seq applications, sample size calculation methods for RNA-seq experiment design and data normalization methods for differentially expressed gene (DEG) analysis are important issues to be explored and discussed. The underlying theme of this dissertation is to develop novel sample size calculation methods for RNA-seq experiment design using test statistics. I have also proposed two novel normalization methods for the analysis of RNA-seq data. In chapter one, I present test statistics, including the Wald test, the log-transformed Wald test, and the likelihood ratio test, for …
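The flavor of a Wald-based sample size calculation for count data can be sketched as follows. This is a generic Poisson delta-method illustration under assumed parameters, not the negative-binomial methods developed in the dissertation; the function and parameter names are mine:

```python
from math import ceil, log
from statistics import NormalDist

def wald_sample_size(mu1, fold_change, alpha=0.05, power=0.8):
    """Per-group sample size for detecting a given fold change between two
    Poisson mean counts with a Wald test on the log ratio of group means.
    Delta method: Var(log(xbar2 / xbar1)) ~ (1/n) * (1/mu1 + 1/mu2)."""
    mu2 = mu1 * fold_change
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n = z**2 * (1 / mu1 + 1 / mu2) / log(fold_change)**2
    return ceil(n)
```

For overdispersed (negative binomial) counts, the variance term would also carry a dispersion component, inflating the required sample size.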


Functional Data Analysis Methods For Predicting Disease Status., Sarah Kendrick Dec 2017

Introduction: Differential scanning calorimetry (DSC) is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are often considered as functional data. In this dissertation we propose and apply functional data analysis (FDA) techniques to analyze DSC data from the Lupus Family Registry and Repository (LFRR). The aim is to develop FDA methods to …


Using Hydroacoustics To Investigate Biological Responses In Fish Abundance To Restoration Efforts In The Penobscot River, Maine, Constantin C. Scherelis Aug 2017

Spatiotemporal advantages linked to hydroacoustic sampling techniques have caused a surge in the use of these techniques for fisheries monitoring studies applied over long periods of time in marine systems. Dynamic physical conditions such as tidal height, boat traffic, floating debris, and suspended particle concentrations result in unwanted noise signatures that vary in intensity and location within a hydroacoustic beam over time and can be mixed with the acoustic returns from intended targets (e.g., fish). Typical processing filters applied over long term datasets to minimize noise and maximize signals do not address spatiotemporal fluctuations of noise in dynamic systems. We …


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Stephen F. Austin State University, Masters in Statistics Program, Nacogdoches, Texas, U.S.A. (g_m_2002@live.com)

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …


Bayesian Approach On Short Time-Course Data Of Protein Phosphorylation, Causal Inference For Ordinal Outcome And Causal Analysis Of Dietary And Physical Activity In T2dm Using Nhanes Data., You Wu Aug 2017

This dissertation contains three different projects in proteomics and causal inference. In the first project, I apply a Bayesian hierarchical model to assess the stability of phosphorylated proteins under short-time cold ischemia. This study provides inference on the stability of these phosphorylated proteins, which is valuable when using these proteins as biomarkers for a disease. In the second project, I perform a comparative study of different confounding-adjusted methods for estimating the treatment effect when the outcome variable is ordinal, using observational data. The adjusted U-statistics method is compared with other methods such as ordinal logistic regression, propensity-score-based stratification and …


Estimation Of The Three Key Parameters And The Lead Time Distribution In Lung Cancer Screening., Ruiqi Liu Aug 2017

This dissertation contains three research projects on cancer screening probability modeling. Cancer screening is the primary technique for early detection; the goal of screening is to catch the disease before clinical symptoms appear. In these projects, the three key parameters and the lead time distribution were estimated to provide a statistical point of view on the effectiveness of cancer screening programs. In the first project, a cancer screening probability model was used to analyze the computed tomography (CT) scan group in the National Lung Screening Trial (NLST) data. The three key parameters were estimated using a Bayesian approach and Markov Chain Monte Carlo …


Uses Of The Hypergeometric Distribution For Determining Survival Or Complete Representation Of Subpopulations In Sequential Sampling, Brooke Busbee Aug 2017

This thesis will explore the hypergeometric probability distribution by looking at many different aspects of the distribution. These include, but are not limited to: history and origin, derivation and elementary applications, properties, relationships to other probability models, kindred hypergeometric distributions, and elements of statistical inference associated with the hypergeometric distribution. Once the above are established, an investigation into and furthering of work done by Walton (1986) and Charalambides (2005) will be undertaken. Here, we apply the hypergeometric distribution to sequential sampling in order to determine a surviving subcategory, as well as to study the problem of complete representation of the …
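The basic hypergeometric pmf, and the "complete representation" probability that an entire subpopulation appears in a sample drawn without replacement, can be sketched as follows (a minimal illustration, not code from the thesis):

```python
from math import comb

def hypergeom_pmf(N, K, n, k):
    """P(X = k): probability of k successes in a sample of size n drawn
    without replacement from a population of N containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def complete_representation(N, K, n):
    """Probability that ALL K members of a subpopulation appear in the
    sample: P(X = K) = C(N-K, n-K) / C(N, n)."""
    return hypergeom_pmf(N, K, n, K)
```

For example, with a population of 10 containing 4 marked items and a sample of 3, the chance of exactly one marked item is C(4,1)C(6,2)/C(10,3) = 0.5.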


A Cross-Sectional Exploration Of Household Financial Reactions And Homebuyer Awareness Of Registered Sex Offenders In A Rural, Suburban, And Urban County., John Charles Navarro Aug 2017

As stigmatized persons, registered sex offenders betoken instability in communities. Depressed home sale values are associated with the presence of registered sex offenders, even though the public is largely unaware of that presence. Using a spatial multilevel approach within the social disorganization framework, the current study examines how registered sex offenders influence the sale values of homes sold in 2015 in three U.S. counties (rural, suburban, and urban) located in Illinois and Kentucky. Homebuyers were surveyed to examine whether awareness of local registered sex offenders and the homebuyer’s community type operate as moderators between home selling …


Likelihood-Based Methods For Analysis Of Copy Number Variation Using Next Generation Sequencing Data., Udika Iroshini Bandara Aug 2017

A Copy Number Variation (CNV) detection problem is considered using Circular Binary Segmentation (CBS) procedures, including newly developed procedures based on likelihood ratio tests with the parametric bootstrap for models based on discrete distributions for count data (Poisson and negative binomial), as well as the widely used DNAcopy package. Results from the literature concerning maximum likelihood estimation for the negative binomial distribution are reviewed. The Newton-Raphson method is used to find the root of the derivative of the profile log-likelihood function when applicable, and it is proven that this method converges to the true Maximum Likelihood Estimate (MLE), if the starting point …
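A Newton-Raphson iteration on the profile log-likelihood for the negative binomial size parameter can be sketched as follows. This is a sketch assuming integer counts, with the sample mean profiled in as the mean estimate (so digamma differences reduce to finite sums); the names and the method-of-moments start are mine:

```python
from math import log

def nb_dispersion_mle(counts, tol=1e-10, max_iter=100):
    """Newton-Raphson for the size parameter r of a negative binomial,
    with the mean profiled out at the sample mean.  For integer x,
    psi(x + r) - psi(r) = sum_{j=0}^{x-1} 1/(r + j), so the profile score is
    score(r) = sum_i sum_{j<x_i} 1/(r+j) + n*log(r/(r+xbar))."""
    n, xbar = len(counts), sum(counts) / len(counts)
    var = sum((x - xbar) ** 2 for x in counts) / n
    r = xbar**2 / (var - xbar) if var > xbar else 10.0  # method-of-moments start
    for _ in range(max_iter):
        score = (sum(1 / (r + j) for x in counts for j in range(x))
                 + n * log(r / (r + xbar)))
        hess = (-sum(1 / (r + j) ** 2 for x in counts for j in range(x))
                + n * (1 / r - 1 / (r + xbar)))
        step = score / hess
        r_new = r - step
        while r_new <= 0:          # keep the iterate inside the parameter space
            step /= 2
            r_new = r - step
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r
```

On overdispersed data (sample variance exceeding the mean) the iteration converges in a handful of steps from the method-of-moments start.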


Peptide Identification: Refining A Bayesian Stochastic Model, Theophilus Barnabas Kobina Acquah May 2017

Notwithstanding the challenges associated with existing methods of peptide identification, other methods have been explored over the years. The complexity, size, and computational challenges of peptide-based data sets call for further inquiry into this sphere. By relying on prior information about the average relative abundances of bond cleavages and the prior probability of any specific amino acid sequence, we refine an already developed Bayesian approach to identifying peptides. The likelihood function is improved by adding additional ions to the model, and its size is driven by two overall goodness-of-fit measures. In the face of the complexities associated …


A Distribution Of The First Order Statistic When The Sample Size Is Random, Vincent Z. Forgo Mr May 2017

Statistical distributions, also known as probability distributions, are used to model a random experiment. Probability distributions are described by probability density functions (pdf) and cumulative distribution functions (cdf). They are widely used in engineering, actuarial science, computer science, biological science, physics, and other areas of study. Statistics are used to draw conclusions about a population through probability models. Sample statistics such as the minimum, first quartile, median, third quartile, and maximum, referred to as the five-number summary, are examples of order statistics. The minimum and maximum observations are important in extreme value theory. This paper will …
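A concrete instance of the first order statistic with a random sample size can be sketched as follows (an illustration with assumed distributions, not the paper's general treatment). If N is geometric on 1, 2, … and observations are iid exponential, then P(min ≤ x) = 1 − E[S(x)^N] = 1 − G_N(S(x)), where S is the survival function of one observation and G_N is the probability generating function of N:

```python
from math import exp

def cdf_min_geometric_exponential(x, p, lam):
    """CDF of the sample minimum when the sample size N is Geometric(p)
    on {1, 2, ...} and observations are iid Exponential(lam).
    Uses P(min <= x) = 1 - G_N(S(x)) with S(x) = exp(-lam * x) and
    pgf G_N(s) = p*s / (1 - (1 - p)*s)."""
    s = exp(-lam * x)                  # survival function of one observation
    return 1 - p * s / (1 - (1 - p) * s)
```

Since N ≥ 1, the minimum is stochastically smaller than a single exponential draw, so this cdf dominates 1 − e^(−λx).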


Spatiotemporal Analyses Of Recycled Water Production, Jana E. Archer May 2017

Increased demands on water supplies caused by population expansion, saltwater intrusion, and drought have led to water shortages, which may be addressed by the use of recycled water products. Study I investigated recycled water production in Florida and California during 2009 to detect gaps in distribution and identify areas for expansion. Gaps were detected along the panhandle and around Miami in Florida, as well as in the northern and southwestern regions of California. Study II examined gaps in distribution, identified temporal change, and located areas for expansion for Florida in 2009 and 2015. Production increased in the northern and southern regions …


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of data samples; hence, population estimates and model parameters estimated from such data are likely to be biased.

However, the missing data problem is an active area of study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the multivariate imputation by chained equations (MICE) package in the statistical software …


Novel Statistical Approaches For Missing Values In Truncated High-Dimensional Metabolomics Data With A Detection Threshold., Jasmit Sureshkumar Shah May 2017

Despite considerable advances in high-throughput technology over the last decade, new challenges have emerged related to the analysis, interpretation, and integration of high-dimensional data. The arrival of omics datasets has contributed to the rapid improvement of systems biology, which seeks the understanding of complex biological systems. Metabolomics is an emerging omics field in which mass spectrometry technologies generate high-dimensional datasets. As advances in this area progress, better analysis methods are required to provide correct and adequate results. While in other omics sectors, such as genomics or proteomics, there has been and continues to be critical understanding …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors, making protein identification challenging in the field of proteomics. These errors include incorrect calibration of the instrument, instrument distortion, and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


U-Statistics For Characterizing Forensic Sufficiency Studies, Cami Fuglsby Jan 2017

One of the main metrics for deciding if a given forensic modality is useful across a broad spectrum of cases, within a given population, is the Random Match Probability (RMP), or the corresponding discriminating power. Traditionally, the RMP of a given modality is studied by comparing full 'templates' and estimating the rate at which pairs of templates 'match' in a given population. This strategy leads to a natural U-statistic of degree two. However, in questioned document examination, the RMP is studied as a function of the amount of handwriting contained in the two documents being compared, turning the U-statistic into …
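The natural degree-two U-statistic described above — the proportion of unordered template pairs that match — can be sketched as follows (illustrative only; real templates and match rules are far richer than exact equality):

```python
from itertools import combinations

def rmp_u_statistic(templates):
    """Degree-two U-statistic estimating the Random Match Probability:
    the fraction of unordered pairs of templates that match.  The kernel
    here is exact equality, standing in for a modality-specific rule."""
    pairs = list(combinations(templates, 2))
    return sum(a == b for a, b in pairs) / len(pairs)
```

With four templates of which three are identical, 3 of the C(4,2) = 6 pairs match, giving an estimated RMP of 0.5.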


An Evaluation Of Critical Realignment Theory: Comparing Bayesian And Frequentist Approaches, Tara A. Rhodes Jan 2017

Prior to this study, critical realignment theory, which presupposes eras of substantial and sustained swings in American political party dominance, had only been evaluated using the classical, frequentist approach to modeling. However, a shift in the design of and approach to studying realigning elections offers the potential for more information about these electoral phenomena. This study sought to explore those options through one particular alternative to the classical approach: the Bayesian approach to statistics. Bayesian methods differ from the frequentist approach in three main ways: the treatment of probability, the treatment of parameters, and the treatment of …


Analyzing Electricity Use Of Low Income Weatherization Program Participants Using Propensity Score Analysis And A Hierarchical Linear Growth Model, Ksenia Polson Jan 2017

This evaluation utilized propensity score matching methods and a longitudinal hierarchical linear growth model to determine the effect of residential energy efficiency upgrade(s) on household electricity use for the low-income community over the course of a year in the City and County of Denver, Colorado. Propensity score analysis with risk set matching was performed at each month under analysis applying nearest neighbor and nearest neighbor with caliper approaches by balancing covariates across the treatment and control groups. Following the completion of propensity score analysis, the data were aggregated to form a data set that was used in a hierarchical linear …
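Nearest-neighbor matching with a caliper, as described above, can be sketched as follows (a greedy, one-to-one sketch on hypothetical propensity scores; not the evaluation's actual implementation):

```python
def nn_caliper_match(treated, control, caliper):
    """Greedy nearest-neighbor propensity-score matching without
    replacement: each treated unit gets the closest not-yet-used control
    within the caliper, or no match at all.
    treated, control: lists of (unit_id, propensity_score) pairs."""
    available = dict(control)                 # id -> score, still unmatched
    matches = {}
    for tid, ps in sorted(treated, key=lambda t: t[1]):
        best = min(available.items(),
                   key=lambda c: abs(c[1] - ps),
                   default=None)
        if best is not None and abs(best[1] - ps) <= caliper:
            matches[tid] = best[0]
            del available[best[0]]            # without replacement
    return matches
```

Dropping the caliper check turns this into plain nearest-neighbor matching, the other approach named in the abstract; the caliper trades sample size for closer matches.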


Consumers' Perceptions Of Voluntary And Involuntary Deconsumption: An Exploratory Sequential Scale Development Study, Kranti K. Dugar Jan 2017

This exploratory sequential mixed methods study of scale development was conducted among baby boomers in the United States to render conceptual clarity to the concepts of voluntary and involuntary deconsumption, to explore deconsumption behavior under the tenets of the attribution theory of motivation, and to examine the components, structures, uses, and measurement properties of scales of voluntary and involuntary deconsumption. It was also an attempt to reiterate the importance of the baby boomer segment(s) for marketing practitioners based on growth, economic viability, and the power of influence, and to establish a deep understanding of the deconsumption processes, which could enable …


A Meta-Analysis Of The Effects Of Incentives On Response Rate In Online Survey Studies, Amal Muhammad Asire Jan 2017

Meta-analysis was used to investigate the effect of incentives on response rates of web-based survey studies. Whereas numerous meta-analyses that address the effect of incentives on increasing response rates in survey studies are available in the literature, these analyses are based on mail surveys, so there is a need for an applied meta-analysis to examine the effect of incentives on response rates in online survey studies. A meta-analysis of an online method of survey administration was used because the use of online surveys has greatly increased, making web-based survey administration an important form of data collection in multiple fields of …
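The standard machinery behind such a meta-analysis of binary response outcomes — fixed-effect, inverse-variance pooling of study odds ratios — can be sketched as follows (hypothetical counts; not the study's data or code):

```python
from math import exp, log, sqrt

def pooled_odds_ratio(tables):
    """Fixed-effect (inverse-variance) pooled odds ratio from 2x2 tables.
    Each table: (a, b, c, d) = (incentive responders, incentive
    non-responders, control responders, control non-responders)."""
    num = den = 0.0
    for a, b, c, d in tables:
        lor = log((a * d) / (b * c))          # log odds ratio
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log OR
        num += lor / var
        den += 1 / var
    pooled = num / den
    se = sqrt(1 / den)
    return exp(pooled), (exp(pooled - 1.96 * se), exp(pooled + 1.96 * se))
```

A random-effects model (e.g. DerSimonian-Laird) would widen the weights to absorb between-study heterogeneity; the fixed-effect version above is the simplest case.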


Meta-Analyses Of The Relationship Between Depression And Nine Dimensions Of Perfectionism, Gabriel Lynn Hottinger Jan 2017

Perfectionism has been shown to be related to depression, but perfectionism is multidimensional. Some dimensions are related to positive psychological characteristics and outcomes and other dimensions are related to negative psychological characteristics and outcomes. This study reports results of nine meta-analyses performed to investigate the association between each of nine subscales of perfectionism and depression to determine which dimensions of perfectionism are most strongly associated with depression. The two subscales that were used from the Hewitt and Flett (1991b) Multidimensional Perfectionism scale were Self-Oriented Perfectionism (SOP) and Socially-Prescribed Perfectionism (SPP). The five subscales that were used from the Frost et …
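Meta-analytic pooling of correlations, as used for each perfectionism subscale here, typically works on Fisher's z scale with inverse-variance weights; a minimal sketch (illustrative values, not the study's data):

```python
from math import atanh, tanh

def pooled_correlation(results):
    """Fixed-effect meta-analytic correlation via Fisher's z transform.
    results: list of (r, n) pairs; each z = atanh(r) has approximate
    sampling variance 1/(n - 3), so the weight is n - 3."""
    num = den = 0.0
    for r, n in results:
        w = n - 3                    # inverse of Var(z)
        num += w * atanh(r)
        den += w
    return tanh(num / den)           # back-transform the pooled z
```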


Memory Properties Of Transformations Of Linear Processes And Symmetric Gini Correlation, Yongli Sang Jan 2017

A large class of time series processes can be modeled by linear processes, including a subset of the fractional ARIMA processes. Transformation of linear processes has been one of the most popular topics in univariate time-series analysis in recent years. In this dissertation, we study the memory properties of transformations of linear processes. Our results show that transformations of short-memory time series still have short memory, and that transformations of long-memory time series may have weaker memory parameters, which depend on the power rank of the transformation. In particular, we provide the memory parameters of the FARIMA (p,d,q) processes. As …


Response Surface Methodology And Its Application In Optimizing The Efficiency Of Organic Solar Cells, Rajab Suliman Jan 2017

Response surface methodology (RSM) is a ubiquitous optimization approach used in a wide variety of scientific research studies. The philosophy behind a response surface method is to sequentially run relatively simple experiments or models in order to optimize a response variable of interest. In other words, we run a small number of experiments sequentially that can provide a large amount of information upon augmentation. In this dissertation, the RSM technique is utilized in order to find the optimum fabrication condition of a polymer solar cell that maximizes the cell efficiency. The optimal device performance was achieved using 10.25 mg/ml polymer …
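The core RSM step — fit a second-order model to a few experimental runs, then locate its stationary point — can be sketched for a single factor as follows (a minimal sketch with made-up data, not the solar cell experiment itself):

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 via the normal
    equations, solved by Gaussian elimination with partial pivoting."""
    s = [sum(x**k for x in xs) for k in range(5)]          # power sums
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    b = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    for i in range(3):                                     # forward elimination
        p = max(range(i, 3), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, 3):
            f = A[r][i] / A[i][i]
            for c in range(i, 3):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                                    # back substitution
        coef[i] = (b[i] - sum(A[i][c] * coef[c]
                              for c in range(i + 1, 3))) / A[i][i]
    return coef

def stationary_point(coef):
    """x where dy/dx = 0; a maximizer when the quadratic term is negative."""
    b0, b1, b2 = coef
    return -b1 / (2 * b2)
```

In practice RSM iterates this over several factors: run a small design, fit the surface, step toward the predicted optimum, and augment the design.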


Audio-Based Productivity Forecasting Of Construction Cyclic Activities, Chris A. Sabillon Jan 2017

Because construction heavy equipment is costly, project managers must be able to monitor its performance promptly. This cannot be achieved through traditional management techniques, which are based on direct observation or on estimations from historical data. Some manufacturers have started to integrate their proprietary monitoring technologies, but construction contractors are unlikely to have a fleet of entirely new, single-manufacturer equipment, so this does not represent a complete solution. Third-party automated approaches include the use of active sensors such as accelerometers and gyroscopes, passive technologies such as computer vision and image processing, and audio signal processing. Hitherto, most …


Threshold Models For Genome-Wide Association Mapping Of Familial Breast Cancer Incidence In Humans, Nasir Elmesmari Jan 2017

Breast cancer is the second most fatal cancer in the world and one of the most harmful cancers from which people suffer. Breast cancer studies have uncovered some knowledge about genetic susceptibility to familial breast cancer in humans. Hence, determining genetic factors may help track the disease, discover the cancer in its early stages, or perhaps detect risk before the disease starts. In addition, this may allow early determination of possible treatment strategies, which will make it easier to prevent the disease. In this context, it is important to determine whether the heritability of breast cancer …


Comparative Study Of The Distribution Of Repetitive Dna In Model Organisms, Mohamed K. Aburweis Jan 2017

Repetitive DNA elements are abundant in the genomes of a wide range of organisms. In mammals, repetitive elements comprise about 40-50% of the total genome. However, their biological functions remain largely unknown. Analysis of their abundance and distribution may shed some light on how they affect genome structure, function, and evolution. We conducted a detailed comparative analysis of repetitive DNA elements across ten different eukaryotic organisms, including chicken (G. gallus), zebrafish (D. rerio), fugu (T. rubripes), fruit fly (D. melanogaster), and nematode worm (C. elegans), along with five mammalian organisms: human (H. sapiens), mouse (M. musculus), cow (B. taurus), rat …


Development Of Computational Techniques For Regulatory Dna Motif Identification Based On Big Biological Data, Jinyu Yang Jan 2017

Accurate regulatory DNA motif (or motif) identification plays a fundamental role in the elucidation of transcriptional regulatory mechanisms in a cell and can strongly support the regulatory network construction for both prokaryotic and eukaryotic organisms. Next-generation sequencing techniques generate a huge amount of biological data for motif identification. Specifically, Chromatin Immunoprecipitation followed by high throughput DNA sequencing (ChIP-seq) enables researchers to identify motifs on a genome scale. Recently, technological improvements have allowed for DNA structural information to be obtained in a high-throughput manner, which can provide four DNA shape features. The DNA shape has been found as a complementary factor …


Development And Properties Of Kernel-Based Methods For The Interpretation And Presentation Of Forensic Evidence, Douglas Armstrong Jan 2017

The inference of the source of forensic evidence is related to model selection. Many forms of evidence can only be represented by complex, high-dimensional random vectors and cannot be assigned a likelihood structure. A common approach to circumvent this is to measure the similarity between pairs of objects composing the evidence. Such methods are ad hoc and unstable approaches to the judicial inference process. While these methods address the dimensionality issue, they also engender dependencies between scores, arising when two scores have one object in common, that are not taken into account in these models. The model developed in this research captures …


Approximate Statistical Solutions To The Forensic Identification Of Source Problem, Danica M. Ommen Jan 2017

Currently in forensic science, the statistical methods for solving the identification of source problems are inherently subjective and generally ad-hoc. The formal Bayesian decision framework provides the most statistically rigorous foundation for these problems to date. However, computing a solution under this framework, which relies on a Bayes Factor, tends to be computationally intensive and highly sensitive to the subjective choice of prior distributions for the parameters. Therefore, this dissertation aims to develop statistical solutions to the forensic identification of source problems which are less subjective, but which retain the statistical rigor of the Bayesian solution. First, this dissertation focuses …