Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

2018

Articles 1 - 30 of 262

Full-Text Articles in Physical Sciences and Mathematics

Grammar And Variation: Understanding How Cis-Regulatory Information Is Encoded In Mammalian Genomes, Dana Michele King Dec 2018

Arts & Sciences Electronic Theses and Dissertations

Understanding how genotype leads to phenotype is key to understanding both the development and dysfunction of complex organisms. In the context of regulating the gene expression patterns that contribute to cell identity and function, the goal of my thesis research is to understand how changes in genome sequence may impact gene expression by determining how sequence features contribute to regulatory potential. To accomplish this goal, I first leveraged the key regulatory role of pluripotency transcription factors (TFs) in mouse embryonic stem cells (mESCs) and tested synthetically generated and genomically identified combinations of binding sites for four TFs, OCT4, SOX2, KLF4, …


Estimation Of The Parameters In A Spatial Regressive-Autoregressive Model Using Ord's Eigenvalue Method, Sajib Mahmud Mahmud Tonmoy Dec 2018

UNLV Theses, Dissertations, Professional Papers, and Capstones

In this thesis, we study one of Ord's (1975) global spatial regression models. Ord considered spatial regressive-autoregressive models to describe the interaction between location and a response variable in the presence of several covariates. He also developed a practical estimation method for the parameters of this regression model using the eigenvalues of a weight matrix that captures the contiguity of locations. We review the theoretical aspects of his estimation method and implement it in the statistical package R.

We also implement Ord's methods on the Columbus, Ohio, crime data set from the year 1980, which involves the crime rate of …
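
The eigenvalue device mentioned in this abstract rests on a standard identity: for a weight matrix W with eigenvalues λ_i, ln|I − ρW| = Σ ln(1 − ρλ_i), so the log-determinant term of the likelihood can be evaluated cheaply for many values of ρ once the eigenvalues are known. The following is a minimal Python sketch of that identity only (the thesis itself works in R). The four-location contiguity matrix and the value of `rho` are made up for illustration and are not taken from the thesis.

```python
import numpy as np

# Hypothetical contiguity matrix: four locations on a line (1-2-3-4).
# Symmetric and binary, so its eigenvalues are real.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def log_det_direct(rho, W):
    """ln|I - rho*W| computed from the matrix itself."""
    sign, logabsdet = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)
    return logabsdet

def log_det_eigen(rho, eigvals):
    """The same quantity from precomputed eigenvalues of W."""
    return float(np.sum(np.log(1.0 - rho * eigvals)))

# Eigenvalues are computed once and reused for every candidate rho.
eigvals = np.linalg.eigvalsh(W)

rho = 0.3  # illustrative value; must keep 1 - rho * lambda_i > 0 for all i
direct = log_det_direct(rho, W)
via_eigen = log_det_eigen(rho, eigvals)
print(direct, via_eigen)
```

The two printed numbers should agree up to floating-point error. In a likelihood search over ρ, only `log_det_eigen` would need to be called repeatedly.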


Evaluating Rater Effects In The Context Of Ethical Reasoning Essay Assessment: An Application Of The Many-Facets Rasch Measurement Model, Madison A. Holzman Dec 2018

Dissertations, 2014-2019

Performance assessments are an often desired type of assessment due to their potential for alignment between the assessment and reality. However, due to the rater-mediated nature of scoring (Eckes, 2015), performance assessments have psychometric challenges that cannot be ignored in testing and assessment work. Specifically, performance assessment scores are prone to rater effects, or systematic differences in how raters evaluate performance assessment products (Myford & Wolfe, 2003). The purpose of this project was to evaluate ethical reasoning essay scores for rater effects. The Many-Facets Rasch Measurement (MFRM) model was used to evaluate ethical reasoning essay scores for rater leniency/severity effects, …


A Generative Statistical Approach For Data Classification In A Biologically Inspired Design Tool, Marvin Manuel Arroyo Rujano Dec 2018

Graduate Theses and Dissertations

The objective of the research this thesis describes is to find a way to classify text-based descriptions of biological adaptation to support biologically inspired design. Biologically inspired design is a fairly new field with ongoing research. There are different tools to assist designers and biologists in bio-inspired design. Some of the most common are BioTRIZ and AskNature. In recent years, more tools have been proposed to aid and make research in the field easier, for example, the Biologically Inspired Adaptive System Design (BIASD) tool. This tool was designed with the goal of helping designers in early design stages generate more …


Shoulder-Specific Patient Reported Outcome Measures For Use In Patients With Head And Neck Cancer: An Assessment Of Reliability, Construct Validity, And Overall Appropriateness Of Test Score Interpretation Using Rasch Analysis, Melissa Michelle Eden Dec 2018

Department of Physical Therapy Student Theses, Dissertations and Capstones

Context: Medical management for head and neck cancer (HNC) often includes neck dissection surgery, a side effect of which is shoulder dysfunction. There is no consensus for which patient-reported outcome measure (PRO) is most appropriate to quantify shoulder dysfunction in this population.

Objective: The aims of this research study were to: (1) use Rasch methodologies to assess construct validity and overall appropriateness of test score interpretation of Disability of the Arm, Shoulder and Hand (DASH), QuickDASH, Shoulder Pain and Disability Index (SPADI) and Neck Dissection Impairment Index (NDII) in the HNC population; (2) determine appropriateness of use of University of …


Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …


Budget-Constrained Regression Model Selection Using Mixed Integer Nonlinear Programming, Jingying Zhang Dec 2018

Graduate Theses and Dissertations

Regression analysis fits predictive models to data on a response variable and corresponding values for a set of explanatory variables. Often data on the explanatory variables come at a cost from commercial databases, so the available budget may limit which ones are used in the final model.

In this dissertation, two budget-constrained regression models are proposed for continuous and categorical variables respectively using Mixed Integer Nonlinear Programming (MINLP) to choose the explanatory variables to be included in solutions. First, we propose a budget-constrained linear regression model for continuous response variables. Properties such as solvability and global optimality of the proposed …


Quantitative Microbial Risk Assessment For Parts, Ground, And Msc Poultry Product Including Intervention Analysis And Exploration Of Enterobacteriaceae As An Indicator Organism In Poultry Processing, Leigh Ann Parette Dec 2018

Graduate Theses and Dissertations

Samples collected at five different large bird poultry processing facilities over a period of 7 months from prescald to post debone locations were enumerated for Enterobacteriaceae, Salmonella spp., and Campylobacter spp. and the results were used to create Quantitative Microbial Risk Analyses (QMRA) models for parts, ground, and mechanically separated chicken (MSC) products. Sensitivity analyses indicated the points in the process at which reductions would be most advantageous to the endpoint and simulation models were run to test reductions required to meet the current USDA performance standards.

These data were analyzed to determine the reductions from one node (location) to …


Sequential Inference For Hidden Markov Models, Michael Ellis Dec 2018

Graduate Theses and Dissertations

In many applications data are collected sequentially in time with very short time intervals between observations. If one is interested in using new observations as they arrive in time then non-sequential Bayesian inference methods, such as Markov Chain Monte Carlo (MCMC) sampling, can be too slow. Increasingly, state space models are being used to model nonlinear and non-Gaussian systems. The structure of state space models allows for sequential Bayesian inference so that an approximation to the posterior distribution of interest can be updated as new observations arrive. In special cases, the exact posterior distribution can be updated through conjugate Bayesian …


Effectiveness Of Prescribed Fire On Meeting Fuel Load And Wildlife Habitat Management Objectives In East Texas National Forests, Trey Wall Dec 2018

Electronic Theses and Dissertations

Using standardized methodology outlined by the United States Forest Service and the National Forests and Grasslands in Texas’ Fire Monitoring Program for data collection, the efficacy of current Forest Service prescribed burn regimes was analyzed for 24 study sites in East Texas National Forests. Study sites were located within Sam Houston, Davy Crockett, and Angelina/Sabine National Forests. Efficacy was determined by comparing defined management objectives established by the Forest Service to the data collected at the study sites. The results indicate that most objectives, as outlined by the Forest Service, are not being met with the current practices. Re-visitation of …


The Strong Law Of Large Numbers For U-Statistics Under Random Censorship, Jan Höft Dec 2018

Theses and Dissertations

We introduce a semi-parametric U-statistics estimator for randomly right-censored data. We will study the strong law of large numbers for this estimator under proper assumptions about the conditional expectation of the censoring indicator with respect to the observed lifetimes. Moreover, we will conduct simulation studies, where the semi-parametric estimator is compared to a U-statistic based on the Kaplan-Meier product limit estimator in terms of bias, variance and mean squared error, under different censoring models.
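
For readers unfamiliar with the comparison baseline named in this abstract, the Kaplan-Meier product limit estimator multiplies, over the distinct observed event times, the factor (1 − d/n), where d is the number of events at that time and n is the number still at risk. The following is a minimal Python sketch of that standard estimator. It is not the thesis's semi-parametric U-statistic, and the function name and toy data are illustrative assumptions.

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit survival estimates for right-censored data.

    times  : observed times (event or censoring, whichever came first)
    events : 1 if the lifetime was observed, 0 if it was censored
    Returns the distinct event times and the estimated S(t) at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])

    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                    # still under observation
        deaths = np.sum((times == t) & (events == 1))   # events exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Toy data: the subject observed at time 2 is censored.
t, s = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(t, s)
```

Censored observations contribute to the at-risk counts up to their censoring time but never trigger a drop in the curve, which is how the estimator uses partial information.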


Spatio-Temporal Reconstruction Of Remote Sensing Observations, Kamrul Khan Dec 2018

Graduate Theses and Dissertations

The USDA Forest Service aims to use satellite imagery for monitoring and predicting changes in forest conditions over time within the country. We specifically focus on a 230,400-hectare region in north-central Wisconsin between 2003 and 2012. The auxiliary data collected from the satellite imagery of this region are relatively dense in space and time and can be used to efficiently predict how the forest condition changed over that decade. However, these records have a significant proportion of missing values due to weather conditions and system failures. To fill in these missing values, we build spatiotemporal models based on …


Different Estimation Methods For The Basic Independent Component Analysis Model, Zhenyi An Dec 2018

Arts & Sciences Electronic Theses and Dissertations

Inspired by the classic cocktail-party problem, the basic Independent Component Analysis (ICA) model was created. What distinguishes ICA from other kinds of analysis is the intrinsic non-Gaussian assumption of the data. Several approaches are proposed based on maximizing the non-Gaussianity of the data, which is measured by kurtosis, mutual information, and others. With each estimation, we need to optimize the functions of expectations of non-quadratic functions since it can help us to access the higher-order statistics of the non-Gaussian part of the data. In this thesis, our goal is to review one of the most efficient estimation methods, …


Comparing Performance Of Gene Set Test Methods Using Biologically Relevant Simulated Data, Richard M. Lambert Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Today we know that there are many genetically driven diseases and health conditions. These problems often manifest only when a set of genes are either active or inactive. Recent technology allows us to measure the activity level of genes in cells, which we call gene expression. It is of great interest to society to be able to statistically compare the gene expression of a large number of genes between two or more groups. For example, we may want to compare the gene expression of a group of cancer patients with a group of non-cancer patients to better understand the genetic …


Statistical Methods To Account For Gene-Level Covariates In Normalization Of High-Dimensional Read-Count Data, Lauren Holt Lenz Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The goal of genetic-based cancer research is often to identify which genes behave differently in cancerous and healthy tissue. This difference in behavior, referred to as differential expression, may lead researchers to more targeted preventative care and treatment. One way to measure the expression of genes is through a process called RNA-Seq, which takes physical tissue samples and maps gene products and fragments in the sample back to the gene that created them, resulting in a large read-count matrix with genes in the rows and a column for each sample. The read-counts for tumor and normal samples are then compared …


The Power Law Distribution Of Agricultural Land Size, Lauren Chamberlain Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This paper demonstrates that the distribution of county-level agricultural land size in the United States is best described by a power-law distribution, a distribution that displays extremely heavy tails. This indicates that the majority of farmland exists in the upper tail. Our analysis indicates that the top 5% of agricultural counties account for about 25% of agricultural land between 1997 and 2012. The power-law distribution of farm size has important implications for the design of more efficient regional and national agricultural policies as counties close to the mean account for little of the cumulative distribution of total agricultural land. This has …
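
A concentration statistic of the kind quoted in this abstract (the share of the total held by the top 5% of units) is straightforward to compute from any sample. The following Python sketch shows the calculation on synthetic heavy-tailed data. The Pareto shape parameter, sample size, and function name are assumptions chosen for illustration, and the output does not reproduce the paper's county data or its 25% figure.

```python
import numpy as np

def top_share(values, fraction):
    """Share of the total accounted for by the largest `fraction` of units."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]   # largest first
    k = max(1, int(round(fraction * len(ordered))))            # units in the top group
    return ordered[:k].sum() / ordered.sum()

# Synthetic heavy-tailed "land sizes": classical Pareto with minimum 1.
# The shape a=1.5 is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
sample = rng.pareto(a=1.5, size=3000) + 1.0

share = top_share(sample, 0.05)
print(f"Top 5% of units hold {share:.1%} of the total")
```

With heavy-tailed data the top group's share is far larger than its 5% headcount, which is the pattern the abstract describes.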


Surviving A Civil War: Expanding The Scope Of Survival Analysis In Political Science, Andrew B. Whetten Dec 2018

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Survival Analysis in the context of Political Science is frequently used to study the duration of agreements, political party influence, wars, senator term lengths, etc. This paper surveys a collection of methods implemented on a modified version of the Power-Sharing Event Dataset (which documents civil war peace agreement durations in the Post-Cold War era) in order to identify the research questions that are optimally addressed by each method. A primary comparison will be made between a Cox Proportional Hazards Model using some advanced capabilities in the glmnet package, a Survival Random Forest Model, and a Survival SVM. En route to …


Innate Immunity, The Hepatic Extracellular Matrix, And Liver Injury: Mathematical Modeling Of Metastatic Potential And Tumor Development In Alcoholic Liver Disease., Shanice V. Hudson Dec 2018

Electronic Theses and Dissertations

The overarching goals of the current work are to fill key gaps in the current understanding of alcohol consumption and the risk of metastasis to the liver. Considering the evidence this research group has compiled confirming that the hepatic matrisome responds dynamically to injury, an altered extracellular matrix (ECM) profile appears to be a key feature of pre-fibrotic inflammatory injury in the liver. This group has demonstrated that the hepatic ECM responds dynamically to alcohol exposure, in particular, sensitizing the liver to LPS-induced inflammatory damage. Although the study of alcohol in its role as a contributing factor to oncogenesis and …


Learning Statistics Through Guided Block Play: A Pre-Curriculum In Statistical Literacy, Robert P. Giebitz Nov 2018

Organization, Information and Learning Sciences ETDs

Learning to use data to investigate the world and make decisions has become an essential skill for all citizens. Play and curiosity are powerful motivators for learning. Inquiry – the process of asking questions and seeking answers – can engage the natural curiosity of young learners and motivate early learning. Recent research in statistics education has shown that children as young as 4 and 5 years old can learn to collect, organize, and interpret data they acquire through observation, counting, and measuring in a process of guided inquiry. Guided block play has been used for over 100 years to enable …


Flowgraph Models For Clustered Multistate Time To Event Data, Kristin Hall Nov 2018

USF Tampa Graduate Theses and Dissertations

Healthcare systems have multistate processes. Such processes may be modeled using flowgraphs, which are directed graphs. Flowgraph models support a variety of transition time distributions, easily handle reversibility between states and allow alternate paths to the event or state of interest to be taken. However, estimation of flowgraph and first passage time distribution parameters can lead to incorrect inferences when interdependent data are treated as independent.

In this dissertation, we expand the flowgraph model to accommodate nested and correlated data structures. We develop a framework to incorporate random effects into transition probability and transition time components of a flowgraph model. …


Anisotropic Kernel Smoothing For Change-Point Data With An Analysis Of Fire Spread Rate Variability, John Ronald James Thompson Nov 2018

Electronic Thesis and Dissertation Repository

Wildland fires are natural disturbances that enable the renewal of forests. However, these fires also place public safety and property at risk. Understanding forest fire spread in any region of Canada is critical to promoting forest health, and protecting human life and infrastructure. In 2014, Ontario updated its Wildland Fire Management Strategy, moving away from "zone-based" decision making to "appropriate response" decision making. This new strategy calls for an assessment of the risks and benefits of every wildland fire reported in the province. My research places the emphasis on the knowledge and understanding of fire spread rates and their variabilities. …


The Compensation For Few Clusters In Clustered Randomized Trials With Binary Outcomes, Lily Stalter Nov 2018

Mathematics & Statistics ETDs

Cluster randomized trials are increasingly popular in epidemiological and medical research. When analyzing the data from such studies it is imperative that the hierarchical structure of the data be taken into account. Multilevel logistic regression is used to analyze clustered data with binary outcomes. Previous literature shows that a greater number of clusters is more important than a large number of subjects per cluster. This paper investigates if it is possible to compensate for the increased bias found for parameter estimates when the number of clusters is decreased. A simulation study was conducted where the absolute percent relative bias for …


Bias Assessment And Reduction In Kernel Smoothing, Wenkai Ma Nov 2018

Electronic Thesis and Dissertation Repository

When performing local polynomial regression (LPR) with kernel smoothing, the choice of the smoothing parameter, or bandwidth, is critical. The performance of the method is often evaluated using the Mean Square Error (MSE). Bias and variance are two components of MSE. Kernel methods are known to exhibit varying degrees of bias. Boundary effects and data sparsity issues are two potential problems to watch for. There is a need for a tool to visually assess the potential bias when applying kernel smooths to a given scatterplot of data. In this dissertation, we propose pointwise confidence intervals for bias and demonstrate a …
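
The statement that bias and variance are the two components of MSE can be checked numerically. The following Python sketch repeatedly fits a local constant (Nadaraya-Watson) smoother, which is the degree-zero case of local polynomial regression, at one point and confirms that the simulated MSE equals squared bias plus variance. The test function, noise level, bandwidth, and evaluation point are all illustrative assumptions, and this is not the pointwise confidence interval procedure proposed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    """Hypothetical regression function used only for this simulation."""
    return np.sin(2 * np.pi * x)

def local_constant_fit(x0, x, y, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x0, h, n, reps = 0.25, 0.1, 200, 500   # x0 sits at a peak of true_f
estimates = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 1.0, n)
    y = true_f(x) + rng.normal(0.0, 0.3, n)
    estimates[r] = local_constant_fit(x0, x, y, h)

bias = estimates.mean() - true_f(x0)
variance = estimates.var()                       # ddof=0, so the identity is exact
mse = np.mean((estimates - true_f(x0)) ** 2)

print(f"bias^2 + variance = {bias**2 + variance:.6f}, MSE = {mse:.6f}")
```

Rerunning with a larger `h` should raise the bias term and lower the variance term, which is the trade-off that makes bandwidth choice critical.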


Quantile Regression For Survival Data With Delayed Entry, Boqin Sun Nov 2018

Doctoral Dissertations

Delayed entry arises frequently in follow-up studies for survival outcomes, where additional study subjects enter during the study period. We propose a quantile regression model to analyze survival data subject to delayed entry and right-censoring. Such a model offers flexibility in assessing covariate effects on survival outcome and the regression coefficients are interpretable as direct effects on the event time. Under the conditional independent censoring assumption, we propose a weighted martingale-based estimating equation and formulate the solution finding as an $\ell_1$-type convex optimization problem, which is solved through a linear programming algorithm. We establish uniform consistency and weak convergence of …


Variational Approximations For Density Deconvolution, Yue Chang Nov 2018

Doctoral Dissertations

This thesis considers the problem of density estimation when the variables of interest are subject to measurement error. The measurement error is assumed to be additive and homoscedastic. We specify the density of interest by a Dirichlet Process Mixture Model and establish variational approximation approaches to the density deconvolution problem. Gaussian and Laplacian error distributions are considered, which are representatives of supersmooth and ordinary smooth distributions, respectively. We develop two variational approximation algorithms for Gaussian error deconvolution and one variational approximation algorithm for Laplacian error deconvolution. Their performances are compared to deconvoluting kernels and a Markov chain Monte Carlo method by …


Model-Based Predictive Analytics For Additive And Smart Manufacturing, Zhuo Yang Oct 2018

Doctoral Dissertations

Qualification and certification for additive and smart manufacturing systems can be uncertain and very costly. Using available historical data can mitigate some costs of producing and testing sample parts. However, use of such data lacks the flexibility to represent specific new problems which decreases predictive accuracy and efficiency. To address these compelling needs, in this dissertation modeling techniques are introduced that can proactively estimate results expected from additive and smart manufacturing processes swiftly and with practical levels of accuracy and reliability. More specifically, this research addresses the current challenges and limitations posed by use of available data and the high …


Essays In Financial Economics: Announcement Effects In Fixed Income Markets, James J. Forest Oct 2018

Doctoral Dissertations

Ph.D. in Finance, May 2018. James J. Forest: B.A., Framingham State University; M.S., Northeastern University; Ph.D., University of Massachusetts Amherst. Directed by: Professor Hossein B. Kazemi.

This dissertation demonstrates the use of empirical techniques for dealing with modeling issues that arise when analyzing announcement effects in fixed income markets. It describes empirical challenges in achieving unbiased and efficient parameter estimates and shows the importance of modeling a wide range of macroeconomic announcement effects to avoid omitted variable bias. Employing techniques common in macroeconomics, financial market researchers are better …


Real-Time Dengue Forecasting In Thailand: A Comparison Of Penalized Regression Approaches Using Internet Search Data, Caroline Kusiak Oct 2018

Masters Theses

Dengue fever affects over 390 million people annually worldwide and is of particular concern in Southeast Asia, where it is one of the leading causes of hospitalization. Modeling trends in dengue occurrence can provide valuable information to public health officials; however, many challenges arise depending on the data available. In Thailand, reporting of dengue cases is often delayed by more than 6 weeks, and a small fraction of cases may not be reported until over 11 months after they occurred. This study shows that incorporating data on Google Search trends can improve disease predictions in settings with severely …


Crustal Seismic Anisotropy Of The Ruby Mountains Core Complex And Surrounding Northern Basin And Range, Justin T. Wilgus Oct 2018

Earth and Planetary Sciences ETDs

Metamorphic core complexes (MCCs) are distinctive uplifts that expose deeply exhumed and deformed crustal rocks due to localized extensional deformation. Consequently, their detailed structure provides a window into deep crustal mechanics. The North American Cordillera contains numerous MCCs, one of which is the Ruby Mountains core complex (RMCC) located in the highly extended northern Basin and Range. To constrain the extent to which anisotropy below the RMCC deviates from the regional Basin and Range average and test the depth dependence of crustal anisotropy, we conduct a radial anisotropy investigation below the RMCC and surrounding northern Basin and Range. Data from …


Group-Lasso Estimation In High-Dimensional Factor Models With Structural Breaks, Yujie Song Oct 2018

Major Papers

In this major paper, we study the influence of structural breaks in the financial market model with high-dimensional data. We present a model which is capable of detecting changes in factor loadings, determining the number of factors and detecting the break date. We consider both the case where the break date is known and the case where it is unknown, and identify the type of instability. For the unknown break date case, we propose a group-LASSO estimator to determine the number of pre- and post-break factors, the break date and the existence of instability of factor loadings when the number of factors is constant. We …