Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Biostatistics

Theses/Dissertations

2015

Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Niche-Based Modeling Of Japanese Stiltgrass (Microstegium vimineum) Using Presence-Only Information, Nathan Bush Nov 2015

Masters Theses

The Connecticut River watershed is experiencing a rapid invasion of aggressive non-native plant species, which threaten watershed function and structure. Volunteer-based monitoring programs such as the University of Massachusetts’ OutSmart Invasives Species Project, Early Detection Distribution Mapping System (EDDMapS) and the Invasive Plant Atlas of New England (IPANE) have gathered valuable invasive plant data. These programs provide a unique opportunity for researchers to model invasive plant species utilizing citizen-sourced data. This study took advantage of these large data sources to model invasive plant distribution and to determine environmental and biophysical predictors that are most influential in dispersion, and to identify …


Physical Activity Classification With Conditional Random Fields, Evan L. Ray Nov 2015

Doctoral Dissertations

In this thesis we develop methods for classifying physical activity using accelerometer recordings. We cast this as a problem of classification in time series with moderate to high dimensional observations at each time point. Specifically, we observe a vector of summary statistics of the accelerometer signal at each point in time, and we wish to use these observations to estimate the type and intensity of physical activity the individual engaged in as it changes over time. Our methods are based on Conditional Random Fields, which allow us to capture temporal dependence in an individual’s physical activity type without requiring us …


Bayesian Inference On Longitudinal Semi-Continuous Substance Abuse/Dependence Symptoms Data, Dongyuan Xing Sep 2015

USF Tampa Graduate Theses and Dissertations

Substance use data such as alcohol drinking often contain a high proportion of zeros. In studies examining the alcohol consumption in college students, for instance, many students may not drink in the studied period, resulting in a number of zeros. Zero-inflated continuous data, also called semi-continuous data, typically consist of a mixture of a degenerate distribution at the origin (zero) and a right-skewed, continuous distribution for the positive values. Ignoring the extreme non-normality in semi-continuous data may lead to substantially biased estimates and inference. Longitudinal or repeated measures of semi-continuous data present special challenges in statistical inference because of …
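The mixture structure described in this abstract can be written compactly. The following is only a sketch: the symbols are assumptions for illustration, not the thesis's own notation. Here π denotes the probability of a zero and g a right-skewed density on the positive values (for example log-normal or gamma), and f is understood as a density with respect to a point mass at zero plus Lebesgue measure on the positive half-line.

```latex
f(y) \;=\; \pi \,\mathbf{1}\{y = 0\} \;+\; (1 - \pi)\, g(y)\, \mathbf{1}\{y > 0\},
\qquad 0 \le \pi \le 1 .
```

In a two-part formulation of this kind, π and the parameters of g are typically modeled separately, which is one reason that treating such data as normally distributed can distort inference.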


On The Estimation Of Intracluster Correlation For Time-To-Event Outcomes In Cluster Randomized Trials, Sumeet Kalia Aug 2015

Electronic Thesis and Dissertation Repository

Cluster randomized trials (CRTs) involve the random assignment of intact social units rather than independent subjects to intervention groups. Time-to-event outcomes often are endpoints in CRTs where the intracluster correlation coefficient (ICC) serves as a descriptive parameter to assess the similarity among outcomes in a cluster. However, estimating the ICC in CRTs with time-to-event outcomes is a challenge due to the presence of censored observations. The ICC is estimated for two CRTs using the censoring indicators and observed outcomes.

A simulation study explores the effect of administrative censoring on estimating the ICC. Results show that the ICC estimators derived from …


Healthy And Unhealthy Statistics: Examining The Impact Of Erroneous Statistical Analyses In Health-Related Research, Britney Allen Aug 2015

Electronic Thesis and Dissertation Repository

Sound statistical analyses are essential to the advancement of medicine. Although certainly not always the case, far too many publications are based on weak or inappropriate statistical methodology, leading to questionable results. Statistical reporting guidelines and standards for research are being introduced which should help curb this problem. Wide recognition of the need for statistical methodologies aligned with research questions and study designs, and the impact when this is not the case, would help prevent this problem. In this thesis, I illustrate the consequences of erroneous statistical analyses on data from an observational study on Multiple Sclerosis and I investigate …


Germline Mutation Detection In Next Generation Sequencing Data And TP53 Mutation Carrier Probability Estimation For Li-Fraumeni Syndrome, Gang Peng Aug 2015

Dissertations & Theses (Open Access)

Next generation sequencing technology has been widely used in genomic analysis, but its application has been compromised by the missing true variants, especially when these variants are rare. We proposed a family-based variant calling method, FamSeq, integrating Mendelian transmission information with de novo mutation and sequencing data to improve the variant calling accuracy. We investigated the factors impacting the improvement of family-based variant calling in simulation data and validated it in real sequencing data. In both simulation and real data, FamSeq works better than the single individual based method.

In FamSeq, we implemented four different methods for the Mendelian genetic …


Using Capture-Mark-Recapture Techniques To Estimate Detection Probabilities & Fidelity Of Expression For The Critically Endangered James Spinymussel (Pleurobema collina), Alaina C. Esposito May 2015

Masters Theses, 2010-2019

The critically endangered James Spinymussel (Pleurobema collina) is a species of freshwater mussel endemic to Virginia’s James and Dan River basins. In the last 20 years, P. collina has experienced a substantial decline in numbers and currently occupies approximately 10% of its original habitat; however, little information is known about this species to assist in conservation. A 230-meter reach of transitional habitat in Swift Run was selected for repeat observations to estimate detection probabilities using a Capture-Mark-Recapture framework. In June 2014, visual scouting began to locate and tag P. collina (including other mussels in the community) with PIT …


Novel Applications Of And Extensions To Linear Regression Methods For The Biomedical And Materials Sciences, Joe Bible May 2015

Electronic Theses and Dissertations

In this work we present three topics, each of which is centered on either the application or modification of various linear regression methods. Our work with respect to the “Materials Genome” project, while undermined by oversimplification and data integrity issues in its early stages, provides a sound platform from which the project can proceed successfully. Building upon a growing body of knowledge around the use of Weighted Generalized Estimating Equations (WGEE), our second investigation proposes an extension to that framework intended to address the inherent bias present in the analysis of clustered longitudinal data with potentially informative cluster sizes and temporal …


Genetics Of Obesity In Starr County, Texas Mexican Americans, Heather M. Highland May 2015

Dissertations & Theses (Open Access)

Currently, over two-thirds of Americans are classified as overweight or obese. Obesity increases risk for many other diseases including type 2 diabetes, heart disease, stroke, and cancer, making obesity the largest public health problem in America and most other Westernized nations. Hispanics have a higher rate of both obesity and type 2 diabetes, making them a particularly interesting population in which to study obesity. For the last 33 years, the Starr County Health Studies has collected an array of phenotypes and biological samples from residents of Starr County, along the Texas-Mexico border. This study includes 825 subjects who were not known …


optCluster: An R Package For Determining The Optimal Clustering Algorithm And Optimal Number Of Clusters, Michael N. Sekula May 2015

Electronic Theses and Dissertations

Determining the best clustering algorithm and ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, data generated from Next Generation Sequencing technology and microarray gene expression data are becoming more and more common, so new tools and resources are needed to group such high dimensional data using clustering analysis. Different clustering algorithms can group data very differently. Therefore, there is a need to determine the best groupings in a given dataset using the most suitable clustering algorithm for that data. This paper presents the R package optCluster as an efficient …


Summary Of Survival Analysis With SAS Procedures, Derek Duane Childers 1990- May 2015

Electronic Theses and Dissertations

The research conducted for this thesis was performed to summarize some of the most commonly used survival analysis techniques as well as to create one macro that will provide the solutions for these techniques. Some of the techniques that this thesis focuses on are survival and hazard functions, mean and median survival times, life table, log rank test, proportional hazards/model building, and competing risk. To further analyze these survival analysis techniques I will use the Bone Marrow Transplantation for Leukemia dataset. This trial consists of patients with either acute myelocytic leukemia (AML, 99 patients) or acute lymphoblastic leukemia (ALL, 38 patients). There …


The Stability Of The Iris As A Biometric Modality, Benjamin Wright Petry Apr 2015

Open Access Theses

In this thesis, the question of the stability of a group of individual subjects' irises is examined and answered. This stability is examined with regard to the time scale of the month range. The covariate for this research was time. Images collected during one month of separation between captures were examined. The genuine and impostor scores for these images were calculated and then interpreted using the stability score index. This index produced a quantifiable value for the stability of iris match scores over the months of the examination. Additionally, a new framework for collecting and analyzing time in biometrics …


Zero-Inflated Models To Identify Transcription Factor Binding Sites In ChIP-Seq Experiments, Sameera Dhananjaya Viswakula Apr 2015

Mathematics & Statistics Theses & Dissertations

It is essential to determine the protein-DNA binding sites to understand many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard in locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak-callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial model (NB), a zero-inflated Poisson model (ZIP) and a zero-inflated negative binomial model (ZINB) for peak-calling. …
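To make the zero-inflated Poisson idea concrete, here is a minimal sketch of the standard ZIP probability mass function. It is not the thesis's peak-calling code; the function name `zip_pmf` and the parameter names `lam` (Poisson mean) and `pi` (probability of an excess zero) are assumptions chosen for illustration.

```python
import math


def zip_pmf(y, lam, pi):
    """Probability of observing count y under a zero-inflated Poisson model.

    lam: mean of the Poisson count component (hypothetical name).
    pi:  probability that an observation is an excess (structural) zero
         (hypothetical name).
    """
    if y < 0:
        return 0.0
    # Ordinary Poisson probability of y.
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from the excess-zero part or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    # Positive counts can only come from the Poisson part.
    return (1.0 - pi) * poisson


if __name__ == "__main__":
    for y in range(4):
        print(y, zip_pmf(y, lam=2.0, pi=0.3))
```

Setting `pi` to 0 recovers the ordinary Poisson model, which is why a ZIP fit can be compared against a plain Poisson fit on the same counts.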


The Effect Of A New Hospital-Based Congestive Heart Failure Care Protocol On Rate Of 30-Day Readmission Among CHF Patients, Eric A. Cohen Mar 2015

Masters Theses

Approximately 20% of congestive heart failure (CHF) patients are readmitted within 30 days of hospital discharge, a rate which may be affected by in-hospital and post-discharge care. Reducing this rate is important to hospitals, both to improve outcomes and to avoid reductions in Medicare reimbursement. Assessing outcomes within a short post-discharge window best measures the impact of the care, planning, and follow-up of that admission; but most research on the effects of changes in CHF care has measured outcomes over periods longer than 30 days, adding the unpredictable long-term course of CHF to the factors affecting the outcome. As well, …


Diffuse Optical Measurements Of Head And Neck Tumor Hemodynamics For Early Prediction Of Chemo-Radiation Therapy Outcomes, Lixin Dong Jan 2015

Theses and Dissertations--Biomedical Engineering

Chemo-radiation therapy is a principal modality for the treatment of head and neck cancers, and its efficacy depends on the interaction of tumor oxygen with free radicals. In this study, we adopted a novel hybrid diffuse optical instrument combining a commercial frequency-domain tissue oximeter (Imagent) and a custom-made diffuse correlation spectroscopy (DCS) flowmeter, which allowed for simultaneous measurements of tumor blood flow and blood oxygenation. Using this hybrid instrument we continually measured tumor hemodynamic responses to chemo-radiation therapy over the treatment period of 7 weeks. We also explored monitoring dynamic tumor hemodynamic changes during radiation delivery. Blood flow data analysis …


A Meta-Analysis Of Association Between One-Carbon Metabolism Gene Polymorphisms And Risk Of Prostate Cancer, Mahmood Tazari Jan 2015

Walden Dissertations and Doctoral Studies

Prostate cancer is the most common cancer among men. The purpose of this quantitative, meta-analysis study was to examine one-carbon metabolism gene polymorphisms in a group of genes to determine their association with prostate cancer risk. The genetic epidemiology theory provided the framework for the study. The data collected were from published articles. From over 2,800 individual studies, 20 articles were retained for results and data abstraction, following the title, abstract screen, and full text screening in the second phase. The data were analyzed by a meta-analysis statistical method, combining the results from selected studies to estimate the overall association. …


Developments In Nonparametric Regression Methods With Application To Raman Spectroscopy Analysis, Jing Guo Jan 2015

Theses and Dissertations--Epidemiology and Biostatistics

Raman spectroscopy has been successfully employed in the classification of breast pathologies involving basis spectra for chemical constituents of breast tissue, resulting in high sensitivity (94%) and specificity (96%) (Haka et al., 2005). Motivated by recent developments in nonparametric regression, in this work, we adapt stacking, boosting, and dynamic ensemble learning into a nonparametric regression framework with application to Raman spectroscopy analysis for breast cancer diagnosis. In Chapter 2, we apply compound estimation (Charnigo and Srinivasan, 2011) in Raman spectra analysis to classify normal, benign, and malignant breast tissue. We explore both the spectra profiles and their derivatives to …


Proof-Of-Concept Of Environmental DNA Tools For Atlantic Sturgeon Management, Jameson Hinkle Jan 2015

Theses and Dissertations

The Atlantic Sturgeon (Acipenser oxyrinchus oxyrinchus, Mitchell) is an anadromous species that spawns in tidal freshwater rivers from Canada to Florida. Overfishing, river sedimentation and alteration of the river bottom have decreased Atlantic Sturgeon populations, and NOAA lists the species as endangered. Ecologists sometimes find it difficult to locate individuals of a species that is rare, endangered or invasive. Less invasive methods that can better resolve the presence of cryptic species are needed. Environmental DNA (eDNA) is a non-invasive means of detecting rare, endangered, or invasive species by isolating nuclear or mitochondrial DNA (mtDNA) from the …


Multi-State Models For Interval Censored Data With Competing Risk, Shaoceng Wei Jan 2015

Theses and Dissertations--Statistics

Multi-state models are often used to evaluate the effect of death as a competing event to the development of dementia in a longitudinal study of the cognitive status of elderly subjects. In this dissertation, both a multi-state Markov model and a semi-Markov model are used to characterize the flow of subjects from intact cognition to dementia, with mild cognitive impairment and global impairment as intervening transient cognitive states and death as a competing risk.

Firstly, a multi-state Markov model with three transient states (intact cognition, mild cognitive impairment (M.C.I.) and global impairment (G.I.)) and one absorbing state (dementia) is used to model …


Evaluation Of The Signature Molecular Descriptor With BLOSUM62 And An All-Atom Description For Use In Sequence Alignment Of Proteins, Lindsay M. Aichinger Jan 2015

Williams Honors College, Honors Research Projects

This Honors Project focused on a few aspects of this topic. The second is comparing the molecular signature kernels to three of the BLOSUM matrices (30, 62, and 90) to test the accuracy of the mathematical model. The kernel matrix was manipulated in order to improve the relationship by focusing on side groups and also by changing how the structure was represented in the matrix by increasing the initial height distance from the central atom (Height 1 and Height 2 included).

There were multiple design constraints for this project. The first was the comparison with the BLOSUM matrices (30, 62, …


Two Step Parsimonious Variable Selection For Right Censored Continuous Survival Time Models, Anju Menon Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

Variable selection is fundamental in any kind of statistical modeling. There has been extensive research by different authors on methods of variable selection from linear regression models to more complex non-linear applications. Modeling survival data especially poses challenges because of a more complicated data structure as the time variable T is usually subject to censoring. This thesis presents a two step objective approach to choose between several candidate models based on the ability of the model to predict survival times using loss functions. Once potentially important variables are selected using a screening method called Iterative Sure Independence Screening (ISIS) …


Developing A Weibull Model Extension To Estimate Cancer Latency Times, Diana L. Nadler Jan 2015

Legacy Theses & Dissertations (2009 - 2024)

More than one-third of all Americans will be diagnosed with cancer sometime in their lives. Though their illness may be invisible now, it presents a great, and largely unexamined, opportunity to find and treat their cancers early. Early detection represents one of the most promising approaches to reduce the growing cancer burden by identifying cancer while it is localized and curable, preventing not only mortality, but also reducing morbidity and costs.


The Distribution Of Type 1 Diabetes Onset In The United States By Demographic Factors, Margaret Beckstrand Jan 2015

Walden Dissertations and Doctoral Studies

Type 1 diabetes (T1D) is a chronic and lifelong condition, often diagnosed in childhood. Patients with T1D are at elevated risks of associated health complications, comorbidities, and mortality. Occurrence, clinical presentation, and complications related to T1D differ by age of onset, ethnicity, and gender. The last reported population-based estimates regarding the burden of T1D in children using the National Health and Nutrition Examination Survey (NHANES) were published in 2008, and these estimates were not well stratified by age of onset, ethnicity, and gender. The purpose of this study was to examine these demographics within the conceptual framework of the hygiene …


Nonlinear Hierarchical Models For Longitudinal Experimental Infection Studies, Michael David Singleton Jan 2015

Theses and Dissertations--Epidemiology and Biostatistics

Experimental infection (EI) studies, involving the intentional inoculation of animal or human subjects with an infectious agent under controlled conditions, have a long history in infectious disease research. Longitudinal infection response data often arise in EI studies designed to demonstrate vaccine efficacy, explore disease etiology, pathogenesis and transmission, or understand the host immune response to infection. Viral loads, antibody titers, symptom scores and body temperature are a few of the outcome variables commonly studied. Longitudinal EI data are inherently nonlinear, often with single-peaked response trajectories with a common pre- and post-infection baseline. Such data are frequently analyzed with statistical methods …


Comparing Welch's ANOVA, A Kruskal-Wallis Test And Traditional ANOVA In Case Of Heterogeneity Of Variance, Hangcheng Liu Jan 2015

Theses and Dissertations

Analysis of variance (ANOVA) is robust to violations of the normality assumption, but it may be inappropriate when the assumption of homogeneity of variance has been violated. Welch ANOVA and the Kruskal-Wallis test (a non-parametric method) are applicable in this case. In this study we compare the three methods in terms of empirical type I error rate and power when heterogeneity of variance occurs, and determine which method is most suitable in which cases, including balanced or unbalanced designs, small or large sample sizes, and normal or non-normal distributions.
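As a rough illustration of how the traditional and Welch statistics differ, the sketch below computes both F statistics and their degrees of freedom using only the Python standard library. It is not the thesis's simulation code: the function names are hypothetical, p-values are omitted because they require an F-distribution CDF that the standard library does not provide, and the Kruskal-Wallis test is left out for brevity.

```python
from statistics import mean, variance


def classic_anova_f(groups):
    """Traditional one-way ANOVA: returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Pooled within-group sum of squares assumes equal variances.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2


def welch_anova_f(groups):
    """Welch's one-way ANOVA: returns (F, df_numerator, df_denominator)."""
    k = len(groups)
    # Each group is weighted by n_j / s_j^2, so noisy groups count less.
    weights = [len(g) / variance(g) for g in groups]
    total_w = sum(weights)
    means = [mean(g) for g in groups]
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_w) ** 2 / (len(g) - 1) for w, g in zip(weights, groups)
    )
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    df2 = (k ** 2 - 1) / (3 * lam)  # generally not an integer
    return numerator / denominator, k - 1, df2


if __name__ == "__main__":
    # Made-up data: the third group has a much larger spread.
    data = [[4.8, 5.1, 5.0, 4.9], [5.4, 5.6, 5.5, 5.3], [2.0, 9.0, 5.5, 7.5]]
    print("classic:", classic_anova_f(data))
    print("welch:  ", welch_anova_f(data))
```

An empirical type I error study of the kind described would repeat such calculations on many datasets simulated under equal group means and record how often each test rejects.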


Controlling For Confounding When Association Is Quantified By Area Under The ROC Curve, Hadiza I. Galadima Jan 2015

Theses and Dissertations

In the medical literature, there has been an increased interest in evaluating association between exposure and outcomes using nonrandomized observational studies. However, because assignments to exposure are not done randomly in observational studies, comparisons of outcomes between exposed and non-exposed subjects must account for the effect of confounders. Propensity score methods have been widely used to control for confounding, when estimating exposure effect. Previous studies have shown that conditioning on the propensity score results in biased estimation of odds ratio and hazard ratio. However, there is a lack of research into the performance of propensity score methods for estimating the …


High-Throughput Data Analysis: Application To Micronuclei Frequency And T-Cell Receptor Sequencing, Mateusz Makowski Jan 2015

Theses and Dissertations

The advent of high-throughput sequencing has brought about the creation of an unprecedented amount of research data. Analytical methodology has not been able to keep pace with the plethora of data being produced. Two assays, ImmunoSEQ and the cytokinesis-block micronucleus (CBMN), that both produce count data and have few methods available to analyze them are considered.

ImmunoSEQ is a sequencing assay that measures the beta T-cell receptor (TCR) repertoire. The ImmunoSEQ assay was used to describe the TCR repertoires of patients that have undergone hematopoietic stem cell transplantation (HSCT). Several different methods for spectratype analysis were extended to the TCR …