Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019

Theses and Dissertations

This thesis discusses time series analysis of weather data for Columbia, Greenville and North Myrtle Beach, South Carolina, over the fifteen years from January 2003 to December 2017. The first part gives a brief overview of the variables used in the analysis, namely temperature, dew point, humidity and sea level pressure, and briefly introduces time series data. The second part covers modeling the variables: the chosen models are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on the climates of the cities and model …
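The abstract does not name the fitted models. As a hedged illustration of a standard first diagnostic in this kind of analysis, the sketch below computes the sample autocorrelation function (ACF) of a series, which is commonly inspected to choose autoregressive or moving-average orders and to spot seasonality. The function name `sample_acf` and the input values are illustrative assumptions, not taken from the thesis.

```python
def sample_acf(x, max_lag):
    """Return the sample autocorrelations r_0..r_max_lag of the series x.

    Uses the conventional estimator c_k = (1/n) * sum (x_t - m)(x_{t+k} - m)
    and r_k = c_k / c_0, so r_0 is always 1.
    """
    n = len(x)
    if n < 2 or max_lag >= n:
        raise ValueError("need at least two points and max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical usage: illustrative monthly mean temperatures, not data from the thesis.
monthly_temps = [8.1, 10.2, 14.5, 18.9, 23.4, 26.8, 28.1, 27.5, 24.6, 18.7, 13.0, 9.2]
print([round(r, 2) for r in sample_acf(monthly_temps, 3)])
```

In practice one would look at many more lags (for monthly data, at least lag 12) on the full fifteen-year record.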


Statistical L-Moment And L-Moment Ratio Estimation And Their Applicability In Network Analysis, Timothy S. Anderson Sep 2019

Theses and Dissertations

This research centers on finding the statistical moments, network measures, and statistical tests that are most sensitive to various node degradations for the Barabási-Albert, Erdős-Rényi, and Watts-Strogatz network models. Thirty-five different graph structures were simulated for each of the random graph generation algorithms, and sensitivity analysis was undertaken on three different network measures: degree, betweenness, and closeness. To find the statistical moments that are most sensitive to degradation within each network, four traditional moments (mean, variance, skewness, and kurtosis) and three non-traditional moments (L-variance, L-skewness, and L-kurtosis) were examined. Each of these moments was …
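For readers unfamiliar with L-moments, the sketch below estimates the first four sample L-moments from unbiased probability-weighted moments b0..b3 (Hosking's standard estimator) and returns the L-skewness τ3 = λ3/λ2 and L-kurtosis τ4 = λ4/λ2. It assumes that what the abstract calls L-variance corresponds to the second L-moment λ2 (the L-scale); that mapping, the function name, and the example degree sequence are our assumptions rather than the thesis's definitions.

```python
def sample_l_moments(data):
    """Return (l1, l2, tau3, tau4) from unbiased probability-weighted moments b0..b3."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x):  # j is the 0-based rank, i.e. (rank - 1)
        w1 = j / (n - 1)                 # (j) / (n-1)
        w2 = w1 * (j - 1) / (n - 2)      # (j)(j-1) / ((n-1)(n-2))
        w3 = w2 * (j - 2) / (n - 3)      # (j)(j-1)(j-2) / ((n-1)(n-2)(n-3))
        b[0] += xj
        b[1] += w1 * xj
        b[2] += w2 * xj
        b[3] += w3 * xj
    b = [v / n for v in b]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical usage: a made-up degree sequence of a small network.
degrees = [1, 1, 2, 2, 3, 3, 4, 8, 12]
print(sample_l_moments(degrees))
```

Because L-moments are linear in the ordered data, they are less affected by a few extreme values than the conventional skewness and kurtosis, which is one reason to compare the two families.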


Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez Sep 2019

Theses and Dissertations

Hierarchical Linear Models (HLMs), also known as multi-level models, are an extension of multiple regression analysis and can aid in understanding the human and machine workloads of a system. These models allow for prediction and testing in systems with hierarchies of two or more levels. The complex, interrelated variability that these multi-level models describe exists in operational settings such as the Air Force Distributed Common Ground System Full Motion Video (AF DCGS FMV) community, which is composed of individuals (Level-1), groups (Level-2), units (Level-3), and organizations (Level-4). Through the development of sample size requirements and considerations for multi-level models, this …
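As context for the sample-size discussion, one widely used back-of-the-envelope quantity for two-level designs is the Kish design effect, 1 + (m - 1)ρ, where m is the number of Level-1 units per Level-2 group and ρ is the intraclass correlation. The sketch below, with hypothetical function names, converts a nominal sample size into an effective one and back. It assumes equal-sized groups and is a simplification, not the procedure developed in the thesis.

```python
import math


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Number of independent observations that n_total clustered ones are roughly worth."""
    return n_total / design_effect(cluster_size, icc)


def clusters_needed(target_effective_n, cluster_size, icc):
    """Smallest number of equal-sized groups giving at least target_effective_n."""
    n_total = target_effective_n * design_effect(cluster_size, icc)
    return math.ceil(n_total / cluster_size)


# Hypothetical example: groups of 20 individuals with an intraclass correlation of 0.05.
print(effective_sample_size(400, 20, 0.05))  # about 205 rather than 400
print(clusters_needed(200, 20, 0.05))
```

With more than two levels, as in the AF DCGS FMV hierarchy, each level contributes its own correlation, so this single-factor calculation only gives a first approximation.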


GARCH Modeling Of Value At Risk And Expected Shortfall Using Bayesian Model Averaging, Ismail Kheir Aug 2019

Theses and Dissertations

This thesis conducts Value at Risk (VaR) and Expected Shortfall (ES) estimation using GARCH modeling and Bayesian Model Averaging (BMA). BMA considers multiple models weighted by some information criterion. Through BMA, this thesis finds that VaR and ES estimates can be improved through enhanced modeling of the data generation process.
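The three ingredients named in the abstract can be sketched separately: a GARCH(1,1) variance recursion, VaR and ES under a conditional normal distribution, and model weights derived from an information criterion. The sketch below assumes fixed (not estimated) GARCH parameters, normal innovations, and BIC-based weights w_k ∝ exp(-BIC_k/2); the function names, returns, and BIC values are hypothetical, and the thesis's actual model set and criterion may differ.

```python
import math
from statistics import NormalDist


def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances from a GARCH(1,1) recursion with given parameters.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]. The list has
    len(returns) + 1 entries; the last one is the one-step-ahead forecast.
    The recursion starts at the sample second moment (a common, not unique, choice).
    """
    sigma2 = [sum(r * r for r in returns) / len(returns)]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def normal_var_es(sigma, level=0.99, mu=0.0):
    """VaR and ES of the loss -R when R ~ N(mu, sigma^2), reported as positive numbers."""
    std = NormalDist()
    z = std.inv_cdf(level)
    var = -mu + sigma * z
    es = -mu + sigma * std.pdf(z) / (1.0 - level)
    return var, es


def bic_weights(bics):
    """Approximate posterior model probabilities, w_k proportional to exp(-BIC_k / 2)."""
    best = min(bics)
    raw = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical daily returns and parameters, for illustration only.
returns = [0.004, -0.012, 0.007, -0.021, 0.015, 0.002, -0.008]
sigma_next = math.sqrt(garch11_variances(returns, 1e-6, 0.08, 0.9)[-1])
print(normal_var_es(sigma_next, level=0.99))
print(bic_weights([-512.3, -510.9, -505.0]))
```

Averaging per-model VaR or ES numbers with these weights is a simplification: full BMA averages the predictive distributions, and the quantile of a mixture is not in general the weighted average of the quantiles.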


Multivariate Probit Models For Interval-Censored Failure Time Data, Yifan Zhang Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that analyzes time-to-event data. The events of interest can be death, disease occurrence, the failure of a machine part, etc. One important feature of this type of data is censoring: the time to event is not observed exactly because of loss to follow-up or because the event of interest has not occurred before the trial ends. Censored data are commonly observed in clinical trials and epidemiological studies, since medical and health studies often require monitoring a person’s health over time after treatment. In this dissertation we focus on studying multivariate …


Statistical Analysis Of Interval-Censored Data Subject To Additional Complications, Qiang Zheng Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that studies time-to-event data (or survival data), in which the response variable is the time to a certain event of interest. The most prominent feature of survival data is that the response is not observed exactly, owing to limits of the study design or the nature of the event of interest. Interval-censored data are a common type of survival data and occur frequently in real-life studies where subjects are examined at periodic follow-ups. The response time is usually not observed, but the status of the event of interest is known …
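To make interval censoring concrete: when an event is only known to have occurred in (L, R], the observation contributes S(L) - S(R) to the likelihood, with R = infinity for a right-censored subject. The sketch below assumes, purely for illustration, an exponential survival model S(t) = exp(-λt) and maximizes the resulting log-likelihood over λ by golden-section search (the log-likelihood is concave in λ). It is not the probit or other methodology developed in these theses, and the function names are hypothetical.

```python
import math


def exp_interval_loglik(rate, intervals):
    """Log-likelihood of (left, right] intervals under S(t) = exp(-rate * t).

    right may be math.inf for right-censored subjects; exact event times
    (left == right) would need the density instead and are not handled here.
    """
    total = 0.0
    for left, right in intervals:
        if not left < right:
            raise ValueError("each interval must satisfy left < right")
        s_left = math.exp(-rate * left)
        s_right = 0.0 if math.isinf(right) else math.exp(-rate * right)
        total += math.log(s_left - s_right)
    return total


def fit_exp_rate(intervals, lo=1e-6, hi=10.0, tol=1e-8):
    """Maximize the concave log-likelihood over rate in [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if exp_interval_loglik(c, intervals) > exp_interval_loglik(d, intervals):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2.0


# Hypothetical data: five events seen by the first visit, five still event-free afterwards.
data = [(0.0, 1.0)] * 5 + [(1.0, math.inf)] * 5
print(fit_exp_rate(data))
```

For this toy data the maximizer solves e^{-λ} = 1/2, so the estimate is close to log 2.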


Estimation Problems For Pooled Data, Xichen Mou Jul 2019

Theses and Dissertations

In epidemiological applications, individual specimens (e.g., blood or urine) are often pooled together to detect the presence of disease or to measure the concentration level of a specific biomarker. Because of their cost efficiency, pooled data are also seen in diverse areas such as genetics, animal ecology, and environmental science. With pooled data, individual observations are masked, and new statistical methods are needed to estimate characteristics such as disease prevalence or the underlying density function of a biomarker. We focus on three estimation problems for pooled data. Chapters 2 and 3 propose nonparametric estimators for the density function …


Investigations On Multiple Interval Estimators, Taeho Kim Jul 2019

Theses and Dissertations

Multiple interval estimation for a set of parameters is investigated. To begin, an optimization strategy for a multiple interval estimator (MIE) is introduced. This approach allocates distinct optimized levels to the individual interval estimators so that the global expected content is minimized while the global coverage probability is still maintained at the desired global level. This optimal allocation is achieved by a decision-theoretic procedure that consists of two global risk functions. The major part of this manuscript is devoted to two multiple interval estimation procedures. Both procedures add prior information to the classical setting, but these procedures do …


Extension Of Risk-Based Measure Of Time-Varying Prognostic Discrimination For Survival Models, Shujie Chen Jul 2019

Theses and Dissertations

The Cox proportional hazards (PH) model and the time-dependent PH model are the most popular models in survival analysis. The hazard discrimination summary HDS(t) proposed by Liang and Heagerty [2017] is used to evaluate the mean hazard difference between cases and controls at time t. Liang and Heagerty [2017] evaluated the discrimination performance under the PH model and the time-dependent PH model with right censoring.

In this thesis, we first investigate their method further through comprehensive simulations, including: 1) extending the simulation in Liang and Heagerty [2017] under the PH model by adding more scenarios such as different …


Cocyclic Hadamard Matrices: An Efficient Search Based Algorithm, Jonathan S. Turner Jun 2019

Theses and Dissertations

This dissertation serves as the culmination of three papers. “Counting the decimation classes of binary vectors with relatively prime fixed-density” presents the first non-exhaustive decimation class counting algorithm. “A Novel Approach to Relatively Prime Fixed Density Bracelet Generation in Constant Amortized Time” presents a novel lexicon for binary vectors based upon the Discrete Fourier Transform, and develops a bracelet generation method based upon the same. “A Novel Legendre Pair Generation Algorithm” expands upon the bracelet generation algorithm and includes additional constraints imposed by Legendre Pairs. It further presents an efficient sorting and comparison algorithm based upon symmetric functions, as well …
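As background for the Legendre-pair part of the abstract: two ±1 sequences A and B of odd length ℓ form a Legendre pair when their periodic autocorrelations satisfy PAF_A(s) + PAF_B(s) = -2 for every nonzero shift s, and such a pair yields a Hadamard matrix of order 2ℓ + 2. The brute-force checker below only verifies that definition; it is not the DFT- and symmetric-function-based search of the dissertation. Because PAF(s) = PAF(ℓ - s) for real sequences, checking shifts up to (ℓ - 1)/2 suffices. The function names are illustrative.

```python
def paf(seq, shift):
    """Periodic autocorrelation of seq at the given shift."""
    n = len(seq)
    return sum(seq[i] * seq[(i + shift) % n] for i in range(n))


def is_legendre_pair(a, b):
    """True if a and b are +/-1 sequences of equal odd length with PAF_a(s) + PAF_b(s) = -2."""
    n = len(a)
    if n != len(b) or n % 2 == 0:
        return False
    if any(v not in (1, -1) for v in list(a) + list(b)):
        return False
    # PAF(s) == PAF(n - s), so the first half of the nonzero shifts is enough.
    return all(paf(a, s) + paf(b, s) == -2 for s in range(1, (n - 1) // 2 + 1))


# The Legendre-symbol sequence modulo 7 (first entry set to +1) has PAF = -1 at every
# nonzero shift, so pairing it with itself gives a Legendre pair of length 7.
q7 = [1, 1, 1, -1, 1, -1, -1]
print(is_legendre_pair(q7, q7))  # True
```

This check costs O(ℓ²) per candidate pair, which is why practical searches filter candidates first (for example with spectral conditions) instead of testing every pair directly.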


Outlier-Resistant Models For Doubly Stochastic Point Processes, Leo Stephan Elsaesser May 2019

Theses and Dissertations

This thesis proposes an outlier-resistant multiplicative component model for doubly stochastic point processes. The model is based on a principal component decomposition of the log-intensity functions, using heavy-tailed t-distributions for the component scores. As an example of application, the temporal distribution of bike check-out times in the Divvy bike-sharing system of Chicago is analyzed using the t-model.


A Statistical Model For The Influence Of Temperature On Bike Demand In Bike-Sharing Systems, Tobias Tietze May 2019

Theses and Dissertations

Efficient fleet management is essential for bike-sharing systems. Thus, it is important to understand the impact of environmental factors on bike demand. This thesis proposes a method to analyze the influence of temperature on bike demand. Hourly temperature data are approximated by smoothed curves and modeled by functional principal components. Bike check-out times, which can be seen as realizations of a doubly stochastic process, are modeled using multiplicative component models on the underlying intensity functions. The respective component scores are then related via a multivariate regression model. An analysis of data from the Divvy system of the City of Chicago …
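A doubly stochastic (Cox) point process is a Poisson process whose intensity is itself random. The sketch below simulates one day of check-out times under a hypothetical one-component multiplicative model, λ(t) = base · exp(0.5 sin(2πt/24) + 0.3 ξ cos(2πt/24)) with a random score ξ ~ N(0, 1), using Lewis-Shedler thinning. The mean curve, component, parameter values, and function name are illustrative assumptions, not estimates from the Divvy data or the models in these theses.

```python
import math
import random


def simulate_cox_day(rng, base=20.0, horizon=24.0):
    """Simulate event times on (0, horizon] for one realization of a random intensity.

    Hypothetical model: lambda(t) = base * exp(0.5*sin(2*pi*t/horizon)
    + 0.3*xi*cos(2*pi*t/horizon)), with component score xi ~ N(0, 1) drawn once.
    """
    xi = rng.gauss(0.0, 1.0)  # random score: this is what makes the process doubly stochastic

    def intensity(t):
        phase = 2.0 * math.pi * t / horizon
        return base * math.exp(0.5 * math.sin(phase) + 0.3 * xi * math.cos(phase))

    # Upper bound on intensity(t) for this realization, since |sin|, |cos| <= 1.
    lam_max = base * math.exp(0.5 + 0.3 * abs(xi))
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)  # candidate from a homogeneous process at rate lam_max
        if t > horizon:
            return xi, events
        if rng.random() < intensity(t) / lam_max:  # keep with probability lambda(t) / lam_max
            events.append(t)


rng = random.Random(42)
xi, check_outs = simulate_cox_day(rng)
print(round(xi, 3), len(check_outs))
```

Replacing the normal score with a heavy-tailed t-distributed one would mimic, in this toy setting, the outlier-resistant variant described in the Elsaesser abstract.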


Identifying And Incorporating Driver Behavior Variables Into Crash Prediction Models, Mohammad Razaur Rahman Shaon May 2019

Theses and Dissertations

All travelers are exposed to the risk of crashes on the road, as no roadway is entirely safe. Under Vision Zero, improving traffic safety on our nation’s highways is, and will continue to be, one of the most pivotal tasks on the national transportation agenda. For decades, researchers and transportation professionals have strived to identify causal relationships between crash occurrence and roadway geometric and traffic-related variables, with the mission of creating a safe environment for the traveling public. Although there have been great achievements, such as the publication of the Highway Safety Manual (HSM), research is rather limited …


MLE And Bayesian Methods To Analyze Data With Missing Values Below The Limit Of Detection, Xinxin Hu Apr 2019

Theses and Dissertations

As pesticides are widely used in agriculture, more and more people who work in places such as farms are exposed to them. According to environmental research [Villarejo, 2003; Reigart and Roberts, 1999], exposure to some kinds of pesticides, such as organophosphorus (OP) insecticides, has significantly affected the health of farmworkers and their families. The actual level of pesticides can currently be measured only with some limitations: levels below the limit of detection (LOD) are hard to detect. Therefore, the goal of our research is to propose several different methods to analyze data …
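One standard way to obtain maximum-likelihood estimates when values below the LOD are known only to be censored is to treat them as left-censored draws from a normal distribution (often on the log scale for concentrations) and run the EM algorithm, which replaces each censored value by the conditional moments of a normal distribution truncated above at the LOD. The sketch below assumes a single known LOD and normality; the function name is hypothetical, and this is one generic approach, not necessarily any of the methods proposed in the thesis.

```python
import math
from statistics import NormalDist


def censored_normal_em(observed, n_below, lod, max_iter=500, tol=1e-10):
    """MLE of (mu, sigma) for N(mu, sigma^2) data where n_below values are only known to be < lod.

    E-step: use the first and second moments of a normal truncated above at lod.
    M-step: update mu and sigma^2 from the completed moments.
    """
    if not observed:
        raise ValueError("need at least one value above the LOD")
    std = NormalDist()
    n = len(observed) + n_below
    sum_x = sum(observed)
    sum_x2 = sum(x * x for x in observed)
    mu = sum_x / len(observed)  # start from the detected values only
    var = max(sum_x2 / len(observed) - mu * mu, 1e-12)
    for _ in range(max_iter):
        sigma = math.sqrt(var)
        a = (lod - mu) / sigma
        ratio = std.pdf(a) / std.cdf(a)  # phi(a) / Phi(a)
        e_x = mu - sigma * ratio  # E[X | X < lod]
        e_x2 = mu * mu + var - sigma * ratio * (lod + mu)  # E[X^2 | X < lod]
        new_mu = (sum_x + n_below * e_x) / n
        new_var = (sum_x2 + n_below * e_x2) / n - new_mu * new_mu
        done = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if done:
            break
    return mu, math.sqrt(var)


# Hypothetical log-concentrations: five detected values and three below an LOD of 1.0.
print(censored_normal_em([1.2, 1.5, 2.0, 2.4, 3.1], n_below=3, lod=1.0))
```

Unlike substituting a fixed value such as LOD/2 for the censored observations, the EM estimates use the assumed distribution to account for where those values are likely to lie.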


Inflated Standard Errors Of MCMC Estimates In IRT, Dongho Shin Apr 2019

Theses and Dissertations

Two widely used algorithms for estimating item response theory (IRT) parameters are Markov chain Monte Carlo (MCMC) and the EM algorithm. In general, the MCMC algorithm has advantages over the EM algorithm; for example, it allows one to estimate the desired posterior distribution and works more straightforwardly with complex IRT models. This ease of use allows one to implement the MCMC algorithm without careful consideration. Previous studies, Hendrix (2011) and Lee (2016), noted that the estimated standard errors from the MCMC algorithm are larger than those from the EM algorithm. Therefore, this study investigates the reason …


Randomization Analysis Driven Software, Steph-Yves Louis Apr 2019

Theses and Dissertations

The application of a randomization method in a clinical trial frequently amounts to using Simple Randomization. Although this method has favorable characteristics, when the collected sample is not large enough it still carries the highest chance of imbalance, both marginally across the treatment groups and locally in terms of the covariates. Permuted Block Randomization, Urn Randomization, Stratified Permuted Block Randomization, and Minimization are popular alternatives that one should consider depending on the goal of the study. A comparison of these methods is carried out to evaluate their performance with samples that are not …
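To illustrate one of the alternatives named above, the sketch below generates a permuted block randomization list: each block contains equal numbers of each arm in random order, so with two arms the groups can never differ by more than half a block at any point in enrollment. The block size, arm labels, and function name are illustrative assumptions; in practice block sizes are sometimes varied at random to make upcoming assignments harder to predict.

```python
import random


def permuted_block_schedule(n_subjects, block_size=4, arms=("A", "B"), seed=None):
    """Assignment list built from shuffled blocks with equal counts of each arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_subjects:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)  # random order within the block, balance at its end
        schedule.extend(block)
    return schedule[:n_subjects]


print(permuted_block_schedule(12, block_size=4, seed=7))
```

Stratified permuted block randomization applies the same procedure separately within each covariate stratum.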


Cluster Analysis Of Mixed-Mode Data, Yawei Liang Apr 2019

Theses and Dissertations

In the modern world, data have become increasingly more complex and often contain different types of features. Two very common types of features are continuous and discrete variables. Clustering mixed-mode data, which include both continuous and discrete variables, can be done in various ways. Furthermore, a continuous variable can take any value between its minimum and maximum. Types of continuous variables include bounded or unbounded normal variables, uniform variables, circular variables, etc. Discrete variables include types other than continuous variables, such as binary variables, categorical (nominal) variables, Poisson variables, etc. Difficulties in clustering mixed-mode data include handling the association …
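One common, model-free way to handle mixed continuous and categorical features is Gower's dissimilarity, which scales each numeric difference by that feature's range, scores each categorical feature as a match or mismatch, and averages the results. The sketch below is offered only as a point of reference (the thesis may take a different, for example model-based, approach); the function names and toy records are hypothetical, and circular variables would need their own angular distance.

```python
def feature_ranges(rows, kinds):
    """Range (max - min) of each numeric column; None for categorical columns."""
    ranges = []
    for i, kind in enumerate(kinds):
        if kind == "num":
            col = [row[i] for row in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    return ranges


def gower_distance(x, y, kinds, ranges):
    """Average of per-feature dissimilarities, each in [0, 1]."""
    total = 0.0
    for xi, yi, kind, rng in zip(x, y, kinds, ranges):
        if kind == "num":
            total += abs(xi - yi) / rng if rng else 0.0  # constant column contributes 0
        else:
            total += 0.0 if xi == yi else 1.0
    return total / len(kinds)


rows = [(1.0, "red", 10.0), (3.0, "blue", 14.0), (5.0, "red", 12.0)]
kinds = ("num", "cat", "num")
ranges = feature_ranges(rows, kinds)
print(gower_distance(rows[0], rows[1], kinds, ranges))  # (0.5 + 1 + 1) / 3
```

The resulting distance matrix can be passed to any distance-based method, such as hierarchical clustering or k-medoids.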


Spatio-Temporal Analysis Of Precipitation And Flood Data From South Carolina, Haigang Liu Apr 2019

Theses and Dissertations

Spatio-temporal data are everywhere: we encounter them on TV, in newspapers, on computer screens, on tablets, and on plain paper maps. As a result, researchers in diverse areas are increasingly faced with the task of modeling geographically-referenced and temporally-correlated data. In this dissertation, we propose two different spatiotemporal models to capture the behavior of rainfall and flood data in the state of South Carolina.

Both models are built using a Bayesian hierarchical framework, which involves specifying the true underlying process in the first level and the spatio-temporal random effect in the second level of the hierarchy. The …


Regression For Pooled Testing Data With Biomedical Applications, Juexin Lin Apr 2019

Theses and Dissertations

Since it was first introduced by Dorfman in 1943, pooled testing has been widely used as a cost- and time-effective testing protocol in a variety of applications. This dissertation consists of three projects that explore the use of pooling techniques in disease prevention from the perspective of regression. For disease monitoring and control, individual covariate information is often of practical interest and yields meaningful interpretations. It is natural to relate the outcome of interest, which can be either a disease status (binary) or a biomarker concentration index (continuous), to individual-specific covariates through a regression analysis. Chapter 2 focuses on …
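Dorfman's original two-stage protocol tests a pool of k specimens once and retests every member individually only if the pool is positive. Assuming independent individuals with prevalence p and a perfectly accurate assay (both simplifications that regression-based pooling methods typically relax), the expected number of tests per person is 1/k + 1 - (1 - p)^k, which the sketch below minimizes over k. The function names are illustrative.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per individual for Dorfman two-stage pooling with pool size k."""
    if k == 1:
        return 1.0  # plain individual testing
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def best_pool_size(p, max_k=50):
    """Pool size in 1..max_k that minimizes the expected tests per individual."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))


print(best_pool_size(0.01))  # 11
print(round(dorfman_tests_per_person(0.01, 11), 4))  # 0.1956
```

At 1% prevalence this corresponds to roughly 80% fewer tests than testing everyone individually, which is the cost advantage the abstract refers to.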


Assessing The Impact Of Incorporating Residential Histories Into The Spatial Analysis Of Cancer Risk, Anny-Claude Joseph Jan 2019

Theses and Dissertations

In many spatial epidemiologic studies, investigators use residential location at diagnosis as a surrogate for unknown environmental exposures or as a geographic basis for assigning measured exposures. Inherently, they make assumptions about the timing and location of pertinent exposures which may prove problematic when studying long latency diseases such as cancer.

In this work we explored how the association between environmental exposures and disease risk for long-latency health outcomes like cancer is affected by residential mobility. We used simulation studies conditioned on real data to evaluate the extent to which the commonly held assumption of no residential mobility 1) affected …


Site- And Location-Adjusted Approaches To Adaptive Allocation Clinical Trial Designs, Brian S. Di Pace Jan 2019

Theses and Dissertations

Response-Adaptive (RA) designs are used to adaptively allocate patients in clinical trials. These methods have been generalized to include Covariate-Adjusted Response-Adaptive (CARA) designs, which adjust treatment assignments for a set of covariates while maintaining features of the RA designs. Challenges may arise in multi-center trials if differential treatment responses and/or effects among sites exist. We propose Site-Adjusted Response-Adaptive (SARA) approaches to account for inter-center variability in treatment response and/or effectiveness, including either a fixed site effect or both random site and treatment-by-site interaction effects to calculate conditional probabilities. These success probabilities are used to update assignment probabilities for allocating patients …
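For orientation, a basic (not site-adjusted) response-adaptive rule for two arms with binary outcomes is the allocation ρ = √p_A / (√p_A + √p_B), often called the RSIHR allocation, recomputed from the current estimated success probabilities before each assignment. The sketch below uses smoothed success proportions to avoid zero estimates early in the trial (our assumption); a SARA design, as described in the abstract, would instead take these probabilities from a model with site effects. Function names are hypothetical.

```python
import math
import random


def target_allocation_a(successes_a, n_a, successes_b, n_b, pseudo=1.0):
    """Probability of assigning the next patient to arm A under sqrt(p) allocation."""
    # Smoothed success proportions (pseudo-counts are an illustrative assumption).
    p_a = (successes_a + pseudo) / (n_a + 2 * pseudo)
    p_b = (successes_b + pseudo) / (n_b + 2 * pseudo)
    return math.sqrt(p_a) / (math.sqrt(p_a) + math.sqrt(p_b))


def assign_next(rng, successes_a, n_a, successes_b, n_b):
    """Randomize the next patient with the current target probability for arm A."""
    rho = target_allocation_a(successes_a, n_a, successes_b, n_b)
    return "A" if rng.random() < rho else "B"


rng = random.Random(2019)
print(target_allocation_a(9, 10, 1, 10))  # arm A is doing better, so above 0.5
print(assign_next(rng, 9, 10, 1, 10))
```

Because the square root dampens differences in success rates, this rule skews allocation toward the better arm more gently than allocating in proportion to p_A / (p_A + p_B).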


Statistical Designs For Network A/B Testing, Victoria V. Pokhilko Jan 2019

Theses and Dissertations

A/B testing refers to the statistical procedure of experimental design and analysis to compare two treatments, A and B, applied to different testing subjects. It is widely used by technology companies such as Facebook, LinkedIn, and Netflix, to compare different algorithms, web-designs, and other online products and services. The subjects participating in these online A/B testing experiments are users who are connected in different scales of social networks. Two connected subjects are similar in terms of their social behaviors, education and financial background, and other demographic aspects. Hence, it is only natural to assume that their reactions to online products …


Methods For Joint Normalization And Comparison Of Hi-C Data, John C. Stansfield Jan 2019

Theses and Dissertations

The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing …
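For context on what single-sample Hi-C normalization does, the sketch below implements iterative correction (ICE-style matrix balancing). It repeatedly divides each contact count by the product of the two bins' relative coverage until all rows of the symmetric matrix have equal sums, accumulating per-bin bias factors along the way. It is a simplified illustration (no filtering of low-coverage bins or near-diagonal entries) and is not the joint normalization method developed in the thesis; the function name is hypothetical.

```python
def ice_balance(contacts, max_iter=500, tol=1e-12):
    """Balance a symmetric, non-negative contact matrix so nonzero rows have equal sums.

    Returns (balanced, bias) with contacts[i][j] ~= balanced[i][j] * bias[i] * bias[j].
    """
    n = len(contacts)
    m = [[float(v) for v in row] for row in contacts]
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        mean_sum = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins are left untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


raw = [[10.0, 2.0, 1.0], [2.0, 5.0, 3.0], [1.0, 3.0, 8.0]]
balanced, bias = ice_balance(raw)
print([round(sum(row), 6) for row in balanced])
```

Balancing makes each bin equally "visible" within one experiment; comparing two experiments additionally requires removing biases between the matrices, which is the problem the thesis addresses.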


Spectral Methods For The Detection And Characterization Of Topologically Associated Domains, Kellen Garrison Cresswell Jan 2019

Theses and Dissertations

The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops which is relatively stable across cell-lines and even across species. These TADs dynamically reorganize during development of disease, and exhibit cell- and condition-specific differences. Identifying such hierarchical structures and how they change between conditions is a critical step in understanding genome regulation and disease development. Despite their importance, there are relatively few tools for identification of TADs and even fewer for …


Genome-Wide Systems Genetics Of Alcohol Consumption And Dependence, Kristin Mignogna Jan 2019

Theses and Dissertations

Widely effective treatment for alcohol use disorder is not yet available, because the exact biological mechanisms that underlie this disorder are not completely understood. One way to gain a better understanding of these mechanisms is to examine the genetic frameworks that contribute to the risk for developing this disorder. This dissertation examines genetic association data in combination with gene expression networks in the brain to identify functional groups of genes associated with alcohol consumption and dependence.

The first study took advantage of the behavioral complexity of human samples, and experimental capabilities provided by mouse models, by co-analyzing gene expression networks …


Bayesian Nonparametric Analysis Of Longitudinal Data With Non-Ignorable Non-Monotone Missingness, Yu Cao Jan 2019

Theses and Dissertations

In longitudinal studies, outcomes are measured repeatedly over time, but in practice clinical studies are full of missing data points, of both monotone and non-monotone nature. Often this missingness is related to the unobserved data, so that it is non-ignorable. In this context, the pattern-mixture model (PMM) is a popular tool for analyzing the joint distribution of outcomes and missingness patterns. The unobserved outcomes are then imputed using the distribution of observed outcomes, conditioned on missing patterns. However, existing methods suffer from model identification issues if data are sparse in specific missing patterns, which is very likely to happen with a …


Methods For Evaluating Dropout Attrition In Survey Data, Camille J. Hochheimer Jan 2019

Theses and Dissertations

As researchers increasingly use web-based surveys, the ease of dropping out in the online setting is a growing issue for data quality. One theory is that dropout, or attrition, occurs in phases that can be generalized into phases of high dropout and phases of stable use. Several methods for detecting these phases are explored. First, existing methods and user-specified thresholds are applied to survey data, where significant changes in the dropout rate between two questions are interpreted as the start or end of a high dropout phase. Next, survey dropout is considered as a time-to-event outcome and …
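A minimal version of the threshold idea described above: compute the dropout rate between each pair of consecutive questions from the number of respondents reaching each one, then flag a phase boundary wherever that rate changes by more than a user-chosen threshold (a rise marks the start of a high-dropout phase, a fall its end). The counts, threshold, and function name are hypothetical, and this sketch uses a fixed cut-off rather than any formal significance test the thesis may apply.

```python
def dropout_phase_changes(n_reached, threshold):
    """Flag jumps in the question-to-question dropout rate.

    n_reached[q] is the number of respondents who reached question q (0-based).
    rates[i] is the dropout rate between question i and question i + 1.
    changes lists (question index, "start" or "end") at the question where the
    new rate is first observed.
    """
    rates = [1.0 - n_reached[q] / n_reached[q - 1] for q in range(1, len(n_reached))]
    changes = []
    for i in range(1, len(rates)):
        delta = rates[i] - rates[i - 1]
        if abs(delta) > threshold:
            changes.append((i + 1, "start" if delta > 0 else "end"))
    return rates, changes


# Hypothetical counts: a burst of dropout around questions 3 and 4.
counts = [100, 98, 96, 80, 64, 63, 62]
rates, changes = dropout_phase_changes(counts, threshold=0.1)
print(changes)  # [(3, 'start'), (5, 'end')]
```

Treating the question at which each respondent leaves as an event time, as the abstract goes on to describe, replaces this fixed threshold with a survival-analysis view of the same counts.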