Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 42

Full-Text Articles in Physical Sciences and Mathematics

Spatiotemporal Subspace Feature Tracking By Mining Discriminatory Characteristics, Richard D. Appiah Oct 2017

Doctoral Dissertations

Recent advancements in data collection technologies have made it possible to collect heterogeneous data at complex levels of abstraction, and at an alarming pace and volume. Data mining and, more recently, data science seek to discover hidden patterns and insights from these data by employing a variety of knowledge discovery techniques. At the core of these techniques is the selection and use of features, variables or properties upon which the data were acquired to facilitate effective data modeling. Selecting relevant features in data modeling is critical to ensure overall model accuracy and optimal predictive performance of future effects. The …


Motion-Capture-Based Hand Gesture Recognition For Computing And Control, Andrew Gardner Jul 2017

Doctoral Dissertations

This dissertation focuses on the study and development of algorithms that enable the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Evaluations of proposed methods focus on examining their generalization to users not encountered during system training.

In an initial exploratory study, we compare various classification algorithms based upon multiple interpretations and feature transformations of point sets, including those based upon aggregate features (e.g. mean) and a pseudo-rasterization of the capture space. We find aggregate feature classifiers to be balanced across …


A Bayesian Variable Selection Method With Applications To Spatial Data, Xiahan Tang May 2017

Graduate Theses and Dissertations

This thesis first describes the general idea behind Bayesian inference, various sampling methods based on Bayes' theorem, and many examples. Then a Bayesian approach to model selection, called Stochastic Search Variable Selection (SSVS), is discussed. It was originally proposed by George and McCulloch (1993). In a normal regression model where the number of covariates is large, only a small subset tends to be significant most of the time. This Bayesian procedure specifies a mixture prior for each of the unknown regression coefficients; the mixture prior was originally proposed by Geweke (1996). This mixture prior will be updated as data becomes …


A Study Of Mathematics Achievement, Placement, And Graduation Of Engineering Students, Sara Hahler Blazek Jan 2017

Doctoral Dissertations

The purpose of this study was to determine how background knowledge impacts freshman engineering students' success at Louisiana Tech University in terms of grades in two different freshman classes and graduation. To determine what factors impact students, three different studies were implemented. The first study used linear regression to analyze which demographic and academic variables significantly impacted freshman math and engineering courses. Using regression discontinuity, the second study determined if the university's placement requirement for Pre-Calculus was appropriate. The final study analyzed factors that impact graduation for engineering students as well as other disciplines to determine which significant variables were …


A Framework For The Statistical Analysis Of Mass Spectrometry Imaging Experiments, Kyle Bemis Dec 2016

Open Access Dissertations

Mass spectrometry (MS) imaging is a powerful investigation technique for a wide range of biological applications such as molecular histology of tissue, whole body sections, and bacterial films, and biomedical applications such as cancer diagnosis. MS imaging visualizes the spatial distribution of molecular ions in a sample by repeatedly collecting mass spectra across its surface, resulting in complex, high-dimensional imaging datasets. Two of the primary goals of statistical analysis of MS imaging experiments are classification (for supervised experiments), i.e. assigning pixels to pre-defined classes based on their spectral profiles, and segmentation (for unsupervised experiments), i.e. assigning pixels to newly …


Group Transformation And Identification With Kernel Methods And Big Data Mixed Logistic Regression, Chao Pan Dec 2016

Open Access Dissertations

Exploratory Data Analysis (EDA) is a crucial step in the life cycle of data analysis. Exploring data with effective methods reveals the main characteristics of the data and provides guidance for model building. The goal of this thesis is to develop effective and efficient methods for data exploration in the regression setting.

First, we propose to use optimal group transformations as a general approach for exploring the relationship between predictor variables X and the response Y. This approach can be considered an automatic procedure to identify the best characteristic of P(Y|X) under which the relationship …


Computational Environment For Modeling And Analysing Network Traffic Behaviour Using The Divide And Recombine Framework, Ashrith Barthur Dec 2016

Open Access Dissertations

There are two essential goals of this research. The first goal is to design and construct a computational environment that is used for studying large and complex datasets in the cybersecurity domain. The second goal is to analyse the Spamhaus blacklist query dataset which includes uncovering the properties of blacklisted hosts and understanding the nature of blacklisted hosts over time.

The analytical environment enables deep analysis of very large and complex datasets by exploiting the divide and recombine framework. The capability to analyse data in depth enables one to go beyond just summary statistics in research. This deep analysis is …


Functional Regression Models In The Frame Work Of Reproducing Kernel Hilbert Space, Simeng Qu Dec 2016

Open Access Dissertations

The aim of this thesis is to systematically investigate some functional regression models for accurately quantifying the effect of functional predictors. In particular, three functional models are studied: functional linear regression model, functional Cox model, and function-on-scalar model. Both theoretical properties and numerical algorithms are studied in depth. The new models find broad applications in many areas.

For the functional linear regression model, the focus is on testing the nullity of the slope function, and a generalized likelihood ratio test based on an easily implementable data-driven estimate is proposed. The quality of the test is measured by the minimal distance between …


Divide And Recombined For Large Complex Data: Nonparametric-Regression Modelling Of Spatial And Seasonal-Temporal Time Series, Xiaosu Tong Dec 2016

Open Access Dissertations

In the first chapter of this dissertation, I briefly introduce one type of nonparametric regression method, namely local polynomial regression, followed by emphasis on one specific application of loess to time series decomposition, called Seasonal Trend Loess (STL). The chapter closes with an introduction to the D&R (Divide and Recombined) statistical framework. Data can be divided into subsets, and a statistical analysis method is applied to each of them. This is an embarrassingly parallel procedure, since there is no communication between subsets. Then the analysis results for each subset are combined to be the final analysis outcome for the …
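
The divide, apply, and recombine steps named in this abstract can be sketched as follows. This is only an illustration of the general pattern, not the dissertation's implementation: the function names are hypothetical, the per-subset method is assumed to be a plain sample mean, and the recombination is assumed to be a size-weighted average.

```python
from statistics import mean


def divide(data, n_subsets):
    """Split data into n_subsets contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        end = start + size + (1 if i < extra else 0)
        subsets.append(data[start:end])
        start = end
    return subsets


def apply_method(subset):
    """Per-subset analysis (assumed here: the sample mean and the subset size)."""
    return mean(subset), len(subset)


def recombine(results):
    """Combine per-subset means into one estimate, weighted by subset size."""
    total = sum(n for _, n in results)
    return sum(m * n for m, n in results) / total


def divide_and_recombine(data, n_subsets):
    # Subsets never communicate, so this step could run in parallel.
    results = [apply_method(s) for s in divide(data, n_subsets)]
    return recombine(results)
```

Because the weights are the subset sizes, the recombined value for this particular choice of method equals the mean of the undivided data; for other methods the recombined result is generally an approximation.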


Monte Carlo Methods In Bayesian Inference: Theory, Methods And Applications, Huarui Zhang Dec 2016

Graduate Theses and Dissertations

Monte Carlo methods are becoming more and more popular in statistics due to the fast development of efficient computing technologies. One of the major beneficiaries of this advent is the field of Bayesian inference. The aim of this thesis is two-fold: (i) to explain the theory justifying the validity of the simulation-based schemes in a Bayesian setting (why they should work) and (ii) to apply them in several different types of data analysis that a statistician has to routinely encounter. In Chapter 1, I introduce key concepts in Bayesian statistics. Then we discuss Monte Carlo Simulation methods in detail. Our …


Analysis Of Break-Points In Financial Time Series, Jean Remy Habimana Dec 2016

Graduate Theses and Dissertations

A time series is a set of random values collected at equal time intervals; this randomness makes these types of series hard to predict because the structure of the series may change at any time. As discussed in previous research, the structure of a time series may change at any time due to a change in the mean and/or variance of the series. Consequently, it is wise not to assume that these series are stationary. This paper discusses a method of analyzing time series by considering the entire series non-stationary, assuming there is random change in unconditional …


Controlling For Confounding Network Properties In Hypothesis Testing And Anomaly Detection, Timothy La Fond Aug 2016

Open Access Dissertations

An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that are dependent on confounding factors like the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that face …


Extreme-Strike And Small-Time Asymptotics For Gaussian Stochastic Volatility Models, Xin Zhang Aug 2016

Open Access Dissertations

The asymptotic behavior of implied volatility is the focus of this dissertation. For extreme strike, we consider a stochastic volatility asset price model in which the volatility is the absolute value of a continuous Gaussian process with arbitrary prescribed mean and covariance. By exhibiting a Karhunen-Loève expansion for the integrated variance, and using sharp estimates of the density of a general second-chaos variable, we derive asymptotics for the asset price density for large or small values of the variable, and study the wing behavior of the implied volatility in these models. Our main result provides explicit expressions for the first …


Model-Free Variable Screening, Sparse Regression Analysis And Other Applications With Optimal Transformations, Qiming Huang Aug 2016

Open Access Dissertations

Variable screening and variable selection methods play important roles in modeling high dimensional data. Variable screening is the process of filtering out irrelevant variables, with the aim to reduce the dimensionality from ultrahigh to high while retaining all important variables. Variable selection is the process of selecting a subset of relevant variables for use in model construction. The main theme of this thesis is to develop variable screening and variable selection methods for high dimensional data analysis. In particular, we will present two relevant methods for variable screening and selection under a unified framework based on optimal transformations.

In the …


Maximum Empirical Likelihood Estimation In U-Statistics Based General Estimating Equations, Lingnan Li Aug 2016

Open Access Dissertations

In the first part of this thesis, we study maximum empirical likelihood estimates (MELE's) in U-statistics based general estimating equations (UGEE's). Our technical maneuver is the jackknife empirical likelihood (JEL) approach. We give the local uniform asymptotic normality condition for the log-JEL for UGEE's. We derive the estimating equations for finding MELE's and provide their asymptotic normality. We obtain easy MELE's which have less computational burden than the usual MELE's and can be easily implemented using existing software. We investigate the use of side information of the data to improve efficiency. We exhibit that the MELE's are fully efficient, and …


Risk Estimation Toward A Natural History Model For Low Grade Glioma Patients, Anh Thi Hoang Pham May 2016

Graduate Theses and Dissertations

Glioma is a common type of primary brain tumor that represents 28% of all brain tumors and 80% of malignant tumors. According to a recent study by the Centers for Disease Control and Prevention (CDC), gliomas account for 53%, 35% and 29% of all brain tumors (68%, 74% and 81% of malignant brain tumors) among children (aged 0-14), teenagers (aged 15-19) and young adults, respectively. Gliomas are often diagnosed through radiological imaging and histopathology. There are two main groups of gliomas following World Health Organization’s classification: Low grade gliomas (LGG), or grade I and II gliomas; and high grade gliomas …


Spread Trading In Corn Futures Market, Ryan D. Napier May 2016

Graduate Theses and Dissertations

The non-linear relationship between old crop – new crop year spreads in the corn futures market and stock-to-use (S-U) ratios published by the United States Department of Agriculture is analyzed. Using a non-linear logarithmic smooth transition regression (LSTR) model, we capture asymmetric market behaviors in high and low S-U regimes. Capturing this relationship and understanding its non-linear aspects is of interest to grain merchandisers and speculators in the market. A spread trading strategy is simulated for the sample period, January 1985 through April 2015, to determine if the non-linear relationship is a profitable arbitrage opportunity in the market.


Identification Of Biomarkers For The Overall Survival Of Ovarian Cancer Patients, Kristi Mai May 2016

Graduate Theses and Dissertations

Rapid advances in sequencing technology have led to genome-wide analysis of genetic and epigenetic features simultaneously, making it possible to understand the biological mechanisms underlying cancer initiation and progression. However, identifying important prognostic features poses a great challenge for both statistical modeling and computing. In this thesis, a network-based approach is applied to The Cancer Genome Atlas (TCGA) ovarian cancer data to identify important genes related to the overall survival of ovarian cancer patients. In the first step, a stepwise correlation-based selector is used to reduce the dimensionality of TCGA data, by filtering out a large number of …


Statistical Modeling Of The Temporal Dynamics In A Large Scale-Citation Network, Luis Javier Ek Jr. May 2016

Graduate Theses and Dissertations

Citation networks of papers are vast networks that grow over time. The way a citation network grows is not an entirely random process but follows a preferential attachment relationship: highly cited papers are more likely to be cited by newly published papers. The result is a network whose degree distribution follows a power law. The growth of a citation network of papers will be modeled with a negative binomial regression coupled with logistic growth and/or a Cauchy distribution curve. Then a Barabasi-Albert model, based on the negative binomial models, and a combination of the Dirichlet distribution and multinomial will be …
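
The preferential attachment mechanism described in this abstract can be illustrated with a small simulation. The sketch below is a generic preferential attachment toy, not the thesis's negative binomial or Barabasi-Albert formulation: the function name, the fixed number of references per paper, and the "citations + 1" weighting (so that uncited papers can still be chosen) are all assumptions made for illustration.

```python
import random


def simulate_citations(n_papers, refs_per_paper, seed=0):
    """Grow a toy citation network and return each paper's citation count.

    A new paper cites existing paper i with probability proportional to
    citations[i] + 1 (assumed weighting, for illustration only).
    """
    rng = random.Random(seed)
    citations = [0] * n_papers
    # The pool holds one entry per existing paper plus one entry per citation
    # it has received, so a uniform draw from it is a weighted draw.
    pool = [0]
    for new in range(1, n_papers):
        k = min(refs_per_paper, new)  # cannot cite more papers than exist
        cited = set()
        while len(cited) < k:
            cited.add(rng.choice(pool))
        for paper in sorted(cited):
            citations[paper] += 1
            pool.append(paper)
        pool.append(new)  # the new paper becomes citable
    return citations
```

Sorting the counts returned by, say, `simulate_citations(2000, 3)` in descending order gives a quick view of how unevenly the citations are spread across early and late papers.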


User-Centric Workload Analytics: Towards Better Cluster Management, Suhas Raveesh Javagal Apr 2016

Open Access Theses

Effective management of computing clusters and providing high-quality customer support is not a trivial task. Due to the rise of community clusters, there is an increase in the diversity of workloads and the user demographic. Owing to this and to users' privacy concerns, it is difficult to identify performance issues, reduce resource wastage and understand implicit user demands. In this thesis, we perform in-depth analysis of user behavior, performance issues, resource usage patterns and failures in the workloads collected from a university-wide community cluster and two clusters maintained by a government lab. We also introduce a set of …


Implementation And Validation Of A Probabilistic Open Source Baseball Engine (Posbe): Modeling Hitters And Pitchers, Rhett Tracy Schaefer Apr 2016

Open Access Theses

This manuscript details the implementation and validation of an open source probabilistic baseball engine (POSBE) that focuses on the hitter and pitcher model of the simulation. The simulation produced outcomes that parallel those observed in actual professional Major League Baseball games. The observed data were taken from the nineteen games played between the New York Yankees (NYY) and Boston Red Sox (BOS) during the 2015 season. The potential hitter/pitcher outcomes of interest were singles, doubles, triples, home runs, walks, hit-by-pitch, and strikeouts. The nineteen-game series was simulated 1000 times, resulting in a total of 19,000 simulations. The eighteen hitters and …


A Flexible And Versatile Framework For Statistical Design And Analysis Of Quantitative Mass Spectrometry-Based Proteomic Experiments, Meena Choi Feb 2016

Open Access Dissertations

Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and the throughput of the MS measurements and of the signal processing tools have dramatically increased. However, many existing statistical tools and workflows have not kept up with the technological development. Therefore, there is a need for flexible statistical tools, which reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results.

We propose a family of linear mixed effects models, and a split-plot view of …


Calorimetry And Body Composition Research In Broilers And Broiler Breeders, Justina Victoria Caldas Cueva Dec 2015

Graduate Theses and Dissertations

Indirect calorimetry to study heat production (HP) and dual energy X-ray absorptiometry (DEXA) for body composition (BC) are powerful techniques to study the dynamics of energy and protein utilization in poultry. The first two chapters present the BC (dry matter, lean, protein, fat, bone mineral, calcium, and phosphorus) of modern broilers from 1 – 60 d of age analyzed by chemical analysis and DEXA. DEXA has been validated for precision, standardized for position, and equations and validations developed for chickens under two different feeding levels. These equations are unique to the machine and software in use. Research in broilers …


Probabilistic Graphical Modeling On Big Data, Ming-Hua Chung Dec 2015

Graduate Theses and Dissertations

The rise of Big Data in recent years brings many challenges to modern statistical analysis and modeling. In toxicogenomics, the advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big data phenomenon in biomedical science. Yet, researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done on a rather small scale. As a result, the rich information of a database has not been fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. For the past …


Analytical Comparison Of Contrasting Approaches To Estimating Competing Risks Models, Brian Stephen Rickard May 2015

Graduate Theses and Dissertations

Survival analysis is a commonly used tool in many fields but has seen little use in education research despite the many research questions for which it is well suited. Researchers often use logistic regression instead; however, this omits useful information. In research on retention and graduation, for example, the timing of the event is an important piece of information omitted when using logistic regression. A simulation study was conducted to evaluate four methods of analyzing competing risks survival data: Cox proportional hazards regression, Weibull regression, Fine and Gray's method, and Cox proportional hazards regression with frailty. College student …


Sensitivity Of Mixed Models To Computational Algorithms Of Time Series Data, Gunaime Nevine Apr 2015

Doctoral Dissertations

Statistical analysis is influenced by the implementation of the algorithms used to execute the computations associated with various statistical techniques. Over many years, very important criteria for model comparison have been studied and examined, and two algorithms on a single dataset have been performed numerous times. The goal of this research is not to compare two or more models on one dataset, but to compare models together with the numerical algorithms that have been used to solve them on the same dataset.

In this research, different models have been broadly applied in modeling and their contrasting which are affected by the numerical algorithms in different …


Online Detection Of Outliers And Structural Breaks Using Sequential Monte Carlo Methods, Richard Wanjohi Dec 2014

Graduate Theses and Dissertations

Outliers and structural breaks occur quite frequently in time series data. Whereas outliers often contain valuable information about the process under study, they are known to have a serious negative impact on statistical data analysis. The most obvious effects are model misspecification and biased parameter estimation, which result in wrong conclusions and inaccurate predictions. Structural time series consist of underlying features such as level, slope, cycles or seasonal components. Structural breaks are permanent disruptions of one or more of these components and might be a signal of serious changes in the observed process.

Detecting outliers and estimating the location of structural breaks …


Application Of Bayesian Networks In Consumer Service Industry, Yuan Gao Oct 2014

Open Access Theses

Gao, Yuan. M.S.I.E., Purdue University. December 2014. Application of Bayesian Networks in Consumer Service Industry. Major professor: Vincent G. Duffy. The purpose of the present study is to explore the application of Bayesian networks in the consumer service industry to model causal relationships within complex risk factor structures using aggregate data. An analysis of the Hawaii tourism market was conducted to find out how visitor characteristics affect their behavior and experience as consumers during their trips, and influence the tourism market outcomes represented by measurable factors. Two hypotheses were proposed regarding the use of aggregate data and the influence of …


Poisson Distributed Individuals Control Charts With Optimal Limits, Negin Enayaty Ahangar May 2014

Graduate Theses and Dissertations

The conventional method used in attribute control charts is the Shewhart three-sigma limits. The implicit assumption of the Normal distribution in this approach is not appropriate for skewed distributions such as Poisson, Geometric and Negative Binomial. Normal approximations perform poorly in the tail areas of these distributions. In this research, a type of attribute control chart is introduced to monitor processes that produce count data. The economic objective of this chart is to minimize the cost of its errors, which is determined by the designer. This objective is a linear function of type I and II errors. …
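
The claim that three-sigma limits misbehave in the tails of a Poisson distribution can be checked numerically. The sketch below is not the thesis's optimal-limit chart; it only computes the conventional Shewhart limits for a Poisson mean and the exact probability of a false alarm in each tail. The function names and the convention that a point signals when it falls strictly outside the limits are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); zero for negative k."""
    if k < 0:
        return 0.0
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def shewhart_limits(lam):
    """Three-sigma limits using the Poisson variance (equal to the mean)."""
    sigma = math.sqrt(lam)
    return max(0.0, lam - 3 * sigma), lam + 3 * sigma


def false_alarm_tails(lam):
    """Exact in-control probabilities of a count below LCL and above UCL."""
    lcl, ucl = shewhart_limits(lam)
    lower = poisson_cdf(math.ceil(lcl) - 1, lam)     # P(X < LCL)
    upper = 1.0 - poisson_cdf(math.floor(ucl), lam)  # P(X > UCL)
    return lower, upper
```

For an in-control mean of 4, the lower limit is clipped at zero, so the chart can never signal a decrease, while the upper-tail probability is roughly twice the 0.00135 that the Normal approximation nominally assigns to each tail.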


Performance Modeling And Optimization Techniques For Heterogeneous Computing, Supada Laosooksathit Jan 2014

Doctoral Dissertations

Since Graphics Processing Units (GPUs) have increasingly gained popularity among non-graphic and computational applications, known as General-Purpose computation on GPU (GPGPU), GPUs have been deployed in many clusters, including the world's fastest supercomputer. However, to get the most efficiency from a GPU system, one should consider both the performance and the reliability of the system.

This dissertation makes four major contributions. First, a two-level checkpoint/restart protocol is proposed that aims to reduce the checkpoint and recovery costs with a latency hiding strategy in a system between a CPU (Central Processing Unit) and a GPU. The experimental results and analysis reveal some benefits, …