Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Purdue University

Full-Text Articles in Physical Sciences and Mathematics

Functional Regression Models In The Frame Work Of Reproducing Kernel Hilbert Space, Simeng Qu Dec 2016

Open Access Dissertations

The aim of this thesis is to systematically investigate some functional regression models for accurately quantifying the effect of functional predictors. In particular, three functional models are studied: functional linear regression model, functional Cox model, and function-on-scalar model. Both theoretical properties and numerical algorithms are studied in depth. The new models find broad applications in many areas.

For the functional linear regression model, the focus is on testing the nullity of the slope function, and a generalized likelihood ratio test based on an easily implementable data-driven estimate is proposed. The quality of the test is measured by the minimal distance between …


Divide And Recombined For Large Complex Data: Nonparametric-Regression Modelling Of Spatial And Seasonal-Temporal Time Series, Xiaosu Tong Dec 2016

Open Access Dissertations

In the first chapter of this dissertation, I briefly introduce one type of nonparametric regression method, namely local polynomial regression, with emphasis on one specific application of loess to time series decomposition, called Seasonal Trend Loess (STL). The chapter closes with an introduction to the D&R (Divide and Recombined) statistical framework. Data can be divided into subsets, and a statistical analysis method is applied to each subset. This is an embarrassingly parallel procedure since there is no communication between subsets. Then the analysis results for the subsets are combined to form the final analysis outcome for the …


Nondestructive Testing And Structural Health Monitoring Based On Adams And Svm Techniques, Gang Jiang, Yi Ming Deng, Ji Tai Niu Oct 2016

The 8th International Conference on Physical and Numerical Simulation of Materials Processing

No abstract provided.


Design Optimization Of A Stochastic Multi-Objective Problem: Gaussian Process Regressions For Objective Surrogates, Juan Sebastian Martinez, Piyush Pandita, Rohit K. Tripathy, Ilias Bilionis Aug 2016

The Summer Undergraduate Research Fellowship (SURF) Symposium

Multi-objective optimization (MOO) problems arise frequently in science and engineering. In such a problem, we want to find the set of input parameters that generate the set of optimal outputs, mathematically known as the Pareto frontier (PF). Solving the MOO problem is challenging when experiments are expensive: they can be performed only a limited number of times, so there is a limited set of data to work with, e.g. a roll-to-roll microwave plasma chemical vapor deposition (MPCVD) reactor for manufacturing high-quality graphene. State-of-the-art techniques, e.g. evolutionary algorithms and particle swarm optimization, require a large number of observations and do not …
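The Pareto frontier mentioned in this abstract can be illustrated with a minimal sketch. It assumes every objective is to be minimized and uses made-up objective values; it is a brute-force filter for non-dominated points, not the Gaussian process surrogate approach the authors describe. The names `dominates`, `pareto_front`, and `observed` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective
    and strictly better in at least one (minimization assumed)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective observations (illustrative values only).
observed = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(observed)
```

The pairwise comparison costs quadratic time in the number of points, which is acceptable for the small data sets that expensive experiments produce.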


Passive Visual Analytics Of Social Media Data For Detection Of Unusual Events, Kush Rustagi, Junghoon Chae Aug 2016

The Summer Undergraduate Research Fellowship (SURF) Symposium

Now that social media sites have gained substantial traction, huge amounts of unanalyzed but valuable data are being generated. Posts containing images and text also have spatiotemporal data attached, which has immense value for increasing situational awareness of local events, providing insights for investigations, and understanding the extent of incidents, their severity, and consequences, as well as their time-evolving nature. However, the large volume of unstructured social media data hinders exploration and examination. To analyze such social media data, the S.M.A.R.T system provides the analyst with an interactive visual spatiotemporal analysis and spatial decision support environment that assists in evacuation planning …


Controlling For Confounding Network Properties In Hypothesis Testing And Anomaly Detection, Timothy La Fond Aug 2016

Open Access Dissertations

An important task in network analysis is the detection of anomalous events in a network time series. These events could merely be times of interest in the network timeline or they could be examples of malicious activity or network malfunction. Hypothesis testing using network statistics to summarize the behavior of the network provides a robust framework for the anomaly detection decision process. Unfortunately, choosing network statistics that are dependent on confounding factors like the total number of nodes or edges can lead to incorrect conclusions (e.g., false positives and false negatives). In this dissertation we describe the challenges that face …


Learning From Data: Plant Breeding Applications Of Machine Learning, Alencar Xavier Aug 2016

Open Access Dissertations

Increasingly, new sources of data are being incorporated into plant breeding pipelines. Enormous amounts of data from field phenomics and genotyping technologies place data mining and analysis on a completely different level that is challenging from practical and theoretical standpoints. Intelligent decision-making relies on our capability of extracting from data useful information that may help us to achieve our goals more efficiently. Many plant breeders, agronomists and geneticists perform analyses without knowing relevant underlying assumptions, strengths or pitfalls of the employed methods. The study endeavors to assess statistical learning properties and plant breeding applications of supervised and unsupervised machine learning …


Extreme-Strike And Small-Time Asymptotics For Gaussian Stochastic Volatility Models, Xin Zhang Aug 2016

Open Access Dissertations

This dissertation studies the asymptotic behavior of implied volatility. For extreme strikes, we consider a stochastic volatility asset price model in which the volatility is the absolute value of a continuous Gaussian process with arbitrary prescribed mean and covariance. By exhibiting a Karhunen-Loève expansion for the integrated variance, and using sharp estimates of the density of a general second-chaos variable, we derive asymptotics for the asset price density for large or small values of the variable, and study the wing behavior of the implied volatility in these models. Our main result provides explicit expressions for the first …


The Design And Statistical Analysis Of Single-Cell Rna-Sequencing Experiments, Faye H. Zheng Aug 2016

Open Access Dissertations

Next-generation DNA- and RNA-sequencing (RNA-seq) technologies have expanded rapidly in both throughput and accuracy within the last decade. The momentum continues as emerging techniques become increasingly capable of profiling molecular content at the level of individual cells. One goal of this research is to put forward best practices in the design of single-cell RNA-sequencing (scRNA-seq) experiments, specifically as it relates to choices regarding the trade-off between sequencing depth and sample size. In addition to general guidelines, an interactive tool is presented to aid researchers in making experiment-specific decisions that are informed by real data and practical constraints. Further, a new …


Model-Free Variable Screening, Sparse Regression Analysis And Other Applications With Optimal Transformations, Qiming Huang Aug 2016

Open Access Dissertations

Variable screening and variable selection methods play important roles in modeling high dimensional data. Variable screening is the process of filtering out irrelevant variables, with the aim to reduce the dimensionality from ultrahigh to high while retaining all important variables. Variable selection is the process of selecting a subset of relevant variables for use in model construction. The main theme of this thesis is to develop variable screening and variable selection methods for high dimensional data analysis. In particular, we will present two relevant methods for variable screening and selection under a unified framework based on optimal transformations.

In the …


Maximum Empirical Likelihood Estimation In U-Statistics Based General Estimating Equations, Lingnan Li Aug 2016

Open Access Dissertations

In the first part of this thesis, we study maximum empirical likelihood estimates (MELE's) in U-statistics based general estimating equations (UGEE's). Our main technical tool is the jackknife empirical likelihood (JEL) approach. We give the local uniform asymptotic normality condition for the log-JEL for UGEE's. We derive the estimating equations for finding MELE's and establish their asymptotic normality. We obtain easy MELE's which have less computational burden than the usual MELE's and can be easily implemented using existing software. We investigate the use of side information of the data to improve efficiency. We show that the MELE's are fully efficient, and …


Failure Of Surface Color Cues Under Natural Changes In Lighting, David H. Foster, Iván Marín-Franch May 2016

MODVIS Workshop

Color allows us to effortlessly discriminate and identify surfaces and objects by their reflected light. Although the reflected spectrum changes with the illumination spectrum, cone photoreceptor signals can be transformed to give useful cues for surface color. But what happens when both the spectrum and the geometry of the illumination change, as with lighting from the sun and sky? Is it possible, as a matter of principle, to obtain reliable cues by processing cone signals alone? This question was addressed here by estimating the information provided by cone signals from time-lapse hyperspectral radiance images of five outdoor scenes under natural …


Is Metabolism Goal-Directed? Investigating The Validity Of Modeling Biological Systems With Cybernetic Control Via Omic Data, Frank T. Devilbiss Apr 2016

Open Access Dissertations

Cybernetic models differ from other metabolic modeling frameworks in that they describe the time-dependent regulation of cellular reactions in terms of dynamic "metabolic goals." This approach contrasts starkly with purely mechanistic descriptions of metabolic regulation, which seek to explain metabolic processes in high resolution, a clearly daunting undertaking. Over a span of three decades, cybernetic models have been used to predict metabolic phenomena ranging from resource consumption in mixed-substrate environments to intracellular reaction fluxes of intricate metabolic networks. While the cybernetic approach has been validated in its utility for the prediction of metabolic phenomena, its central feature, …


User-Centric Workload Analytics: Towards Better Cluster Management, Suhas Raveesh Javagal Apr 2016

Open Access Theses

Managing computing clusters effectively and providing high-quality customer support are not trivial tasks. Due to the rise of community clusters, there is an increase in the diversity of workloads and of the user demographic. Owing to this and to users' privacy concerns, it is difficult to identify performance issues, reduce resource wastage and understand implicit user demands. In this thesis, we perform in-depth analysis of user behavior, performance issues, resource usage patterns and failures in the workloads collected from a university-wide community cluster and two clusters maintained by a government lab. We also introduce a set of …


Implementation And Validation Of A Probabilistic Open Source Baseball Engine (Posbe): Modeling Hitters And Pitchers, Rhett Tracy Schaefer Apr 2016

Open Access Theses

This manuscript details the implementation and validation of an open source probabilistic baseball engine (POSBE) that focuses on the hitter and pitcher model of the simulation. The simulation produced outcomes that parallel those observed in actual professional Major League Baseball games. The observed data were taken from the nineteen games played between the New York Yankees (NYY) and Boston Red Sox (BOS) during the 2015 season. The potential hitter/pitcher outcomes of interest were singles, doubles, triples, home runs, walks, hit-by-pitch, and strikeouts. The nineteen-game series was simulated 1000 times, resulting in a total of 19,000 simulated games. The eighteen hitters and …


A Flexible And Versatile Framework For Statistical Design And Analysis Of Quantitative Mass Spectrometry-Based Proteomic Experiments, Meena Choi Feb 2016

Open Access Dissertations

Quantitative mass spectrometry (MS)-based proteomics is an indispensable technology for biological and clinical research. As the proteomics field grows, MS-based proteomic workflows are becoming more complex and diverse. The accuracy and the throughput of the MS measurements and of the signal processing tools have increased dramatically. However, many existing statistical tools and workflows have not kept pace with the technological development. Therefore, there is a need for flexible statistical tools, which reflect diverse and complex workflows, are computationally efficient for large datasets, and maximize the reproducibility of the results.

We propose a family of linear mixed effects models, and a split-plot view of …


System-Wide Prediction Of General, All-Cause, Preventable Hospital Readmissions, Ken Musselman, Brandon Pope, Steve Witz, Zhiyi Tian, Lingsong Zhang, Linda Leon, Ann Davis Dec 2015

RCHE Publications

Existing studies of hospital readmissions typically focus on specific diagnoses, age groups, discharge dispositions, payer classes, or hospitals, and often use small samples. It is not clear how predictive models generated from such studies generalize across diseases, hospitals, or time periods. In this study, a logistic regression model of readmission risk within 30 days based on hospital administrative data was constructed and validated across hospitals and time periods. The hospitals included both general and specialty hospitals such as long-term care, women’s, and children’s hospitals. The administrative data included information on patients’ demographics, diagnoses, procedures, and discharge disposition. Derivation and validation …


Model Selection For Gaussian Mixture Models For Uncertainty Qualification, Yiyi Chen, Guang Lin, Xuan Liu Aug 2015

The Summer Undergraduate Research Fellowship (SURF) Symposium

Clustering is the task of assigning objects to groups so that objects in the same group are more similar to each other than to those in other groups. The Gaussian mixture model fitted with the expectation-maximization method is one of the most general ways to cluster large data sets. However, this method needs the number of Gaussian modes (clusters) as input in order to approximate the original data set. Developing a method to automatically determine the number of component distributions will help to apply this method in a wider range of contexts. In the original algorithm, there is a variable representing the weight of …


Image Segmentation Using Fuzzy-Spatial Taxon Cut, Lauren Barghout May 2015

MODVIS Workshop

Images convey multiple meanings that depend on the context in which the viewer perceptually organizes the scene. This presents a problem for automated image segmentation, because it adds uncertainty to the process of selecting which objects to include or not include within a segment. I’ll discuss the implementation of a fuzzy-logic-natural-vision-processing engine that solves this problem by assuming the scene architecture prior to processing. The scene architecture is a standardized natural-scene-perception-taxonomy composed of a hierarchy of nested spatial-taxons. Spatial-taxons are regions (pixel-sets) that are figure-like, in that they are perceived as having a contour, are either 'thing-like', or a 'group of …


Video Event Understanding With Pattern Theory, Fillipe Souza, Sudeep Sarkar, Anuj Srivastava, Jingyong Su May 2015

MODVIS Workshop

We propose a combinatorial approach built on Grenander’s pattern theory to generate semantic interpretations of video events of human activities. The basic units of representations, termed generators, are linked with each other using pairwise connections, termed bonds, that satisfy predefined relations. Different generators are specified for different levels, from (image) features at the bottom level to (human) actions at the highest, providing a rich representation of items in a scene. The resulting configurations of connected generators provide scene interpretations; the inference goal is to parse given video data and generate high-probability configurations. The probabilistic structures are imposed using energies that …


Metacognition: Using Confidence Ratings For Type 2 And Type 1 Roc Curves, S A. Klein May 2015

MODVIS Workshop

In the past five years there has been a surge of renewed interest in metacognition ("thinking about thinking"). The typical experiment involves a binary judgment followed by a multilevel confidence rating. It is a confusing topic because the rating could be made either on one's confidence in the binary response (standard rating Type 1 ROC) or on one's confidence sorted by whether the response was correct (Type 2 ROC). Both are metacognition. After a few remarks on challenging aspects of the Type 2 approach, I will present some interesting results for Type 1 ROC for both memory and vision research. …


Binocular 3d Motion Perception As Bayesian Inference, Martin Lages, Suzanne Heron May 2015

MODVIS Workshop

The human visual system encodes monocular motion and binocular disparity input before it is integrated into a single 3D percept. Here we propose a geometric-statistical model of human 3D motion perception that solves the aperture problem in 3D by assuming that (i) velocity constraints arise from inverse projection of local 2D velocity constraints in a binocular viewing geometry, (ii) noise from monocular motion and binocular disparity processing is independent, and (iii) slower motions are more likely to occur than faster ones. In two experiments we found that instantiation of this Bayesian model can explain perceived 3D line motion direction under …


Overcoming Uncertainty For Within-Network Relational Machine Learning, Joseph J. Pfeiffer Apr 2015

Open Access Dissertations

People increasingly communicate through email and social networks to maintain friendships and conduct business, as well as share online content such as pictures, videos and products. Relational machine learning (RML) utilizes a set of observed attributes and network structure to predict corresponding labels for items; for example, to predict individuals engaged in securities fraud, we can utilize phone calls and workplace information to make joint predictions over the individuals. However, in large scale and partially observed network domains, missing labels and edges can significantly impact standard relational machine learning methods by introducing bias into the learning and inference processes. In …


Stability Of Machine Learning Algorithms, Wei Sun Apr 2015

Open Access Dissertations

In the literature, predictive accuracy is often the primary criterion for evaluating a learning algorithm. In this thesis, I will introduce novel concepts of stability into the machine learning community. A learning algorithm is said to be stable if it produces consistent predictions with respect to small perturbations of the training samples. Stability is an important aspect of a learning procedure because unstable predictions can potentially reduce users' trust in the system and also harm the reproducibility of scientific conclusions. As a prototypical example, stability of the classification procedure will be discussed extensively. In particular, I will present two new …
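The notion of stability described in this abstract (consistent predictions under perturbation of the training sample) can be made concrete with a small sketch. It is an illustration only, not the stability measures proposed in the thesis: it assumes a one-nearest-neighbor classifier on synthetic one-dimensional data and reports the fraction of query points on which two classifiers, trained on different random subsamples, disagree. All names and data here are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, label)


def predict_1nn(train: List[Sample], x: float) -> int:
    """Predict the label of the training point closest to x."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def disagreement(train_a: List[Sample], train_b: List[Sample],
                 queries: List[float]) -> float:
    """Fraction of query points where the two fitted classifiers disagree."""
    diffs = sum(predict_1nn(train_a, x) != predict_1nn(train_b, x)
                for x in queries)
    return diffs / len(queries)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic data (an assumption): label is 1 when the feature exceeds 0.5.
data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(200))]

# Two perturbed training sets drawn from the same pool.
train_a = rng.sample(data, 100)
train_b = rng.sample(data, 100)

queries = [i / 100 for i in range(101)]
instability = disagreement(train_a, train_b, queries)
```

A value of `instability` near zero means the two perturbed training sets lead to nearly identical predictions; larger values indicate a less stable procedure in this informal sense.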


The Stability Of The Iris As A Biometric Modality, Benjamin Wright Petry Apr 2015

Open Access Theses

In this thesis, the question of the stability of a group of individual subjects' irises is examined and answered. This stability is examined with regard to a time scale of months. The covariate for this research was time. Images collected during one month of separation between captures were examined. The genuine and impostor scores for these images were calculated and then interpreted using the stability score index. This index produced a quantifiable value for the stability of iris match scores over the months of the examination. Additionally, a new framework for collecting and analyzing time in biometrics …


Divide And Recombine For Large Complex Data: The Subset Likelihood Modeling Approach To Recombination, Philip Gautier Apr 2015

Open Access Dissertations

Divide and recombine (D&R) is a statistical framework for the analysis of large complex data. The data are divided into subsets. Numeric and visualization methods, which collectively are analytic methods, are applied to each subset. For each analytic method, the outputs of the application of the method to the subsets are recombined. So each analytic method has associated with it a division method and a recombination method. Here we study D&R methods for likelihood-based model fitting. We introduce a notion of likelihood analysis and modeling. We divide the data and fit a likelihood model on each subset. The fitted model …
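The divide, apply, and recombine steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the subset likelihood modeling approach of the dissertation: the analytic method is a plain subset mean, the division method is round-robin assignment, and the recombination method is a size-weighted average. The function names are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def divide(data: Sequence[float], n_subsets: int) -> List[Sequence[float]]:
    """Division method (assumed): round-robin assignment to subsets."""
    return [data[i::n_subsets] for i in range(n_subsets)]


def weighted_average(outputs: List[float], sizes: List[int]) -> float:
    """Recombination method (assumed): average weighted by subset size."""
    return sum(o * n for o, n in zip(outputs, sizes)) / sum(sizes)


def divide_and_recombine(
    data: Sequence[float],
    n_subsets: int,
    method: Callable[[Sequence[float]], float],
    recombine: Callable[[List[float], List[int]], float],
) -> float:
    """Apply `method` to each subset independently, then recombine."""
    subsets = divide(data, n_subsets)
    # Each call below is independent of the others, so in practice
    # the subsets could be processed in parallel.
    outputs = [method(s) for s in subsets]
    return recombine(outputs, [len(s) for s in subsets])


data = [float(i) for i in range(1, 101)]
estimate = divide_and_recombine(data, 4, fmean, weighted_average)
```

For the mean, the size-weighted recombination reproduces the all-data estimate up to floating-point error; for most other analytic methods the recombined output is only an approximation, which is why the choice of division and recombination methods matters.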


A Pure-Jump Market-Making Model For High-Frequency Trading, Chi Wai Law Apr 2015

Open Access Dissertations

We propose a new market-making model which incorporates a number of realistic features relevant for high-frequency trading. In particular, we model the dependency structure of prices and order arrivals with novel self- and cross-exciting point processes. Furthermore, instead of assuming the bid and ask prices can be adjusted continuously by the market maker, we formulate the market maker's decisions as an optimal switching problem. Moreover, the risk of overtrading has been taken into consideration by allowing each order to have a different size, and the market maker can make use of market orders, which are treated as impulse control, to get …


Spatial Analysis Of Passenger Vehicle Use And Ownership And Its Impact On The Sustainability Of Highway Infrastructure Funding, Matthew Volovski Apr 2015

Open Access Dissertations

Across the United States, the sustainability of highway funding is at risk due to increasing need and uncertainty in the factors that drive revenue. Past studies on highway funding sustainability have identified shifts in sociodemographic and economic characteristics as the root cause of changing highway revenue. Unfortunately, from the revenue perspective (the focus of this dissertation), the ability of previous research to account for these factors has been rather limited in two ways: first, the inability to accurately assess current regional vehicle use (a typical prerequisite for statistical modeling of highway revenues) due to difficulties associated with …


Probabilistic Uncertainty Quantification And Experiment Design For Nonlinear Models: Applications In Systems Biology, Vu Cao Duy Thien Dinh Oct 2014

Open Access Dissertations

Despite the ever-increasing interest in understanding biology at the system level, there are several factors that hinder studies and analyses of biological systems. First, unlike systems from other applied fields whose parameters can be effectively identified, biological systems are usually unidentifiable, even in the ideal case when all possible system outputs are known with high accuracy. Second, the presence of multivariate bifurcations often leads the system to behaviors that are completely different in nature. In such cases, system outputs (as functions of parameters/inputs) are usually discontinuous or have sharp transitions across domains with different behaviors. Finally, models from systems biology …


Application Of Bayesian Networks In Consumer Service Industry, Yuan Gao Oct 2014

Open Access Theses

Gao, Yuan. M.S.I.E., Purdue University. December 2014. Application of Bayesian Networks in Consumer Service Industry. Major professor: Vincent G. Duffy. The purpose of the present study is to explore the application of Bayesian networks in the consumer service industry to model causal relationships within complex risk factor structures using aggregate data. An analysis of the Hawaii tourism market was conducted to find out how visitor characteristics affect their behavior and experience as consumers during the trips, and influence the tourism market outcomes represented by measurable factors. Two hypotheses were proposed regarding the use of aggregate data and the influence of …