Open Access. Powered by Scholars. Published by Universities.®
- Discipline
- Statistical Models (15)
- Computer Sciences (13)
- Statistical Methodology (12)
- Multivariate Analysis (9)
- Engineering (8)
- Mathematics (8)
- Artificial Intelligence and Robotics (7)
- Life Sciences (7)
- Probability (7)
- Social and Behavioral Sciences (7)
- Applied Mathematics (6)
- Data Science (5)
- Biostatistics (4)
- Business (4)
- Education (4)
- Environmental Sciences (4)
- Longitudinal Data Analysis and Time Series (4)
- Other Computer Sciences (4)
- Statistical Theory (4)
- Animal Sciences (3)
- Bioinformatics (3)
- Categorical Data Analysis (3)
- Ecology and Evolutionary Biology (3)
- Educational Assessment, Evaluation, and Research (3)
- Forest Sciences (3)
- Medicine and Health Sciences (3)
- Zoology (3)
- Institution
- Keyword
- Pure sciences (7)
- Applied sciences (5)
- Machine learning (4)
- Algorithms (2)
- Bayesian (2)
- Climate change (2)
- Deep learning (2)
- High-performance computing (2)
- Image segmentation (2)
- Machine Learning (2)
- Model selection (2)
- Optimization (2)
- Spatial ecology (2)
- Statistics (2)
- Time Series (2)
- Accelerated lifetime (1)
- Active contours without edges (1)
- Affymetrix microarray images (1)
- Akaike Information Criterion (AIC) (1)
- Application failures (1)
- Artificial Intelligence (1)
- Artificial intelligence (1)
- Assessment (1)
- Association measure (1)
- Asymmetrical (1)
- Autometrics (1)
- Bayesian Regression (1)
- Bias correction (1)
- Bicluster Analysis (1)
- Big Data (1)
Articles 1 - 30 of 47
Full-Text Articles in Applied Statistics
Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako
Doctoral Dissertations
This dissertation is in the field of Nonparametric Derivative Estimation using Penalized Splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. in their 2000 paper showed that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …
Better Understanding Genomic Architecture With The Use Of Applied Statistics And Explainable Artificial Intelligence, Jonathon C. Romero
Doctoral Dissertations
With the continuous improvements in biological data collection, new techniques are needed to better understand the complex relationships in genomic and other biological data sets. Explainable Artificial Intelligence (X-AI) techniques like Iterative Random Forest (iRF) excel at finding interactions within data, such as genomic epistasis. Here, the introduction of new methods to mine for these complex interactions is shown in a variety of scenarios. The application of iRF as a method for Genomic Wide Epistasis Studies shows that the method is robust in finding interacting sets of features in synthetic data, without requiring the exponentially increasing computation time of many …
Sparse Model Selection Using Information Complexity, Yaojin Sun
Doctoral Dissertations
This dissertation studies and uses the application of information complexity to statistical model selection through three different projects. Specifically, we design statistical models that incorporate sparsity features to make the models more explanatory and computationally efficient.
In the first project, we propose a Sparse Bridge Regression model for variable selection when the number of variables is much greater than the number of observations if model misspecification occurs. The model is demonstrated to have excellent explanatory power in high-dimensional data analysis through numerical simulations and real-world data analysis.
The second project proposes a novel hybrid modeling method that utilizes a mixture …
Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan
Doctoral Dissertations
Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …
Using Generalizability And Rasch Measurement Theory To Ensure Rigorous Measurement In An International Development Education Evaluation, Louise Bahry
Doctoral Dissertations
Between the United States and Great Britain, over 30 billion USD was spent in 2018 on international aid, over a billion of which is dedicated to education programs alone. Recently, there has been increased attention on the rigorous evaluation of aid-funded programs, moving beyond counting outputs to the measurement of educational impact. The current study uses two methodological approaches (Generalizability (Brennan, 1992, 2001) and Rasch Measurement Theory (Andrich, 1978; Rasch, 1980; Wright & Masters, 1982)) to analyze data from math and literacy assessments, and self-report surveys used in an international evaluation of an educational initiative in the Democratic Republic of …
Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang
Doctoral Dissertations
In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …
Motor Control-Based Assessment Of Therapy Effects In Individuals Post-Stroke: Implications For Prediction Of Response And Subject-Specific Modifications, Ashley Rice
Doctoral Dissertations
Producing a coordinated motion such as walking is, at its root, the result of healthy communication pathways between the central nervous system and the musculoskeletal system. The central nervous system produces an electrical signal responsible for the excitation of a muscle, and the musculoskeletal system contains the necessary equipment for producing a movement-driving force to achieve a desired motion. Motor control refers to the ability an individual has to produce a desired motion, and the complexity of motor control is a mathematical concept stemming from how the electrical signals from the central nervous system translate to muscle activations. Exercising a …
Geometric Representation Learning, Luke Vilnis
Doctoral Dissertations
Vector embedding models are a cornerstone of modern machine learning methods for knowledge representation and reasoning. These methods aim to turn semantic questions into geometric questions by learning representations of concepts and other domain objects in a lower-dimensional vector space. In that spirit, this work advocates for density- and region-based representation learning. Embedding domain elements as geometric objects beyond a single point enables us to naturally represent breadth and polysemy, make asymmetric comparisons, answer complex queries, and provides a strong inductive bias when labeled data is scarce. We present a model for word representation using Gaussian densities, enabling asymmetric entailment …
Root Stage Distributions And Their Importance In Plant-Soil Feedback Models, Tyler Poppenwimer
Doctoral Dissertations
Roots are fundamental to plant-soil feedbacks (PSFs), being a key mediator of these feedbacks by interacting with and affecting the soil environment and soil microbial communities. However, most PSF models aggregate roots into a homogeneous component or only implicitly simulate roots via functions. Roots are not homogeneous and root traits (nutrient and water uptake, turnover rate, respiration rate, mycorrhizal colonization, etc.) vary with age, branch order, and diameter. Trait differences among a plant’s roots lead to variation in root function and roots can be disaggregated according to their function. The impact on plant growth and resource cycling of changes in the distribution …
Bayesian Topological Machine Learning, Christopher A. Oballe
Doctoral Dissertations
Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …
Interacting Effects Of Climate And Biotic Factors On Mesocarnivore Distribution And Snowshoe Hare Demography Along The Boreal-Temperate Ecotone, Alexej P. Siren
Doctoral Dissertations
The motivation of my dissertation research was to understand the influence of climate and biotic factors on range limits with a focus on winter-adapted species, including the Canada lynx (Lynx canadensis), American marten (Martes americana), and snowshoe hare (Lepus americanus). I investigated range dynamics along the boreal-temperate ecotone of the northeastern US. Through an integrative literature review, I developed a theoretical framework building from existing thinking on range limits and ecological theory. I used this theory for my second chapter to evaluate direct and indirect causes of carnivore range limits in the northeastern US, …
Latent Class Models For At-Risk Populations, Shuaimin Kang
Doctoral Dissertations
Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City: There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling, and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …
Allocative Poisson Factorization For Computational Social Science, Aaron Schein
Doctoral Dissertations
Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
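As a rough illustration of the Poisson factorization assumption described in this abstract, the sketch below draws a count matrix whose rates come from shared factor matrices. It is not the thesis's model: the function name `sample_poisson_factorization`, the gamma-distributed factors, and the bilinear rate $\mu_{ij} = \sum_k \theta_{ik} \phi_{jk}$ are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def sample_poisson_factorization(n_rows, n_cols, n_factors, seed=0):
    """Draw counts y[i, j] ~ Pois(mu[i, j]) with mu = theta @ phi.T.

    Hypothetical helper for illustration; the gamma factors and the
    bilinear rate are assumptions, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    # Nonnegative row and column factors (the shared model parameters).
    theta = rng.gamma(shape=1.0, scale=1.0, size=(n_rows, n_factors))
    phi = rng.gamma(shape=1.0, scale=1.0, size=(n_cols, n_factors))
    # Each rate is a function of the shared parameters.
    mu = theta @ phi.T
    # Each observed count is an independent Poisson draw at its own rate.
    y = rng.poisson(mu)
    return y, mu


counts, rates = sample_poisson_factorization(n_rows=6, n_cols=5, n_factors=2)
print(counts)
```

Nonnegative factors keep every rate valid for a Poisson draw, which is one common reason gamma priors are paired with this kind of model.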
Population Viability And Connectivity Of The Federally Threatened Eastern Indigo Snake In Central Peninsular Florida, Javan Bauder
Doctoral Dissertations
Understanding the factors influencing the likelihood of persistence of real-world populations requires not only an accurate understanding of the traits and behaviors of individuals within those populations (e.g., movement, habitat selection, survival, fecundity, dispersal) but also an understanding of how those traits and behaviors are influenced by landscape features. The federally threatened eastern indigo snake (EIS, Drymarchon couperi) has declined throughout its range primarily due to anthropogenically-induced habitat loss and fragmentation, making spatially-explicit assessments of population viability and connectivity essential for understanding its current status and directing future conservation efforts. The primary goal of my dissertation was to understand how …
Essays In Financial Economics: Announcement Effects In Fixed Income Markets, James J. Forest
Doctoral Dissertations
Abstract. Essays in Financial Economics: Announcement Effects in Fixed Income Markets. PhD in Finance, May 2018. James J. Forest: B.A., Framingham State University; M.S., Northeastern University; Ph.D., University of Massachusetts – Amherst. Directed by: Professor Hossein B. Kazemi. This dissertation demonstrates the use of empirical techniques for dealing with modeling issues that arise when analyzing announcement effects in fixed income markets. It describes empirical challenges in achieving unbiased and efficient parameter estimates and shows the importance of modeling a wide range of macroeconomic announcement effects to avoid omitted variable bias. Employing techniques common in macroeconomics, financial market researchers are better …
Deep Energy-Based Models For Structured Prediction, David Belanger
Doctoral Dissertations
We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function over candidate outputs and predictions are produced by gradient-based energy minimization. This deep energy captures dependencies between labels that would lead to intractable graphical models, and allows us to automatically discover discriminative features of the structured output. Furthermore, practitioners can explore a wide variety of energy function architectures without having to hand-design prediction and learning methods for each model. This is because all of our prediction and learning methods interact with the energy …
Spatiotemporal Subspace Feature Tracking By Mining Discriminatory Characteristics, Richard D. Appiah
Doctoral Dissertations
Recent advancements in data collection technologies have made it possible to collect heterogeneous data at complex levels of abstraction, and at an alarming pace and volume. Data mining and, most recently, data science seek to discover hidden patterns and insights from these data by employing a variety of knowledge discovery techniques. At the core of these techniques is the selection and use of features, variables or properties upon which the data were acquired to facilitate effective data modeling. Selecting relevant features in data modeling is critical to ensure overall model accuracy and optimal predictive performance of future effects. The …
Statistical Methods On Risk Management Of Extreme Events, Zijing Zhang
Doctoral Dissertations
The goal of the dissertation is the investigation of financial risk analysis methodologies, using the schemes for extreme value modeling as well as techniques from copula modeling. Extreme value theory is concerned with probabilistic and statistical questions related to unusual behavior or rare events. The subject has a rich mathematical theory and also a long tradition of applications in a variety of areas. We are interested in its application in risk management, with a focus on estimating and forecasting the Value-at-Risk of financial time series data. Extremal data are inherently scarce, thus making inference challenging. In order to obtain …
Inference In Networking Systems With Designed Measurements, Chang Liu
Doctoral Dissertations
Networking systems, consisting of network infrastructures and end-hosts, have been essential in supporting our daily communication, delivering huge amounts of content and large numbers of services, and providing large-scale distributed computing. To monitor and optimize the performance of such networking systems, or to provide flexible functionalities for the applications running on top of them, it is important to know the internal metrics of the networking systems such as link loss rates or path delays. The internal metrics are often not directly available due to the scale and complexity of the networking systems. This motivates the techniques of inference …
A Study Of Mathematics Achievement, Placement, And Graduation Of Engineering Students, Sara Hahler Blazek
Doctoral Dissertations
The purpose of this study was to determine how background knowledge impacts freshmen engineering students' success at Louisiana Tech University in terms of grades in two different freshman classes and graduation. To determine what factors impact students, three different studies were implemented. The first study used linear regression to analyze which demographic and academic variables significantly impacted freshman math and engineering courses. Using regression discontinuity, the second study determined if the university's placement requirement for Pre-Calculus was appropriate. The final study analyzed factors that impact graduation for engineering students as well as other disciplines to determine which significant variables were …
Development And Validation Of The Statistics Assessment Of Graduate Students, Dammika Lakmal Walpitage
Doctoral Dissertations
This study developed the Statistics Assessment of Graduate Students (SAGS) instrument, and established its preliminary item characteristics, reliability, and validity evidence. Although a limited number of assessments are available for measuring different aspects of statistical cognition, these previously available assessments have numerous limitations. The SAGS instrument was developed using a Rasch modeling approach to create a new measure of statistical research methodology knowledge of graduate students in education and other behavioral and social sciences. Thirty-five multiple-choice questions were written with stems representing applied research situations and response options distinguishing between appropriate use of various statistical tests or procedures. A focus …
Variable Selection Via Penalized Regression And The Genetic Algorithm Using Information Complexity, With Applications For High-Dimensional -Omics Data, Tyler J. Massaro
Doctoral Dissertations
This dissertation is a collection of examples, algorithms, and techniques for researchers interested in selecting influential variables from statistical regression models. Chapters 1, 2, and 3 provide background information that will be used throughout the remaining chapters, on topics including but not limited to information complexity, model selection, covariance estimation, stepwise variable selection, penalized regression, and especially the genetic algorithm (GA) approach to variable subsetting.
In chapter 4, we fully develop the framework for performing GA subset selection in logistic regression models. We present advantages of this approach against stepwise and elastic net regularized regression in selecting variables from a …
Advanced Sequential Monte Carlo Methods And Their Applications To Sparse Sensor Network For Detection And Estimation, Kai Kang
Doctoral Dissertations
General state space models present a flexible framework for modeling dynamic systems and therefore have vast applications in many disciplines such as engineering, economics, biology, etc. However, optimal estimation problems of non-linear non-Gaussian state space models are analytically intractable in general. Sequential Monte Carlo (SMC) methods have become a very popular class of simulation-based methods for the solution of optimal estimation problems. The advantage of SMC methods in comparison with classical filtering methods such as the Kalman Filter and Extended Kalman Filter is that they are able to handle non-linear non-Gaussian scenarios without relying on any local linearization techniques. In this …
Wind Power Capacity Value Metrics And Variability: A Study In New England, Frederick W. Letson
Doctoral Dissertations
Capacity value is the contribution of a power plant to the ability of the power system to meet high demand. As wind power penetration in New England, and worldwide, increases so does the importance of identifying the capacity contribution made by wind power plants. It is critical to accurately characterize the capacity value of these wind power plants and the variability of the capacity value over the long term. This is important in order to avoid the cost of keeping extra power plants operational while still being able to cover the demand for power reliably. This capacity value calculation is …
Variable Selection In Single Index Varying Coefficient Models With Lasso, Peng Wang
Doctoral Dissertations
The single index varying coefficient model is a very attractive statistical model due to its ability to reduce dimensions and its ease of interpretation. There are many theoretical studies and practical applications with it, but typically without features of variable selection, and no public software is available for solving it. Here we propose a new algorithm to fit the single index varying coefficient model, and to carry out variable selection in the index part with LASSO. The core idea is a two-step scheme which alternates between estimating coefficient functions and selecting-and-estimating the single index. Both in simulation and in application to a Geoscience dataset, we …
Evaluating The Effects Of Standardized Patient Care Pathways On Clinical Outcomes, Anna V. Romanova
Doctoral Dissertations
The main focus of this study is to create a standardized approach to evaluating the impact of the patient care pathways across all major disease categories and key outcome measures in a hospital setting when randomized clinical trials are not feasible. Toward this goal I identify statistical methods, control factors, and adjustments that can correct for potential confounding in observational studies. I investigate the efficiency of existing bias correction methods under varying conditions of imbalanced samples through a Monte Carlo simulation. The simulation results are then utilized in a case study for one of the largest primary diagnosis areas, chronic …
Sensitivity Of Mixed Models To Computational Algorithms Of Time Series Data, Gunaime Nevine
Doctoral Dissertations
Statistical analysis is influenced by the implementation of the algorithms used to execute the computations associated with various statistical techniques. Over many years, very important criteria for model comparison have been studied and examined, and two algorithms on a single dataset have been performed numerous times. The goal of this research is not comparing two or more models on one dataset, but comparing models with numerical algorithms that have been used to solve them on the same dataset.
In this research, different models have been broadly applied in modeling and their contrasting which are affected by the numerical algorithms in different …
Incorporating Boltzmann Machine Priors For Semantic Labeling In Images And Videos, Andrew Kae
Doctoral Dissertations
Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field …
Predicting High-Stakes Tests Of Math Achievement Using A Group-Administered RTI Instrument: Validating Skills Measured By The Monitoring Instructional Responsiveness: Math, Jeremy Thomas Coles
Doctoral Dissertations
Three universal screeners and nine progress monitoring probes from the Monitoring Instructional Responsiveness: Math (MIR:M), a silent, group-administered math assessment designed for implementation with an RTI Model, were administered to 223 fifth-grade students. The growth parameters of the overall MIR:M composite and two global composites (math calculation and math reasoning) identified significant variation in student growth, within significant linear and quadratic trajectories. However, there were significant differences in the nature of the growth trajectories that have applied educational implications. In addition, growth parameters across the three composites provided significant predictive potential when using the Tennessee Comprehensive Assessment Program (TCAP) Achievement …
Impacts Of Climate Change On The Evolution Of The Electrical Grid, Melissa Ree Allen
Doctoral Dissertations
Maintaining interdependent infrastructures exposed to a changing climate requires understanding 1) the local impact on power assets; 2) how the infrastructure will evolve as the demand for infrastructure changes location and volume; and 3) what vulnerabilities are introduced by these changing infrastructure topologies. This dissertation attempts to develop a methodology that will a) downscale the climate direct effect on the infrastructure; b) allow population to redistribute in response to extreme events that will increase under climate impacts; and c) project new distributions of electricity demand in the mid-21st century.
The research was structured in three parts. The first …