Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Full-Text Articles in Physical Sciences and Mathematics

"A Comparison Of Variable Selection Methods Using Bootstrap Samples From Environmental Metal Mixture Data", Paul-Yvann Djamen Jul 2020

Mathematics & Statistics ETDs

In this thesis, I studied a newly developed variable selection method, SODA, and three commonly used variable selection methods (LASSO, elastic net, and random forest) for environmental mixture data. The motivating datasets have neuro-developmental status as responses and metal measurements and demographic variables as covariates. The challenges for variable selection include (1) many measured metal concentrations are highly correlated, (2) there are many possible ways of modeling interactions among the metals, (3) the relationships between the outcomes and explanatory variables are possibly nonlinear, and (4) the signal-to-noise ratio in the real data may be low. To compare these methods ...


At The Interface Of Algebra And Statistics, Tai-Danae Bradley Jun 2020

All Dissertations, Theses, and Capstone Projects

This thesis takes inspiration from quantum physics to investigate mathematical structure that lies at the interface of algebra and statistics. The starting point is a passage from classical probability theory to quantum probability theory. The quantum version of a probability distribution is a density operator, the quantum version of marginalizing is an operation called the partial trace, and the quantum version of a marginal probability distribution is a reduced density operator. Every joint probability distribution on a finite set can be modeled as a rank one density operator. By applying the partial trace, we obtain reduced density operators whose diagonals ...


Integrated Multiple Mediation Analysis: A Robustness–Specificity Trade-Off In Causal Structure, An-Shun Tai, Sheng-Hsuan Lin May 2020

Harvard University Biostatistics Working Paper Series

Recent methodological developments in causal mediation analysis have addressed several issues regarding multiple mediators. However, these developed methods differ in their definitions of causal parameters, assumptions for identification, and interpretations of causal effects, making it unclear which method ought to be selected when investigating a given causal effect. Thus, in this study, we construct an integrated framework, which unifies all existing methodologies, as a standard for mediation analysis with multiple mediators. To clarify the relationship between existing methods, we propose four strategies for effect decomposition: two-way, partially forward, partially backward, and complete decompositions. This study reveals how the direct and ...


Waiting-Time Paradox In 1922, Naoki Masuda, Takayuki Hiraoka May 2020

Northeast Journal of Complex Systems (NEJCS)

We present an English translation and discussion of an essay that a Japanese physicist, Torahiko Terada, wrote in 1922. In the essay, he described the waiting-time paradox, also called the bus paradox, which is a known mathematical phenomenon in queuing theory, stochastic processes, and modern temporal network analysis. He also observed and analyzed data on Tokyo City trams to verify the relevance of the waiting-time paradox to busy passengers in Tokyo at the time. This essay seems to be one of the earliest documentations of the waiting-time paradox in a sufficiently scientific manner.


On Statistical Significance Of Discriminant Function Coefficients, Tolulope T. Sajobi, Gordon H. Fick, Lisa M. Lix May 2020

Journal of Modern Applied Statistical Methods

Discriminant function coefficients are useful for describing group differences and identifying variables that distinguish between groups. Test procedures based on asymptotic approximations, empirical distributions, and exact distributions were compared for testing hypotheses about discriminant function coefficients. These tests are useful for assessing variable importance in multivariate group designs.


Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen May 2020

Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses under three different types of incomplete data problems: missing outcomes, missing outcomes and missing predictors, and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second one is conducted for the missing at random (MAR) assumption in likelihood inference; the third one is conducted for one novel assumption, the "sixth assumption" proposed for the robustness of the instrumental variable estimand in causal inference.


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible ...


Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma May 2020

Department of Education Policy and Leadership Theses and Dissertations

The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains ...


Statistical Inference Of Adaptation At Multiple Genomic Scales Using Supervised Classification And A Hidden Markov Model, Lauren A. Sugden May 2020

Biology and Medicine Through Mathematics Conference

No abstract provided.


Support Vector Machine-Based Modified Sp Statistic For Subset Selection With Non-Normal Error Terms, Shivaji Shripati Desai, D N. Kashid May 2020

Journal of Modern Applied Statistical Methods

Support vector machine (SVM) is used for estimation of regression parameters to modify the sum of cross products (Sp). It works well for some nonnormal error distributions. The performance of existing robust methods and the modified Sp is evaluated through simulated and real data. The results show the performance of the modified Sp is good.


Recurrence Relations For Marginal And Joint Moment Generating Functions Of Topp-Leone Generated Exponential Distribution Based On Record Values And Its Characterization, Zaki Anwar, Neetu Gupta, Mohd Akram Raza Khan, Qazi Azhad Jamal May 2020

Journal of Modern Applied Statistical Methods

The exact expressions and some recurrence relations are derived for marginal and joint moment generating functions of kth lower record values from Topp-Leone Generated (TLG) Exponential distribution. This distribution is characterized by using the recurrence relation of the marginal moment generating function of kth lower record values.


An Improved Two Independent-Samples Randomization Test For Single-Case AB-Type Intervention Designs: A 20-Year Journey, Joel R. Levin, John M. Ferron, Boris S. Gafurov May 2020

Journal of Modern Applied Statistical Methods

Detailed is a 20-year arduous journey to develop a statistically viable two-phase (AB) single-case two independent-samples randomization test procedure. The test is designed to compare the effectiveness of two different interventions that are randomly assigned to cases. In contrast to the unsatisfactory simulation results produced by an earlier proposed randomization test, the present test consistently exhibited acceptable Type I error control under various design and effect-type configurations, while at the same time possessing adequate power to detect moderately sized intervention-difference effects. Selected issues, applications, and a multiple-baseline extension of the two-sample test are discussed.


A New Exponential Approach For Reducing The Mean Squared Errors Of The Estimators Of Population Mean Using Conventional And Non-Conventional Location Parameters, Housila P. Singh, Anita Yadav May 2020

Journal of Modern Applied Statistical Methods

Classes of ratio-type estimators t (say) and ratio-type exponential estimators te (say) of the population mean are proposed, and their biases and mean squared errors under large sample approximation are presented. It is shown that the class of ratio-type exponential estimators te provides estimators more efficient than the ratio-type estimators.


First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. Leblanc May 2020

Electronic Theses and Dissertations

This study examined student perceptions and experiences of an introductory Computer Science course at the University of Maine, COS 125: Introduction to Problem Solving Using Computer Programs. It also explored the pathways that students pursue after taking COS 125, depending on their success in the course, and their motivation to persist. By characterizing student populations and their performance in their first semester in the Computer Science program, students can be placed into one of three categories that explain their path: a “continuer” (passed COS 125 and decided to stay in the major), a “persister” (did not pass COS 125 and ...


Predicting Disease Progression Using Deep Recurrent Neural Networks And Longitudinal Electronic Health Record Data, Seunghwan Kim May 2020

Engineering and Applied Science Theses & Dissertations

Electronic Health Records (EHR) are widely adopted and used throughout healthcare systems and are able to collect and store longitudinal information data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. Temporality in EHR is innately present given the nature of these data, however, and traditional classification models are limited in this context by the cross-sectional nature of training and prediction processes. Finding temporal patterns in EHR is ...


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Dissertations and Theses in Statistics

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The ...


Ragweed And Sagebrush Pollen Can Distinguish Between Vegetation Types At Broad Spatial Scales, Hannah M. Carroll, Alan D. Wanamaker, Lynn G. Clark, Brian J. Wilsey May 2020

Ecology, Evolution and Organismal Biology Publications

Patterns of vegetation distribution at regional to subcontinental scales can inform understanding of climate. Delineating ecoregion boundaries over geologic time is complicated by the difficulty of distinguishing between prairie types at broad spatial scales using the pollen record. Pollen ratios are sometimes employed to distinguish between vegetation types, although their applicability is often limited to a geographic range. The Neotoma Paleoecology Database offers an unparalleled opportunity to synthesize a large number of pollen datasets. Ambrosia (ragweed) is a genus of mesic‐adapted species sensitive to summer moisture. Artemisia (sagebrush, wormwood, mugwort) is a genus of dry‐mesic‐adapted species resilient ...


The Effects Of Zoledronate And Sleep Deprivation On The Distal Femur Trabecular Thickness Of Ovariectomized Rats: Application Of Different Statistical Methods, Erin Nolte May 2020

Student Scholar Symposium Abstracts and Posters

Osteoporosis is a disease that causes the degradation of bone, leading to an increased risk of fracture. 1 in 3 women over the age of 50 will be affected by Osteoporosis. This study aims to understand how bone is affected by sleep deprivation in estrogen-deficient rats, and how Zoledronate might negate the inimical effects of sleep deprivation on bone. As bone mineral density (BMD) is a crude evaluation of the architectural changes seen in Osteoporosis, trabecular thickness may serve as a better single evaluation of bone health. 31 Wistar female rats were ovariectomized and separated into 4 random groups. The ...


Gait Characterization Using Computer Vision Video Analysis, Martha T. Gizaw May 2020

Undergraduate Honors Theses

The World Health Organization reports that falls are the second-leading cause of accidental death among senior adults around the world. Currently, a research team at William & Mary’s Department of Kinesiology & Health Sciences attempts to recognize and correct aging-related factors that can result in falling. To meet this goal, the members of that team videotape walking tests to examine individual gait parameters of older subjects. However, they undergo a slow, laborious process of analyzing video frame by video frame to obtain such parameters. This project uses computer vision software to reconstruct walking models from residents of an independent living retirement ...


'Lmshapemaker': Utilizing The 'Rmapshaper' R Package To Modify Shapefiles For Use In Linked Micromap Plots, Braden D. Probst May 2020

All Graduate Theses and Dissertations

In order to effectively create map-based visualizations, some map modifications need to be conducted to ensure the map is readable and interpretable. There are several issues that need to be addressed to achieve this. The boundaries of a country may be overly complex which is particularly true with coastal areas of countries. Regions may be small and not seen in the final plot, as is the case with many capital cities in the world’s countries such as Washington D.C. and the Federal District of Mexico City. In other countries, regions may geographically lie far away from the rest ...


Modeling Movement: A Machine-Learning Approach To Track Migration Routes After Displacement, Ethan Harrison May 2020

Undergraduate Honors Theses

Over the past decade, the number of individuals internally displaced by conflict (IDPs) has reached unprecedented levels. Humanitarian actors and first-responders face persistent information gaps in meeting the needs of these populations. Specifically, they face challenges in understanding where and how IDPs move after they are displaced, which is necessary to locate them in conflict-affected situations and provide them with life-saving assistance. In this paper, I propose a framework, using established machine-learning methods, to forecast the migration routes of these displaced populations (Chapter 1). In a case study of displacement in Yemen, my models predict 80% of IDPs' migration routes ...


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020

All Graduate Theses and Dissertations

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting onto unseen data. Since finance and ...


Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever Apr 2020

HCA Healthcare Journal of Medicine

This article covers many statistical ideas essential to research statistical analysis. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify necessities for how the analysis will be interpreted. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored for both logistic and linear regression. Finally, the most common statistics produced by these methods are explored.


RDC Data Alternatives: Conducting Research During COVID-19, Kristi Thompson, Elizabeth Hill Apr 2020

Western Libraries Presentations

Recent physical distancing protocols pertaining to the COVID-19 pandemic have meant that RDC researchers need to find alternative ways of carrying out their research. The Real Time Remote Access (RTRA) program offers one alternative way to access confidential Statistics Canada data. Other options include using the Statistics Canada public use files and analyzing data from other sources.

The presenters, data librarians from Western Libraries, will discuss the differences between the data that can be accessed through the RTRA and the RDC. RTRA data is a very useful option for some types of questions but also has some important limitations. We will ...


A Simulation Study On Increasing Capture Periods In Bayesian Closed Population Capture-Recapture Models With Heterogeneity, Ross M. Gosky, Joel Sanqui Apr 2020

Journal of Modern Applied Statistical Methods

Capture-Recapture models are useful in estimating unknown population sizes. A common modeling challenge for closed population models involves modeling unequal animal catchability in each capture period, referred to as animal heterogeneity. Inference about population size N is dependent on the assumed distribution of animal capture probabilities in the population, and different models can fit a data set equally well but provide contradictory inferences about N. Three common Bayesian Capture-Recapture heterogeneity models are studied with simulated data to assess the prevalence of contradictory inferences in different population sizes with relatively low capture probabilities, specifically at different numbers of capture ...


Logistic Growth Modeling With Markov Chain Monte Carlo Estimation, Jaehwa Choi, Jinsong Chen, Jeffrey R. Harring Apr 2020

Journal of Modern Applied Statistical Methods

A new growth modeling approach is proposed that can fit an inherently nonlinear (i.e., logistic) function without constraint or reparameterization. A simulation study is employed to investigate the feasibility and performance of a Markov chain Monte Carlo method within a Bayesian estimation framework to estimate a fully random version of a logistic growth curve model under manipulated conditions such as the number and timing of measurement occasions and sample sizes.


Forecasting San Francisco Bay Area Rapid Transit (BART) Ridership, Swee K. Chew, Alec Lepe, Aaron Tomkins, Peter Scheirer Apr 2020

SMU Data Science Review

In this paper, we present a forecasting analysis of the San Francisco Bay Area Rapid Transit (BART) ridership data utilizing a number of different time series methods. BART is a major public transportation system in the Bay Area and it relies heavily on its riders' fares; having models that generate accurate ridership numbers better enables the agency to project revenue and help manage future expenses. For our time series modeling, we utilized autoregressive integrated moving average (ARIMA), deep neural networks (DNN), state space models, and long short-term memory (LSTM) to predict monthly ridership. As there is such a wide range ...


484— Modeling Social Distancing Methods And Their Effectiveness In Combating The Spread Of Ebola, Rachel Fair Apr 2020

GREAT Day

Ebola Virus Disease (EVD) is a rare but severe disease that is transmitted among humans through direct contact with, and close proximity to, infected bodily fluids. From 2014-16, West Africa experienced the largest Ebola outbreak ever recorded, infecting over 28,000 people, and killing over 11,000. Although the symptoms of EVD are treatable, the disease can be extremely deadly, with an average of 50% of EVD cases resulting in fatality. In areas where healthcare is scarce and vaccinations are not readily available, the practices of social distancing and self-quarantining have been shown to be highly effective in combating the spread of ...


483— Effectiveness Of MMR Vaccination In Orthodox Jewish Neighborhoods, Meenu Mundackal Apr 2020

GREAT Day

Measles is a highly contagious disease, where large outbreaks arise by direct contact between susceptible (unvaccinated) and infectious individuals. Many Orthodox Jewish neighborhoods were affected by measles from 2018-2019. To quantify the vaccination effort on this susceptible population, a retrospective analysis was used to study the NYC and Rockland County populations using a differential equations model. A subsequent model, known as a realistically-structured network model, studied only the NYC population, in relation to typical household size. Vaccination strategies were applied to three cohorts: unvaccinated family members, members with 1 prior MMR dose, and members with 2 prior MMR doses. The ...


465— Modeling Vaccine Efficacy For Tuberculosis In A Prison Population, Kaitlyn Mundackal Apr 2020

GREAT Day

Tuberculosis is a highly contagious disease and is particularly problematic in confined communities such as prisons. I simulated how Tuberculosis moves through a prison population and tested how much vaccination effort is needed to control its spread. To explore this, I tested adding ever-increasing numbers of randomly placed edges in a network and determined the size of the largest component. Afterwards, I removed edges in the model using two different methods, one removing edges at random and the other starting with the prisoners that had the most connections, to simulate the effect of vaccination. My results show ...