Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

12,631 Full-Text Articles 19,908 Authors 6,911,751 Downloads 286 Institutions

All Articles in Statistics and Probability

Predictive Modeling Of Asynchronous Event Sequence Data, Jin Shang 2020 Louisiana State University

LSU Doctoral Dissertations

Large volumes of temporal event data, such as online check-ins and electronic records of hospital admissions, are becoming increasingly available in a wide variety of applications including healthcare analytics, smart cities, and social network analysis. These temporal events are often asynchronous and interdependent, and they exhibit self-exciting properties. For example, among a patient's diagnosis events, an elevated risk exists for a patient who has recently been at risk. Machine learning that leverages event sequence data can improve the prediction accuracy of future events and provide valuable services. For example, in e-commerce and network traffic diagnosis, the analysis of user activities can be …


Decision Tree For Predicting The Party Of Legislators, Afsana Mimi 2020 CUNY New York City College of Technology

Publications and Research

The motivation of the project is to identify the legislators who frequently voted against their party, based on their roll call votes in the Office of the Clerk, U.S. House of Representatives data sets collected in 2018 and 2019. We construct a model to predict the parties of legislators based on their votes. The method we used is the decision tree, a data mining technique. Python was used to collect raw data from the internet, SAS was used to clean the data, and all other calculations and graphical presentations were performed using the R software.


A Statistical Analysis Of The UNM FACETS Design Identity & Beliefs Survey Data, Clarissa A. Sorensen-Unruh 2020 University of New Mexico - Main Campus

Mathematics & Statistics ETDs

The NSF-funded FACETS (Formation of Accomplished Chemical Engineers for Transforming Society, NSF Award 1623105) grant aims to transform the undergraduate engineering experience in the Department of Chemical and Biological Engineering at the University of New Mexico to address attrition within engineering majors, especially among underserved populations (Brainard & Carlin, 1998). The UNM FACETS Design Identity & Beliefs survey, an assessment tool used as part of the research of the grant, generated the dataset used in this study. I performed several different statistical analyses on the dataset, including confirmatory factor analysis (CFA), principal component analysis (PCA), and cluster analysis. The …


Can Auxiliary Information Improve Rasch Estimation At Small Sample Sizes?, Derek Sauder 2020 James Madison University

Dissertations, 2020-current

The Rasch model is commonly used to calibrate multiple choice items. However, the sample sizes needed to estimate the Rasch model can be difficult to attain (e.g., consider a small testing company trying to pretest new items). With small sample sizes, auxiliary information besides the item responses may improve estimation of the item parameters. The purpose of this study was to determine if incorporating item property information (i.e., characteristics of the items related to item difficulty) in a random effects linear logistic test model (RE-LLTM) would improve estimation of item difficulty. A simulation study was conducted that varied sample size, …
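
As a rough illustration of the models named in this abstract, the sketch below computes a Rasch response probability and builds an item difficulty from item properties in the LLTM style. The property matrix row `q_row`, the effects `eta`, and both function names are hypothetical and chosen only for this example. As commonly described, the random effects variant adds an item-level residual to the property-based difficulty; that term is omitted here.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lltm_difficulty(q_row, eta):
    """Item difficulty as a weighted sum of item-property effects (LLTM form)."""
    return sum(q * e for q, e in zip(q_row, eta))


# Hypothetical item: it has property 1 but not property 2 (made-up values).
q_row = [1, 0]
eta = [0.5, -0.3]  # illustrative effects of each property on difficulty

b = lltm_difficulty(q_row, eta)
p = rasch_probability(theta=0.5, b=b)  # ability equal to difficulty
print(p)  # 0.5
```

When ability equals difficulty the probability of a correct response is one half, which is the defining midpoint of the Rasch curve.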


Statistical Analysis Of Land Cover Conversion Trends In Northwest Ohio, Chaska McGowan 2020 Bowling Green State University

Honors Projects

Agricultural land in the U.S. is abundant but not infinite. Change in cropland impacts national and local economies and the natural environment. The Black Swamp Conservancy (BSC), a non-profit land trust in Perrysburg, Ohio, is committed to preserving agricultural and natural lands in the Northwest Ohio region for future generations. This project was designed in collaboration with the BSC to illustrate the spatial distribution of land cover change within their sixteen-county service area in Northwest Ohio and to find a list of factors associated with land cover change in the region. The primary data source was the National Land Cover …


First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. LeBlanc 2020 University of Maine

Electronic Theses and Dissertations

This study examined student perceptions and experiences of an introductory Computer Science course at the University of Maine, COS 125: Introduction to Problem Solving Using Computer Programs. It also explored the pathways that students pursue after taking COS 125, depending on their success in the course and their motivation to persist. By characterizing student populations and their performance in their first semester in the Computer Science program, students can be placed into one of three categories that explain their path: a “continuer” (passed COS 125 and decided to stay in the major), a “persister” (did not pass COS 125 and …


Modeling Species Distribution And Habitat Suitability Of American Ginseng (Panax Quinquefolius) In Virginia, Jacob D. J. Peters 2020 James Madison University

Masters Theses, 2020-current

American ginseng (Panax quinquefolius) is a well-known and sought-after medicinal plant native to North America that is facing increased threat of extinction due to overharvesting, herbivory, and habitat loss. Species distribution and habitat suitability models may be valuable to landowners interested in sustainable harvest or to institutions interested in the conservation and restoration of the species. With unequal sampling efforts across a region of interest, it is likely that some locations with appropriate habitat may be misrepresented in model predictions. This study refined a state-derived species distribution model for ginseng through increased sampling effort across the Cumberland Plateau …


Splitting Up A Complex Mess: The Effectiveness Of Statistical Analysis On Delimiting Species Complexes, Sara N. Schoen 2020 James Madison University

Masters Theses, 2020-current

Recent studies have highlighted a need for more refined tools in species delimitation. This is especially true when considering diversity within species complexes, where members are morphologically similar and traditional tools have thus far failed to provide clearly defined boundaries between species. This project seeks to refine our traditional tools of species delimitation and apply new tools to the challenges created by species complexes. The focus organisms of this study are the anurans of the Limnonectes kuhlii complex. This species complex comprises more than 25 species of stream frogs from Southeast Asia. Traditionally, morphometrics (particularly linear measures) has been the …


Propensity Score Matching And Generalized Boosted Modeling In The Context Of Model Misspecification: A Simulation Study, Briana G. Craig 2020 James Madison University

Masters Theses, 2020-current

In the absence of random assignment, researchers must consider the impact of selection bias – pre-existing covariate differences between groups due to differences among those entering into treatment and those otherwise unable to participate. Propensity score matching (PSM) and generalized boosted modeling (GBM) are two quasi-experimental pre-processing methods that strive to reduce the impact of selection bias before analyzing a treatment effect. PSM and GBM both examine a treatment and comparison group and either match or weight members of those groups to create new, balanced groups. The new, balanced groups theoretically can then be used as a proxy for the …


Attack And Defense In Security Analytics, Yiyun Zhou 2020 Kennesaw State University

Doctor of Data Science and Analytics Dissertations

Security problems have drawn increasing attention due to various kinds of global threats. Security analytics is the process of using streaming data acquisition, collection, and artificial intelligence algorithms for security monitoring and threat disclosure. In this dissertation work, we utilize practical data-driven security analytics to identify potential threats and explore the robustness of machine learning models. We focus on two aspects: (1) Security Analytics: utilize machine learning and statistical analytics tools to identify and resolve real-life threats, such as cybersecurity and abnormal activities. (2) Analytic Security: explore the security issues of the machine learning …


Biomarker Development For Use In Regression Calibration, Yiwen Zhang 2020 University of Wisconsin-Milwaukee

Theses and Dissertations

It is challenging to alleviate systematic measurement error in self-reported data when studying the associations between dietary intakes and chronic disease risk. The regression calibration method has been used for this purpose when an objectively measured biomarker that satisfies a classical measurement error assumption is available. The requirements for such biomarkers are quite strong, and very few dietary intake biomarkers of this kind have been developed. Feeding studies provide opportunities to develop such potential biomarkers using regression methods with a much larger variety of dietary variables. However, the measurement error for the resulting biomarkers will be of Berkson type …


Infant Mortality In The United States: Socioeconomic Factors Predicting Infant Survival In Late Neo-Natal And Post Neo-Natal Infants From Birth Certificate Data, Mark Brunk-Grady 2020 University of Wisconsin-Milwaukee

Theses and Dissertations

According to the Centers for Disease Control and Prevention, the infant mortality rate in the United States in 2018 was 5.6 deaths per 1,000 live births. Infant mortality is defined as a child being born alive but dying before their first birthday. This study aimed to determine whether adding socioeconomic factors to traditional predictive survival models improved the predictive power in terms of survival for late neonatal and post-neonatal infants. Secondly, this study looked to develop a risk score to predict which mothers would be classified as “High” or “Low” risk for infant death.

Data were analyzed from a …


Physical Therapy Nontreatment Events With Primary Physical Therapist, Stephen Johnson 2020 University of Nevada, Las Vegas

UNLV Theses, Dissertations, Professional Papers, and Capstones

Background: Physical therapy improves prognosis, reduces stay, and is generally helpful in aiding recovery from a wide range of ailments. Nontreatment events occur for multiple reasons, and nontreatment rates are also related to the personalities of physical therapists.

Methods: We used data from a research project involving physical therapy at an acute care facility in our community. Our study focused on the retrospectively determined primary physical therapist for each patient. We used chi-squared tests to compare nontreatment rates between days of the week and disease types, and to compare the reasons for nontreatment events. Repeated-measure models were used to evaluate the effect of …
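
To make the chi-squared comparison mentioned in the methods concrete, here is a minimal sketch of the Pearson chi-squared statistic for a contingency table. The counts are invented for illustration and are not the study's data; the function name `chi_squared_statistic` is hypothetical.

```python
def chi_squared_statistic(table):
    """Return the Pearson chi-squared statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up counts: rows are two days of the week,
# columns are treated / not treated.
counts = [[40, 10], [30, 20]]
stat, dof = chi_squared_statistic(counts)
print(round(stat, 4), dof)  # 4.7619 1
```

With one degree of freedom the 5% critical value is about 3.84, so these invented counts would indicate a difference in nontreatment rates between the two days.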


Predicting The Federal Funds Rate, Danielle Herzberg 2020 University of Lynchburg

Undergraduate Theses and Capstone Projects

This thesis examines various economic indicators to select those that are the most significant in a predictive model of the Effective Federal Funds Rate. Three different statistical models were built to show how monetary policy changed over time. These three models frame the most recent economic downturns in the United States: the tech bubble, the housing bubble, and the Great Recession. Many iterations of statistical regressions were conducted in order to arrive at the final three models, which highlight the variables with the highest levels of significance. It is important to note that the economic data have high levels of autocorrelation, and that these …


Motivational Predictors Of Academic Risk-Taking, Danette Dee Barber 2020 University of Nevada, Las Vegas

UNLV Theses, Dissertations, Professional Papers, and Capstones

Students benefit when they are willing to engage in optimal challenges (Clifford, 1991). Engagement in challenges, however, comes with academic risks, as failure may be a result. This study investigated motivational factors, including expectancy, subjective task value, mastery goal orientation, and performance avoidance goal orientation as predictors of achievement-related outcomes, including course grade and academic risk-taking. Data were collected from 317 university students enrolled in education classes. Students were given a reading passage and asked to choose questions to answer based on the passage. Students who chose harder questions were categorized as taking more risk. Students also answered questions about …


Smoothed Quantiles For Claim Frequency Models, With Applications To Risk Measurement, Ponmalar Suruliraj Ratnam 2020 University of Wisconsin-Milwaukee

Theses and Dissertations

Statistical models for the claim severity and claim frequency variables are routinely constructed and utilized by actuaries. Typical applications of such models include identification of optimal deductibles for selected loss elimination ratios, pricing of contract layers, determining credibility factors, risk and economic capital measures, and evaluation of effects of inflation, market trends and other quantities arising in insurance. While the actuarial literature on the severity models is extensive and rapidly growing, that for the claim frequency models lags behind. One of the reasons for such a gap is that various actuarial metrics do not possess “nice” statistical properties for the …


Age At Migration And The Risk Of Psychotic Disorders: A Systematic Review And Meta-Analysis., Kelly K. Anderson, Jordan Edwards 2020 Western University

Epidemiology and Biostatistics Publications

OBJECTIVE: To conduct a systematic review and meta-analysis of the existing evidence on the association between age at migration and the risk of psychotic disorders.

METHODS: Observational studies were eligible for inclusion if they presented data on the association between age at migration and the risk of psychotic disorders among first-generation migrant groups. We used two random effects meta-analyses to pool effect estimates for each stratum of age at migration relative to (i) a native-born reference category and (ii) the youngest age stratum (0 to 2 years).

RESULTS: Ten studies met inclusion criteria, and five were included in the meta-analysis. …
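
For readers unfamiliar with random effects pooling, the sketch below shows one common estimator, DerSimonian-Laird. The abstract does not say which estimator the authors used, so this choice is an assumption, and the effect sizes and variances are invented for illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with DerSimonian-Laird random effects weights.

    Returns the pooled effect, its standard error, and the estimated
    between-study variance tau^2.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se = math.sqrt(1.0 / sum_re)
    return pooled, se, tau2


# Made-up log effect estimates and their variances for three studies.
pooled, se, tau2 = dersimonian_laird([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(pooled, 3), round(se, 4), round(tau2, 3))  # 0.5 0.1732 0.05
```

The between-study variance widens the standard error relative to a fixed-effect analysis, which is the main practical difference between the two approaches.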


Using Saddlepoint Approximations And Likelihood-Based Methods To Conduct Statistical Inference For The Mean Of The Beta Distribution, Bryn Brakefield 2020 Stephen F. Austin State University

Electronic Theses and Dissertations

Statistical inference for the mean of the beta distribution is increasingly common in various fields of academic research, such as immunology, where proportions of rare cell population subsets are analyzed. We address this inference problem by applying likelihood-based methods to hypothesis testing, along with a relatively new statistical method called saddlepoint approximation. Through simulation work, we will compare the performance of these statistical procedures and provide both the statistical and scientific communities with recommendations on best practices.


Fitting Of Lotka-Volterra Model For Coupled Population Growth Data Through Least-Squares Estimation Of Parameters, Jessica Ann Harter 2020 University of Wisconsin-Milwaukee

Theses and Dissertations

The populations of two types of bacteria found in the Gulf Coast of Florida, V. chagasii and V. harveyi, can be described by the Lotka-Volterra competition model. Using data gathered in experiments conducted by Bury and Pickett (2015), we take a different approach to find parameter estimates using numerical methods in R. In particular, we find a numerical solution to the coupled set of ODEs and minimize the sum of squared errors in order to obtain the optimal parameter estimates that will fit the data best. In order to get a sense of accuracy of these parameter estimates, we use bootstrap …
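
The approach described in this abstract (solve the coupled ODEs numerically, then minimize the sum of squared errors) can be sketched as follows. This is a Python illustration, not the author's R code. It assumes the standard two-species competition form, uses a hand-written Runge-Kutta integrator, fits synthetic data generated inside the example, and searches over a single parameter on a coarse grid where a real fit would optimize all parameters with a numerical optimizer. All names and parameter values are hypothetical.

```python
def lv_competition(state, params):
    """Right-hand side of the two-species Lotka-Volterra competition model."""
    n1, n2 = state
    r1, r2, cap1, cap2, a12, a21 = params
    dn1 = r1 * n1 * (1.0 - (n1 + a12 * n2) / cap1)
    dn2 = r2 * n2 * (1.0 - (n2 + a21 * n1) / cap2)
    return dn1, dn2


def rk4_trajectory(state, params, dt, steps):
    """Integrate with classical fourth-order Runge-Kutta; return all states."""
    path = [state]
    for _ in range(steps):
        s1 = lv_competition(state, params)
        s2 = lv_competition(tuple(x + 0.5 * dt * k for x, k in zip(state, s1)), params)
        s3 = lv_competition(tuple(x + 0.5 * dt * k for x, k in zip(state, s2)), params)
        s4 = lv_competition(tuple(x + dt * k for x, k in zip(state, s3)), params)
        state = tuple(
            x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, s1, s2, s3, s4)
        )
        path.append(state)
    return path


def sse(params, observed, dt):
    """Sum of squared errors between the model trajectory and the observations."""
    predicted = rk4_trajectory(observed[0], params, dt, len(observed) - 1)
    return sum(
        (p - o) ** 2
        for pred, obs in zip(predicted, observed)
        for p, o in zip(pred, obs)
    )


# Synthetic "observations" generated from made-up parameters
# (r1, r2, K1, K2, a12, a21); these are not the experimental data.
true_params = (1.0, 0.8, 100.0, 80.0, 0.5, 0.6)
observed = rk4_trajectory((5.0, 5.0), true_params, 0.1, 50)

# Coarse grid search over a12 only, holding the other parameters fixed.
candidates = [0.1 * i for i in range(11)]
best_a12 = min(
    candidates,
    key=lambda a: sse(true_params[:4] + (a, true_params[5]), observed, 0.1),
)
print(best_a12)  # 0.5
```

Because the synthetic data contain no noise, the grid search recovers the generating value exactly; with real measurements the minimum would only approximate the underlying parameter.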


An Analysis Of Dredge Efficiency For Surfclam And Ocean Quahog Commercial Dredges, Leanne Poussard 2020 The University of Southern Mississippi

Master's Theses

Between 1997 and 2011, the National Marine Fisheries Service conducted 50 depletion experiments to estimate survey gear efficiency and stock density for Atlantic surfclam (Spisula solidissima) and ocean quahog (Arctica islandica) populations using commercial hydraulic dredges. The Patch Model was formulated to estimate gear efficiency and organism density from the data. The range of efficiencies estimated is substantial, leading to uncertainty in the application of these estimates in stock assessment. Analysis of depletion experiment simulations showed that uncertainty in the estimates of gear efficiency from depletion experiments was reduced by higher numbers of dredge tows per experiment, more tow overlap …

