Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

10929 Full-Text Articles 13991 Authors 1580531 Downloads 162 Institutions

All Articles in Statistics and Probability

10929 full-text articles. Page 1 of 297.

Multilevel Models For Longitudinal Data, Aastha Khatiwada 2016 East Tennessee State University

Electronic Theses and Dissertations

Longitudinal data arise when individuals are measured several times during an observation period and thus the data for each individual are not independent. There are several ways of analyzing longitudinal data when different treatments are compared. Multilevel models are used to analyze data that are clustered in some way. In this work, multilevel models are used to analyze longitudinal data from a case study. Results from other more commonly used methods are compared to multilevel models. Also, the output of two software packages, SAS and R, is compared. Finally a method consisting of fitting individual models for each ...


The Influence Of The Electric Supply Industry On Economic Growth In Less Developed Countries, Edward Richard Bee 2016 University of Southern Mississippi

Dissertations

This study measures the impact that electrical outages have on manufacturing production in 135 less developed countries using stochastic frontier analysis and data from World Bank’s Investment Climate surveys. Outages of electricity, for firms with and without backup power sources, are the most frequently cited constraint on manufacturing growth in these surveys.

Outages are shown to reduce output below the production frontier by almost five percent in Africa and by a lower percentage in South Asia, Southeast Asia and the Middle East and North Africa. Production response to outages is quadratic in form. Outages also increase labor cost, reduce ...


Scalable Collaborative Targeted Learning For Large Scale And High-Dimensional Data, Cheng Ju, Susan Gruber, Samuel D. Lendle, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan 2016 Division of Biostatistics, University of California, Berkeley

U.C. Berkeley Division of Biostatistics Working Paper Series

The collaborative double robust targeted maximum likelihood estimator (C-TMLE) is an extension of targeted minimum loss-based estimators (TMLE) that pursues an optimal strategy for estimation of the nuisance parameter. The original implementation of the C-TMLE algorithm uses a greedy forward stepwise selection procedure to construct a nested sequence of candidate nuisance parameter estimators. Cross-validation is then used to select the candidate that minimizes bias in the estimate of the target parameter, rather than basing selection on the fit of the nuisance parameter model. C-TMLE has exhibited superior relative performance in analyses of sparse data, but the time complexity of the algorithm ...


Propensity Score Prediction For Electronic Healthcare Dataset Using Super Learner And High-Dimensional Propensity Score Method, Cheng Ju, Mary Combs, Samuel D. Lendle, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan 2016 Division of Biostatistics, University of California, Berkeley

U.C. Berkeley Division of Biostatistics Working Paper Series

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. To select the best algorithm for a given set of data we must therefore use cross-validation to compare several candidate algorithms. Super Learner (SL) is an ensemble learning algorithm that uses cross-validation to select among a "library" of candidate algorithms. The SL is not restricted to a single prediction algorithm, but uses the strengths of a variety of learning algorithms to adapt to different datasets.
While the SL has been shown to perform well in a number of settings, it has not been evaluated in large electronic ...
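
The cross-validated selection step described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the three candidate learners, the fold scheme, the squared-error loss, and the synthetic data are all assumptions made for the example. It also picks a single best candidate (often called a discrete Super Learner), whereas Super Learner implementations commonly fit a weighted combination of the candidates.

```python
import random
from statistics import mean

# Hypothetical "library" of one-dimensional regression learners.
# Each fit_* function takes training data and returns a predictor.

def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # always predicts the training mean

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)  # least-squares line

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    # 1-nearest-neighbour: copy the outcome of the closest training point
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

LIBRARY = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

def cv_risk(fit, xs, ys, k=5):
    """Cross-validated mean squared error of one learner over k folds."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        squared_error += sum((predict(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return squared_error / n

def select_learner(xs, ys, k=5):
    """Return the name of the candidate with the lowest CV risk, plus all risks."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in LIBRARY.items()}
    return min(risks, key=risks.get), risks

# Synthetic data (an assumption for illustration): a noisy straight line.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
best, risks = select_learner(xs, ys)
print(best, risks)
```

Because the synthetic outcome is linear in the covariate, the linear candidate should have the lowest cross-validated risk here; with a different data-generating distribution another candidate could win, which is the point the abstract makes.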


Tmle For Marginal Structural Models Based On An Instrument, Boriska Toth, Mark J. van der Laan 2016 University of California, Berkeley, Division of Biostatistics

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider estimation of a causal effect of a possibly continuous treatment when treatment assignment is potentially subject to unmeasured confounding, but an instrumental variable is available. Our focus is on estimating heterogeneous treatment effects, so that the treatment effect can be a function of an arbitrary subset of the observed covariates. One setting where this framework is especially useful is with clinical outcomes. Allowing the causal dose-response curve to depend on a subset of the covariates, we define our parameter of interest to be the projection of the true dose-response curve onto a user-supplied working marginal structural model. We ...


Using A Data Quality Framework To Clean Data Extracted From The Electronic Health Record: A Case Study., Oliwier Dziadkowiec, Tiffany Callahan, Mustafa Ozkaynak, Blaine Reeder, John Welton 2016 University of Colorado, College of Nursing, Anschutz Medical Campus

eGEMs (Generating Evidence & Methods to improve patient outcomes)

Objectives: Examine (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a dataset extracted from two EPIC databases; and (2) the differences in statistical parameter estimates on a dataset cleaned with the DQ framework and dataset not cleaned with the DQ framework.

Background: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large datasets for analysis, the results might be erroneous, which might affect clinical decision making or results of Comparative ...


Improving Precision By Adjusting For Baseline Variables In Randomized Trials With Binary Outcomes, Without Regression Model Assumptions, Jon Arni Steingrimsson, Daniel F. Hanley, Michael Rosenblum 2016 Johns Hopkins Bloomberg School of Public Health

Johns Hopkins University, Dept. of Biostatistics Working Papers

Background: A recent guideline issued by the European Medicines Agency discusses adjustment for prognostic baseline variables to improve precision and power in randomized trials. They state "in case of a strong or moderate association between a baseline covariate(s) and the primary outcome measure, adjustment for such covariate(s) generally improves the efficiency of the analysis and avoids conditional bias from chance covariate imbalance." A challenge is that there are multiple statistical methods for adjusting for baseline variables, and little guidance on which to use. We investigate the pros and cons of two such adjustment methods.

Methods: We compare ...


Correction Of Verification Bias Using Log-Linear Models For A Single Binary-Scale Diagnostic Tests, Haresh Rochani, Hani M. Samawi, Robert L. Vogel, Jingjing Yin 2016 Georgia Southern University

Haresh Rochani

In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify each patient due to issues of cost-effectiveness and the invasive nature of the procedures. In practice, some of the patients with test results are not selected for verification of the disease status, which results in verification bias for diagnostic tests. The ability of the diagnostic test to correctly identify the patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive ...


How Long Does That 10-Year Smoke Alarm Really Last? A Survival Analysis Of Smoke Alarms Installed Through The Saife Program In Rural Georgia, Haresh Rochani, Valamar Malika Reagon, Steve Davidson 2016 Georgia Southern University

Haresh Rochani

Background: When functioning properly, a smoke alarm alerts individuals in the residence that smoke is near the alarm. Smoke alarms serve as a primary prevention mechanism to abate morbidity and mortality related to residential fires. Methods: Using survival analysis, we examined the length of operability of 10-year lithium battery powered smoke alarms installed through the Georgia Public Health/CDC SAIFE program in Moultrie, Georgia. Attempts were made to reach all homes in the city limits. The premise of the study is that geographic clusters (in the case of Moultrie city quadrants) are associated with decreases in the length of time ...


Initiation And Early Development Of Fiber In Wild And Cultivated Cotton, Kara M. Butterworth, Dean C. Adams, Harry T. Horner, Jonathan F. Wendel 2016 Iowa State University

Harry Horner

Cultivated cotton fiber has undergone transformation from short, coarse fibers found in progenitor wild species to economically important, long, fine fibers grown globally. Morphological transformation requires understanding of development of wild fiber and developmental differences between wild and cultivated fiber. We examined early development of fibers, including abundance and placement on seed surface, nucleus position, presence of vacuoles, and fiber size and shape. Four species were studied using microscopic, morphometric, and statistical methods: Gossypium raimondii (wild D genome), Gossypium herbaceum (cultivated A genome), Gossypium hirsutum (wild tetraploid), and Gossypium hirsutum (cultivated tetraploid). Early fiber development is highly asynchronous in G ...


Mapping Morels: Predicting The Locations Of Morchella Species Through Environmental Factors Using The Gis System, Emily M. Stanevicius 2016 Augustana College - Rock Island

Celebration of Learning

Morel mushrooms, Morchella esculenta and M. deliciosa, are known delicacies across the globe, ranging from exquisite dishes in French cuisine to Eastern palates such as Japanese Matsutake. According to the literature, true morels diverged as their own genus about 129 million years ago, which has led to the development of more than 177 species, and they have been part of the human diet since their beginning. However, the elusiveness of morels has contributed to the mushroom's reputation for rarity, and morels have even been known to sell for more than $40 per pound. This project seeks to aid in the search for morels ...


Octahedral Dice, Todd Estroff, Jeremiah Farrell 2016 Butler University

Jeremiah Farrell

All five Platonic solids have been used as random number generators in games involving chance, with the cube being the most popular. Martin Gardner, in his article on dice (MG 1977), remarks: "Why cubical?... It is the easiest to make, its six sides accommodate a set of numbers neither too large nor too small, and it rolls easily enough but not too easily."

Gardner adds that the octahedron has been the next most popular as a randomizer. We offer here several problems and games using octahedral dice. The first two are extensions from Gardner's article. All answers will be ...


Multiple Imputation Based Clustering Validation (Miv) For Big Longitudinal Trial Data With Missing Values In Ehealth, Zhaoyang Zhang, Hua (Julia) Fang, Honggang Wang 2016 University of Massachusetts Medical School

Quantitative Health Sciences Publications and Presentations

Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of ...


Thinking Poker Through Game Theory, Damian Palafox 2016 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

Poker is a complex game to analyze. In this project we will use the mathematics of game theory to solve some simplified variations of the game. Probability is the building block behind game theory. We must understand a few concepts from probability such as distributions, expected value, variance, and enumeration methods to aid us in studying game theory. We will solve and analyze games through game theory by using different decision methods, decision trees, and the process of domination and simplification. Poker models, with and without cards, will be provided to illustrate optimal strategies. Extensions to those models will be ...


Putting Prep Into Practice: Lessons Learned From Early-Adopting U.S. Providers' Firsthand Experiences Providing Hiv Pre-Exposure Prophylaxis And Associated Care, S. K. Calabrese, Manya Magnus, K. H. Mayer, D. S. Krakower, A. I. Eldahan, L. A. Gaston Hawkins, +5 additional authors 2016 George Washington University

Epidemiology and Biostatistics Faculty Publications

Optimizing access to HIV pre-exposure prophylaxis (PrEP), an evidence-based HIV prevention resource, requires expanding healthcare providers' adoption of PrEP into clinical practice. This qualitative study explored PrEP providers' firsthand experiences relative to six commonly-cited barriers to prescription (financial coverage, implementation logistics, eligibility determination, adherence concerns, side effects, and anticipated behavior change, i.e. risk compensation), as well as their recommendations for training PrEP-inexperienced providers. U.S.-based PrEP providers were recruited via direct outreach and referral from colleagues and other participants (2014-2015). One-on-one interviews were conducted in person or by phone, transcribed, and analyzed. The sample (n = 18) primarily practiced in the Northeastern ...


Homeolog Specific Expression Bias, Ronald D. Smith 2016 College of William and Mary

Biology and Medicine Through Mathematics Conference

No abstract provided.


Heterogeneous Responses To Viral Infection: Insights From Mathematical Modeling Of Yellow Fever Vaccine, James R. Moore 2016 Emory University

Biology and Medicine Through Mathematics Conference

No abstract provided.


Elements Of The Mathematical Formulation Of Quantum Mechanics, Keunjae Go 2016 Washington University in Saint Louis

Senior Honors Papers / Undergraduate Theses

In this paper, we will explore some of the basic elements of the mathematical formulation of quantum mechanics. In the first section, I will list the motivations for introducing a probability model that is quite different from that of classical probability theory, but still shares quite a few significant commonalities. Later in the paper, I will discuss quantum probability theory in detail, while paying brief attention to some of the axioms (by Birkhoff and von Neumann) that illustrate both the commonalities and differences between classical mechanics and quantum mechanics. This paper will end with a presentation of ...


Facets: Allele-Specific Copy Number And Clonal Heterogeneity Analysis Tool Estimates For High-Throughput Dna Sequencing, Ronglai Shen, Venkatraman Seshan 2016 Memorial Sloan-Kettering Cancer Center

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH), and allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of “actionable” mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy-, and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical ...


Distributed Target Tracking And Synchronization In Wireless Sensor Networks, Jichuan Li 2016 Washington University in St. Louis

Engineering and Applied Science Theses & Dissertations

Wireless sensor networks provide useful information for various applications but pose challenges in scalable information processing and network maintenance. This dissertation focuses on statistical methods for distributed information fusion and sensor synchronization for target tracking in wireless sensor networks.

We perform target tracking using particle filtering. For scalability, we extend centralized particle filtering to distributed particle filtering via distributed fusion of local estimates provided by individual sensors. We derive a distributed fusion rule from Bayes' theorem and implement it via average consensus. We approximate each local estimate as a Gaussian mixture and develop a sampling-based approach to the nonlinear fusion ...
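
The average consensus step mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the dissertation's fusion rule, so the code below only shows plain scalar average consensus under assumed conditions: a hypothetical four-sensor ring topology, made-up local values, and a fixed step size chosen to be smaller than one over the maximum node degree. It is not the author's distributed particle filter.

```python
def average_consensus(values, neighbors, step=0.3, iterations=200):
    """Iterate x_i <- x_i + step * sum_j (x_j - x_i) over each node's neighbours.

    On a connected undirected graph, with step below 1 / (max degree),
    every node's value converges to the average of the initial values.
    """
    x = list(values)
    for _ in range(iterations):
        # Synchronous update: each node uses only its neighbours' current values.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x

# Hypothetical network: four sensors connected in a ring (each has degree 2).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local = [1.0, 5.0, 3.0, 7.0]  # made-up local estimates; their mean is 4.0

result = average_consensus(local, neighbors)
print(result)
```

Each sensor exchanges values only with its immediate neighbours, yet all of them end up agreeing on the network-wide mean, which is why consensus iterations are a common building block for scalable fusion without a central node.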

