Open Access. Powered by Scholars. Published by Universities.®

Biostatistics Commons

Articles 1 - 4 of 4

Full-Text Articles in Biostatistics

Compressed DNA Representation For Efficient AMR Classification, John Partee, Robert Hazell, Anjli Solsi, John Santerre Aug 2020

SMU Data Science Review

In this paper, we explore a representation methodology for the compression of DNA isolates. Using lossless string compression via tokenization of frequently repeated segments of DNA, we reduce the length of the isolates to be counted as k-mers for classification. With this new representation, we apply a previously established feature sampling method to dramatically reduce the feature space. To account for genetic diversity, we also look at conserving biological function across these spaces. Using a random forest model, we were able to predict the resistance or susceptibility of bacteria with 85-90% accuracy, with a 30-50% reduction in overall isolate length, …
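
The abstract does not spell out the tokenization scheme, so the following is only a minimal sketch of the general idea: replace frequently repeated segments with single tokens, keep a lookup table so the step stays lossless, then count k-mers on the shortened string. The function names, the fixed segment length, and the lowercase token alphabet are all hypothetical choices made for illustration, not the authors' method.

```python
from collections import Counter

BASES = set("ACGT")
TOKENS = "abcdefghijklmnopqrstuvwxyz"  # hypothetical token alphabet, disjoint from BASES


def compress(seq, segment_len=4, max_tokens=3):
    """Greedily replace the most frequent raw segment with a one-character token."""
    table = {}
    for token in TOKENS[:max_tokens]:
        # Count only segments made purely of raw bases (no earlier tokens).
        counts = Counter(
            seq[i:i + segment_len]
            for i in range(len(seq) - segment_len + 1)
            if set(seq[i:i + segment_len]) <= BASES
        )
        if not counts:
            break
        segment, n = counts.most_common(1)[0]
        if n < 2:  # nothing repeats, so a token would not shorten the string
            break
        table[token] = segment
        seq = seq.replace(segment, token)
    return seq, table


def decompress(seq, table):
    """Expand every token back to its segment, recovering the original string."""
    for token, segment in table.items():
        seq = seq.replace(token, segment)
    return seq


def kmer_counts(seq, k):
    """Count overlapping k-mers; tokens count as single symbols."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


original = "ACGTACGTTTGACGT"
compressed, table = compress(original)
features = kmer_counts(compressed, k=2)
```

In this toy input the repeated segment `ACGT` becomes one token, so the string to be scanned for k-mers is shorter while `decompress` still returns the original sequence exactly.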


A Differential Geometry-Based Machine Learning Algorithm For The Brain Age Problem, Justin Asher, Khoa Tan Dang, Maxwell Masters Aug 2020

The Journal of Purdue Undergraduate Research

No abstract provided.


Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, Oliwier Dziadkowiec, Jeffery Durbin, Vignesh Jayaraman Muralidharan, Megan Novak, Brendon Cornett Jul 2020

HCA Healthcare Journal of Medicine

Electronic health records (EHRs) are an excellent source for secondary data analysis. Studies based on EHR-derived data, if designed properly, can answer previously unanswerable clinical research questions. In this paper, we highlight the benefits of large retrospective studies from secondary sources such as EHRs and examine the design challenges of retrospective cohort and case-control studies, along with the methodological and statistical adjustments that can be made to overcome some of the inherent design limitations and thereby increase the generalizability, validity, and reliability of the results obtained from these studies.


Introduction To Research Statistical Analysis: An Overview Of The Basics, Christian Vandever Apr 2020

HCA Healthcare Journal of Medicine

This article covers many statistical ideas essential to the analysis of research data. Sample size is explained through the concepts of statistical significance level and power. Variable types and definitions are included to clarify what is needed to interpret the analysis. Categorical and quantitative variable types are defined, as well as response and predictor variables. Statistical tests described include t-tests, ANOVA and chi-square tests. Multiple regression is also explored in both its logistic and linear forms. Finally, the most common statistics produced by these methods are explored.
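
The link between sample size, significance level, and power can be made concrete with a short calculation. The sketch below uses the standard normal-approximation formula for comparing two group means, n = 2 * ((z_(1-alpha/2) + z_(1-beta)) / d)^2 per group, where d is the standardized effect size. It is offered as an illustration only; the abstract does not say which formula the article uses, and the function name and default values are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation, so it slightly underestimates the n
    that an exact t-test calculation would give.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


n_medium = sample_size_per_group(0.5)  # 63 per group
n_large = sample_size_per_group(0.8)   # a larger effect needs fewer subjects
```

Tightening the significance level or raising the power increases both quantiles and therefore the required sample size, while a larger expected effect reduces it.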