Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Theses/Dissertations

Open Access Theses & Dissertations

Resampling

Full-Text Articles in Physical Sciences and Mathematics

Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil May 2021

Open Access Theses & Dissertations

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features but a small number of samples, a characteristic called the "curse of dimensionality." The selection of optimal features, which largely affects the performance of classification algorithms in machine learning models, has led to challenging problems in bioinformatics analyses of such high-dimensional datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications in multiple sets of colorectal cancer data. The first …


Resampling-Based Multiple Comparisons For Generalized Linear Models, Josephine Sarpong Akosa Jan 2014

Open Access Theses & Dissertations

Diverse applications in medical and epidemiological research routinely use generalized linear modeling to explain the relationship between the incidence of disease and particular risk factors. Researchers' interest in such models lies in quantities estimated from the model, such as the response probabilities, the relative risks, or the odds ratios, and not in the model itself. Often, the simultaneous estimation of these quantities, or of a subset of them, is warranted. The results are usually reported via confidence intervals at a pre-specified level of significance. Utilizing the usual 95% pointwise confidence intervals for the simultaneous inference inflates the risk of making type I …