Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches, Abhijeet R Patil May 2021

Open Access Theses & Dissertations

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features and a small number of samples, a characteristic called the "curse of dimensionality." The selection of optimal features, which strongly affects the performance of classification algorithms in machine learning models, poses challenging problems in bioinformatics analyses of such high-dimensional datasets. In this work, I focus on the design of two-stage frameworks of feature selection and classification and their applications to multiple colorectal cancer datasets. The first …
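The abstract is cut off before the specific methods are named, so the following is only a generic illustration of a two-stage design, not the framework developed in the thesis. A filter stage ranks features by the absolute difference in class means and keeps the top k, and a nearest-centroid classifier is then fit on the retained features only. The function names and toy data are hypothetical. The raw mean difference also assumes the features are on comparable scales, for example after standardization.

```python
from statistics import mean


def top_k_features(X, y, k):
    """Stage 1 (filter): score each feature by |mean(class 1) - mean(class 0)| and keep the k best."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    scores.sort(reverse=True)  # highest score first
    return sorted(j for _, j in scores[:k])


def fit_centroids(X, y, features):
    """Stage 2 (classifier): one centroid per class, computed on the selected features only."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(row[j] for row in rows) for j in features]
    return centroids


def predict(x, centroids, features):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist2(label):
        return sum((x[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return min(centroids, key=dist2)


# Hypothetical toy data: 4 samples x 5 features; only feature 2 separates the classes.
X = [[0.1, 0.3, 0.0, 0.2, 0.5],
     [0.2, 0.1, 0.1, 0.4, 0.4],
     [0.1, 0.2, 5.0, 0.3, 0.5],
     [0.3, 0.2, 5.2, 0.1, 0.4]]
y = [0, 0, 1, 1]
selected = top_k_features(X, y, k=1)
model = fit_centroids(X, y, selected)
print(selected, predict([0.2, 0.2, 4.9, 0.2, 0.4], model, selected))
```

Keeping the two stages as separate functions makes it easy to swap either one, for example replacing the filter score or the classifier, without touching the other.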


Multiple Imputation Using Influential Exponential Tilting In Case Of Non-Ignorable Missing Data, Kavita Gohil Jan 2020

Electronic Theses and Dissertations

Modern research strategies rely predominantly on three steps: data collection, data analysis, and inference. If the data are not collected as designed, researchers may be left with incomplete data, which is especially problematic when the missingness is non-ignorable. Such situations complicate the subsequent steps of evaluation. Inference with incomplete data is a challenging task in data analysis and clinical trials when the missingness is related to the condition under study. Moreover, results obtained from incomplete data are prone to bias. Parameter estimation with non-ignorable missing data is even more challenging to handle and extract useful …
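The abstract is cut off before the method details. Whatever model generates the imputed values (in this thesis, one based on exponential tilting), the analyses of the m completed data sets are commonly combined with Rubin's rules. The sketch below shows only that standard pooling step. It does not implement exponential tilting, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m >= 2 completed-data results with Rubin's rules.

    estimates        -- point estimate of the parameter from each imputed data set
    within_variances -- squared standard error of each of those estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b          # total variance of the pooled estimate
    return q_bar, t


q, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, round(t, 4))
```

The between-imputation term b is what carries the extra uncertainty due to the missing values. A single imputation analyzed as if it were complete data would omit it.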


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Electronic Theses and Dissertations

Self-care activity classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is strongly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …


Resampling-Based Multiple Comparisons For Generalized Linear Models, Josephine Sarpong Akosa Jan 2014

Open Access Theses & Dissertations

Diverse applications in medical and epidemiological research routinely use generalized linear modeling to explain the relationship between the incidence of disease and particular risk factors. Researchers are usually interested not in the model itself but in quantities estimated from it, such as the response probabilities, the relative risks, or the odds ratios. Often, the simultaneous estimation of these quantities, or of a subset of them, is warranted. The results are usually reported via confidence intervals at a pre-specified level of significance. Using the usual 95% pointwise confidence intervals for simultaneous inference inflates the risk of making type I …
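The inflation mentioned above follows from a short calculation. Suppose k intervals are independent and each has a 5% chance of missing its target. Then the probability that at least one of them misses is 1 − 0.95^k. The sketch also shows the Bonferroni adjustment as the simplest correction. The thesis studies resampling-based procedures, which this sketch does not implement. Independence is an assumption here: for correlated estimates the exact rate differs, although Bonferroni still bounds it by alpha.

```python
def familywise_error(alpha, k):
    """P(at least one false rejection) for k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_level(alpha, k):
    """Per-interval level that keeps the familywise error at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    print(k, round(familywise_error(0.05, k), 4))
# k = 10 gives about 0.4013: roughly a 40% chance of at least one false rejection.

# With Bonferroni-adjusted intervals (level 0.05 / 10 each), the familywise rate stays below 0.05.
print(round(familywise_error(bonferroni_level(0.05, 10), 10), 4))
```

Bonferroni is conservative when the estimates are positively correlated, as estimates from one fitted model typically are. Resampling-based methods aim to recover that lost power by estimating the joint distribution directly.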


A Comparison Of Microarray Analyses: A Mixed Models Approach Versus The Significance Analysis Of Microarrays, Nathan Wallace Stephens Nov 2006

Theses and Dissertations

DNA microarrays are a relatively new technology for assessing the expression levels of thousands of genes simultaneously. Researchers hope to find genes that are differentially expressed by hybridizing cDNA from known treatment sources with various genes spotted on the microarrays. The large number of tests involved in analyzing microarrays has raised new questions in multiple testing. Several approaches for identifying differentially expressed genes have been proposed. This paper considers two: (1) a mixed models approach, and (2) the Significance Analysis of Microarrays.
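For reference, the per-gene statistic at the core of the Significance Analysis of Microarrays (SAM) is a t-like "relative difference." It adds a small positive constant s0 to the denominator so that genes with tiny sample variance do not dominate the ranking. The sketch below computes that statistic for a single gene. SAM chooses s0 from the data and estimates the false discovery rate by permutation; neither step is shown here. Passing s0 in directly is an assumption of this sketch, and the function name is hypothetical.

```python
import math
from statistics import mean


def sam_d(x, y, s0):
    """SAM relative difference d = (mean(x) - mean(y)) / (s + s0) for one gene.

    x, y -- expression values for the gene under the two conditions
    s0   -- small positive "fudge factor"; here it is supplied by the caller
    """
    n1, n2 = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)  # pooled standard error of the difference in means
    return (mx - my) / (s + s0)


x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(round(sam_d(x, y, s0=0.0), 4))  # with s0 = 0 this is the pooled two-sample t statistic
print(round(sam_d(x, y, s0=0.5), 4))  # a positive s0 shrinks |d| toward zero
```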


Comparing The Statistical Tests For Homogeneity Of Variances, Zhiqiang Mu Aug 2006

Electronic Theses and Dissertations

Testing the homogeneity of variances is an important problem in many applications, since frequently used statistical methods such as ANOVA assume equal variances for two or more groups of data. However, testing the equality of variances is difficult because many of the tests are not robust against non-normality. It is known that the kurtosis of the distribution of the source data can affect the performance of the tests for variance. We review the classical tests and their latest, more robust modifications, some other tests that have recently appeared in the literature, and use …
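One well-known robust modification is the Brown–Forsythe variant of Levene's test. It runs a one-way ANOVA on the absolute deviations from each group's median rather than from its mean, which makes it less sensitive to heavy tails. The sketch below computes only the test statistic W; the function name and sample data are hypothetical. In practice, `scipy.stats.levene(..., center="median")` computes this same statistic together with its p-value.

```python
from statistics import mean, median


def brown_forsythe(*groups):
    """Brown-Forsythe statistic: one-way ANOVA F computed on |x - group median|.

    Under equal variances, W is approximately F-distributed with (k - 1, N - k) df.
    """
    # Absolute deviations from each group's median (Levene's original test uses the mean).
    z = [[abs(v - median(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    z_bar = [mean(zi) for zi in z]
    grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar) for v in zi)
    return (n_total - k) / (k - 1) * between / within


narrow = [1, 2, 3, 4, 5]
shifted = [11, 12, 13, 14, 15]  # same spread as narrow, different location
wide = [0, 10, 20, 30, 40]      # much larger spread
print(brown_forsythe(narrow, shifted))  # identical spreads give W = 0
print(round(brown_forsythe(narrow, wide), 3))
```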