Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Genetics and Genomics

Machine learning

Articles 1 - 30 of 39

Full-Text Articles in Entire DC Network

An Investigation Of Information Structures In DNA, Joel Mohrmann May 2024

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

The information-containing nature of the DNA molecule has long been known and observed. One technique for quantifying the relationships within the information contained in DNA sequences is the average mutual information (AMI) profile, a quantity from information theory. This investigation sought to use primarily the AMI profile, along with a few other metrics, to explore the structure of the information contained in DNA sequences.

Treating DNA sequences as an information source, several computational methods were employed to model their information structure. Maximum likelihood and maximum a posteriori estimators were used to predict missing bases in DNA sequences. …
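The AMI profile can be computed directly from counts of base pairs separated by a fixed distance. The Python sketch below uses the standard definition I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)] for bases k positions apart. The function names and the choice to estimate the marginals from the paired positions are assumptions for illustration; the thesis's own estimator may differ.

```python
import math
from collections import Counter


def average_mutual_information(seq: str, k: int) -> float:
    """AMI in bits between bases separated by k positions (hypothetical helper)."""
    if not 0 < k < len(seq):
        raise ValueError("k must be between 1 and len(seq) - 1")
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    # Marginal counts are estimated from the same paired positions.
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        p_a = left[a] / n
        p_b = right[b] / n
        ami += p_ab * math.log2(p_ab / (p_a * p_b))
    return ami


def ami_profile(seq: str, max_k: int) -> list:
    """AMI values for k = 1 .. max_k, i.e. the AMI profile of the sequence."""
    return [average_mutual_information(seq, k) for k in range(1, max_k + 1)]
```

For the periodic sequence `"ACGT" * 25`, bases four positions apart are always identical, so the AMI at k = 4 equals the full 2 bits of base entropy; a constant sequence such as `"A" * 50` gives 0 at every k.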


Convolutional Neural Network-Based Gene Prediction Using Buffalograss As A Model System, Michael Morikone Nov 2023

Complex Biosystems PhD Program: Dissertations

Algorithmic improvement in gene prediction has largely stagnated since the first gene prediction algorithms were developed thirty years ago. Rather than improving the underlying algorithms in gene prediction tools with better-performing models, most current approaches update existing tools by incorporating increasing amounts of extrinsic data to improve gene prediction performance. The traditional method of predicting genes uses Hidden Markov Models (HMMs). These HMMs are constrained by strict assumptions about the independence of genes that do not always hold. To address this, a Convolutional Neural Network (CNN) …
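To make the CNN idea concrete, the sketch below shows the core operation a convolutional layer applies to DNA: one-hot encoding, a 1D convolution with a ReLU activation, and global max pooling. It is plain Python with a hand-set kernel (an "ATG" detector) chosen purely for illustration; a real gene predictor learns many kernels from data, and nothing here reflects the dissertation's actual architecture.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list:
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: list, kernel: list, bias: float = 0.0) -> list:
    """Valid (no padding) 1D convolution followed by ReLU.

    Computed as cross-correlation, as in most deep learning frameworks.
    """
    width = len(kernel)
    out = []
    for i in range(len(x) - width + 1):
        score = bias
        for j in range(width):
            for c in range(len(BASES)):
                score += x[i + j][c] * kernel[j][c]
        out.append(max(0.0, score))
    return out


def global_max_pool(activations: list) -> float:
    """Collapse a feature map to its strongest response."""
    return max(activations) if activations else 0.0
```

With `kernel = one_hot("ATG")` and `bias=-2.0`, the filter fires (value 1.0) only where all three bases match, so scanning `"CCATGCC"` yields a feature map with a single peak at position 2.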


Application Of Machine Learning Approaches To Empower Drug Development, Yue Shen May 2023

Doctoral Dissertations

Human health, one of the major topics in life science, faces intensifying challenges, including cancer, pandemic outbreaks, and antimicrobial resistance. New medicines with unique advantages, including peptide-based vaccines and permeable small-molecule antimicrobials, are therefore urgently needed. However, the drug development process is long, complex, and risky, with no guarantee of success. In addition, improvements in the techniques applied in genomics, proteomics, computational biology, and clinical trials have significantly increased data complexity and volume, which places greater demands on the drug development pipeline. In recent years, machine learning (ML) methods have been employed to support drug development in various aspects …


Evaluation Of Precision Livestock Technology And Human Scoring Of Nursery Pigs In A Controlled Immune Challenge Experiment, Eduarda M. Bortoluzzi, Mikayla J. Goering, Sara J. Ochoa, Aaron J. Holliday, Jared M. Mumm, Catherine E. Nelson, Hui Wu, Benny E. Mote, Eric T. Psota, Ty B. Schmidt, Majid Jaberi-Douraki, Lindsey E. Hulbert Jan 2023

Department of Animal Science: Faculty Publications

The objectives were to determine the sensitivity, specificity, and cutoff values of a visual-based precision livestock technology (NUtrack), and to determine the sensitivity and specificity of sickness score data collected through live observation by trained human observers. At weaning, pigs (n = 192; gilts and barrows) were randomly assigned to one of twelve pens (16/pen) and treatments were randomly assigned to pens. Sham-pen pigs all received subcutaneous saline (3 mL). For LPS-pen pigs, all pigs received subcutaneous lipopolysaccharide (LPS; 300 µg/kg BW; E. coli O111:B4; in 3 mL of saline). For the last treatment, eight pigs were randomly …
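Sensitivity, specificity, and a cutoff value can be illustrated with a short Python sketch. It assumes a continuous score per pig where higher means more likely sick, and a binary label (1 = LPS-challenged, 0 = sham); it picks the cutoff that maximizes Youden's J (sensitivity + specificity − 1). Youden's J is one common criterion and is an assumption here, not necessarily the one the authors used, and the sketch requires both classes to be present.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as positive (sick); return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1."""
    best = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best
```

If lower scores indicate sickness (for example, reduced activity), negate the scores before calling these functions.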


Evaluation Of Precision Livestock Technology And Human Scoring Of Nursery Pigs In A Controlled Immune Challenge Experiment, Eduarda M. Bortoluzzi, Mikayla J. Goering, Sara J. Ochoa, Aaron J. Holliday, Jared M. Mumm, Catherine E. Nelson, Hui Wu, Benny Mote, Eric T. Psota, Ty B. Schmidt, Majid Jaberi-Douraki, Lindsey E. Hulbert Jan 2023

Department of Animal Science: Faculty Publications

The objectives were to determine the sensitivity, specificity, and cutoff values of a visual-based precision livestock technology (NUtrack), and to determine the sensitivity and specificity of sickness score data collected through live observation by trained human observers. At weaning, pigs (n = 192; gilts and barrows) were randomly assigned to one of twelve pens (16/pen) and treatments were randomly assigned to pens. Sham-pen pigs all received subcutaneous saline (3 mL). For LPS-pen pigs, all pigs received subcutaneous lipopolysaccharide (LPS; 300 μg/kg BW; E. coli O111:B4; in 3 mL of saline). For the last treatment, eight pigs were randomly …


An Approach To Developing Benchmark Datasets For Protein Secondary Structure Segmentation From Cryo-EM Density Maps, Thu Nguyen, Yongcheng Mu, Jiangwen Sun, Jing He Jan 2023

Computer Science Faculty Publications

A growing number of deep learning approaches have been proposed to segment secondary structures from cryo-electron microscopy (cryo-EM) density maps at medium resolution (5–10 Å). Although these approaches show great potential, only a few small experimental data sets have been used to test them. There is limited understanding of the factors in the data that affect segmentation performance. We propose an approach to generate data sets with desired specifications in three potential factors: protein sequence identity, structural content, and data quality. The approach was implemented and has generated a test set and various training sets to study …


What I Talk About When I Talk About Integration Of Single-Cell Data, Yang Xu Aug 2022

Doctoral Dissertations

Over the past decade, single-cell technologies have evolved from profiling hundreds of cells to profiling millions, and have expanded from a single data modality to multiple views at single-cell resolution, including the genome, epigenome, and transcriptome. With the advance of these technologies, the boom in multimodal single-cell data has created a valuable resource for understanding cellular heterogeneity and molecular mechanisms at a comprehensive level. However, large-scale multimodal single-cell data also present a major computational challenge for insightful integrative analysis. Here, I lay out problems in data integration that the single-cell research community is interested in and …


Individual Beef Cattle Identification Using Muzzle Images And Deep Learning Techniques, Guoming Li, Galen E. Erickson, Yijie Xiong May 2022

Department of Animal Science: Faculty Publications

The ability to identify individual animals has gained great interest in beef feedlots to allow for animal tracking and all applications for precision management of individuals. This study assessed the feasibility and performance of a total of 59 deep learning models in identifying individual cattle with muzzle images. The best identification accuracy was 98.7%, and the fastest processing speed was 28.3 ms/image. A dataset containing 268 US feedlot cattle and 4923 muzzle images was published along with this article. This study demonstrates the great potential of using deep learning techniques to identify individual cattle using muzzle images and to support …


Identifying Conifer Tree Vs. Deciduous Shrub And Tree Regeneration Trajectories In A Space-For-Time Boreal Peatland Fire Chronosequence Using Multispectral Lidar, Humaira Enayetullah, Laura Chasmer, Christopher Hopkinson, Dan Thompson, Danielle Cobbaert Jan 2022

Aspen Bibliography

Wildland fires and anthropogenic disturbances can cause changes in vegetation species composition and structure in boreal peatlands. These could potentially alter regeneration trajectories following severe fire or through cumulative impacts of climate-mediated drying, fire, and/or anthropogenic disturbance. We used lidar-derived point cloud metrics and site-specific locational attributes to assess trajectories of post-disturbance vegetation regeneration in boreal peatlands south of Fort McMurray, Alberta, Canada, using a space-for-time chronosequence. The objectives were to (a) develop methods to identify conifer trees vs. deciduous shrubs and trees using multispectral lidar data, (b) quantify the proportional coverage of shrubs and trees to determine environmental conditions driving …


CBP60-DB: An AlphaFold-Predicted Plant Kingdom-Wide Database Of The Calmodulin-Binding Protein 60 (CBP60) Protein Family With A Novel Structural Clustering Algorithm, Keaun Amani, Vanessa Shivnauth, Christian Castroverde Jan 2022

Biology Faculty Publications

Molecular genetic analyses in the model species Arabidopsis thaliana have demonstrated the major roles of different CAM-BINDING PROTEIN 60 (CBP60) proteins in growth, stress signaling, and immune responses. Prominently, CBP60g and SARD1 are paralogous CBP60 transcription factors that regulate numerous components of the immune system, such as cell surface and intracellular immune receptors, MAP kinases, WRKY transcription factors, and biosynthetic enzymes for the immunity-activating metabolites salicylic acid (SA) and N-hydroxypipecolic acid (NHP). However, their function, regulation, and diversification in most species remain unclear. Here we have created CBP60-DB, a structural and bioinformatic database that comprehensively characterizes 1052 CBP60 gene homologs …


Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray Dec 2021

Department of Statistics: Dissertations, Theses, and Student Work

Soybean is a significant source of protein and oil and is also widely used as animal feed. Developing lines that are superior in yield, protein, and oil content is therefore important for feeding the ever-growing population. Compared with costly phenotyping, genotyping is both cost- and time-efficient for breeders, whereas evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …


Statistical Potentials For RNA-Protein Interactions Optimized By CMA-ES, Takayuki Kimura, Nobuaki Yasuo, Masakazu Sekijima, Brooke Lustig Oct 2021

Faculty Research, Scholarly, and Creative Activity

Characterizing RNA-protein interactions remains an important endeavor, complicated by the difficulty of obtaining the relevant structures. Evaluating model structures via statistical potentials is in principle straightforward and effective. However, given the relatively small size of the existing learning set of RNA-protein complexes, optimizing such potentials continues to be problematic. Notably, interaction-based statistical potentials have difficulty handling large RNA-protein complexes. In this study, we adopted a novel strategy that uses the covariance matrix adaptation evolution strategy (CMA-ES) to calculate statistical potentials, successfully identifying native docking poses.


Artificial Image Objects For Classification Of Schizophrenia With GWAS-Selected SNVs And Convolutional Neural Network, Xiangning Chen, Daniel G. Chen, Zhongming Zhao, Justin Zhan, Changrong Ji, Jingchun Chen Aug 2021

School of Medicine Faculty Publications

In this article, we propose a new approach to analyze large genomics data. We considered individual genetic variants as pixels in an image and transformed a collection of variants into an artificial image object (AIO), which could be classified as a regular image by CNN algorithms. Using schizophrenia as a case study, we demonstrate the principles and their applications with 3 datasets. With 4,096 SNVs, the CNN models achieved an accuracy of 0.678 ± 0.007 and an AUC of 0.738 ± 0.008 for the diagnosis phenotype. With 44,100 SNVs, the models achieved class-specific accuracies of 0.806 ± 0.032 and 0.820 …
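The "variants as pixels" idea can be sketched by reshaping a genotype vector into a square matrix: 4,096 SNVs fill a 64 × 64 image and 44,100 SNVs a 210 × 210 image. The sketch assumes genotypes are coded as minor-allele counts (0, 1, 2), scaled to [0, 1], and placed in row-major order by a fixed variant ordering; the article's actual encoding and pixel ordering may differ, and the function name is hypothetical.

```python
import math


def to_artificial_image(genotypes: list) -> list:
    """Reshape per-variant genotype codes into a square 2D 'image'.

    genotypes: minor-allele counts (0, 1, or 2), one per selected SNV,
    in a fixed variant order (assumed, e.g. genomic position).
    """
    n = len(genotypes)
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("the number of variants must be a perfect square")
    # Scale 0/1/2 allele counts to pixel intensities in [0, 1].
    pixels = [g / 2.0 for g in genotypes]
    # Fill the image row by row (row-major order).
    return [pixels[row * side:(row + 1) * side] for row in range(side)]
```

The resulting nested list can be converted to an array and passed to any standard image CNN as a single-channel input.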


Detection Of European Aspen (Populus Tremula L.) Based On An Unmanned Aerial Vehicle Approach In Boreal Forests, Anton Kuzmin, Lauri Korhonen, Sonja Kivinen, Pekka Hurskainen, Pasi Korpelainen, Topi Tanhuanpää, Matti Maltamo, Petteri Vihervaara, Timo Kumpula Apr 2021

Aspen Bibliography

European aspen (Populus tremula L.) is a keystone species for the biodiversity of boreal forests. Large-diameter aspens maintain the diversity of hundreds of species, many of which are threatened in Fennoscandia. Due to the low economic value and relatively sparse and scattered occurrence of aspen in boreal forests, there is a lack of information on the spatial and temporal distribution of aspen, which hampers efficient planning and implementation of sustainable forest management practices and conservation efforts. Our objective was to assess identification of European aspen at the individual tree level in a southern boreal forest using high-resolution photogrammetric point cloud …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in characterizing complex proteomes and analyzing complex protein samples. In bottom-up shotgun proteomics experiments, the evaluation metrics (such as AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case; the target set typically contains a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …
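The target-decoy approach mentioned above estimates the false discovery rate (FDR) at a score threshold from how many decoy hits pass it relative to target hits. A minimal sketch, assuming each protein hit is a (score, is_decoy) pair and using the simple decoys/targets estimator; other variants of this estimator exist, and the function name is hypothetical. Because the estimate treats decoys as the only model of false targets, it says nothing about absent proteins in the target set, which is the limitation the abstract raises.

```python
def decoy_fdr(hits, threshold):
    """Estimate FDR among hits scoring >= threshold as #decoys / #targets.

    hits: iterable of (score, is_decoy) pairs.
    """
    targets = decoys = 0
    for score, is_decoy in hits:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    # With no targets above threshold, report 0.0 rather than dividing by zero.
    return decoys / targets if targets else 0.0
```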


Applications Of Machine Learning In Microbial Forensics, Ryan B. Ghannam Jan 2021

Dissertations, Master's Theses and Master's Reports

Microbial ecosystems are complex, with hundreds of members interacting with each other and with the environment. The intricate and hidden behaviors underlying these interactions make research questions about them challenging, but these behaviors can be better understood through machine learning. However, most machine learning used in microbiome work is a black-box form of investigation, in which accurate predictions can be made but the logic driving those predictions is hidden behind opaque layers of complexity.

Accordingly, the goal of this dissertation is to provide an interpretable and in-depth machine learning approach to investigate microbial biogeography and to use micro-organisms as …


Pathway‐Extended Gene Expression Signatures Integrate Novel Biomarkers That Improve Predictions Of Patient Responses To Kinase Inhibitors, Ashis Bagchee‐Clark, Eliseos J. Mucaki, Tyson Whitehead, Peter Rogan Dec 2020

Biochemistry Publications

Cancer chemotherapy responses have been related to multiple pharmacogenetic biomarkers, often for the same drug. This study utilizes machine learning to derive multi‐gene expression signatures that predict individual patient responses to specific tyrosine kinase inhibitors, including erlotinib, gefitinib, sorafenib, sunitinib, lapatinib and imatinib. Support vector machine (SVM) learning was used to train mathematical models that distinguished sensitivity from resistance to these drugs using a novel systems biology‐based approach. This began with expression of genes previously implicated in specific drug responses, then expanded to evaluate genes whose products were related through biochemical pathways and interactions. Optimal pathway‐extended SVMs predicted responses in …


Pathway-Extended Gene Expression Signatures Integrate Novel Biomarkers That Improve Predictions Of Patient Responses To Kinase Inhibitors, Ashis Jem Bagchee-Clark, Eliseos J. Mucaki, Tyson Whitehead, Peter Rogan Nov 2020

Biochemistry Publications

No abstract provided.


A Review Of Integrative Imputation For Multi-Omics Datasets, Meng Song, Jonathan Greenbaum, Joseph Luttrell, Weihua Zhou, Chong Wu, Hui Shen, Ping Gong, Chaoyang Zhang, Hong Wen Deng Oct 2020

Faculty Publications

Multi-omics studies, which explore the interactions between multiple types of biological factors, have significant advantages over single-omics analysis for their ability to provide a more holistic view of biological processes, uncover the causal and functional mechanisms for complex diseases, and facilitate new discoveries in precision medicine. However, omics datasets often contain missing values, and in multi-omics study designs it is common for individuals to be represented for some omics layers but not all. Since most statistical analyses cannot be applied directly to the incomplete datasets, imputation is typically performed to infer the missing values. Integrative imputation techniques which make use …
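As a rough illustration of what "integrative" imputation means, the sketch below fills in a sample that is missing an entire omics layer by averaging that layer over the k samples closest to it in a second, fully observed layer. This k-nearest-neighbor scheme is an assumption chosen for brevity and is not taken from the review, which surveys many more sophisticated methods; the function name is hypothetical, and at least one sample must be observed in the target layer.

```python
import math


def knn_integrative_impute(reference: list, target: list, k: int = 2) -> list:
    """Impute missing (None) rows of `target` using neighbours found in `reference`.

    reference: fully observed layer, one feature vector per sample.
    target: second layer, one feature vector per sample, or None if the
            sample was not profiled for this layer.
    """
    observed = [i for i, row in enumerate(target) if row is not None]
    imputed = []
    for i, row in enumerate(target):
        if row is not None:
            imputed.append(list(row))
            continue
        # Rank observed samples by Euclidean distance in the reference layer.
        neighbours = sorted(observed, key=lambda j: math.dist(reference[i], reference[j]))[:k]
        width = len(target[neighbours[0]])
        # Average the neighbours' target-layer values feature by feature.
        imputed.append([sum(target[j][f] for j in neighbours) / len(neighbours) for f in range(width)])
    return imputed
```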


Machine Learning Approaches For Fracture Risk Assessment: A Comparative Analysis Of Genomic And Phenotypic Data In 5130 Older Men, Qing Wu, Fatma Nasoz, Jongyun Jung, Bibek Bhattarai, Mira V. Han Jul 2020

Public Health Faculty Publications

The study aims were to develop fracture prediction models using machine learning approaches and genomic data, and to identify the best modeling approach for fracture prediction. Genomic data from the Osteoporotic Fractures in Men cohort study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated single nucleotide polymorphisms. Data were normalized and split into a training set (80%) and a validation set (20%) for analysis. Random forest, gradient boosting, neural network, and logistic regression were used to develop prediction models for major osteoporotic fractures …
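The GRS and the 80/20 split can be sketched as follows. A weighted GRS is commonly computed as Σ_j β_j g_ij, where g_ij is the number of risk alleles (0, 1, or 2) that participant i carries at SNP j and β_j is that SNP's effect size. The weights, the seeded shuffle, and the function names are assumptions for illustration, not details reported by the study.

```python
import random


def genetic_risk_score(genotypes: list, weights: list) -> float:
    """Weighted GRS: sum of risk-allele counts times per-SNP effect sizes."""
    if len(genotypes) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(g * w for g, w in zip(genotypes, weights))


def train_validation_split(items: list, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle reproducibly, then split into training and validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

For 5130 participants, this split yields 4104 training and 1026 validation records.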


Machine Learning Prediction Of Glioblastoma Patient One-Year Survival, Andrew Du '20, Warren Mcgee, Jane Y. Wu Jan 2020

Student Publications & Research

Glioblastoma (GBM) is a grade IV astrocytoma formed primarily from cancerous astrocytes and sustained by intense angiogenesis. GBM often causes non-specific symptoms, creating difficulty for diagnosis. This study aimed to utilize machine learning techniques to provide an accurate one-year survival prognosis for GBM patients using clinical and genomic data from the Chinese Glioma Genome Atlas. Logistic regression (LR), support vector machines (SVM), random forest (RF), and ensemble models were used to identify and select predictors for GBM survival and to classify patients into those with an overall survival (OS) of less than one year and one year or greater. With …


Transcription Factor Binding Site Clusters Identify Target Genes With Similar Tissue-Wide Expression And Buffer Against Mutations, Peter Rogan, Ruipeng Lu Jan 2019

Biochemistry Publications

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using machine learning (ML). Methods: Bray-Curtis similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were …
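Bray-Curtis similarity between two genes' expression profiles across tissues is one minus the Bray-Curtis dissimilarity, Σ|u_i − v_i| / Σ(u_i + v_i). A minimal sketch, assuming non-negative expression values (one per tissue) and treating two all-zero profiles as identical by convention; any normalization the authors applied before comparison is not shown.

```python
def bray_curtis_similarity(u: list, v: list) -> float:
    """1 - Bray-Curtis dissimilarity for two non-negative profiles of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 1.0  # two all-zero profiles are treated as identical (a convention)
    dissimilarity = sum(abs(a - b) for a, b in zip(u, v)) / total
    return 1.0 - dissimilarity
```

Identical profiles score 1.0, and profiles with no overlapping expression (e.g., `[1, 0]` vs `[0, 1]`) score 0.0.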


A Comparative Evaluation Of The Generalised Predictive Ability Of Eight Machine Learning Algorithms Across Ten Clinical Metabolomics Data Sets For Binary Classification, Kevin M. Mendez, Stacey N. Reinke, David I. Broadhurst Jan 2019

Research outputs 2014 to 2021

Introduction:

Metabolomics is increasingly being used in the clinical setting for disease diagnosis, prognosis, and risk prediction. Machine learning algorithms are particularly important in constructing multivariate metabolite prediction models. Historically, partial least squares (PLS) regression has been the gold standard for binary classification. Nonlinear machine learning methods such as random forests (RF), kernel support vector machines (SVM), and artificial neural networks (ANN) may be better suited to modelling possible nonlinear metabolite covariance, and thus may provide better predictive models.

Objectives:

We hypothesise that for binary classification using metabolomics data, non-linear machine learning methods will provide superior generalised predictive ability when …


Data And Statistical Methods To Analyze The Human Microbiome, Levi Waldron Mar 2018

Publications and Research

The Waldron lab for computational biostatistics bridges the areas of cancer genomics and microbiome studies for public health, developing methods to exploit publicly available data resources and to integrate omics studies.


Big Data Analytics And Precision Animal Agriculture Symposium: Machine Learning And Data Mining Advance Predictive Big Data Analysis In Precision Animal Agriculture, Gota Morota, Ricardo V. Ventura, Fabyano F. Silva, Masanori Koyama, Samodha C. Fernando Jan 2018

Department of Animal Science: Faculty Publications

Precision animal agriculture is poised to rise to prominence in the livestock enterprise in the domains of management, production, welfare, sustainability, health surveillance, and environmental footprint. Considerable progress has been made in the use of tools to routinely monitor and collect information from animals and farms in a less laborious manner than before. These efforts have enabled the animal sciences to embark on information technology-driven discoveries to improve animal agriculture. However, the growing amount and complexity of data generated by fully automated, high-throughput data recording or phenotyping platforms, including digital images, sensor and sound data, unmanned systems, and information obtained …


Recurrent Neural Networks And Their Applications To Rna Secondary Structure Inference, Devin Willmott Jan 2018

Theses and Dissertations--Mathematics

Recurrent neural networks (RNNs) are state-of-the-art sequential machine learning tools, but they have difficulty learning sequences with long-range dependencies due to the exponential growth or decay of gradients backpropagated through the RNN. Some methods overcome this problem by modifying the standard RNN architecture to force the recurrent weight matrix W to remain orthogonal throughout training. The first half of this thesis presents a novel orthogonal RNN architecture that enforces orthogonality of W by parametrizing it with a skew-symmetric matrix via the Cayley transform. We present rules for backpropagation through the Cayley transform, show how to deal with the Cayley …
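Why the Cayley transform of a skew-symmetric matrix is always orthogonal can be checked in a few lines. Assume A is real and skew-symmetric (A^T = −A); its eigenvalues are purely imaginary, so both I + A and I − A are invertible. The derivation below covers only the basic transform; the thesis's architecture may add further parameters (such as a diagonal scaling) that are not shown here.

```latex
\begin{aligned}
W &= (I + A)^{-1}(I - A), \qquad A^{\top} = -A,\\
W^{\top} &= (I - A)^{\top}\big((I + A)^{-1}\big)^{\top} = (I + A)(I - A)^{-1},\\
(I + A)(I - A) &= I - A^{2} = (I - A)(I + A)
  \;\Rightarrow\; (I - A)^{-1}(I + A)^{-1} = (I + A)^{-1}(I - A)^{-1},\\
W^{\top} W &= (I + A)(I - A)^{-1}(I + A)^{-1}(I - A)
            = (I + A)(I + A)^{-1}(I - A)^{-1}(I - A) = I.
\end{aligned}
```

Because any real skew-symmetric A yields an orthogonal W, training can update the unconstrained entries of A freely while W stays orthogonal.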


Understanding Huntington's Disease Using Machine Learning Approaches, Sonali Lokhande Dec 2017

KGI Theses and Dissertations

Huntington’s disease (HD) is a debilitating neurodegenerative disorder with a complex pathophysiology. Despite extensive study of the disease, the sequence of events through which the mutant Huntingtin (mHtt) protein exerts its effects remains elusive. The phenotype of HD is an outcome of numerous processes initiated by the mHtt protein along with other proteins that act as either suppressors or enhancers of the effects of mHtt protein and PolyQ aggregates. Utilizing an integrative systems biology approach, I construct and analyze a Huntington’s disease integrome using human orthologs of protein interactors of wild type and mHtt protein. Analysis of this integrome …


Pattern Discovery In Brain Imaging Genetics Via SCCA Modeling With A Generic Non-Convex Penalty, Lei Du, Kefei Liu, Xiaohui Yao, Jingwen Yan, Shannon L. Risacher, Junwei Han, Lei Guo, Andrew J. Saykin, Li Shen, Michael W. Weiner, Paul Aisen, Ronald Petersen, Clifford R. Jack, William Jagust, John Q. Trojanowski, Arthur W. Toga, Laurel Beckett, Robert C. Green, John Morris, Leslie M. Shaw, Zaven Khachaturian, Greg Sorensen, Maria Carrillo, Lew Kuller, Marc Raichle, Steven Paul, Peter Davies, Howard Fillit, Franz Hefti, David Holtzman, Charles D. Smith, Gregory Jicha, Peter A. Hardy, Partha Sinha, Elizabeth Oates, Gary Conrad Oct 2017

Neurology Faculty Publications

Brain imaging genetics aims to uncover associations between genetic markers and neuroimaging quantitative traits. Sparse canonical correlation analysis (SCCA) can discover bi-multivariate associations and select relevant features, and it is becoming popular in imaging genetic studies. The L1-norm function is not only convex but also singular at the origin, which is a necessary condition for sparsity. Thus, most SCCA methods impose the L1-norm on individual features or on the structure level of features to pursue the corresponding sparsity. However, the L1-norm penalty over-penalizes large coefficients and may incur estimation bias. A number of non-convex penalties have been proposed to reduce …
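The over-penalization of large coefficients by the L1-norm is easiest to see in one dimension, where the L1 solution is soft thresholding: every surviving coefficient is shrunk by λ. The sketch contrasts it with the thresholding rule of the minimax concave penalty (MCP), one well-known non-convex penalty, which leaves coefficients larger than γλ untouched. MCP is only an example here and is not claimed to be the generic penalty proposed in the paper; the formulas assume a standardized univariate problem with γ > 1.

```python
import math


def soft_threshold(z: float, lam: float) -> float:
    """Univariate L1 (lasso) solution: shrink toward zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    """Univariate MCP solution (standardized problem, gamma > 1)."""
    if gamma <= 1:
        raise ValueError("gamma must exceed 1")
    if abs(z) > gamma * lam:
        return z  # large coefficients are left unbiased
    # Moderate coefficients: soft-threshold, then rescale to reduce shrinkage.
    return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
```

With λ = 1, an input of 5 becomes 4 under the L1 rule but stays 5 under MCP, which is the estimation bias the abstract refers to.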


Accurate Cytogenetic Biodosimetry Through Automated Dicentric Chromosome Curation And Metaphase Cell Selection, Jin Liu, Yanxin Li, Ruth Wilkins, Canadian Nuclear Laboratories, Joan H. Knoll, Peter Rogan Aug 2017

Biochemistry Publications

Accurate digital image analysis of abnormal microscopic structures relies on high-quality images and on minimizing the rates of false positive (FP) and false negative objects in images. Cytogenetic biodosimetry detects dicentric chromosomes (DCs) that arise from exposure to ionizing radiation and determines the radiation dose received based on DC frequency. Improvements in automated DC recognition increase the accuracy of dose estimates by reclassifying FP DCs as monocentric chromosomes or chromosome fragments. We also present image segmentation methods to rank high-quality digital metaphase images and eliminate suboptimal metaphase cells. A set of chromosome morphology segmentation methods selectively filtered out FP DCs …
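Converting a DC frequency into a dose is commonly done with a linear-quadratic calibration curve, Y = c + αD + βD², where Y is dicentrics per scored cell and D is the dose in Gy; inverting it gives the dose for an observed frequency. The coefficients used with this sketch are hypothetical placeholders, not calibration values from the paper (real curves are fitted per laboratory), and the function names are illustrative.

```python
import math


def dicentric_yield(dicentrics: int, cells: int) -> float:
    """Observed dicentric frequency Y (dicentrics per scored metaphase)."""
    if cells <= 0:
        raise ValueError("at least one cell must be scored")
    return dicentrics / cells


def estimate_dose(y: float, c: float, alpha: float, beta: float) -> float:
    """Invert Y = c + alpha*D + beta*D**2 for the non-negative root D (Gy)."""
    if y <= c:
        return 0.0  # at or below background: no dose above background detected
    if beta == 0:
        return (y - c) / alpha  # purely linear calibration
    disc = alpha ** 2 + 4.0 * beta * (y - c)
    return (-alpha + math.sqrt(disc)) / (2.0 * beta)
```

For example, with hypothetical coefficients c = 0.001, α = 0.03, and β = 0.06, an observed 301 dicentrics in 1000 cells corresponds to about 2 Gy. Reclassifying FP DCs lowers Y and therefore the estimated dose.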


Identification Of Prognostic Genes And Gene Sets For Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods, Suyan Tian, Chi Wang, Howard H. Chang, Jianguo Sun Apr 2017

Biostatistics Faculty Publications

In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selection, a bi-level selection method can be classified into three categories: forward selection, which first selects relevant gene sets and then selects relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks simultaneously, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer …
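Simultaneous bi-level selection through a penalized regression can be illustrated with the proximal operator of the sparse group lasso, which combines an L1 penalty on individual genes with a group penalty on gene sets. Applied to one gene set's coefficients, element-wise soft thresholding zeroes weak genes, and a group-level shrinkage can then zero the whole set. The sparse group lasso is a representative example only, since the abstract does not name the penalized model used; the step size is assumed to be folded into λ1 and λ2, and the function name is hypothetical.

```python
import math


def sparse_group_prox(coefs: list, lam1: float, lam2: float) -> list:
    """Proximal step of the sparse group lasso penalty for one gene set."""
    # Gene level: element-wise soft thresholding.
    s = [math.copysign(max(abs(b) - lam1, 0.0), b) for b in coefs]
    norm = math.sqrt(sum(x * x for x in s))
    if norm == 0.0:
        return [0.0] * len(coefs)  # every gene in the set was dropped
    # Gene-set level: shrink the group's Euclidean norm by lam2.
    scale = max(0.0, 1.0 - lam2 / norm)
    return [scale * x for x in s]
```

With λ1 = λ2 = 1, the set `[3.0, 0.5]` is kept but only its first gene survives (`[1.0, 0.0]`), while `[1.5, 1.5]` with λ2 = 2 is dropped as a whole set.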