Genetics and Genomics | Open Access Articles

An Approach To Developing Benchmark Datasets For Protein Secondary Structure Segmentation From Cryo-EM Density Maps, Thu Nguyen, Yongcheng Mu, Jiangwen Sun, Jing He

Computer Science Faculty Publications

A growing number of deep learning approaches have been proposed to segment secondary structures from cryo-electron density maps in the medium resolution range (5–10 Å). Although these approaches show great potential, only a few small experimental data sets have been used to test them. There is limited understanding of the factors in the data that affect segmentation performance. We propose an approach to generate data sets with desired specifications for three potential factors: protein sequence identity, structural content, and data quality. The approach was implemented and has generated a test set and various training sets to study …

Pathway‐Extended Gene Expression Signatures Integrate Novel Biomarkers That Improve Predictions Of Patient Responses To Kinase Inhibitors, Ashis Bagchee‐Clark, Eliseos J. Mucaki, Tyson Whitehead, Peter Rogan

Biochemistry Publications

Cancer chemotherapy responses have been related to multiple pharmacogenetic biomarkers, often for the same drug. This study utilizes machine learning to derive multi‐gene expression signatures that predict individual patient responses to specific tyrosine kinase inhibitors, including erlotinib, gefitinib, sorafenib, sunitinib, lapatinib and imatinib. Support vector machine (SVM) learning was used to train mathematical models that distinguished sensitivity from resistance to these drugs using a novel systems biology‐based approach. This began with expression of genes previously implicated in specific drug responses, then expanded to evaluate genes whose products were related through biochemical pathways and interactions. Optimal pathway‐extended SVMs predicted responses in …

Pathway-Extended Gene Expression Signatures Integrate Novel Biomarkers That Improve Predictions Of Patient Responses To Kinase Inhibitors, Ashis Jem Bagchee-Clark, Eliseos J. Mucaki, Tyson Whitehead, Peter Rogan

Biochemistry Publications

No abstract provided.

Machine Learning Approaches For Fracture Risk Assessment: A Comparative Analysis Of Genomic And Phenotypic Data In 5130 Older Men, Qing Wu, Fatma Nasoz, Jongyun Jung, Bibek Bhattarai, Mira V. Han

Public Health Faculty Publications

The aims of this study were to develop fracture prediction models using machine learning approaches and genomic data, and to identify the best modeling approach for fracture prediction. Genomic data from the Osteoporotic Fractures in Men cohort study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated single nucleotide polymorphisms. Data were normalized and split into a training set (80%) and a validation set (20%) for analysis. Random forest, gradient boosting, neural network, and logistic regression were used to develop prediction models for major osteoporotic fractures …
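
The two data-preparation steps named in this abstract, a per-participant genetic risk score and an 80/20 split, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes the common weighted-sum form of a GRS (risk-allele dosage times a per-SNP weight), and the function names, toy dosages, weights, and random seed are all hypothetical.

```python
import random


def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2) for one participant."""
    if len(dosages) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle participant indices, then cut them into training and validation sets."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_train = round(train_fraction * n_samples)
    return order[:n_train], order[n_train:]


# Toy data (made up): 5 participants x 4 SNPs, with hypothetical per-SNP weights.
cohort = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 2, 2, 1],
]
weights = [0.10, 0.25, 0.05, 0.40]

scores = [genetic_risk_score(row, weights) for row in cohort]
train_idx, valid_idx = split_indices(len(cohort))
print("GRS per participant:", scores)
print("training indices:", train_idx, "validation indices:", valid_idx)
```

In practice the weights would typically come from published association effect sizes, and the split would be applied to the full feature table rather than to indices alone.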

Machine Learning Prediction Of Glioblastoma Patient One-Year Survival, Andrew Du '20, Warren Mcgee, Jane Y. Wu

Student Publications & Research

Glioblastoma (GBM) is a grade IV astrocytoma formed primarily from cancerous astrocytes and sustained by intense angiogenesis. GBM often causes non-specific symptoms, creating difficulty for diagnosis. This study aimed to utilize machine learning techniques to provide an accurate one-year survival prognosis for GBM patients using clinical and genomic data from the Chinese Glioma Genome Atlas. Logistic regression (LR), support vector machines (SVM), random forest (RF), and ensemble models were used to identify and select predictors for GBM survival and to classify patients into those with an overall survival (OS) of less than one year and one year or greater. With …

Data And Statistical Methods To Analyze The Human Microbiome, Levi Waldron

Publications and Research

The Waldron lab for computational biostatistics bridges the areas of cancer genomics and microbiome studies for public health, developing methods to exploit publicly available data resources and to integrate -omics studies.

Understanding Huntington's Disease Using Machine Learning Approaches, Sonali Lokhande

KGI Theses and Dissertations

Huntington’s disease (HD) is a debilitating neurodegenerative disorder with a complex pathophysiology. Despite extensive study of the disease, the sequence of events through which mutant Huntingtin (mHtt) protein executes its action remains elusive. The phenotype of HD is an outcome of numerous processes initiated by the mHtt protein along with other proteins that act as either suppressors or enhancers of the effects of mHtt protein and PolyQ aggregates. Utilizing an integrative systems biology approach, I construct and analyze a Huntington’s disease integrome using human orthologs of protein interactors of wild-type and mHtt protein. Analysis of this integrome …

Pattern Discovery In Brain Imaging Genetics Via SCCA Modeling With A Generic Non-Convex Penalty, Lei Du, Kefei Liu, Xiaohui Yao, Jingwen Yan, Shannon L. Risacher, Junwei Han, Lei Guo, Andrew J. Saykin, Li Shen, Michael W. Weiner, Paul Aisen, Ronald Petersen, Clifford R. Jack, William Jagust, John Q. Trojanowski, Arthur W. Toga, Laurel Beckett, Robert C. Green, John Morris, Leslie M. Shaw, Zaven Khachaturian, Greg Sorensen, Maria Carrillo, Lew Kuller, Marc Raichle, Steven Paul, Peter Davies, Howard Fillit, Franz Hefti, David Holtzman, Charles D. Smith, Gregory Jicha, Peter A. Hardy, Partha Sinha, Elizabeth Oates, Gary Conrad

Neurology Faculty Publications

Brain imaging genetics aims to uncover associations between genetic markers and neuroimaging quantitative traits. Sparse canonical correlation analysis (SCCA) can discover bi-multivariate associations and select relevant features, and is becoming popular in imaging genetic studies. The ℓ₁-norm function is not only convex, but also singular at the origin, which is a necessary condition for sparsity. Thus most SCCA methods impose the ℓ₁-norm on individual features or on the structure level of features to pursue the corresponding sparsity. However, the ℓ₁-norm penalty over-penalizes large coefficients and may incur estimation bias. A number of non-convex penalties have been proposed to reduce …
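
The over-penalization mentioned in this abstract can be seen in the scalar soft-thresholding operator, which is the proximal map of the ℓ₁ penalty: it zeroes small coefficients but also shrinks every surviving coefficient by the full threshold. The sketch below contrasts it with hard thresholding, included only as a simple point of comparison and not as the penalty family the paper studies. The function names and the threshold value are illustrative assumptions.

```python
def soft_threshold(x, lam):
    """Proximal map of lam * |x|: shrink toward zero by lam, clip small values to 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def hard_threshold(x, lam):
    """Keep x unchanged when |x| exceeds lam, otherwise set it to 0."""
    return x if abs(x) > lam else 0.0


lam = 1.0  # illustrative threshold
for coef in [0.5, 2.0, 5.0, -3.0]:
    print(coef, soft_threshold(coef, lam), hard_threshold(coef, lam))
```

A coefficient of 5.0 comes back as 4.0 under soft thresholding, a constant shrinkage of `lam` that does not vanish as the coefficient grows. This is the estimation bias that motivates penalties which taper off for large values.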

Accurate Cytogenetic Biodosimetry Through Automated Dicentric Chromosome Curation And Metaphase Cell Selection, Jin Liu, Yanxin Li, Ruth Wilkins, Canadian Nuclear Laboratories, Joan H. Knoll, Peter Rogan

Biochemistry Publications

Accurate digital image analysis of abnormal microscopic structures relies on high quality images and on minimizing the rates of false positive (FP) and false negative objects in images. Cytogenetic biodosimetry detects dicentric chromosomes (DCs) that arise from exposure to ionizing radiation, and determines the radiation dose received based on DC frequency. Improvements in automated DC recognition increase the accuracy of dose estimates by reclassifying FP DCs as monocentric chromosomes or chromosome fragments. We also present image segmentation methods to rank high quality digital metaphase images and eliminate suboptimal metaphase cells. A set of chromosome morphology segmentation methods selectively filtered out FP DCs …

Identification Of Prognostic Genes And Gene Sets For Early-Stage Non-Small Cell Lung Cancer Using Bi-Level Selection Methods, Suyan Tian, Chi Wang, Howard H. Chang, Jianguo Sun

Biostatistics Faculty Publications

In contrast to feature selection and gene set analysis, bi-level selection is a process of selecting not only important gene sets but also important genes within those gene sets. Depending on the order of selection, a bi-level selection method can be classified into one of three categories: forward selection, which first selects relevant gene sets and then selects relevant individual genes; backward selection, which takes the reverse order; and simultaneous selection, which performs the two tasks at once, usually with the aid of a penalized regression model. To test the existence of subtype-specific prognostic genes for non-small cell lung cancer …

Detecting Gene-Gene Interactions Using A Permutation-Based Random Forest Method, Jing Li, James D. Malley, Angeline S. Andrew, Margaret R. Karagas, Jason H. Moore

Dartmouth Scholarship

Identifying gene-gene interactions is essential to understanding disease susceptibility and to detecting the genetic architectures underlying complex diseases. Here, we aimed to develop a permutation-based methodology that relies on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions.

Cross-Platform Normalization Of Microarray And RNA-Seq Data For Machine Learning Applications, Jeffrey A. Thompson, Jie Tan, Casey S. Greene

Dartmouth Scholarship

Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log …
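
Of the methods this abstract lists, quantile normalization is the simplest to illustrate. The sketch below is a generic textbook version, not the authors' TDM method and not their evaluation code: each sample's values are replaced by the mean of the values holding the same rank across all samples. The function name and the toy expression values are assumptions, and ties are broken by position rather than averaged.

```python
def quantile_normalize(samples):
    """Force every sample (a list of per-gene values) onto one shared distribution."""
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Reference distribution: mean across samples of the values at each rank.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data (made up): 3 samples x 4 genes.
expression = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(expression)
for row in normalized:
    print(row)
```

After normalization every sample contains the same set of values, so only the within-sample ranking of genes differs from one sample to the next.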
