Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Humans

Articles 1 - 30 of 64

Full-Text Articles in Bioinformatics

Radiation Exposure Determination In A Secure, Cloud-Based Online Environment, Ben C. Shirley, Eliseos J. Mucaki, Peter Rogan Oct 2022

Biochemistry Publications

Rapid sample processing and interpretation of estimated exposures will be critical for triaging exposed individuals after a major radiation incident. The dicentric chromosome (DC) assay assesses absorbed radiation using metaphase cells from blood. The Automated Dicentric Chromosome Identifier and Dose Estimator System (ADCI) identifies DCs and determines radiation doses. This study aimed to broaden accessibility and speed of this system, while protecting data and software integrity. ADCI Online is a secure web-streaming platform accessible worldwide from local servers. Cloud-based systems containing data and software are separated until they are linked for radiation exposure estimation. Dose estimates are identical to ADCI …


Measuring And Controlling Medical Record Abstraction (MRA) Error Rates In An Observational Study., Maryam Y Garza, Tremaine Williams, Sahiti Myneni, Susan H Fenton, Songthip Ounpraseuth, Zhuopei Hu, Jeannette Lee, Jessica Snowden, Meredith N Zozus, Anita C Walden, Alan E Simon, Barbara Mcclaskey, Sarah G Sanders, Sandra S Beauman, Sara R Ford, Lacy Malloch, Amy Wilson, Lori A Devlin, Leslie W Young Aug 2022

Journal Articles

BACKGROUND: Studies have shown that data collection by medical record abstraction (MRA) is a significant source of error in clinical research studies relying on secondary use data. Yet, the quality of data collected using MRA is seldom assessed. We employed a novel, theory-based framework for data quality assurance and quality control of MRA. The objective of this work is to determine the potential impact of formalized MRA training and continuous quality control (QC) processes on data quality over time.

METHODS: We conducted a retrospective analysis of QC data collected during a cross-sectional medical record review of mother-infant dyads with Neonatal …


Radiation Exposure Determination In A Secure, Cloudbased Online Environment, Ben C. Shirley, Eliseos J. Mucaki, Joan H.M. Knoll, Peter Rogan Jan 2022

Biochemistry Publications

Rapid sample processing and interpretation of estimated exposures will be critical for triaging exposed individuals after a major radiation incident. The dicentric chromosome (DC) assay assesses absorbed radiation using metaphase cells from blood. The Automated Dicentric Chromosome Identifier and Dose Estimator System (ADCI) identifies DCs and determines radiation doses. This study aimed to broaden accessibility and speed of this system, while protecting data and software integrity. ADCI Online is a secure web-streaming platform accessible worldwide from local servers. Cloud-based systems containing data and software are separated until they are linked for radiation exposure estimation. Dose estimates are identical to ADCI …


Comprehensive Characterization Of COVID-19 Patients With Repeatedly Positive SARS-CoV-2 Tests Using A Large U.S. Electronic Health Record Database., Xiao Dong, Yujia Zhou, Xiao-Ou Shu, Elmer V Bernstam, Rebecca Stern, David M Aronoff, Hua Xu, Loren Lipworth Sep 2021

Journal Articles

In the absence of genome sequencing, two positive molecular tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) separated by negative tests, prolonged time, and symptom resolution remain the best surrogate measure of possible reinfection. Using a large electronic health record database, we characterized clinical and testing data for 23 patients with repeatedly positive SARS-CoV-2 PCR test results ≥60 days apart, separated by ≥2 consecutive negative test results. The prevalence of chronic medical conditions, symptoms, and severe outcomes related to coronavirus disease 2019 (COVID-19) illness were ascertained. The median age of patients was 64.5 years, 40% were Black, and 39% …
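The inclusion criterion quoted in this abstract (two positive PCR results at least 60 days apart, separated by at least 2 consecutive negative results) can be illustrated with a short sketch. This is not the authors' code: the function name, the input layout, and the decision to allow other positives between the two qualifying ones are all assumptions made for illustration.

```python
from datetime import date

def has_repeat_positive(tests, min_gap_days=60, min_negatives=2):
    """Return True if any two positive tests are at least `min_gap_days`
    apart and a run of at least `min_negatives` consecutive negative tests
    falls between them.

    `tests` is a list of (date, is_positive) tuples (a hypothetical layout).
    Assumption: extra positives between the two qualifying ones are allowed.
    """
    ordered = sorted(tests, key=lambda t: t[0])
    for i, (first_day, first_pos) in enumerate(ordered):
        if not first_pos:
            continue
        run = 0        # current run of consecutive negatives
        best_run = 0   # longest run seen since the first positive
        for later_day, later_pos in ordered[i + 1:]:
            if later_pos:
                gap = (later_day - first_day).days
                if best_run >= min_negatives and gap >= min_gap_days:
                    return True
                run = 0
            else:
                run += 1
                best_run = max(best_run, run)
    return False

# Hypothetical patient: positive, two negatives, positive again 92 days later.
example = [
    (date(2020, 3, 1), True),
    (date(2020, 3, 20), False),
    (date(2020, 3, 25), False),
    (date(2020, 6, 1), True),
]
print(has_repeat_positive(example))
```

The thresholds are exposed as parameters so the same check could be rerun under a stricter or looser definition.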


Digital Technology Needs In Maternal Mental Health: A Qualitative Inquiry., Alexandra Zingg, Laura Carter, Deevakar Rogith, Amy Franklin, Sudhakar Selvaraj, Jerrie Refuerzo, Sahiti Myneni May 2021

Journal Articles

Digital technologies offer many opportunities to improve mental healthcare management for women seeking pre- and postnatal care. They provide a discreet, practical medium that is well-suited for the sensitive nature of mental health. Women who are more prone to experiencing peripartum depression (PPD), such as those of low-socioeconomic background or in high-risk pregnancies, can benefit the most from such technologies. However, current digital interventions directed towards this population provide suboptimal support, and their responsiveness to end user needs is quite limited. Our objective is to understand the digital terrain of information needs for low-socioeconomic status women with high-risk pregnancies, specifically within …


Deciphering Hierarchical Organization Of Topologically Associated Domains Through Change-Point Testing., Haipeng Xing, Yingru Wu, Michael Q Zhang, Yong Chen Apr 2021

Faculty Scholarship for the College of Science & Mathematics

BACKGROUND: The nucleus of eukaryotic cells spatially packages chromosomes into a hierarchical and distinct segregation that plays critical roles in maintaining transcription regulation. High-throughput methods of chromosome conformation capture, such as Hi-C, have revealed topologically associating domains (TADs) that are defined by biased chromatin interactions within them.

RESULTS: We introduce a novel method, HiCKey, to decipher hierarchical TAD structures in Hi-C data and compare them across samples. We first derive a generalized likelihood-ratio (GLR) test for detecting change-points in an interaction matrix that follows a negative binomial distribution or general mixture distribution. We then employ several optimal search strategies to …


Generalized And Transferable Patient Language Representation For Phenotyping With Limited Data., Yuqi Si, Elmer V Bernstam, Kirk Roberts Apr 2021

Journal Articles

The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 …


Representation Of EHR Data For Predictive Modeling: A Comparison Between UMLS And Other Terminologies., Laila Rasmy, Firat Tiryaki, Yujia Zhou, Yang Xiang, Cui Tao, Hua Xu, Degui Zhi Oct 2020

Journal Articles

OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.

MATERIALS AND METHODS: We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, …


COVID-19 TestNorm: A Tool To Normalize COVID-19 Testing Names To LOINC Codes., Xiao Dong, Jianfu Li, Ekin Soysal, Jiang Bian, Scott L Duvall, Elizabeth Hanchrow, Hongfang Liu, Kristine E Lynch, Michael Matheny, Karthik Natarajan, Lucila Ohno-Machado, Serguei Pakhomov, Ruth Madeleine Reeves, Amy M Sitapati, Swapna Abhyankar, Theresa Cullen, Jami Deckard, Xiaoqian Jiang, Robert Murphy, Hua Xu Jul 2020

Journal Articles

Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on coronavirus disease 2019 (COVID-19). Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC (Logical Observation Identifiers Names and Codes) …


Deep Learning In Clinical Natural Language Processing: A Methodical Review., Stephen Wu, Kirk Roberts, Surabhi Datta, Jingcheng Du, Zongcheng Ji, Yuqi Si, Sarvesh Soni, Qiong Wang, Qiang Wei, Yang Xiang, Bo Zhao, Hua Xu Mar 2020

Journal Articles

OBJECTIVE: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research.

MATERIALS AND METHODS: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers.

RESULTS: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) …


Digilego For Peripartum Depression: A Novel Patient-Facing Digital Health Instantiation, J Rodin, C Timko, S Harris Jan 2020

Journal Articles

Digital health technologies offer unique opportunities to improve health outcomes for mental health conditions such as peripartum depression (PPD), a disorder that affects approximately 10-15% of women in the U.S. every year. In this paper, we present the adaptation of a digital technology development framework, Digilego, in the context of PPD. Methods include mapping of the Behavior Intervention Technology (BIT) model and the Patient Engagement Framework (PEF) to translate patient needs captured through focus groups. This informs formative development and implementation of digital health features for optimal patient engagement in PPD screening and management. Results show an array of PPD-specific Digilego …


Enhancing Clinical Concept Extraction With Contextual Embeddings., Yuqi Si, Jingqi Wang, Hua Xu, Kirk Roberts Nov 2019

Journal Articles

OBJECTIVE: Neural network-based representations ("embeddings") have dramatically advanced natural language processing (NLP) tasks, including clinical NLP tasks such as concept extraction. Recently, however, more advanced embedding methods and representations (eg, ELMo, BERT) have further pushed the state of the art in NLP, yet there are no common best practices for how to integrate these representations into clinical tasks. The purpose of this study, then, is to explore the space of possible options in utilizing these new models for clinical concept extraction, including comparing these to traditional word embedding methods (word2vec, GloVe, fastText).

MATERIALS AND METHODS: Both off-the-shelf, open-domain embeddings and …


Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru Oct 2019

Kentucky Injury Prevention and Research Center Faculty Publications

BACKGROUND: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.

METHODS: Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created …
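The feature-construction step described here (tokenizing free text, then building word, bigram, and trigram features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the tokenizer rule, the function names, and the sample text are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens.
    (An assumed tokenizer; the study's exact rules are not given here.)"""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngram_features(text, max_n=3):
    """Count word, bigram, and trigram features (n = 1..max_n) in `text`."""
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            features[" ".join(tokens[start:start + n])] += 1
    return features

# Hypothetical free-text cause-of-death field, invented for illustration.
for feature, count in sorted(ngram_features("Acute fentanyl toxicity").items()):
    print(feature, count)
```

Counts like these would typically be assembled into a document-by-feature matrix before a classifier is trained on them.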


Advances In Gene Ontology Utilization Improve Statistical Power Of Annotation Enrichment, Eugene Waverly Hinderer III, Robert M. Flight, Rashmi Dubey, James N. Macleod, Hunter N. B. Moseley Aug 2019

Maxwell H. Gluck Equine Research Center Faculty Publications

Gene-annotation enrichment is a common method for utilizing ontology-based annotations in gene and gene-product centric knowledgebases. Effective utilization of these annotations requires inferring semantic linkages by tracing paths through edges in the ontological graph, referred to as relations. However, some relations are semantically problematic with respect to scope, necessitating their omission or modification lest erroneous term mappings occur. To address these issues, we created the Gene Ontology Categorization Suite, or GOcats—a novel tool that organizes the Gene Ontology into subgraphs representing user-defined concepts, while ensuring that all appropriate relations are congruent with respect to scoping semantics. Here, we demonstrate the …


Effects Of A Community Population Health Initiative On Blood Pressure Control In Latinos., James R Langabeer, Timothy D Henry, Carlos Perez Aldana, Larissa Deluna, Nora Silva, Tiffany Champagne-Langabeer Nov 2018

Journal Articles

Background: Hypertension remains one of the most important, modifiable cardiovascular risk factors. Yet, the largest minority ethnic group (Hispanics/Latinos) often have different health outcomes and behavior, making hypertension management more difficult. We explored the effects of an American Heart Association-sponsored population health intervention aimed at modifying behavior of Latinos living in Texas. Methods and Results: We enrolled 8071 patients, and 5714 (65.7%) completed the 90-day program (58.5 years ±11.7; 59% female) from July 2016 to June 2018. Navigators identified patients with risk factors; initial and final blood pressure (BP) readings were performed in the physician's office; and interim …


Auditing SNOMED CT Hierarchical Relations Based On Lexical Features Of Concepts In Non-Lattice Subgraphs, Licong Cui, Olivier Bodenreider, Jay Shi, Guo-Qiang Zhang Feb 2018

Computer Science Faculty Publications

Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

Methods—Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor …


A Frame-Based NLP System For Cancer-Related Information Extraction., Yuqi Si, Kirk Roberts Jan 2018

Journal Articles

We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We utilize a deep learning-based approach, bidirectional Long Short-term Memory (LSTM) Conditional Random Field (CRF), which uses both character and word embeddings. The system consists of two constituent sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The classifier achieves an F


Ordinal Convolutional Neural Networks For Predicting RDoC Positive Valence Psychiatric Symptom Severity Scores, Anthony Rios, Ramakanth Kavuluru Nov 2017

Computer Science Faculty Publications

Background—The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task.

Objective—Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are …


Predicting Mental Conditions Based On "History Of Present Illness" In Psychiatric Notes With Deep Neural Networks, Tung Tran, Ramakanth Kavuluru Nov 2017

Computer Science Faculty Publications

Background—Applications of natural language processing to mental health notes are not common given the sensitive nature of the associated narratives. The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) changed this scenario by providing the first set of neuropsychiatric notes to participants. This study summarizes our efforts and results in proposing a novel data use case for this dataset as part of the third track in this shared task.

Objective—We explore the feasibility and effectiveness of predicting a set of common mental conditions a patient has based on the short textual description of patient’s history …


Cross-Talk Between Clinical And Host-Response Parameters Of Periodontitis In Smokers, Radha Nagarajan, Craig S. Miller, Dolph R. Dawson III, Mohanad Al-Sabbagh, Jeffrey L. Ebersole Jun 2017

Institute for Biomedical Informatics Faculty Publications

Background and Objective

Periodontal diseases are a major public health concern leading to tooth loss and have also been shown to be associated with several chronic systemic diseases. Smoking is a major risk factor for the development of numerous systemic diseases, as well as periodontitis. While it is clear that smokers have a significantly enhanced risk for developing periodontitis leading to tooth loss, the population varies regarding susceptibility to disease associated with smoking. This investigation focused on identifying differences in four broad sets of variables, consisting of: (i) host‐response molecules; (ii) periodontal clinical parameters; (iii) antibody responses to periodontal pathogens …


Perspectives And Expectations In Structural Bioinformatics Of Metalloproteins, Sen Yao, Robert M. Flight, Eric C. Rouchka, Hunter N. B. Moseley May 2017

Molecular and Cellular Biochemistry Faculty Publications

Recent papers highlight the presence of large numbers of compressed angles in metal ion coordination geometries for metalloprotein entries in the worldwide Protein Data Bank, due mainly to multidentate coordination. The prevalence of these compressed angles has raised the controversial idea that significantly populated aberrant or even novel coordination geometries may exist. Some of these papers have undergone severe criticism, apparently due to views held that only canonical coordination geometries exist in significant numbers. While criticism of controversial ideas is warranted and to be expected, we believe that a line was crossed where unfair criticism was put forth to discredit …


Discovery And Validation Of Information Theory-Based Transcription Factor And Cofactor Binding Site Motifs., Ruipeng Lu, Eliseos J Mucaki, Peter K Rogan Mar 2017

Biochemistry Publications

Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, …


Integrated Biomarker Profiling Of Smokers With Periodontitis, Radhakrishnan Nagarajan, Mohanad Al-Sabbagh, Dolph Dawson III, Jeffrey L. Ebersole Mar 2017

Institute for Biomedical Informatics Faculty Publications

Background

In the context of precision medicine, understanding patient‐specific variation is an important step in developing targeted and patient‐tailored treatment regimens for periodontitis. While several studies have successfully demonstrated the usefulness of molecular expression profiling in conjunction with single classifier systems in discerning distinct disease groups, the majority of these studies do not provide sufficient insights into potential variations within the disease groups.

Aim

The goal of this study was to discern biological response profiles of periodontitis and non‐periodontitis smoking subjects using an informed panel of biomarkers across multiple scales (salivary, oral microbiome, pathogens and other markers).

Material & Methods …


Predicting Disease-Related Genes Using Integrated Biomedical Networks, Jiajie Peng, Kun Bai, Xuequn Shang, Guohua Wang, Hansheng Xue, Shuilin Jin, Liang Cheng, Yadong Wang, Jin Chen Jan 2017

Institute for Biomedical Informatics Faculty Publications

Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery.

Results: We propose a new network-based disease …


Fizzy: Feature Subset Selection For Metagenomics., Gregory Ditzler, J Calvin Morrison, Yemin Lan, Gail L Rosen Nov 2015

Henry M. Rowan College of Engineering Faculty Scholarship

BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the …


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


Integrated Assessment Of Predicted MHC Binding And Cross-Conservation With Self Reveals Patterns Of Viral Camouflage, Lu He, Anne S. De Groot, Andres H. Gutierrez, William D. Martin, Lenny Moise, Chris Bailey-Kellogg Mar 2014

Dartmouth Scholarship

Immune recognition of foreign proteins by T cells hinges on the formation of a ternary complex sandwiching a constituent peptide of the protein between a major histocompatibility complex (MHC) molecule and a T cell receptor (TCR). Viruses have evolved means of "camouflaging" themselves, avoiding immune recognition by reducing the MHC and/or TCR binding of their constituent peptides. Computer-driven T cell epitope mapping tools have been used to evaluate the degree to which particular viruses have used this means of avoiding immune response, but most such analyses focus on MHC-facing "agretopes". Here we set out a new means of evaluating the …


Algorithms For Optimizing Cross-Overs In DNA Shuffling, Lu He, Alan M. Friedman, Chris Bailey-Kellogg Aug 2012

Dartmouth Scholarship

DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library.

Results: This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. …


DNA Methylation Arrays As Surrogate Measures Of Cell Mixture Distribution, Eugene Houseman, William P. Accomando, Devin C. Koestler, Brock C. Christensen, Carmen J. Marsit May 2012

Dartmouth Scholarship

There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology of most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls.


Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg Nov 2011

Dartmouth Scholarship

Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.