Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Full-Text Articles in Entire DC Network

Structure–Activity Relationship-Based Chemical Classification Of Highly Imbalanced Tox21 Datasets, Gabriel Idakwo, Sundar Thangapandian, Joseph Luttrell, Yan Li, Nan Wang, Zhaoxian Zhou, Huixiao Hong, Bei Yang, Chaoyang Zhang, Ping Gong Dec 2020

Faculty Publications

The specificity of toxicant-target biomolecule interactions contributes to the highly imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the …
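
The two resampling ideas named in this abstract can be illustrated with a minimal sketch. It is not the method of the paper, whose details are truncated here. The function names, the toy 2-D points, and the choice of random pairs for interpolation are all assumptions made for illustration; SMOTE-style methods typically interpolate between nearest neighbours instead.

```python
import random

def random_undersample(majority, n_keep, rng):
    """Keep a random subset of the majority class (the rest is discarded)."""
    return rng.sample(majority, n_keep)

def interpolate_oversample(minority, n_new, rng):
    """Create synthetic minority points on the segment between two real ones."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)   # two distinct real minority points
        t = rng.random()                 # position along the segment a -> b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Hypothetical toy data: 100 inactive (majority) and 4 active (minority) points.
rng = random.Random(0)
inactive = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
active = [(3.0, 3.0), (3.5, 2.5), (2.5, 3.5), (4.0, 4.0)]

kept_inactive = random_undersample(inactive, 20, rng)
synthetic_active = interpolate_oversample(active, 16, rng)
balanced_active = active + synthetic_active

print(len(kept_inactive), len(balanced_active))
```

Undersampling discards 80 of the 100 inactive points, which is the information loss the abstract mentions. The synthetic active points always lie between existing ones, so they can fall into majority-class regions when the minority class is scattered.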


ENTRNA: A Framework To Predict RNA Foldability, Congzhe Su, Jeffery D. Weir, Fei Zhang, Hao Yan, Teresa Wu Jul 2019

Faculty Publications

RNA molecules play many crucial roles in living systems. The spatial complexity that exists in RNA structures determines their cellular functions. Therefore, understanding RNA folding conformations, in particular, RNA secondary structures, is critical for elucidating biological functions. Existing literature has focused on RNA design as either an RNA structure prediction problem or an RNA inverse folding problem where free energy has played a key role.


In Silico Identification Of Genetic Mutations Conferring Resistance To Acetohydroxyacid Synthase Inhibitors: A Case Study Of Kochia Scoparia, Yan Li, Michael D. Netherland, Chaoyang Zhang, Huixiao Hong, Ping Gong May 2019

Faculty Publications

Mutations that confer herbicide resistance are a primary concern for herbicide-based chemical control of invasive plants and are often under-characterized structurally and functionally. As the outcome of selection pressure, resistance mutations usually result from repeated long-term applications of herbicides with the same mode of action and are discovered through extensive field trials. Here we used acetohydroxyacid synthase (AHAS) of Kochia scoparia (KsAHAS) as an example to demonstrate that, given the sequence of a target protein, the impact of genetic mutations on ligand binding could be evaluated and resistance mutations could be identified using a biophysics-based computational approach. Briefly, …


Predicting Protein Residue-Residue Contacts Using Random Forests And Deep Networks, Joseph Luttrell IV, Tong Liu, Chaoyang Zhang, Zheng Wang Mar 2019

Faculty Publications

Background: The ability to predict which pairs of amino acid residues in a protein are in contact with each other offers many advantages for various areas of research that focus on proteins. For example, contact prediction can be used to reduce the computational complexity of predicting the structure of proteins and even to help identify functionally important regions of proteins. These predictions are becoming especially important given the relatively low number of experimentally determined protein structures compared to the amount of available protein sequence data.

Results: Here we have developed and benchmarked a set of machine learning methods …


Similarities And Differences Between Variants Called With Human Reference Genome HG19 Or HG38, Bohu Pan, Rebecca Kusko, Wenming Xiao, Yuantin Zheng, Zhichao Liu, Chunlin Xiao, Sugunadevi Sakkiah, Wenjing Guo, Ping Gong, Chaoyang Zhang, Weigong Ge, Leming Shi, Weida Tong, Huixiao Hong Mar 2019

Faculty Publications

Background: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed.

Results: We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and …


Deep Learning Architectures For Multi-Label Classification Of Intelligent Health Risk Prediction, Andrew Maxwell, Runzhi Li, Bei Yang, Heng Weng, Aihua Ou, Huixiao Hong, Zhaoxian Zhou, Ping Gong, Chaoyang Zhang Dec 2017

Faculty Publications

No abstract provided.


Proceedings Of The 2014 MidSouth Computational Biology And Bioinformatics Society (MCBIOS) Conference, Jonathan D. Wren, Mikhail G. Dozmorov, Dennis Burian, Andy Perkins, Chaoyang Zhang, Peter Hoyt, Rakesh Kaundal Oct 2014

Faculty Publications

No abstract provided.


SMOQ: A Tool For Predicting The Absolute Residue-Specific Quality Of A Single Protein Model With Support Vector Machine, Renzhi Cao, Zheng Wang, Yiheng Wang, Jianlin Cheng Apr 2014

Faculty Publications

Background: It is important to predict the quality of a protein structural model before its native structure is known. Methods that can predict the absolute local quality of individual residues in a single protein model are rare, yet particularly needed for using, ranking and refining protein models.

Results: We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to …


A Course-Based Research Experience: How Benefits Change With Increased Investment In Instructional Time, Christopher D. Shaffer, Consuelo J. Alvarez, April E. Bednarski, David Dunbar, Anya L. Goodman, Catherine Reinke, Anne G. Rosenwald, Michael J. Wolyniak, Cheryl Bailey, Daron Barnard, Christopher Bazinet, Dale L. Beach, James E.J. Bedard, Satish Bhalla, John Braverman, Martin Burg, Vidya Chandrasekaran, Hui-Min Chung, Kari Clase, Randall J. Dejong, Justin R. Diangelo, Chunguang Du, Todd T. Eckdahl, Heather Eisler, Julia A. Emerson, Amy Frary, Donald Frohlich, Yuying Gosser, Shubha Govind, Adam Haberman, Amy T. Hark, Charles Hauser, Arlene Hoogewerf, Laura L.M. Hoopes, Carina E. Howell, Diana Johnson, Christopher J. Jones, Lisa Kadlec, Marian Kaehler, S. Catherine Silver Key, Adam Kleinschmit, Nighat P. Kokan, Olga Kopp, Gary Kuleck, Judith Leatherman, Jane Lopilato, Christy Mackinnon, Juan Carlos Martinez-Cruzado, Gerard Mcneil, Stephanie Mel, Hemlata Mistry, Alexis Nagengast, Paul Overvoorde, Don W. Paetkau, Susan Parrish, Celeste N. Peterson, Mary Preuss, Laura K. Reed, Dennis Revie, Srebrenka Robic, Jennifer Roecklein-Canfield, Michael R. Rubin, Kenneth Saville, Stephanie Schroeder, Karim Sharif, Mary Shaw, Gary Skuse, Christopher D. Smith, Mary A. Smith, Sheryl T. Smith, Eric Spana, Mary Spratt, Aparna Sreenivasan, Joyce Stamm, Paul Szauter, Jeffrey S. Thompson, Matthew Wawersik, James Youngblom, Leming Zhou, Elaine R. Mardis, Jeremy Buhler, Wilson Leung, David Lopatto, Sarah C.R. Elgin Jan 2014

Faculty Publications

There is widespread agreement that science, technology, engineering, and mathematics programs should provide undergraduates with research experience. Practical issues and limited resources, however, make this a challenge. We have developed a bioinformatics project that provides a course-based research experience for students at a diverse group of schools and offers the opportunity to tailor this experience to local curriculum and institution-specific student needs. We assessed both attitude and knowledge gains, looking for insights into how students respond given this wide range of curricular and institutional variables. While different approaches all appear to result in learning gains, we find that a significant …


Differential Reconstructed Gene Interaction Networks For Deriving Toxicity Threshold In Chemical Risk Assessment, Yi Yang, Andrew Maxwell, Xiaowei Zhang, Nan Wang, Edward J. Perkins, Chaoyang Zhang, Ping Gong Oct 2013

Faculty Publications

Background: Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting.

Methods: Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential …


SeqNLS: Nuclear Localization Signal Prediction Based On Frequent Pattern Mining And Linear Motif Scoring, J.-R. Lin, Jianjun Hu Jan 2013

Faculty Publications

Nuclear localization signals (NLSs) are stretches of residues in proteins that mediate their import into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose a sequential pattern mining algorithm SeqNLS to effectively identify potential NLS patterns without being constrained by the limitation of current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by the relatively local conservation (IRLC) based masking.

The experiment results on …


MTBindingSim: Simulate Protein Binding To Microtubules, Julia T. Philip, Charles H. Pence, Holly V. Goodson Jan 2012

Faculty Publications

Summary: Many protein–protein interactions are more complex than can be accounted for by 1:1 binding models. However, biochemists have few tools available to help them recognize and predict the behaviors of these more complicated systems, making it difficult to design experiments that distinguish between possible binding models. MTBindingSim provides researchers with an environment in which they can rapidly compare different models of binding for a given scenario. It is written specifically with microtubule polymers in mind, but many of its models apply equally well to any polymer or any protein–protein interaction. MTBindingSim can thus both help in training intuition about …


Minimalist Ensemble Algorithms For Genome-Wide Protein Localization Prediction, J.-R. Lin, A. M. Mondal, R. Liu, Jianjun Hu Jan 2012

Faculty Publications

Background

Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.

Results

This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature …


Transcriptomic Profiles Of Peripheral White Blood Cells In Type II Diabetes And Racial Differences In Expression Profiles, Jinghe Mao, Junmei Ai, Xinchun Zhou, Ming Shenwu, Manuel Ong Jr., Marketta Blue, Jasmine T. Washington, Xiaonan Wang, Youping Deng Dec 2011

Faculty Publications

Background: Along with obesity, physical inactivity, and family history of metabolic disorders, African American ethnicity is a risk factor for type 2 diabetes (T2D) in the United States. However, little is known about the differences in gene expression and transcriptomic profiles of blood in T2D between African Americans (AA) and Caucasians (CAU), and microarray analysis of peripheral white blood cells (WBCs) from these two ethnic groups will facilitate our understanding of the underlying molecular mechanism in T2D and identify genetic biomarkers responsible for the disparities.

Results: A whole human genome oligomicroarray of peripheral WBCs was performed on 144 …


RefNetBuilder: A Platform For Construction Of Integrated Reference Gene Regulatory Networks From Expressed Sequence Tags, Ying Li, Ping Gong, Edward J. Perkins, Chaoyang Zhang, Nan Wang Oct 2011

Faculty Publications

Background: Gene Regulatory Networks (GRNs) provide integrated views of gene interactions that control biological processes. Many public databases contain biological interactions extracted from experimentally validated literature reports, but most furnish only information for a few genetic model organisms. In order to provide a bioinformatic tool for researchers who work with non-model organisms, we developed RefNetBuilder, a new platform that allows construction of putative reference pathways or GRNs from expressed sequence tags (ESTs).

Results: RefNetBuilder was designed to have the flexibility to extract and archive pathway or GRN information from public databases such as the Kyoto Encyclopedia of Genes …


The Proteogenomic Mapping Tool, William S. Sanders, Nan Wang, Susan M. Bridges, Brandon M. Malone, Yoginder S. Dandass, Fiona M. McCarthy, Bindu Nanduri, Mark L. Lawrence, Shane C. Burgess Apr 2011

Faculty Publications

Background: High-throughput mass spectrometry (MS) proteomics data is increasingly being used to complement traditional structural genome annotation methods. To keep pace with the high speed of experimental data generation and to aid in structural genome annotation, experimentally observed peptides need to be mapped back to their source genome location quickly and exactly. Previously, the tools to do this have been limited to custom scripts designed by individual research groups to analyze their own data, are generally not widely available, and do not scale well with large eukaryotic genomes.

Results: The Proteogenomic Mapping Tool includes a Java implementation of …


Computational Prediction Of Heme-Binding Residues By Exploiting Residue Interaction Network, R. Liu, Jianjun Hu Jan 2011

Faculty Publications

Computational identification of heme-binding residues is beneficial for predicting and designing novel heme proteins. Here we proposed a novel method for heme-binding residue prediction by exploiting topological properties of these residues in the residue interaction networks derived from three-dimensional structures. Comprehensive analysis showed that key residues located in heme-binding regions are generally associated with the nodes with higher degree, closeness and betweenness, but lower clustering coefficient in the network. HemeNet, a support vector machine (SVM) based predictor, was developed to identify heme-binding residues by combining topological features with existing sequence and structural features. The results showed that incorporation of network-based …
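
The topological properties named in this abstract can be computed on a toy network, as in the minimal sketch below. The five-node graph and the node labels are hypothetical and are not taken from any protein structure. Betweenness is omitted to keep the example short, and the closeness function assumes a connected graph.

```python
from collections import deque

def degree(graph, node):
    """Number of direct neighbours of a node."""
    return len(graph[node])

def closeness(graph, node):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))

# Hypothetical undirected residue interaction network: "H" is a hub residue.
graph = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A"},
    "C": {"H"},
    "D": {"H"},
}

for node in ("H", "A"):
    print(node, degree(graph, node),
          round(closeness(graph, node), 3),
          round(clustering_coefficient(graph, node), 3))
```

In this toy graph the hub "H" has higher degree and closeness but a lower clustering coefficient than "A", which is the kind of pattern the abstract reports for residues in heme-binding regions.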


Prediction Of Discontinuous B-Cell Epitopes Using Logistic Regression And Structural Information, R. Liu, Jianjun Hu Jan 2011

Faculty Publications

Computational prediction of discontinuous B-cell epitopes remains challenging, but it is an important task in vaccine design. In this study, we developed a novel computational method to predict discontinuous epitope residues by combining the logistic regression model with two important structural features, B-factor and relative accessible surface area (RASA). We conducted five-fold cross-validation on a representative dataset composed of antigen structures bound with antibodies and independent testing on Epitome database, respectively. Experimental results indicate that besides the well-known RASA feature, B-factor can also be used to identify discontinuous epitopes. Furthermore, these two features are complementary and their combination can remarkably …


HemeBIND: A Novel Method For Heme Binding Residue Prediction By Combining Structural And Sequence Information, R. Liu, Jianjun Hu Jan 2011

Faculty Publications

Background

Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, an urgent need is to develop a computational method for recognizing these important residues.

Results

Here …


Quail Genomics: A Knowledgebase For Northern Bobwhite, Arun Rawat, Kurt A. Gust, Mohamed O. Elasri, Edward J. Perkins Oct 2010

Faculty Publications

Background

The Quail Genomics knowledgebase (http://www.quailgenomics.info) has been initiated to share and develop functional genomic data for Northern bobwhite (Colinus virginianus). This web-based platform has been designed to allow researchers to perform analysis and curate genomic information for this non-model species that has little supporting information in GenBank.

Description

A multi-tissue, normalized cDNA library generated for Northern bobwhite was sequenced using 454 Life Sciences next generation sequencing. The Quail Genomics knowledgebase represents the 478,142 raw ESTs generated from the sequencing effort in addition to assembled nucleotide and protein sequences including 21,980 unigenes annotated with meta-data. A …


Time Lagged Information Theoretic Approaches To The Reverse Engineering Of Gene Regulatory Networks, Vijender Chaitankar, Preetam Ghosh, Edward J. Perkins, Ping Gong, Youping Deng, Chaoyang Zhang Oct 2010

Faculty Publications

Background: A number of models and algorithms have been proposed in the past for gene regulatory network (GRN) inference; however, none of them address the effects of the size of time-series microarray expression data in terms of the number of time-points. In this paper, we study this problem by analyzing the behaviour of three algorithms based on information theory and dynamic Bayesian network (DBN) models. These algorithms were implemented on different sizes of data generated by synthetic networks. Experiments show that the inference accuracy of these algorithms reaches a saturation point after a specific data size brought about by …


Dynamics Of Protofibril Elongation And Association Involved In Aβ42 Peptide Aggregation In Alzheimer's Disease, Preetam Ghosh, Amit Kumar, Bhaswati Datta, Vijayaraghavan Rangachari Oct 2010

Faculty Publications

Background: The aggregates of a protein called ‘Aβ’, found in the brains of Alzheimer’s patients, are strongly believed to be the cause of neuronal death and cognitive decline. Among the different forms of Aβ aggregates, smaller aggregates called ‘soluble oligomers’ are increasingly believed to be the primary neurotoxic species responsible for early synaptic dysfunction. Since it is well known that Aβ aggregation is a nucleation-dependent process, it is widely believed that the toxic oligomers are intermediates to fibril formation, or what we call the ‘on-pathway’ products. Modeling of Aβ aggregation has been under intense investigation during the last …


Incorporating Genomics And Bioinformatics Across The Life Sciences Curriculum, Jayna L. Ditty, Christopher A. Kvaal, Brad Goodner, Sharyn K. Freyermuth, Cheryl Bailey, Robert A. Britton, Stuart G. Gordon, Sabine Heinhorst, Kelyenne Reed, Zhaohui Xu, Erin R. Sanders-Lorenz, Seth Axen, Edwin Kim, Mitrick Johns, Kathleen Scott, Cheryl A. Kerfeld Aug 2010

Faculty Publications

No abstract provided.


BayesMotif: De Novo Protein Sorting Motif Discovery From Impure Datasets, Jianjun Hu, F. Zhang Jan 2010

Faculty Publications

Background

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences usually located at the N-termini or C-termini of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms.

Methods

We formulated the protein sorting motif discovery problem as a classification problem …


Feature Selection And Classification Of MAQC-II Breast Cancer And Multiple Myeloma Microarray Gene Expression Data, Qingzhong Liu, Andrew H. Sung, Zhongxue Chen, Jianzhong Liu, Xudong Huang, Youping Deng Dec 2009

Faculty Publications

Microarray data has a high dimension of variables but available datasets usually have only a small number of samples, thereby making the study of such datasets interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort …


Subcellular Localization Of Marine Bacterial Alkaline Phosphatases, H. Luo, Ronald Benner, R. A. Long, Jianjun Hu Jan 2009

Faculty Publications

Bacterial alkaline phosphatases (APases) are important enzymes in organophosphate utilization in the ocean. The subcellular localization of APases has significant ecological implications for marine biota but is largely unknown. The extensive metagenomic sequence databases from the Global Ocean Sampling Expedition provide an opportunity to address this question. A bioinformatics pipeline was developed to identify marine bacterial APases from the metagenomic databases, and a consensus classification algorithm was designed to predict their subcellular localizations. We identified 3,733 bacterial APase sequences (including PhoA, PhoD, and PhoX) and found that cytoplasmic (41%) and extracellular (30%) APases exceed their periplasmic (17%), outer membrane (12%), …


Integrative Disease Classification Based On Cross-Platform Microarray Data, C.-C. Liu, Jianjun Hu, M. Kalakrishnan, H. Huang, X. J. Zhou Jan 2009

Faculty Publications

Background

Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification.

Results

In this study, we tested the feasibility of disease classification by integrating the large amount of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank …


Novel Implementation Of Conditional Co-Regulation By Graph Theory To Derive Co-Expressed Genes From Microarray Data, Arun Rawat, Georg J. Seifert, Youping Deng Aug 2008

Faculty Publications

Background

Most existing transcriptional databases like Comprehensive Systems-Biology Database (CSB.DB) and Arabidopsis Microarray Database and Analysis Toolbox (GENEVESTIGATOR) help to seek a shared biological role (similar pathways and biosynthetic cycles) based on correlation. These utilize conventional methods like Pearson correlation and Spearman rank correlation to calculate correlation among genes. However, not all genes are expressed under all conditions, which leads to their exclusion from these transcriptional databases that consist of experiments performed in varied conditions. This leads to incomplete studies of co-regulation among groups of genes that might be linked to the same or related biosynthetic pathway.
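
The two conventional measures named in this Background, Pearson correlation and Spearman rank correlation, can be sketched as follows. This illustrates only those standard measures, not the graph-theoretic method the paper proposes. The expression values for `gene_a` and `gene_b` are made up, and the functions assume that neither profile is constant.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression profiles of two genes across five conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]

print(round(pearson(gene_a, gene_b), 4), round(spearman(gene_a, gene_b), 4))
```

The two profiles rise together but not linearly, so the Spearman value is 1 while the Pearson value is slightly lower. Both measures require a value for every condition, which is why genes that are not expressed under some conditions are hard to include.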

Results …


Cloning, Analysis And Functional Annotation Of Expressed Sequence Tags From The Earthworm Eisenia Fetida, Mehdi Pirooznia, Ping Gong, Xin Guan, Laura S. Inouye, Kuan Yang, Edward J. Perkins, Youping Deng Nov 2007

Faculty Publications

Background

Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR.

Results

A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone …


Comparison Of Probabilistic Boolean Network And Dynamic Bayesian Network Approaches For Inferring Gene Regulatory Networks, Peng Li, Chaoyang Zhang, Edward J. Perkins, Ping Gong, Youping Deng Nov 2007

Faculty Publications

Background: The regulation of gene expression is achieved through gene regulatory networks (GRNs) in which collections of genes interact with one another and other substances in a cell. In order to understand the underlying function of organisms, it is necessary to study the behavior of genes in a gene regulatory network context. Several computational approaches are available for modeling gene regulatory networks with different datasets. In order to optimize modeling of GRN, these approaches must be compared and evaluated in terms of accuracy and efficiency.

Results: In this paper, two important computational approaches for modeling gene regulatory networks, …