Open Access. Powered by Scholars. Published by Universities.®

Molecular Biology Commons

Computer Sciences

Articles 1 - 27 of 27

Full-Text Articles in Molecular Biology

AweGNN: Auto-Parametrized Weighted Element-Specific Graph Neural Networks For Molecules, Timothy Szocinski, Duc Duy Nguyen, Guo-Wei Wei Jul 2021

Mathematics Faculty Publications

While automated feature extraction has had tremendous success in many deep learning algorithms for image analysis and natural language processing, it does not work well for data involving complex internal structures, such as molecules. Data representations via advanced mathematics, including algebraic topology, differential geometry, and graph theory, have demonstrated superiority in a variety of biomolecular applications; however, their performance often depends on manual parametrization. This work introduces the auto-parametrized weighted element-specific graph neural network, dubbed AweGNN, to overcome the obstacle of this tedious parametrization process while also being a suitable technique for automated feature extraction on these internally complex …


Algebraic Graph-Assisted Bidirectional Transformers For Molecular Property Prediction, Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, Feng Pan Jun 2021

Mathematics Faculty Publications

The ability to predict molecular properties is of great significance to drug discovery, human health, and environmental protection. Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Although some machine learning models, such as bidirectional encoder representations from transformers (BERT), can incorporate massive unlabeled molecular data into molecular representations via a self-supervised learning strategy, they neglect three-dimensional (3D) stereochemical information. An algebraic graph, specifically an element-specific multiscale weighted colored algebraic graph, embeds complementary 3D molecular information into graph invariants. We propose an algebraic graph-assisted bidirectional transformer (AGBT) framework by fusing representations generated by algebraic graph and bidirectional transformer, as well as …


Genome Sequencing Analysis Of Laboratory Isolate Of Francisella noatunensis subsp. orientalis, Joseph Paquette Apr 2020

Senior Honors Projects

Francisella noatunensis subsp. orientalis is a known fish pathogen that has been most notably isolated from tilapia (Oreochromis niloticus) in Costa Rica. The genome of this Francisella pathogen has been sequenced using next-generation sequencing and made available to the scientific community. Dr. Kathryn Ramsey’s research laboratory in the Department of Cell and Molecular Biology at the University of Rhode Island works with several Francisella species pathogens and is interested in identifying the differences, if any, between the known genome sequence of Francisella noatunensis and that of a laboratory isolate of the same species. With the use …


Outlier Profiles Of Atomic Structures Derived From X-Ray Crystallography And From Cryo-Electron Microscopy, Lin Chen, Jing He, Angelo Facchiano Jan 2020

Computer Science Faculty Publications

Background: As more protein atomic structures are determined from cryo-electron microscopy (cryo-EM) density maps, validation of such structures is an important task. Methods: We applied a histogram-based outlier score (HBOS) to six sets of cryo-EM atomic structures and five sets of X-ray atomic structures, including one derived from X-ray data with better than 1.5 Å resolution. The cryo-EM data sets contain structures released by December 2016 and those released between 2017 and 2019, derived from resolution ranges of 0–4 Å and 4–6 Å, respectively. Results: The distribution of HBOS values in the five sets of X-ray structures shows that HBOS is sensitive in distinguishing …
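
For readers unfamiliar with HBOS, a minimal sketch of the generic histogram-based outlier score may help: each feature is binned into a histogram normalized so its tallest bin has height 1, and a sample's score is the sum over features of log(1 / height of its bin), so samples in sparsely populated bins score high. This is an illustration under assumptions, not the authors' pipeline: the function name `hbos_scores`, the equal-width binning with a fixed bin count, and the idea that each row holds per-residue geometric measurements are all hypothetical.

```python
import numpy as np


def hbos_scores(X, n_bins=10):
    """Histogram-based outlier score for each row of X (n_samples x n_features).

    For every feature, build an equal-width histogram, normalize it so the tallest
    bin has height 1, and add log(1 / height of the sample's bin) to the score.
    Rows falling in sparsely populated bins receive high scores.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.zeros(n_samples)
    for j in range(n_features):
        counts, edges = np.histogram(X[:, j], bins=n_bins)
        heights = counts / counts.max()
        # Interior edges map each value to the same bin index np.histogram used
        # (0..n_bins-1, with the maximum value placed in the last bin).
        idx = np.digitize(X[:, j], edges[1:-1])
        # Every sample lies in a bin with at least one member, so heights[idx] > 0.
        scores += np.log(1.0 / heights[idx])
    return scores


# Hypothetical example: 60 ordinary rows and one obvious outlier.
X = np.vstack([np.zeros((30, 2)), np.full((30, 2), 0.5), [[10.0, 10.0]]])
scores = hbos_scores(X)
print(scores.argmax())  # 60, the outlier row
```

Equal-width bins are the simplest choice; equal-frequency (dynamic) binning is a common alternative when features have long tails.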


Computational Analysis Of Large-Scale Trends And Dynamics In Eukaryotic Protein Family Evolution, Joseph Boehm Ahrens Mar 2019

FIU Electronic Theses and Dissertations

The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to …


Tracing Actin Filament Bundles In Three-Dimensional Electron Tomography Density Maps Of Hair Cell Stereocilia, Salim Sazzed, Junha Song, Julio Kovacs, Willi Wriggers, Manfred Auer, Jing He Apr 2018

Computer Science Faculty Publications

Cryo-electron tomography (cryo-ET) is a powerful method of visualizing the three-dimensional organization of supramolecular complexes, such as the cytoskeleton, in their native cell and tissue contexts. Because of its minimal electron dose and the reconstruction artifacts arising from the missing wedge during data collection, cryo-ET typically produces noisy density maps that display anisotropic XY versus Z resolution. Molecular crowding further exacerbates the challenge of automatically detecting supramolecular complexes, such as the actin bundle in hair cell stereocilia. Stereocilia are pivotal to the mechanoelectrical transduction process in inner ear sensory epithelial hair cells. Given the complexity and dense arrangement of actin …


Crosstalk Among lncRNAs, microRNAs And mRNAs In The Muscle ‘Degradome’ Of Rainbow Trout, Bam Paneru, Ali Ali, Rafet Al-Tobasei, Brett Kenney, Mohamed Salem Jan 2018

Faculty & Staff Scholarship

In fish, protein-coding and noncoding genes involved in muscle atrophy are not fully characterized. In this study, we characterized coding and noncoding genes involved in gonadogenesis-associated muscle atrophy and investigated the potential functional interplay between these genes. Using RNA-Seq, we compared the expression patterns of mRNAs, long noncoding RNAs (lncRNAs) and microRNAs in atrophying skeletal muscle from gravid females and in control skeletal muscle from age-matched sterile individuals. A total of 852 mRNAs, 1,160 lncRNAs and 28 microRNAs were differentially expressed (DE) between the two groups. Muscle atrophy appears to be mediated by many genes encoding the ubiquitin-proteasome system, autophagy related …


Integrated Analysis Of lncRNA And mRNA Expression In Rainbow Trout Families Showing Variation In Muscle Growth And Fillet Quality Traits, Ali Ali, Rafet Al-Tobasei, Brett Kenney, Timothy D. Leeds, Mohamed Salem Jan 2018

Faculty & Staff Scholarship

Muscle yield and quality traits are important for the aquaculture industry and consumers. Genetic selection for these traits is difficult because they are polygenic and result from multifactorial interactions. To study the genetic architecture of these traits, whole body weight (WBW), muscle yield, fat content, shear force and whiteness were measured in ~500 fish representing 98 families from a growth-selected line. RNA-Seq was used to sequence the muscle transcriptome of different families exhibiting divergent phenotypes for each trait. We identified 240 and 1,280 differentially expressed (DE) protein-coding genes and long noncoding RNAs (lncRNAs), respectively, in fish …


Testing The Independence Hypothesis Of Accepted Mutations For Pairs Of Adjacent Amino Acids In Protein Sequences, Jyotsna Ramanan, Peter Revesz Jul 2017

School of Computing: Faculty Publications

Evolutionary studies usually assume that genetic mutations are independent of each other. However, that does not imply that the observed mutations are independent of each other, because when a nucleotide is mutated, it may be biologically beneficial for an adjacent nucleotide to mutate too. With many decoded genes now available in various genome libraries and online databases, it is possible to conduct a large-scale computer-based study to test whether the independence assumption holds for pairs of adjacent amino acids. Hence the independence question also arises for pairs of adjacent amino acids within …


An Effective Computational Method Incorporating Multiple Secondary Structure Predictions In Topology Determination For Cryo-EM Images, Abhishek Biswas, Desh Ranjan, Mohammad Zubair, Stephanie Zeil, Kamal Al Nasr, Jing He Jan 2017

Computer Science Faculty Publications

A key idea in de novo modeling of a medium-resolution density image obtained from cryo-electron microscopy is to compute the optimal mapping between the secondary structure traces observed in the density image and those predicted on the protein sequence. When secondary structures are not determined precisely, either from the image or from the amino acid sequence of the protein, the computational problem becomes more complex. We present an efficient method that addresses the secondary structure placement problem in the presence of multiple secondary structure predictions and computes the optimal mapping. We tested the method using 12 simulated images from alpha-proteins and …


Hpcnmf: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016

COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high-dimensional data sets and complex data structures, its computational cost is high and can sometimes be critical for …
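
As an illustration of what NMF computes, the sketch below factors a non-negative matrix V into non-negative factors W and H using the classic Lee–Seung multiplicative updates for the squared Frobenius loss. It is a minimal, single-threaded NumPy example under stated assumptions and is not the toolbox's implementation; the function name `nmf_multiplicative` and its parameters are hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V (m x n) as W @ H, with W (m x rank)
    and H (rank x n) non-negative, via Lee-Seung multiplicative updates that
    decrease the squared Frobenius error ||V - WH||^2."""
    V = np.asarray(V, dtype=float)
    if (V < 0).any():
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Random positive initialization (zero entries would stay zero under the updates).
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms,
        # so W and H never become negative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical example: a 6 x 5 matrix built from known rank-2 non-negative factors.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_multiplicative(V, rank=2, n_iter=2000)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each iteration costs several dense matrix products over the full data matrix, which gives a sense of why the computational cost grows quickly for large genomic data sets.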


Deep Models For Brain EM Image Segmentation: Novel Insights And Improved Performance, Ahmed Fakhry, Hanchuan Peng, Shuiwang Ji Jan 2016

Computer Science Faculty Publications

Motivation: Accurate segmentation of brain electron microscopy (EM) images is a critical step in dense circuit reconstruction. Although deep neural networks (DNNs) have been widely used in a number of computer vision applications, most of the models that have proved effective on image classification tasks cannot be applied directly to EM image segmentation, because the two tasks have different objectives. It is therefore desirable to develop an optimized architecture that uses the full power of DNNs and is tailored specifically for EM image segmentation.

Results: In this work, we proposed a novel design of DNNs for …


Mutations Of Adjacent Amino Acid Pairs Are Not Always Independent, Jyotsna Ramanan, Peter Revesz Oct 2015

CSE Conference and Workshop Papers

Evolutionary studies usually assume that the genetic mutations are independent of each other. This paper tests the independence hypothesis for genetic mutations with regard to protein coding regions. According to the new experimental results the independence assumption generally holds, but there are certain exceptions. In particular, the coding regions that represent two adjacent amino acids seem to change in ways that sometimes deviate significantly from the expected theoretical probability under the independence assumption.
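
One simple way to pose the independence question is to compare how often two adjacent residues both change against the rate expected if each site changed independently. The sketch below does this for a single pair of aligned, equal-length protein sequences using a 2×2 contingency table and a Pearson chi-square statistic. It is a hypothetical illustration, not the procedure used in the paper; the function name `adjacent_comutation` and the choice of test are assumptions.

```python
def adjacent_comutation(seq_a, seq_b):
    """Compare observed and expected co-mutation of adjacent sites between two
    aligned, equal-length protein sequences.

    Returns (observed fraction of adjacent pairs where both sites differ,
             fraction expected if the two sites changed independently,
             Pearson chi-square statistic of the 2x2 contingency table)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    mutated = [a != b for a, b in zip(seq_a, seq_b)]
    pairs = list(zip(mutated, mutated[1:]))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least two aligned positions")
    # table[i][j]: first site mutated (i = 1) or not (i = 0); second site likewise (j).
    table = [[0, 0], [0, 0]]
    for first, second in pairs:
        table[int(first)][int(second)] += 1
    row = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins (e.g. no mutations at all)
                chi2 += (table[i][j] - expected) ** 2 / expected
    observed_both = table[1][1] / n
    expected_both = (row[1] / n) * (col[1] / n)
    return observed_both, expected_both, chi2


# Hypothetical example alignment.
obs, exp, chi2 = adjacent_comutation("AAAAAAAAAA", "AAGGAAAGGA")
print(round(obs, 3), round(exp, 3), round(chi2, 3))  # 0.222 0.198 0.09
```

The statistic would normally be compared against a chi-square distribution with one degree of freedom. Because overlapping adjacent pairs share a site, they are not strictly independent observations, so a real study would pool many sequence pairs and treat a single short alignment like this one as only a rough guide.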


ISQuest: Finding Insertion Sequences In Prokaryotic Sequence Fragment Data, Abhishek Biswas, David T. Gauthier, Desh Ranjan, Mohammad Zubair Jun 2015

Computer Science Faculty Publications

Motivation: Insertion sequences (ISs) are transposable elements present in most bacterial and archaeal genomes that play an important role in genomic evolution. The increasing availability of sequenced prokaryotic genomes offers the opportunity to study ISs comprehensively, but efficient and accurate tools are required for their discovery and annotation. Additionally, prokaryotic genomes are frequently deposited as incomplete, draft-stage assemblies because of the substantial cost and effort required to finish genome assembly projects. Methods that identify ISs directly from raw sequence reads or draft genomes are therefore desirable. Software tools such as Optimized Annotation System for Insertion Sequences …


Tracing Beta Strands Using StrandTwister From Cryo-EM Density Maps At Medium Resolutions, Dong Si, Jing He Jan 2014

Computer Science Faculty Publications

Major secondary structure elements such as α helices and β sheets can be computationally detected from cryo-electron microscopy (cryo-EM) density maps with medium resolutions of 5–10 Å. However, a critical piece of information for modeling atomic structures is missing, because there are no tools to detect β strands from cryo-EM maps at medium resolutions. We propose a method, StrandTwister, to detect the traces of β strands through the analysis of twist, an intrinsic property of β sheets. StrandTwister has been tested using 100 β sheets simulated at 10 Å resolution and 39 β sheets computationally detected from cryo-EM …


Disulfide By Design 2.0: A Web-Based Tool For Disulfide Engineering In Proteins, Douglas B. Craig, Alan A. Dombkowski Jan 2013

Wayne State University Associated BioMed Central Scholarship

Abstract

Background

Disulfide engineering is an important biotechnological tool that has advanced a wide range of research. The introduction of novel disulfide bonds into proteins has been used extensively to improve protein stability, modify functional characteristics, and to assist in the study of protein dynamics. Successful use of this technology is greatly enhanced by software that can predict pairs of residues that will likely form a disulfide bond if mutated to cysteines.

Results

We had previously developed and distributed software for this purpose: Disulfide by Design (DbD). The original DbD program has been widely used; however, it has a number …


Secondary Structure, A Missing Component Of Sequence-Based Minimotif Definitions, David P. Sargeant, Michael R. Gryk, Mark W. Maciejewski, Vishal Thapar, Vamsi Kundeti, Sanguthevar Rajasekaran, Pedro Romero, Keith Dunker, Shun-Cheng Li, Tomonori Kaneko, Martin Schiller Dec 2012

Life Sciences Faculty Research

Minimotifs are short contiguous segments of proteins that have a known biological function. The hundreds of thousands of minimotifs discovered thus far are an important part of the theoretical understanding of the specificity of protein-protein interactions, posttranslational modifications, and signal transduction that occur in cells. However, a longstanding problem is that the different abstractions of the sequence definitions do not accurately capture the specificity, despite decades of effort by many labs. We present evidence that structure is an essential component of minimotif specificity, yet is not used in minimotif definitions. Our analysis of several known minimotifs as case studies, analysis …


Achieving High Accuracy Prediction Of Minimotifs, Tian Mi, Sanguthevar Rajasekaran, Jerlin Camilus Merlin, Michael R. Gryk, Martin Schiller Sep 2012

Life Sciences Faculty Research

The low complexity of minimotif patterns results in a high false-positive prediction rate, hampering protein function prediction. A multi-filter algorithm, trained and tested on a linear regression model, a support vector machine model, and a neural network model using a large dataset of verified minimotifs, vastly improves minimotif prediction accuracy while generating few false positives. An optimal threshold for the best accuracy reaches an overall accuracy above 90%, while a stringent threshold for the best specificity generates fewer than 1% false positives, or even none, and still produces more than 90% true positives for the linear regression and neural network …


SciReader Enables Reading Of Medical Content With Instantaneous Definitions, Patrick R. Gradie, Megan Litster, Rinu Thomas, Jay Vyas, Martin Schiller Jan 2011

Life Sciences Faculty Research

Background

A major problem patients encounter when reading about health-related issues is document interpretation, which limits reading comprehension and therefore negatively impacts health care. Currently, searching for medical definitions from an external source is time-consuming and distracting, and it negatively impacts reading comprehension and memory of the material.

Methods

SciReader was built as a Java application with a Flex-based front-end client. The dictionary used by SciReader was built by consolidating data from several sources and generating new definitions with a standardized syntax. The application was evaluated by measuring the percentage of words defined in different documents. A survey was used …


Estimation Of Alternative Splicing Isoform Frequencies From RNA-Seq Data, Marius Nicolae, Serghei Mangul, Ion I. Măndoiu, Alexander Zelikovsky Jan 2011

Computer Science Faculty Publications

Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging.

Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information …
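
The core of an expectation-maximization approach to isoform quantification can be sketched in a few lines: each read is compatible with one or more isoforms, the E-step splits ambiguous reads in proportion to each compatible isoform's current read fraction divided by its length, and the M-step re-estimates the fraction of reads coming from each isoform. This simplified sketch assumes single-end reads whose start positions are uniform along each isoform, with known effective lengths; unlike IsoEM as described above, it ignores insert-size distributions, base quality scores, and strand information. The function name `isoform_em` and its input format are hypothetical.

```python
def isoform_em(read_compat, lengths, n_iter=200):
    """Estimate isoform frequencies by expectation-maximization.

    read_compat: one collection of isoform indices per read, listing every isoform
                 the read is compatible with (must be non-empty).
    lengths:     effective length of each isoform.
    Returns the estimated fraction of transcripts belonging to each isoform.
    """
    k = len(lengths)
    n_reads = len(read_compat)
    rho = [1.0 / k] * k  # current fraction of reads originating from each isoform
    for _ in range(n_iter):
        expected = [0.0] * k
        for compat in read_compat:
            # E-step: read starts are uniform along an isoform, so the posterior
            # that this read came from isoform j is proportional to rho_j / length_j.
            weights = {j: rho[j] / lengths[j] for j in compat}
            total = sum(weights.values())
            for j, w in weights.items():
                expected[j] += w / total
        # M-step: new read fractions are the expected read counts per isoform.
        rho = [e / n_reads for e in expected]
    # Convert read fractions into transcript frequencies by correcting for length.
    per_length = [rho[j] / lengths[j] for j in range(k)]
    total = sum(per_length)
    return [p / total for p in per_length]


# Hypothetical data: isoforms of length 100 and 200, 30 reads unique to isoform 0,
# 60 unique to isoform 1, and 30 compatible with both.
reads = [[0]] * 30 + [[1]] * 60 + [[0, 1]] * 30
print(isoform_em(reads, [100, 200]))  # roughly [0.56, 0.44]
```

Without the 30 shared reads, the same input gives equal frequencies of 0.5 each, because the isoform of length 200 yields twice as many reads per transcript. The shared reads are what EM has to apportion iteratively.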


Computational Network Analysis Of The Anatomical And Genetic Organizations In The Mouse Brain, Shuiwang Ji Jan 2011

Computer Science Faculty Publications

Motivation: The mammalian central nervous system (CNS) generates high-level behavior and cognitive functions. Elucidating the anatomical and genetic organizations in the CNS is a key step toward understanding the functional brain circuitry. The CNS contains an enormous number of cell types, each with unique gene expression patterns. Therefore, it is of central importance to capture the spatial expression patterns in the brain. A genome-wide atlas of spatial expression patterns in the mouse brain is now available, with the data in the form of aligned 3D data arrays. The sheer volume and complexity of these data pose significant challenges …


Partitioning Of Minimotifs Based On Function With Improved Prediction Accuracy, Sanguthevar Rajasekaran, Tian Mi, Jerlin Camilus Merlin, Aaron Oommen, Patrick R. Gradie, Martin R. Schiller Apr 2010

Life Sciences Faculty Research

Background

Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.

Methodology/Principal Findings

Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. Therefore, we hypothesized that by restricting minimotif predictions to those where the minimotif containing protein and target protein have …


VENN, A Tool For Titrating Sequence Conservation Onto Protein Structures, Jay Vyas, Michael R. Gryk, Martin R. Schiller Oct 2009

Life Sciences Faculty Research

Residue conservation is an important, established method for inferring protein function, modularity and specificity. It is important to recognize that it is the 3D spatial orientation of residues that drives sequence conservation. Considering this, we have built a new computational tool, VENN, that allows researchers to interactively and graphically titrate sequence homology onto surface representations of protein structures. Our proposed titration strategies reveal critical details that are not readily identified using other existing tools. Analyses of a bZIP transcription factor and receptor recognition of Fibroblast Growth Factor using VENN revealed key specificity determinants. Weblink: http://sbtools.uchc.edu/venn/.


A Proposed Syntax For Minimotif Semantics, Version 1., Jay Vyas, Ronald J. Nowling, Mark W. Maciejewski, Sanguthevar Rajasekaran, Michael R. Gryk, Martin R. Schiller Aug 2009

Life Sciences Faculty Research

BACKGROUND:

One of the most important developments in bioinformatics over the past few decades has been the observation that short linear peptide sequences (minimotifs) mediate many classes of cellular functions such as protein-protein interactions, molecular trafficking and post-translational modifications. As both the creators and curators of a database which catalogues minimotifs, Minimotif Miner, the authors have a unique perspective on the commonalities of the many functional roles of minimotifs. There is an obvious usefulness in standardizing functional annotations both in allowing for the facile exchange of data between various bioinformatics resources, as well as the internal clustering of sets of …


Minimotif Miner 2nd Release: A Database And Web System For Motif Search, Sanguthevar Rajasekaran, Sudha Balla, Patrick R. Gradie, Michael R. Gryk, Krishna Kadaveru, Vamsi Kundeti, Mark W. Maciejewski, Tian Mi, Nicholas Rubino, Jay Vyas, Martin R. Schiller Jan 2009

Life Sciences Faculty Research

Minimotif Miner (MnM) consists of a minimotif database and a web-based application that enables prediction of motif-based functions in user-supplied protein queries. We have revised MnM by expanding the database more than 10-fold to approximately 5000 motifs and standardized the motif function definitions. The web-application user interface has been redeveloped with new features including improved navigation, screencast-driven help, support for alias names and expanded SNP analysis. A sample analysis of prion shows how MnM 2 can be used.


The Cell Cycle–Regulated Genes Of Schizosaccharomyces pombe, Anna Oliva, Adan Rosebrock, Francisco Ferrezuelo, Haiying Chen, Saumyadipta Pyne, Steve Skiena, Bruce Futcher, Janet Leatherwood Jun 2005

Department of Molecular Genetics and Microbiology Faculty Publications

Many genes are regulated as an innate part of the eukaryotic cell cycle, and a complex transcriptional network helps enable the cyclic behavior of dividing cells. This transcriptional network has been studied in Saccharomyces cerevisiae (budding yeast) and elsewhere. To provide more perspective on these regulatory mechanisms, we have used microarrays to measure gene expression through the cell cycle of Schizosaccharomyces pombe (fission yeast). The 750 genes with the most significant oscillations were identified and analyzed. There were two broad waves of cell cycle transcription, one in early/mid G2 phase, and the other near the G2/M transition. The early/mid G2 …


Protocols For Disease Classification From Mass Spectrometry Data, Michael Wagner, Dayanand Naik, Alex Pothen Jan 2003

Mathematics & Statistics Faculty Publications

We report our results in classifying protein matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra obtained from serum samples into diseased and healthy groups. We discuss in detail five of the steps in preprocessing the mass spectral data for biomarker discovery, as well as our criterion for choosing a small set of peaks for classifying the samples. Cross-validation studies with four selected proteins yielded misclassification rates in the 10–15% range for all the classification methods. Three of these proteins or protein fragments are down-regulated and one is up-regulated in lung cancer, the disease under consideration in this data set. When cross-validation studies …