Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 8 of 8
Full-Text Articles in Physical Sciences and Mathematics
Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang
Biostatistics Faculty Publications
To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, …
Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang
Biostatistics Faculty Publications
With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection process to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …
Recurrent Neural Networks And Their Applications To RNA Secondary Structure Inference, Devin Willmott
Theses and Dissertations--Mathematics
Recurrent neural networks (RNNs) are state-of-the-art sequential machine learning tools, but have difficulty learning sequences with long-range dependencies due to the exponential growth or decay of gradients backpropagated through the RNN. Some methods overcome this problem by modifying the standard RNN architecture to force the recurrent weight matrix W to remain orthogonal throughout training. The first half of this thesis presents a novel orthogonal RNN architecture that enforces orthogonality of W by parametrizing with a skew-symmetric matrix via the Cayley transform. We present rules for backpropagation through the Cayley transform, show how to deal with the Cayley …
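As a rough illustration of the parametrization mentioned in this abstract, the sketch below applies the textbook Cayley transform W = (I + A)^{-1}(I - A) to a small skew-symmetric matrix A and checks that the result is orthogonal. It is not taken from the thesis: the helper names (`solve`, `cayley`) and the example entries of A are assumptions made for illustration, and the thesis's actual architecture and backpropagation rules are not reproduced here.

```python
# Minimal pure-Python sketch of the standard Cayley transform.
# Assumption: W = (I + A)^{-1} (I - A) with A skew-symmetric (A^T = -A).
# Helper names and the example matrix are hypothetical, for illustration only.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cayley(A):
    """Return (I + A)^{-1} (I - A) for a square matrix A."""
    n = len(A)
    I = identity(n)
    plus = [[I[i][j] + A[i][j] for j in range(n)] for i in range(n)]
    minus = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    return solve(plus, minus)

# Example skew-symmetric matrix (entries chosen arbitrarily).
A = [[0.0, 0.5, -0.2],
     [-0.5, 0.0, 0.3],
     [0.2, -0.3, 0.0]]

W = cayley(A)
WtW = matmul(transpose(W), W)

# Largest deviation of W^T W from the identity; should be near zero.
max_dev = max(abs(WtW[i][j] - (1.0 if i == j else 0.0))
              for i in range(3) for j in range(3))
print(max_dev)
```

Because a real skew-symmetric matrix has purely imaginary eigenvalues, I + A is always invertible, so the linear solve is well defined for any such A.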
Bayesian Prediction Intervals For Assessing P-Value Variability In Prospective Replication Studies, Olga A. Vsevolozhskaya, Gabriel Ruiz, Dmitri Zaykin
Biostatistics Faculty Publications
Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. Hypothesis and significance testing, and the accompanying P-values, are being scrutinized as representing the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of credibility for scientific hypotheses. It has also been suggested that while P-values …
Identification Of Control Targets In Boolean Molecular Network Models Via Computational Algebra, David Murrugarra, Alan Veliz-Cuba, Boris Aguilar, Reinhard Laubenbacher
Mathematics Faculty Publications
Background: Many problems in biomedicine and other areas of the life sciences can be characterized as control problems, with the goal of finding strategies to change a disease or otherwise undesirable state of a biological system into another, more desirable, state through an intervention, such as a drug or other therapeutic treatment. The identification of such strategies is typically based on a mathematical model of the process to be altered through targeted control inputs. This paper focuses on processes at the molecular level that determine the state of an individual cell, involving signaling or gene regulation. The mathematical model type …
Novel Computational Methods For Transcript Reconstruction And Quantification Using RNA-Seq Data, Yan Huang
Theses and Dissertations--Computer Science
The advent of RNA-seq technologies provides an unprecedented opportunity to precisely profile the mRNA transcriptome of a specific cell population. It helps reveal the characteristics of the cell under a particular condition, such as a disease. It is now possible to discover mRNA transcripts not cataloged in existing databases, in addition to assessing the identities and quantities of the known transcripts in a given sample or cell. However, each sequence read obtained from an RNA-seq experiment is only a short fragment of the original transcript. How to recapitulate the mRNA transcriptome from short RNA-seq reads remains a challenging problem. We …
Statistics In The Billera-Holmes-Vogtmann Treespace, Grady S. Weyenberg
Theses and Dissertations--Statistics
This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework than currently exists for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees.
For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a …
A Novel Computational Framework For Transcriptome Analysis With RNA-Seq Data, Yin Hu
Theses and Dissertations--Computer Science
The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. In order to address the current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Empowered by the identification of alternative splicing modules, this framework then performs precise and efficient differential analysis at automatically detected …