Articles 1 - 15 of 15
Full-Text Articles in Bioinformatics
A Machine Learning Approach To Post-Market Surveillance Of Medical Devices, Jonathan Bates, Shu-Xia Li, Craig Parzynski, Ronald Coifman, Harlan Krumholz, Joseph Ross
Yale Day of Data
Post-market surveillance is a collection of processes and activities used by product manufacturers and regulators, such as the U.S. Food and Drug Administration (FDA), to monitor the safety and effectiveness of medical devices once they are available for use “on the market”. These activities are designed to generate information to identify poorly performing devices and other safety problems, accurately characterize real-world device performance and clinical outcomes, and facilitate the development of new devices or new uses for existing devices. Typically, a device is monitored by comparing adverse events in the exposed population with those in a matched unexposed population. This research considers …
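The conventional exposed-versus-unexposed comparison mentioned above can be illustrated with a minimal Python sketch. It shows only that baseline comparison, not the machine learning approach this research develops. The sketch computes the risk ratio of adverse events and a log-scale Wald 95% confidence interval. The function name and event counts are hypothetical. The two groups are also treated as independent samples, which is a simplification, since a matched analysis would account for the matching.

```python
import math

def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Risk ratio of adverse events (exposed vs. unexposed) with a Wald CI on the log scale.

    Simplifying assumption: the two groups are independent binomial samples
    (the matching between groups is ignored here).
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent proportions
    se = math.sqrt(1 / events_exposed - 1 / n_exposed
                   + 1 / events_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, (lower, upper)

# Hypothetical counts: 30 adverse events in 1000 exposed vs. 15 in 1000 matched unexposed
rr, (lo, hi) = risk_ratio(30, 1000, 15, 1000)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval lying entirely above 1 would flag the device for closer review under this simple comparison.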
K-Mer Analysis On Developmental And Housekeeping Enhancer Peaks, Yunsi Yang, Anurag Sethi, Mark Gerstein
Yale Day of Data
The regulation of gene expression involves interaction between transcriptional enhancers and core promoters. However, the separation between developmental and housekeeping gene regulation remains unknown. Here, we present a method to detect whether different core promoters exhibit specificity for certain enhancers in massively parallel assays for enhancer detection. We use k-mers of various lengths (3 to 8 bp) as sequence features and compare k-mer frequencies between developmental and housekeeping enhancers. This method reveals promoter specificity of enhancers in D. melanogaster.
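The k-mer feature step described above might look like the following Python sketch. It counts every k-mer of length 3 to 8 in two sets of sequences and reports per-k-mer differences in relative frequency. The function names and toy sequences are hypothetical. Two further choices are assumptions: windows containing non-ACGT symbols are skipped, and reverse complements are not merged. The abstract does not say which statistical comparison the authors applied to these frequencies.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(sequences, k):
    """Relative frequency of every DNA k-mer pooled across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    freqs = {}
    # Enumerate all 4**k k-mers so both enhancer classes share one feature space
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        freqs[kmer] = counts[kmer] / total if total else 0.0
    return freqs

def frequency_differences(class_a, class_b, k):
    """Per-k-mer difference in relative frequency (class_a minus class_b)."""
    fa = kmer_frequencies(class_a, k)
    fb = kmer_frequencies(class_b, k)
    return {kmer: fa[kmer] - fb[kmer] for kmer in fa}

# Hypothetical toy sequences standing in for the two enhancer classes
developmental = ["ACGTACGTTT", "GGCATTACGA"]
housekeeping = ["TTTTGACGCA", "CATCGATCGG"]
for k in range(3, 9):  # the 3 to 8 bp range used in the study
    diffs = frequency_differences(developmental, housekeeping, k)
    top = max(diffs, key=lambda kmer: abs(diffs[kmer]))
    print(k, top, round(diffs[top], 3))
```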
A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, GTEx Consortium, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im
Bioinformatics Faculty Publications
Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …
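The two steps summarized above can be sketched in Python: imputing genetically regulated expression as a weighted sum of allele dosages, then correlating the imputed values with the phenotype. In PrediXcan the weights come from prediction models trained on reference transcriptome data. Here the weights, SNP identifiers, genotypes and phenotype values are all hypothetical, and a plain Pearson correlation stands in for whatever association test is used in practice.

```python
def impute_expression(dosages, weights):
    """Genetically regulated expression for one gene in one individual.

    dosages : SNP id -> alternate-allele dosage (0, 1 or 2)
    weights : SNP id -> effect size from a reference-trained prediction model
    """
    return sum(w * dosages[snp] for snp, w in weights.items())

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical weights and genotypes for a single gene
weights = {"rs1": 0.5, "rs2": -0.2}
cohort = [{"rs1": 0, "rs2": 2}, {"rs1": 1, "rs2": 1},
          {"rs1": 2, "rs2": 0}, {"rs1": 2, "rs2": 1}]
phenotype = [1.1, 1.9, 3.2, 2.8]
imputed = [impute_expression(g, weights) for g in cohort]
print(imputed, pearson(imputed, phenotype))
```

Only genotypes are needed at the association stage, so the method can be applied to GWAS cohorts that have no measured expression.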
Germline Mutation Detection In Next Generation Sequencing Data And TP53 Mutation Carrier Probability Estimation For Li-Fraumeni Syndrome, Gang Peng
Dissertations & Theses (Open Access)
Next-generation sequencing technology has been widely used in genomic analysis, but its application has been compromised by missed true variants, especially rare ones. We proposed a family-based variant calling method, FamSeq, which integrates Mendelian transmission information with de novo mutation and sequencing data to improve variant calling accuracy. We investigated the factors affecting the improvement from family-based variant calling in simulated data and validated the method in real sequencing data. In both simulated and real data, FamSeq performed better than the single-individual-based method.
In FamSeq, we implemented four different methods for the Mendelian genetic …
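A minimal Python sketch of the core idea: Mendelian transmission supplies a prior for a child's genotype, and that prior is combined with the child's read-based genotype likelihoods. It is heavily simplified relative to FamSeq. The parental genotypes are assumed known, a single per-allele flip probability stands in for de novo mutation, and all function names and likelihood values are hypothetical. The full method works over the whole pedigree.

```python
def transmission_prob(parent_genotype):
    """Probability that a parent transmits the alternate allele.

    Genotypes are coded as alternate-allele counts: 0 (ref/ref), 1 (het), 2 (alt/alt).
    """
    return parent_genotype / 2

def child_genotype_probs(father, mother, mutation_rate=0.0):
    """P(child genotype | parental genotypes) under Mendelian inheritance.

    mutation_rate: assumed per-allele probability that the transmitted allele flips,
    a crude stand-in for de novo mutation.
    """
    def flip(p):
        return p * (1 - mutation_rate) + (1 - p) * mutation_rate
    pf = flip(transmission_prob(father))
    pm = flip(transmission_prob(mother))
    return {0: (1 - pf) * (1 - pm),
            1: pf * (1 - pm) + (1 - pf) * pm,
            2: pf * pm}

def child_posterior(father, mother, child_likelihoods, mutation_rate=0.0):
    """Combine the Mendelian prior with read-based likelihoods P(reads | genotype)."""
    prior = child_genotype_probs(father, mother, mutation_rate)
    unnormalized = {g: prior[g] * child_likelihoods[g] for g in prior}
    z = sum(unnormalized.values())
    return {g: v / z for g, v in unnormalized.items()}

# Hypothetical ambiguous reads for a child of a ref/ref father and a het mother
print(child_posterior(0, 1, {0: 0.3, 1: 0.3, 2: 0.4}, mutation_rate=1e-8))
```

Even when the reads alone favor the homozygous-alternate call, the family prior shifts the posterior toward genotypes the parents can actually transmit.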
Computational Modeling Of RNA-Small Molecule And RNA-Protein Interactions, Lu Chen
Dissertations & Theses (Open Access)
The past decade has been an era of RNA biology; despite considerable discoveries, challenges remain in screening for RNA-interacting small molecules and RNA-interacting proteins. These challenges create an immediate need for cost-efficient yet predictive computational tools capable of generating insightful hypotheses for discovering novel RNA-interacting small molecules and proteins. In this dissertation, we therefore implemented novel computational models to predict RNA-ligand interactions (Chapter 1) and RNA-protein interactions (Chapter 2).
Targeting RNA has not garnered as much interest as targeting proteins, and it is restricted by the lack of computational tools for structure-based drug design. To test the potential …
Fast, Accurate, And Reliable Molecular Docking With QuickVina 2, Amr Alhossary, Stephanus Daniel Handoko, Yuguang Mu, Chee-Keong Kwoh
Research Collection School Of Computing and Information Systems
Motivation: The need for efficient molecular docking tools for high-throughput screening is growing alongside the rapid growth of drug-fragment databases. AutoDock Vina ('Vina') is a widely used docking tool with parallelization for speed. QuickVina ('QVina 1') further enhanced the speed via heuristics but required high exhaustiveness; at low exhaustiveness, its accuracy was compromised. In this article, we present the latest version of QuickVina ('QVina 2'), which inherits both the speed of QVina 1 and the reliability of the original Vina. Results: We tested the efficacy of QVina 2 on the core set of PDBbind 2014. With the default exhaustiveness level …
optCluster: An R Package For Determining The Optimal Clustering Algorithm And Optimal Number Of Clusters, Michael N. Sekula
Electronic Theses and Dissertations
Determining the best clustering algorithm and the ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, next-generation sequencing data and microarray gene expression data are becoming increasingly common, so new tools and resources are needed to group such high-dimensional data using clustering analysis. Different clustering algorithms can group the same data very differently, so there is a need to determine the best groupings in a given dataset using the clustering algorithm most suitable for that data. This paper presents the R package optCluster as an efficient …
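One way to compare candidate clusterings, as described above, is an internal validation measure. The Python sketch below uses the mean silhouette width and picks the candidate partition that scores highest. optCluster's own procedure may combine several validation measures, so relying on the silhouette alone is an assumption made for illustration. The function names, points and candidate labelings are hypothetical, and at least two clusters are required.

```python
import math

def silhouette_width(points, labels):
    """Mean silhouette width of a partition (Euclidean distance, >= 2 clusters).

    For each point: a = mean distance to the rest of its own cluster,
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    Singleton clusters contribute s = 0.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def best_partition(points, candidates):
    """Name of the candidate labeling with the highest mean silhouette width."""
    return max(candidates, key=lambda name: silhouette_width(points, candidates[name]))

# Hypothetical 2-D data and labelings from two different clustering runs
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
candidates = {"method_A_k2": [0, 0, 1, 1], "method_B_k2": [0, 1, 0, 1]}
print(best_partition(points, candidates))
```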
Summary Of Survival Analysis With SAS Procedures, Derek Duane Childers 1990-
Electronic Theses and Dissertations
This thesis summarizes some of the most commonly used survival analysis techniques and creates a single macro that provides solutions for these techniques. The techniques covered include survival and hazard functions, mean and median survival times, life tables, the log-rank test, proportional hazards model building, and competing risks. To illustrate these techniques, I use the Bone Marrow Transplantation for Leukemia dataset. This trial includes patients with either acute myelocytic leukemia (AML; 99 patients) or acute lymphoblastic leukemia (ALL; 38 patients). There …
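The survival function listed above is most often estimated with the Kaplan-Meier product-limit estimator. In SAS this estimate is typically produced by PROC LIFETEST. The Python sketch below shows only the estimator itself and is not the thesis's SAS macro. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death or relapse) was observed, 0 if censored
    Returns (event_time, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; those censored at t still count as at risk
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical follow-up times (months) and event indicators
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients leave the risk set without lowering S(t). This is how the estimator uses partial follow-up instead of discarding it.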
Zero-Inflated Models To Identify Transcription Factor Binding Sites In ChIP-Seq Experiments, Sameera Dhananjaya Viswakula
Mathematics & Statistics Theses & Dissertations
Determining protein-DNA binding sites is essential to understanding many biological processes. A transcription factor is a particular type of protein that binds to DNA and controls gene regulation in living organisms. Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is considered the gold standard for locating these binding sites, and programs used to identify DNA-transcription factor binding sites are known as peak callers. ChIP-seq data are known to exhibit considerable background noise and other biases. In this study, we propose a negative binomial model (NB), a zero-inflated Poisson model (ZIP) and a zero-inflated negative binomial model (ZINB) for peak-calling. …
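A Python sketch of the zero-inflated Poisson model named above. Its probability mass function mixes a point mass at zero (probability pi) with a Poisson count component (mean lambda). The sketch evaluates the log-likelihood and fits both parameters with a coarse grid search. The function names, the window counts, and the grid-search fitting are illustrative assumptions, and the abstract does not say how the fitted models are turned into peak calls. A ZINB model would replace the Poisson component with a negative binomial one.

```python
import math

def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson probability of observing count y.

    pi  : probability of a structural (excess) zero
    lam : Poisson mean of the count component (lam > 0)
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi * (y == 0) + (1 - pi) * poisson

def zip_loglik(counts, lam, pi):
    """Log-likelihood of a list of counts under the ZIP model."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y in counts)

# Hypothetical read counts per genomic window, with many zeros
counts = [0, 0, 0, 0, 0, 0, 2, 3, 1, 4]
grid = ((lam / 10, pi / 10) for lam in range(1, 60) for pi in range(0, 10))
best_lam, best_pi = max(grid, key=lambda params: zip_loglik(counts, *params))
print("lambda =", best_lam, "pi =", best_pi)
```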
Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore
Dartmouth Scholarship
Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and the number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …
Ordinal Probit Wavelet-Based Functional Models For eQTL Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull
Jeffrey S. Morris
Current methods for conducting expression quantitative trait loci (eQTL) analysis are limited to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using the plug-in false discovery rate. Because many SNPs are interrogated in the region and multiple probe sets are taken, the current approach requires fitting a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The …
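The adjustment step mentioned above can be sketched in Python with a Storey-type plug-in FDR estimator. This estimator is one common form of plug-in FDR and is assumed here; the abstract does not give the exact variant. The proportion of true nulls, pi0, is estimated from p-values above a tuning value lambda. The estimated FDR for rejecting every p-value at or below t is then pi0 · m · t divided by the number rejected. The function name, default lambda and p-values are hypothetical.

```python
def plugin_fdr(pvalues, t, lam=0.5):
    """Plug-in estimate of the FDR when rejecting every p-value <= t.

    pi0 (the proportion of true null hypotheses) is estimated from p-values
    above lam, which are roughly uniform under the null.
    """
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / ((1 - lam) * m)
    pi0 = min(pi0, 1.0)  # a proportion cannot exceed 1
    rejected = max(sum(p <= t for p in pvalues), 1)  # avoid division by zero
    return pi0 * m * t / rejected

# Hypothetical p-values from pairwise SNP-probe tests
pvals = [0.001, 0.002, 0.01, 0.03, 0.2, 0.6, 0.8, 0.9]
for t in (0.005, 0.01, 0.05):
    print(t, round(plugin_fdr(pvals, t), 4))
```

Every SNP-probe pair adds one test, and hence one p-value, to this list, which is the growth in model count the abstract describes.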
Graph-Based Regularization In Machine Learning: Discovering Driver Modules In Biological Networks, Xi Gao
Theses and Dissertations
Human curiosity drives us to explore the origins of what makes each of us different. From ancient legends and mythology, through Mendel's laws and the Punnett square, to modern genetic research, we carry on this old but eternal question. Thanks to the technological revolution, today's scientists try to answer this question using easily measurable gene expression and other profiling data. However, the exploration can easily get lost in data of growing volume, dimension, noise and complexity. This dissertation aims to develop new machine learning methods that take data from different classes as input, augment them with knowledge of feature relationships, …
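A widely used way to "augment data with knowledge of feature relationships" is a graph Laplacian penalty. It penalizes differences between the coefficients of features that are connected in a biological network. Whether the dissertation uses exactly this penalty is an assumption; the excerpt is truncated. The Python sketch builds L = D - W from weighted edges. It then checks that the quadratic form beta^T L beta equals the sum over edges of w_ij (beta_i - beta_j)^2. The graph and coefficients are hypothetical.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - W for n features and weighted edges (i, j, w)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L

def laplacian_penalty(beta, L):
    """Quadratic form beta^T L beta."""
    n = len(beta)
    return sum(beta[i] * L[i][j] * beta[j] for i in range(n) for j in range(n))

def edge_penalty(beta, edges):
    """The same penalty written edge by edge: sum of w * (beta_i - beta_j) ** 2."""
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)

# Hypothetical network over 3 genes: 0-1 (weight 1.0) and 1-2 (weight 2.0)
edges = [(0, 1, 1.0), (1, 2, 2.0)]
beta = [1.0, 3.0, 0.0]
print(laplacian_penalty(beta, laplacian(3, edges)), edge_penalty(beta, edges))
```

Adding lambda times this penalty to a loss function pulls the coefficients of linked genes toward each other, so connected modules tend to be selected together.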
The Role Of Citizens In Detecting And Responding To A Rapid Marine Invasion, Steven B. Scyphers, Sean P. Powers, J. Lad Akins, J. Marcus Drymon, Charles W. Martin, Zeb H. Schobernd, Pamela J. Schofield, Robert L. Shipp, Theodore S. Switzer
University Faculty and Staff Publications
Documenting and responding to species invasions requires innovative strategies that account for ecological and societal complexities. We used the recent expansion of Indo-Pacific lionfish (Pterois volitans/miles) throughout northern Gulf of Mexico coastal waters to evaluate the role of stakeholders in documenting and responding to a rapid marine invasion. We coupled an online survey of spearfishers and citizen science monitoring programs with traditional fishery-independent data sources and found that citizen observations documented lionfish 1–2 years earlier and more frequently than traditional reef fish monitoring programs. Citizen observations first documented lionfish in 2010, followed by rapid expansion and proliferation in …
Deciphering The Associations Between Gene Expression And Copy Number Alteration Using A Sparse Double Laplacian Shrinkage Approach, Shuangge Ma
Shuangge Ma
Both gene expression levels (GEs) and copy number alterations (CNAs) have important implications in the development of complex diseases. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The expression of a gene can be regulated by multiple CNAs, and one CNA can regulate the expression of multiple genes. In addition, multiple GEs (CNAs) can be correlated with each other. The existing methods for associating GEs with CNAs have limitations in deciphering the complex data structures. In this study, we develop a sparse double Laplacian shrinkage approach. It jointly models the effects of …
A Penalized Robust Semiparametric Approach For Gene-Environment Interactions, Shuangge Ma
Shuangge Ma
In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some existing G×E interaction methods are limited because they analyze a small number of G factors at a time, assume linear effects of E factors, assume no data contamination, and adopt ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model (PLVCM) is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to …
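The excerpt breaks off before specifying the rank-based loss. The Python sketch therefore assumes one common Wilcoxon/Gehan-type form, the mean of |e_i - e_j| over all pairs of residuals. Its purpose is to show why such a loss is more robust to contaminated observations than squared error. The function names and residual values are hypothetical, and the penalized PLVCM fit is not shown. Because the loss depends only on residual differences, it is invariant to a constant shift, so an intercept is not identified by it and must be handled separately.

```python
from itertools import combinations

def rank_loss(residuals):
    """Assumed Wilcoxon/Gehan-type rank-based loss: mean |e_i - e_j| over all pairs."""
    pairs = list(combinations(residuals, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def squared_loss(residuals):
    """Mean squared residual, for comparison."""
    return sum(e * e for e in residuals) / len(residuals)

# Hypothetical residuals y_i - f(x_i), before and after adding one contaminated observation
clean = [0.0, 1.0, -1.0]
contaminated = clean + [10.0]
print("rank loss grows by", rank_loss(contaminated) / rank_loss(clean))
print("squared loss grows by", squared_loss(contaminated) / squared_loss(clean))
```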