Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

3,352 Full-Text Articles 5,650 Authors 870,759 Downloads 182 Institutions

All Articles in Bioinformatics

An Incremental Phylogenetic Tree Algorithm Based On Repeated Insertions Of Species, Peter Revesz, Zhiqiang Li 2015 University of Nebraska-Lincoln

CSE Conference and Workshop Papers

In this paper, we introduce a new phylogenetic tree algorithm that generates phylogenetic trees by repeatedly inserting species one-by-one. The incremental phylogenetic tree algorithm can work on proteins or DNA sequences. Computer experiments show that the new algorithm is better than the commonly used UPGMA and Neighbor Joining algorithms.


Dispredict: A Predictor Of Disordered Protein Using Optimized Rbf Kernel, Sumaiya Iqbal, Md Tamjidul Hoque 2015 University of New Orleans

Computer Science Faculty Publications

Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction, and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with an RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross-validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test errors minimal. The use …


Social Health Signals, Ashutosh Sopan Jadhav, Swapnil Soni, Amit P. Sheth 2015 Wright State University - Main Campus

Kno.e.sis Publications

Recently, Twitter has emerged as one of the primary mediums for sharing and seeking the latest information related to a variety of topics, including health information. Although Twitter is an excellent information source, identifying useful information in the deluge of tweets is one of the major challenges. Twitter search is limited to keyword-based techniques to retrieve information for a given query, and sometimes the results do not contain real-time information. Moreover, …


Big Data Proteogenomics And High Performance Computing: Challenges And Opportunities, Fahad Saeed 2015 Western Michigan University

Parallel Computing and Data Science Lab Technical Reports

Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted with a data deluge, which creates problems of storage, transfer, analysis, and visualization. Integrating these big data sets (NGS+MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics datasets are inadequate for big data, and high performance computing (HPC) solutions are almost non-existent. The purpose of this …


Nhash: Randomized N-Gram Hashing For Distributed Generation Of Validatable Unique Study Identifiers In Multicenter Research, Guo-Qiang Zhang, Shiqiang Tao, Guangming Xing, Jeno Mozes, Bilal Zonjy, Samden D. Lhatoo, Licong Cui 2015 University of Kentucky

Institute for Biomedical Informatics Faculty Publications

BACKGROUND: A unique study identifier serves as a key for linking research data about a study subject without revealing protected health information in the identifier. While sufficient for single-site and limited-scale studies, the use of common unique study identifiers has several drawbacks for large multicenter studies, where thousands of research participants may be recruited from multiple sites. An important property of study identifiers is error tolerance (or validatability), in that inadvertent editing mistakes during their transmission and use will most likely result in invalid study identifiers.

OBJECTIVE: This paper introduces a novel method called "Randomized N-gram Hashing (NHash)," …


Ezdi's Semantics-Enhanced Linguistic, Nlp, And Ml Approach For Health Informatics, Raxit Goswami, Neil Shah, Amit P. Sheth 2015 Wright State University - Main Campus

Kno.e.sis Publications

ezDI uses a large and extensive knowledge graph to enhance linguistics, NLP, and ML techniques to improve structured data extraction from millions of EMR records. It then normalizes the extracted data and maps it to various computer-processable nomenclatures such as SNOMED-CT, RxNorm, ICD-9, ICD-10, CPT, and LOINC. Furthermore, it applies advanced reasoning that exploits domain-specific and hierarchical relationships among entities in the knowledge graph to make the data actionable. These capabilities are part of its highly scalable, AWS-deployed health intelligence platform that supports healthcare informatics applications, including Computer Assisted Coding (CAC), Computerized Document Improvement (CDI), compliance and audit, and core measures and …


Efficient Algorithms For Prokaryotic Whole Genome Assembly And Finishing, Abhishek Biswas 2015 Old Dominion University

Computer Science Theses & Dissertations

De novo genome assembly from DNA fragments is primarily based on sequence overlap information. In addition, mate-pair reads or paired-end reads provide linking information for joining gaps and bridging repeat regions. Genome assemblers in general assemble long contiguous sequences (contigs) using both overlapping reads and linked reads until the assembly runs into an ambiguous repeat region. These contigs are further bridged into scaffolds using linked read information. However, errors can be made in both phases of assembly due to a high error threshold for overlap acceptance and linking based on too few mate reads. Identical as well as similar repeat regions can …


Evolution Of Mobile Promoters In Prokaryotic Genomes., Mahnaz Rabbani 2015 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Mobile genetic elements are important factors in evolution, and greatly influence the structure of genomes, facilitating the development of new adaptive characteristics. The dynamics of these mobile elements can be described using various mathematical and statistical models. In this thesis, we focus on a specific category of mobile genetic elements, i.e. mobile promoters, which are mobile regions of DNA that initiate the transcription of genes. We present a class of mathematical models for the evolution of mobile promoters in prokaryotic genomes, based on data obtained from available sequenced genomes. Our novel location-based model incorporates two biologically meaningful regions of the …


Characterizing Migratory Signaling Pathways Of Transplantable Retinal Progenitor Cells And Photoreceptor Precursor Cells Toward Restoration Of Degenerative Retina - A Systems Biology Approach, Uchenna John Unachukwu 2015 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

A common feature of several heterogeneous diseases that result in retinal degeneration (RD) is photoreceptor loss, leading to an irreversible decline in visual function [15-17]. There are no cell replacement treatments available for RD diseases such as age-related macular degeneration (AMD) and retinitis pigmentosa (RP). Although many RD cases are of a genetic origin, a promising strategy to treat diseased phenotypes is by replacing lost photoreceptor cells, for synaptic integration and restoration of visual function. To advance photoreceptor-replacement strategies as a practical therapy, in light of highly restricted integration rates reported across studies, this body of research focused on defining …


Population Genomics Of White-Footed Mice (Peromyscus Leucopus) In New York City, Stephen Edward Harris 2015 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Urbanization significantly alters natural ecosystems. New York City (NYC) is one of the oldest and most urbanized cities in North America, but still maintains substantial populations of some native wildlife. The white-footed mouse, Peromyscus leucopus, is a common resident of NYC's forest fragments, and isolated populations may adapt in response to novel urban ecosystems. Using pooled transcriptome-wide RNAseq data, individually barcoded transcriptome-wide RNAseq data, and genome-wide RADseq data, I found genetic differentiation between urban and rural P. leucopus populations and evidence suggestive of local adaptation. I compared genome and transcriptome-wide SNP data in P. leucopus from relatively large urban …


A Machine Learning Approach To Post-Market Surveillance Of Medical Devices, Jonathan Bates, Shu-Xia Li, Craig Parzynski, Ronald Coifman, Harlan Krumholz, Joseph Ross 2015 Yale University

Yale Day of Data

Post-market surveillance is a collection of processes and activities used by product manufacturers and regulators, such as the U.S. Food and Drug Administration (FDA), to monitor the safety and effectiveness of medical devices once they are available for use “on the market”. These activities are designed to generate information to identify poorly performing devices and other safety problems, accurately characterize real-world device performance and clinical outcomes, and facilitate the development of new devices or new uses for existing devices. Typically, a device is monitored by comparing adverse events in the exposed population to those in a matched unexposed population. This research considers …


K-Mer Analysis On Developmental And Housekeeping Enhancer Peaks, Yunsi Yang, Anurag Sethi, Mark Gerstein 2015 Yale University

Yale Day of Data

The regulation of gene expression involves interaction between transcriptional enhancers and core promoters. However, the separation between developmental and housekeeping gene regulation remains unknown. Here, we present a method to detect whether different core promoters exhibit specificity to certain enhancers within massively parallel assays for enhancer detection. We use k-mers of various lengths (3-8 bp) as sequence features and compare k-mer frequencies between developmental and housekeeping enhancers. This method shows promoter specificity of enhancers in D. melanogaster.
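
The k-mer frequency comparison described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `kmer_frequencies` and `frequency_differences`, the toy sequences, and the choice of a plain frequency difference as the comparison are all assumptions made for illustration; the abstract does not say which statistic was used.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer observed across `sequences`.

    Hypothetical helper: windows containing anything other than
    A, C, G, or T (for example N) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}

def frequency_differences(group_a, group_b, k):
    """Per-k-mer frequency in group_a minus frequency in group_b.

    A plain difference is an assumed stand-in for whatever
    comparison the authors actually applied.
    """
    fa = kmer_frequencies(group_a, k)
    fb = kmer_frequencies(group_b, k)
    return {kmer: fa.get(kmer, 0.0) - fb.get(kmer, 0.0)
            for kmer in set(fa) | set(fb)}

if __name__ == "__main__":
    # Toy sequences, not real enhancer data.
    developmental = ["ACGTACGT", "TTACGTAA"]
    housekeeping = ["GGGCGCGC", "CGCGGGAA"]
    for k in range(3, 9):  # k-mer lengths 3-8 bp, as in the abstract
        diffs = frequency_differences(developmental, housekeeping, k)
        top = max(diffs, key=lambda kmer: abs(diffs[kmer]))
        print(k, top, round(diffs[top], 3))
```

In practice, a comparison like this would be followed by a significance test over many enhancer sequences; the sketch only shows how the sequence features are built.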


A Unified Framework For The Prioritization Of Variants Of Uncertain Significance In Hereditary Breast And Ovarian Cancer Patients, Natasha G. Caminsky 2015 Western University

Electronic Thesis and Dissertation Repository

A significant proportion of hereditary breast and ovarian cancer (HBOC) patients receive uninformative genetic testing results, an issue exacerbated by the overwhelming quantity of variants of uncertain significance identified. This thesis describes a framework where, aside from protein coding changes, information theory (IT)-based sequence analysis identifies and prioritizes pathogenic variants occurring within sequence elements predicted to be recognized by proteins involved in mRNA splicing, transcription, and untranslated region binding and structure. To support the utilization of IT analysis, we established IT-based variant interpretation accuracy by performing a comprehensive review of mutations altering mRNA splicing in rare and common diseases.

Custom …


A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, GTEx Consortium, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im 2015 Loyola University Chicago

Bioinformatics Faculty Publications

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …


Automatic Emotion Identification From Text, Wenbo Wang 2015 Wright State University - Main Campus

Kno.e.sis Publications

Emotions are both prevalent in and essential to most aspects of our lives. They influence our decision-making, affect our social relationships and shape our daily behavior. With the rapid growth of emotion-rich textual content, such as microblog posts, blog posts, and forum discussions, there is a growing need to develop algorithms and techniques for identifying people’s emotions expressed in text. It has valuable implications for the studies of suicide prevention, employee productivity, well-being of people, customer relationship management, etc. However, emotion identification is quite challenging partly due to the following reasons: i) It is a multi-class classification problem that …


Short Peptides In Minimalistic Biocatalyst Design, Krystyna L. Duncan, Rein V. Ulijn 2015 University of Strathclyde

Publications and Research

We review recent developments in the use of short peptides in the design of minimalistic biocatalysts, focusing on ester hydrolysis. A number of designed peptide nanostructures are shown to have (modest) catalytic activity. Five features are discussed and illustrated by literature examples, including primary peptide sequence, nanosurfaces/scaffolds, binding pockets, multivalency and the presence of metal ions. Some of these are derived from natural enzymes, but others, such as multivalency of active sites on designed nanofibers, may give rise to new features not found in natural enzymes. Remarkably, it is shown that each of these design features gives rise to similar …


Inferring Plastid Metabolic Pathways Within The Nonphotosynthetic Free-Living Green Algal Genus Polytomella, Sara Asmail 2015 The University of Western Ontario

Electronic Thesis and Dissertation Repository

The advent of photosynthesis facilitated the evolution of aerobic life on Earth. However, species such as Prototheca wickerhamii and Plasmodium falciparum, among many others, have lost photosynthesis and opted for a free-living/parasitic lifestyle. Despite this loss, these species have retained the plastid for its metabolic pathways, without which they would die. Polytomella is a nonphotosynthetic free-living alga, closely related to the photosynthetic model organism Chlamydomonas reinhardtii, and has been shown to lack a plastid genome. I set out to determine Polytomella plastid metabolic pathways using bioinformatics to look for mRNA and DNA homologous sequences matching pathway enzymes in model organisms. …


Found And Lost: The Fates Of Horizontally Acquired Genes In Arthropod-Symbiotic Spiroplasma, Wen-Sui Lo, Gail E. Gasparich, Chih-Horng Kuo 2015 Academia Sinica

Gail Gasparich

Horizontal gene transfer (HGT) is an important mechanism that has contributed to biological diversity, particularly in bacteria. Through acquisition of novel genes, the recipient cell may change its ecological preference, and the process could promote speciation. In this study, we determined the complete genome sequences of two Spiroplasma species for comparative analyses and inferred the putative gene gains and losses. Although most Spiroplasma species are symbionts of terrestrial insects, Spiroplasma eriocheiris has evolved to be a lethal pathogen of freshwater crustaceans. We found that approximately 7% of the genes in this genome may have originated from HGT and these genes expanded …


An Exploration Of The Phylogenetic Placement Of Recently Discovered Ultrasmall Archaeal Lineages, Jeffrey M. O'Brien 2015 University of Connecticut - Storrs

Honors Scholar Theses

In recent years, several new clades within the domain Archaea have been discovered. This is due in part to microbiological sampling of novel environments and the increasing ability to detect and sequence uncultivable organisms through metagenomic analysis. These organisms share certain features, such as small cell size and streamlined genomes. Reduction in genome size can present difficulties to phylogenetic reconstruction programs. Since there is less genetic data to work with, these organisms often have missing genes in concatenated multiple sequence alignments. Evolutionary biologists have not reached a consensus on the placement of these lineages in the archaeal evolutionary tree. There …


Algorithms For Peptide Identification From Mixture Tandem Mass Spectra, Yi Liu 2015 The University of Western Ontario

Electronic Thesis and Dissertation Repository

The large amount of data collected in a mass spectrometry experiment requires effective computational approaches for automated analysis. Though the proteomics community has conducted extensive research for this purpose, challenges remain; one in particular is that the identification rate of the MS/MS spectra collected is rather low. One significant reason for this is the frequently observed mixture spectra, which result from the concurrent fragmentation of multiple precursors in a single MS/MS spectrum. However, nearly all the mainstream computational methods still assume that the acquired …

