Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Bioinformatics

2015

Articles 1 - 30 of 76

Full-Text Articles in Physical Sciences and Mathematics

Apply Data Clustering To Gene Expression Data, Abdullah Jameel Abualhamayl Mr. Dec 2015

Electronic Theses, Projects, and Dissertations

Data clustering plays an important role in effective analysis of gene expression. Although DNA microarray technology facilitates expression monitoring, several challenges arise when dealing with gene expression datasets. Some of these challenges are the enormous number of genes, the dimensionality of the data, and the change of data over time. The genetic groups which are biologically interlinked can be identified through clustering. This project aims to clarify the steps to apply clustering analysis of genes involved in a published dataset. The methodology for this project includes the selection of the dataset representation, the selection of gene datasets, Similarity Matrix Selection, …


Intent Classification Of Short-Text On Social Media, Hemant Purohit, Guozhu Dong, Valerie L. Shalin, Krishnaprasad Thirunarayan, Amit P. Sheth Dec 2015

Kno.e.sis Publications

Social media platforms facilitate the emergence of citizen communities that discuss real-world events. Their content reflects a variety of intent ranging from social good (e.g., volunteering to help) to commercial interest (e.g., criticizing product features). Hence, mining intent from social data can aid in filtering social media to support organizations, such as an emergency management unit for resource planning. However, effective intent mining is inherently challenging due to ambiguity in interpretation, and sparsity of relevant behaviors in social data. In this paper, we address the problem of multiclass classification of intent with a use-case of social data generated during crisis …


Rare Occurrences Of Free-Living Bacteria Belonging To Sedimenticola From Subtidal Seagrass Beds Associated With The Lucinid Clam, Stewartia Floridana, Aaron M. Goemann Dec 2015

Masters Theses

Lucinid clams and their sulfur-oxidizing endosymbionts comprise two compartments of a three-stage, biogeochemical relationship among the clams, seagrasses, and microbial communities in marine sediments. A population of the lucinid clam, Stewartia floridana, was sampled from a subtidal seagrass bed at Bokeelia Island Seaport in Florida to test the hypotheses: (1) S. floridana, like other lucinids, are more abundant in seagrass beds than bare sediments; (2) S. floridana gill microbiomes are dominated by one bacterial operational taxonomic unit (OTU) at a sequence similarity threshold level of 97% (a common cutoff for species level taxonomy) from 16S rRNA genes; …


Computational Methods For Biomarker Identification In Complex Disease, Amin Ahmadi Adl Nov 2015

USF Tampa Graduate Theses and Dissertations

In a modern systematic view of biology, cell functions arise from the interaction between molecular components. One of the challenging problems in systems biology with high-throughput measurements is discovering the important components involved in the development and progression of complex diseases, which may serve as biomarkers for accurate predictive modeling and as targets for therapeutic purposes. Due to the non-linearity and heterogeneity of these complex diseases, traditional biomarker identification approaches have had limited success at finding clinically useful biomarkers. In this dissertation we propose novel methods for biomarker identification that explicitly take into account the non-linearity and heterogeneity of complex …


Robust Modeling And Predictions Of Greenhouse Gas Fluxes From Forest And Wetland Ecosystems, Khandker S. Ishtiaq Nov 2015

FIU Electronic Theses and Dissertations

The land-atmospheric exchanges of carbon dioxide (CO2) and methane (CH4) are major drivers of global warming and climatic changes. The greenhouse gas (GHG) fluxes indicate the dynamics and potential storage of carbon in terrestrial and wetland ecosystems. Appropriate modeling and prediction tools can provide a quantitative understanding and valuable insights into the ecosystem carbon dynamics, while aiding the development of engineering and management strategies to limit emissions of GHGs and enhance carbon sequestration. This dissertation focuses on the development of data-analytics tools and engineering models by employing a range of empirical and semi-mechanistic approaches to robustly …


Mutations Of Adjacent Amino Acid Pairs Are Not Always Independent, Jyotsna Ramanan, Peter Revesz Oct 2015

CSE Conference and Workshop Papers

Evolutionary studies usually assume that the genetic mutations are independent of each other. This paper tests the independence hypothesis for genetic mutations with regard to protein coding regions. According to the new experimental results the independence assumption generally holds, but there are certain exceptions. In particular, the coding regions that represent two adjacent amino acids seem to change in ways that sometimes deviate significantly from the expected theoretical probability under the independence assumption.
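
The comparison described in this abstract, observed change versus the probability expected under independence, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: it works on aligned amino-acid strings rather than coding regions, it uses a single uniform per-position mutation rate, and the function name and toy sequences are hypothetical.

```python
def adjacent_mutation_rates(ancestral, derived):
    """Compare observed and expected rates of adjacent double mutations.

    Returns (p_single, observed_pair_rate, expected_pair_rate), where the
    expected rate is p_single squared, i.e. what independence would predict
    if every position mutated with the same probability.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    if len(ancestral) < 2:
        raise ValueError("need at least two positions to form a pair")

    # True where the residue differs between the two aligned sequences.
    changed = [a != b for a, b in zip(ancestral, derived)]
    n = len(changed)
    p_single = sum(changed) / n

    # Fraction of adjacent position pairs in which both residues changed.
    pairs = n - 1
    observed = sum(changed[i] and changed[i + 1] for i in range(pairs)) / pairs
    expected = p_single ** 2
    return p_single, observed, expected


# Toy aligned sequences (hypothetical): two adjacent residues differ.
p_single, observed, expected = adjacent_mutation_rates("MKTAYIAKQR", "MKSGYIAKQR")
```

An observed pair rate well above the expected one would hint at dependence between neighbouring positions, although a real analysis would need many sequences and a significance test.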


Implicit Information Extraction From Clinical Notes, Sujan Perera Oct 2015

Kno.e.sis Publications

We address the problem of extracting implicit information from unstructured clinical notes. Here we introduce the problem of 'implicit entity recognition in clinical notes', propose a knowledge-driven approach to address this problem, and demonstrate the results of our initial experiments.


Feedback-Driven Radiology Exam Report Retrieval With Semantics, Sarasi Lalithsena, Luis Tari, Anna Von Reden, Benjamin Wilson, Brian J. Kolowitz, John Kalafut, Steven Gustafson, Amit P. Sheth Oct 2015

Kno.e.sis Publications

Clinical documents are vital resources for radiologists to have a better understanding of patient history. The use of clinical documents can complement the often brief reasons for exams that are provided by physicians in order to perform more informed diagnoses. With the large number of study exams that radiologists have to perform on a daily basis, it becomes too time-consuming for radiologists to sift through each patient's clinical documents. It is therefore important to provide a capability that can present contextually relevant clinical documents, and at the same time satisfy the diverse information needs among radiologists from different specialties. In …


An Incremental Phylogenetic Tree Algorithm Based On Repeated Insertions Of Species, Peter Revesz, Zhiqiang Li Oct 2015

CSE Conference and Workshop Papers

In this paper, we introduce a new phylogenetic tree algorithm that generates phylogenetic trees by repeatedly inserting species one-by-one. The incremental phylogenetic tree algorithm can work on proteins or DNA sequences. Computer experiments show that the new algorithm is better than the commonly used UPGMA and Neighbor Joining algorithms.


Social Health Signals, Ashutosh Sopan Jadhav, Swapnil Soni, Amit P. Sheth Oct 2015

Kno.e.sis Publications

Recently, Twitter has emerged as one of the primary mediums for sharing and seeking the latest information related to a variety of topics, including health information. Although Twitter is an excellent information source, identification of useful information from the deluge of tweets is one of the major challenges. Twitter search is limited to keyword-based techniques to retrieve information for a given query and sometimes the results do not contain real-time information. Moreover, …


ezDI's Semantics-Enhanced Linguistic, NLP, And ML Approach For Health Informatics, Raxit Goswami, Neil Shah, Amit P. Sheth Oct 2015

Kno.e.sis Publications

ezDI uses a large and extensive knowledge graph to enhance linguistics, NLP and ML techniques to improve structured data extraction from millions of EMR records. It then normalizes the data and maps it to various computer-processable nomenclatures such as SNOMED-CT, RxNorm, ICD-9, ICD-10, CPT, and LOINC. Furthermore, it applies advanced reasoning that exploits domain-specific and hierarchical relationships among entities in the knowledge graph to make the data actionable. These capabilities are part of its highly scalable AWS-deployed health intelligence platform that supports healthcare informatics applications, including Computer Assisted Coding (CAC), Computerized Document Improvement (CDI), compliance and audit, and core measures and …


Evolution Of Mobile Promoters In Prokaryotic Genomes, Mahnaz Rabbani Oct 2015

Electronic Thesis and Dissertation Repository

Mobile genetic elements are important factors in evolution, and greatly influence the structure of genomes, facilitating the development of new adaptive characteristics. The dynamics of these mobile elements can be described using various mathematical and statistical models. In this thesis, we focus on a specific category of mobile genetic elements, i.e. mobile promoters, which are mobile regions of DNA that initiate the transcription of genes. We present a class of mathematical models for the evolution of mobile promoters in prokaryotic genomes, based on data obtained from available sequenced genomes. Our novel location-based model incorporates two biologically meaningful regions of the …


Efficient Algorithms For Prokaryotic Whole Genome Assembly And Finishing, Abhishek Biswas Oct 2015

Computer Science Theses & Dissertations

De novo genome assembly from DNA fragments is primarily based on sequence overlap information. In addition, mate-pair reads or paired-end reads provide linking information for joining gaps and bridging repeat regions. Genome assemblers in general assemble long contiguous sequences (contigs) using both overlapping reads and linked reads until the assembly runs into an ambiguous repeat region. These contigs are further bridged into scaffolds using linked read information. However, errors can be made in both phases of assembly due to a high error threshold for overlap acceptance and linking based on too few mate reads. Identical as well as similar repeat regions can …


A Machine Learning Approach To Post-Market Surveillance Of Medical Devices, Jonathan Bates, Shu-Xia Li, Craig Parzynski, Ronald Coifman, Harlan Krumholz, Joseph Ross Sep 2015

Yale Day of Data

Post-market surveillance is a collection of processes and activities used by product manufacturers and regulators, such as the U.S. Food and Drug Administration (FDA), to monitor the safety and effectiveness of medical devices once they are available for use “on the market”. These activities are designed to generate information to identify poorly performing devices and other safety problems, accurately characterize real-world device performance and clinical outcomes, and facilitate the development of new devices or new uses for existing devices. Typically, a device is monitored by comparing adverse events in the exposed population to a matched unexposed population. This research considers …


K-Mer Analysis On Developmental And Housekeeping Enhancer Peaks, Yunsi Yang, Anurag Sethi, Mark Gerstein Sep 2015

Yale Day of Data

The regulation of gene expression involves interaction between transcriptional enhancers and core promoters. However, the separation between developmental and housekeeping gene regulation remains unknown. Here, we present a method to detect whether different core promoters exhibit specificity to certain enhancers within massively parallel assays for enhancer detection. We use k-mers of various lengths (3-8 bp) as sequence features and compare k-mer frequencies between developmental and housekeeping enhancers. This method shows promoter specificity of enhancers in D. melanogaster.
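
The k-mer comparison described here can be sketched roughly as follows. This is an illustrative example and not the authors' pipeline: the function names, the toy sequences, and the choice to rank k-mers by a simple difference in relative frequency are all assumptions.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of each k-mer across a set of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_differences(freq_a, freq_b):
    """Frequency in set A minus frequency in set B, for k-mers seen in either."""
    return {
        kmer: freq_a.get(kmer, 0.0) - freq_b.get(kmer, 0.0)
        for kmer in set(freq_a) | set(freq_b)
    }


# Toy enhancer sequences (hypothetical, not real data).
developmental = ["ACGTACGT", "TTACGTAA"]
housekeeping = ["GGGCGGGC", "CCGGGCGG"]

dev = kmer_frequencies(developmental, 3)
hk = kmer_frequencies(housekeeping, 3)
diff = frequency_differences(dev, hk)

# k-mers most enriched in the developmental set relative to housekeeping.
enriched = sorted(diff, key=diff.get, reverse=True)[:3]
```

Repeating the calculation for each k from 3 to 8 would cover the range of lengths mentioned in the abstract.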


A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, GTEx Consortium, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im Sep 2015

Bioinformatics Faculty Publications

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …
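
The two steps named in this abstract, estimating genetically determined expression and correlating it with a phenotype, can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the PrediXcan software: the SNP identifiers, weights, dosages, and phenotype values are hypothetical, and a plain Pearson correlation stands in for whatever association test the method actually uses.

```python
from math import sqrt


def impute_expression(dosages, weights):
    """Predicted expression per individual: weighted sum of allele dosages.

    `weights` maps SNP id -> effect weight (assumed to come from a model
    trained on a reference transcriptome); `dosages` is a list of dicts
    mapping SNP id -> allele dosage (0, 1, or 2) for each individual.
    """
    return [sum(weights[snp] * person[snp] for snp in weights) for person in dosages]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical prediction weights for one gene in one tissue.
weights = {"snp_a": 0.5, "snp_b": -0.2}

# Hypothetical genotype dosages and phenotype values for four individuals.
dosages = [
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 1},
    {"snp_a": 2, "snp_b": 0},
    {"snp_a": 1, "snp_b": 0},
]
phenotype = [1.0, 2.1, 3.2, 2.4]

imputed = impute_expression(dosages, weights)
r = pearson(imputed, phenotype)
```

A gene whose imputed expression correlates strongly with the phenotype across individuals would be a candidate for follow-up; a full analysis would repeat this per gene and correct for multiple testing.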


Automatic Emotion Identification From Text, Wenbo Wang Sep 2015

Kno.e.sis Publications

Emotions are both prevalent in and essential to most aspects of our lives. They influence our decision-making, affect our social relationships and shape our daily behavior. With the rapid growth of emotion-rich textual content, such as microblog posts, blog posts, and forum discussions, there is a growing need to develop algorithms and techniques for identifying people’s emotions expressed in text. It has valuable implications for the studies of suicide prevention, employee productivity, well-being of people, customer relationship management, etc. However, emotion identification is quite challenging partly due to the following reasons: i) It is a multi-class classification problem that …


Algorithms For Peptide Identification From Mixture Tandem Mass Spectra, Yi Liu Aug 2015

Electronic Thesis and Dissertation Repository

The large amount of data collected in a mass spectrometry experiment requires effective computational approaches for automated analysis. Though the proteomics community has conducted extensive research for this purpose, challenges remain; one in particular is that the identification rate of the collected MS/MS spectra is rather low. One significant reason for this is the frequently observed mixture spectra, which result from the concurrent fragmentation of multiple precursors in a single MS/MS spectrum. However, nearly all the mainstream computational methods still assume that the acquired …


Improving The Computer Science In Bioinformatics Through Open Source Pedagogy, John David N. Dionisio, Kam D. Dahlquist Aug 2015

John David N. Dionisio

Bioinformatics relies more than ever on information technologies. This pressures scientists to keep up with software development best practices. However, traditional computer science curricula do not necessarily expose students to collaborative and long-lived software development. Using open source principles, practices, and tools forms an effective pedagogy for software development best practices. This paper reports on a bioinformatics teaching framework implemented through courses introducing computer science students to the field. The courses led to an initial product release consisting of software and an Escherichia coli K12 GenMAPP Gene Database, within a total "incubation time" of six months.


Three Essays On Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing And Text Mining, Euisung Jung Aug 2015

Theses and Dissertations

Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to …


Computational Modeling Of RNA-Small Molecule And RNA-Protein Interactions, Lu Chen Aug 2015

Dissertations & Theses (Open Access)

The past decade has witnessed an era of RNA biology; despite the considerable discoveries to date, challenges remain when one aims to screen for RNA-interacting small molecules or RNA-interacting proteins. These challenges imply an immediate need for cost-efficient yet predictive computational tools capable of generating insightful hypotheses to discover novel RNA-interacting small molecules or RNA-interacting proteins. Thus, we implemented novel computational models in this dissertation to predict RNA-ligand interactions (Chapter 1) and RNA-protein interactions (Chapter 2).

Targeting RNA has not garnered interest comparable to targeting proteins, and it is restricted by a lack of computational tools for structure-based drug design. To test the potential …


Germline Mutation Detection In Next Generation Sequencing Data And TP53 Mutation Carrier Probability Estimation For Li-Fraumeni Syndrome, Gang Peng Aug 2015

Dissertations & Theses (Open Access)

Next generation sequencing technology has been widely used in genomic analysis, but its application has been compromised by missed true variants, especially when these variants are rare. We proposed a family-based variant calling method, FamSeq, integrating Mendelian transmission information with de novo mutation and sequencing data to improve variant calling accuracy. We investigated the factors affecting the improvement from family-based variant calling in simulated data and validated it in real sequencing data. In both simulated and real data, FamSeq works better than the single-individual-based method.

In FamSeq, we implemented four different methods for the Mendelian genetic …


Domain Specific Document Retrieval Framework For Real-Time Social Health Data, Swapnil Soni Jul 2015

Kno.e.sis Publications

With the advent of web search and microblogging, the percentage of Online Health Information Seekers (OHIS) using these online services to share and seek real-time health information has increased exponentially. OHIS use web search engines or microblogging search services to seek out the latest, relevant, and reliable health information. When OHIS turn to microblogging search services to search real-time content, trends, breaking news, etc., the search results are not promising. Two major challenges in current microblogging search engines are their reliance on keyword-based techniques and results that do not contain real-time information. To address these challenges, …


Scalable Euclidean Embedding For Big Data, Zohreh S. Alavi, Sagar Sharma, Lu Zhou, Keke Chen Jul 2015

Kno.e.sis Publications

Euclidean embedding algorithms transform data defined in an arbitrary metric space to the Euclidean space, which is critical to many visualization techniques. At big-data scale, these algorithms need to be scalable to massive data-parallel infrastructures. Designing such scalable algorithms and understanding the factors affecting the algorithms are important research problems for visually analyzing big data. We propose a framework that extends the existing Euclidean embedding algorithms to scalable ones. Specifically, it decomposes an existing algorithm into naturally parallel components and non-parallelizable components. Then, data-parallel implementations such as MapReduce and data reduction techniques are applied to the two categories of …


Evaluating A Potential Commercial Tool For Healthcare Application For People With Dementia, Tanvi Banerjee, Pramod Anantharam, William L. Romine, Larry Wayne Lawhorne Jul 2015

Kno.e.sis Publications

The widespread use of smartphones and sensors has made physiology, environment, and public health notifications amenable to continuous monitoring. Personalized digital health and patient empowerment can become a reality only if the complex multisensory and multimodal data is processed within the patient context, converting relevant medical knowledge into actionable information for better and timely decisions. We apply these principles in the healthcare domain of dementia. Specifically, in this study we validate one of our sensor platforms to ascertain whether it will be suitable for detecting physiological changes that may help us detect changes in people with dementia. This study shows …


Fast, Accurate, And Reliable Molecular Docking With QuickVina 2, Amr Alhossary, Stephanus Daniel Handoko, Yuguang Mu, Chee-Keong Kwoh Jul 2015

Research Collection School Of Computing and Information Systems

Motivation: The need for efficient molecular docking tools for high-throughput screening is growing alongside the rapid growth of drug-fragment databases. AutoDock Vina ('Vina') is a widely used docking tool with parallelization for speed. QuickVina ('QVina 1') then further enhanced the speed via a heuristic, requiring high exhaustiveness. With low exhaustiveness, its accuracy was compromised. We present in this article the latest version of QuickVina ('QVina 2') that inherits both the speed of QVina 1 and the reliability of the original Vina. Results: We tested the efficacy of QVina 2 on the core set of PDBbind 2014. With the default exhaustiveness level …


"Time For Dabs": Analyzing Twitter Data On Butane Hash Oil Use, Raminta Daniulaityte, Robert G. Carlson, Farahnaz Golroo, Sanjaya Wijeratne, Edward W. Boyer, Silvia S. Martins, Ramzi W. Nahhas, Amit P. Sheth Jun 2015

Kno.e.sis Publications

No abstract provided.


Trust Management: Multimodal Data Perspective, Krishnaprasad Thirunarayan Jun 2015

Kno.e.sis Publications

No abstract provided.


Identifying Modifier Genes In SMA Model Mice, Weiting Xu May 2015

Theses

Spinal Muscular Atrophy (SMA) involves the loss of nerve cells called motor neurons in the spinal cord and is classified as a motor neuron disease. It affects 1 in 5,000-10,000 newborns and is one of the leading genetic causes of infant death in the USA. Mutations in the SMN1, UBA1, DYNC1H1 and VAPB genes cause spinal muscular atrophy. Extra copies of the SMN2 gene modify the severity of spinal muscular atrophy. Mutations in SMN1 (survival motor neuron 1) are the main cause of SMA, which follows autosomal recessive inheritance. SMN1 gene mutations lead to a shortage of the SMN protein, and the SMN protein forms the SMN complex …


Entity Recommendations Using Hierarchical Knowledge Bases, Siva Kumar Cheekula, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit P. Sheth May 2015

Kno.e.sis Publications

Recent developments in recommendation algorithms have focused on integrating Linked Open Data to augment traditional algorithms with background knowledge. These developments recognize that the integration of Linked Open Data may offer better performance, particularly in cold start cases. In this paper, we explore if and how a specific type of Linked Open Data, namely hierarchical knowledge, may be utilized for recommendation systems. We propose a content-based recommendation approach that adapts a spreading activation algorithm over the DBpedia category structure to identify entities of interest to the user. Evaluation of the algorithm over the Movielens dataset demonstrates that our method yields …