Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

2013

Articles 1 - 30 of 131

Full-Text Articles in Bioinformatics

Statistical And Comparative Phylogeography Of Mexican Freshwater Taxa In Extreme Aquatic Environments, Lyndon M. Coghill Dec 2013

University of New Orleans Theses and Dissertations

Phylogeography aims to understand the processes that underlie the distribution of genetic variation within and among closely related species. Although the means by which this goal might be achieved differ considerably from those that spawned the field some thirty years ago, the foundation and conceptual breakthroughs made by Avise are nonetheless the same and are as relevant today as they were two decades ago. Namely, patterns of neutral genetic variation among individuals carry the signature of a species’ demographic past, and the spatial and temporal environmental heterogeneity across a species’ geographic range can influence patterns of evolutionary change. Aquatic systems …


Interaction-Based Discovery Of Functionally Important Genes In Cancers, Dario Ghersi, Mona Singh Dec 2013

Interdisciplinary Informatics Faculty Publications

A major challenge in cancer genomics is uncovering genes with an active role in tumorigenesis from a potentially large pool of mutated genes across patient samples. Here we focus on the interactions that proteins make with nucleic acids, small molecules, ions and peptides, and show that residues within proteins that are involved in these interactions are more frequently affected by mutations observed in large-scale cancer genomic data than are other residues. We leverage this observation to predict genes that play a functionally important role in cancers by introducing a computational pipeline (http://canbind.princeton.edu) for mapping large-scale cancer exome data …


Computational Molecular Coevolution, Russell J. Dickson Dec 2013

Electronic Thesis and Dissertation Repository

A major goal in computational biochemistry is to obtain three-dimensional structure information from protein sequence. Coevolution represents a biological mechanism through which structural information can be obtained from a family of protein sequences. Evolutionary relationships within a family of protein sequences are revealed through sequence alignment. Statistical analyses of these sequence alignments reveal positions in the protein family that covary, and thus appear to be dependent on one another throughout the evolution of the protein family. These covarying positions are inferred to be coevolving via one of two biological mechanisms, both of which imply that coevolution is facilitated by inter-residue …
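
As a rough illustration of what "covarying positions" can mean in practice, the sketch below scores two alignment columns with mutual information. The abstract does not name the statistic used in the thesis, so mutual information is an assumption here, chosen only because it is a common covariation measure; the function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2

def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of aligned sequences.

    Illustrative only: the thesis may use a different covariation statistic.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    joint = Counter(pairs)                  # counts of residue pairs (a, b)
    left = Counter(a for a, _ in pairs)     # counts of residues in column i
    right = Counter(b for _, b in pairs)    # counts of residues in column j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi

# Toy alignment (made up): columns 0 and 1 change together, column 2 is invariant.
toy_alignment = ["AKLE", "AKLE", "GRLD", "GRLE"]
covarying_score = column_mutual_information(toy_alignment, 0, 1)
invariant_score = column_mutual_information(toy_alignment, 0, 2)
```

In this toy case the first pair of columns scores higher than the pair that includes the invariant column, which is the kind of signal a covariation analysis looks for.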


Characterizing The Human Vaginal Microbiome Using High-Throughput Sequencing, Jean Megan E. Macklaim Dec 2013

Electronic Thesis and Dissertation Repository

The human vaginal microbiome undoubtedly has a significant role in reproductive health and for protection from infectious organisms. Recent efforts to characterize the bacterial species of the vagina using molecular techniques have uncovered an unexpected diversity. Using high-throughput sequencing I sought to describe the structure and function of the vaginal microbiome under different physiological states including healthy, bacterial vaginosis (BV), post-menopausal vaginal atrophy, and acute vulvovaginal candidiasis (VVC).

Partial 16S rRNA gene sequencing revealed that healthy, asymptomatic women most often have vaginal biotas dominated by Lactobacillus iners or L. crispatus. In contrast, BV is a heterogeneous, highly diversified condition …


Discovering Driver Somatic Mutations, Copy Number Alterations And Methylation Changes Using Markov Chain Monte Carlo, Bokhari Yahya Dec 2013

Theses and Dissertations

We now have a tremendous amount of genetic data that needs to be interpreted. Somatic mutations, copy number variations and methylation are examples of the genetic data we are dealing with. Discovering driver mutations from these combined data types is challenging. Mutations are unpredictable and have broad heterogeneity, which makes our goal hard to accomplish. Many methods have been proposed to solve the mystery of the genetics of cancer. In this project we manipulate the above-mentioned genetic data types and choose to use and modify an existing method utilizing Markov Chain Monte Carlo (MCMC). The method introduced two properties, coverage and exclusivity. …
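
The abstract is cut off before it defines coverage and exclusivity, so the following is only a sketch under assumed definitions: coverage is taken as the fraction of samples with at least one mutated gene in a candidate gene set, and exclusivity as the fraction of those covered samples with exactly one such gene. The data layout, function name, and toy gene sets are hypothetical, and the MCMC search itself is not shown.

```python
def coverage_and_exclusivity(mutations, gene_set):
    """Score a candidate gene set against per-sample mutation calls.

    mutations: dict mapping sample id -> set of mutated genes (assumed layout).
    Returns (coverage, exclusivity) under the assumed definitions above.
    """
    gene_set = set(gene_set)
    # Number of genes from the candidate set that are mutated in each sample.
    hits = [len(genes & gene_set) for genes in mutations.values()]
    covered = sum(1 for k in hits if k >= 1)
    exclusive = sum(1 for k in hits if k == 1)
    n = len(hits)
    coverage = covered / n if n else 0.0
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity

# Toy data (made up): four samples, candidate set of two genes.
toy_mutations = {
    "s1": {"TP53"},
    "s2": {"KRAS"},
    "s3": {"TP53", "KRAS"},
    "s4": {"EGFR"},
}
toy_scores = coverage_and_exclusivity(toy_mutations, {"TP53", "KRAS"})
```

A search procedure such as MCMC would then favor gene sets for which both values are high.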


Impact Of Noise On Molecular Network Inference, Radhakrishnan Nagarajan, Marco Scutari Dec 2013

Biostatistics Faculty Publications

Molecular entities work in concert as a system and mediate phenotypic outcomes and disease states. There has been recent interest in modelling the associations between molecular entities from their observed expression profiles as networks using a battery of algorithms. These networks have proven to be useful abstractions of the underlying pathways and signalling mechanisms. Noise is ubiquitous in molecular data and can have a pronounced effect on the inferred network. Noise can be an outcome of several factors including: inherent stochastic mechanisms at the molecular level, variation in the abundance of molecules, heterogeneity, sensitivity of the biological assay or measurement …


Attributing Meaning To Online Social Network Analysis For Tailored Socio-Behavioral Support Systems, Sahiti Myneni Dec 2013

Dissertations & Theses (Open Access)

Ubiquitous online social networks provide us with a unique opportunity to deliver scalable interventions for the support of lifestyle modifications in order to change behaviors that predispose toward cancer and other diseases. At the same time these networks act as rich data sources to inform our understanding of end-user needs. Traditionally, social network analysis is based on communication frequency among members. In this work, I introduce communication content as a complementary frame for studying these networks.

QuitNet, an online social network developed to provide smoking cessation support is considered for analysis. Qualitative coding, automated content analysis, and network analysis were …


Filter-Based Multiscale Entropy Analysis Of Complex Physiological Time Series, Liang Zhao Dec 2013

Dissertations - ALL

The multiscale entropy (MSE) has been widely and successfully used in analyzing the complexity of physiologic time series. In this thesis, we re-interpret the averaging process in MSE as filtering a time series by a filter of a piecewise constant type. From this viewpoint, we introduce the filter-based multiscale entropy (FME), which filters a time series to generate its multiple frequency components and then computes the blockwise entropy of the resulting components. By choosing filters adapted to the features of a given time series, FME is able to better capture its multiscale information and to provide …
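
The re-interpretation of MSE averaging as filtering can be checked numerically. The sketch below, which assumes NumPy and uses hypothetical function names, compares standard coarse-graining (means of non-overlapping blocks) with a box filter followed by downsampling. It covers only the averaging step; the entropy computation and the specific filters used by FME are not shown and are not implied by this example.

```python
import numpy as np

def coarse_grain(x, scale):
    """Standard MSE coarse-graining: mean of non-overlapping blocks of length `scale`."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // scale
    return x[:n_blocks * scale].reshape(n_blocks, scale).mean(axis=1)

def coarse_grain_as_filter(x, scale):
    """Same values obtained with a piecewise constant (box) filter plus downsampling."""
    x = np.asarray(x, dtype=float)
    box = np.ones(scale) / scale                   # constant filter taps
    smoothed = np.convolve(x, box, mode="valid")   # moving average over `scale` points
    return smoothed[::scale]                       # keep one output per block

signal = np.arange(10.0)
blocks = coarse_grain(signal, 3)
filtered = coarse_grain_as_filter(signal, 3)
```

For this toy signal both routes give the same coarse-grained series, which is the equivalence the thesis builds on before replacing the box filter with other filters.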


Introducing A Novel Method For Genetic Analysis Of Autism Spectrum Disorder, Sepideh Nouri Dec 2013

Dissertations & Theses (Open Access)

Autism is a spectrum of neurological disorders that is characterized by repetitive and stereotyped behaviors, lack of social skills in verbal and non-verbal communications, and intellectual disability. Recent statistics show that 1 out of every 88 children in the US is affected by autism.

In this thesis, I first review previous studies on genetic association analyses of autism spectrum disorder. A large number of these studies fall into two categories: Genome Wide Association Studies (GWAS) and sequencing studies. Although GWAS are able to identify multiple common risk variants associated with different diseases, these common variants explain only a small portion …


Demonstration Of A Targeted Proteome Characterization Approach For Examining Specific Metabolic Pathways In Complex Bacterial Systems, Adam Justin Martin Dec 2013

Masters Theses

Multiple Reaction Monitoring (MRM) is a powerful tandem mass spectrometry (MS/MS) tool frequently implemented in proteomic studies to provide targeted analysis of proteins and peptides. The selectivity that MRM delivers is so strong that it provides the quadrupole mass spectrometers (QQQ), on which it is commonly employed, with pertinence to proteomic studies that they would otherwise lack because of their relatively low resolution. Additionally, this increased level of selectivity is sufficient to supplant complicated fractionation techniques, additional dimensions of chromatography, and 24-hour-long MS/MS experiments in simple biological samples. But there is a deficiency of evidence to determine the …


Ultra-Deep Pyrosequencing Of Partial Surface Protein Genes From Infectious Salmon Anaemia Virus (ISAV) Suggest Novel Mechanisms Involved In Transition To Virulence, Dr. Torstein Tengs Nov 2013

Dr. Torstein Tengs

Uncultivable HPR0 strains of infectious salmon anaemia viruses (ISAVs) infecting gills are non-virulent putative precursors of virulent ISAVs (vISAVs) causing systemic disease in farmed Atlantic salmon (Salmo salar). The transition to virulence involves two molecular events, a deletion in the highly polymorphic region (HPR) of the hemagglutinin-esterase (HE) gene and a Q266→L266 substitution or insertion next to the putative cleavage site (R267) in the fusion protein (F). We have performed ultra-deep pyrosequencing (UDPS) of these gene regions from healthy fish positive for HPR0 virus carrying full-length HPR sampled in a screening program, and a vISAV strain from an ISA outbreak …


Towards The Prediction Of Mutations In Genomic Sequences, Juan Carlos Martinez Nov 2013

FIU Electronic Theses and Dissertations

Bio-systems are inherently complex information processing systems. Furthermore, physiological complexities of biological systems limit the formation of hypotheses about their behavior and the ability to test those hypotheses. More importantly, the identification and classification of mutations in patients are central topics in today’s cancer research.

Next generation sequencing (NGS) technologies can provide genome-wide coverage at a single nucleotide resolution and at reasonable speed and cost. The unprecedented molecular characterization provided by NGS offers the potential for an individualized approach to treatment. These advances in cancer genomics have enabled scientists to interrogate cancer-specific genomic variants and compare them with the …


Rigidity Analysis Of Protein Biological Assemblies And Periodic Crystal Structures, Filip Jagodzinski, Pamela Clark, Jessica Grant, Tiffany Liu, Samantha Monastra, Ileana Streinu Nov 2013

All Faculty Scholarship for the College of the Sciences

Background

We initiate in silico rigidity-theoretical studies of biological assemblies and small crystals for protein structures. The goal is to determine if, and how, the interactions among neighboring cells and subchains affect the flexibility of a molecule in its crystallized state. We use experimental X-ray crystallography data from the Protein Data Bank (PDB). The analysis relies on an efficient graph-based algorithm. Computational experiments were performed using new protein rigidity analysis tools available in the new release of our KINARI-Web server http://kinari.cs.umass.edu.

Results

We provide two types of results: on biological assemblies and on crystals. We found that when only isolated …


Development And Evaluation Of An Ontology-Based Quality Metrics Extraction System, Sina Madani Nov 2013

Dissertations & Theses (Open Access)

The Institute of Medicine reports a growing demand in recent years for quality improvement within the healthcare industry. In response, numerous organizations have been involved in the development and reporting of quality measurement metrics. However, disparate data models from such organizations shift the burden of accurate and reliable metrics extraction and reporting to healthcare providers. Furthermore, manual abstraction of quality metrics and diverse implementation of Electronic Health Record (EHR) systems deepens the complexity of consistent, valid, explicit, and comparable quality measurement reporting within healthcare provider organizations.

The main objective of this research is to evaluate an ontology-based information extraction framework …


PCaAnalyser: A 2D-Image Analysis Based Module For Effective Determination Of Prostate Cancer Progression In 3D Culture, Md Tamjidul Hoque, Louisa C. E. Windus, Carrie J. Lovitt, Vicky M. Avery Nov 2013

Computer Science Faculty Publications

Three-dimensional (3D) in vitro cell-based assays for Prostate Cancer (PCa) research are rapidly becoming the preferred alternative to conventional 2D monolayer cultures. 3D assays more precisely mimic the microenvironment found in vivo, and thus are ideally suited to evaluate compounds and their suitability for progression in the drug discovery pipeline. To achieve the desired high throughput needed for most screening programs, automated quantification of 3D cultures is required. Towards this end, this paper reports on the development of a prototype analysis module for an automated high-content-analysis (HCA) system, which allows for accurate and fast investigation of …


Automatic Domain Identification For Linked Open Data, Sarasi Lalithsena, Pascal Hitzler, Amit P. Sheth, Prateek Jain Nov 2013

Kno.e.sis Publications

Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose …


Lack Of Rbl1/p107 Effects On Cell Proliferation And Maturation In The Inner Ear, Sonia M. Rocha-Sanchez, Laura R. Scheetz, Sabrina Siddiqi, Michael W. Weston, Lynette M. Smith, Kate Dempsey, Hesham Ali, Joann Mcgee, Edward J. Walsh Nov 2013

Information Systems and Quantitative Analysis Faculty Publications

Loss of postnatal mammalian auditory hair cells (HCs) is irreversible. Earlier studies have highlighted the importance of the Retinoblastoma family of proteins (pRBs) (i.e., Rb1, Rbl1/p107, and Rbl2/p130) in the auditory cells’ proliferation and emphasized our lack of information on their specific roles in the auditory system. We have previously demonstrated that lack of Rbl2/p130 moderately affects HCs’ and supporting cells’ (SCs) proliferation. Here, we present evidence supporting multiple roles for Rbl1/p107 in the developing and mature mouse organ of Corti (OC). Like other pRBs, Rbl1/p107 is expressed in the OC, particularly in the Hensen’s and Deiters’ cells. Moreover, Rbl1/p107 …


Semantics-Empowered Big Data Processing With Applications, Krishnaprasad Thirunarayan, Amit P. Sheth Nov 2013

Kno.e.sis Publications

We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the Five Vs of Big Data, where four of the Vs are harnessed to produce the fifth V - value. To handle the challenge of Volume, we advocate semantic perception that can convert low-level observational data to higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we resort to the use of semantic models and annotations of data so that much of the intelligent processing …


High Variance In Reproductive Success Generates A False Signature Of A Genetic Bottleneck In Populations Of Constant Size: A Simulation Study, Sean M. Hoban, Massimo Mezzavilla, Oscar E. Gaggiotti, Andrea Benazzo, Cock Van Oosterhout, Giorgio Bertorelle Oct 2013

Faculty Publications and Other Works -- General Biology

Background

Demographic bottlenecks can severely reduce the genetic variation of a population or a species. Establishing whether low genetic variation is caused by a bottleneck or a constantly low effective number of individuals is important to understand a species’ ecology and evolution, and it has implications for conservation management. Recent studies have evaluated the power of several statistical methods developed to identify bottlenecks. However, the false positive rate, i.e. the rate with which a bottleneck signal is misidentified in demographically stable populations, has received little attention. We analyse this type of error (type I) in forward computer simulations of stable …


City Notifications As A Data Source For Traffic Management, Pramod Anantharam, Biplav Srivastava Oct 2013

Kno.e.sis Publications

A common problem for cities of developing countries like India in managing traffic is the lack of basic automated instrumentation to track road conditions or vehicle locations. Still, to help their citizens make informed travel decisions based on changing city dynamics, many cities have an authorized, city-initiated notification service in place to alert subscribing commuters about road conditions. Here, alternative means may be used to create informal textual notifications, e.g., inputs from field personnel, citizen updates, and pre-authorized events from the city calendar. In this paper, we show that collections of such notifications, when processed with information extraction techniques, can turn …


Differential Reconstructed Gene Interaction Networks For Deriving Toxicity Threshold In Chemical Risk Assessment, Yi Yang, Andrew Maxwell, Xiaowei Zhang, Nan Wang, Edward J. Perkins, Chaoyang Zhang, Ping Gong Oct 2013

Faculty Publications

Background: Pathway alterations reflected as changes in gene expression regulation and gene interaction can result from cellular exposure to toxicants. Such information is often used to elucidate toxicological modes of action. From a risk assessment perspective, alterations in biological pathways are a rich resource for setting toxicant thresholds, which may be more sensitive and mechanism-informed than traditional toxicity endpoints. Here we developed a novel differential networks (DNs) approach to connect pathway perturbation with toxicity threshold setting.

Methods: Our DNs approach consists of 6 steps: time-series gene expression data collection, identification of altered genes, gene interaction network reconstruction, differential …


Estimation Of Variation For High-Throughput Molecular Biological Experiments With Small Sample Size, Danni Yu Oct 2013

Open Access Dissertations

Motivation: In the quantification of molecular components, a large variation can affect and even potentially mislead the biological conclusions. Meanwhile, the high-throughput experiments often involve a small number of samples due to the limitation of cost and time. In such cases, the stochastic information may dominate the outcome of an experiment because there may not be enough samples to present the true biological information. It is challenging to distinguish the changes in phenotype from the stochastic variation.

Methods: Since the biological molecules have been quantified with different technologies, different statistical methods are required. Focusing on three types of important high-throughput …


Statistical Models For Gene And Transcripts Quantification And Identification Using RNA-Seq Technology, Han Wu Oct 2013

Open Access Dissertations

RNA-Seq has emerged as a powerful technique for transcriptome study. Along with improved sensitivity and coverage, RNA-Seq also brings challenges for data analysis. The massive amount of sequence read data, excessive variability, uncertainties, and bias and noise stemming from multiple sources all make the analysis of RNA-Seq data difficult. Despite much progress, RNA-Seq data analysis still has much room for improvement, especially on the quantification of gene and transcript expression levels. The quantification of gene expression level is a direct inference problem, whereas the quantification of the transcript expression level is an indirect problem, because the label of …


Development Of Tyrosine Kinase Peptide Biosensors And Methods For Detection, Andrew Michael Lipchik Oct 2013

Open Access Dissertations

New methods to monitor tyrosine kinase activity are critical for studying kinases in cell biology, drug discovery and the clinic. Peptide-based biosensors for detection of kinase activity utilize a kinase-specific artificial peptide substrate, which can report intercellular kinase activity through the incorporation of phosphate.

An artificial Syk substrate peptide was developed and incorporated with other functional modules to produce a Syk biosensor. These modules included a biotin tag for affinity capture, a photo-cleavable amino acid to allow release of the substrate from the delivery module, and the cell-penetrating peptide TAT. A live cell kinase assay utilizing this biosensor was …


Identifying Chromosome Rearrangements In The Allopolyploid Brassica Napus Using Pyrosequencing, Alexandra R. Barbella Oct 2013

Master's Theses

Allopolyploids form through the hybridization of two or more diploid genomes. A challenge to reproduction in allopolyploids is that pairing can occur between homologous chromosomes or homeologous chromosomes (i.e., chromosomes from different subgenomes). Crossover between homeologous chromosomes can result in chromosome rearrangements that lower fertility and overall fitness. Rearrangements can alter the dosage of either entire chromosomes or just parts of chromosomes. Understanding the frequency and extent of rearrangements will help to explain the evolution and genome stabilization of agriculturally important allopolyploid species. Pyrosequencing is a useful tool in the study of dosage changes in allopolyploids because it allows quantification of the relative contribution …


The Origin And Molecular Evolution Of Two Multigene Families: G-Protein Coupled Receptors And Glycoside Hydrolase Families, Seong-Il Eyun Sep 2013

School of Biological Sciences: Dissertations, Theses, and Student Research

A multigene family is a group of genes that arose from a common ancestor by gene duplication. Gene duplications are a major driving force of new function acquisition. Multigene families thus have a fundamental role in adaptation. To elucidate their molecular evolutionary mechanisms, I chose two multigene families: chemosensory receptors and glycoside hydrolases. I have identified complete repertoires of trace amine-associated receptors (TAARs), a member of chemosensory receptors, from 38 metazoan genomes. An ancestral-type TAAR emerged before the divergence between gnathostomes (jawed vertebrates) and sea lamprey (jawless fish). Primary amine detecting TAARs (TAAR1-4) are found to be older and have evolved …


Quantifying Mutational Impacts On Intrinsic DNA Flexibility In Prokaryotic Genomes, Mohammed Alawad Sep 2013

Theses

The existence of synonymous codon biases across all taxonomic groups is a long-standing problem in biology. While codon bias seems to be adequately explained by the maintenance of translation efficiency and accuracy in some organisms, there is still no adequate explanation of why codon biases universally track the intergenic GC content, as these regions of the genome would not be under selection pressures affecting translation. One part of the story may come from the triplet nature of codons, in which each third position defines the minor groove width and thus affects the basic structure of the DNA by altering …


Detecting Modules In Multiplex Networks – An Application For Integrating Expression Profiles Across Multiple Species, Koon-Kiu Yan, Daifeng Wang, Joel Rozowsky, Henry Zheng, Baikang Pei, Mark Gerstein Sep 2013

Yale Day of Data

A multiplex network, a set of networks linked through interconnected layers, is a useful mathematical framework for data integration. Here, we present a general method to detect modules in multiplex networks and apply it in a specific biological context: to simultaneously cluster the genome-wide expression profiles of C. elegans and D. melanogaster generated by the ENCODE and modENCODE consortia. The method revealed modules that are fundamentally cross-species and can either be conserved or species-specific. In general, the method could be applied in various contexts like the integration of different social networks.


Mining Effective Multi-Segment Sliding Window For Pathogen Incidence Rate Prediction, Lei Duan, Changjie Tang, Xiasong Li, Guozhu Dong, Xianming Wang, Jie Zuo, Min Jiang, Zhongqi Li, Yongqing Zhang Sep 2013

Kno.e.sis Publications

Pathogen incidence rate prediction, which can be considered as time series modeling, is an important task for infectious disease incidence rate prediction and for public health. This paper investigates the application of a genetic computation technique, namely GEP, for pathogen incidence rate prediction. To overcome the shortcomings of traditional sliding windows in GEP-based time series modeling, the paper introduces the problem of mining effective sliding windows, that is, discovering optimal sliding windows for building accurate prediction models. To utilize the periodical characteristic of pathogen incidence rates, a multi-segment sliding window consisting of several segments from different periodical intervals is proposed and …
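
To make the idea of a multi-segment sliding window concrete, here is a minimal sketch of how such a window might gather inputs for one prediction. The layout (one segment per earlier period, each ending at the same phase) and all names and parameters are assumptions for illustration; the paper's actual window mining procedure and the GEP model are not reproduced here.

```python
def multi_segment_window(series, t, period, segment_len, n_segments):
    """Collect inputs for predicting series[t] from several earlier segments.

    Segment s (s = 0 .. n_segments - 1) ends just before index t - s * period,
    so segment 0 is an ordinary sliding window and later segments sample the
    same phase in earlier periods. This layout is an illustrative assumption.
    """
    features = []
    for s in range(n_segments):
        end = t - s * period      # exclusive end index of this segment
        start = end - segment_len
        if start < 0:
            raise ValueError("not enough history for the requested segments")
        features.extend(series[start:end])
    return features

# Toy monthly series (made up): values equal their index, yearly period of 12.
toy_series = list(range(100))
toy_features = multi_segment_window(toy_series, t=50, period=12,
                                    segment_len=3, n_segments=2)
```

With these toy settings the window combines the three most recent values with the three values from the same phase one period earlier, which a downstream model could then use as its inputs.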


A Statistical And Schema Independent Approach To Identify Equivalent Properties On Linked Data, Kalpa Gunaratna, Krishnaprasad Thirunarayan, Prateek Jain, Amit P. Sheth, Sanjaya Wijeratne Sep 2013

Kno.e.sis Publications

The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community recently. Currently it consists of approximately 295 interlinked datasets with over 50 billion triples including 500 million links, and continues to expand in size. This vast source of structured information has the potential to have a significant impact on knowledge-based applications. However, a key impediment to the use of the LOD cloud is limited support for data integration tasks over concepts, instances, and properties. Efforts to address this limitation over properties have focused on matching data-type properties across datasets; however, matching of object-type properties has not received …