Articles 1 - 30 of 119
Full-Text Articles in Life Sciences
Feature Selection And Classification Of MAQC-II Breast Cancer And Multiple Myeloma Microarray Gene Expression Data, Qingzhong Liu, Andrew H. Sung, Zhongxue Chen, Jianzhong Liu, Xudong Huang, Youping Deng
Faculty Publications
Microarray data are high-dimensional, but available datasets usually contain only a small number of samples, which makes the study of such datasets both interesting and challenging. In the task of analyzing microarray data for the purpose of, e.g., predicting gene-disease association, feature selection is very important because it provides a way to handle the high dimensionality by exploiting information redundancy induced by associations among genetic markers. Judicious feature selection in microarray data analysis can result in significant reduction of cost while maintaining or improving the classification or prediction accuracy of learning machines that are employed to sort …
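The abstract does not say which selection criterion the authors use. As a hedged illustration of filter-style feature selection in the large p, small n setting, the sketch below ranks genes by the absolute Welch two-sample t-statistic and keeps the top k. This is a common baseline, not necessarily the method in the paper, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic; each sample needs at least 2 values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se


def select_top_genes(expr, labels, k):
    """Return indices of the k genes with the largest |t| between classes 0 and 1.

    expr: n samples x p genes (list of lists); labels: n class labels (0 or 1).
    """
    scores = []
    for j in range(len(expr[0])):
        g0 = [row[j] for row, y in zip(expr, labels) if y == 0]
        g1 = [row[j] for row, y in zip(expr, labels) if y == 1]
        scores.append((abs(welch_t(g0, g1)), j))
    scores.sort(reverse=True)  # highest |t| first
    return [j for _, j in scores[:k]]


# Four samples, three genes; gene 1 separates the two classes.
expr = [[1.0, 5.0, 2.0], [1.1, 5.2, 2.1], [0.9, 1.0, 2.0], [1.0, 1.1, 1.9]]
labels = [0, 0, 1, 1]
print(select_top_genes(expr, labels, 1))  # [1]
```

In practice the ranking step should be nested inside cross-validation: ranking genes on the full dataset before evaluating a classifier biases accuracy estimates upward.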
Research In Semantic Web And Information Retrieval: Trust, Sensors, And Search, Krishnaprasad Thirunarayan
Kno.e.sis Publications
No abstract provided.
MUC4/Muc4 Functions And Regulation In Cancer, Goldi Kozloski
Goldi A Kozloski
Biological Sequence Simulation For Testing Complex Evolutionary Hypotheses: Indel-Seq-Gen Version 2.0, Cory L. Strope
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Reconstructing the evolutionary history of biological sequences will provide a better understanding of the mechanisms of sequence divergence and functional evolution. Long-term sequence evolution includes not only substitutions of residues but also more dynamic changes such as insertions, deletions, and long-range rearrangements. Such dynamic changes make reconstructing sequence evolution history difficult and affect the accuracy of molecular evolutionary methods, such as multiple sequence alignments (MSAs) and phylogenetic methods. Testing the accuracy of these methods requires benchmark datasets. However, currently available benchmark datasets are limited in size, and the evolutionary histories of the included sequences are unknown. These …
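indel-Seq-Gen itself is not reproduced here. The sketch below only illustrates why simulated benchmarks are valuable: every site carries an identity, so the true ancestor/descendant alignment is known exactly. The uniform substitution model, single-site indels, event probabilities, and function names are all simplifying assumptions; real simulators use rate matrices, indel length distributions, and a guide tree.

```python
import itertools
import random

ALPHABET = "ACGT"


def make_root(length, rng):
    """Random root sequence as a list of (site_id, base) pairs."""
    return [(i, rng.choice(ALPHABET)) for i in range(length)]


def evolve(seq, n_events, rng, ids, p_sub=0.8, p_ins=0.1):
    """Apply n_events single-site substitutions, insertions, or deletions.

    Substitutions keep a site's id, insertions draw a fresh id from `ids`,
    and deletions drop the site, so homology is tracked exactly.
    """
    seq = list(seq)
    for _ in range(n_events):
        r = rng.random()
        if r < p_sub and seq:
            i = rng.randrange(len(seq))
            sid, base = seq[i]
            seq[i] = (sid, rng.choice([b for b in ALPHABET if b != base]))
        elif r < p_sub + p_ins or not seq:
            seq.insert(rng.randrange(len(seq) + 1), (next(ids), rng.choice(ALPHABET)))
        else:
            del seq[rng.randrange(len(seq))]
    return seq


def true_alignment(anc, desc):
    """Pairs (i, j) where ancestor position i and descendant position j share a site id."""
    pos = {sid: j for j, (sid, _) in enumerate(desc)}
    return [(i, pos[sid]) for i, (sid, _) in enumerate(anc) if sid in pos]


rng = random.Random(42)
root = make_root(30, rng)
ids = itertools.count(len(root))  # fresh ids start after the root's ids
child = evolve(root, 10, rng, ids)
print("".join(b for _, b in child))
print(true_alignment(root, child)[:5])
```

Because the true alignment is recorded rather than inferred, an MSA or phylogenetic method run on `child` and its relatives can be scored against it directly.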
Attempted Cloning Of A Wnt Gene From Botrylloides Violaceus, Manasa Chandra, James Tumulak
Biological Sciences
Botrylloides violaceus is a colonial ascidian with the ability to undergo sexual and asexual reproduction as well as regeneration. The canonical pathway starts with the extracellular protein Wnt and ends with β-catenin, a transcription factor, which also functions in cell adhesion. The Wnt signaling pathway is involved in embryogenesis and regeneration in a variety of other species. In our studies we attempt to isolate and sequence both a Wnt gene and from Botrylloides via degenerate primer design and PCR. Using bioinformatic methods we aligned sequences from other organisms, as the Botrylloides genome has not yet been sequenced. Using mouse, Ciona, …
Charge Switch Nucleotides, John G. K. Williams, Gregory R. Bashford, Jiyan Chen, Dan Draney, Nara Narayanan, Bambi Reynolds, Pamela Sheaff
Biomedical Imaging and Biosignal Analysis Laboratory
The present invention provides compounds, methods and systems for sequencing nucleic acid using single molecule detection. Using labeled NPs that exhibit charge-switching behavior, single-molecule DNA sequencing in a microchannel sorting system is realized. In operation, sequencing products are detected, enabling real-time sequencing as successive detectable moieties flow through a detection channel. By electrically sorting charged molecules, the cleaved product molecules are detected in isolation without interference from unincorporated NPs and without illuminating the polymerase-DNA complex.
Genetic Effect Of The Dwarfing Genes On Some Culm Characteristics Associated With Lodging Resistance In Bread Wheat, Md. Mahbub Hasan
Md. Mahbub Hasan
Because traits related to lodging resistance are hard to screen under natural field conditions, selecting lodging-resistant varieties in wheat breeding programs is difficult. Identifying easily measurable culm anatomical traits related to lodging resistance would simplify the selection process. The present study was conducted to determine the effect of dwarfing genes on culm anatomical traits related to lodging resistance in basal internode 1. Field and laboratory studies were conducted at Shahjalal University of Science and Technology, Sylhet, Bangladesh, with eight wheat genotypes carrying the Rht1 and Rht2 dwarfing genes and a local land race …
Towards Reasoning Pragmatics, Pascal Hitzler
Computer Science and Engineering Faculty Publications
The realization of Semantic Web reasoning is central to substantiating the Semantic Web vision. However, current mainstream research on this topic faces serious challenges, which force us to question established lines of research and to rethink the underlying approaches.
A Contrast Pattern Based Clustering Quality Index For Categorical Data, Qingbao Liu, Guozhu Dong
Kno.e.sis Publications
Since clustering is unsupervised and highly explorative, clustering validation (i.e., assessing the quality of clustering solutions) has been an important and long-standing research problem. Existing validity measures have significant shortcomings. This paper proposes a novel contrast-pattern-based clustering quality index (CPCQ) for categorical data that utilizes the quality and diversity of the contrast patterns (CPs) that contrast the clusters in clusterings. High-quality CPs can characterize clusters and discriminate them from each other. Experiments show that the CPCQ index (1) can recognize that expert-determined classes are the best clusters for many datasets from the UCI repository; (2) does …
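How CPCQ combines pattern quality and diversity is not spelled out in the truncated abstract. The sketch below shows only the underlying building block, using the standard emerging-pattern definitions (an assumption, not the CPCQ formula): the support of an itemset in a cluster and its growth rate against the remaining clusters. A pattern with a high growth rate characterizes its cluster and discriminates it from the others.

```python
import math


def support(pattern, records):
    """Fraction of records (sets of 'attribute=value' items) that contain pattern."""
    if not records:
        return 0.0
    return sum(pattern <= r for r in records) / len(records)


def growth_rate(pattern, cluster, rest):
    """Support in `cluster` divided by support in the remaining records."""
    s_in, s_out = support(pattern, cluster), support(pattern, rest)
    if s_out == 0:
        return math.inf if s_in > 0 else 0.0
    return s_in / s_out


cluster_a = [{"color=red", "size=S"}, {"color=red", "size=M"}]
cluster_b = [{"color=blue", "size=S"}, {"color=blue", "size=M"}]
print(growth_rate({"color=red"}, cluster_a, cluster_b))  # inf
print(growth_rate({"size=S"}, cluster_a, cluster_b))     # 1.0
```

Here {"color=red"} perfectly contrasts cluster A with cluster B, while {"size=S"} has growth rate 1 and carries no contrast information.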
SPARQL Query Re-Writing For Spatial Datasets Using Partonomy Based Transformation Rules, Prateek Jain, Cory Andrew Henson, Amit P. Sheth, Peter Z. Yeh, Kunal Verma
Kno.e.sis Publications
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. When querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of the SPARQL query, which can result in long and complex queries. We present a novel approach that helps users intuitively write SPARQL queries over spatial data without relying on knowledge of the ontology structure. Our framework rewrites queries, using transformation rules to exploit part-whole relations between geographical entities to address the mismatches between query …
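The paper's transformation rules are not given in the abstract. The sketch below shows one simple way to exploit part-whole relations, which is an assumption rather than the authors' rule set: compute the transitive closure of a hypothetical `partOf` relation and inject the contained regions into the query with a SPARQL 1.1 `VALUES` block, so the user only names the coarse region. The `ex:` namespace and the `ex:locatedIn` property are invented for illustration.

```python
from collections import defaultdict


def contained_regions(part_of, region):
    """All regions transitively part of `region`, including `region` itself.

    part_of: iterable of (part, whole) pairs.
    """
    children = defaultdict(set)
    for part, whole in part_of:
        children[whole].add(part)
    seen, stack = {region}, [region]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def rewrite_location_query(var, region, part_of):
    """Expand `?var ex:locatedIn region` to cover every region inside it."""
    places = " ".join(f"ex:{r}" for r in sorted(contained_regions(part_of, region)))
    return (
        "PREFIX ex: <http://example.org/>\n"
        f"SELECT ?{var} WHERE {{\n"
        f"  ?{var} ex:locatedIn ?place .\n"
        f"  VALUES ?place {{ {places} }}\n"
        "}"
    )


facts = [("Ohio", "USA"), ("Dayton", "Ohio"), ("Texas", "USA")]
print(rewrite_location_query("event", "Ohio", facts))
```

An event recorded as located in Dayton is now returned by a query that only mentions Ohio, without the user spelling out the partonomy.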
Applications Of Variable Number Tandem Repeat Genotyping In The Validation Of An Animal Medical Model And Gene Flow Studies In Threatened Populations Of Reptiles, Candace D. Smith
Graduate Theses and Dissertations
We used variable number tandem repeats (VNTR) to validate the chicken as a human medical model for Pulmonary Arterial Hypertension. We identified seven regions on four chromosomes and interrogated them for VNTR markers that significantly associate with Pulmonary Hypertension Syndrome/ascites. In those regions, we identified seven candidate genes (AGTR1, ACE, p38MAPK, SST, 5HT2B, NET1, and CALM3) for further analysis as significantly contributing QTL for ascites/PHS. We also used variable number tandem repeats to measure gene flow and gather evidence for multiple paternity in a population of Timber rattlesnakes, Crotalus horridus. We were able to verify one VNTR that can be used …
A Local Qualitative Approach To Referral And Functional Trust, Krishnaprasad Thirunarayan, Dharan Althuru, Cory Andrew Henson, Amit P. Sheth
Kno.e.sis Publications
Trust and confidence are becoming key issues in diverse applications such as e-commerce, social networks, semantic sensor web, semantic web information retrieval systems, etc. Both humans and machines use some form of trust to make informed and reliable decisions before acting. In this work, we briefly review existing work on trust networks, pointing out some of its drawbacks. We then propose a local framework to explore two different kinds of trust among agents, called referral trust and functional trust, which are modelled using local partial orders to enable qualitative trust personalization. The proposed approach formalizes reasoning with trust, distinguishing between …
Classification, Clustering And Data-Mining Of Biological Data, Thomas Triplet
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
The proliferation of biological databases and the easy access enabled by the Internet are having a beneficial impact on biological sciences and transforming the way research is conducted. There are currently over 1100 molecular biology databases dispersed throughout the Internet. However, very few of them integrate data from multiple sources. To assist in the functional and evolutionary analysis of the large number of novel proteins, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database that integrates data from various biological sources. PROFESS is freely available at http://cse.unl.edu/~profess/. Our database is designed to be versatile and expandable and will not …
Targeted Genomic Signature Profiling With Quasi-Alignment Statistics, Rao Mallik Kotamarti, Douglas W. Raiford, Michael Hahsler, Yuhang Wang, Monnie Mcgee, Maggie Dunham
COBRA Preprint Series
Genome databases continue to expand with no change in the basic format of sequence data. The prevalent use of classic alignment-based search tools such as BLAST has significantly pushed the limits of genome isolate research. The relatively new frontier of metagenomic research deals with thousands of diverse genomes, with new demands beyond the current homologue search and analysis. Compressing sequence data into a complex form could facilitate a broader range of sequence analyses. To this end, this research explores reorganizing sequence data as complex Markov signatures, also known as Extensible Markov Models. Markov models have found successful application in …
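Extensible Markov Models add dynamic state clustering that is omitted here. As a simplified stand-in (an assumption, not the paper's EMM construction), a fixed first-order Markov chain over nucleotides already compresses a sequence into a 4-by-4 signature of transition probabilities that can be compared without alignment. Function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def markov_signature(seq, pseudocount=1.0):
    """First-order transition probabilities P(next base | current base).

    Pairs involving characters outside ACGT are ignored; the pseudocount
    keeps unseen transitions from getting probability zero.
    """
    counts = Counter(zip(seq, seq[1:]))
    sig = {}
    for a in BASES:
        total = sum(counts[(a, b)] for b in BASES) + len(BASES) * pseudocount
        for b in BASES:
            sig[(a, b)] = (counts[(a, b)] + pseudocount) / total
    return sig


def signature_distance(s1, s2):
    """L1 distance between two signatures over all 16 transitions."""
    return sum(abs(s1[k] - s2[k]) for k in product(BASES, repeat=2))


gc_rich = markov_signature("GCGCGGCCGCGCATGCGC")
at_rich = markov_signature("ATATTAAATATTTAGATA")
print(round(signature_distance(gc_rich, at_rich), 3))
```

The signature size is fixed regardless of sequence length, which is what makes this kind of representation attractive when thousands of metagenomic sequences must be compared.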
An Anytime Algorithm For Computing Inconsistency Measurement, Yue Ma, Guilin Qi, Guohui Xiao, Pascal Hitzler, Zuoquan Lin
Computer Science and Engineering Faculty Publications
Measuring inconsistency degrees of inconsistent knowledge bases is an important problem, as it provides context information for facilitating inconsistency handling. Many methods have been proposed to solve this problem, and a main class of them is based on some kind of paraconsistent semantics. In this paper, we consider the computational aspects of inconsistency degrees of propositional knowledge bases under 4-valued semantics. We first analyze the computational complexity. Since computing the exact inconsistency degree turns out to be intractable, we then propose an anytime algorithm that provides tractable approximations of the inconsistency degree from above and below. We show that …
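The abstract does not state the measure's definition. A commonly used four-valued measure, assumed here, is the minimal fraction of atoms that must take the value B (both true and false) in some four-valued model. The brute-force sketch below computes that exact value for a small knowledge base in clausal form (itself a simplifying assumption); its search over all 4^n valuations of n atoms illustrates the cost that an anytime approximation is meant to avoid.

```python
from itertools import product

# Four-valued (Belnap) truth values as (told_true, told_false).
FOUR = {"N": (False, False), "T": (True, False), "F": (False, True), "B": (True, True)}


def designated(literal, val):
    """A positive literal needs its atom told true; a negative one, told false."""
    atom, positive = literal
    told_true, told_false = FOUR[val[atom]]
    return told_true if positive else told_false


def inconsistency_degree(clauses):
    """Exact degree by exhaustive search over all 4^n valuations.

    clauses: list of non-empty clauses, each a list of (atom, is_positive).
    Returns the minimum, over 4-valued models, of |atoms valued B| / |atoms|.
    """
    if any(not c for c in clauses):
        raise ValueError("empty clauses have no four-valued model")
    atoms = sorted({atom for c in clauses for atom, _ in c})
    if not atoms:
        return 0.0
    best = len(atoms)  # valuing every atom B always yields a model
    for values in product(FOUR, repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(designated(lit, val) for lit in c) for c in clauses):
            best = min(best, values.count("B"))
    return best / len(atoms)


kb = [[("p", True)], [("p", False)], [("q", True)]]  # p, not p, q
print(inconsistency_degree(kb))  # 0.5
```

Only p is forced into conflict, so one of the two atoms takes value B and the degree is 1/2.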
Ontology-Driven Provenance Management In eScience: An Application In Parasite Research, Satya S. Sahoo, D. Brent Weatherly, Raghava Mutharaju, Pramod Anantharam, Amit P. Sheth, Rick L. Tarleton
Kno.e.sis Publications
Provenance, from the French word “provenir”, describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experimental processes, validate data quality, and associate trust values with scientific results. Current industrial-scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be underpinned by formal semantics to enable analysis of large scale provenance information by software applications. Further, effective analysis of provenance information requires well-defined query mechanisms to support complex queries over large datasets. This paper introduces an ontology-driven provenance management infrastructure for biology experiment data, as part …
A Survey Of The Semantic Specification Of Sensors, Michael Compton, Cory Andrew Henson, Laurent Lefort, Holger Neuhaus, Amit P. Sheth
Kno.e.sis Publications
Semantic sensor networks use declarative descriptions of sensors to promote reuse and integration, and to help solve the difficulties of installing, querying and maintaining complex, heterogeneous sensor networks. This paper reviews the state of the art for the semantic specification of sensors, one of the fundamental technologies in the semantic sensor network vision. Twelve sensor ontologies are reviewed and analysed for the range and expressive power of their concepts. The reasoning and search technology developed in conjunction with these ontologies is also reviewed, as is technology for annotating OGC standards with links to ontologies. Sensor concepts that cannot be expressed accurately …
Provenir Ontology: Towards A Framework For eScience Provenance Management, Satya S. Sahoo, Amit P. Sheth
Kno.e.sis Publications
Provenance metadata describes the 'lineage' or history of an entity and provides the information necessary to verify the quality of data, validate experiment protocols, and associate trust values with scientific results. eScience projects generate data and the associated provenance metadata in a distributed environment (such as myGrid) and on a very large scale that often precludes manual analysis. Given this scenario, provenance information should (a) be interoperable across projects, research groups, and application domains, and (b) support analysis over large datasets using reasoning to discover implicit information. In this paper, we introduce an ontology-driven framework for eScience provenance management underpinned by an …
Suggestions For OWL 3, Pascal Hitzler
Computer Science and Engineering Faculty Publications
With OWL 2 about to be completed, it is the right time to start discussions on possible future modifications of OWL. We present here a number of suggestions in order to discuss them with the OWL user community. They encompass expressive extensions on polynomial OWL 2 profiles, a suggestion for an OWL Rules language, and expressive extensions for OWL DL.
Paraconsistent Reasoning For OWL 2, Yue Ma, Pascal Hitzler
Computer Science and Engineering Faculty Publications
A four-valued description logic has been proposed to reason with inconsistent knowledge bases based on description logics. This approach has the distinct advantage that it can be implemented by invoking classical reasoners, keeping the same complexity as under the classical semantics. However, this approach has so far only been studied for the basic description logic ALC. In this paper, we further study how to extend the four-valued semantics to the more expressive description logic SROIQ, which underlies the forthcoming revision of the Web Ontology Language, OWL 2, and also investigate how it fares when adapted to tractable description logics including …
A Preferential Tableaux Calculus For Circumscriptive ALCO, Stephan Grimm, Pascal Hitzler
Computer Science and Engineering Faculty Publications
Nonmonotonic extensions of description logics (DLs) allow for default and local closed-world reasoning and are an acknowledged desired feature for applications, e.g. in the Semantic Web. A recent approach to such an extension is based on McCarthy's circumscription, which rests on the principle of minimising the extension of selected predicates to close off dedicated parts of a domain model. While decidability and complexity results have been established in the literature, no practical algorithmisation for circumscriptive DLs has been proposed so far. In this paper, we present a tableaux calculus that can be used as a decision procedure for concept satisfiability …
IBM Altocumulus: A Cross-Cloud Middleware And Platform, E. Michael Maximilien, Ajith Harshana Ranabahu, Roy Engehausen, Laura Anderson
Kno.e.sis Publications
Cloud computing has become the new face of computing and promises to offer virtually unlimited, cheap, readily available, "utility type" computing resources. Many vendors have entered this market with different offerings ranging from infrastructure-as-a-service such as Amazon, to fully functional platform services such as Google App Engine. However, as a result of this heterogeneity, deploying applications to a cloud and managing them must be done using vendor-specific methods. This "lock in" is seen as a major hurdle in adopting cloud technologies in the enterprise. IBM Altocumulus, the cloud middleware platform from IBM Almaden Services Research, aims to solve …
A Best Practice Model For Cloud Middleware Systems, Ajith Harshana Ranabahu, E. Michael Maximilien
Kno.e.sis Publications
Cloud computing is the latest trend in computing, where the intention is to facilitate cheap, utility type computing resources in a service-oriented manner. However, the cloud landscape is still maturing, and there are heterogeneities between the clouds, ranging from application development paradigms to service interfaces and scaling approaches. These differences hinder the adoption of cloud by major enterprises. We believe that a cloud middleware can solve most of these issues to allow cross-cloud inter-operation. Our proposed system is Altocumulus, a cloud middleware that homogenizes the clouds. In order to provide the best use of the cloud resources and make …
Context And Domain Knowledge Enhanced Entity Spotting In Informal Text, Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit P. Sheth
Kno.e.sis Publications
This paper explores the application of restricted relationship graphs (RDF) and statistical NLP techniques to improve named entity annotation in challenging informal English domains. We validate our approach using on-line forums discussing popular music. Named entity annotation is particularly difficult in this domain because it is characterized by a large number of ambiguous entities, such as the Madonna album “Music” or Lily Allen’s pop hit “Smile”.
We evaluate improvements in annotation accuracy that can be obtained by restricting the set of possible entities using real-world constraints. We find that constrained domain entity extraction raises the annotation accuracy significantly, making an …
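The approach combines RDF domain knowledge with statistical NLP; only the constraint idea is sketched here. Given a hypothetical catalog mapping song titles to artists and a set of allowed artists derived from context (both assumptions, not the paper's data structures), the spotter annotates only titles that satisfy the constraint, so a common word such as "Music" is not tagged as a Madonna album in a thread about another artist.

```python
import re


def spot_entities(text, catalog, allowed_artists):
    """Dictionary-based spotting restricted by a real-world constraint.

    catalog: title -> artist (hypothetical domain knowledge). Titles whose
    artist is outside allowed_artists are never annotated.
    """
    hits = []
    for title, artist in catalog.items():
        if artist not in allowed_artists:
            continue  # constraint prunes entities outside the restricted set
        for m in re.finditer(r"\b" + re.escape(title) + r"\b", text):
            hits.append((m.start(), title, artist))
    return sorted(hits)


catalog = {"Smile": "Lily Allen", "Music": "Madonna"}
post = "Music is my life, and Smile is on repeat today"
print(spot_entities(post, catalog, {"Lily Allen"}))
print(spot_entities(post, catalog, {"Lily Allen", "Madonna"}))
```

With the constraint in place, only "Smile" is annotated; dropping the constraint also tags the ordinary word "Music", which is the kind of false positive the restriction is meant to remove.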
EASYMIFS And SITEHOUND: A Toolkit For The Identification Of Ligand-Binding Sites In Protein Structures, Dario Ghersi, Roberto Sanchez
Interdisciplinary Informatics Faculty Publications
Summary: SITEHOUND uses Molecular Interaction Fields (MIFs) produced by EASYMIFS to identify protein structure regions that show a high propensity for interaction with ligands. The type of binding site identified depends on the probe atom used in the MIF calculation. The input to EASYMIFS is a PDB file of a protein structure; the output MIF serves as input to SITEHOUND, which in turn produces a list of putative binding sites. Extensive testing of SITEHOUND for the detection of binding sites for drug-like molecules and phosphorylated ligands has been carried out.
Availability: EASYMIFS and SITEHOUND executables for Linux, Mac …
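The summary describes a pipeline from a MIF grid to a list of putative binding sites. The sketch below mimics only the clustering step under stated assumptions: grid points whose interaction energy is at or below a cutoff are kept, points closer than a distance cutoff are joined by single linkage, and clusters are ranked by total energy. The cutoffs, the linkage choice, and the input format are illustrative and are not SITEHOUND's actual parameters or file formats.

```python
import math


def cluster_binding_sites(points, energy_cutoff=-1.0, dist_cutoff=1.5):
    """Group favorable MIF grid points into putative sites.

    points: list of ((x, y, z), energy); lower energy = more favorable.
    Single linkage: points closer than dist_cutoff join the same cluster.
    Returns clusters sorted by total energy (most favorable first).
    """
    good = [p for p in points if p[1] <= energy_cutoff]
    clusters, unvisited = [], set(range(len(good)))
    while unvisited:
        seed = unvisited.pop()
        members, stack = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(good[i][0], good[j][0]) < dist_cutoff]
            for j in near:
                unvisited.remove(j)
                members.append(j)
                stack.append(j)
        clusters.append([good[i] for i in members])
    return sorted(clusters, key=lambda c: sum(e for _, e in c))


grid = [((0, 0, 0), -2.0), ((1, 0, 0), -3.0), ((10, 10, 10), -1.5), ((5, 5, 5), 0.5)]
sites = cluster_binding_sites(grid)
print(len(sites), sum(e for _, e in sites[0]))
```

The unfavorable point at (5, 5, 5) is discarded, the two adjacent points form the top-ranked site, and the isolated point becomes a weaker second site.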
Conformational Changes In Receptor Tyrosine Kinase Signaling: An ErbB Garden Of Delights, Goldi Kozloski
Goldi A Kozloski
Context Is Highly Contextual!, Amit P. Sheth
Kno.e.sis Publications
No abstract provided.
Integrative Analysis Of Cancer Genomic Data, Shuangge Ma
Shuangge Ma
In the past decade, we have witnessed a period of unparalleled development in the field of cancer genomics. To address the same or similar biomedical questions, multiple cancer genomic studies have been independently designed and conducted. Cancer gene signatures identified from analysis of individual datasets often have low reproducibility. A cost-effective way of improving reproducibility is to conduct integrative analysis of datasets from multiple studies with comparable designs. To properly integrate multiple studies and conduct integrative analysis, we need to access various public data warehouses, retrieve experiment protocols and raw data, evaluate individual studies and select those with comparable designs, …
Integrative Clustering Of Multiple Genomic Data Types Using A Joint Latent Variable Model With Application To Breast And Lung Cancer Subtype Analysis, Ronglai Shen, Adam Olshen, Marc Ladanyi
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic, and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment. We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations …
Model-Based Quality Assessment And Base-Calling For Second-Generation Sequencing Data, Rafael A. Irizarry, Hector Corrada Bravo
Johns Hopkins University, Dept. of Biostatistics Working Papers
Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, and is capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1,000 Genomes Project, plans to fully sequence the genomes of approximately 1,200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads—strings …
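The statistical model itself is cut off in the abstract. For orientation only, the sketch below shows the conventional output of a base caller, a call plus a Phred-scaled quality Q = -10 log10(p_error). The error estimate (one minus the winning channel's share of total intensity) is a deliberately naive assumption and not the model-based approach of the paper; the function name and the cap of 40 are hypothetical.

```python
import math

BASES = "ACGT"


def call_bases(intensities, max_q=40):
    """Naive base calls with Phred-scaled qualities.

    intensities: one (A, C, G, T) tuple of non-negative signals per cycle.
    The error probability is approximated as 1 - max/total, an assumption
    used only for illustration; Q = -10 * log10(p_error), capped at max_q.
    """
    calls, quals = [], []
    for signal in intensities:
        total = sum(signal)
        best = max(range(4), key=lambda i: signal[i])
        p_err = 1 - signal[best] / total if total > 0 else 0.75
        p_err = max(p_err, 10 ** (-max_q / 10))  # avoid log10(0)
        calls.append(BASES[best])
        quals.append(min(max_q, round(-10 * math.log10(p_err))))
    return "".join(calls), quals


print(call_bases([(90, 5, 3, 2), (10, 10, 70, 10)]))  # ('AG', [10, 5])
```

A quality of 10 corresponds to an estimated 1-in-10 error probability; a model-based method replaces the naive estimate with one fitted to the data, which is where statistical modeling of millions of reads comes in.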