Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Data Mining

Articles 1 - 18 of 18

Full-Text Articles in Life Sciences

Key Performance Indicators Detection Based Data Mining, Fatma Abogabal, Shimaa Mohamed Ouf, Amira M. Idrees Ami Jan 2023

Future Computing and Informatics Journal

Food security and safety is one of the domains in which data mining has made the greatest progress. Some data mining studies have applied machine learning algorithms to enhance the traceability of food supply chain safety procedures, while others have combined machine learning methods with several feature selection methods to detect and predict the most significant key performance indicators that affect food safety. In this research, we propose an adaptive data mining model that applies nine machine learning algorithms (Naive Bayes, Bayes Net, K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Machine (SVM), J48, Hoeffding …


The ProteomeXchange Consortium In 2020: Enabling 'Big Data' Approaches In Proteomics., Eric W Deutsch, Nuno Bandeira, Vagisha Sharma, Yasset Perez-Riverol, Jeremy J Carver, Deepti J Kundu, David García-Seisdedos, Andrew F Jarnuczak, Suresh Hewapathirana, Benjamin S Pullman, Julie Wertz, Zhi Sun, Shin Kawano, Shujiro Okuda, Yu Watanabe, Henning Hermjakob, Brendan MacLean, Michael J MacCoss, Yunping Zhu, Yasushi Ishihama, Juan A Vizcaíno Jan 2020

Articles, Abstracts, and Reports

The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX existing members at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA). We first describe the updated submission guidelines, now expanded to include six members. Next, with current data submission statistics, we demonstrate that the …


Signal Detection Of Adverse Drug Reaction Using The Adverse Event Reporting System: Literature Review And Novel Methods, Minh H. Pham Mar 2018

USF Tampa Graduate Theses and Dissertations

One of the objectives of the U.S. Food and Drug Administration is to protect the public health through post-marketing drug safety surveillance, also known as Pharmacovigilance. An inexpensive and efficient method to inspect post-marketing drug safety is to use data mining algorithms on electronic health records to discover associations between drugs and adverse events.

The purpose of this study is two-fold. First, we review the methods and algorithms proposed in the literature for identifying associations of drug interactions with an adverse event and discuss their advantages and drawbacks. Second, we attempt to adapt some novel methods that have been used in …


Auditing SNOMED CT Hierarchical Relations Based On Lexical Features Of Concepts In Non-Lattice Subgraphs, Licong Cui, Olivier Bodenreider, Jay Shi, Guo-Qiang Zhang Feb 2018

Computer Science Faculty Publications

Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

Methods—Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor …


Discovering And Linking Public Omics Data Sets Using The Omics Discovery Index., Yasset Perez-Riverol, Mingze Bai, Felipe Da Veiga Leprevost, Silvano Squizzato, Young Mi Park, Kenneth Haug, Adam J Carroll, Dylan Spalding, Justin Paschall, Mingxun Wang, Noemi Del-Toro, Tobias Ternent, Peng Zhang, Nicola Buso, Nuno Bandeira, Eric W Deutsch, David S Campbell, Ronald C Beavis, Reza M Salek, Ugis Sarkans, Robert Petryszak, Maria Keays, Eoin Fahy, Manish Sud, Shankar Subramaniam, Ariana Barbera, Rafael C Jiménez, Alexey I Nesvizhskii, Susanna-Assunta Sansone, Christoph Steinbeck, Rodrigo Lopez, Juan A Vizcaíno, Peipei Ping, Henning Hermjakob May 2017

Articles, Abstracts, and Reports

No abstract provided.


Stage-Specific Predictive Models For Cancer Survivability, Elham Sagheb Hossein Pour Dec 2016

Theses and Dissertations

Survivability of cancer strongly depends on the stage of cancer. In most previous works, machine learning survivability prediction models for a particular cancer were trained and evaluated together on all stages of the cancer. In this work, we trained and evaluated survivability prediction models for five major cancers, together on all stages and separately for every stage. We named these models joint and stage-specific models, respectively. The obtained results for the cancers that we investigated reveal that the best model to predict the survivability of the cancer for one specific stage is the model which is specifically built for that …


Elevated Integrin α6β4 Expression Is Associated With Venous Invasion And Decreased Overall Survival In Non-Small Cell Lung Cancer, Rachel L. Stewart, Dava West, Chi Wang, Heidi L. Weiss, Tamas S. Gal, Eric B. Durbin, William O'Connor, Min Chen, Kathleen L. O'Connor Aug 2016

Pathology and Laboratory Medicine Faculty Publications

Lung cancer carries a poor prognosis and is the most common cause of cancer-related death worldwide. The integrin α6β4, a laminin receptor, promotes carcinoma progression in part by cooperating with various growth factor receptors to facilitate invasion and metastasis. In carcinoma cells with mutant TP53, the integrin α6β4 promotes cell survival. TP53 mutations and integrin α6β4 overexpression co-occur in many aggressive malignancies. Because of the high frequency of TP53 mutations in lung squamous cell carcinoma (SCC), we sought to investigate the association of integrin β4 expression with clinicopathologic features and survival in non–small cell lung cancer (NSCLC). We constructed …


Domain Specific Document Retrieval Framework For Real-Time Social Health Data, Swapnil Soni Jul 2015

Kno.e.sis Publications

With the advent of web search and microblogging, the percentage of Online Health Information Seekers (OHIS) using these online services to share and seek real-time health information has increased exponentially. OHIS use web search engines or microblogging search services to seek out the latest, relevant, and reliable health information. When OHIS turn to microblogging search services to search for real-time content, trends, breaking news, etc., the search results are not promising. Two major challenges in current microblogging search engines are that they rely on keyword-based techniques and that their results do not contain real-time information. To address these challenges, …


Generating A Focused View Of Disease Ontology Cancer Terms For Pan-Cancer Data Integration And Analysis., Tsung-Jung Wu, Lynn M. Schriml, Qing-Rong Chen, Maureen Colbert, Daniel J. Crichton, Raja Mazumder, Ying Hu, + 10 More Apr 2015

Biochemistry and Molecular Medicine Faculty Publications

Bio-ontologies provide terminologies for the scientific community to describe biomedical entities in a standardized manner. There are multiple initiatives that are developing biomedical terminologies for the purpose of providing better annotation, data integration and mining capabilities. Terminology resources devised for multiple purposes inherently diverge in content and structure. A major issue of biomedical data integration is the development of overlapping terms, ambiguous classifications and inconsistencies represented across databases and publications. The disease ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established a DO cancer project to …


Mining Effective Multi-Segment Sliding Window For Pathogen Incidence Rate Prediction, Lei Duan, Changjie Tang, Xiasong Li, Guozhu Dong, Xianming Wang, Jie Zuo, Min Jiang, Zhongqi Li, Yongqing Zhang Sep 2013

Kno.e.sis Publications

Pathogen incidence rate prediction, which can be considered as time series modeling, is an important task for infectious disease incidence rate prediction and for public health. This paper investigates the application of a genetic computation technique, namely GEP, for pathogen incidence rate prediction. To overcome the shortcomings of traditional sliding windows in GEP-based time series modeling, the paper introduces the problem of mining effective sliding windows, that is, discovering optimal sliding windows for building accurate prediction models. To utilize the periodical characteristic of pathogen incidence rates, a multi-segment sliding window consisting of several segments from different periodical intervals is proposed and …
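
As a rough illustration of the multi-segment idea described in this abstract, the sketch below assembles one input window from several segments taken at different lags, for example the most recent observations plus observations from one period earlier. The function name, the (lag, length) segment encoding, and the sample series are assumptions made for illustration only; the paper's GEP-based procedure for mining which segments to use is not reproduced here.

```python
def multi_segment_window(series, t, segments):
    """Collect inputs for predicting series[t] from several lagged segments.

    Hypothetical encoding: each segment is a (lag, length) pair meaning
    "take `length` consecutive values whose last element is `lag` steps
    before t". A lag of at least 1 keeps the target itself out of the window.
    """
    features = []
    for lag, length in segments:
        if lag < 1 or length < 1:
            raise ValueError("lag and length must be positive")
        start = t - lag - length + 1
        end = t - lag + 1  # slice end is exclusive
        if start < 0:
            raise ValueError("not enough history for this segment")
        features.extend(series[start:end])
    return features


# Toy series where the value at each time step equals its index.
series = list(range(30))

# Three most recent values, plus two values ending twelve steps back
# (an assumed period of 12, e.g. monthly data with a yearly cycle).
window = multi_segment_window(series, t=24, segments=[(1, 3), (12, 2)])
```

With the toy series above, `window` holds the values at time steps 21 to 23 followed by those at 11 and 12. A conventional sliding window corresponds to the single-segment case `[(1, length)]`.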


Mining Climate Data For Shire Level Wheat Yield Predictions In Western Australia, Yunous Vagh Jan 2013

Theses: Doctorates and Masters

Climate change and the reduction of available agricultural land are two of the most important factors that affect global food production especially in terms of wheat stores. An ever increasing world population places a huge demand on these resources. Consequently, there is a dire need to optimise food production.

Estimations of crop yield for the South West agricultural region of Western Australia have usually been based on statistical analyses by the Department of Agriculture and Food in Western Australia. Their estimations involve a system of crop planting recommendations and yield prediction tools based on crop variety trials. However, many crop …


Cross-Ontology Multi-Level Association Rule Mining In The Gene Ontology., Prashanti Manda, Seval Ozkan, Hui Wang, Fiona M. McCarthy, Susan M. Bridges Oct 2012

Bagley College of Engineering Publications and Scholarship

The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by …
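
To make the notion of a cross-ontology co-occurrence rule concrete, the following sketch computes support and confidence for rules whose two sides come from different sub-ontologies. It is a single-level, brute-force illustration only: the gene identifiers, term labels, sub-ontology tags, and thresholds are invented, and the multi-level generalization procedure named in the abstract is not implemented.

```python
from itertools import product


def cross_ontology_rules(annotations, min_support, min_confidence):
    """Return rules (a, b, support, confidence) read as "a implies b".

    `annotations` maps a gene product to a set of (sub_ontology, term)
    pairs. Only rules whose two terms come from different sub-ontologies
    are kept.
    """
    n = len(annotations)
    term_sets = list(annotations.values())
    terms = set().union(*term_sets)
    rules = []
    for a, b in product(terms, repeat=2):
        if a[0] == b[0]:
            continue  # same sub-ontology, not a cross-ontology rule
        count_a = sum(a in s for s in term_sets)
        count_ab = sum(a in s and b in s for s in term_sets)
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical annotations: MF = molecular function, BP = biological process.
annotations = {
    "g1": {("MF", "kinase activity"), ("BP", "phosphorylation")},
    "g2": {("MF", "kinase activity"), ("BP", "phosphorylation")},
    "g3": {("MF", "kinase activity"), ("BP", "transport")},
    "g4": {("MF", "binding"), ("BP", "transport")},
}
rules = cross_ontology_rules(annotations, min_support=0.5, min_confidence=0.6)
```

On this toy input two rules survive: "phosphorylation implies kinase activity" with confidence 1.0, and the reverse rule with confidence 2/3, both at support 0.5.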


Pattern Space Maintenance For Data Updates And Interactive Mining, Mengling Feng, Guozhu Dong, Jinyan Li, Yap-Peng Tan, Limsoon Wong Aug 2010

Kno.e.sis Publications

This article addresses the incremental and decremental maintenance of the frequent pattern space. We conduct an in-depth investigation on how the frequent pattern space evolves under both incremental and decremental updates. Based on the evolution analysis, a new data structure, Generator-Enumeration Tree (GE-tree), is developed to facilitate the maintenance of the frequent pattern space. With the concept of GE-tree, we propose two novel algorithms, Pattern Space Maintainer+ (PSM+) and Pattern Space Maintainer− (PSM−), for the incremental and decremental maintenance of frequent patterns. Experimental results demonstrate that the proposed algorithms, on average, outperform the representative state-of-the-art …


Protein Secondary Structure Prediction Using Parallelized Rule Induction From Coverings, Leong Lee, Cyriac Kandoth, Jennifer Leopold, Ronald L. Frank Dec 2009

Computer Science Faculty Research & Creative Works

Protein 3D structure prediction has always been an important research area in bioinformatics. In particular, the prediction of secondary structure has been a well-studied research topic. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction algorithms rarely has exceeded 75%. In a previous paper [1], this research team presented a rule-based method called RT-RICO (Relaxed Threshold Rule Induction from Coverings) to predict protein secondary structure. The average Q3 accuracy on the sample datasets using RT-RICO was 80.3%, an improvement over comparable computational methods. …
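
For readers unfamiliar with the metric, Q3 accuracy is conventionally the percentage of residues whose three-state secondary structure label (helix H, strand E, coil C) is predicted correctly. The short sketch below computes it for a pair of label strings; the function name and the example sequences are illustrative assumptions and are not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label matches.

    Both arguments are equal-length strings over the alphabet H, E, C.
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up labels for an eight-residue fragment; one residue is mispredicted.
score = q3_accuracy("HHHEECCC", "HHHEEECC")
```

Here seven of the eight residues agree, so `score` is 87.5.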


Efficient Computation Of Iceberg Cubes By Bounding Aggregate Functions, Xiuzhen Zhang, Pauline Lienhua Chou, Guozhu Dong Jul 2007

Kno.e.sis Publications

The iceberg cubing problem is to compute the multidimensional group-by partitions that satisfy given aggregation constraints. Pruning unproductive computation for iceberg cubing when nonantimonotone constraints are present is a great challenge because the aggregate functions do not increase or decrease monotonically along the subset relationship between partitions. In this paper, we propose a novel bound prune cubing (BP-Cubing) approach for iceberg cubing with nonantimonotone aggregation constraints. Given a cube over n dimensions, an aggregate for any group-by partition can be computed from aggregates for the most specific n-dimensional partitions (MSPs). The largest and smallest aggregate values computed this way become …
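
The problem statement in this abstract can be illustrated with a naive sketch that enumerates every group-by partition and keeps the cells whose average meets a threshold. This is a brute-force baseline written for illustration, not the BP-Cubing algorithm; the function name, column names, and data are assumptions. AVG is used because it is a common example of a constraint that is not antimonotone.

```python
from collections import defaultdict
from itertools import combinations


def iceberg_cube_avg(rows, dims, measure, min_avg):
    """Naively compute all group-by cells with AVG(measure) >= min_avg.

    Returns a dict keyed by (group_dims, cell_values). Every subset of
    `dims` is scanned in full, so no pruning is attempted.
    """
    cells = {}
    for k in range(len(dims) + 1):
        for group_dims in combinations(dims, k):
            sums = defaultdict(float)
            counts = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_dims)
                sums[key] += row[measure]
                counts[key] += 1
            for key in sums:
                avg = sums[key] / counts[key]
                if avg >= min_avg:
                    cells[(group_dims, key)] = avg
    return cells


# Hypothetical sales records over two dimensions.
rows = [
    {"region": "east", "product": "a", "sales": 5},
    {"region": "east", "product": "b", "sales": 1},
    {"region": "west", "product": "a", "sales": 2},
]
cube = iceberg_cube_avg(rows, ["region", "product"], "sales", min_avg=4)
```

In this toy data the only qualifying cell is (east, a) with an average of 5.0, even though every coarser partition containing it (east alone, product a alone, and the grand total) falls below the threshold. That is the difficulty the abstract describes: a failing coarse partition cannot be used to rule out its more specific partitions.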


Analysis Of Factors Affecting Corn Masa Byproduct Generation, Kurt A. Rosentrater Aug 2004

Kurt A. Rosentrater

The production of corn masa-based products in the U.S. has been increasing over the last several years, and consequently, so has the volume of waste materials generated from this processing sector. These byproducts, which consist of corn dry matter losses that occur during the nixtamalization process, are currently underutilized, but have much potential for value-added processing and utilization, and thus hold the simultaneous promises of economic benefit for corn processors as well as decreased potential impact on surrounding ecosystems. Because information concerning masa byproducts, and the rate at which they are generated, is currently very limited and not readily available, …


Identifying Character Non-Independence In Phylogenetic Data Using Data Mining Techniques, Jennifer Leopold, Anne M. Maglia, M. Thakur, B. Patel, Fikret Erçal Jan 2004

Computer Science Faculty Research & Creative Works

Undiscovered relationships in a data set may confound analyses, particularly those that assume data independence. Such problems occur when characters used for phylogenetic analyses are not independent of one another. A main assumption of phylogenetic inference methods such as maximum likelihood and parsimony is that each character serves as an independent hypothesis of evolution. When this assumption is violated, the resulting phylogeny may not reflect true evolutionary history. Therefore, it is imperative that character non-independence be identified prior to phylogenetic analyses. To identify dependencies between phylogenetic characters, we applied three data mining techniques: 1) Bayesian networks, 2) decision tree induction, …


Summarizing Data Sets For Classification, Christopher W. Kinzig, Krishnaprasad Thirunarayan, Gary B. Lamont, Robert E. Marmelstein Jun 2001

Kno.e.sis Publications

This paper describes our approach and experiences with implementing a data mining system using genetic algorithms in C++. In contrast with earlier classification algorithms that tended to “tile” the data sets using some pre-specified “shapes”, the proposed system is based on Marmelstein’s work on determining natural boundaries for class homogeneous regions. These boundaries are further refined to construct a compact set of simple data mining rules for classification.