Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 16 of 16

Full-Text Articles in Computer Sciences

Rough-Fuzzy Hybrid Approach For Identification Of Bio-Markers And Classification On Alzheimer's Disease Data, Changsu Lee, Chiou-Peng Lam, Martin Masek Jul 2015

Martin Masek

A new approach is proposed in this paper for identification of biomarkers and classification on Alzheimer's disease data by employing a rough-fuzzy hybrid approach called ARFIS (a framework for Adaptive TS-type Rough-Fuzzy Inference Systems). In this approach, the entropy-based discretization technique is employed first on the training data to generate clusters for each attribute with respect to the output information. The rough set-based feature reduction method is then utilized to reduce the number of features in a decision table obtained using the cluster information. Another rough set-based approach is employed for the generation of decision rules. After the construction and …


Mining Branching-Time Scenarios, Dirk Fahland, David Lo, Shahar Maoz Jun 2014

David LO

Specification mining extracts candidate specifications from existing systems, to be used for downstream tasks such as testing and verification. Specifically, we are interested in the extraction of behavior models from execution traces. In this paper we introduce mining of branching-time scenarios in the form of existential, conditional Live Sequence Charts, using a statistical data-mining algorithm. We show the power of branching scenarios to reveal alternative scenario-based behaviors, which could not be mined by previous approaches. The work contrasts and complements previous works on mining linear-time scenarios. An implementation and evaluation over execution trace sets recorded from several real-world applications shows …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Jun 2014

David LO

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …


The Rule Of Law In Cyberspace, Mireille Hildebrandt Jun 2013

Mireille Hildebrandt

This is a translation of my inaugural lecture at Radboud University Nijmegen. The Dutch version has been published as a booklet; the English version is available on my bepress site.


Applying Data Mining Techniques In The Selection Of Plant Traits, Dean Diepeveen, Leisa Armstrong Feb 2012

Leisa Armstrong

In the agricultural sector, farmers are provided with crop related information by various research agencies in order to make critical decisions about which is the most profitable crop variety choice. Research agencies provide information which is generic, rather than being tailored to the individual farmer's cropping situation. A number of specific plant and growth traits are used to establish the most suitable crop varieties. When selecting crop varieties for release to growers, the application of data mining techniques to crop research data enables the customization of information to each individual farmer's farming situation. The challenge for agricultural research perspective is …


An Evaluation Of Methodologies For eAgriculture In An Australian Context, Leisa Armstrong, Dean Diepeveen Feb 2012

Leisa Armstrong

Australian agricultural producers’ profits are dependent on the decisions they make about farm productivity systems. They may use recommendations and information provided by government agencies and private consultants. For cereal growers, success is dependent on decisions made about selection of crop varieties suitable for their agronomic and climatic conditions. This paper reports on research which aimed to evaluate some current eAgriculture methodologies for their application in the Western Australian agricultural industry. In particular the paper illustrates the findings from a project which aimed to explain the variability seen in crop varieties grown in Western Australia. The problems associated with crop …


An eAgriculture-Based Decision Support Framework For Information Dissemination, Leisa Armstrong, Dean Diepeveen, Khumphicha Tantisantisom Feb 2012

Leisa Armstrong

The ability of farmers to acquire knowledge to make decisions is limited by the information quality and applicability. Inconsistencies in information delivery and standards for the integration of information also limit decision making processes. This research uses a similar approach to the Knowledge Discovery in Databases (KDD) methodology to develop an ICT-based framework which can be used to facilitate the acquisition of knowledge for farmers' decision making processes. This is one of the leading areas of research and development for information technology in an agricultural industry, which is yet to utilize such technologies fully. The Farmer Knowledge and Decision Support Framework (FKDSF) …


An Information-Based Decision Support Framework For eAgriculture, Leisa Armstrong, Dean Diepeveen Feb 2012

Leisa Armstrong

The ability of farmers to acquire knowledge to make decisions is limited by the information quality and applicability. An inconsistency in information delivery and standards for the integration of information also limits the decision making process. The Knowledge Discovery in Databases (KDD) methodology described for data mining is an example of how frameworks can be used to facilitate such data integration. This research will examine how such an ICT-based framework can be used to facilitate the acquisition of knowledge for the farmer decision making process. The Farmer Knowledge and Decision Support Framework (FKDSF) takes information provided to farmers and …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

David LO

In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is associated positively with a cure of a disease or whether a particular marketing strategy is associated positively with an increase in revenue. This paper models the problem of locating faults as association between the execution or non-execution of particular program elements with failures. There have been …
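
As a rough illustration of the idea in this abstract, the sketch below scores each program element by how strongly its execution is associated with failing runs, using three of the named measures (odds ratio, Yule-Q, Yule-Y) in their standard 2x2 contingency-table form. It is not the authors' implementation: the function names, the run format (a set of executed elements plus a pass/fail flag), and the 0.5 added to every cell to avoid division by zero are all assumptions made for this example.

```python
import math

def contingency(runs, element):
    """Build the 2x2 table for one element.

    runs: list of (executed_elements, failed) pairs (hypothetical format).
    Returns (a, b, c, d):
      a = executed and failed,     b = executed and passed,
      c = not executed and failed, d = not executed and passed.
    """
    a = b = c = d = 0
    for executed, failed in runs:
        if element in executed:
            if failed:
                a += 1
            else:
                b += 1
        else:
            if failed:
                c += 1
            else:
                d += 1
    return a, b, c, d

def _smooth(a, b, c, d, eps):
    # Assumption: add eps to every cell so no product is zero.
    return a + eps, b + eps, c + eps, d + eps

def odds_ratio(a, b, c, d, eps=0.5):
    a, b, c, d = _smooth(a, b, c, d, eps)
    return (a * d) / (b * c)

def yule_q(a, b, c, d, eps=0.5):
    a, b, c, d = _smooth(a, b, c, d, eps)
    return (a * d - b * c) / (a * d + b * c)

def yule_y(a, b, c, d, eps=0.5):
    a, b, c, d = _smooth(a, b, c, d, eps)
    s_ad, s_bc = math.sqrt(a * d), math.sqrt(b * c)
    return (s_ad - s_bc) / (s_ad + s_bc)

def rank_elements(runs, measure=yule_q):
    """Rank elements from most to least associated with failure."""
    elements = set().union(*(executed for executed, _ in runs))
    scores = {e: measure(*contingency(runs, e)) for e in elements}
    # Sort by descending score; break ties by element name for determinism.
    return sorted(scores, key=lambda e: (-scores[e], e))
```

Calling `rank_elements(runs)` on a list of recorded runs returns the elements ordered by suspiciousness under the chosen measure; passing `measure=odds_ratio` or `measure=yule_y` swaps the scoring function without changing the rest of the pipeline.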


Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder Sep 2008

William B. Andreopoulos

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as "development" can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an …


Unsupervised Deduplication Using Cross-Field Dependencies, Robert Hall, Charles Sutton, Andrew McCallum Jan 2008

Andrew McCallum

Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent (because venues tend to focus on a few research areas), but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set …


Generalized Component Analysis For Text With Heterogeneous Attributes, Xuerui Wang, Chris Pal, Andrew McCallum Jan 2007

Andrew McCallum

We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. Our model generalizes techniques such as Principal Component Analysis to heterogeneous data types. In contrast to other approaches, this framework allows modalities such as words, authors and timestamps to be captured in their natural, probabilistic encodings. We demonstrate the effectiveness of our framework on the task of author prediction from 13 years of the NIPS conference proceedings and for a recipient prediction task using a 10-month academic email archive of a researcher. Our approach should be …


Canonicalization Of Database Records Using Adaptive Similarity Measures, Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, Andrew McCallum Jan 2007

Andrew McCallum

It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can be constructed by automatically extracting titles, authors, and conference information from papers and their references. A common difficulty in consolidating data from multiple sources is that records are referenced in a variety of ways (e.g. abbreviations, aliases, and misspellings). Therefore, it can be difficult to construct a single, standard representation to present to the user. We refer to the task of constructing this representation as canonicalization. Despite its importance, there is very little existing work on canonicalization. …


Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang Jun 2006

William B. Andreopoulos

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for "Bi-Level Clustering of Mixed categorical and numerical data types". BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than if using one type alone.


Topics Over Time: A Non-Markov Continuous-Time Model Of Topical Trends, Xuerui Wang, Andrew McCallum Jan 2006

Andrew McCallum

This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, …


Group And Topic Discovery From Relations And Text, Xuerui Wang, Natasha Mohanty, Andrew McCallum Jan 2005

Andrew McCallum

We present a probabilistic generative model of entity relationships and textual attributes that simultaneously discovers groups among the entities and topics among the corresponding text. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the words associated with certain relationships. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years …