Full-Text Articles in Computer Sciences
Rough-Fuzzy Hybrid Approach For Identification Of Bio-Markers And Classification On Alzheimer's Disease Data, Changsu Lee, Chiou-Peng Lam, Martin Masek
Martin Masek
A new approach is proposed in this paper for identification of biomarkers and classification on Alzheimer's disease data by employing a rough-fuzzy hybrid approach called ARFIS (a framework for Adaptive TS-type Rough-Fuzzy Inference Systems). In this approach, the entropy-based discretization technique is employed first on the training data to generate clusters for each attribute with respect to the output information. The rough set-based feature reduction method is then utilized to reduce the number of features in a decision table obtained using the cluster information. Another rough set-based approach is employed for the generation of decision rules. After the construction and …
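The entropy-based discretization step can be illustrated with a minimal sketch. For one numeric attribute, it picks the cut point that minimises the size-weighted class entropy of the two resulting intervals. The sketch assumes a single binary cut with no stopping criterion; recursive splitting with an MDL-style stopping rule is a common refinement. The abstract does not give the paper's exact procedure, and the function names, values and labels below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    # sum of p * log2(1/p) over the observed classes
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Best single binary cut for one numeric attribute.

    Returns (cut, weighted_entropy). The cut is the midpoint between two
    adjacent distinct sorted values that minimises the size-weighted class
    entropy of the two intervals. No stopping rule is applied.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_e = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # a cut must fall between two distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if e < best_e:
            best_cut, best_e = (lo + hi) / 2, e
    return best_cut, best_e


# Hypothetical toy attribute: low values labelled "AD", high values "CN".
print(best_cut_point([1, 2, 3, 10, 11, 12], ["AD"] * 3 + ["CN"] * 3))  # (6.5, 0.0)
```

The resulting intervals would then serve as the discrete attribute values of the decision table that the rough set steps operate on.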
Mining Branching-Time Scenarios, Dirk Fahland, David Lo, Shahar Maoz
David LO
Specification mining extracts candidate specifications from existing systems, to be used for downstream tasks such as testing and verification. Specifically, we are interested in the extraction of behavior models from execution traces. In this paper we introduce mining of branching-time scenarios in the form of existential, conditional Live Sequence Charts, using a statistical data-mining algorithm. We show the power of branching scenarios to reveal alternative scenario-based behaviors, which could not be mined by previous approaches. This work contrasts with and complements previous work on mining linear-time scenarios. An implementation and evaluation over execution trace sets recorded from several real-world applications shows …
Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall
David LO
Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …
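The abstract is cut off before it describes the hybrid approach. The following is therefore only an illustrative stand-in, not the authors' technique. It shows how library co-usage alone can rank candidates: it mines pairwise rules a → b from a hypothetical corpus of projects, scores each rule by confidence (the share of projects using a that also use b), and recommends the unused libraries with the strongest rule firing from the current set. The function names, `top_k`, and the project data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pair_confidence(projects):
    """Confidence of each rule a -> b: share of projects using a that also use b."""
    single = Counter()
    pair = Counter()
    for libs in projects:
        libs = set(libs)
        single.update(libs)
        pair.update(permutations(libs, 2))  # every ordered (a, b) with a != b
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def recommend(current, projects, top_k=3):
    """Rank libraries not yet used by the strongest rule firing from `current`."""
    conf = pair_confidence(projects)
    scores = {}
    for (a, b), c in conf.items():
        if a in current and b not in current:
            scores[b] = max(scores.get(b, 0.0), c)
    # highest confidence first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda lib: (-scores[lib], lib))[:top_k]


# Hypothetical usage data: each set lists the libraries one project depends on.
projects = [
    {"junit", "log4j", "guava"},
    {"junit", "log4j"},
    {"junit", "guava"},
    {"log4j", "slf4j"},
]
print(recommend({"junit"}, projects))  # ['guava', 'log4j']
```

Taking the maximum confidence over firing rules is one simple aggregation choice. Summing or averaging rule confidences are equally plausible alternatives.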
The Rule Of Law In Cyberspace, Mireille Hildebrandt
Mireille Hildebrandt
This is a translation of my inaugural lecture at Radboud University Nijmegen. The Dutch version has been published as a booklet; the English version is available on my bepress site.
Applying Data Mining Techniques In The Selection Of Plant Traits, Dean Diepeveen, Leisa Armstrong
Leisa Armstrong
In the agricultural sector, farmers are provided with crop-related information by various research agencies in order to make critical decisions about which crop variety is the most profitable choice. Research agencies provide information that is generic rather than tailored to the individual farmer's cropping situation. A number of specific plant and growth traits are used to establish the most suitable crop varieties. When selecting crop varieties for release to growers, applying data mining techniques to crop research data enables information to be customized to each individual farmer's farming situation. The challenge for agricultural research perspective is …
An Evaluation Of Methodologies For eAgriculture In An Australian Context, Leisa Armstrong, Dean Diepeveen
Leisa Armstrong
Australian agricultural producers’ profits are dependent on the decisions they make about farm productivity systems. They may use recommendations and information provided by government agencies and private consultants. For cereal growers, success is dependent on decisions made about selection of crop varieties suitable for their agronomic and climatic conditions. This paper reports on research which aimed to evaluate some current eAgriculture methodologies for their application in the Western Australian agricultural industry. In particular the paper illustrates the findings from a project which aimed to explain the variability seen in crop varieties grown in Western Australia. The problems associated with crop …
An eAgriculture-Based Decision Support Framework For Information Dissemination, Leisa Armstrong, Dean Diepeveen, Khumphicha Tantisantisom
Leisa Armstrong
The ability of farmers to acquire knowledge to make decisions is limited by the quality and applicability of information. Inconsistencies in information delivery and in standards for the integration of information also limit decision-making processes. This research uses an approach similar to the Knowledge Discovery in Databases (KDD) methodology to develop an ICT-based framework that can be used to facilitate the acquisition of knowledge for farmers' decision-making processes. This is one of the leading areas of research and development for information technology in the agricultural industry, which has yet to utilize such technologies fully. The Farmer Knowledge and Decision Support Framework (FKDSF) …
An Information-Based Decision Support Framework For eAgriculture, Leisa Armstrong, Dean Diepeveen
Leisa Armstrong
The ability of farmers to acquire knowledge to make decisions is limited by the quality and applicability of information. Inconsistency in information delivery and in standards for the integration of information also limits the decision-making process. The Knowledge Discovery in Databases (KDD) methodology described for data mining is an example of how frameworks can be used to facilitate such data integration. This research will examine how such an ICT-based framework can be used to facilitate the acquisition of knowledge for the farmer decision-making process. The Farmer Knowledge and Decision Support Framework (FKDSF) takes information provided to farmers and …
Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi
David LO
In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as the odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and the Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is positively associated with the cure of a disease, or whether a particular marketing strategy is positively associated with an increase in revenue. This paper models the problem of locating faults as an association between the execution or non-execution of particular program elements and failures. There have been …
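Several of the measures named above can be computed from a 2×2 contingency table for each program element. The table crosses two conditions: whether a test executes the element, and whether the test fails or passes. The following minimal sketch uses the textbook definitions of the odds ratio, Yule's Q, Yule's Y and confidence, then ranks elements by one measure. The argument names, the zero-count handling and the toy spectra are assumptions for illustration. The paper's own formulas, smoothing and ranking procedure may differ.

```python
import math


def measures(n_ef, n_ep, n_nf, n_np):
    """Association between executing a program element and test failure.

    n_ef / n_ep: failing / passing tests that execute the element
    n_nf / n_np: failing / passing tests that do not execute it
    """
    ad, bc = n_ef * n_np, n_ep * n_nf
    if bc == 0:
        odds = math.inf if ad else math.nan  # undefined when both products are 0
    else:
        odds = ad / bc
    if ad + bc == 0:
        # no evidence either way; treated as "no association" (a choice made here)
        yule_q = yule_y = 0.0
    else:
        yule_q = (ad - bc) / (ad + bc)
        yule_y = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    executed = n_ef + n_ep
    confidence = n_ef / executed if executed else 0.0  # P(fail | executed)
    return {"odds_ratio": odds, "yule_q": yule_q, "yule_y": yule_y,
            "confidence": confidence}


# Hypothetical spectra per statement: (n_ef, n_ep, n_nf, n_np)
spectra = {"s1": (4, 1, 0, 5), "s2": (2, 2, 2, 2), "s3": (3, 1, 1, 3)}
ranking = sorted(spectra, key=lambda s: measures(*spectra[s])["yule_q"], reverse=True)
print(ranking)  # ['s1', 's3', 's2']
```

Elements at the top of the ranking are the ones whose execution is most strongly associated with failure, and they would be inspected first.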
Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder
William B. Andreopoulos
Unsupervised Deduplication Using Cross-Field Dependencies, Robert Hall, Charles Sutton, Andrew Mccallum
Andrew McCallum
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes collectively, they do not model them collectively. For example, in citations in the research literature, canonical venue strings and title strings are dependent (because venues tend to focus on a few research areas), but this dependence is not modeled by current unsupervised techniques. We call this dependence between fields in a record a cross-field dependence. In this paper, we present an unsupervised generative model for the deduplication problem that explicitly models cross-field dependence. Our model uses a single set …
Generalized Component Analysis For Text With Heterogeneous Attributes, Xuerui Wang, Chris Pal, Andrew Mccallum
Andrew McCallum
We present a class of richly structured, undirected hidden variable models suitable for simultaneously modeling text along with other attributes encoded in different modalities. Our model generalizes techniques such as Principal Component Analysis to heterogeneous data types. In contrast to other approaches, this framework allows modalities such as words, authors and timestamps to be captured in their natural, probabilistic encodings. We demonstrate the effectiveness of our framework on the task of author prediction from 13 years of the NIPS conference proceedings and for a recipient prediction task using a 10-month academic email archive of a researcher. Our approach should be …
Canonicalization Of Database Records Using Adaptive Similarity Measures, Aron Culotta, Michael Wick, Robert Hall, Matthew Marzilli, Andrew Mccallum
Andrew McCallum
It is becoming increasingly common to construct databases from information automatically culled from many heterogeneous sources. For example, a research publication database can be constructed by automatically extracting titles, authors, and conference information from papers and their references. A common difficulty in consolidating data from multiple sources is that records are referenced in a variety of ways (e.g. abbreviations, aliases, and misspellings). Therefore, it can be difficult to construct a single, standard representation to present to the user. We refer to the task of constructing this representation as canonicalization. Despite its importance, there is very little existing work on canonicalization. …
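As a point of reference only, a simple non-adaptive baseline for canonicalization picks, among records already known to refer to the same entity, the string with the smallest total edit distance to all the others (a medoid). The paper's focus is on adaptive similarity measures, which this sketch does not attempt. It uses plain Levenshtein distance with unit costs, and the function names and example strings are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute (or match)
        prev = cur
    return prev[-1]


def canonical(records):
    """Record with the smallest summed distance to all records (ties: lexicographic)."""
    return min(records, key=lambda r: (sum(edit_distance(r, o) for o in records), r))


# Hypothetical co-referent author strings, including a typo and a casing variant.
print(canonical(["McCallum", "McCalum", "Mccallum", "McCallum"]))  # McCallum
```

Because the distance is fixed rather than learned, this baseline treats a casing change and a dropped letter as equally costly. An adaptive measure would be trained to distinguish cases like these.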
Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang
William B. Andreopoulos
Topics Over Time: A Non-Markov Continuous-Time Model Of Topical Trends, Xuerui Wang, Andrew Mccallum
Andrew McCallum
This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, …
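The per-topic continuous distribution over timestamps can be sketched in isolation. This sketch assumes a Beta distribution over timestamps rescaled to (0, 1), fitted by the method of moments to the timestamps of the tokens assigned to one topic. The abstract does not name the distribution family or the fitting procedure, so that choice, the helper names and the toy data are assumptions. The rest of the topic model is not shown.

```python
import math
import statistics


def rescale(timestamps, eps=1e-6):
    """Map raw timestamps into (0, 1); requires at least two distinct values.

    eps keeps the endpoints off 0 and 1 so the Beta log-density stays finite.
    """
    lo, hi = min(timestamps), max(timestamps)
    return [min(max((t - lo) / (hi - lo), eps), 1 - eps) for t in timestamps]


def fit_beta_moments(xs):
    """Method-of-moments Beta(alpha, beta) fit to values in (0, 1)."""
    m = statistics.fmean(xs)
    v = statistics.variance(xs)  # sample variance; needs at least two values
    if v == 0 or v >= m * (1 - m):
        raise ValueError("sample moments do not admit a Beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def beta_pdf(x, alpha, beta):
    """Beta density, computed in log space for numerical stability."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))


# Hypothetical: day numbers of the tokens currently assigned to one topic.
xs = rescale([0, 10, 20, 30, 40])
alpha, beta = fit_beta_moments(xs)
print(round(alpha, 3), round(beta, 3))
```

A density such as `beta_pdf(x, alpha, beta)` is the kind of term through which a document's timestamp could influence its topic assignments. Under this sketch, a topic concentrated late in the collection would receive a larger `alpha` than `beta`.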
Group And Topic Discovery From Relations And Text, Xuerui Wang, Natasha Mohanty, Andrew Mccallum
Andrew McCallum
We present a probabilistic generative model of entity relationships and textual attributes that simultaneously discovers groups among the entities and topics among the corresponding text. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the words associated with certain relationships. Significantly, joint inference allows the discovery of groups to be guided by the emerging topics, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and 43 years …