Computer Sciences | Open Access Articles | Digital Commons Network™

Extreme Data Mining: Inference From Small Datasets, Răzvan Andonie Sep 2010

All Faculty Scholarship for the College of the Sciences

Neural networks have been applied successfully in many fields. However, satisfactory results are generally obtained only with large training samples. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason small datasets cannot provide enough information is that gaps exist between samples; even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets.

We have the following goals: i. To …

Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Sep 2010

Research Collection School Of Computing and Information Systems

In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is positively associated with the cure of a disease, or whether a particular marketing strategy is positively associated with an increase in revenue. This paper models the problem of locating faults as an association between the execution or non-execution of particular program elements and failures. There have been …
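
As a rough illustration of how such measures could be applied to fault localization, the sketch below computes three of the named measures (odds ratio, Yule-Q, Yule-Y) from a 2x2 contingency table using their standard textbook definitions. The function name `association_measures` and the mapping of table cells to executed/not-executed and failing/passing runs are assumptions made for this example; the abstract does not specify them.

```python
import math


def association_measures(a, b, c, d):
    """Compute association measures from a 2x2 contingency table.

    Assumed (hypothetical) cell layout for one program element:
      a: failing runs that executed the element
      b: passing runs that executed the element
      c: failing runs that did not execute the element
      d: passing runs that did not execute the element
    """
    if min(a, b, c, d) <= 0:
        # Zero cells need smoothing or special handling, which is out of scope here.
        raise ValueError("all four counts must be positive in this sketch")

    ad = a * d
    bc = b * c
    odds_ratio = ad / bc
    yule_q = (ad - bc) / (ad + bc)
    yule_y = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    return {"odds_ratio": odds_ratio, "yule_q": yule_q, "yule_y": yule_y}


if __name__ == "__main__":
    # Illustrative counts only, not data from the paper.
    print(association_measures(8, 2, 2, 8))
```

Under this layout, a larger value of any of the three measures suggests that executing the element is more strongly associated with failure, so elements could be ranked by the measure of choice.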

Where In The World? Demographic Patterns In Access Data, Mimi Recker, Beijie Xu, Sherry Hsi, Christine Garrard Jun 2010

Instructional Technology and Learning Sciences Faculty Publications

Standard webmetrics tools record the IP address of users’ computers, thereby providing fodder for analyses of their geographical location, and for understanding the impact of e-learning and teaching. Here we describe how two web-based educational systems were engineered to collect geo-referenced data. This is followed by a description of joining these data with demographic and educational datasets for the United States, and mapping different datasets using geographic information system (GIS) techniques to visually display their relationships. Results from statistical analyses of these relationships that highlight areas of significance are given.

Stevent: Spatio-Temporal Event Model For Social Network Discovery, Hady W. Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jun 2010

Research Collection School Of Computing and Information Systems

Spatio-temporal data concerning the movement of individuals over space and time contains latent information on the associations among these individuals. Sources of spatio-temporal data include usage logs of mobile and Internet technologies. This article defines a spatio-temporal event by the co-occurrences among individuals that indicate potential associations among them. Each spatio-temporal event is assigned a weight based on the precision and uniqueness of the event. By aggregating the weights of events relating two individuals, we can determine the strength of association between them. We conduct extensive experimentation to investigate both the efficacy of the proposed model as well as the …
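
The aggregation step described above might be sketched as follows. This is only an illustration under stated assumptions: each event is given as a set of co-occurring individuals plus a precomputed weight, and aggregation is a plain sum. The abstract does not say how the precision and uniqueness weights are computed or how they are combined, so `association_strengths` and its input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def association_strengths(events):
    """Aggregate event weights into pairwise association strengths.

    events: iterable of (participants, weight) pairs, where participants is a
    collection of individual identifiers that co-occurred in one event and
    weight is a precomputed number (assumed input, not derived here).
    """
    strengths = defaultdict(float)
    for participants, weight in events:
        # Every unordered pair of distinct participants shares this event.
        for u, v in combinations(sorted(set(participants)), 2):
            strengths[(u, v)] += weight  # summation is an assumed aggregation rule
    return dict(strengths)


if __name__ == "__main__":
    # Illustrative events only.
    sample = [
        ({"ann", "bob"}, 0.5),
        ({"ann", "bob", "cat"}, 0.2),
        ({"bob", "cat"}, 0.3),
    ]
    for pair, strength in sorted(association_strengths(sample).items()):
        print(pair, round(strength, 3))
```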

Partitioning Of Minimotifs Based On Function With Improved Prediction Accuracy, Sanguthevar Rajasekaran, Tian Mi, Jerlin Camilus Merlin, Aaron Oommen, Patrick R. Gradie, Martin R. Schiller Apr 2010

Life Sciences Faculty Research

Background

Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.

Methodology/Principal Findings

Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. Therefore, we hypothesized that by restricting minimotif predictions to those where the minimotif-containing protein and target protein have …

Interrogation Of Water Catchment Data Sets Using Data Mining Techniques, Ajdin Sehovic, Leisa Armstrong, Dean Diepeveen Jan 2010

Research outputs pre 2011

Current environmental challenges such as increasing dry land salinity, water logging, eutrophication and high nutrient runoff in the south-western regions of Western Australia (WA) may have both cultural and environmental implications in the near future. Advances in computing through the application of data mining and geographic information services provide the tools to conduct studies that can indicate possible changes in these water catchment areas of WA. The research examines the existing spatial data mining techniques that can be used to interpret trends in WA water catchment land use. Large GIS data sets of the water catchments in the Peel-Harvey region have …

An Attempt To Find Neighbors, Yong Shi, Ryan Rosenblum Jan 2010

Faculty and Research Publications

In this paper, we present our continuing research on similarity search problems. Previously we proposed PanKNN [18], a novel technique that explores the meaning of K nearest neighbors from a new perspective, redefines the distances between data points and a given query point Q, and efficiently and effectively selects the data points closest to Q. It can be applied in various data mining fields. In this paper, we present our approach to solving the similarity search problem in the presence of obstacles. We apply the concept of obstacle points and process the similarity search problems in a different way. …
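
For context, the baseline problem that this line of work starts from can be sketched as a conventional K-nearest-neighbor search. The code below is not PanKNN and does not handle obstacles; it is a plain Euclidean search, with the function name `k_nearest` chosen for this example, shown only to make the notion of "data points closest to Q" concrete.

```python
import heapq
import math


def k_nearest(points, query, k):
    """Return the k points closest to query under Euclidean distance.

    points: sequence of equal-length coordinate tuples
    query:  coordinate tuple of the same length (the query point Q)
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nsmallest returns results ordered from nearest to farthest.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


if __name__ == "__main__":
    # Illustrative data only.
    data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0)]
    print(k_nearest(data, (1.2, 1.2), 2))
```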
