Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 11 of 11

Full-Text Articles in Computer Sciences

Evolutionary Strategies For Data Mining, Rose Lowe Dec 2010

All Dissertations

Learning classifier systems (LCS) have been successful in generating rules for solving classification problems in data mining. The rules are of the form IF condition THEN action. The condition encodes the features of the input space and the action encodes the class label. What is lacking in those systems is the ability to express each feature using a function that is appropriate for that feature. The genetic algorithm is capable of doing this but cannot, because only one type of membership function is provided. Thus, the genetic algorithm learns only the shape and placement of the membership function, and in …
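
To illustrate what "a function appropriate for each feature" could look like, here is a minimal sketch of an IF condition THEN action rule whose condition assigns each feature its own membership-function type. The three shapes (triangular, trapezoidal, Gaussian), the min operator as fuzzy AND, single-winner inference, and all names and example rules are hypothetical choices for illustration; they are not taken from the dissertation, and no genetic algorithm is shown.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

MembershipFn = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> MembershipFn:
    """Triangle rising from a to a peak at b, falling to zero at c."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def trapezoidal(a: float, b: float, c: float, d: float) -> MembershipFn:
    """Plateau of 1 on [b, c], linear shoulders on (a, b) and (c, d)."""
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if x < b:
            return (x - a) / (b - a)
        if x <= c:
            return 1.0
        return (d - x) / (d - c)
    return mu


def gaussian(mean: float, sigma: float) -> MembershipFn:
    """Bell curve centred on mean with spread sigma."""
    return lambda x: math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))


@dataclass
class FuzzyRule:
    conditions: List[MembershipFn]  # IF part: one membership function per feature
    action: str                     # THEN part: the class label

    def match(self, x: List[float]) -> float:
        # Degree to which input x satisfies the condition (min as the fuzzy AND).
        return min(mu(v) for mu, v in zip(self.conditions, x))


def classify(rules: List[FuzzyRule], x: List[float]) -> str:
    # Single-winner inference: the best-matching rule supplies the class label.
    return max(rules, key=lambda r: r.match(x)).action


# Hypothetical two-feature rule base mixing membership-function types.
rules = [
    FuzzyRule([triangular(0, 2, 4), gaussian(10.0, 2.0)], "A"),
    FuzzyRule([trapezoidal(3, 5, 8, 10), gaussian(20.0, 3.0)], "B"),
]
print(classify(rules, [2.0, 11.0]))  # prints A
print(classify(rules, [6.0, 19.0]))  # prints B
```

In an evolutionary setting, the membership-function type per feature would become one more gene alongside its shape and placement parameters.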


Extreme Data Mining: Inference From Small Datasets, Răzvan Andonie Sep 2010

All Faculty Scholarship for the College of the Sciences

Neural networks have been applied successfully in many fields. However, satisfactory results can only be obtained under large-sample conditions. With small training sets, performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason small datasets cannot provide enough information is that there are gaps between samples, and even the domain of the samples cannot be ensured. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets.

We have the following goals: i. To …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Sep 2010

Research Collection School Of Computing and Information Systems

In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as the odds ratio, confidence, Yule's Y, Yule's Q, Kappa, and the Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is positively associated with the cure of a disease, or whether a particular marketing strategy is positively associated with an increase in revenue. This paper models the problem of locating faults as an association between the execution or non-execution of particular program elements and failures. There have been …
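
The modelling step can be made concrete with a 2x2 contingency table per program element: executed or not, crossed with run failed or passed. The sketch below computes a subset of the measures named above (odds ratio, Yule's Q, Yule's Y, confidence, Cohen's kappa) using their standard contingency-table definitions and ranks elements by suspiciousness. The zero-cell conventions, the function names, and the coverage data are hypothetical, and the paper's exact formulations may differ.

```python
import math
from typing import Callable, Dict, List, Tuple

# (n_ef, n_ep, n_nf, n_np): element executed/not executed x run failed/passed
Table = Tuple[int, int, int, int]


def contingency(covered: List[bool], failed: List[bool]) -> Table:
    """Build the 2x2 table for one program element across all test runs."""
    pairs = list(zip(covered, failed))
    n_ef = sum(1 for c, f in pairs if c and f)
    n_ep = sum(1 for c, f in pairs if c and not f)
    n_nf = sum(1 for c, f in pairs if not c and f)
    n_np = sum(1 for c, f in pairs if not c and not f)
    return n_ef, n_ep, n_nf, n_np


def odds_ratio(t: Table) -> float:
    a, b, c, d = t
    num, den = a * d, b * c
    if den == 0:
        return math.inf if num > 0 else 0.0  # hypothetical zero-cell convention
    return num / den


def yule_q(t: Table) -> float:
    a, b, c, d = t
    ad, bc = a * d, b * c
    return 0.0 if ad + bc == 0 else (ad - bc) / (ad + bc)


def yule_y(t: Table) -> float:
    a, b, c, d = t
    x, y = math.sqrt(a * d), math.sqrt(b * c)
    return 0.0 if x + y == 0 else (x - y) / (x + y)


def confidence(t: Table) -> float:
    # Confidence of the rule "element executed => run fails".
    a, b, _, _ = t
    return a / (a + b) if a + b else 0.0


def kappa(t: Table) -> float:
    # Cohen's kappa between the "executed" and "failed" indicators.
    a, b, c, d = t
    n = a + b + c + d
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return 0.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)


def rank(coverage: Dict[str, List[bool]], failed: List[bool],
         measure: Callable[[Table], float]) -> List[str]:
    """Most suspicious element first, by the chosen association measure."""
    return sorted(coverage,
                  key=lambda e: measure(contingency(coverage[e], failed)),
                  reverse=True)


# Hypothetical spectra: four test runs, the first and last of which fail.
failed = [True, False, False, True]
coverage = {
    "s1": [True, True, False, True],
    "s2": [False, True, True, False],
    "s3": [True, True, True, True],
}
print(rank(coverage, failed, yule_q))  # prints ['s1', 's3', 's2']
```

Swapping `yule_q` for another measure changes only the ranking key, which is what makes a side-by-side comparison of many measures straightforward.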


Event-Driven Similarity And Classification Of Scanpaths, Thomas Grindinger Aug 2010

All Dissertations

Eye tracking experiments often involve recording the pattern of deployment of visual attention over the stimulus as viewers perform a given task (e.g., visual search). In training applications, for example, it is useful to make an expert's sequence of eye movements, or scanpath, available to novices for their inspection and subsequent learning. It may also be useful to assess how closely a novice's scanpath conforms to that of the expert. A computational tool is proposed that provides a framework for performing such classification, based on a probabilistic machine learning algorithm. The approach was …


Where In The World? Demographic Patterns In Access Data, Mimi Recker, Beijie Xu, Sherry Hsi, Christine Garrard Jun 2010

Instructional Technology and Learning Sciences Faculty Publications

Standard webmetrics tools record the IP address of users’ computers, thereby providing fodder for analyses of their geographical location, and for understanding the impact of e-learning and teaching. Here we describe how two web-based educational systems were engineered to collect geo-referenced data. This is followed by a description of joining these data with demographic and educational datasets for the United States, and mapping different datasets using geographic information system (GIS) techniques to visually display their relationships. Results from statistical analyses of these relationships that highlight areas of significance are given.


Stevent: Spatio-Temporal Event Model For Social Network Discovery, Hady W. Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jun 2010

Research Collection School Of Computing and Information Systems

Spatio-temporal data concerning the movement of individuals over space and time contain latent information on the associations among these individuals. Sources of spatio-temporal data include usage logs of mobile and Internet technologies. This article defines a spatio-temporal event in terms of co-occurrences among individuals that indicate potential associations among them. Each spatio-temporal event is assigned a weight based on the precision and uniqueness of the event. By aggregating the weights of events relating two individuals, we can determine the strength of association between them. We conduct extensive experimentation to investigate both the efficacy of the proposed model as well as the …
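
The aggregation idea (weight each co-occurrence event, then sum weights per pair) can be sketched as follows. Everything specific here is a hypothetical stand-in, not the article's model: an event is two or more people at the same location within the same fixed time window, precision is a single constant, and uniqueness is taken as 1/(k - 1) for an event shared by k people.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Observation = Tuple[str, str, int]  # (person, location_id, unix_seconds)


def extract_events(obs: Iterable[Observation], window_s: int) -> List[FrozenSet[str]]:
    """Hypothetical event definition: two or more people observed at the same
    location within the same fixed-length time window co-occur."""
    buckets: Dict[Tuple[str, int], set] = defaultdict(set)
    for person, loc, t in obs:
        buckets[(loc, t // window_s)].add(person)
    return [frozenset(people) for people in buckets.values() if len(people) >= 2]


def event_weight(participants: FrozenSet[str], precision: float) -> float:
    """Hypothetical weight: precision in [0, 1] times a uniqueness term that
    shrinks as more people share the event."""
    uniqueness = 1.0 / (len(participants) - 1)
    return precision * uniqueness


def association_strength(events: List[FrozenSet[str]],
                         precision: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Sum the weights of all events that relate each pair of individuals."""
    strength: Dict[Tuple[str, str], float] = defaultdict(float)
    for event in events:
        w = event_weight(event, precision)
        for u, v in combinations(sorted(event), 2):
            strength[(u, v)] += w
    return dict(strength)


obs = [
    ("ann", "cafe", 100), ("bob", "cafe", 200),
    ("ann", "gym", 4000), ("bob", "gym", 4100), ("cat", "gym", 4200),
    ("cat", "lib", 9000),
]
events = extract_events(obs, window_s=3600)
print(association_strength(events))
# prints {('ann', 'bob'): 1.5, ('ann', 'cat'): 0.5, ('bob', 'cat'): 0.5}
```

Fixed windows are the simplest bucketing choice but can split a co-occurrence that straddles a window boundary; a fuller model would also let precision vary with the spatial and temporal granularity of each log source.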


Enterprise Users And Web Search Behavior, April Ann Lewis May 2010

Masters Theses

This thesis describes an analysis of user web query behavior associated with Oak Ridge National Laboratory’s (ORNL) Enterprise Search System (hereafter, the ORNL Intranet). The ORNL Intranet provides users a means to search all kinds of data stores for relevant business and research information using a single query. The Global Intranet Trends for 2010 report suggests that the biggest current obstacle for corporate intranets is “findability and Siloed content”. Intranets differ from internets in the way they create, control, and share content, which often makes it difficult, and sometimes impossible, for users to find information. Stenmark (2006) first noted studies of corporate …


Partitioning Of Minimotifs Based On Function With Improved Prediction Accuracy, Sanguthevar Rajasekaran, Tian Mi, Jerlin Camilus Merlin, Aaron Oommen, Patrick R. Gradie, Martin R. Schiller Apr 2010

Life Sciences Faculty Research

Background

Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. A principal limitation of minimotif prediction is that false positives reduce its usefulness. As a step toward resolving this problem, we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.

Methodology/Principal Findings

Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. Therefore, we hypothesized that by restricting minimotif predictions to those where the minimotif-containing protein and target protein have …


The Impact Of Overfitting And Overgeneralization On The Classification Accuracy In Data Mining, Huy Nguyen Anh Pham Jan 2010

LSU Doctoral Dissertations

Current classification approaches usually do not try to achieve a balance between fitting and generalization when they infer models from training data. Such approaches ignore the possibility of different penalty costs for the false-positive, false-negative, and unclassifiable error types. Thus, their performance may not be optimal or may even be coincidental. This dissertation analyzes these issues in depth. It also proposes two new approaches, the Homogeneity-Based Algorithm (HBA) and the Convexity-Based Algorithm (CBA), to address these issues. These new approaches aim to optimally balance the data fitting and generalization behaviors of models when some traditional classification approaches are used. …
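
Why ignoring different penalty costs matters can be shown with a small worked example. The sketch below is only a total-cost calculation under assumed per-type penalties, not HBA or CBA: two hypothetical classifiers are scored against the same labels, and which one is "better" flips when the relative costs of false negatives and unclassifiable samples change. The labels, predictions, and cost values are all invented for illustration.

```python
from typing import Optional, Sequence


def total_cost(y_true: Sequence[int], y_pred: Sequence[Optional[int]],
               c_fp: float, c_fn: float, c_uc: float) -> float:
    """Total penalty for binary predictions; None marks an unclassifiable
    sample, and correct predictions cost nothing."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred is None:
            cost += c_uc
        elif pred == 1 and truth == 0:
            cost += c_fp
        elif pred == 0 and truth == 1:
            cost += c_fn
    return cost


y_true = [1, 1, 1, 0, 0, 0]
clf_a = [1, 1, 0, 0, 1, 0]        # one false negative, one false positive
clf_b = [1, None, 1, 0, 0, None]  # no errors, but two samples left unclassified

for c_fp, c_fn, c_uc in [(1, 5, 1), (1, 1, 3)]:
    a = total_cost(y_true, clf_a, c_fp, c_fn, c_uc)
    b = total_cost(y_true, clf_b, c_fp, c_fn, c_uc)
    print((c_fp, c_fn, c_uc), a, b)
# prints (1, 5, 1) 6.0 2.0
#        (1, 1, 3) 2.0 6.0
```

With expensive false negatives the cautious classifier B wins; with expensive abstentions classifier A wins, even though both make the same number of non-correct decisions.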


Interrogation Of Water Catchment Data Sets Using Data Mining Techniques, Ajdin Sehovic, Leisa Armstrong, Dean Diepeveen Jan 2010

Research outputs pre 2011

Current environmental challenges such as increasing dryland salinity, waterlogging, eutrophication and high nutrient runoff in the south-western regions of Western Australia (WA) may have both cultural and environmental implications in the near future. Advances in computing through the application of data mining and geographic information services provide the tools to conduct studies that can indicate possible changes in these water catchment areas of WA. The research examines existing spatial data mining techniques that can be used to interpret trends in WA water catchment land use. Large GIS data sets of the water catchments in the Peel-Harvey region have …


An Attempt To Find Neighbors, Yong Shi, Ryan Rosenblum Jan 2010

Faculty and Research Publications

In this paper, we present our continuing research on similarity search problems. Previously we proposed PanKNN [18], a novel technique that explores the meaning of K nearest neighbors from a new perspective, redefines the distances between data points and a given query point Q, and efficiently and effectively selects the data points that are closest to Q. It can be applied in various data mining fields. In this paper, we present our approach to solving the similarity search problem in the presence of obstacles. We apply the concept of obstacle points and process the similarity search problems in a different way. …
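
To see why obstacles change the K-nearest-neighbor answer, the sketch below contrasts plain Euclidean KNN with KNN under an obstructed distance on a grid, computed by breadth-first search around blocked cells. This is a swapped-in textbook technique for illustration only; it is neither PanKNN nor the paper's obstacle-point approach, and the grid, point names, and query are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column)


def obstructed_distances(grid: List[str], source: Cell) -> Dict[Cell, int]:
    """BFS path lengths from source using 4-neighbour moves that never enter '#'."""
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist


def knn_obstructed(grid: List[str], points: Dict[str, Cell], q: Cell, k: int) -> List[str]:
    """k nearest points to q by obstructed distance; unreachable points are skipped."""
    dist = obstructed_distances(grid, q)
    ranked = sorted((dist[p], name) for name, p in points.items() if p in dist)
    return [name for _, name in ranked[:k]]


def knn_euclidean(points: Dict[str, Cell], q: Cell, k: int) -> List[str]:
    """Baseline: k nearest points by straight-line distance, ignoring obstacles."""
    ranked = sorted((math.dist(p, q), name) for name, p in points.items())
    return [name for _, name in ranked[:k]]


grid = [
    ".....",
    "####.",  # a wall with a gap only in the last column
    ".....",
]
points = {"p1": (2, 0), "p2": (0, 3)}
print(knn_euclidean(points, (0, 0), 1))         # prints ['p1']
print(knn_obstructed(grid, points, (0, 0), 1))  # prints ['p2']
```

Point p1 is two cells from the query in a straight line but ten steps away once the wall must be walked around, so the obstacle reverses the nearest-neighbor answer.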