Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2009

Data mining

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Educational Data Mining Approaches For Digital Libraries, Mimi Recker, Sherry Hsi, Beijie Xu, Rob Rothfarb Nov 2009

Instructional Technology and Learning Sciences Faculty Publications

This collaborative research project between the Exploratorium and Utah State's Department of Instructional Technology and Learning Sciences investigates online evaluation approaches and the application of educational data mining to educational digital libraries and services. Much work over the past decades has focused on developing algorithms and methods for discovering patterns in large datasets, known as Knowledge Discovery from Data (KDD). Webmetrics, the application of KDD to web usage mining, is growing rapidly in areas such as e-commerce. Educational Data Mining (EDM) is just beginning to emerge as a tool to analyze the massive, longitudinal user data that are captured in …


Artificial Intelligence – Ii: Anomaly Detection In Data Streams Using Fuzzy Logic, Muhammad Umair Khan Aug 2009

International Conference on Information and Communication Technologies

Unsupervised data mining techniques require human intervention to understand and analyze clustering results. This becomes an issue for dynamic users and applications that need real-time decision making and interpretation. In this paper we present an approach that automates the annotation of data stream clustering results to help determine whether a given cluster is an anomaly. We use fuzzy logic to label the data; the labels are derived from a density function and the number of elements in each cluster.
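As an illustration of the labeling idea, the sketch below grades a cluster as anomalous with one fuzzy rule, "IF size is small AND density is low THEN anomaly", using the minimum as the fuzzy AND. It assumes that each cluster's density has already been computed by the stream clustering step; the linear membership functions, the `size_range`/`density_range` values, and the 0.5 threshold are hypothetical choices, not the paper's.

```python
def ramp_down(x: float, lo: float, hi: float) -> float:
    """Membership that is 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def anomaly_degree(size: int, density: float,
                   size_range=(5, 50), density_range=(0.1, 1.0)) -> float:
    """Degree of 'size is small AND density is low' (hypothetical ranges)."""
    small = ramp_down(size, *size_range)
    sparse = ramp_down(density, *density_range)
    return min(small, sparse)  # minimum t-norm as fuzzy AND


def label_cluster(size: int, density: float, threshold: float = 0.5) -> str:
    """Crisp annotation derived from the fuzzy degree (threshold is an assumption)."""
    return "anomaly" if anomaly_degree(size, density) >= threshold else "normal"
```

For example, a cluster of 3 points with density 0.05 is fully "small" and "sparse" and is labeled an anomaly, while a large, dense cluster gets degree 0.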


Artificial Intelligence – I: A Two-Step Approach For Improving Efficiency Of Feedforward Multilayer Perceptrons Network, Shoukat Ullah, Zakia Hussain Aug 2009

International Conference on Information and Communication Technologies

Artificial neural networks have gained importance in the field of data mining. Although they may have a complex structure, long training times, and results that are hard to interpret, neural networks offer high accuracy and are often preferred in data mining. This paper aims to improve efficiency and to provide accurate results on the basis of same-behaviour data. To achieve these objectives, an algorithm is proposed that combines two data mining techniques: attribute selection and cluster analysis. The algorithm first applies attribute selection to eliminate irrelevant attributes, so that input dimensionality is reduced …
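The first step, removing irrelevant attributes before training, can be sketched with a simple filter. The abstract does not say which attribute selection method is used, so this sketch swaps in a Pearson-correlation filter: attributes whose absolute correlation with the class label falls below a hypothetical `threshold` are dropped. The cluster-analysis step is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_attributes(rows, labels, threshold=0.3):
    """Indices of attributes whose |correlation with the label| >= threshold."""
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            keep.append(j)
    return keep


def project(rows, keep):
    """Reduced-dimensionality input for the network: only the kept attributes."""
    return [[row[j] for j in keep] for row in rows]
```

A filter like this is cheap compared with network training, which is why dimensionality reduction ahead of the multilayer perceptron can shorten training time.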


Artificial Intelligence – I: Subjective Decision Making Using Type-2 Fuzzy Logic Advisor, Owais Malik Aug 2009

International Conference on Information and Communication Technologies

In this paper, we present and compare two-stage type-2 fuzzy logic advisors (FLAs) for subjective decision making in the domain of student performance evaluation. We test the proposed model on student performance in our computer science and engineering department at HBCC/KFUPM in two domains, namely cooperative training and capstone/senior project assessment, where we find these FLAs useful and promising. In the proposed model, the assessment criteria for the different components of cooperative training and the senior project are transformed into linguistic labels, and evaluation information is extracted from the experts in the form of IF-THEN rules. These rules are modeled …
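A minimal sketch of how expert IF-THEN rules over interval type-2 linguistic labels can produce a crisp recommendation. Everything concrete here is an assumption for illustration: the triangular membership shapes, the lower membership function as a scaled copy of the upper one (`lower_height`), the labels `POOR`/`GOOD`, the four rules and their grade consequents, and the midpoint weighting used in place of a full Karnik-Mendel type reduction.

```python
from dataclasses import dataclass


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or outside the feet a and c, 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


@dataclass
class IntervalLabel:
    """Interval type-2 label: triangular upper MF, lower MF scaled by lower_height."""
    a: float
    b: float
    c: float
    lower_height: float = 0.6  # hypothetical footprint of uncertainty

    def membership(self, x: float) -> tuple[float, float]:
        upper = tri(x, self.a, self.b, self.c)
        return self.lower_height * upper, upper


def fire(antecedents, inputs):
    """Firing interval of a rule whose antecedents are ANDed (minimum t-norm)."""
    bounds = [label.membership(x) for label, x in zip(antecedents, inputs)]
    return min(lo for lo, _ in bounds), min(hi for _, hi in bounds)


def advise(rules, inputs):
    """Weighted average of consequents, weighting each rule by the midpoint of its
    firing interval (a simple stand-in for Karnik-Mendel type reduction)."""
    num = den = 0.0
    for antecedents, consequent in rules:
        lo, hi = fire(antecedents, inputs)
        weight = (lo + hi) / 2
        num += weight * consequent
        den += weight
    return num / den if den > 0 else None  # None: no rule fired


# Hypothetical labels on a 0-100 scale; feet extend past the scale so the extremes
# still belong to a label.
POOR = IntervalLabel(-30, 20, 60)
GOOD = IntervalLabel(40, 80, 130)
# Hypothetical expert rules over two criteria (e.g. report and presentation).
RULES = [
    ([GOOD, GOOD], 90.0),
    ([GOOD, POOR], 65.0),
    ([POOR, GOOD], 65.0),
    ([POOR, POOR], 40.0),
]
```

With these assumptions, `advise(RULES, (80, 80))` returns 90 and `advise(RULES, (20, 20))` returns 40; intermediate scores blend the rules.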


Data Mining For Software Engineering, Tao Xie, Suresh Thummalapenta, David Lo, Chao Liu Aug 2009

Research Collection School Of Computing and Information Systems

To improve software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining SE data poses several challenges. The authors present various algorithms to effectively mine sequences, graphs, and text from such data.


A Framework For Consistency Based Feature Selection, Pengpeng Lin May 2009

Masters Theses & Specialist Projects

Feature selection is an effective technique in reducing the dimensionality of features in many applications where datasets involve hundreds or thousands of features. The objective of feature selection is to find an optimal subset of relevant features such that the feature size is reduced and understandability of a learning process is improved without significantly decreasing the overall accuracy and applicability. This thesis focuses on the consistency measure where a feature subset is consistent if there exists a set of instances of length more than two with the same feature values and the same class labels. This thesis introduces a new …
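To make the consistency idea concrete, the sketch below uses the widely used inconsistency-rate formulation, which may differ in detail from the measure developed in the thesis: instances are grouped by their values on the selected features, each group contributes its size minus the count of its majority class, and the total is divided by the number of instances. The exhaustive search for the smallest subset matching the full feature set's rate is a hypothetical illustration and is only practical for a handful of features.

```python
from collections import Counter, defaultdict
from itertools import combinations


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that disagree with the majority class of their
    group of matching instances on the selected features."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def smallest_consistent_subset(rows, labels):
    """Smallest feature subset whose rate does not exceed the full set's rate
    (exhaustive search: exponential in the number of features)."""
    n_features = len(rows[0])
    target = inconsistency_rate(rows, labels, range(n_features))
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            if inconsistency_rate(rows, labels, subset) <= target:
                return subset
```

Because the full feature set always meets its own rate, the search is guaranteed to return a subset.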


Strategic Data Mining And Database Development For Research Projects At Lake Mead, Nevada-Arizona Usa, James Pollard, Gretchen M. Andrew Jan 2009

Lake Mead Science Symposium

“Water 2025” is a Department of Interior initiative designed to guide the management of scarce water resources in the American West. As an important Colorado River reservoir, Lake Mead is a fundamental component of Water 2025. For Water 2025 to achieve its goals, comprehensive knowledge is needed of historic and current Lake Mead water quality data. A task agreement between the National Park Service and the University of Nevada, Las Vegas provides for a strategic data mining project to identify research and monitoring projects on Lake Mead that have been conducted in the past, prioritize relevant projects, and ensure data …


Detecting Malicious Software By Dynamic Execution, Jianyong Dai Jan 2009

Electronic Theses and Dissertations

The traditional way to detect malicious software is signature matching. However, signature matching detects only known malicious software. To detect unknown malicious software, it is necessary to analyze the software's impact on the system when it is executed. In one approach, the software code can be statically analyzed for malicious patterns. Another approach is to execute the program and determine its nature dynamically. Since executing malicious code may harm the system, the code must be executed in a controlled environment. For that purpose, we have …


Sentiment Classification Of Reviews Using Sentiwordnet, Bruno Ohana, Brendan Tierney Jan 2009

Conference papers

Sentiment classification concerns the use of automatic methods for predicting the orientation of subjective content in text documents, with applications in a number of areas including recommender and advertising systems, customer intelligence, and information retrieval. SentiWordNet is an opinion lexicon derived from the WordNet database in which each term is associated with numerical scores indicating positive and negative sentiment. This research presents the results of applying the SentiWordNet lexical resource to the problem of automatic sentiment classification of film reviews. Our approach counts positive and negative term scores to determine sentiment orientation, and an improvement is presented by building …
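The baseline term-counting approach can be sketched as follows: sum the positive and negative scores of every known term in a review and predict the orientation with the larger total. `LEXICON` is a tiny hypothetical stand-in with illustrative values, not data from SentiWordNet; the real resource scores WordNet synsets (word senses), so a real system must choose a sense per word or, as one simplification, average over a word's senses. Ties are resolved as positive here by assumption.

```python
import re

# Hypothetical stand-in for SentiWordNet: term -> (positive score, negative score).
# The values are illustrative only.
LEXICON = {
    "brilliant": (0.875, 0.0),
    "enjoyable": (0.75, 0.0),
    "good": (0.625, 0.0),
    "dull": (0.0, 0.75),
    "boring": (0.0, 0.625),
}


def classify(review: str, lexicon=LEXICON) -> str:
    """Sum positive and negative term scores; the larger total wins (ties -> positive)."""
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(lexicon.get(t, (0.0, 0.0))[0] for t in tokens)
    neg = sum(lexicon.get(t, (0.0, 0.0))[1] for t in tokens)
    return "positive" if pos >= neg else "negative"
```

For instance, "A brilliant, enjoyable film" is classified as positive and "Dull and boring from start to finish" as negative; words missing from the lexicon contribute nothing.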


Parallel Mining Of Association Rules Using A Lattice Based Approach, Wessel Morant Thomas Jan 2009

CCE Theses and Dissertations

The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. One such pattern is the association rule extracted from these transactions. Because the databases used to store transactions are very large, parallel algorithms are required for mining association rules. In this paper we present a parallel algorithm for mining association rules that uses a lattice-based approach. The Dynamic Distributed Rule Mining (DDRM) algorithm partitions the lattice into sublattices to be assigned to …
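The abstract is truncated before DDRM's details, so the sketch below uses a different, well-known way of partitioning the itemset lattice as a stand-in: prefix-based equivalence classes in the style of Eclat, where each frequent item heads the sublattice of itemsets whose smallest item it is, and support is counted by intersecting transaction-id sets. Each sublattice is mined independently and handed to a worker; threads are used only to illustrate the assignment, since CPU-bound Python code would need processes or separate machines for real speedup. Only frequent itemsets are mined; generating rules from them is a separate step.

```python
from concurrent.futures import ThreadPoolExecutor


def tidlists(transactions):
    """Vertical layout: item -> set of ids of the transactions containing it."""
    tids = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tids.setdefault(item, set()).add(tid)
    return tids


def mine_class(prefix, prefix_tids, candidates, min_support):
    """Depth-first search of one prefix-based sublattice (Eclat-style).

    `candidates` holds (item, tid-set) pairs for frequent items sorting after the
    last item of `prefix`; support is the size of the tid-set intersection.
    """
    found = {prefix: len(prefix_tids)}
    for i, (item, tids) in enumerate(candidates):
        joined = prefix_tids & tids
        if len(joined) >= min_support:
            found.update(mine_class(prefix + (item,), joined, candidates[i + 1:], min_support))
    return found


def parallel_frequent_itemsets(transactions, min_support, workers=4):
    tids = tidlists(transactions)
    frequent = sorted(
        ((item, t) for item, t in tids.items() if len(t) >= min_support),
        key=lambda pair: pair[0],
    )
    # One independent job per frequent item, i.e. per sublattice.
    jobs = [((item,), t, frequent[i + 1:]) for i, (item, t) in enumerate(frequent)]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda job: mine_class(*job, min_support), jobs):
            result.update(part)
    return result
```

Because sublattices share no itemsets, the workers never need to coordinate while mining; their results are simply merged.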


Investigating Data Mining Techniques For Extracting Information From Alzheimer's Disease Data, Vinh Quoc Dang Jan 2009

Theses : Honours

Data mining techniques have been used widely in many areas such as business, science, engineering and, more recently, clinical medicine. These techniques allow enormous amounts of high-dimensional data to be analysed to extract interesting information and to build predictive models. One focus of health-related research is Alzheimer's disease, which is currently incurable and can only be confirmed after death via an autopsy. Using multi-dimensional data and data mining techniques, researchers hope to find biomarkers that will diagnose Alzheimer's disease as early as possible. …


Bootstrapping Events And Relations From Text, Ting Liu Jan 2009

Legacy Theses & Dissertations (2009 - 2024)

Information Extraction (IE) is a technique for automatically extracting structured data from text documents. One of the key analytical tasks is extracting important and relevant information from textual sources. While information is plentiful and readily available from the Internet, news services, media, etc., extracting the critical nuggets that matter to business or to national security is a cognitively demanding and time-consuming task. Intelligence and business analysts spend many hours poring over endless streams of text documents, pulling out references to entities of interest (people, locations, organizations) as well as their relationships as reported in the text. Such extracted "information …


An Enhanced Data Mining Life Cycle, Markus Hofmann, Brendan Tierney Jan 2009

Conference papers

Data mining projects are complex and can have a high failure rate. A well-defined life cycle is vital to improving the management and success rate of such projects. This paper reports on a research project concerned with developing a life cycle for data mining projects, their team members, and their roles. The paper provides a detailed view of the design and development of the data mining life cycle called DMLC. The life cycle aims to support all members of data mining project teams as well as IT managers and academic …


Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni Jan 2009

UNLV Theses, Dissertations, Professional Papers, and Capstones

Document clustering, or unsupervised document classification, is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions, such as the dot product and cosine measures, have been proposed for the comparison operator.

In this thesis, we evaluate the effects a similarity function may have on clustering. We start by representing both a document and a query as vectors in a high-dimensional space whose dimensions correspond to keywords, and then use an appropriate distance measure in k-means to compute the similarity between the document vector and the query vector to …
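A small sketch of why the choice of similarity function matters: with keyword vectors, the raw dot product rewards long documents that repeat query terms, while cosine similarity normalizes by vector length. The raw term counts, whitespace tokenization, and the sample texts are assumptions for illustration (no TF-IDF weighting); in k-means, the same function would decide which centroid each document vector is assigned to.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector over lowercased, whitespace-separated keywords."""
    return Counter(text.lower().split())


def dot(u, v):
    # Counter returns 0 for missing terms, so only shared terms contribute.
    return sum(weight * v[term] for term, weight in u.items())


def cosine(u, v):
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot(u, v) / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = tf_vector("data mining")
short_doc = tf_vector("data mining")
long_doc = tf_vector("data mining data mining data mining clustering")
```

Here the dot product ranks `long_doc` first (6 versus 2), whereas cosine ranks `short_doc` first (1.0 versus about 0.97), so the two measures can place the same documents in different clusters.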