Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons


Theses/Dissertations

Data mining


Articles 91 - 120 of 125

Full-Text Articles in Computer Sciences

Medical Data Analysis Method For Epilepsy, Ameen Eetemadi Jan 2012


Wayne State University Theses

Applying data mining techniques on medical databases which contain un-structured and semi-structured data is a challenging task. It is not only due to the complexity of such databases but also due to the characteristics of the medical domain. This thesis describes how multiple layers of data mining techniques have been applied to a Human Brain Image Database system. It starts with data preparation which paves the way for conventional data analysis techniques to be applied to the data. A similarity based patient retrieval tool has been designed and developed to assist in treatment planning and outcome estimation for epileptic patients. …


Computer Methods For Pre-Microrna Secondary Structure Prediction, Dianwei Han Jan 2012


Theses and Dissertations--Computer Science

This thesis presents a new algorithm to predict the pre-microRNA secondary structure. An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions.

It has been shown that studying the functions of multiple genes and predicting the secondary structure of multiple related microRNA is more important …


Decision Rule Induction For Service Sector Using Data Mining- A Rough Set Theory Approach, Zhonghua Hu Jan 2012


Open Access Theses & Dissertations

Nowadays, data mining is more widely used than ever before, not only in academia but also in industry and business. Apart from the execution of business processes, the creation of a knowledge base and its utilization for the benefit of the organization is becoming a strategic tool for competing. Despite having ever-growing databases, the finance company fails to fully capitalize on the true benefits that can be gained from this great wealth of information. Data mining technology, rather than classic statistical analysis, is developed to help people discover the …


Data Mining Based Learning Algorithms For Semi-Supervised Object Identification And Tracking, Michael P. Dessauer Jan 2011


Doctoral Dissertations

Sensor exploitation (SE) is the crucial step in surveillance applications such as airport security and search and rescue operations. It allows localization and identification of movement in urban settings and can significantly boost knowledge gathering, interpretation and action. Data mining techniques offer the promise of precise and accurate knowledge acquisition in high-dimensional data domains (and of diminishing the “curse of dimensionality” prevalent in such datasets), coupled with algorithmic design in feature extraction, discriminative ranking, feature fusion and supervised learning (classification). Consequently, data mining techniques and algorithms can be used to refine and process captured data and to detect, recognize, classify, …


A Web Based Fuzzy Data Mining Using Combs Inference Method And Decision Predictor, Shajia Akhter Sharmin Jan 2011


All Graduate Theses, Dissertations, and Other Capstone Projects

Fuzzy logic has become a very popular method of reasoning about a system with approximate rather than precise inputs. When qualitative variables are used to determine decisions, we have to create specific functions in which the membership value of an input can be any number between 0 and 1, instead of only the 1 or 0 used in binary logic. When the number of input attributes increases, the number of combinatorial rules increases exponentially and diminishes the performance of the system. The problem is generally known as “combinatorial rule explosion”. The Information Technology Department of Minnesota State University, Mankato …
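
As an illustration only, and not the thesis's implementation, the two ideas in this abstract can be sketched in a few lines of Python: a membership function that returns a degree between 0 and 1, and the exponential growth in rule count when every combination of fuzzy sets gets its own rule. The triangular shape and the names `triangular` and `rule_count` are assumptions made for this sketch.

```python
def triangular(x, left, peak, right):
    """Hypothetical triangular membership function.

    Returns a degree of membership in [0, 1]: 0 at or outside the
    interval ends, 1 at the peak, and linear in between.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def rule_count(num_inputs, sets_per_input):
    """Number of rules when each combination of fuzzy sets has a rule."""
    return sets_per_input ** num_inputs


# Degree to which a value of 2.5 belongs to a set centred on 5.
print(triangular(2.5, 0, 5, 10))

# Rule counts for 3 and 10 inputs with 5 fuzzy sets per input.
print(rule_count(3, 5), rule_count(10, 5))
```

With five fuzzy sets per input, going from 3 to 10 inputs raises the rule count from 125 to 9,765,625, which is the growth the abstract refers to as “combinatorial rule explosion”.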


Combining Natural Language Processing And Statistical Text Mining: A Study Of Specialized Versus Common Languages, Jay Jarman Jan 2011


USF Tampa Graduate Theses and Dissertations

This dissertation focuses on developing and evaluating hybrid approaches for analyzing free-form text in the medical domain. This research draws on natural language processing (NLP) techniques that are used to parse and extract concepts based on a controlled vocabulary. Once important concepts are extracted, additional machine learning algorithms, such as association rule mining and decision tree induction, are used to discover classification rules for specific targets. This multi-stage pipeline approach is contrasted with traditional statistical text mining (STM) methods based on term counts and term-by-document frequencies. The aim is to create effective text analytic processes by adapting and combining individual …


Parallel Surrogate Detection In Large-Scale Simulations, Lei Jiang Jan 2011


LSU Master's Theses

Simulation has become a useful approach in scientific computing and engineering for its ability to model real natural or human systems. In particular, for complex systems such as hurricanes, wildfire disasters, and real-time road traffic, simulation methods are able to provide researchers, engineers and decision makers predicted values in order to help them to take appropriate actions. For large-scale problems, the simulations usually take a lot of time on supercomputers, thus making real-time predictions more difficult. Approximation models that mimic the behavior of simulation models but are computationally cheaper, namely "surrogate models", are desired in such scenarios. In the thesis, …


Determining A Patient Recovery From A Total Knee Replacement Using Fuzzy Logic And Active Databases, Robert Azarbod Jan 2011


All Graduate Theses, Dissertations, and Other Capstone Projects

The purpose of the knowledge-based system is to predict the rehabilitation timeline of a patient in physical therapy for a total knee replacement. All patients have various attributes that contribute to their rehabilitation rate such as: weight, gender, smoking habit, medications, physical ability, or other medical problems. A combination of any one or several of these attributes will affect the recovery process. The proposed FRTP (Fuzzy Rehabilitation Timeline Predictor) is a fuzzy data mining model that can predict the recovery length of a patient in physical therapy for a total knee replacement and provide feedback to experts for revision of …


Evolutionary Strategies For Data Mining, Rose Lowe Dec 2010


All Dissertations

Learning classifier systems (LCS) have been successful in generating rules for solving classification problems in data mining. The rules are of the form IF condition THEN action. The condition encodes the features of the input space and the action encodes the class label. What is lacking in those systems is the ability to express each feature using a function that is appropriate for that feature. The genetic algorithm is capable of doing this but cannot because only one type of membership function is provided. Thus, the genetic algorithm learns only the shape and placement of the membership function, and in …


Event-Driven Similarity And Classification Of Scanpaths, Thomas Grindinger Aug 2010


All Dissertations

Eye tracking experiments often involve recording the pattern of deployment of visual attention over the stimulus as viewers perform a given task (e.g., visual search). It is useful in training applications, for example, to make available an expert's sequence of eye movements, or scanpath, to novices for their inspection and subsequent learning. It may also be potentially useful to be able to assess the conformance of the novice's scanpath to that of the expert. A computational tool is proposed that provides a framework for performing such classification, based on the use of a probabilistic machine learning algorithm. The approach was …


Enterprise Users And Web Search Behavior, April Ann Lewis May 2010


Masters Theses

This thesis describes an analysis of user web query behavior associated with Oak Ridge National Laboratory’s (ORNL) Enterprise Search System (hereafter, ORNL Intranet). The ORNL Intranet provides users a means to search all kinds of data stores for relevant business and research information using a single query. The Global Intranet Trends for 2010 Report suggests the biggest current obstacle for corporate intranets is “findability and Siloed content”. Intranets differ from internets in the way they create, control, and share content, which can often make it difficult and sometimes impossible for users to find information. Stenmark (2006) first noted studies of corporate …


The Impact Of Overfitting And Overgeneralization On The Classification Accuracy In Data Mining, Huy Nguyen Anh Pham Jan 2010


LSU Doctoral Dissertations

Current classification approaches usually do not try to achieve a balance between fitting and generalization when they infer models from training data. Such approaches ignore the possibility of different penalty costs for the false-positive, false-negative, and unclassifiable types. Thus, their performances may not be optimal or may even be coincidental. This dissertation analyzes the above issues in depth. It also proposes two new approaches called the Homogeneity-Based Algorithm (HBA) and the Convexity-Based Algorithm (CBA) to address these issues. These new approaches aim at optimally balancing the data fitting and generalization behaviors of models when some traditional classification approaches are used. …


Detecting Malicious Software By Dynamic Execution, Jianyong Dai Jan 2009

Electronic Theses and Dissertations

The traditional way to detect malicious software is based on signature matching. However, signature matching only detects known malicious software. In order to detect unknown malicious software, it is necessary to analyze the software for its impact on the system when the software is executed. In one approach, the software code can be statically analyzed for any malicious patterns. Another approach is to execute the program and determine the nature of the program dynamically. Since the execution of malicious code may have a negative impact on the system, the code must be executed in a controlled environment. For that purpose, we have …


Parallel Mining Of Association Rules Using A Lattice Based Approach, Wessel Morant Thomas Jan 2009


CCE Theses and Dissertations

The discovery of interesting patterns from database transactions is one of the major problems in knowledge discovery in databases. One such interesting pattern is the association rules extracted from these transactions. Parallel algorithms are required for the mining of association rules due to the very large databases used to store the transactions. In this paper we present a parallel algorithm for the mining of association rules. We implemented a parallel algorithm that used a lattice approach for mining association rules. The Dynamic Distributed Rule Mining (DDRM) is a lattice-based algorithm that partitions the lattice into sublattices to be assigned to …


Investigating Data Mining Techniques For Extracting Information From Alzheimer's Disease Data, Vinh Quoc Dang Jan 2009


Theses : Honours

Data mining techniques have been used widely in many areas such as business, science, engineering and more recently in clinical medicine. These techniques allow an enormous amount of high dimensional data to be analysed for extraction of interesting information as well as the construction of models for prediction. One of the foci in health related research is Alzheimer's disease which is currently a non-curable disease where diagnosis can only be confirmed after death via an autopsy. Using multi-dimensional data and the applications of data mining techniques, researchers hope to find biomarkers that will diagnose Alzheimer's disease as early as possible. …


Bootstrapping Events And Relations From Text, Ting Liu Jan 2009


Legacy Theses & Dissertations (2009 - 2024)

Information Extraction (IE) is a technique for automatically extracting structured data from text documents. One of the key analytical tasks is extraction of important and relevant information from textual sources. While information is plentiful and readily available, from the Internet, news services, media, etc., extracting the critical nuggets that matter to business or to national security is a cognitively demanding and time consuming task. Intelligence and business analysts spend many hours poring over endless streams of text documents pulling out reference to entities of interest (people, locations, organizations) as well as their relationships as reported in text. Such extracted "information …


Effects Of Similarity Metrics On Document Clustering, Rushikesh Veni Jan 2009


UNLV Theses, Dissertations, Professional Papers, and Capstones

Document clustering or unsupervised document classification is an automated process of grouping documents with similar content. A typical technique uses a similarity function to compare documents. In the literature, many similarity functions such as dot product or cosine measures are proposed for the comparison operator.
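
To make the difference between the two similarity functions named above concrete, here is a minimal sketch in plain Python. It is not taken from the thesis; the term-count vectors and the function names `dot` and `cosine` are assumptions chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length term-count vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product normalised by vector lengths."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


# Hypothetical term counts over a three-keyword vocabulary.
query = [1, 1, 0]
short_doc = [1, 1, 0]
long_doc = [3, 3, 0]   # same keyword mix as short_doc, three times longer

print(dot(query, short_doc), dot(query, long_doc))
print(cosine(query, short_doc), cosine(query, long_doc))
```

The dot product scores the longer document three times higher even though both documents use the same mix of keywords, while the cosine measure scores them (up to floating-point rounding) the same. This kind of difference is one reason the choice of similarity function can change how documents are grouped.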

For the thesis, we evaluate the effects a similarity function may have on clustering. We start by representing a document and a query as vectors in a high-dimensional space whose dimensions correspond to the keywords, and then use an appropriate distance measure in k-means to compute the similarity between the document vector and the query vector to …


Data Exploration By Using The Monotonicity Property, Hongyi Chen Jan 2008


LSU Master's Theses

Dealing with different misclassification costs has been a big problem for classification. Some algorithms, like most rule induction methods, can predict quite accurately when the misclassification costs for each class are assumed to be the same. However, when the misclassification costs change, which is a common phenomenon in reality, these algorithms are not capable of adjusting their results. Some other algorithms, like the Bayesian methods, can yield the probabilities of an unclassified example belonging to given classes, which is helpful for modifying the results according to different misclassification costs. The shortcoming of such algorithms is, when the …


Structure Pattern Analysis Using Term Rewriting And Clustering Algorithm, Xuezheng Fu Jun 2007


Computer Science Dissertations

Biological data is accumulated at a fast pace. However, raw data are generally difficult to understand and not useful unless we unlock the information hidden in the data. Knowledge/information can be extracted as the patterns or features buried within the data. Thus data mining, which aims at uncovering underlying rules, relationships, and patterns in data, has emerged as one of the most exciting fields in computational science. In this dissertation, we develop efficient approaches to the structure pattern analysis of RNA and protein three-dimensional structures. The major techniques used in this work include term rewriting and clustering algorithms. Firstly, a …


An Investigation Into The Application Of Data Mining Techniques To Characterize Agricultural Soil Profiles, Rowan J. Maddern Jan 2007


Theses : Honours

The advances in computing and information storage have provided vast amounts of data. The challenge has been to extract knowledge from this raw data; this has led to new methods and techniques such as data mining that can bridge the knowledge gap. The research aims to use these new data mining techniques and apply them to a soil science database to establish if meaningful relationships can be found. A data set extracted from the WA Department of Agriculture and Food (DAFWA) soils database has been used to conduct this research. The database contains measurements of soil profile data from …


Enhancing Web Marketing By Using Ontology, Xuan Zhou May 2006


Dissertations

The existence of the Web has a major impact on people's life styles. Online shopping, online banking, email, instant messenger services, search engines and bulletin boards have gradually become parts of our daily life. All kinds of information can be found on the Web. Web marketing is one of the ways to make use of online information. By extracting demographic information and interest information from the Web, marketing knowledge can be augmented by applying data mining algorithms. Therefore, this knowledge which connects customers to products can be used for marketing purposes and for targeting existing and potential customers. The Web …


Temporal Data Mining In A Dynamic Feature Space, Brent K. Wenerstrom May 2006


Theses and Dissertations

Many interesting real-world applications for temporal data mining are hindered by concept drift. One particular form of concept drift is characterized by changes to the underlying feature space. Seemingly little has been done to address this issue. This thesis presents FAE, an incremental ensemble approach to mining data subject to concept drift. FAE achieves better accuracies over four large datasets when compared with a similar incremental learning algorithm.


Detecting Potential Insider Threats Through Email Datamining, James S. Okolica Mar 2006


Theses and Dissertations

No abstract provided.


Text Mining With Exploitation Of User's Background Knowledge : Discovering Novel Association Rules From Text, Xin Chen Jan 2006


Dissertations

The goal of text mining is to find interesting and non-trivial patterns or knowledge from unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because such measures do not consider knowledge and interests of the users. Subjective measures require explicit input of user expectations which is difficult or even impossible to obtain in text mining environments.

This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two …


Efficient Generation Of Social Network Data From Computer-Mediated Communication Logs, Jason Wei Sung Yee Mar 2005


Theses and Dissertations

The insider threat poses a significant risk to any network or information system. A general definition of the insider threat is an authorized user performing unauthorized actions, a broad definition with no specifications on severity or action. While limited research has been able to classify and detect insider threats, it is generally understood that insider attacks are planned, and that there is a time period in which the organization's leadership can intervene and prevent the attack. Previous studies have shown that the person's behavior will generally change, and it is possible that social network analysis could be used to observe …


Pattern Discovery In Structural Databases With Applications To Bioinformatics, Sen Zhang Jan 2005


Dissertations

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this thesis, two new FSM techniques are proposed for finding patterns in unordered labeled trees. Such trees can be used to model evolutionary histories of different species, among others.

The first FSM technique finds cousin pairs in the trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our …
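
The notion of nodes sharing an ancestor a fixed number of levels up can be sketched as follows. This is a simplified illustration and not the thesis's algorithm, whose full definition is cut off above: the child-to-parent dictionary, the example tree, and the helper names `kth_ancestor` and `pairs_sharing_ancestor` are all assumptions.

```python
from itertools import combinations


def kth_ancestor(parent, node, k):
    """Walk k steps up a child -> parent map; None if the tree runs out."""
    for _ in range(k):
        node = parent.get(node)
        if node is None:
            return None
    return node


def pairs_sharing_ancestor(parent, k):
    """Pairs of nodes whose k-th ancestor exists and is the same node.

    k=1 gives pairs with the same parent, k=2 the same grandparent.
    Simplification: only pairs at equal depth below the shared ancestor.
    """
    nodes = sorted(parent)  # every node that has a parent
    return [
        (a, b)
        for a, b in combinations(nodes, 2)
        if kth_ancestor(parent, a, k) is not None
        and kth_ancestor(parent, a, k) == kth_ancestor(parent, b, k)
    ]


# Hypothetical tree: r has children a and b; a has c and d; b has e.
parent = {"a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}

print(pairs_sharing_ancestor(parent, 1))  # pairs with the same parent
print(pairs_sharing_ancestor(parent, 2))  # pairs with the same grandparent
```

In this example tree, `c` and `e` do not share a parent but do share the grandparent `r`, so they appear only in the second list.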


New Techniques For Improving Biological Data Quality Through Information Integration, Katherine Grace Herbert May 2004


Dissertations

As databases become more pervasive through the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these problems is non-trivial and can cause many problems for the database. For biological data to be corrected and standardized, methods and frameworks must be developed to handle both structural and traditional data.

The BIG-AJAX framework has been developed for solving these problems through both data cleaning and data integration. This framework exploits declarative data cleaning and exploratory data mining …


Customer Relationship Management For Banking System, Pingyu Hou Jan 2004


Theses Digitization Project

The purpose of this project is to design, build, and implement a Customer Relationship Management (CRM) system for a bank. CRM BANKING is an online application that caters to strengthening and stabilizing customer relationships in a bank.


High Performance Data Mining Techniques For Intrusion Detection, Muazzam Ahmed Siddiqui Jan 2004


Electronic Theses and Dissertations

The rapid growth of computers transformed the way in which information and data were stored. With this new paradigm of data access comes the threat of this information being exposed to unauthorized and unintended users. Many systems have been developed which scrutinize the data for a deviation from the normal behavior of a user or system, or search for a known signature within the data. These systems are termed Intrusion Detection Systems (IDS). These systems employ different techniques varying from statistical methods to machine learning algorithms. Intrusion detection systems use audit data generated by operating systems, application software or …


Using Sequence Analysis To Perform Application-Based Anomaly Detection Within An Artificial Immune System Framework, Larissa A. O'Brien Mar 2003


Theses and Dissertations

The Air Force and other Department of Defense (DoD) computer systems typically rely on traditional signature-based network IDSs to detect various types of attempted or successful attacks. Signature-based methods are limited to detecting known attacks or similar variants; anomaly-based systems, by contrast, alert on behaviors previously unseen. The development of an effective anomaly-detecting, application based IDS would increase the Air Force's ability to ward off attacks that are not detected by signature-based network IDSs, thus strengthening the layered defenses necessary to acquire and maintain safe, secure communication capability. This system follows the Artificial Immune System (AIS) framework, which relies on …