Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Data Mining

Theses/Dissertations

Articles 1 - 17 of 17

Full-Text Articles in Computer Engineering

A Study Of Heart Disease Diagnosis Using Machine Learning And Data Mining, Intisar Ahmed Dec 2022

Electronic Theses, Projects, and Dissertations

Heart disease is the leading cause of death for people around the world today. Various forms of heart disease can be diagnosed with numerous medical tests; however, predicting heart disease without such tests is very difficult. Machine learning can help process medical big data and reveal hidden knowledge that would otherwise not be visible to the naked eye. The aim of this project is to explore how machine learning algorithms can be used in predicting heart disease by building an optimized model. The research questions are: 1) What machine learning algorithms are used in the diagnosis of heart …


Strategies In Botnet Detection And Privacy Preserving Machine Learning, Di Zhuang Mar 2021

USF Tampa Graduate Theses and Dissertations

Peer-to-peer (P2P) botnets have become one of the major threats in network security by serving as the infrastructure responsible for various cyber-crimes. Though a few existing works claim to detect traditional botnets effectively, detecting P2P botnets involves more challenges. In this dissertation, we present two P2P botnet detection systems, PeerHunter and Enhanced PeerHunter. PeerHunter starts from a P2P host detection component. Then, it uses mutual contacts as the main feature to cluster bots into communities. Finally, it uses community behavior analysis to detect potential botnet communities and further identify bot candidates. Enhanced PeerHunter is an …
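
As a rough illustration of clustering hosts by mutual contacts, the sketch below links two hosts when they contacted enough of the same remote peers and then reads communities off as connected components. It is not PeerHunter's actual procedure: the `min_shared` threshold, the connected-components grouping, and the sample flow data are all assumptions made for the example.

```python
from itertools import combinations


def mutual_contact_graph(contacts, min_shared=2):
    """Link two hosts when they share at least `min_shared` contacted peers."""
    edges = {host: set() for host in contacts}
    for a, b in combinations(sorted(contacts), 2):
        # Peers contacted by both hosts, ignoring the two hosts themselves.
        shared = (contacts[a] & contacts[b]) - {a, b}
        if len(shared) >= min_shared:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def communities(edges):
    """Return the connected components of the mutual-contact graph."""
    seen, groups = set(), []
    for start in sorted(edges):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical flow data: host -> set of remote peers it contacted.
contacts = {
    "h1": {"p1", "p2", "p3"},
    "h2": {"p2", "p3", "p4"},
    "h3": {"p1", "p2", "p3", "p9"},
    "h4": {"p7", "p8"},
}
groups = communities(mutual_contact_graph(contacts))
```

Here `h1`, `h2`, and `h3` end up in one community because each pair shares at least two peers, while `h4` stays alone. A real system would still need a behavior analysis step to decide which communities are botnets.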


Interactive Clinical Event Pattern Mining And Visualization Using Insurance Claims Data, Zhenhui Piao Jan 2018

Theses and Dissertations--Computer Science

With data growing exponentially on a daily basis, there is potentially valuable information hidden in complex electronic medical records (EMR) systems. In this thesis, several efficient data mining algorithms were explored to discover hidden knowledge in insurance claims data. The first aim was to cluster three levels of information overload (IO) groups among chronic rheumatic disease (CRD) patient groups based on their clinical events extracted from insurance claims data. The second aim was to discover hidden patterns using three renowned pattern mining algorithms: Apriori, frequent pattern growth (FP-Growth), and sequential pattern discovery using equivalence classes (SPADE). The SPADE algorithm was found to be the …
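
For readers unfamiliar with Apriori, one of the pattern mining algorithms named above, here is a minimal textbook-style sketch. It is not the thesis's implementation; the event names and the `min_support` value are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical clinical events, one set per patient.
transactions = [
    {"office_visit", "lab_test", "prescription"},
    {"office_visit", "lab_test"},
    {"office_visit", "prescription"},
    {"lab_test", "imaging"},
]
frequent = apriori(transactions, min_support=2)
```

With these four records, the pair `{office_visit, lab_test}` is frequent (support 2), while `{lab_test, prescription}` is not, so the three-event candidate is pruned without being counted.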


Demand Side Management In Smart Grid Using Big Data Analytics, Sidhant Chatterjee Dec 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Smart grids are next-generation electrical grid systems that use smart metering devices and sensors to manage grid operations. Grid management includes the prediction of load and the classification of load patterns and consumer usage behaviors. These predictions can be performed using machine learning methods, which are often supervised. In supervised machine learning, the algorithm trains a model to predict decisions based on previously available data.

Smart grids employ numerous smart meters that send user statistics to a central server. The data can be accumulated and processed using data mining and machine …


Prediction Of Graduation Delay Based On Student Characteristics And Performance, Tushar Ojha Jul 2017

Electrical and Computer Engineering ETDs

A college student's success depends on many factors including pre-university characteristics and university student support services. Student graduation rates are often used as an objective metric to measure institutional effectiveness. This work studies the impact of such factors on graduation rates, with a particular focus on delay in graduation. In this work, we used feature selection methods to identify a subset of the pre-institutional features with the highest discriminative power. In particular, Forward Selection with Linear Regression, Backward Elimination with Linear Regression, and Lasso Regression were applied. The feature sets were selected in a multivariate fashion. High school GPA, ACT …


Data Masking, Encryption, And Their Effect On Classification Performance: Trade-Offs Between Data Security And Utility, Juan C. Asenjo Jan 2017

CCE Theses and Dissertations

As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed …


Enhancing Snort IDS Performance Using Data Mining, Mohammed Ali Almaleki May 2016

Theses

Intrusion detection systems (IDSs) such as Snort apply deep packet inspection to detect intrusions. Usually, these are rule-based systems, where each incoming packet is matched with a set of rules. Each rule consists of two parts: the rule header and the rule options. The rule header is compared with the packet header. The rule options usually contain a signature string that is matched with packet content using an efficient string matching algorithm. The traditional approach to IDS packet inspection checks a packet against the detection rules by scanning from the first rule in the set and continuing to scan all …
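
The sequential, first-rule-onward scan described in this abstract can be sketched as follows. This is a simplified model, not Snort's code or rule syntax: the `Rule` and `Packet` classes, their fields, and the sample signatures are hypothetical, the header is reduced to protocol and destination port, and Python's built-in substring search stands in for a dedicated string matching algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    sid: int          # hypothetical rule identifier
    protocol: str     # header part: protocol to match
    dst_port: int     # header part: destination port to match
    content: bytes    # options part: signature to find in the payload


@dataclass(frozen=True)
class Packet:
    protocol: str
    dst_port: int
    payload: bytes


def first_match(packet, rules):
    """Scan rules in order; return (sid or None, number of rules checked)."""
    for checked, rule in enumerate(rules, start=1):
        header_ok = (rule.protocol == packet.protocol
                     and rule.dst_port == packet.dst_port)
        # Only search the payload when the cheaper header comparison passes.
        if header_ok and rule.content in packet.payload:
            return rule.sid, checked
    return None, len(rules)


rules = [
    Rule(1, "tcp", 80, b"/etc/passwd"),
    Rule(2, "udp", 53, b"\x00\x00\xfc"),
    Rule(3, "tcp", 80, b"cmd.exe"),
]
hit = first_match(Packet("tcp", 80, b"GET /scripts/cmd.exe HTTP/1.0"), rules)
miss = first_match(Packet("tcp", 22, b"cmd.exe"), rules)
```

The second value in each result shows the cost of a linear scan: the matching packet is only caught at the third rule, and a non-matching packet is checked against every rule in the set.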


Knowledge Discovery And Predictive Modeling From Brain Tumor MRIs, Mu Zhou Sep 2015

USF Tampa Graduate Theses and Dissertations

Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent emergence of growing clinical imaging data provides a wealth of opportunity to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models that are capable of mining meaningful knowledge from a vast quantity of imaging data and how to transform such findings into improved personalized health care.

This dissertation presents a set of computational models in the context of malignant brain tumors— …


Contrast Pattern Aided Regression And Classification, Vahid Taslimitehrani Jan 2015

Browse all Theses and Dissertations

Regression and classification techniques play an essential role in many data mining tasks and have broad applications. However, most state-of-the-art regression and classification techniques are unable to adequately model the interactions among predictor variables in highly heterogeneous datasets. New techniques that can effectively model such complex and heterogeneous structures are needed to significantly improve prediction accuracy. In this dissertation, we propose a novel type of accurate and interpretable regression and classification models, named Pattern Aided Regression (PXR) and Pattern Aided Classification (PXC), respectively. Both PXR and PXC rely on identifying regions in the data space where …


Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing, Samir Al-Stouhi Jan 2013

Wayne State University Dissertations

As machine learning methods extend to a more complex and diverse set of problems, situations arise where the complexity and availability of data mean that the information source is not "adequate" to generate a representative hypothesis. Learning from multiple sources of data is a promising research direction as researchers leverage ever more diverse sources of information. Since data is not readily available, knowledge has to be transferred from other sources, and new methods (both supervised and unsupervised) have to be developed to selectively share and transfer knowledge. In this dissertation, we present both supervised and unsupervised techniques to tackle …


Application Of Self-Monitoring For Situational Awareness, Christopher Trickler Jan 2013

Electronic Theses and Dissertations

Self-monitoring devices and services are used for physical wellness, personal tracking and self-improvement. These individual devices and services can only provide information based on what they can measure directly or historically without an intermediate system. This paper proposes a self-monitoring system to perform situational awareness which may extend into providing insight into predictable behaviors. Knowing an individual’s current state and likelihood of particular behaviors occurring is a general solution. This knowledge-based solution derived from sensory data has many applications. The proposed system could monitor current individual situational status, automatically provide personal status as it changes, aid personal improvement, contribute to …


Privacy Preserving Distributed Data Mining, Zhenmin Lin Jan 2012

Theses and Dissertations--Computer Science

Privacy preserving distributed data mining aims to design secure protocols that allow multiple parties to conduct collaborative data mining while protecting data privacy. My research focuses on the design and implementation of privacy preserving two-party protocols based on homomorphic encryption. I present new results in this area, including new secure protocols for basic operations and two fundamental privacy preserving data mining protocols.

I propose a number of secure protocols for basic operations in the additive secret-sharing scheme based on homomorphic encryption. I derive a basic relationship between a secret number and its shares, with which I develop efficient secure …
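
As background on the additive secret-sharing scheme mentioned above, the sketch below shows the generic two-party idea: a secret is split into two random-looking shares that sum to it modulo a public modulus, and sums of secrets can be computed by adding shares locally. It does not reproduce the dissertation's protocols or its homomorphic encryption layer; the modulus and function names are assumptions chosen for the example.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus agreed on by both parties


def share(secret):
    """Split `secret` into two additive shares modulo MODULUS."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (secret - share_a) % MODULUS
    return share_a, share_b


def reconstruct(share_a, share_b):
    """Recombine two shares into the original value."""
    return (share_a + share_b) % MODULUS


def add_shared(x_shares, y_shares):
    """Each party adds its own two shares locally; no messages are exchanged."""
    return ((x_shares[0] + y_shares[0]) % MODULUS,
            (x_shares[1] + y_shares[1]) % MODULUS)


x = share(1234)
y = share(4321)
total = reconstruct(*add_shared(x, y))
```

Neither share alone reveals anything about the secret, since each is uniformly distributed over the modulus. Operations such as multiplication or comparison need interaction between the parties, which is where secure protocols like those described in the abstract come in.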


Clustering Educational Digital Library Usage Data: Comparisons Of Latent Class Analysis And K-Means Algorithms, Beijie Xu May 2011

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

There are common pitfalls and neglected areas when using clustering approaches to solve educational problems. A clustering algorithm is often used without the choice being justified. Few comparisons between a selected algorithm and a competing algorithm are presented, and results are presented without validation. Lastly, few studies fully utilize data provided in an educational environment to evaluate their findings. In response to these problems, this thesis describes a rigorous study comparing two clustering algorithms in the context of an educational digital library service, called the Instructional Architect.

First, a detailed description of the chosen clustering algorithm, namely, latent class analysis …


An Architecture For High-Performance Privacy-Preserving And Distributed Data Mining, James Secretan Jan 2009

Electronic Theses and Dissertations

This dissertation discusses the development of an architecture and associated techniques to support Privacy Preserving and Distributed Data Mining. The field of Distributed Data Mining (DDM) attempts to solve the challenges inherent in coordinating data mining tasks with databases that are geographically distributed, through the application of parallel algorithms and grid computing concepts. The closely related field of Privacy Preserving Data Mining (PPDM) adds the dimension of privacy to the problem, trying to find ways that organizations can collaborate to mine their databases collectively, while at the same time preserving the privacy of their records. Developing data mining algorithms for …


Scalable And Efficient Outlier Detection In Large Distributed Data Sets With Mixed-Type Attributes, Anna Koufakou Jan 2009

Electronic Theses and Dissertations

An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted from the outliers themselves. Outlier Detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis of deciding …


Matrix Decomposition For Data Disclosure Control And Data Mining Applications, Jie Wang Jan 2008

University of Kentucky Doctoral Dissertations

Access to huge amounts of various data with private information brings out a dual demand for preservation of data privacy and correctness of knowledge discovery, which are two apparently contradictory tasks. Low-rank approximations generated by matrix decompositions are a fundamental element in this dissertation for the privacy preserving data mining (PPDM) applications. Two categories of PPDM are studied: data value hiding (DVH) and data pattern hiding (DPH). A matrix-decomposition-based framework is designed to incorporate matrix decomposition techniques into data preprocessing to distort original data sets. With respect to the challenge in the DVH, how to protect sensitive/confidential attribute values without …


Comparative Microarray Data Mining, Shihong Mao Jan 2007

Browse all Theses and Dissertations

As a revolutionary technology, microarrays have great potential to provide genome-wide patterns of gene expression, to make accurate medical diagnosis, and to explore genetic causes underlying diseases. It is commonly believed that suitable analysis of microarray datasets can achieve the above goals. While much has been done in microarray data mining, few previous studies, if any, focused on multiple datasets at the comparative level. This dissertation aims to fill this gap by developing tools and methods for set-based comparative microarray data mining. Specifically, we mine highly differentiative gene groups (HDGGs) from given datasets/classes, evaluate the concordance of datasets …