Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Data Mining

Articles 1 - 30 of 41

Full-Text Articles in Computer Engineering

Risk Assessment Approaches In Banking Sector – A Survey, Mona Sharaf, Shimaa Mohamed Ouf, Amira M. Idrees Ami Jul 2023

Future Computing and Informatics Journal

Prediction analysis is a method that makes predictions based on the data currently available. Bank loans come with a lot of risks to both the bank and the borrowers. One of the most exciting and important areas of research is data mining, which aims to extract information from vast amounts of accumulated data sets. The loan process is one of the key processes for the banking industry, and this paper examines various prior studies that used data mining techniques to extract all served entities and attributes necessary for analytical purposes, categorize these attributes, and forecast the future of their business …


A Study Of Heart Disease Diagnosis Using Machine Learning And Data Mining, Intisar Ahmed Dec 2022

Electronic Theses, Projects, and Dissertations

Heart disease is the leading cause of death for people around the world today. Various forms of heart disease can be diagnosed with numerous medical tests; however, predicting heart disease without such tests is very difficult. Machine learning can help process medical big data and reveal hidden knowledge that would otherwise not be apparent to the naked eye. The aim of this project is to explore how machine learning algorithms can be used in predicting heart disease by building an optimized model. The research questions are: 1) What machine learning algorithms are used in the diagnosis of heart …


An Analysis On Network Flow-Based IoT Botnet Detection Using Weka, Cian Porteous Jan 2022

Dissertations

Botnets pose a significant and growing risk to modern networks. Detection of botnets remains an important area of open research in order to prevent the proliferation of botnets and to mitigate the damage that can be caused by botnets that have already been established. Botnet detection can be broadly divided into two main categories: signature-based detection and anomaly-based detection. This paper sets out to measure the accuracy, false-positive rate, and false-negative rate of four algorithms that are available in Weka for anomaly-based detection on a dataset of HTTP and IRC botnet data. The algorithms that were selected to detect botnets …


A Literature Review For Contributing Mining Approaches For Business Process Reengineering, Noha Ahmed Bayomy Nab, Ayman E. Khedr Aek, Laila A. Abd-Elmegid Laa, Amira M. Idrees Ami May 2021

Future Computing and Informatics Journal

Due to the changing dynamics of the business environment, organizations need to redesign or reengineer their business processes in order to provide services with the lowest cost and shortest response time while increasing quality. Hence, Business Process Reengineering (BPR) provides a roadmap for achieving operational goals that lead to enhanced flexibility and productivity, cost reduction, and quality of service/product. In this paper, we present a literature review of the different proposed models for Business Process Reengineering. The models specify where breakdowns occur in BPR implementation, explain why such breakdowns occur, and propose techniques to prevent their recurrence. The …


Strategies In Botnet Detection And Privacy Preserving Machine Learning, Di Zhuang Mar 2021

USF Tampa Graduate Theses and Dissertations

Peer-to-peer (P2P) botnets have become one of the major threats in network security, serving as the infrastructure responsible for various cyber-crimes. Though a few existing works claim to detect traditional botnets effectively, the problem of detecting P2P botnets involves more challenges. In this dissertation, we present two P2P botnet detection systems, PeerHunter and Enhanced PeerHunter. PeerHunter starts from a P2P host detection component. Then, it uses mutual contacts as the main feature to cluster bots into communities. Finally, it uses community behavior analysis to detect potential botnet communities and further identify bot candidates. Enhanced PeerHunter is an …


Machine Learning Techniques For Credit Card Fraud Detection, Hossam Eldin Mohammed Abd El-Hamid Ahmed Abdou, Wael Khalifa, Mohamed Ismail Roushdy, Abdel-Badeeh M. Salem Sep 2020

Future Computing and Informatics Journal

The term “fraud” most often brings credit card fraud to mind. Following the significant increase in credit card transactions, credit card fraud has risen sharply in recent years. Fraud detection should therefore include surveillance of the spending behavior of the person/customer in order to determine, avoid, and detect unwanted behavior. Because the credit card is the predominant payment method for both online and regular purchasing, credit card fraud is rising rapidly. Fraud detection is concerned not only with capturing fraudulent practices but also with discovering them as fast as they …


Reputation-Aware Trajectory-Based Data Mining In The Internet Of Things (IoT), Samia Tasnim Nov 2019

FIU Electronic Theses and Dissertations

The Internet of Things (IoT) is a critically important technology for the acquisition of spatiotemporally dense data in diverse applications, ranging from environmental monitoring to surveillance systems. Such data helps us improve our transportation systems, monitor our air quality and the spread of diseases, respond to natural disasters, and supports a bevy of other applications. However, IoT sensor data is error-prone for a number of reasons: sensors may be deployed in hazardous environments, may deplete their energy resources, have mechanical faults, or may become the targets of malicious attacks by adversaries. While previous research has attempted to improve the quality of …


LearnFCA: A Fuzzy FCA And Probability Based Approach For Learning And Classification, Suraj Ketan Samal Aug 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Formal concept analysis (FCA) is a mathematical theory based on lattice and order theory used for data analysis and knowledge representation. Over the past several years, many of its extensions have been proposed and applied in several domains including data mining, machine learning, knowledge management, semantic web, software development, chemistry, biology, medicine, data analytics, and ontology engineering.

This thesis reviews the state of the art in the theory of Formal Concept Analysis (FCA) and its various extensions that have been developed and well studied in the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide …


Interactive Clinical Event Pattern Mining And Visualization Using Insurance Claims Data, Zhenhui Piao Jan 2018

Theses and Dissertations--Computer Science

With exponential growth on a daily basis, there is potentially valuable information hidden in complex electronic medical records (EMR) systems. In this thesis, several efficient data mining algorithms were explored to discover hidden knowledge in insurance claims data. The first aim was to cluster three levels of information overload (IO) groups among chronic rheumatic disease (CRD) patient groups based on their clinical events extracted from insurance claims data. The second aim was to discover hidden patterns using three renowned pattern mining algorithms: Apriori, frequent pattern growth (FP-Growth), and sequential pattern discovery using equivalence classes (SPADE). The SPADE algorithm was found to be the …


Demand Side Management In Smart Grid Using Big Data Analytics, Sidhant Chatterjee Dec 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Smart grids are the next-generation electrical grid system that utilizes smart metering devices and sensors to manage grid operations. Grid management includes the prediction of load and the classification of load patterns and consumer usage behaviors. These predictions can be performed using machine learning methods, which are often supervised. Supervised machine learning signifies that the algorithm trains the model to efficiently predict decisions based on the previously available data.

Smart grids are equipped with numerous smart meters that send user statistics to a central server. The data can be accumulated and processed using data mining and machine …


Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw Aug 2017

Research Collection School Of Computing and Information Systems

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a …


Prediction Of Graduation Delay Based On Student Characterisitics And Performance, Tushar Ojha Jul 2017

Electrical and Computer Engineering ETDs

A college student's success depends on many factors including pre-university characteristics and university student support services. Student graduation rates are often used as an objective metric to measure institutional effectiveness. This work studies the impact of such factors on graduation rates, with a particular focus on delay in graduation. In this work, we used feature selection methods to identify a subset of the pre-institutional features with the highest discriminative power. In particular, Forward Selection with Linear Regression, Backward Elimination with Linear Regression, and Lasso Regression were applied. The feature sets were selected in a multivariate fashion. High school GPA, ACT …


Data Masking, Encryption, And Their Effect On Classification Performance: Trade-Offs Between Data Security And Utility, Juan C. Asenjo Jan 2017

CCE Theses and Dissertations

As data mining increasingly shapes organizational decision-making, the quality of its results must be questioned to ensure trust in the technology. Inaccuracies can mislead decision-makers and cause costly mistakes. With more data collected for analytical purposes, privacy is also a major concern. Data security policies and regulations are increasingly put in place to manage risks, but these policies and regulations often employ technologies that substitute and/or suppress sensitive details contained in the data sets being mined. Data masking and substitution and/or data encryption and suppression of sensitive attributes from data sets can limit access to important details. It is believed …


Significant Permission Identification For Android Malware Detection, Lichao Sun Jul 2016

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

A recent report indicates that a newly developed malicious app for Android is introduced every 11 seconds. To combat this alarming rate of malware creation, we need a scalable malware detection approach that is effective and efficient. In this thesis, we introduce SigPID, a malware detection system based on permission analysis to cope with the rapid increase in the number of Android malware. Instead of analyzing all 135 Android permissions, our approach applies 3-level pruning by mining the permission data to identify only significant permissions that can be effective in distinguishing benign and malicious apps. Based on the identified significant …


Enhancing Snort IDS Performance Using Data Mining, Mohammed Ali Almaleki May 2016

Theses

Intrusion detection systems (IDSs) such as Snort apply deep packet inspection to detect intrusions. Usually, these are rule-based systems, where each incoming packet is matched with a set of rules. Each rule consists of two parts: the rule header and the rule options. The rule header is compared with the packet header. The rule options usually contain a signature string that is matched with packet content using an efficient string matching algorithm. The traditional approach to IDS packet inspection checks a packet against the detection rules by scanning from the first rule in the set and continuing to scan all …


EFSPredictor: Predicting Configuration Bugs With Ensemble Feature Selection, Bowen Xu, David Lo, Xin Xia, Ashish Sureka, Shanping Li May 2016

Research Collection School Of Computing and Information Systems

The configuration of a system determines the system's behavior, and wrong configuration settings can adversely impact a system's availability, performance, and correctness. We refer to these wrong configuration settings as configuration bugs. The importance of configuration bugs has prompted many researchers to study them, and past studies can be grouped into three categories: detection, localization, and fixing of configuration bugs. In this work, we focus on the detection of configuration bugs; in particular, we follow the line of work that tries to predict whether a bug report is caused by a wrong configuration setting. Automatic prediction of whether a bug is a configuration …


Find: Framework For Intelligent Research Discovery, Clint Cuffy, Tej Mehta, Hengbin Li Jan 2016

Capstone Design Expo Posters

Computers are essential to research yet the different ways in which computers can automate or speed up research have not been fully explored. Researchers are publishing experimental results faster than ever before and the number of articles to read on a single subject now presents an overwhelming task. FiND aims to expedite this process through an extensible backbone infrastructure for the automated synthesizing of data. FiND includes a Web User Interface, Perl Core and MySQL Database developed using the software constraints of Perl, HTML, CSS, CGI and MySQL. FiND attempts to simplify this task by reducing time spent in exhaustive …


Knowledge Discovery And Predictive Modeling From Brain Tumor MRIs, Mu Zhou Sep 2015

USF Tampa Graduate Theses and Dissertations

Quantitative cancer imaging is an emerging field that develops computational techniques to acquire a deep understanding of cancer characteristics for cancer diagnosis and clinical decision making. The recent emergence of growing clinical imaging data provides a wealth of opportunity to systematically explore quantitative information to advance cancer diagnosis. Crucial questions arise as to how we can develop specific computational models that are capable of mining meaningful knowledge from a vast quantity of imaging data and how to transform such findings into improved personalized health care.

This dissertation presents a set of computational models in the context of malignant brain tumors— …


An Evaluation Of The Use Of Diversity To Improve The Accuracy Of Predicted Ratings In Recommender Systems, Gillian Browne May 2015

Dissertations

The diversity versus accuracy trade-off has become an important area of research within recommender systems as online retailers attempt to better serve their customers and gain a competitive advantage through an improved customer experience. This dissertation attempted to evaluate the use of diversity measures in predictive models as a means of improving predicted ratings. Research literature outlines a number of influencing factors such as personality, taste, mood and social networks in addition to approaches to the diversity challenge post recommendation. A number of models were applied, including DecisionStump, Linear Regression, J48 Decision Tree and Naive Bayes. Various evaluation metrics …


Using Support Vector Machine Ensembles For Target Audience Classification On Twitter, Siaw Ling Lo, Raymond Chiong, David Cornforth Apr 2015

Research Collection School Of Computing and Information Systems

The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results …


Contrast Pattern Aided Regression And Classification, Vahid Taslimitehrani Jan 2015

Browse all Theses and Dissertations

Regression and classification techniques play an essential role in many data mining tasks and have broad applications. However, most state-of-the-art regression and classification techniques are unable to adequately model the interactions among predictor variables in highly heterogeneous datasets. New techniques that can effectively model such complex and heterogeneous structures are needed to significantly improve prediction accuracy. In this dissertation, we propose a novel type of accurate and interpretable regression and classification models, named Pattern Aided Regression (PXR) and Pattern Aided Classification (PXC) respectively. Both PXR and PXC rely on identifying regions in the data space where …


Discovering Comprehensible Hydrogeological Profiles In The Margarita Island's Aquifers Including Post-Processing In A Data Mining Process, Conti Dante, Gibert Karina Jun 2014

International Congress on Environmental Modelling and Software

Groundwater wells are one of the most important water resources in the world. Control and management of these resources are of high importance due to the implicit need for water as the main resource for life. This research focuses on a hydrogeological analysis with clustering, which is one of the most popular data mining methods, including post-processing. In the classical data mining scheme, the last step corresponds to the effective production of knowledge. This paper places special focus on that step by means of post-processing tools. The main goal is to discover prototypical profiles from the aquifer Pedro González in …


Silvio, Modelling Social Vulnerability Under A Local Perspective, Thaís López-Inojosa, Martina Neuburger, Sebastian Medina-Plascencia, Framklin Davila Jun 2014

International Congress on Environmental Modelling and Software

The objective of this research is to develop and model an indicator of social vulnerability as part of a multidimensional, multivariable and nonlinear process. Social vulnerability can be considered a complex system, in which many relationships in society and environment can be described by considering individual and structural factors. Numerous studies provide tools to analyze and revise social systems, but these tools were developed by generalizing and overviewing the relationships among many intermediate realities, and for another type of system, more formalized and conceptualized. Social systems do not have this classical formalization. The development of the SocIaL Vulnerability I …


Mining The Online Social Network Data: Influence, Summarization, And Organization, Jingxuan Li Mar 2014

FIU Electronic Theses and Dissertations

Online Social Network (OSN) services provided by Internet companies bring people together to chat and to share and enjoy information. Meanwhile, huge amounts of data are generated by those services (which can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large scale of the OSN data, it is difficult to analyze it effectively.

This dissertation focuses on applying data mining and information retrieval techniques to …


Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing, Samir Al-Stouhi Jan 2013

Wayne State University Dissertations

As machine learning methods extend to a more complex and diverse set of problems, situations arise where the complexity and availability of data mean that the information source is not "adequate" to generate a representative hypothesis. Learning from multiple sources of data is a promising research direction as researchers leverage ever more diverse sources of information. Since data is not readily available, knowledge has to be transferred from other sources and new methods (both supervised and unsupervised) have to be developed to selectively share and transfer knowledge. In this dissertation, we present both supervised and unsupervised techniques to tackle …


Application Of Self-Monitoring For Situational Awareness, Christopher Trickler Jan 2013

Electronic Theses and Dissertations

Self-monitoring devices and services are used for physical wellness, personal tracking and self-improvement. These individual devices and services can only provide information based on what they can measure directly or historically without an intermediate system. This paper proposes a self-monitoring system to perform situational awareness which may extend into providing insight into predictable behaviors. Knowing an individual’s current state and likelihood of particular behaviors occurring is a general solution. This knowledge-based solution derived from sensory data has many applications. The proposed system could monitor current individual situational status, automatically provide personal status as it changes, aid personal improvement, contribute to …


Data Mining Text Book, Abbas Madraky Jan 2012

Abbas Madraky

No abstract provided.


Privacy Preserving Distributed Data Mining, Zhenmin Lin Jan 2012

Theses and Dissertations--Computer Science

Privacy preserving distributed data mining aims to design secure protocols which allow multiple parties to conduct collaborative data mining while protecting the data privacy. My research focuses on the design and implementation of privacy preserving two-party protocols based on homomorphic encryption. I present new results in this area, including new secure protocols for basic operations and two fundamental privacy preserving data mining protocols.

I propose a number of secure protocols for basic operations in the additive secret-sharing scheme based on homomorphic encryption. I derive a basic relationship between a secret number and its shares, with which we develop efficient secure …


Clustering Educational Digital Library Usage Data: Comparisons Of Latent Class Analysis And K-Means Algorithms, Beijie Xu May 2011

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

There are common pitfalls and neglected areas when using clustering approaches to solve educational problems. A clustering algorithm is often used without the choice being justified. Few comparisons between a selected algorithm and a competing algorithm are presented, and results are presented without validation. Lastly, few studies fully utilize data provided in an educational environment to evaluate their findings. In response to these problems, this thesis describes a rigorous study comparing two clustering algorithms in the context of an educational digital library service, called the Instructional Architect.

First, a detailed description of the chosen clustering algorithm, namely, latent class analysis …


Polygonal Spatial Clustering, Deepti Joshi Apr 2011

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Clustering, the process of grouping together similar objects, is a fundamental task in data mining to help perform knowledge discovery in large datasets. With the growing number of sensor networks, geospatial satellites, global positioning devices, and human networks, tremendous amounts of spatio-temporal data that measure the state of the planet Earth are being collected every day. This large amount of spatio-temporal data has increased the need for efficient spatial data mining techniques. Furthermore, most of the anthropogenic objects in space are represented using polygons, for example, counties, census tracts, and watersheds. Therefore, it is important to develop data mining …