Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Data mining

2019

Articles 1 - 24 of 24

Full-Text Articles in Computer Sciences

Energy Efficiency Data Mining And Scheduling Optimization Of Discrete Workshop, Yugu Lin, Wang Yan Dec 2019

Journal of System Simulation

Abstract: This paper addresses the optimization of energy consumption in discrete workshops and establishes an energy efficiency optimization model for them. The relationship between data mining and knowledge discovery is established. Scheduling knowledge is discovered through scheduling-data preprocessing and the C4.5 decision tree learning algorithm. Energy efficiency optimization in discrete workshops is then computed by combining this scheduling knowledge with an improved differential evolution (IDE) algorithm. The feasibility of the IDE algorithm is verified by comparison with TLBO, GA and PSO.


BullyNet: Unmasking Cyberbullies On Social Networks, Aparna Sankaran Dec 2019

Boise State University Theses and Dissertations

Social media has changed the way people communicate with each other, and consequently affected people's ability to empathize in both positive and negative ways. One of the most harmful consequences of social media is the rise of cyberbullying, which tends to be more sinister than traditional bullying given that online records typically live on the internet for quite a long time and are hard to control. In this thesis, we present a three-phase algorithm, called BullyNet, for detecting cyberbullies on the Twitter social network. We exploit bullying tendencies by proposing a robust method for constructing a cyberbullying signed network. BullyNet analyzes …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of ultrasound images is time-consuming and usually produces many false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has not solved the problem, owing to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …


Feature Space Modeling For Accurate And Efficient Learning From Non-Stationary Data, Ayesha Akter Oct 2019

Doctoral Dissertations

A non-stationary dataset is one whose statistical properties such as the mean, variance, correlation, probability distribution, etc. change over a specific interval of time. On the contrary, a stationary dataset is one whose statistical properties remain constant over time. Apart from the volatile statistical properties, non-stationary data poses other challenges such as time and memory management due to the limitation of computational resources mostly caused by the recent advancements in data collection technologies which generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing data complexity, emerging from its dimensionality and …
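
The definition above can be illustrated with a minimal sketch, which is not the dissertation's method: it splits a series into consecutive windows and compares their means and variances. The function names, the window size, and the `tolerance` threshold are all assumptions introduced here for illustration.

```python
from statistics import fmean, pvariance


def window_stats(series, window):
    """Return (mean, variance) for each consecutive non-overlapping window."""
    return [
        (fmean(series[i:i + window]), pvariance(series[i:i + window]))
        for i in range(0, len(series) - window + 1, window)
    ]


def looks_non_stationary(series, window, tolerance):
    """Flag a series whose window means or variances spread beyond tolerance.

    This is a crude heuristic (an assumption of this sketch), not a
    statistical test.
    """
    stats = window_stats(series, window)
    means = [mean for mean, _ in stats]
    variances = [var for _, var in stats]
    return (max(means) - min(means) > tolerance
            or max(variances) - min(variances) > tolerance)


# Toy data: the first series keeps the same mean and variance in both
# windows, the second shifts its mean halfway through.
stationary = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
drifting = [1.0, 2.0, 1.0, 2.0, 11.0, 12.0, 11.0, 12.0]

print(looks_non_stationary(stationary, window=4, tolerance=0.5))
print(looks_non_stationary(drifting, window=4, tolerance=0.5))
```

A real analysis would more likely rely on a formal stationarity test; the windowed comparison only makes the "properties change over time" idea concrete.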


Applied Deep Learning In Intelligent Transportation Systems And Embedding Exploration, Xiaoyuan Liang Aug 2019

Dissertations

Deep learning techniques have achieved tremendous success in many real applications in recent years and show great potential in many areas, including transportation. Even though transportation is becoming increasingly indispensable in people’s daily lives, its related problems, such as traffic congestion and energy waste, have not been completely solved, and some have become even more critical. This dissertation focuses on solving the following fundamental problems in the transportation field using deep learning: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization …


Phenomena Of Social Dynamics In Online Games, Essa Alhazmi Jul 2019

USF Tampa Graduate Theses and Dissertations

Online communities exhibit dynamic social phenomena that, if understood, can both influence the design of technical platforms and inform theories about general social dynamics. With increasing popularity, online games provide a rich recording of social dynamics that can contribute to understanding human behavior. This dissertation studies two phenomena of social dynamics at large scale using data traces from online games. The first phenomenon is team formation and the second is player mobility between gaming servers.

This dissertation first presents a framework for collecting data from online gaming through crawling. It includes the data sources and the tools used for data …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …


Data Mining And Machine Learning To Improve Northern Florida’s Foster Care System, Daniel Oldham, Nathan Foster, Mihhail Berezovski Jun 2019

Beyond: Undergraduate Research Journal

The purpose of this research project is to use statistical analysis, data mining, and machine learning techniques to determine identifiable factors in child welfare service records that could lead to a child entering the foster care system multiple times. This would allow us to accurately predict a case’s outcome based on these factors. We were provided with eight years of data in the form of multiple spreadsheets from Partnership for Strong Families (PSF), a child welfare services organization based in Gainesville, Florida, which is contracted by the Florida Department for Children and Families (DCF). This data contained a …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Alpha Insurance: A Predictive Analytics Case To Analyze Automobile Insurance Fraud Using SAS Enterprise Miner (TM), Richard McCarthy, Wendy Ceccucci, Mary McCarthy, Leila Halawi Apr 2019

Publications

Automobile Insurance fraud costs the insurance industry billions of dollars annually. This case study addresses claim fraud based on data extracted from Alpha Insurance’s automobile claim database. Students are provided the business problem and data sets. Initially, the students are required to develop their hypotheses and analyze the data. This includes identification of any missing or inaccurate data values and outliers as well as evaluation of the 22 variables. Next students will develop and optimize their predictive models using five techniques: regression, decision tree, neural network, gradient boosting, and ensemble. Then students will determine which model is the best fit …


Applications Of Supervised Machine Learning In Autism Spectrum Disorder Research: A Review, Kayleigh K. Hyde, Marlena N. Novack, Nicholas Lahaye, Chelsea Parlett-Pelleriti, Raymond Anden, Dennis R. Dixon, Erik Linstead Feb 2019

Engineering Faculty Articles and Research

Autism spectrum disorder (ASD) research has yet to leverage "big data" on the same scale as other fields; however, advancements in easy, affordable data collection and analysis may soon make this a reality. Indeed, there has been a notable increase in research literature evaluating the effectiveness of machine learning for diagnosing ASD, exploring its genetic underpinnings, and designing effective interventions. This paper provides a comprehensive review of 45 papers utilizing supervised machine learning in ASD, including algorithms for classification and text analysis. The goal of the paper is to identify and describe supervised machine learning trends in ASD literature as …


Knowing Without Knowing: Real-Time Usage Identification Of Computer Systems, Leila Mohammed Hawana Jan 2019

Dissertations and Theses

Contemporary computers attempt to understand a user's actions and preferences in order to make decisions that better serve the user. In pursuit of this goal, computers can make observations that range from simple pattern recognition to listening in on conversations without the device being intentionally active. While these developments are incredibly useful for customization, the inherent security risks involving personal data are not always worth it. This thesis attempts to tackle one issue in this domain, computer usage identification, and presents a solution that identifies high-level usage of a system at any given moment without looking into any personal data. …


Optimization Of Material Release For Printed Circuit Board Template Based On Data Mining, Shengping Lü, Qiangsheng Yue, Liu Tao Jan 2019

Journal of System Simulation

Abstract: Data mining was employed to optimize the material release of PCB (printed circuit board) templates. Parameters related to the PCB scrap ratio were specified, and prediction model variables were chosen by hypothesis testing. Multiple linear regression (MLR), chi-squared automatic interaction detector, artificial neural network, and support vector machine approaches were employed to predict the scrap ratio. Evaluation indicators, namely the superfluous ratio, the supplement release ratio, and the weighted sum of the two, were presented; a material release simulation was conducted, the four approaches were compared, and MLR was taken as the preferred one. Adjust coefficient …


Mining And Validation Of Attacking Behavior In The RoboCup 2D Simulation, Chen Bing, Zhang Heng, Zekai Cheng, Dong Peng, Lin Chao Jan 2019

Journal of System Simulation

Abstract: RoboCup is an international academic competition that focuses on artificial intelligence and robotics. The 2D simulation is one of the earliest and most influential projects in RoboCup. Attacking is the core behaviour of the simulated football game, and attack recognition is considered an important part of team confrontations. This paper selects several activity and contribution indices of attacking, extracts a large amount of attacking behaviour data from the key agents, and proposes two attacking patterns for 2D simulation, ‘separate attack’ and ‘cooperative attack’, modelled on human-player actions. The following simulation tests give the accuracy of …


Exploratory Factor Analysis Of Graphical Features For Link Prediction In Social Networks, Lale Madahali, Lotfi Najjar, Margeret Hall Jan 2019

Interdisciplinary Informatics Faculty Proceedings & Presentations

Social networks attract much attention due to their ability to replicate social interactions at scale. Link prediction, or the assessment of which unconnected nodes are likely to connect in the future, is an interesting but non-trivial research area. Three approaches exist to deal with the link prediction problem: feature-based models, Bayesian probabilistic models, and probabilistic relational models. In feature-based methods, graphical features are extracted and used for classification. Usually, these features are subdivided into three feature groups based on their formula. Some formulas are extracted based on neighborhood graph traversal. Accordingly, there exist three groups of features: neighborhood features, path-based features, …
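
As a rough illustration of what neighborhood features for link prediction can look like, the sketch below computes two widely used scores, the common-neighbor count and the Jaccard coefficient, for a pair of unconnected nodes. The toy graph and the function names are assumptions made here; the abstract does not state which features the authors actually use.

```python
def common_neighbors(graph, u, v):
    """Number of neighbors shared by nodes u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


# Hypothetical undirected graph stored as adjacency sets.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

# "a" and "d" are not connected but share both "b" and "c", so both
# scores rank them as a more likely future link than "a" and "e".
print(common_neighbors(graph, "a", "d"), jaccard(graph, "a", "d"))
print(common_neighbors(graph, "a", "e"), jaccard(graph, "a", "e"))
```

In a feature-based model, scores like these would become input columns for a classifier trained on node pairs labelled as linked or not linked.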


A Data Mining Framework For Improving Student Outcomes On Step 1 Of The United States Medical Licensing Examination, James Clark Jan 2019

CCE Theses and Dissertations

Identifying the factors associated with medical students who fail Step 1 of the United States Medical Licensing Examination (USMLE) has been a focus of investigation for many years. Some researchers believe lower scores on the Medical Colleges Admissions Test (MCAT) are the sole factor used to identify failure. Other researchers believe lower course outcomes during the first two years of medical training are better indicators of failure. Yet, there are medical students who fail Step 1 of the USMLE who enter medical school with high MCAT scores, and conversely medical students with lower academic credentials who are expected to have …


Citationally Enhanced Semantic Literature Based Discovery, John David Fleig Jan 2019

CCE Theses and Dissertations

We are living in the age of information. The ever-increasing flow of data and publications poses a monumental bottleneck to scientific progress because, despite its amazing abilities, the human mind is woefully inadequate for processing such a vast quantity of multidimensional information. The small bits of flotsam and jetsam that we leverage belie the amount of useful information beneath the surface. It is imperative that automated tools exist to better search, retrieve, and summarize this content. Combinations of document indexing and search engines can quickly find you a document whose content best matches your query - if …


Data Analysis Through Social Media According To The Classified Crime, Serkan Savaş, Nurettin Topaloğlu Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The amount and variety of data generated through social media sites has increased along with the widespread use of social media sites. In addition, the data production rate has increased in the same way. The inclusion of personal information within these data makes it important to process the data and reach meaningful information within it. This process can be called intelligence and this meaningful information may be for commercial, academic, or security purposes. An example application is developed in this study for intelligence on Twitter. Crimes in Turkey are classified according to Turkish Statistical Institute criminal data and keywords are …


Brexit: A Granger Causality Of Twitter Political Polarisation On The FTSE 100 Index And The Pound, James Usher, Lucia Morales, Pierpaolo Dondio Jan 2019

Conference papers

BREXIT is the single biggest geopolitical event in British history since WWII. Whilst the political fallout has become a tragicomedy, the political ramifications have had a profound impact on the Pound and the FTSE 100 index. This paper examines Twitter political discourse surrounding the BREXIT withdrawal agreement. In particular, we focus on the discussions around four different exit strategies known as “Norway”, “Article 50”, the “Backstop” and “No Deal” and their effect on the pound and FTSE 100 index from the period of rumblings of the cancellation of the Meaningful Vote on December 10th 2018 inclusive of the second defeat on the …


WordNet-Based Criminal Networks Mining For Cybercrime Investigation, Farkhund Iqbal, Benjamin C.M. Fung, Mourad Debbabi, Rabia Batool, Andrew Marrington Jan 2019

All Works

© 2019 IEEE. Cybercriminals exploit the opportunities provided by the information revolution and social media to communicate and conduct underground illicit activities, such as online fraudulence, cyber predation, cyberbullying, hacking, blackmailing, and drug smuggling. To combat the increasing number of criminal activities, structure and content analysis of criminal communities can provide insight and facilitate cybercrime forensics. In this paper, we propose a framework to analyze chat logs for crime investigation using data mining and natural language processing techniques. The proposed framework extracts the social network from chat logs and summarizes conversation into topics. The crime investigator can use information visualizer …


Learning From Heterogeneous Data, Lu Wang Jan 2019

Wayne State University Dissertations

Data with both heterogeneity and homogeneity is now ubiquitous due to the development of multitudinous data collection techniques. To encode the data heterogeneity and homogeneity, we focus on unsupervised and supervised learning approaches. In unsupervised learning, to consider both data heterogeneity and homogeneity, we develop three clustering frameworks to maximize the heterogeneity among data sub-groups and homogeneity within each data sub-group for over-dispersed data in three different data types, i.e., alphabetic, network and mixed feature types data. In supervised learning, the traditional approaches, however, either build a global model for a whole group including all sub-groups, which fail to consider …


Heart Attack Mortality Prediction: An Application Of Machine Learning Methods, Issam Salman Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The heart is an important organ in the human body, and acute myocardial infarction (AMI) is the leading cause of death in most countries. Researchers are doing a lot of data analysis work to assist doctors in predicting the heart problem. An analysis of the data related to different health problems and its functions can help in predicting the wellness of this organ with a degree of certainty. Our research reported in this paper consists of two main parts. In the first part of the paper, we compare different predictive models of hospital mortality for patients with AMI. All results …


Efficient Algorithms For Mining Healthcare Data, Yan Hu Jan 2019

Legacy Theses & Dissertations (2009 - 2024)

Data-Driven Healthcare (DDH) is defined as the usage of available medical big data to provide the best and most personalized care, which is believed to be one of the most promising directions for transforming healthcare. The healthcare data includes claims and cost data, clinical data, pharmaceutical R&D data, patient behavior and sentiment data, and health data on the web. There has been a remarkable upsurge in the adoption of healthcare data over the past several years. In particular, it has been used for medical concept extraction, patient trajectory modeling, disease inference, etc.


Predictive Analysis Of Real-Time Strategy Games Using Graph Mining, Isam Abdulmunem Alobaidi Jan 2019

Doctoral Dissertations

"Machine learning and computational intelligence have facilitated the development of recommendation systems for a broad range of domains. Such recommendations are based on contextual information that is explicitly provided or pervasively collected. Recommendation systems often improve decision-making or increase the efficacy of a task. Real-Time Strategy (RTS) video games are not only a popular entertainment medium, they also are an abstraction of many real-world applications where the aim is to increase your resources and decrease those of your opponent. Using predictive analytics, which examines past examples of success and failure, we can learn how to predict positive outcomes for such …