Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Open Access. Powered by Scholars. Published by Universities.®

Data mining

Computer Sciences

Institution
Publication Year
Publication
Publication Type

Articles 31 - 43 of 43

Full-Text Articles in Computer Engineering

Iot+Small Data: Transforming In-Store Shopping Analytics And Services, Meera Radhakrishnan, Sougata Sen, Vigneshwaran Subbaraju, Archan Misra, Rajesh Balan Jan 2016

Iot+Small Data: Transforming In-Store Shopping Analytics And Services, Meera Radhakrishnan, Sougata Sen, Vigneshwaran Subbaraju, Archan Misra, Rajesh Balan

Research Collection School Of Computing and Information Systems

We espouse a vision of small data-based immersive retail analytics, where a combination of sensor data, from personal wearable-devices and store-deployed sensors & IoT devices, is used to create real-time, individualized services for in-store shoppers. Key challenges include (a) appropriate joint mining of sensor & wearable data to capture a shopper’s product level interactions, and (b) judicious triggering of power-hungry wearable sensors (e.g., camera) to capture only relevant portions of a shopper’s in-store activities. To explore the feasibility of our vision, we conducted experiments with 5 smartwatch-wearing users who interacted with objects placed on cupboard racks in our lab (to …


Automatic Classification Of Harmonic Data Using $K$-Means And Least Square Support Vector Machine, Hüseyi̇n Eri̇şti̇, Vedat Tümen, Özal Yildirim, Belkis Eri̇şti̇, Yakup Demi̇r Jan 2015

Automatic Classification Of Harmonic Data Using $K$-Means And Least Square Support Vector Machine, Hüseyi̇n Eri̇şti̇, Vedat Tümen, Özal Yildirim, Belkis Eri̇şti̇, Yakup Demi̇r

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, an effective classification approach to classify harmonic data has been proposed. In the proposed classifier approach, harmonic data obtained through a 3-phase system have been classified by using $k$-means and least square support vector machine (LS-SVM) models. In order to obtain class details regarding harmonic data, a $k$-means clustering algorithm has been applied to these data first. The training of the LS-SVM model has been realized with the class details obtained through the $k$-means algorithm. To increase the efficiency of the LS-SVM model, the regularization and kernel parameters of this model have been determined with a grid …


A Theory Of Name Resolution, Pierre Néron, Andrew Tolmach, Eelco Visser, Guido Wachsmuth Jan 2015

A Theory Of Name Resolution, Pierre Néron, Andrew Tolmach, Eelco Visser, Guido Wachsmuth

Computer Science Faculty Publications and Presentations

We describe a language-independent theory for name binding and resolution, suitable for programming languages with complex scoping rules including both lexical scoping and modules. We formulate name resolution as a two-stage problem. First a language-independent scope graph is constructed using language-specific rules from an abstract syntax tree. Then references in the scope graph are resolved to corresponding declarations using a language-independent resolution process. We introduce a resolution calculus as a concise, declarative, and language- independent specification of name resolution. We develop a resolution algorithm that is sound and complete with respect to the calculus. Based on the resolution calculus we …


Hot Zone Identification: Analyzing Effects Of Data Sampling On Spam Clustering, Rasib Khan, Mainul Mizan, Ragib Hasan, Alan Sprague Jan 2014

Hot Zone Identification: Analyzing Effects Of Data Sampling On Spam Clustering, Rasib Khan, Mainul Mizan, Ragib Hasan, Alan Sprague

Journal of Digital Forensics, Security and Law

Email is the most common and comparatively the most efficient means of exchanging information in today's world. However, given the widespread use of emails in all sectors, they have been the target of spammers since the beginning. Filtering spam emails has now led to critical actions such as forensic activities based on mining spam email. The data mine for spam emails at the University of Alabama at Birmingham is considered to be one of the most prominent resources for mining and identifying spam sources. It is a widely researched repository used by researchers from different global organizations. The usual process …


M-Fdbscan: A Multicore Density-Based Uncertain Data Clustering Algorithm, Atakan Erdem, Taflan İmre Gündem Jan 2014

M-Fdbscan: A Multicore Density-Based Uncertain Data Clustering Algorithm, Atakan Erdem, Taflan İmre Gündem

Turkish Journal of Electrical Engineering and Computer Sciences

In many data mining applications, we use a clustering algorithm on a large amount of uncertain data. In this paper, we adapt an uncertain data clustering algorithm called fast density-based spatial clustering of applications with noise (FDBSCAN) to multicore systems in order to have fast processing. The new algorithm, which we call multicore FDBSCAN (M-FDBSCAN), splits the data domain into c rectangular regions, where c is the number of cores in the system. The FDBSCAN algorithm is then applied to each rectangular region simultaneously. After the clustering operation is completed, semiclusters that occur during splitting are detected and merged to …


Discovery Of Hydrometeorological Patterns, Mete Çeli̇k, Fi̇li̇z Dadaşer Çeli̇k, Ahmet Şaki̇r Dokuz Jan 2014

Discovery Of Hydrometeorological Patterns, Mete Çeli̇k, Fi̇li̇z Dadaşer Çeli̇k, Ahmet Şaki̇r Dokuz

Turkish Journal of Electrical Engineering and Computer Sciences

Hydrometeorological patterns can be defined as meaningful and nontrivial associations between hydrological and meteorological parameters over a region. Discovering hydrometeorological patterns is important for many applications, including forecasting hydrometeorological hazards (floods and droughts), predicting the hydrological responses of ungauged basins, and filling in missing hydrological or meteorological records. However, discovering these patterns is challenging due to the special characteristics of hydrological and meteorological data, and is computationally complex due to the archival history of the datasets. Moreover, defining monotonic interest measures to quantify these patterns is difficult. In this study, we propose a new monotonic interest measure, called the hydrometeorological …


An Urgent Precaution System To Detect Students At Risk Of Substance Abuse Through Classification Algorithms, Faruk Bulut, İhsan Ömür Bucak Jan 2014

An Urgent Precaution System To Detect Students At Risk Of Substance Abuse Through Classification Algorithms, Faruk Bulut, İhsan Ömür Bucak

Turkish Journal of Electrical Engineering and Computer Sciences

In recent years, the use of addictive drugs and substances has turned out to be a challenging social problem worldwide. The illicit use of these types of drugs and substances appears to be increasing among elementary and high school students. After becoming addicted to drugs, life becomes unbearable and gets even worse for their users. Scientific studies show that it becomes extremely difficult for an individual to break this habit after being a user. Hence, preventing teenagers from addiction becomes an important issue. This study focuses on an urgent precaution system that helps families and educators prevent teenagers from developing …


Predicting Sql Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan Apr 2013

Predicting Sql Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan

Research Collection School Of Computing and Information Systems

ContextSQL injection (SQLI) and cross site scripting (XSS) are the two most common and serious web application vulnerabilities for the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis techniques have been proposed. Alternatively, there are also vulnerability prediction approaches based on machine learning techniques, which showed that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities. And most of these approaches locate vulnerable code only at software component or file levels. Some approaches also involve process attributes that …


A Rule Induction Algorithm For Knowledge Discovery And Classification, Ömer Akgöbek Jan 2013

A Rule Induction Algorithm For Knowledge Discovery And Classification, Ömer Akgöbek

Turkish Journal of Electrical Engineering and Computer Sciences

Classification and rule induction are key topics in the fields of decision making and knowledge discovery. The objective of this study is to present a new algorithm developed for automatic knowledge acquisition in data mining. The proposed algorithm has been named RES-2 (Rule Extraction System). It aims at eliminating the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm makes use of the direct rule extraction approach, rather than the decision tree. For this purpose, it uses a set of examples to induce general rules. In this study, 15 datasets consisting of multiclass values with …


Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin Oct 2012

Semi-Automatic Simulation Initialization By Mining Structured And Unstructured Data Formats From Local And Web Data Sources, Olcay Sahin

Computational Modeling & Simulation Engineering Theses & Dissertations

Initialization is one of the most important processes for obtaining successful results from a simulation. However, initialization is a challenge when 1) a simulation requires hundreds or even thousands of input parameters or 2) re-initializing the simulation due to different initial conditions or runtime errors. These challenges lead to the modeler spending more time initializing a simulation and may lead to errors due to poor input data.

This thesis proposes two semi-automatic simulation initialization approaches that provide initialization using data mining from structured and unstructured data formats from local and web data sources. First, the System Initialization with Retrieval (SIR) …


Data Mining Of Protein Databases, Christopher Assi Jul 2012

Data Mining Of Protein Databases, Christopher Assi

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Data mining of protein databases poses special challenges because many protein databases are non-relational whereas most data mining and machine learning algorithms assume the input data to be a relational database. Protein databases are non-relational mainly because they often contain set data types. We developed new data mining algorithms that can restructure non-relational protein databases so that they become relational and amenable for various data mining and machine learning tools. We applied the new restructuring algorithms to a pancreatic protein database. After the restructuring, we also applied two classification methods, such as decision tree and SVM classifiers and compared their …


Data Mining Based Learning Algorithms For Semi-Supervised Object Identification And Tracking, Michael P. Dessauer Jan 2011

Data Mining Based Learning Algorithms For Semi-Supervised Object Identification And Tracking, Michael P. Dessauer

Doctoral Dissertations

Sensor exploitation (SE) is the crucial step in surveillance applications such as airport security and search and rescue operations. It allows localization and identification of movement in urban settings and can significantly boost knowledge gathering, interpretation and action. Data mining techniques offer the promise of precise and accurate knowledge acquisition techniques in high-dimensional data domains (and diminishing the “curse of dimensionality” prevalent in such datasets), coupled by algorithmic design in feature extraction, discriminative ranking, feature fusion and supervised learning (classification). Consequently, data mining techniques and algorithms can be used to refine and process captured data and to detect, recognize, classify, …


Artificial Intelligence – I: A Two-Step Approach For Improving Efficiency Of Feedforward Multilayer Perceptrons Network, Shoukat Ullah, Zakia Hussain Aug 2009

Artificial Intelligence – I: A Two-Step Approach For Improving Efficiency Of Feedforward Multilayer Perceptrons Network, Shoukat Ullah, Zakia Hussain

International Conference on Information and Communication Technologies

An artificial neural network has got greater importance in the field of data mining. Although it may have complex structure, long training time, and uneasily understandable representation of results, neural network has high accuracy and is preferable in data mining. This research paper is aimed to improve efficiency and to provide accurate results on the basis of same behaviour data. To achieve these objectives, an algorithm is proposed that uses two data mining techniques, that is, attribute selection method and cluster analysis. The algorithm works by applying attribute selection method to eliminate irrelevant attributes, so that input dimensionality is reduced …