Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Full-Text Articles in Computer Engineering

Ki-Cook: Clustering Multimodal Cooking Representations Through Knowledge-Infused Learning, Revathy Venkataramanan, Swati Padhee, Saini Rohan Rao, Ronak Kaoshik, Anirudh Sundara Rajan, Amit Sheth Jul 2023

Publications

Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations based on similarity is essential to retrieve relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Due to inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about recipes to determine similarity. However, recipe title, ingredients, and cooking actions provide detailed knowledge about recipes and are a better determinant of similar recipes. In this study, …


A Quantitative Validation Of Multi-Modal Image Fusion And Segmentation For Object Detection And Tracking, Nicholas Lahaye, Michael J. Garay, Brian D. Bue, Hesham El-Askary, Erik Linstead Jun 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

In previous works, we have shown the efficacy of using Deep Belief Networks, paired with clustering, to identify distinct classes of objects within remotely sensed data via cluster analysis and qualitative analysis of the output data in comparison with reference data. In this paper, we quantitatively validate the methodology against datasets currently being generated and used within the remote sensing community, as well as show the capabilities and benefits of the data fusion methodologies used. The experiments we ran take the output of our unsupervised fusion and segmentation methodology and map it to various labeled datasets at different levels of global …


Learning Discriminative And Efficient Attention For Person Re-Identification Using Agglomerative Clustering Frameworks, Kshitij Nikhal Apr 2021

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

Recent advancements like multiple contextual analysis, attention mechanisms, distance-aware optimization, and multi-task guidance have been widely used for supervised person re-identification (ReID), but the implementation and effects of such methods in unsupervised person ReID frameworks are non-trivial and unclear, respectively. Moreover, with increasing size and complexity of image- and video-based ReID datasets, manual or semi-automated annotation procedures for supervised ReID are becoming labor intensive and cost prohibitive, which is undesirable, especially considering that the likelihood of annotation errors increases with the scale and complexity of data collections. Therefore, this thesis proposes a new iterative clustering framework that incorporates (a) two attention architectures that learn …


Clustered Mobile Data Collection In WSNs: An Energy-Delay Trade-Off, İzzet Fatih Şentürk Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

Wireless sensor networks enable monitoring remote areas with limited human intervention. However, the network connectivity between sensor nodes and the base station (BS) may not always be possible due to the limited transmission range of the nodes. In such a case, one or more mobile data collectors (MDCs) can be employed to visit nodes for data collection. If multiple MDCs are available, it is desirable to minimize the energy cost of mobility while distributing the cost among the MDCs in a fair manner. Despite the availability of various clustering algorithms, there is no one-size-fits-all clustering solution when different requirements …


An Efficient Storage-Optimizing Tick Data Clustering Model, Haleh Amintoosi, Masood Niazi Torshiz, Yahya Forghani, Sara Alinejad Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Tick data is a large volume of data, related to a phenomenon such as stock market or weather change, with data values changing rapidly over time. An important issue is how to store a tick data table so that it occupies minimum storage space while still allowing fast execution of queries. In this paper, a mathematical model is proposed to partition tick data tables into clusters with the aim of minimizing the required storage space. The genetic algorithm is then used to solve the mathematical model, which is in effect a clustering model. The proposed method …


BIBSQLQC: Brown Infomax Boosted SQL Query Clustering Algorithm To Detect Anti-Patterns In The Query Log, Vinothsaravanan Ramakrishnan, Palanisamy Chenniappan Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Discovery of antipatterns from an arbitrary SQL query log depends on the static code analysis used to enhance the quality and performance of software applications. The existence of antipatterns reduces the quality and leads to redundant SQL statements. The SQL log includes a large load on the database, and it is difficult for an analyst to extract large patterns in minimal time. Existing techniques that discover antipatterns in SQL queries face numerous challenges in discovering the normal sequences of queries within the log. In order to discover the antipatterns in the log, an efficient technique called Brown infomax …


Pixel-Level Deep Multi-Dimensional Embeddings For Homogeneous Multiple Object Tracking, Mateusz Mittek Dec 2019

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

The goal of Multiple Object Tracking (MOT) is to locate multiple objects and keep track of their individual identities and trajectories given a sequence of (video) frames. A popular approach to MOT is tracking by detection, consisting of two processing components: detection (identification of objects of interest in individual frames) and data association (connecting data from multiple frames). This work addresses the detection component by introducing a method based on semantic instance segmentation, i.e., assigning labels to all visible pixels such that they are unique among different instances. Modern tracking methods are often built around Convolutional Neural Networks (CNNs) and additional, …


Exploring Bigram Character Features For Arabic Text Clustering, Dia Eddin Abuzeina Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The vector space model (VSM) is an algebraic model that is widely used for data representation in text mining applications. However, the VSM poses a critical challenge, as it requires a high-dimensional feature space. Therefore, many feature selection techniques, such as employing roots or stems (i.e. words without infixes and prefixes, and/or suffixes) instead of using complete word forms, are proposed to tackle this space challenge. Recently, the literature shows that one more basic unit feature can be used to handle the textual features, which is the two-neighboring-character form that we call microword. To evaluate this feature type, …
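
As a rough illustration of the feature type this abstract describes, the sketch below extracts two-character units from words and counts them into vector space model rows. It is not the author's implementation: the function names, the choice of overlapping bigrams, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter

def char_bigrams(word):
    """Return the overlapping two-character substrings of a word (assumed definition)."""
    return [word[i:i + 2] for i in range(len(word) - 1)]

def bigram_vector(text, vocabulary):
    """Count each vocabulary bigram over all whitespace-separated words in text."""
    counts = Counter(bg for word in text.split() for bg in char_bigrams(word))
    return [counts[bg] for bg in vocabulary]

# Hypothetical two-document corpus, used only to show the shape of the output.
docs = ["data mining", "text mining"]
vocabulary = sorted({bg for d in docs for w in d.split() for bg in char_bigrams(w)})
vectors = [bigram_vector(d, vocabulary) for d in docs]
```

Each row of `vectors` has one entry per bigram in `vocabulary`, so the feature space is bounded by the number of distinct character pairs rather than the number of distinct word forms.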


A New Model To Determine The Hierarchical Structure Of The Wireless Sensor Networks, Resmiye Nasiboğlu, Zülküf Tekin Erten Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Wireless sensor networks are one of the rising areas of scientific research. A common purpose of these investigations is to construct an optimal network structure that prolongs the network's lifetime. In this study, a new model has been proposed to construct a hierarchical structure of wireless sensor networks. Methods used in the model to determine clusters and appropriate cluster heads are k-means clustering and fuzzy inference system (FIS), respectively. The weighted averaging based on levels (WABL) defuzzification method is used to calculate crisp outputs of the FIS. A new theorem for calculation of WABL values has been proved in order to …


Evaluating The Attributes Of Remote Sensing Image Pixels For Fast K-Means Clustering, Ali Sağlam, Nurdan Baykan Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The clustering process is an important stage for many data mining applications. In this process, data elements are grouped according to their similarities. One of the best-known clustering algorithms is the k-means algorithm. The algorithm initially requires the number of clusters as a parameter and runs iteratively. Like many image processing applications, remote sensing image processing applications usually need a clustering stage. With the development of multispectral sensor and laser technologies, remote sensing images provide more information about the environment. In the dataset used in this paper, the infrared (IR) and the digital surface maps (DSM) are also …
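
The abstract notes that k-means takes the number of clusters as a parameter and runs iteratively. A minimal sketch of that standard loop (Lloyd's iteration) is given below; it is not the fast variant the paper evaluates, and the function names, the random seeding, and the toy points are assumptions for illustration only.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points]

def kmeans(points, k, max_iterations=50, seed=0):
    """Plain k-means: alternate assignment and mean update until centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(max_iterations):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

For image pixels, each tuple would hold the per-pixel attributes (for example color bands plus IR or DSM values), which is where the choice of attributes discussed in the abstract enters.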


Efficient Hierarchical Temporal Segmentation Method For Facial Expression Sequences, Jiali Bian, Xue Mei, Yu Xue, Liang Wu, Yao Ding Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Temporal segmentation of facial expression sequences is important to understand and analyze human facial expressions. It is, however, challenging to deal with the complexity of facial muscle movements by finding a suitable metric to distinguish among different expressions and to deal with the uncontrolled environmental factors in the real world. This paper presents a two-step unsupervised segmentation method composed of rough segmentation and fine segmentation stages to compute the optimal segmentation positions in video sequences to facilitate the segmentation of different facial expressions. The proposed method performs localization of facial expression patches to aid in recognition and extraction of specific …


Composite Vector Quantization For Optimizing Antenna Locations, Zekeriya Uykan, Riku Jantti Jan 2018

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, we study the location optimization problem of remote antenna units (RAUs) in generalized distributed antenna systems (GDASs). We propose a composite vector quantization (CVQ) algorithm that consists of unsupervised and supervised terms for RAU location optimization. We show that the CVQ can be used i) to minimize an upper bound to the cell-averaged SNR error for a desired/demanded location-specific SNR function, and ii) to maximize the cell-averaged effective SNR. The CVQ-DAS includes the standard VQ, and thus the well-known squared distance criterion (SDC) as a special case. Computer simulations confirm the findings and suggest that the proposed …


Interactive Clinical Event Pattern Mining And Visualization Using Insurance Claims Data, Zhenhui Piao Jan 2018

Theses and Dissertations--Computer Science

With exponential growth on a daily basis, there is potentially valuable information hidden in complex electronic medical records (EMR) systems. In this thesis, several efficient data mining algorithms were explored to discover hidden knowledge in insurance claims data. The first aim was to cluster three levels of information overload (IO) groups among chronic rheumatic disease (CRD) patient groups based on their clinical events extracted from insurance claims data. The second aim was to discover hidden patterns using three renowned pattern mining algorithms: Apriori, frequent pattern growth (FP-Growth), and sequential pattern discovery using equivalence classes (SPADE). The SPADE algorithm was found to be the …


Unsupervised Learning Of Allomorphs In Turkish, Burcu Can Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

One morpheme may have several surface forms that correspond to allomorphs. In English, ed and d are surface forms of the past tense morpheme, and s, es, and ies are surface forms of the plural or present tense morpheme. Turkish has a large number of allomorphs due to its morphophonemic processes. One morpheme can have tens of different surface forms in Turkish. This leads to a sparsity problem in natural language processing tasks in Turkish. Detection of allomorphs has not been studied much because of its difficulty. For example, tü and di are Turkish allomorphs (i.e. past tense morpheme), but …


An Adaptive Clustering Segmentation Algorithm Based On FCM, Jun Yang, Yun-Sheng Ke, Mao-Zheng Wang Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The cluster number and the initial clustering centers must be reasonably set before the analysis of clustering in most cases. Traditional clustering segmentation algorithms have many shortcomings, such as high reliance on the specially established initial clustering center, tendency to fall into the local maximum point, and poor performance with multithreshold values. To overcome these defects, an adaptive fuzzy C-means segmentation algorithm based on a histogram (AFCMH), which synthesizes both main peaks of the histogram and optimized Otsu criterion, is proposed. First, the main peaks of the histogram are chosen by operations like histogram smoothing, merging of adjacent peaks, and …


An Intelligent PSO-Based Energy Efficient Load Balancing Multipath Technique In Wireless Sensor Networks, Sukhchandan Randhawa, Sushma Jain Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

To provide a reliable and efficient service, load balancing plays an important role in wireless sensor networks (WSNs). There is a need to maximize the network lifetime for WSNs applications with periodic generation of data. Due to the relationship between energy consumption and network sensor node lifetime, energy consumption in a network should be minimized and balanced in order to increase network lifetime. Energy-efficient load-balancing techniques are needed to solve this problem. In this paper, a particle swarm optimization (PSO)-based energy-efficient load-balancing technique is proposed, in which the required number of routing paths and energy consumption of different nodes and …


Proposing A New Clustering Method To Detect Phishing Websites, Morteza Arab, Mohammad Karim Sohrabi Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Phishing websites are fake sites developed by ill-intentioned people to imitate real and legal websites. Most of these web pages closely resemble the legitimate ones visually in order to deceive victims. The victims of phishing websites may give their bank accounts, passwords, credit card numbers, and other important information to the designers and owners of phishing websites. The increasing number of phishing websites has become a great challenge in e-business in general and in electronic banking specifically. In the present study, a novel framework based on model-based clustering is introduced to fight against phishing websites. First, a model is …


A Clustering Approach Using A Combination Of Gravitational Search Algorithm And K-Harmonic Means And Its Application In Text Document Clustering, Mina Mirhosseini Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

Data clustering is one of the most popular techniques of information management, which is used in many applications of science and engineering such as machine learning, pattern recognition, image processing, data mining, and web mining. Different algorithms have been suggested by researchers, where the evolutionary algorithms are the best in data clustering and especially in big datasets. It is illustrated that GSA-KM, which is a combination of the gravitational search algorithm (GSA) and K-means (KM), is superior to some other comparative evolutionary methods. One of the drawbacks of this approach is dependency on the initial seeds. In this paper, a …


A Novel Approach For Extracting Ideal Exemplars By Clustering For Massive Time-Ordered Datasets, Ömer Faruk Ertuğrul Jan 2017

Turkish Journal of Electrical Engineering and Computer Sciences

The number and length of massive datasets have increased day by day and this yields more complex machine learning stages due to the high computational costs. To decrease the computational cost many methods were proposed in the literature such as data condensing, feature selection, and filtering. Although clustering methods are generally employed to divide samples into groups, another way of data condensing is by determining ideal exemplars (or prototypes), which can be used instead of the whole dataset. In this study, first the efficiency of traditional data condensing by clustering approach was confirmed according to obtained accuracies and condensing ratios …


Neuron Clustering For Mitigating Catastrophic Forgetting In Supervised And Reinforcement Learning, Benjamin Frederick Goodrich Dec 2015

Doctoral Dissertations

Neural networks have had many great successes in recent years, particularly with the advent of deep learning and many novel training techniques. One issue that has affected neural networks and prevented them from performing well in more realistic online environments is that of catastrophic forgetting. Catastrophic forgetting affects supervised learning systems when input samples are temporally correlated or are non-stationary. However, most real-world problems are non-stationary in nature, resulting in prolonged periods of time separating inputs drawn from different regions of the input space.

Reinforcement learning represents a worst-case scenario when it comes to precipitating catastrophic forgetting in neural networks. …


Comparison Of Clustering Techniques For Traffic Accident Detection, Nejdet Doğru, Abdülhamit Subaşi Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

Transportation infrastructure in intelligent transportation systems (ITSs) is complemented with information and communication technologies to achieve better passenger safety and reduced transportation time, fuel consumption, and vehicle wear and tear. This paper shows how data mining techniques are used in ITSs for accident detection and prevention on motorways. In traffic, vehicles show similar behavior to that of vehicles in closed neighborhoods. Vehicles that show different behaviors than neighbor vehicles in cases like accidents, inappropriate lane changes, and speeding can be considered as anomalies and detected. In this paper, a traffic accident is simulated and the effectiveness of different clustering techniques …


Hot Zone Identification: Analyzing Effects Of Data Sampling On Spam Clustering, Rasib Khan, Mainul Mizan, Ragib Hasan, Alan Sprague Jan 2014

Journal of Digital Forensics, Security and Law

Email is the most common and comparatively the most efficient means of exchanging information in today's world. However, given the widespread use of emails in all sectors, they have been the target of spammers since the beginning. Filtering spam emails has now led to critical actions such as forensic activities based on mining spam email. The data mine for spam emails at the University of Alabama at Birmingham is considered to be one of the most prominent resources for mining and identifying spam sources. It is a widely researched repository used by researchers from different global organizations. The usual process …


M-FDBSCAN: A Multicore Density-Based Uncertain Data Clustering Algorithm, Atakan Erdem, Taflan İmre Gündem Jan 2014

Turkish Journal of Electrical Engineering and Computer Sciences

In many data mining applications, we use a clustering algorithm on a large amount of uncertain data. In this paper, we adapt an uncertain data clustering algorithm called fast density-based spatial clustering of applications with noise (FDBSCAN) to multicore systems in order to have fast processing. The new algorithm, which we call multicore FDBSCAN (M-FDBSCAN), splits the data domain into c rectangular regions, where c is the number of cores in the system. The FDBSCAN algorithm is then applied to each rectangular region simultaneously. After the clustering operation is completed, semiclusters that occur during splitting are detected and merged to …
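
The domain-splitting step described in this abstract can be pictured with a small sketch. The code below only partitions 2-D points into c rectangular strips; running the clustering on each strip in parallel and merging the semiclusters at the borders are not shown, since the excerpt does not give those details. The equal-width vertical strips and the function name are assumptions, not the authors' partitioning rule.

```python
def split_into_regions(points, c):
    """Partition 2-D points into c equal-width vertical strips over their x-range."""
    xs = [x for x, _ in points]
    x_min, x_max = min(xs), max(xs)
    width = (x_max - x_min) / c or 1.0  # fall back to 1.0 when all x values are equal
    regions = [[] for _ in range(c)]
    for p in points:
        # Clamp so that points on the right edge land in the last strip.
        index = min(int((p[0] - x_min) / width), c - 1)
        regions[index].append(p)
    return regions

# Hypothetical points and core count, used only to show the partition.
points = [(0, 0), (1, 5), (2, 1), (3, 7), (4, 2)]
regions = split_into_regions(points, c=2)
```

Each entry of `regions` could then be handed to a separate worker, which is the property that makes the split useful on a multicore system.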


Motion Clustering On Video Sequences Using A Competitive Learning Network, Salih Görgünoğlu, Şafak Altay Jan 2014

Turkish Journal of Electrical Engineering and Computer Sciences

It is necessary to track human movements in crowded places and environments such as stations, subways, metros, and schoolyards, where security is of great importance. As a result, undesired injuries, accidents, and unusual movements can be determined and various precautionary measures can be taken against them. In this study, real-time or existing video sequences are used within the system. These video sequences are obtained from objects such as humans or vehicles, moving actively in various environments. First, several preprocessing steps are applied in turn, such as converting to grayscale, finding the edges of the objects existing in the images, and thresholding …


EETBR: Energy Efficient Token-Based Routing For Wireless Sensor Networks, Taner Çevik, Abdül Halim Zaim Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

The most significant drawback of wireless sensor networks is energy scarcity. As there is an increasing need for operating these networks for relatively long times, energy saving becomes the key challenge in the design of the architectures and protocols for sensor networks. Therefore, several research studies have been performed for making contributions to the analysis of this energy shortage problem. Most of these research activities have been focused on finding solutions for the energy consumption of the communication unit, which is the dominant energy dissipating component of the sensor nodes. In this paper, a novel, token-based routing protocol adapted with …


A Reputation-Based Privacy Management System For Social Networking Sites, Mehmet Erkan Yüksel, Asim Sinan Yüksel, Abdül Halim Zaim Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

Social networking sites form a special type of virtual community where we share our personal information with people and develop new relationships on the Internet. These sites allow the users to share just about everything, including photos, videos, favorite music, and games, and record all user interactions and retain them for potential use in social data mining. This storing and sharing of large amounts of information causes privacy problems for the users of these websites. In order to prevent these problems, we have to provide strict privacy policies, data protection mechanisms, and trusted and built-in applications that help to protect …


Outlier Rejection Fuzzy C-Means (ORFCM) Algorithm For Image Segmentation, Fasahat Ullah Siddiqui, Nor Ashidi Mat Isa, Abid Yahya Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

This paper presents a fuzzy clustering-based technique for image segmentation. Many attempts have been put into practice to increase the conventional fuzzy c-means (FCM) performance. In this paper, the sensitivity of the soft membership function of the FCM algorithm to the outlier is considered and the new exponent operator on the Euclidean distance is implemented in the membership function to improve the outlier rejection characteristics of the FCM. The comparative quantitative and qualitative studies are performed among the conventional k-means (KM), moving KM, and FCM algorithms; the latest state-of-the-art clustering algorithms, namely the adaptive fuzzy moving KM, adaptive fuzzy …
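
To make the outlier sensitivity mentioned in this abstract concrete, the sketch below computes the conventional FCM soft membership, u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)), for one point. This is the textbook FCM formula only; the paper's modified exponent operator is not specified in the excerpt and is not reproduced here. The function name, the fuzzifier default m = 2, and the sample coordinates are assumptions.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Conventional fuzzy c-means memberships of one point with respect to each center."""
    distances = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs entirely to them.
    if 0.0 in distances:
        zero_count = distances.count(0.0)
        return [1.0 / zero_count if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
            for d_j in distances]

centers = [(1.0, 0.0), (3.0, 0.0)]             # hypothetical cluster centers
near = fcm_memberships((0.0, 0.0), centers)    # a point close to the first center
far = fcm_memberships((100.0, 0.0), centers)   # a distant outlier
```

With these values, the nearby point is assigned mostly to the first center, while the outlier receives nearly equal membership in both clusters because memberships are forced to sum to one. That behavior is the kind of sensitivity an outlier-rejection variant aims to reduce.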


Prevention And Detection Of Intrusions In Wireless Sensor Networks, Ismail Butun Jan 2013

USF Tampa Graduate Theses and Dissertations

Wireless Sensor Networks (WSNs) continue to grow as one of the most exciting and challenging research areas of engineering. They are characterized by severely constrained computational and energy resources and also restricted by the ad-hoc network operational environment. They pose unique challenges, due to limited power supplies, low transmission bandwidth, small memory sizes and limited energy. Therefore, security techniques used in traditional networks cannot be directly adopted. So, new ideas and approaches are needed, in order to increase the overall security of the network. Security applications in such resource constrained WSNs with minimum overhead provides significant challenges, and is the …


Reeling In Big Phish With A Deep MD5 Net, Brad Wardman, Gary Warner, Heather McCalley, Sarah Turner, Anthony Skjellum Jan 2010

Journal of Digital Forensics, Security and Law

Phishing continues to grow as phishers discover new exploits and attack vectors for hosting malicious content; the traditional response using takedowns and blacklists does not appear to impede phishers significantly. A handful of law enforcement projects — for example the FBI's Digital PhishNet and the Internet Crime and Complaint Center (ic3.gov) — have demonstrated that they can collect phishing data in substantial volumes, but these collections have not yet resulted in a significant decline in criminal phishing activity. In this paper, a new system is demonstrated for prioritizing investigative resources to help reduce the time and effort expended examining this …


Clustering Spam Domains And Destination Websites: Digital Forensics With Data Mining, Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum Jan 2010

Journal of Digital Forensics, Security and Law

Spam-related cybercrimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe the identification and disruption of the supporting infrastructure used by spammers is a more effective way of stopping spam than filtering. The termination of spam hosts will greatly reduce the profit a spammer can generate and thwart his ability to send more spam. This research proposes an algorithm for clustering spam domains extracted from spam emails based on the hosting IP addresses and tracing the IP addresses over a period of time. The results show that many …
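
One simple way to picture clustering domains by hosting IP address is to put any two domains that share an IP into the same group. The sketch below does this with a small union-find structure. It is an assumed illustration, not the algorithm proposed in the paper: the excerpt does not describe the actual procedure or how IP addresses are traced over time, and the domain names and addresses are reserved documentation values.

```python
from collections import defaultdict

def cluster_domains_by_ip(domain_ips):
    """Group domains so that domains sharing any hosting IP end up in one cluster."""
    parent = {domain: domain for domain in domain_ips}

    def find(domain):
        # Follow parent links to the cluster representative, halving the path on the way.
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]
            domain = parent[domain]
        return domain

    first_domain_for_ip = {}
    for domain, ips in domain_ips.items():
        for ip in ips:
            if ip in first_domain_for_ip:
                # Merge this domain's cluster with the cluster already seen on this IP.
                parent[find(domain)] = find(first_domain_for_ip[ip])
            else:
                first_domain_for_ip[ip] = domain

    clusters = defaultdict(set)
    for domain in domain_ips:
        clusters[find(domain)].add(domain)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical observations: domain -> set of IPs it resolved to.
observations = {
    "a.example": {"192.0.2.1"},
    "b.example": {"192.0.2.1", "192.0.2.2"},
    "c.example": {"192.0.2.2"},
    "d.example": {"198.51.100.7"},
}
clusters = cluster_domains_by_ip(observations)
```

Here `a.example` and `c.example` never share an address directly, yet they fall into one cluster through `b.example`. Collecting the IP sets over several days before grouping would be one way to reflect the tracing over time that the abstract mentions.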