Physical Sciences and Mathematics | Open Access Articles

A Clustering Comparison Measure Using Density Profiles And Its Application To The Discovery Of Alternate Clusterings, Eric Bae, James Bailey, Guozhu Dong Nov 2010

Kno.e.sis Publications

Data clustering is a fundamental and very popular method of data analysis. Its subjective nature, however, means that different clustering algorithms or different parameter settings can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes generate unintuitive results. They also cannot be applied to compare clusterings which contain different data points, an activity which is important for scenarios such as data stream analysis. In this paper, we introduce a …

A Virtual Infrastructure For Mitigating Typical Challenges In Sensor Networks, Hady S. Abdel Salam Oct 2010

Computer Science Theses & Dissertations

Sensor networks have their own distinguishing characteristics that set them apart from other types of networks. Typically, the sensors are deployed in large numbers and in random fashion and the resulting sensor network is expected to self-organize in support of the mission for which it was deployed. Because of the random deployment of sensors that are often scattered from an overflying aircraft, the resulting network is not easy to manage since the sensors do not know their location, do not know how to aggregate their sensory data and where and how to route the aggregated data. The limited energy budget …

An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Sep 2010

Research Collection School Of Computing and Information Systems

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …

Using Clustering For Modeling Monthly Salary Grade, R. W. Hndoosh Jul 2010

R. W. Hndoosh

Clustering is considered one of the major scientific developments in recent knowledge and technology for discovering groups (clusters) in data. The clustering concept was first introduced by Ronald in 1955. The fundamental notion of clustering is the division of data into clusters. This research aims to use clustering to model actual data on the monthly salary grade of the teaching staff of one of Mosul University's colleges in 2009, by applying the HCM algorithm to these data. MATLAB is used to implement the proposed algorithm. Results proved the efficiency …

Clustering Weblogs On The Basis Of A Topic Detection Method, Fernando Perez-Tellez, David Pinto, John Cardiff, Paolo Rosso Jan 2010

Conference Papers

In recent years we have seen a vast increase in the volume of information published on weblog sites and also the creation of new web technologies where people discuss actual events. The need for automatic tools to organize this massive amount of information is clear, but the particular characteristics of weblogs such as shortness and overlapping vocabulary make this task difficult. In this work, we present a novel methodology to cluster weblog posts according to the topics discussed therein. This methodology is based on a generative probabilistic model in conjunction with a Self-Term Expansion methodology. We present our results which …

Clustering Spam Domains And Destination Websites: Digital Forensics With Data Mining, Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum Jan 2010

Journal of Digital Forensics, Security and Law

Spam related cyber crimes have become a serious threat to society. Current spam research mainly aims to detect spam more effectively. We believe the identification and disruption of the supporting infrastructure used by spammers is a more effective way of stopping spam than filtering. The termination of spam hosts will greatly reduce the profit a spammer can generate and thwart his ability to send more spam. This research proposes an algorithm for clustering spam domains extracted from spam emails based on the hosting IP addresses and tracing the IP addresses over a period of time. The results show that many …

Localized Feature Selection For Unsupervised Learning, Yuanhong Li Jan 2010

Wayne State University Dissertations

Clustering is the unsupervised classification of data objects into different groups (clusters) such that objects in one group are similar to one another and dissimilar from objects in other groups. Feature selection for unsupervised learning is a technique that chooses the best feature subset for clustering. In general, unsupervised feature selection algorithms conduct feature selection in a global sense by producing a common feature subset for all the clusters. This, however, can be invalid in clustering practice, where the local intrinsic property of data matters more, which implies that localized feature selection is more desirable.

In this dissertation, we focus on cluster-wise feature selection …

A Contrast Pattern Based Clustering Algorithm For Categorical Data, Neil Koberlein Fore Jan 2010

Browse all Theses and Dissertations

The data clustering problem has received much attention in the data mining, machine learning, and pattern recognition communities over a long period of time. Many previous approaches to solving this problem require the use of a distance function. However, since clustering is highly explorative and is usually performed on data which are rather new, it is debatable whether users can provide good distance functions for the data. This thesis proposes a Contrast Pattern based Clustering (CPC) algorithm to construct clusters without a distance function, by focusing on the quality and diversity/richness of contrast patterns that contrast the clusters in a …

Reeling In Big Phish With A Deep MD5 Net, Brad Wardman, Gary Warner, Heather McCalley, Sarah Turner, Anthony Skjellum Jan 2010

Journal of Digital Forensics, Security and Law

Phishing continues to grow as phishers discover new exploits and attack vectors for hosting malicious content; the traditional response using takedowns and blacklists does not appear to impede phishers significantly. A handful of law enforcement projects — for example the FBI's Digital PhishNet and the Internet Crime and Complaint Center (ic3.gov) — have demonstrated that they can collect phishing data in substantial volumes, but these collections have not yet resulted in a significant decline in criminal phishing activity. In this paper, a new system is demonstrated for prioritizing investigative resources to help reduce the time and effort expended examining this …

SVM Based Active Learning With Exploration, Patrick Lindstrom, Rong Hu, Sarah Jane Delany, Brian Mac Namee Jan 2010

Conference papers

No abstract provided.
