Open Access. Powered by Scholars. Published by Universities.®
Databases and Information Systems Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Institution
- Publication
- Publication Type
Articles 1 - 6 of 6
Full-Text Articles in Databases and Information Systems
Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao
Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao
Zhongmei Yao
Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …
Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa
Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa
Theses and Dissertations--Computer Science
Data are valuable assets to any organizations or individuals. Data are sources of useful information which is a big part of decision making. All sectors have potential to benefit from having information. Commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of the data makes it easy for anyone to exploit the data, which in many cases are private confidential data. It is necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern …
A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse
A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse
Dr. Huanjing Wang
Abstract Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …
A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse
A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse
Computer Science Faculty Publications
Abstract Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …
Temporal Data Classification Using Linear Classifiers, Peter Revesz, Thomas Triplet
Temporal Data Classification Using Linear Classifiers, Peter Revesz, Thomas Triplet
CSE Conference and Workshop Papers
Data classification is usually based on measurements recorded at the same time. This paper considers temporal data classification where the input is a temporal database that describes measurements over a period of time in history while the predicted class is expected to occur in the future. We describe a new temporal classification method that improves the accuracy of standard classification methods. The benefits of the method are tested on weather forecasting using the meteorological database from the Texas Commission on Environmental Quality.
Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao
Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao
Computer Science Faculty Publications
Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …