Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

A Pseudo Nearest-Neighbor Approach For Missing Data Recovery On Gaussian Random Data Sets, Xiaolu Huang, Qiuming Zhu Nov 2002

Computer Science Faculty Publications

Missing data handling is an important preparation step for most data discrimination or mining tasks. Inappropriate treatment of missing data may cause large errors or false results. In this paper, we study the effect of a missing data recovery method, namely the pseudo-nearest neighbor substitution approach, on Gaussian distributed data sets that represent typical cases in data discrimination and data mining applications. The error rate of the proposed recovery method is evaluated by comparing the clustering results of the recovered data sets to the clustering results obtained on the originally complete data sets. The results are also compared with …
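The abstract does not spell out how the pseudo-nearest neighbor variant works, so the following is only a minimal sketch of plain nearest-neighbor substitution, the general idea it builds on. The function name `nn_substitute`, the use of `None` for missing entries, and the Euclidean distance over observed features are all assumptions made for illustration, not the authors' method.

```python
import math

def nn_substitute(rows):
    """Fill None entries in each row from the nearest fully observed row."""
    # Donor candidates are the rows with no missing entries.
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to act as a donor")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # Distance uses only the features present in the incomplete row.
        donor = min(
            complete,
            key=lambda c: math.sqrt(sum((row[i] - c[i]) ** 2 for i in observed)),
        )
        # Copy the donor's value wherever the incomplete row has a gap.
        filled.append([donor[i] if v is None else v for i, v in enumerate(row)])
    return filled

# Hypothetical toy data: the third row is missing its second feature.
data = [
    [1.0, 2.0, 3.0],
    [8.0, 9.0, 10.0],
    [1.1, None, 2.9],
]
recovered = nn_substitute(data)
```

In this toy case the incomplete row lies closest to the first row, so its gap is filled with that row's second feature.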


An Iterative Initial-Points Refinement Algorithm For Categorical Data Clustering, Ying Sun, Qiuming Zhu, Zhengxin Chen May 2002

Computer Science Faculty Publications

The original k-means clustering algorithm is designed to work primarily on numeric data sets. This prohibits the algorithm from being directly applied to categorical data clustering in many data mining applications. The k-modes algorithm [Z. Huang, Clustering large data sets with mixed numeric and categorical value, in: Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference. World Scientific, Singapore, 1997, pp. 21–34] extended the k-means paradigm to cluster categorical data by using a frequency-based method to update the cluster modes versus the k-means fashion of minimizing a numerically valued cost. However, as is …
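As a rough illustration of the frequency-based mode update mentioned above, the sketch below computes a cluster mode attribute by attribute and measures dissimilarity by counting mismatched attributes (simple matching). The function names and the toy records are hypothetical, and the iterative initial-points refinement proposed in the paper is not shown.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def update_mode(cluster):
    """Return the per-attribute most frequent category of a cluster."""
    # zip(*cluster) walks the records column by column.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

# Hypothetical cluster of three categorical records.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = update_mode(cluster)
distance = matching_dissimilarity(("blue", "large", "round"), mode)
```

Where k-means averages numeric coordinates, this update takes a majority vote per attribute, which is what lets the same assign-then-update loop run on categorical data.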


Group Properties Of Crossover And Mutation, Jonathan E. Rowe, Michael D. Vose, Alden H. Wright Jan 2002

Computer Science Faculty Publications

It is supposed that the finite search space Ω has certain symmetries that can be described in terms of a group of permutations acting upon it. If crossover and mutation respect these symmetries, then these operators can be described in terms of a mixing matrix and a group of permutation matrices. Conditions under which certain subsets of Ω are invariant under crossover are investigated, leading to a generalization of the term schema. Finally, it is sometimes possible for the group acting on Ω to induce a group structure on Ω itself.


Federated Searching Interface Techniques For Heterogeneous OAI Repositories, Xiaoming Liu, Kurt Maly, Mohammad Zubair, Qiaoling Hong, Michael L. Nelson, Frances Knudson, Irma Holtkamp Jan 2002

Computer Science Faculty Publications

Federating repositories by harvesting heterogeneous collections with varying degrees of metadata richness poses a number of challenging issues: (1) how to address the lack of uniform control for various metadata fields when building a rich unified search interface, and (2) how to incorporate new collections, and freshly harvested data in existing repositories, into the federation while still supporting a unified interface. This paper focuses on the approaches taken to address these issues in Arc, an Open Archives Initiative compliant federated digital library. At present, Arc contains over 1M metadata records from 75 data providers from various subject domains. …


A Scalable Architecture For Harvest-Based Digital Libraries, Xiaoming Liu, Tim Brody, Stevan Harnad, Les Carr, Kurt Maly, Mohammad Zubair, Michael L. Nelson Jan 2002

Computer Science Faculty Publications

This article discusses the requirements of current and emerging applications based on the Open Archives Initiative (OAI) and emphasizes the need for a common infrastructure to support them. Inspired by HTTP proxy, cache, gateway and web service concepts, a design for a scalable and reliable infrastructure that aims at satisfying these requirements is presented. Moreover, it is shown how various applications can exploit the services included in the proposed infrastructure. The article concludes by discussing the current status of several prototype implementations.


Object Persistence And Availability In Digital Libraries, Michael L. Nelson, B. Danette Allen Jan 2002

Computer Science Faculty Publications

We have studied object persistence and availability of 1,000 digital library (DL) objects. Twenty World Wide Web accessible DLs were chosen and from each DL, 50 objects were chosen at random. A script checked the availability of each object three times a week for just over 1 year for a total of 161 data samples. During this time span, we found 31 objects (3% of the total) that appear to no longer be available: 24 from PubMed Central, 5 from IDEAS, 1 from CogPrints, and 1 from ETD.
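The figures in this abstract can be cross-checked with a few lines of arithmetic. The variable names below are illustrative only; all numbers come from the abstract itself.

```python
# Sample size: 20 digital libraries, 50 objects drawn from each.
libraries = 20
objects_per_library = 50
total_objects = libraries * objects_per_library

# Objects that appear to be no longer available, by source.
lost_by_source = {"PubMed Central": 24, "IDEAS": 5, "CogPrints": 1, "ETD": 1}
lost = sum(lost_by_source.values())
lost_percent = 100 * lost / total_objects

# 161 samples at three checks per week give the study duration in weeks.
samples = 161
checks_per_week = 3
weeks = samples / checks_per_week
```

The per-source counts add up to the reported 31 objects, which is 3.1% of the 1,000 sampled. At three checks per week, 161 samples span between 53 and 54 weeks, consistent with "just over 1 year".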


The Single Row Routing Problem Revisited: A Solution Based On Genetic Algorithms, Albert Y. Zomaya, Roger Karpin, Stephan Olariu Jan 2002

Computer Science Faculty Publications

With the advent of VLSI technology, circuits with more than one million transistors have been integrated onto a single chip. As the complexity of ICs grows, the time and money spent on designing the circuits become more important. A large, often dominant, part of the cost and time required to design an IC is consumed in the routing operation. The routing of carriers, such as in IC chips and printed circuit boards, is a classical problem in Computer Aided Design. With the complexity inherent in VLSI circuits, high performance routers are necessary. In this paper, a crucial step in the …


Fast Inner Product Computation On Short Buses, R. Lin, S. Olariu Jan 2002

Computer Science Faculty Publications

We propose a VLSI inner product processor architecture involving broadcasting only over short buses (containing fewer than 64 switches). The architecture leads to an efficient algorithm for the inner product computation. Specifically, it takes 13 broadcasts, each over fewer than 64 switches, plus 2 carry-save additions (t_csa) and 2 carry-lookahead additions (t_cla) to compute the inner product of two arrays of N = 2^9 elements, each consisting of m = 64 bits. Using the same order of VLSI area, our algorithm runs faster than the best known fast inner product algorithm of Smith and Torng …
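To show what a carry-save addition contributes, here is a small software sketch. It is not the bus-based architecture from the paper: it only models the arithmetic identity behind a carry-save adder, which reduces three operands to a sum word and a carry word without propagating carries, so that one ordinary addition at the end suffices. The function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative integers to (s, c) with x + y + z == s + c."""
    s = x ^ y ^ z                            # bitwise sum, carries ignored
    c = ((x & y) | (x & z) | (y & z)) << 1   # majority bits, shifted to carry position
    return s, c

def inner_product_csa(a, b):
    """Accumulate the products a[i] * b[i] in carry-save form."""
    s, c = 0, 0
    for x, y in zip(a, b):
        s, c = carry_save_add(s, c, x * y)
    # A single carry-propagating addition resolves the final result.
    return s + c

result = inner_product_csa([1, 2, 3], [4, 5, 6])
```

In hardware the last step is where a carry-lookahead adder would typically be used, since it is the only addition in which carries must ripple across the full word.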