Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 34

Full-Text Articles in Physical Sciences and Mathematics

WEBARC: Website Archival Using A Structured Approach, Ee Peng Lim, Maria Marissa Dec 2005

Research Collection School Of Computing and Information Systems

Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to …


Accurately Extracting Coherent Relevant Passages Using Hidden Markov Models, Jing Jiang, Chengxiang Zhai Nov 2005

Research Collection School Of Computing and Information Systems

In this paper, we present a principled method for accurately extracting coherent relevant passages of variable lengths using HMMs. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two data sets.


A Threshold-Based Algorithm For Continuous Monitoring Of K Nearest Neighbors, Kyriakos Mouratidis, Dimitris Papadias, Spiridon Bakiras, Yufei Tao Nov 2005

Research Collection School Of Computing and Information Systems

Assume a set of moving objects and a central server that monitors their positions over time, while processing continuous nearest neighbor queries from geographically distributed clients. In order to always report up-to-date results, the server could constantly obtain the most recent position of all objects. However, this naïve solution requires the transmission of a large number of rapid data streams corresponding to location updates. Intuitively, current information is necessary only for objects that may influence some query result (i.e., they may be included in the nearest neighbor set of some client). Motivated by this observation, we present a threshold-based algorithm …
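The intuition in the last sentences (only objects that may enter some client's nearest-neighbor set need to report fresh positions) can be illustrated with a minimal sketch. It is not the paper's threshold-based algorithm: it assumes a single query point q, exact positions, and hypothetical helper names `knn_threshold` and `may_influence`. The radius of the circle around q that holds the current k nearest objects acts as a threshold; an object that is outside that circle both before and after a move cannot change the result.

```python
import math

Point = tuple[float, float]


def knn_threshold(objects: dict[str, Point], q: Point, k: int) -> float:
    """Distance from q to its k-th nearest object, i.e. the radius of the current k-NN circle."""
    dists = sorted(math.dist(p, q) for p in objects.values())
    return dists[k - 1]


def may_influence(old: Point, new: Point, q: Point, threshold: float) -> bool:
    """An update can change q's k-NN result only if the object was inside the
    threshold circle before moving or is inside it after moving."""
    return math.dist(old, q) <= threshold or math.dist(new, q) <= threshold


objects = {"a": (1.0, 0.0), "b": (0.0, 2.0), "c": (5.0, 5.0)}
t = knn_threshold(objects, (0.0, 0.0), k=2)  # 2.0
print(may_influence((5.0, 5.0), (6.0, 6.0), (0.0, 0.0), t))  # False: stays outside
print(may_influence((5.0, 5.0), (0.5, 0.0), (0.0, 0.0), t))  # True: enters the circle
```

Under these assumptions, the server could safely ignore the first update, which is the kind of transmission saving the abstract motivates.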


Query Processing In Spatial Databases Containing Obstacles, Jun Zhang, Dimitris Papadias, Kyriakos Mouratidis, Manli Zhu Nov 2005

Research Collection School Of Computing and Information Systems

Despite the existence of obstacles in many database applications, traditional spatial query processing assumes that points in space are directly reachable and utilizes the Euclidean distance metric. In this paper, we study spatial queries in the presence of obstacles, where the obstructed distance between two points is defined as the length of the shortest path that connects them without crossing any obstacles. We propose efficient algorithms for the most important query types, namely, range search, nearest neighbours, ε-distance joins, closest pairs and distance semi-joins, assuming that both data objects and obstacles are indexed by R-trees. The effectiveness of the proposed …


DSIM: A Distance-Based Indexing Method For Genomic Sequences, Xia Cao, Beng-Chin Ooi, Hwee Hwa Pang, Kian-Lee Tan, Anthony K. H. Tung Oct 2005

Research Collection School Of Computing and Information Systems

In this paper, we propose a Distance-based Sequence Indexing Method (DSIM) for indexing and searching genome databases. Borrowing the idea of video compression, we compress the genomic sequence database around a set of automatically selected reference words, formed from high-frequency data substrings and substrings in past queries. The compression captures the distance of each non-reference word in the database to some reference word. At runtime, a query is processed by comparing its substrings with the compressed data strings, through their distances to the reference words. We also propose an efficient scheme to incrementally update the reference words and the compressed …
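The idea of storing each word by its distance to a reference word, and answering queries through those distances, can be sketched as below. This is a simplified illustration, not the DSIM index: the abstract does not name the distance function, so Hamming distance over equal-length words is assumed here, the reference words are simply passed in rather than selected from frequent substrings and past queries, and all names are hypothetical. Because Hamming distance is a metric, the triangle inequality gives a lower bound that lets many words be skipped without comparing them to the query.

```python
def hamming(a: str, b: str) -> int:
    """Mismatched positions between two equal-length words (assumed distance function)."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def compress(words: list[str], refs: list[str]) -> list[tuple[str, str, int]]:
    """Encode each word by its nearest reference word and the distance to it."""
    encoded = []
    for w in words:
        ref = min(refs, key=lambda r: hamming(w, r))
        encoded.append((w, ref, hamming(w, ref)))
    return encoded


def search(query: str, encoded: list[tuple[str, str, int]], max_dist: int) -> list[str]:
    """Return stored words within max_dist of the query.

    Filter: by the triangle inequality, hamming(q, w) >= |hamming(q, r) - hamming(w, r)|,
    so a word is skipped without comparing it to q when that lower bound exceeds max_dist.
    """
    d_qr = {ref: hamming(query, ref) for _, ref, _ in encoded}  # one distance per reference word
    hits = []
    for w, ref, d_wr in encoded:
        if abs(d_qr[ref] - d_wr) > max_dist:
            continue  # pruned by the lower bound
        if hamming(query, w) <= max_dist:  # exact verification of survivors
            hits.append(w)
    return hits


enc = compress(["AAAT", "CCCA", "AATT", "GGGG"], refs=["AAAA", "CCCC"])
print(search("AAAA", enc, max_dist=1))  # ['AAAT']
```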


On Organizing And Accessing Geospatial And Georeferenced Web Resources Using The G-Portal System, Zehua Liu, Ee Peng Lim, Yin-Leng Theng, Dion Hoe-Lian Goh, Wee-Keong Ng Sep 2005

Research Collection School Of Computing and Information Systems

In order to organise and manage geospatial and georeferenced information on the Web, making it convenient for searching and browsing, a digital portal known as G-Portal has been designed and implemented. Compared to other digital libraries, G-Portal is unique in several of its features. It maintains metadata resources in XML with flexible resource schemas. Metadata resources can be logically grouped into projects and layers, allowing the entire metadata collection to be partitioned differently for users with different information needs. These metadata resources can be displayed in both the classification-based and map-based interfaces provided by G-Portal. G-Portal further incorporates …


Managing Geography Learning Objects Using Personalized Project Spaces In G-Portal, Dion Hoe-Lian Goh, Aixin Sun, Wenbo Zong, Dan Wu, Ee Peng Lim, Yin-Leng Theng, John Hedberg, Chew-Hung Chang Sep 2005

Research Collection School Of Computing and Information Systems

The personalized project space is an important feature in G-Portal that supports individual and group learning activities. Within such a space, its owner can create, delete, and organize metadata referencing learning objects on the Web. Browsing and querying are among the functions provided to access the metadata. In addition, new schemas can be added to accommodate metadata of diverse attribute sets. Users can also easily share metadata across different projects using a “copy-and-paste” approach. Finally, a viewer to support offline viewing of personalized project content is also provided.


Medoid Queries In Large Spatial Databases, Kyriakos Mouratidis, Dimitris Papadias, Spiros Papadimitriou Aug 2005

Research Collection School Of Computing and Information Systems

Assume that a franchise plans to open k branches in a city, so that the average distance from each residential block to the closest branch is minimized. This is an instance of the k-medoids problem, where residential blocks constitute the input dataset and the k branch locations correspond to the medoids. Since the problem is NP-hard, research has focused on approximate solutions. Despite an avalanche of methods for small and moderate size datasets, currently there exists no technique applicable to very large databases. In this paper, we provide efficient algorithms that utilize an existing data-partition index to achieve low CPU …
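The objective in the franchise example can be written down directly: choose k of the input points as medoids so that the average distance from every point to its closest medoid is minimal. The sketch below evaluates that objective and finds the exact optimum by trying every k-subset. It is only meant to make the problem concrete; the number of subsets is C(n, k), so this brute force is impractical beyond tiny inputs and is not the index-based method the paper proposes. Function names are hypothetical.

```python
import itertools
import math

Point = tuple[float, float]


def avg_distance(points: list[Point], medoids: tuple[Point, ...]) -> float:
    """Average distance from each point to its closest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points) / len(points)


def exact_k_medoids(points: list[Point], k: int) -> tuple[tuple[Point, ...], float]:
    """Try every k-subset of the data points as medoids (C(n, k) candidates,
    so only usable for tiny inputs) and return the best subset and its cost."""
    best = min(itertools.combinations(points, k), key=lambda ms: avg_distance(points, ms))
    return best, avg_distance(points, best)


blocks = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(exact_k_medoids(blocks, k=2))  # (((0.0, 0.0), (10.0, 0.0)), 0.5)
```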


Translation Initiation Sites Prediction With Mixture Gaussian Models In Human cDNA Sequences, G. Li, Tze-Yun Leong, Louxin Zhang Aug 2005

Research Collection School Of Computing and Information Systems

Translation initiation sites (TISs) are important signals in cDNA sequences. Many research efforts have tried to predict TISs in cDNA sequences. In this paper, we propose to use mixture Gaussian models for TIS prediction. Using both local features and some features generated from global measures, the proposed method predicts TISs with a sensitivity of 98 percent and a specificity of 93.6 percent. Our method outperforms many other existing methods in sensitivity while keeping specificity high. We attribute the improvement in sensitivity to the nature of the global features and the mixture Gaussian models. © 2005 IEEE.
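For readers unfamiliar with the two reported measures: sensitivity is the fraction of true TISs that the method predicts as TISs, and specificity is the fraction of non-TIS candidates that it correctly rejects. The counts in the usage lines below are hypothetical, chosen only so the ratios match the percentages quoted in the abstract; they are not the paper's data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true TISs predicted as TISs: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-TIS candidates correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts: 1000 true TISs and 1000 non-TIS candidates.
print(sensitivity(tp=980, fn=20))  # 0.98
print(specificity(tn=936, fp=64))  # 0.936
```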


Implications Of Spatial Autocorrelation And Dispersal For The Modeling Of Species Distributions, Volker Bahn Aug 2005

Electronic Theses and Dissertations

Modeling the geographical distributions of wildlife species is important for ecology and conservation biology. Spatial autocorrelation in species distributions poses a problem for distribution modeling because it invalidates the assumption of independence among sample locations. I explored the prevalence and causes of spatial autocorrelation in data from the Breeding Bird Survey, covering the conterminous United States, using Regression Trees, Conditional Autoregressive Regressions (CAR), and the partitioning of variance. I also constructed a simulation model to investigate dispersal as a process contributing to spatial autocorrelation, and attempted to verify the connection between dispersal and spatial autocorrelation in species' distributions in empirical …


Constrained Shortest Path Computation, Manolis Terrovitis, Spiridon Bakiras, Dimitris Papadias, Kyriakos Mouratidis Aug 2005

Research Collection School Of Computing and Information Systems

This paper proposes and solves α-autonomy and k-stops shortest path problems in large spatial databases. Given a source s and a destination d, an α-autonomy query retrieves a sequence of data points connecting s and d, such that the distance between any two consecutive points in the path is not greater than α. A k-stops query retrieves a sequence that contains exactly k intermediate data points. In both cases our aim is to compute the shortest path subject to these constraints. Assuming that the dataset is indexed by a data-partitioning method, the proposed techniques initially compute a sub-optimal path by …
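The two query definitions can be made concrete with small in-memory baselines. This is a sketch under stated assumptions, not the paper's index-based technique: points fit in a Python list, the α-autonomy query is answered with Dijkstra's algorithm over the implicit graph whose edges are hops of length at most α, and the k-stops query is answered by exhaustively trying ordered selections of k distinct intermediate points (feasible only for tiny inputs). Function names are hypothetical.

```python
import heapq
import itertools
import math

Point = tuple[float, float]


def alpha_autonomy(s: Point, d: Point, points: list[Point], alpha: float) -> float:
    """Length of the shortest s->d path whose consecutive hops are all <= alpha.
    Dijkstra over an implicit graph; returns math.inf if no such path exists."""
    nodes = [s, *points, d]
    target = len(nodes) - 1
    dist = [math.inf] * len(nodes)
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        if u == target:
            return du
        for v in range(len(nodes)):
            hop = math.dist(nodes[u], nodes[v])
            if v != u and hop <= alpha and du + hop < dist[v]:
                dist[v] = du + hop
                heapq.heappush(heap, (dist[v], v))
    return math.inf


def k_stops(s: Point, d: Point, points: list[Point], k: int) -> float:
    """Shortest s->d path through exactly k distinct intermediate points (exhaustive)."""
    best = math.inf
    for stops in itertools.permutations(points, k):
        path = [s, *stops, d]
        best = min(best, sum(math.dist(a, b) for a, b in zip(path, path[1:])))
    return best


pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (2.0, 5.0)]
print(alpha_autonomy((0.0, 0.0), (4.0, 0.0), pts, alpha=1.5))  # 4.0
print(k_stops((0.0, 0.0), (4.0, 0.0), pts, k=2))               # 4.0
```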


WmXML: A System For Watermarking XML Data, Xuan Zhou, Hwee Hwa Pang, Kian-Lee Tan, Dhruv Mangla Aug 2005

Research Collection School Of Computing and Information Systems

As an increasing amount of data is published in the form of XML, copyright protection of XML data is becoming an important requirement for many applications. While digital watermarking is a widely used measure to protect digital data from copyright offences, the complex and flexible construction of XML data poses a number of challenges to digital watermarking, such as re-organization and alteration attacks. To overcome these challenges, the watermarking scheme has to be based on the usability of data and the underlying semantics like key attributes and functional dependencies. In this paper, we describe WmXML, a system for watermarking XML documents. …


Web Mining - The Ontology Approach, Ee Peng Lim, Aixin Sun Aug 2005

Research Collection School Of Computing and Information Systems

The World Wide Web today provides users access to an extremely large number of Web sites, many of which contain information of educational and commercial value. Due to the unstructured and semi-structured nature of Web pages and the design idiosyncrasy of Web sites, it is a challenging task to develop digital libraries for organizing and managing digital content from the Web. Web mining research, over the last 10 years, has on the other hand made significant progress in categorizing and extracting content from the Web. In this paper, we represent ontology as a set of concepts and their inter-relationships relevant to …


GeogDL: A Web-Based Approach To Geography Examination, Ee Peng Lim, Dion Hoe-Lian Goh, Yin-Leng Theng Aug 2005

Research Collection School Of Computing and Information Systems

The traditional educational approach with students as passive recipients has been the subject of criticism. A constructivist learner-centered approach towards education has been argued to produce greater internalization and application of knowledge compared to the traditional teacher-centered, transmission-oriented approach. Nevertheless, contemporary instructional design models argue for the use and integration of both approaches, especially in complex learning tasks. This paper describes GeogDL, a Web-based application built on top of a digital library of geographical resources for Singapore students preparing to take a national examination in geography. GeogDL is unique in that it not only provides an environment for active learning but also …


Social Network Discovery By Mining Spatio-Temporal Events, Hady Lauw, Ee Peng Lim, Hwee Hwa Pang, Teck-Tim Tan Jul 2005

Research Collection School Of Computing and Information Systems

Knowing patterns of relationship in a social network is very useful for law enforcement agencies to investigate collaborations among criminals, for businesses to exploit relationships to sell products, or for individuals who wish to network with others. After all, it is not just what you know, but also whom you know, that matters. However, finding out who is related to whom on a large scale is a complex problem. Asking every single individual would be impractical, given the huge number of individuals and the changing dynamics of relationships. Recent advancements in technology have allowed more data about activities of individuals …


Detecting Malicious VBScripts Using Anomaly Host Based IDS Based On Principal Component Analysis (PCA), Racha El Sokkary Jun 2005

Archived Theses and Dissertations

Intrusion detection research over the last twenty years has focused on the threat of individuals illegally hacking into systems. Nowadays, the intrusion threat to computer systems has changed radically. Instead of dealing with hackers, most current works focus on defending the system against code-driven attacks. Today’s web script codes such as VBScript are receiving increasing focus as a backdoor for attacking many computers through e-mail attachments or infected web sites. The nature of these malicious codes is that they can spread widely, causing serious damage to many applications. Moreover, the majority of anti-virus tools used today are able to detect known …


Verifying Completeness Of Relational Query Results In Data Publishing, Hwee Hwa Pang, Arpit Jain, Krithi Ramamritham, Kian-Lee Tan Jun 2005

Research Collection School Of Computing and Information Systems

In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the publisher may be untrusted or susceptible to attacks, it could produce incorrect query results. In this paper, we introduce a scheme for users to verify that their query results are complete (i.e., no qualifying tuples are omitted) and authentic (i.e., all the result values originated from the owner). The scheme supports range selection on key and non-key attributes, as well as projection and join queries on relational databases. Moreover, the proposed scheme complies with access control policies, is computationally secure, and can be …


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jun 2005

Computer Science Faculty Publications

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


Conceptual Partitioning: An Efficient Method For Continuous Nearest Neighbor Monitoring, Kyriakos Mouratidis, Marios Hadjieleftheriou, Dimitris Papadias Jun 2005

Research Collection School Of Computing and Information Systems

Given a set of objects P and a query point q, a k nearest neighbor (k-NN) query retrieves the k objects in P that lie closest to q. Even though the problem is well-studied for static datasets, the traditional methods do not extend to highly dynamic environments where multiple continuous queries require real-time results, and both objects and queries receive frequent location updates. In this paper we propose conceptual partitioning (CPM), a comprehensive technique for the efficient monitoring of continuous NN queries. CPM achieves low running time by handling location updates only from objects that fall in the vicinity of …
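The k-NN definition in the first sentence corresponds to the snapshot baseline below: a full linear scan that returns the k objects closest to q. A naive continuous monitor would re-run this scan after every location update, which is the cost that techniques such as CPM aim to avoid. This is a minimal sketch with hypothetical names and an in-memory dictionary of object positions, not the paper's grid-based method.

```python
import heapq
import math

Point = tuple[float, float]


def knn(objects: dict[str, Point], q: Point, k: int) -> list[str]:
    """Ids of the k objects closest to q, nearest first, by a full linear scan."""
    return heapq.nsmallest(k, objects, key=lambda oid: math.dist(objects[oid], q))


objects = {"a": (1.0, 0.0), "b": (0.0, 2.0), "c": (5.0, 5.0)}
print(knn(objects, (0.0, 0.0), k=2))  # ['a', 'b']
objects["c"] = (0.5, 0.0)             # a location update
print(knn(objects, (0.0, 0.0), k=2))  # ['c', 'a'] after a naive full re-scan
```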


Aggregate Nearest Neighbor Queries In Spatial Databases, Dimitris Papadias, Yufei Tao, Kyriakos Mouratidis, Chun Kit Hui Jun 2005

Research Collection School Of Computing and Information Systems

Given two spatial datasets P (e.g., facilities) and Q (queries), an aggregate nearest neighbor (ANN) query retrieves the point(s) of P with the smallest aggregate distance(s) to points in Q. Assuming, for example, n users at locations $q_1, \ldots, q_n$, an ANN query outputs the facility $p \in P$ that minimizes the sum of distances $\sum_{i=1}^{n} |pq_i|$ that the users have to travel in order to meet there. Similarly, another ANN query may report the point $p \in P$ that minimizes the maximum distance that …
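Both aggregate variants mentioned above (sum of distances and maximum distance) fit one brute-force definition: return the $p \in P$ that minimizes an aggregate function over the distances $|pq_i|$. The sketch below scans P directly; it only illustrates the query semantics, not the paper's R-tree algorithms, and the function name `ann` is hypothetical.

```python
import math

Point = tuple[float, float]


def ann(P: list[Point], Q: list[Point], agg=sum) -> Point:
    """Aggregate nearest neighbor: the p in P minimizing agg(|p q_i| for q_i in Q).
    Pass agg=sum for total travel distance or agg=max for the worst-off user."""
    return min(P, key=lambda p: agg(math.dist(p, q) for q in Q))


facilities = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
users = [(0.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
print(ann(facilities, users, agg=sum))  # (10.0, 0.0): total distance 11
print(ann(facilities, users, agg=max))  # (5.0, 0.0): maximum distance 5
```

The example shows why the choice of aggregate matters: the two variants pick different facilities for the same users.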


DSI: A Fully Distributed Spatial Index For Wireless Data Broadcast, Wang-Chien Lee, Baihua Zheng Jun 2005

Research Collection School Of Computing and Information Systems

The recent announcement of the MSN Direct Service has demonstrated both the feasibility of, and industrial interest in, utilizing wireless broadcast for pervasive information services. To support location-based services in wireless data broadcast systems, a distributed spatial index (called DSI) is proposed in this paper. DSI is highly efficient because it has a linear yet fully distributed structure that allows multiple search paths to be naturally mixed together by sharing links. Moreover, DSI is very resilient in error-prone wireless communication environments. Search algorithms for two classical location-based queries, window queries and kNN queries, based on DSI are presented. Performance evaluation of DSI shows …


Explicit Building Block Multiobjective Evolutionary Computation: Methods And Applications, Richard O. Day Jun 2005

Theses and Dissertations

This dissertation presents principles, techniques, and performance of evolutionary computation optimization methods. Concentration is on concepts, design formulation, and prescription for multiobjective problem solving and explicit building block (BB) multiobjective evolutionary algorithms (MOEAs). Current state-of-the-art explicit BB MOEAs are addressed in the innovative design, execution, and testing of a new multiobjective explicit BB MOEA. Evolutionary computation concepts examined are algorithm convergence, population diversity and sizing, genotype and phenotype partitioning, archiving, BB concepts, parallel evolutionary algorithm (EA) models, robustness, visualization of evolutionary process, and performance in terms of effectiveness and efficiency. The main result of this research is the development of …


Information Dissemination Via Wireless Broadcast, Baihua Zheng, Dik Lun Lee May 2005

Research Collection School Of Computing and Information Systems

Unrestricted mobility adds a new dimension to data access methodology, one that must be addressed before true ubiquity can be realized.


Dynamically Optimized Context In Recommender Systems, Ghim-Eng Yap, Ah-Hwee Tan, Hwee Hwa Pang May 2005

Research Collection School Of Computing and Information Systems

Traditional approaches to recommender systems have not taken into account situational information when making recommendations, and this seriously limits the relevance of the results. This paper advocates context-awareness as a promising approach to enhance the performance of recommenders, and introduces a mechanism to realize this approach. We present a framework that separates the contextual concerns from the actual recommendation module, so that contexts can be readily shared across applications. More importantly, we devise a learning algorithm to dynamically identify the optimal set of contexts for a specific recommendation task and user. An extensive series of experiments has validated that our …


Event-Driven Document Selection For Terrorism, Zhen Sun, Ee Peng Lim, Kuiyu Chang, Teng-Kwee Ong, Rohan Kumar Gunaratna May 2005

Research Collection School Of Computing and Information Systems

In this paper, we examine the task of extracting information about terrorism-related events hidden in a large document collection. The task assumes that a terrorism-related event can be described by a set of entity and relation instances. To reduce the time and effort spent extracting these event-related instances, one should ideally perform the task on the relevant documents only. We have therefore proposed some document selection strategies based on information extraction (IE) patterns. Each strategy attempts to select one document at a time such that the gain of event-related instance information is maximized. Our …
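The "one document at a time, maximize the gain" strategy can be sketched as a greedy selection loop. This is an illustrative assumption, not the paper's IE-pattern-based strategies: each document is represented by a hypothetical set of instance identifiers it is expected to yield (in practice these would have to be estimated, since the point is to avoid extracting from every document), and gain is measured as the number of instances not yet covered.

```python
def greedy_select(doc_instances: dict[str, set[str]], budget: int) -> list[str]:
    """Pick documents one at a time, each time choosing the one that adds the most
    instances not already covered; stop early when no document adds anything new."""
    covered: set[str] = set()
    chosen: list[str] = []
    remaining = dict(doc_instances)
    for _ in range(budget):
        if not remaining:
            break
        doc = max(remaining, key=lambda d: len(remaining[d] - covered))
        if not remaining[doc] - covered:
            break  # no remaining document contributes new instances
        chosen.append(doc)
        covered |= remaining.pop(doc)
    return chosen


# Hypothetical expected instances per document.
docs = {"d1": {"e1", "e2"}, "d2": {"e2", "e3", "e4"}, "d3": {"e1"}}
print(greedy_select(docs, budget=3))  # ['d2', 'd1']
```

Ties are broken by dictionary order here, which is an arbitrary choice of this sketch.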


TOSA: A Near-Optimal Scheduling Algorithm For Multi-Channel Data Broadcast, Baihua Zheng, Xia Xu, Xing Jin, Dik Lun Lee May 2005

Research Collection School Of Computing and Information Systems

Wireless broadcast is very suitable for delivering information to a large user population. In this paper, we concentrate on data allocation methods for multiple broadcast channels. To the best of our knowledge, this is the first allocation model that takes into consideration items' access frequencies, items' lengths, and the bandwidth of different channels. We first derive the optimal average expected delay for multiple channels for the general case where data access frequencies, data sizes, and channel bandwidths can all be non-uniform. Second, we develop TOSA, a multi-channel allocation method that does not assume a uniform broadcast schedule for data …


Mining Social Network From Spatio-Temporal Events, Hady Wirawan Lauw, Ee Peng Lim, Teck Tim Tan, Hwee Hwa Pang Apr 2005

Research Collection School Of Computing and Information Systems

Knowing patterns of relationship in a social network is very useful for law enforcement agencies to investigate collaborations among criminals, for businesses to exploit relationships to sell products, or for individuals who wish to network with others. After all, it is not just what you know, but also whom you know, that matters. However, finding out who is related to whom on a large scale is a complex problem. Asking every single individual would be impractical, given the huge number of individuals and the changing dynamics of relationships. Recent advancements in technology have allowed more data about activities of individuals …


Proactive Caching For Spatial Queries In Mobile Environments, Haibo Hu, Jianliang Xu, Wing Sing Wong, Baihua Zheng, Dik Lun Lee, Wang-Chien Lee Apr 2005

Research Collection School Of Computing and Information Systems

Semantic caching enables mobile clients to answer spatial queries locally by storing the query descriptions together with the results. However, it supports only a limited number of query types, and sharing results among these types is difficult. To address these issues, we propose a proactive caching model which caches the result objects as well as the index that supports these objects as the results. The cached index enables the objects to be reused for all common types of queries. We also propose an adaptive scheme to cache such an index, which further optimizes the query response time for the best …


Scheduling Queries To Improve The Freshness Of A Website, Haifeng Liu, Wee-Keong Ng, Ee Peng Lim Mar 2005

Research Collection School Of Computing and Information Systems

The World Wide Web is a new advertising medium that corporations use to increase their exposure to consumers. Very large websites whose content is derived from a source database need to maintain a freshness that reflects changes that are made to the base data. This issue is particularly significant for websites that present fast-changing information such as stock-exchange information and product information. In this article, we formally define and study the freshness of a website that is refreshed by a scheduled set of queries that fetch fresh data from the databases. We propose several online-scheduling algorithms and compare the performance …


The Good, Bad And The Indifferent: Explorations In Recommender System Health, Benjamin J. Keller, Sun-Mi Kim, N. Srinivas Vemuri, Naren Ramakrishnan, Saverio Perugini Jan 2005

Computer Science Faculty Publications

Our work is based on the premise that analysis of the connections exploited by a recommender algorithm can provide insight into the algorithm that could be useful to predict its performance in a fielded system. We use the jumping connections model defined by Mirza et al. [6], which describes the recommendation process in terms of graphs. Here we discuss our work that has come out of trying to understand algorithm behavior in terms of these graphs. We start by describing a natural extension of the jumping connections model of Mirza et al., and then discuss observations that have come from …