Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Research Collection School Of Computing and Information Systems

2005

Classification

Full-Text Articles in Physical Sciences and Mathematics

Webarc: Website Archival Using A Structured Approach, Ee Peng Lim, Maria Marissa Dec 2005

Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC, a set of software tools that allows users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to …
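
The classifier-guided downloading step described in the abstract can be illustrated with a minimal sketch. It is not WEBARC's implementation: the `Page` type, the `keyword_classifier` stand-in (a keyword rule in place of a trained classifier), the `archive_snapshot` function, and the in-memory `SITE` are all hypothetical names introduced here, and scheduling and viewing are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Page:
    """Hypothetical page record: URL, text content, and outgoing links."""
    url: str
    text: str
    links: List[str]


def keyword_classifier(page: Page) -> Optional[str]:
    """Stand-in for a trained classifier (assumption, not the paper's model).

    Returns a category name for relevant pages, or None for irrelevant ones.
    """
    text = page.text.lower()
    if "news" in text:
        return "news"
    if "course" in text:
        return "course"
    return None


def archive_snapshot(
    start_url: str,
    fetch: Callable[[str], Optional[Page]],
    classify: Callable[[Page], Optional[str]],
) -> Dict[str, List[str]]:
    """Breadth-first crawl that stores only pages the classifier accepts.

    Accepted pages are grouped by category; links on rejected pages
    are not followed.
    """
    archive: Dict[str, List[str]] = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:
            continue  # page could not be retrieved
        category = classify(page)
        if category is None:
            continue  # irrelevant page: skip it and its links
        archive.setdefault(category, []).append(url)
        for link in page.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# Toy in-memory "website" so the sketch runs without network access.
SITE = {
    "/": Page("/", "Latest news from the department", ["/news/1", "/courses", "/login"]),
    "/news/1": Page("/news/1", "News: new lab opens", ["/"]),
    "/courses": Page("/courses", "Course catalogue", ["/news/1"]),
    "/login": Page("/login", "Sign in", ["/secret"]),
    "/secret": Page("/secret", "Internal news", []),
}

snapshot = archive_snapshot("/", SITE.get, keyword_classifier)
print(snapshot)
```

In this sketch the classifier gates both storage and link expansion, so `/secret` is never reached because it is linked only from the rejected `/login` page. Whether the actual system prunes links this way is an assumption made for illustration.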