Open Access. Powered by Scholars. Published by Universities.®

Science and Technology Studies Commons

Open Access. Powered by Scholars. Published by Universities.®

Kno.e.sis Publications

Data Mining

Articles 1 - 5 of 5

Full-Text Articles in Science and Technology Studies

Domain Specific Document Retrieval Framework For Real-Time Social Health Data, Swapnil Soni Jul 2015

Domain Specific Document Retrieval Framework For Real-Time Social Health Data, Swapnil Soni

Kno.e.sis Publications

With the advent of the web search and microblogging, the percentage of Online Health Information Seekers (OHIS) using these online services to share and seek health real-time information has in- creased exponentially. OHIS use web search engines or microblogging search services to seek out latest, relevant as well as reliable health in- formation. When OHIS turn to microblogging search services to search real-time content, trends and breaking news, etc. the search results are not promising. Two major challenges exist in the current microblogging search engines are keyword based techniques and results do not contain real-time information. To address these challenges ...


Mining Effective Multi-Segment Sliding Window For Pathogen Incidence Rate Prediction, Lei Duan, Changjie Tang, Xiasong Li, Guozhu Dong, Xianming Wang, Jie Zuo, Min Jiang, Zhongqi Li, Yongqing Zhang Sep 2013

Mining Effective Multi-Segment Sliding Window For Pathogen Incidence Rate Prediction, Lei Duan, Changjie Tang, Xiasong Li, Guozhu Dong, Xianming Wang, Jie Zuo, Min Jiang, Zhongqi Li, Yongqing Zhang

Kno.e.sis Publications

Pathogen incidence rate prediction, which can be considered as time series modeling, is an important task for infectious disease incidence rate prediction and for public health. This paper investigates the application of a genetic computation technique, namely GEP, for pathogen incidence rate prediction. To overcome the shortcomings of traditional sliding windows in GEP-based time series modeling, the paper introduces the problem of mining effective sliding window, for discovering optimal sliding windows for building accurate prediction models. To utilize the periodical characteristic of pathogen incidence rates, a multi-segment sliding window consisting of several segments from different periodical intervals is proposed and ...


Pattern Space Maintenance For Data Updates And Interactive Mining, Mengling Feng, Guozhu Dong, Jinyan Li, Yap-Peng Tan, Limsoon Wong Aug 2010

Pattern Space Maintenance For Data Updates And Interactive Mining, Mengling Feng, Guozhu Dong, Jinyan Li, Yap-Peng Tan, Limsoon Wong

Kno.e.sis Publications

This article addresses the incremental and decremental maintenance of the frequent pattern space. We conduct an in-depth investigation on how the frequent pattern space evolves under both incremental and decremental updates. Based on the evolution analysis, a new data structure, Generator-Enumeration Tree (GE-tree), is developed to facilitate the maintenance of the frequent pattern space. With the concept of GE-tree, we propose two novel algorithms, Pattern Space Maintainer+ (PSM+) and Pattern Space Maintainer− (PSM−), for the incremental and decremental maintenance of frequent patterns. Experimental results demonstrate that the proposed algorithms, on average, outperform the representative state-of-the-art methods by an order of ...


Efficient Computation Of Iceberg Cubes By Bounding Aggregate Functions, Xiuzhen Zhang, Pauline Lienhua Chou, Guozhu Dong Jul 2007

Efficient Computation Of Iceberg Cubes By Bounding Aggregate Functions, Xiuzhen Zhang, Pauline Lienhua Chou, Guozhu Dong

Kno.e.sis Publications

The iceberg cubing problem is to compute the multidimensional group-by partitions that satisfy given aggregation constraints. Pruning unproductive computation for iceberg cubing when nonantimonotone constraints are present is a great challenge because the aggregate functions do not increase or decrease monotonically along the subset relationship between partitions. In this paper, we propose a novel bound prune cubing (BP-Cubing) approach for iceberg cubing with nonantimonotone aggregation constraints. Given a cube over n dimensions, an aggregate for any group-by partition can be computed from aggregates for the most specific n--dimensional partitions (MSPs). The largest and smallest aggregate values computed this way become ...


Summarizing Data Sets For Classification, Christopher W. Kinzig, Krishnaprasad Thirunarayan, Gary B. Lamont, Robert E. Marmelstein Jun 2001

Summarizing Data Sets For Classification, Christopher W. Kinzig, Krishnaprasad Thirunarayan, Gary B. Lamont, Robert E. Marmelstein

Kno.e.sis Publications

This paper describes our approach and experiences with implementing a data mining system using genetic algorithms in C++. In contrast with earlier classification algorithms that tended to “tile” the data sets using some pre-specified “shapes”, the proposed system is based on Marmelstein’s work on determining natural boundaries for class homogeneous regions. These boundaries are further refined to construct a compact set of simple data mining rules for classification.