Open Access. Powered by Scholars. Published by Universities.®
Databases and Information Systems Commons™
Articles 1 - 9 of 9
Full-Text Articles in Databases and Information Systems
Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng
Research Collection School Of Computing and Information Systems
An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) from the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on an information-theoretic metric are proposed: PVS and P + VS. Extensive experiments on benchmark datasets of various sizes are reported. The results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. Experiment results …
Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan
Research Collection School Of Computing and Information Systems
Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. Existing literature models RSS as submodular maximization to capture the "diminishing returns" property of representativeness, but often only has a single constraint, which limits its applications to many real-world problems. To capture the recency issue and support various constraints, we formulate dynamic RSS as maximizing submodular functions subject to general d-knapsack constraints (SMDK) over sliding windows. We propose a KnapWindow framework (KW) for SMDK. KW utilizes KnapStream (KS) for SMDK in append-only streams as a subroutine. It maintains a sequence of checkpoints and …
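The "diminishing returns" property mentioned in the abstract above can be illustrated with a small sketch. This is not the KnapWindow or KnapStream algorithm; it only shows, for a hypothetical set-coverage objective, that the marginal gain of an item shrinks as the selected set grows. The names `coverage`, `marginal_gain`, and the toy `covers` data are assumptions made for illustration.

```python
def coverage(selected, covers):
    """Count the distinct topics covered by the selected items (a submodular function)."""
    covered = set()
    for item in selected:
        covered |= covers[item]
    return len(covered)


def marginal_gain(item, selected, covers):
    """Increase in coverage obtained by adding `item` to `selected`."""
    return coverage(selected + [item], covers) - coverage(selected, covers)


# Hypothetical toy data: each item covers a set of topic ids.
covers = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {1, 4, 5},
}

# Gain of "c" when nothing is selected versus when "a" and "b" are already selected.
gain_on_empty = marginal_gain("c", [], covers)
gain_on_larger = marginal_gain("c", ["a", "b"], covers)
print(gain_on_empty, gain_on_larger)
```

Adding "c" to the empty set contributes all of its topics, while adding it after "a" and "b" contributes only the topics not yet covered, so the second gain is never larger than the first.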
Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao
Theses
The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by …
Data Mining By Grid Computing In The Search For Extrasolar Planets, Oisin Creaner [Thesis]
Doctoral
A system is presented here to provide improved precision in ensemble differential photometry. This is achieved by using the power of grid computing to analyse astronomical catalogues. This produces new catalogues of optimised pointings for each star, which maximise the number and quality of reference stars available. Astronomical phenomena such as exoplanet transits and small-scale structure within quasars may be observed by means of millimagnitude photometric variability on the timescale of minutes to hours. Because of atmospheric distortion, ground-based observations of these phenomena require the use of differential photometry whereby the target is compared with one or more reference stars. …
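As a rough illustration of the differential photometry idea described above, the sketch below subtracts the mean magnitude of the reference stars from the target magnitude in each frame, so that brightness changes common to all stars (such as atmospheric effects) cancel. This is a minimal sketch under simplifying assumptions, not the system from the thesis: it averages magnitudes directly rather than combining fluxes, and the function name `differential_magnitudes` and the sample values are hypothetical.

```python
from statistics import mean


def differential_magnitudes(target, references):
    """Return target magnitude minus mean reference magnitude for each frame.

    target:     list of target-star magnitudes, one per frame
    references: list of per-star magnitude lists, each aligned with `target`
    """
    diffs = []
    for frame in range(len(target)):
        ref_mean = mean(star[frame] for star in references)
        diffs.append(target[frame] - ref_mean)
    return diffs


# Hypothetical data: frames 2 and 3 are dimmed by a common atmospheric offset
# (0.3 and 0.1 mag), and the target alone dims by a further 0.005 mag in frame 3.
target = [12.0, 12.3, 12.105]
references = [
    [11.0, 11.3, 11.1],
    [13.0, 13.3, 13.1],
]

diffs = differential_magnitudes(target, references)
print(diffs)
```

The common offsets drop out of the differences, leaving only the small change specific to the target in the last frame.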
Detection Of Cyberbullying In SMS Messaging, Bryan W. Bradley
Computer Science Summer Fellows
Cyberbullying is a type of bullying that uses technology such as cell phones to harass or malign another person. To detect acts of cyberbullying, we are developing an algorithm that will detect cyberbullying in SMS (text) messages. Over 80,000 text messages have been collected by software installed on cell phones carried by participants in our study. This paper describes the development of the algorithm to detect cyberbullying messages, using the cell phone data collected previously. The algorithm works by first separating the messages into conversations in an automated way. The algorithm then analyzes the conversations and scores the severity and …
Artificial Intelligence - I: Adaptive Automated Teller Machines - Part II, Ghulam Mujtaba, Tariq Mahmood
International Conference on Information and Communication Technologies
Nowadays, the banking sector is increasingly relying on Automated Teller Machines (ATMs) in order to provide services to its customers. Although thousands of ATMs exist across many banks and different locations, the GUI and content of a typical ATM interface remain, more or less, the same. For instance, any ATM provides typical options for withdrawal, electronic funds transfer, viewing of mini-statements, etc. However, such a static interface might not be suitable for all ATM customers, e.g., some users might not prefer to view all the options when they access the ATM, or to view specific withdrawal amounts less than, say, …
Directed Extended Dependency Analysis For Data Mining, Thaddeus T. Shannon, Martin Zwick
Systems Science Faculty Publications and Presentations
Extended dependency analysis (EDA) is a heuristic search technique for finding significant relationships between nominal variables in large data sets. The directed version of EDA searches for maximally predictive sets of independent variables with respect to a target dependent variable. The original implementation of EDA was an extension of reconstructability analysis. Our new implementation adds a variety of statistical significance tests at each decision point that allow the user to tailor the algorithm to a particular objective. It also utilizes data structures appropriate for the sparse data sets customary in contemporary data mining problems. Two examples that illustrate different approaches …
An Overview Of Reconstructability Analysis, Martin Zwick
Systems Science Faculty Publications and Presentations
This paper is an overview of reconstructability analysis (RA), a discrete multivariate modeling methodology developed in the systems literature; an earlier version of this tutorial is Zwick (2001). RA was derived from Ashby (1964), and was developed by Broekstra, Cavallo, Cellier, Conant, Jones, Klir, Krippendorff, and others (Klir, 1986, 1996). RA resembles and partially overlaps log‐linear (LL) statistical methods used in the social sciences (Bishop et al., 1978; Knoke and Burke, 1980). RA also resembles and overlaps methods used in logic design and machine learning (LDL) in electrical and computer engineering (e.g. Perkowski et al., 1997). Applications of RA, like …
Reconstructability Analysis With Fourier Transforms, Martin Zwick
Systems Science Faculty Publications and Presentations
Fourier methods used in two‐ and three‐dimensional image reconstruction can also be used in reconstructability analysis (RA). These methods maximize a variance‐type measure instead of information‐theoretic uncertainty, but the two measures are roughly collinear and the Fourier approach yields results close to those of standard RA. The Fourier method, however, does not require iterative calculations for models with loops. Moreover, the error in Fourier RA models can be assessed without actually generating the full probability distributions of the models; calculations scale with the size of the data rather than the state space. State‐based modeling using the Fourier approach is also …