Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

The Impact Of Automated Feature Selection Techniques On The Interpretation Of Defect Models, Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Christoph Treude Sep 2020

Research Collection School Of Computing and Information Systems

The interpretation of defect models relies heavily on the software metrics used to construct them. Prior work often uses feature selection techniques to remove correlated and irrelevant metrics in order to improve model performance. Yet, conclusions derived from defect models may be inconsistent if the selected metrics are inconsistent and correlated. In this paper, we systematically investigate 12 automated feature selection techniques with respect to consistency, correlation, performance, computational cost, and their impact on the interpretation dimensions. Through an empirical investigation of 14 publicly available defect datasets, we find that (1) 94–100% of the selected …
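For readers unfamiliar with removing correlated metrics, here is a minimal Python sketch of a generic greedy correlation filter. It is an illustration only, not the study's procedure and not necessarily any of the 12 techniques it evaluates. The 0.7 threshold and the metric names in the usage example are hypothetical assumptions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant metric carries no correlation signal
    return cov / (sx * sy)


def drop_correlated(metrics: Dict[str, List[float]], threshold: float = 0.7) -> List[str]:
    """Greedily keep metrics whose |r| with every already-kept metric is below threshold.

    The threshold of 0.7 is an illustrative assumption. Metrics are visited in
    dictionary insertion order, so the first metric of a correlated pair survives.
    """
    kept: List[str] = []
    for name, values in metrics.items():
        if all(abs(pearson(values, metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical metric values for four files.
metrics = {
    "loc": [10, 20, 30, 40],
    "statements": [9, 21, 29, 41],  # nearly collinear with "loc"
    "authors": [2, 1, 2, 1],
}
print(drop_correlated(metrics))  # ['loc', 'authors']
```

Because the filter is greedy and order-dependent, which metric of a correlated pair survives depends on the input order. This is one reason the set of selected metrics can be unstable, the concern the abstract raises.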


Optimal Feature Selection For Learning-Based Algorithms For Sentiment Classification, Zhaoxia Wang, Zhiping Lin Jan 2020

Research Collection School Of Computing and Information Systems

Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are popular and widely used in various applications. Many enhancement strategies have been applied to improve the performance of learning-based methods. Feature selection is one of these strategies and has been studied by many researchers. However, a difficult open problem is choosing a suitable number of features for obtaining the best sentiment …


Low-Rank Sparse Subspace For Spectral Clustering, Xiaofeng Zhu, Shichao Zhang, Yonggang Li, Jilian Zhang, Lifeng Yang, Yue Fang Aug 2019

Research Collection School Of Computing and Information Systems

Current two-step clustering methods learn the similarity matrix and conduct k-means clustering separately. Moreover, the similarity matrix is learned from the original data, which usually contain noise. As a consequence, these clustering methods cannot achieve good clustering results. To address these issues, this paper proposes a new graph clustering method, Low-rank Sparse Subspace clustering (LSS), which simultaneously learns the similarity matrix and conducts the clustering in the low-dimensional feature space of the original data. Specifically, the proposed LSS integrates the learning of the similarity matrix of the original feature space, the learning of the similarity matrix of the low-dimensional …


Compressive Representation For Device-Free Activity Recognition With Passive RFID Signal Strength, Lina Yao, Quan Z. Sheng, Xue Li, Tao Gu, Mingkui Tan, Xianzhi Wang, Sen Wang, Wenjie Ruan Feb 2018

Research Collection School Of Computing and Information Systems

Understanding and recognizing human activities is a fundamental research topic for a wide range of important applications such as fall detection and remote health monitoring and intervention. Despite active research in human activity recognition in recent years, existing approaches based on computer vision or wearable sensor technologies present several significant issues, such as privacy (e.g., using a video camera to monitor the elderly at home) and practicality (e.g., an older person with dementia may not remember to wear devices). In this paper, we present a low-cost, unobtrusive, and robust system that supports independent living of older people. The system …


Large-Scale Online Feature Selection For Ultra-High Dimensional Sparse Data, Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu Aug 2017

Research Collection School Of Computing and Information Systems

Feature selection (FS) is an important technique in machine learning and data mining, especially for large-scale high-dimensional data. Most existing studies have been restricted to batch learning, which is often inefficient and scales poorly when handling big data in the real world. As real data may arrive sequentially and continuously, batch learning has to retrain the model on newly arriving data, which is computationally intensive. Online feature selection (OFS) is a promising new paradigm that is more efficient and scalable than batch learning algorithms. However, existing online algorithms often fall short in efficacy. In this article, …


Video Event Detection Using Motion Relativity And Feature Selection, Feng Wang, Zhanhu Sun, Yu-Gang Jiang, Chong-Wah Ngo Aug 2014

Research Collection School Of Computing and Information Systems

Event detection plays an essential role in video content analysis. In this paper, we present our approach based on motion relativity and feature selection for video event detection. First, we propose a new motion feature, namely Expanded Relative Motion Histogram of Bag-of-Visual-Words (ERMH-BoW), to employ motion relativity for event detection. In ERMH-BoW, we represent the "what" aspect of an event with Bag-of-Visual-Words (BoW) and construct relative motion histograms between different visual words to depict the objects' activities, i.e., the "how" aspect of the event. ERMH-BoW thus integrates both the "what" and "how" aspects for a complete event description. Meanwhile, we show that by …


Online Feature Selection And Its Applications, Jialei Wang, Peilin Zhao, Steven C. H. Hoi, Rong Jin Mar 2014

Research Collection School Of Computing and Information Systems

Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications where data instances are high-dimensional or acquiring the full set of attributes/features is expensive. To address this limitation, we investigate the problem of online feature selection (OFS) in …
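To make the OFS setting concrete, here is a minimal Python sketch of truncation-based online feature selection. A linear classifier is updated one example at a time and then truncated so that it never keeps more than a fixed budget of non-zero weights. The hinge-loss step, the L2-ball projection, and the values of `eta` and `lam` are illustrative assumptions. The sketch is not claimed to reproduce the exact algorithm in the paper.

```python
import math
from typing import List, Sequence, Tuple


def truncate(w: List[float], budget: int) -> List[float]:
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if sum(1 for v in w if v != 0.0) <= budget:
        return w
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:budget])
    return [v if i in keep else 0.0 for i, v in enumerate(w)]


def online_feature_selection(
    stream: Sequence[Tuple[List[float], int]],
    budget: int,
    eta: float = 0.2,   # step size (assumed value)
    lam: float = 0.01,  # regularization; bounds ||w|| by 1/sqrt(lam) (assumed value)
) -> List[float]:
    """Process (x, y) pairs one at a time, with labels y in {-1, +1}.

    After every update at most `budget` weights are non-zero, so only those
    features need to be consulted at prediction time.
    """
    w = [0.0] * len(stream[0][0])
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin <= 1:  # hinge-loss violation: take a gradient step
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            norm = math.sqrt(sum(v * v for v in w))
            if norm > 0:
                # project onto the L2 ball of radius 1/sqrt(lam)
                scale = min(1.0, 1.0 / (math.sqrt(lam) * norm))
                w = [v * scale for v in w]
            w = truncate(w, budget)
    return w


# Toy stream: feature 0 determines the label, features 1 and 2 are weak noise.
stream = [([1.0, 0.1, 0.0], 1), ([-1.0, 0.0, 0.1], -1)] * 20
print(online_feature_selection(stream, budget=1))  # only index 0 is non-zero
```

The budget is a hard cap on the number of active features. This is what lets such a learner avoid reading every attribute of each incoming instance once the weights have been learned.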


An Investigation Of Decision Analytic Methodologies For Stress Identification, Yong Deng, Chao-Hsien Chu, Huayou Si, Qixun Zhang, Zhonghai Wu Sep 2013

Research Collection School Of Computing and Information Systems

In modern society, more and more people suffer from some type of stress. Monitoring stress levels and detecting stress in a timely manner would help people take countermeasures. In this paper, we investigate the use of decision analytics methodologies to detect stress. We present a new feature selection method based on principal component analysis (PCA), compare three feature selection methods, and evaluate five information fusion methods for stress detection. A driving stress data set created by the MIT Media Lab is used to evaluate the relative performance of these methods. Our study shows that the …
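The abstract does not describe how its PCA-based feature selection works. As an assumption, not the authors' method, the Python sketch below shows one common PCA-based heuristic: rank the original features by the magnitude of their loading on the first principal component, found here by power iteration on the sample covariance matrix. The sketch does not standardize features, so high-variance features dominate. With physiological signals on different scales, one would usually standardize first.

```python
import math
from typing import List


def covariance_matrix(rows: List[List[float]]) -> List[List[float]]:
    """Sample covariance of the feature columns (each row is one observation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov: List[List[float]], iters: int = 200) -> List[float]:
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def rank_features_by_pc1(rows: List[List[float]]) -> List[int]:
    """Feature indices ordered by |loading| on the first principal component."""
    pc1 = first_principal_component(covariance_matrix(rows))
    return sorted(range(len(pc1)), key=lambda j: abs(pc1[j]), reverse=True)


# Hypothetical readings: feature 1 varies far more than features 0 and 2.
rows = [[0.1, 10.0, 5.0], [0.2, -10.0, 5.0], [0.0, 20.0, 5.1], [-0.1, -20.0, 4.9]]
print(rank_features_by_pc1(rows)[0])  # 1
```

A selector built on this ranking would keep the top k indices. How k is chosen, and whether more than one component is used, are further design choices the sketch leaves open.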


Predictive Neural Networks For Gene Expression Data Analysis, Ah-Hwee Tan, Hong Pan Apr 2005

Research Collection School Of Computing and Information Systems

Gene expression data generated by DNA microarray experiments provide a vast resource for medical diagnosis and disease understanding. Most prior work on analyzing gene expression data, however, focuses on predictive performance rather than on deriving human-understandable knowledge. This paper presents a systematic approach for learning and extracting rule-based knowledge from gene expression data. A class of predictive self-organizing networks known as Adaptive Resonance Associative Map (ARAM) is used for modelling gene expression data, and the learned knowledge can be transformed into a set of symbolic IF-THEN rules for interpretation. For dimensionality reduction, we illustrate how the system …