Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2019

Feature selection

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Noise Clipping Algorithm Based On Relative Contribution Rate, Shuoyu Liu, Yueming Dai Dec 2019

Journal of System Simulation

Abstract: This paper presents a class noise cutting (CNC) algorithm based on relative contribution rate. The algorithm calculates the relative contribution rate of features to the theme. The most valuable feature set is selected using a feature-discrimination rating. The corresponding candidate categories for each feature are then selected to reduce the candidate category set, improve classification accuracy, and speed up the classifier's response. Compared with another noise cutting algorithm, ECN (Eliminating the class whose), CNC has higher accuracy and, because of its simpler feature dimension dictionary and better candidate category set, the response …


Low-Rank Sparse Subspace For Spectral Clustering, Xiaofeng Zhu, Shichao Zhang, Yonggang Li, Jilian Zhang, Lifeng Yang, Yue Fang Aug 2019

Research Collection School Of Computing and Information Systems

Current two-step clustering methods learn the similarity matrix and conduct k-means clustering separately. Moreover, the similarity matrix is learnt from the original data, which usually contain noise. As a consequence, these clustering methods cannot achieve good clustering results. To address these issues, this paper proposes a new graph clustering method, Low-rank Sparse Subspace clustering (LSS), to simultaneously learn the similarity matrix and conduct the clustering from the low-dimensional feature space of the original data. Specifically, the proposed LSS integrates the learning of similarity matrix of the original feature space, the learning of similarity matrix of the low-dimensional …


Sensor-Based Human Activity Recognition Using Smartphones, Mustafa Badshah May 2019

Master's Projects

Providing precise information about the activity a human is performing, and finding patterns in their behavior, is a significant technical and computational task. With advancements in human activity recognition (HAR) systems, countless applications can be built and various problems in the domains of virtual reality, health and medicine, entertainment, and security can be solved. HAR has been an active field of research for more than a decade, but certain aspects need to be addressed to improve the system and revolutionize the way humans interact with smartphones. This research provides a holistic view of human activity recognition system architecture and discusses …


Selecting Maximally-Predictive Deep Features To Explain What Drives Fixations In Free-Viewing, Matthias Kümmerer, Thomas S.A. Wallis, Matthias Bethge May 2019

MODVIS Workshop

No abstract provided.


Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, …


Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang Mar 2019

Biostatistics Faculty Publications

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection process to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Streaming Feature Grouping And Selection (SFGS) For Big Data Classification, Noura Helal Hamad Al Nuaimi Mar 2019

Dissertations

Real-time data has always been an essential element for organizations when the speed of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real time. It allows us to process the data stream in real time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes …


Coal Mine Water Inrush Prediction Based On LSTM Neural Network, Dong Lili, Fei Cheng, Zhang Xiang, Cao Chaofan Feb 2019

Coal Geology & Exploration

For the prediction of water inrush from the coal seam floor, and based on a summary of existing water inrush prediction methods and theories, the feature selection experiment shows that water pressure, distance from the working surface, sandstone section thickness, coal thickness, coal seam inclination, fault throw, fissure zone, mining area, mining height and strike length are the main factors affecting the occurrence of water inrush. These factors are complex and non-linear. A water inrush prediction model based on a long short-term memory (LSTM) neural network was proposed. The data of the coal mine water inrush case was used as sample data to …


An Investigation Into The Predictive Capability Of Customer Spending In Modelling Mortgage Default, Donal Finn [Thesis] Jan 2019

Dissertations

The mortgage arrears crisis in Ireland was and is among the most severe on record. Although the number of mortgages in default has been decreasing over the past four years, the crisis continues to cause distress to borrowers and vulnerabilities to lenders. There are indications that one of the main factors associated with mortgage default is loan affordability, of which the level of disposable income is a driver. Additionally, guidelines set out by the European Central Bank instructed financial institutions to adopt measures to further reduce and prevent loans defaulting, including the implementation and …


Improving Anomaly Detection In BGP Time-Series Data By New Guide Features And Moderated Feature Selection Algorithm, Mahmoud Hashem, Ahmed Bashandy, Samir Shaheen Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The Internet infrastructure relies on the Border Gateway Protocol (BGP) to provide essential routing information, and abnormal routing behavior impairs global Internet connectivity and stability. Hence, employing anomaly detection algorithms is important for improving the performance of the BGP routing protocol. In this paper, we propose two algorithms. The first is the guide feature generator (GFG), which generates guide features from traditional features in BGP time-series data using moving regression in combination with smoothed moving average. The second is a modified random forest feature selection algorithm which is employed to automatically select the most dominant features (ASMDF). Our mechanism shows that …


Optimal Set Of EEG Features In Infant Sleep Stage Classification, Maja Cic, Mario Milicevic, Igor Mazic Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

This paper evaluates six classification algorithms to assess the importance of individual EEG rhythms in the context of automatic classification of infant sleep. EEG features were obtained by Fourier transform and by a novel technique based on the empirical mode decomposition and generalized zero crossing method. Of six evaluated classification algorithms, the best classification results were obtained with the support vector machine for the combination of all presented features from four EEG channels. Three methods of attribute ranking were assessed: relief, principal component analysis, and wrapper-based optimized attribute weights. The outcomes revealed that the optimal selection of features requires one …


A Novel Hybrid Teaching-Learning-Based Optimization Algorithm For The Classification Of Data By Using Extreme Learning Machines, Ender Sevinç, Tansel Dökeroğlu Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Data classification is the process of organizing data by relevant categories. In this way, the data can be understood and used more efficiently by scientists. Numerous studies have been proposed in the literature for the problem of data classification. However, with recently introduced metaheuristics, it remains worthwhile to revisit this classical problem and investigate the efficiency of new techniques. Teaching-learning-based optimization (TLBO) is a recent metaheuristic that has been reported to be very effective for combinatorial optimization problems. In this study, we propose a novel hybrid TLBO algorithm with extreme learning machines (ELM) for the solution of …


Combined Feature Compression Encoding In Image Retrieval, Lu Huo, Leijie Zhang Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Recently, features extracted by convolutional neural networks (CNNs) have become popular for image retrieval. In CNN representation, high-level features are usually chosen to represent the images in coarse-grained datasets, while mid-level features are successfully applied to describe the images for fine-grained datasets. In this paper, we combine these different levels of features as a joint feature to propose a robust representation that is suitable for both coarse-grained and fine-grained image retrieval datasets. In addition, in order to solve the problem that the efficiency of image retrieval is influenced by the dimensionality of indexing, a unified subspace learning model named spectral …


Toxicity Prediction Of Small Drug Molecules Of Aryl Hydrocarbon Receptor Using A Proposed Ensemble Model, Vishan Kumar Gupta, Prashant Singh Rana Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Quantitative structure-activity relationships and quantitative structure-property relationships have proved their usefulness for predicting toxicities of drug molecules regarding their biological activities. In silico toxicity prediction techniques are essential for reducing testing on rodents (in vivo) and for a less time-consuming and more cost-efficient alternative for the identification of toxic effects at an early stage of drug development. The authors aim to build a prediction model for better assessment of toxicity to quickly and efficiently test whether certain chemical compounds have the potential to disrupt the processes in the human body that may adversely affect human health. Here, we have proposed …


A Hybrid Feature-Selection Approach For Finding The Digital Evidence Of Web Application Attacks, Mohammed Babiker, Enis Karaarslan, Yaşar Hoşcan Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

The most critical challenge of web attack forensic investigations is the sheer amount of data and level of complexity. Machine learning technology might be an efficient solution for web attack analysis and investigation. Consequently, machine learning applications have been applied in various areas of information security and digital forensics, and have improved over time. Moreover, feature selection is a crucial step in machine learning; in fact, selecting an optimal feature subset could enhance the accuracy and performance of the predictive model. To date, there has not been an adequate approach to select optimal features for the evidence of web attack. …


An Improved Tree Model Based On Ensemble Feature Selection For Classification, Chandralekha M, Shenbagavadivu N Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Researchers train and build specific models to classify the presence and absence of a disease, and the accuracy of such classification models is continuously improved. The process of building and training a model depends on the medical data utilized. Various machine learning techniques and tools are used to handle different data with respect to disease types and their clinical conditions. Classification is the most widely used technique to identify disease, and the accuracy of the classifier largely depends on the attributes. The choice of attributes largely affects the diagnosis and performance of the classifier. Due to growing large volumes …


Feature Set Selection For Improved Classification Of Static Analysis Alerts, Kathleen Goeschel Jan 2019

CCE Theses and Dissertations

With the extreme growth in third party cloud applications, increased exposure of applications to the internet, and the impact of successful breaches, improving the security of software being produced is imperative. Static analysis tools can alert to quality and security vulnerabilities of an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This problem may lead to the loss of confidence in the scanning tools, possibly resulting in the tools not being used. The discontinued use of these tools may increase the likelihood of insecure software being released into production. Insecure software …


Distributed Multi-Label Learning On Apache Spark, Jorge Gonzalez Lopez Jan 2019

Theses and Dissertations

This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child’s self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …