Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Feature selection

Articles 31 - 60 of 132

Full-Text Articles in Physical Sciences and Mathematics

Comparative Study Of Machine Learning Models On Solar Flare Prediction Problem, Nikhil Sai Kurivella Aug 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Solar flare events are explosions of energy and radiation from the Sun’s surface. These events occur due to the tangling and twisting of magnetic fields associated with sunspots. When coronal mass ejections accompany solar flares, solar storms could travel towards Earth at very high speeds, disrupting all earthly technologies and posing radiation hazards to astronauts. For this reason, the prediction of solar flares has become a crucial aspect of forecasting space weather. Our thesis utilized the time-series data consisting of active solar region magnetic field parameters acquired from SDO that span more than eight years. The classification models take AR …


Designing Targeted Mobile Advertising Campaigns, Kimia Keshanian Jun 2021

USF Tampa Graduate Theses and Dissertations

With the proliferation of smart, handheld devices, there has been a multifold increase in the ability of firms to target and engage with customers through mobile advertising. Therefore, not surprisingly, mobile advertising campaigns have become an integral aspect of firms’ brand building activities, such as improving the awareness and overall visibility of firms' brands. In addition, retailers are increasingly using mobile advertising for targeted promotional activities that increase in-store visits and eventual sales conversions. However, in recent years, mobile or in general online advertising campaigns have been facing one major challenge and one major threat that can negatively impact the …


Decomposition Furnace Outlet Temperature Prediction Based On ElasticNet And LSTM, Guangyu Yu, Xueping Dong, Xiangmin Wang, Gan Min Jun 2021

Journal of System Simulation

Abstract: The outlet temperature of the decomposition furnace is a key indicator in the cement production process. Aiming at the problem that traditional prediction methods only consider the influence of wind, coal, and materials, a temperature prediction model of ElasticNet combined with Long Short-Term Memory (LSTM) neural network is proposed. The ElasticNet-LSTM outlet temperature prediction model is constructed by using the ElasticNet method to estimate the parameters of different variables, fully considering the influencing factors and realizing the variable screening, and analyzing the influence of the number of hidden layers and nodes on the accuracy of the neural network. Simulation …


VIF-Regression Screening Ultrahigh Dimensional Feature Space, Hassan S. Uraibi Jun 2021

Journal of Modern Applied Statistical Methods

Iterative Sure Independent Screening (ISIS) was proposed for the problem of variable selection with ultrahigh dimensional feature space. Unfortunately, the ISIS method transforms the dimensionality of features from ultrahigh to ultra-low and may result in unreliable inference, particularly when the number of important variables is greater than the screening threshold. The proposed method transforms the ultrahigh dimensionality of features to a high dimensional space in order to remedy the loss of information caused by the ISIS method. The proposed method is compared with the ISIS method by using real data and simulation. The results show this method is more efficient and more reliable …


Gene Expression Data Classification Using Genetic Algorithm-Based Feature Selection, Öznur Sinem Sönmez, Mustafa Dağtekin, Tolga Ensari Jan 2021

Turkish Journal of Electrical Engineering and Computer Sciences

In this study, hybrid methods are proposed for feature selection and classification of gene expression datasets. In the proposed genetic algorithm/support vector machine (GA-SVM) and genetic algorithm/k nearest neighbor (GA-KNN) hybrid methods, the genetic algorithm is improved using Pearson's correlation coefficient, Relief-F, or mutual information. Crossover and selection operations of the genetic algorithm are specialized. Eight different gene expression datasets are used for the classification process. The classification performances of the proposed methods are compared with the traditional GA-KNN and GA-SVM wrapper methods and other studies in the literature. Classification results demonstrate that higher accuracy rates are obtained with the proposed methods …


Feature Selection On Permissions, Intents And APIs For Android Malware Detection, Fred Guyton Jan 2021

CCE Theses and Dissertations

Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android is a platform that hosts roughly 99% of known malware to date, and is the focus of most research efforts in mobile malware detection due to its open source nature. One of the main tools used in this effort is supervised machine learning. While a decade of work has made a lot of progress in detection accuracy, there is an obstacle that each stream of research is …


Infrequent Pattern Detection For Reliable Network Traffic Analysis Using Robust Evolutionary Computation, A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Al-Sakib K. Pathan Jan 2021

Research outputs 2014 to 2021

While anomaly detection is very important in many domains, such as in cybersecurity, there are many rare anomalies or infrequent patterns in cybersecurity datasets. Detection of infrequent patterns is computationally expensive. Cybersecurity datasets consist of many features, mostly irrelevant, resulting in lower classification performance by machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting relevant features only, is an essential preprocessing step in cybersecurity data analysis. Despite many FS approaches proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing considering the Big Data scenario. Accordingly, in this paper, we have …


Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi Jan 2021

Theses and Dissertations (Comprehensive)

This thesis addresses feature selection (FS) problems, which is a primary stage in data mining. FS is a significant pre-processing stage to enhance the performance of the process with regards to computation cost and accuracy to offer a better comprehension of stored data by removing the unnecessary and irrelevant features from the basic dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only be used to solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing the FS problems. …


Automation Of Feature Selection And Generation Of Optimal Feature Subsets For Beehive Audio Sample Classification, Aditya Bhouraskar Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The last couple of decades have witnessed an abnormal reduction in the bee population. This is a serious matter of concern, as three out of four crops available globally have the honey bee as their sole pollinator, so the decline causes significant economic losses and an imbalance in the ecosystem. There have been many theories about the cause of bee colony collapses, such as parasites, pesticides and poor nutrition; however, conclusive evidence of this phenomenon is yet to be identified.

Human inspection of beehives requires precision. It takes an experienced beekeeper to determine the health of a hive by the sounds generated …


Improving Binary Classification Using Filtering Based On K-NN Proximity Graphs, Maher Ala’Raj, Munir Majdalawieh, Maysam F. Abbod Dec 2020

All Works

© 2020, The Author(s). One of the ways of increasing recognition ability in a classification problem is removing outlier entries as well as redundant and unnecessary features from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse the classifier and lead it to catch wrong patterns in the training data. The common approach in data filtering is using proximity graphs. However, the problem of the optimal filtering parameters selection is still insufficiently researched. In this paper, a filtering procedure based on a k-nearest neighbours proximity graph was used. Filtering parameters …


The Impact Of Automated Feature Selection Techniques On The Interpretation Of Defect Models, Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Christoph Treude Sep 2020

Research Collection School Of Computing and Information Systems

The interpretation of defect models heavily relies on software metrics that are used to construct them. Prior work often uses feature selection techniques to remove metrics that are correlated and irrelevant in order to improve model performance. Yet, conclusions that are derived from defect models may be inconsistent if the selected metrics are inconsistent and correlated. In this paper, we systematically investigate 12 automated feature selection techniques with respect to the consistency, correlation, performance, computational cost, and the impact on the interpretation dimensions. Through an empirical investigation of 14 publicly-available defect datasets, we find that (1) 94–100% of the selected …


SAR Object Recognition Based On Multi-Band And Multi-Polarization Simulation Image, Gu Yu, Zhang Qin, Xu Ying Jun 2020

Journal of System Simulation

Abstract: The object model was built based on Creator, and object texture-material mapping was performed by the Vega TMM tool. The multi-band and multi-polarization SAR image database was built by visual simulation technology. A hybrid intelligent optimization algorithm was designed to optimize the combination of band and polarization by genetic algorithm and binary particle optimization. Zernike moment features, Gabor wavelet coefficients, etc. were extracted from the original image and rectified image to make up the feature candidates, and the feature selection experiments were carried out by using multi-band and multi-polarization SAR images. Simulation results demonstrate that building SAR image database through simulation …


Feature Selection And Data Reconstruction Via Robust And Flexible Learning Models, Di Ming May 2020

Computer Science and Engineering Dissertations

Feature selection and data reconstruction are very important topics in the machine learning area. In today's big data environment, many data could have high dimensions and come with noise, corruption, etc. Thus, we develop robust and flexible learning models so as to select the relevant features from the high-dimensional data spaces and reconstruct the original clean data from the corrupted input data more efficiently and more effectively. To resolve the inflexibility of the widely used class-shared feature selection methods such as L21-norm, we derive LASSO from probabilistic selection on ridge regression which provides an independent point of view from the usual …


A XGBoost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank …


Sparsity And Weak Supervision In Quantum Machine Learning, Seyran Saeedi Jan 2020

Theses and Dissertations

Quantum computing is an interdisciplinary field at the intersection of computer science, mathematics, and physics that studies information processing tasks on a quantum computer. A quantum computer is a device whose operations are governed by the laws of quantum mechanics. As building quantum computers is nearing the era of commercialization and quantum supremacy, it is essential to think of potential applications that we might benefit from. Among many applications of quantum computation, one of the emerging fields is quantum machine learning. We focus on predictive models for binary classification and variants of Support Vector Machines that we expect to be …


Towards Human Activity Recognition For Ubiquitous Health Care Using Data From A Waist-Mounted Smartphone, Umar Zia, Wajeeha Khalil, Salabat Khan, Iftikhar Ahmad, Naeem Khatak Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Understanding human activities is a newly emerging paradigm that is greatly involved in developing ubiquitous health care (u-Health) systems. The aim of these systems is to seamlessly gather knowledge about the patient's health and, after collecting knowledge, make suggestions to the patient according to his/her health profile. For this purpose, one of the most important ubiquitous communication trends is the smartphone, which has drawn the attention of both professionals and caregivers for monitoring the aging population, childcare, fall detection, and cognitive impairment. Recognizing human actions in a ubiquitous environment is very challenging and researchers have extensively investigated different methods to …


Cooperative Co-Evolution For Feature Selection In Big Data With Random Feature Grouping, A.N.M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland Jan 2020

Research outputs 2014 to 2021

© 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because …


On A Yearly Basis Prediction Of Soil Water Content Utilizing SAR Data: A Machine Learning And Feature Selection Approach, Emrullah Acar, Mehmet Siraç Özerdem Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

Soil water content (SWC) performs an important role in many areas including agriculture, drought cases, usage of water resources, hydrology, crop diseases and aerology. However, the measurement of the SWC over large terrains with standard computational techniques is very hard. In order to overcome this situation, remote sensing tools are preferred, which can produce much more successful results in less time than standard calculation techniques. Among all remote sensing tools, synthetic aperture radar (SAR) has a significant impact on determining SWC over large terrains. The main objective of this study is to predict SWC on a yearly basis over the …


Event-Based Summarization Of News Articles, Feride Savaroğlu Tabak, Vesile Evrim Jan 2020

Turkish Journal of Electrical Engineering and Computer Sciences

In recent years, with the increase of available digital information on the Web, the time needed to find relevant information has also increased. Therefore, to reduce the time spent on searching, research on automatic text summarization has gained importance. The proposed summarization process is based on event extraction methods and is called event-based extractive single-document summarization. In this method, the important features of event extraction and summarization methods are analyzed and combined to extract the summaries from single-source news documents. Among the tested features, six features are found to be the most effective in constructing good summaries. The …


Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper Jan 2020

All Master's Theses

Tuberculosis (TB) is a respiratory disease which affects millions of people each year, accounting for the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for the diagnosis of TB; however, a trained radiologist is required to interpret the results, and is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning based solution to automate the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …


Optimal Feature Selection For Learning-Based Algorithms For Sentiment Classification, Zhaoxia Wang, Zhiping Lin Jan 2020

Research Collection School Of Computing and Information Systems

Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades and learning-based methods are very popular and widely used in various applications. For learning-based methods, a lot of enhanced technical strategies have been used to improve the performance of the methods. Feature selection is one of these strategies and it has been studied by many researchers. However, an existing unsolved difficult problem is the choice of a suitable number of features for obtaining the best sentiment …


Noise Clipping Algorithm Based On Relative Contribution Rate, Shuoyu Liu, Yueming Dai Dec 2019

Journal of System Simulation

Abstract: This paper presents a class noise cutting algorithm (Class noise cutting, CNC) based on relative contribution rate. The algorithm calculates the relative contribution rate of features to the theme. The most valuable feature set is selected by using features distinguish rating. The corresponding candidate categories for each feature are selected, to reduce the candidate category set, improve the classification accuracy, and speed up the response of the classifier. Compared with another ECN noise cutting algorithm (Eliminating the class whose), CNC has higher accuracy and because of its simpler feature dimension dictionary and better candidate category set, the response …


Low-Rank Sparse Subspace For Spectral Clustering, Xiaofeng Zhu, Shichao Zhang, Yonggang Li, Jilian Zhang, Lifeng Yang, Yue Fang Aug 2019

Research Collection School Of Computing and Information Systems

The current two-step clustering methods separately learn the similarity matrix and conduct k-means clustering. Moreover, the similarity matrix is learnt from the original data, which usually contain noise. As a consequence, these clustering methods cannot achieve good clustering results. To address these issues, this paper proposes a new graph clustering method (namely Low-rank Sparse Subspace clustering (LSS)) to simultaneously learn the similarity matrix and conduct the clustering from the low-dimensional feature space of the original data. Specifically, the proposed LSS integrates the learning of similarity matrix of the original feature space, the learning of similarity matrix of the low-dimensional …


Sensor-Based Human Activity Recognition Using Smartphones, Mustafa Badshah May 2019

Master's Projects

It is a significant technical and computational task to provide precise information regarding the activity performed by a human and find patterns of their behavior. Countless applications can be molded and various problems in domains of virtual reality, health and medical, entertainment and security can be solved with advancements in human activity recognition (HAR) systems. HAR has been an active field of research for more than a decade, but certain aspects need to be addressed to improve the system and revolutionize the way humans interact with smartphones. This research provides a holistic view of human activity recognition system architecture and discusses …


Selecting Maximally-Predictive Deep Features To Explain What Drives Fixations In Free-Viewing, Matthias Kümmerer, Thomas S.A. Wallis, Matthias Bethge May 2019

MODVIS Workshop

No abstract provided.


Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, …


Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang Mar 2019

Biostatistics Faculty Publications

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby degrading a longitudinal feature selection process into a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Streaming Feature Grouping And Selection (SFGS) For Big Data Classification, Noura Helal Hamad Al Nuaimi Mar 2019

Dissertations

Real-time data has always been an essential element for organizations when the quickness of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real-time. It allows us to process the data stream in real-time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes …


Coal Mine Water Inrush Prediction Based On LSTM Neural Network, Dong Lili, Fei Cheng, Zhang Xiang, Cao Chaofan Feb 2019

Coal Geology & Exploration

Addressing the prediction of water inrush from the coal seam floor, and based on a summary of existing water inrush prediction methods and theories, a feature selection experiment shows that water pressure, distance from the working surface, sandstone section thickness, coal thickness, coal seam inclination, fault throw, fissure zone, mining area, mining height and strike length are the main factors affecting the occurrence of water inrush. These factors are complex and non-linear. A water inrush prediction model based on long short-term memory (LSTM) neural network was proposed. The data of the coal mine water inrush case was used as sample data to …