Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 12 of 12

Full-Text Articles in Physical Sciences and Mathematics

Osfs-Vague: Online Streaming Feature Selection Algorithm Based On A Vague Set, Jie Yang, Zhijun Wang, Guoyin Wang, Yanmin Liu, Yi He, Di Wu Jan 2024

Computer Science Faculty Publications

Online streaming feature selection (OSFS), an online learning approach for handling streaming features, is critical in addressing high-dimensional data. In real big data-related applications, the patterns and distributions of streaming features constantly change over time due to dynamic data generation environments. However, existing OSFS methods rely on preset, fixed hyperparameters, which lead to poor selection performance when encountering dynamic features. To address these shortcomings, the authors propose a novel OSFS algorithm based on a vague set, named OSFS-Vague. Its main idea is to combine uncertainty and three-way decision theories to improve feature selection from the …


Using Feature Selection Enhancement To Evaluate Attack Detection In The Internet Of Things Environment, Khawlah Harahsheh, Rami Al-Naimat, Chung-Hao Chen Jan 2024

Electrical & Computer Engineering Faculty Publications

The rapid evolution of technology has given rise to a connected world where billions of devices interact seamlessly, forming what is known as the Internet of Things (IoT). While the IoT offers incredible convenience and efficiency, it presents a significant challenge to cybersecurity and is characterized by various power, capacity, and computational process limitations. Machine learning techniques, particularly those encompassing supervised classification techniques, offer a systematic approach to training models using labeled datasets. These techniques enable intrusion detection systems (IDSs) to discern patterns indicative of potential attacks amidst the vast amounts of IoT data. Our investigation delves into various aspects …


Feature Selection From Clinical Surveys Using Semantic Textual Similarity, Benjamin Warner May 2023

McKelvey School of Engineering Theses & Dissertations

Survey data collected from human subjects can contain a high number of features while having a comparatively low quantity of examples. Machine learning models that attempt to predict outcomes from survey data under these conditions can overfit and result in poor generalizability. One remedy to this issue is feature selection, which attempts to select an optimal subset of features to learn upon. A relatively unexplored source of information in the feature selection process is the usage of textual names of features, which may be semantically indicative of which features are relevant to a target outcome. The relationships between feature names …


An Explainable Artificial Intelligence Framework For The Predictive Analysis Of Hypo And Hyper Thyroidism Using Machine Learning Algorithms, Md. Bipul Hossain, Anika Shama, Apurba Adhikary, Avi Deb Raha, K. M. Aslam Uddin, Mohammad Amzad Hossain, Imtia Islam, Saydul Akbar Murad, Md. Shirajum Munir, Anupam Kumur Bairagi Jan 2023

Electrical & Computer Engineering Faculty Publications

The thyroid gland is a crucial organ in the human body, secreting two hormones that help regulate the body's metabolism. Thyroid disease is a severe medical condition that can be caused by high Thyroid Stimulating Hormone (TSH) levels or an infection in the thyroid tissues. Hypothyroidism and hyperthyroidism are two critical conditions caused by insufficient thyroid hormone production and excessive thyroid hormone production, respectively. Machine learning models can be used to precisely process the data generated from different medical sectors and to build a model to predict several diseases. In this paper, we use different machine-learning algorithms to …


Local Feature Selection For Multiple Instance Learning With Applications., Aliasghar Shahrjooihaghighi Dec 2021

Electronic Theses and Dissertations

Feature selection is a data processing approach that has been successfully and effectively used in developing machine learning algorithms for various applications. It has been proven to effectively reduce the dimensionality of the data and increase the accuracy and interpretability of machine learning algorithms. Conventional feature selection algorithms assume that there is an optimal global subset of features for the whole sample space. Thus, only one global subset of relevant features is learned. An alternative approach is based on the concept of Local Feature Selection (LFS), where each training sample can have its own subset of relevant features. Multiple Instance …


Sparsity And Weak Supervision In Quantum Machine Learning, Seyran Saeedi Jan 2020

Theses and Dissertations

Quantum computing is an interdisciplinary field at the intersection of computer science, mathematics, and physics that studies information processing tasks on a quantum computer. A quantum computer is a device whose operations are governed by the laws of quantum mechanics. As building quantum computers is nearing the era of commercialization and quantum supremacy, it is essential to think of potential applications that we might benefit from. Among many applications of quantum computation, one of the emerging fields is quantum machine learning. We focus on predictive models for binary classification and variants of Support Vector Machines that we expect to be …


Optimal Feature Selection For Learning-Based Algorithms For Sentiment Classification, Zhaoxia Wang, Zhiping Lin Jan 2020

Research Collection School Of Computing and Information Systems

Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are popular and widely used in various applications. Many enhanced technical strategies have been used to improve the performance of these methods. Feature selection is one of these strategies, and it has been studied by many researchers. However, an unsolved difficult problem is the choice of a suitable number of features for obtaining the best sentiment …


Distributed Multi-Label Learning On Apache Spark, Jorge Gonzalez Lopez Jan 2019

Theses and Dissertations

This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes …


Large-Scale Online Feature Selection For Ultra-High Dimensional Sparse Data, Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu Aug 2017

Research Collection School Of Computing and Information Systems

Feature selection (FS) is an important technique in machine learning and data mining, especially for large-scale high-dimensional data. Most existing studies have been restricted to batch learning, which is often inefficient and poorly scalable when handling big data in the real world. As real data may arrive sequentially and continuously, batch learning has to retrain the model for newly arriving data, which is very computationally intensive. Online feature selection (OFS) is a promising new paradigm that is more efficient and scalable than batch learning algorithms. However, existing online algorithms usually fall short in efficacy. In this article, …
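
The contrast between batch retraining and online feature selection can be illustrated with a generic truncation-based online learner. This is only a sketch under stated assumptions, not the method proposed in the article (the abstract is cut off before describing it). The function names, the hinge-loss update, the learning rate `lr`, and the feature `budget` are all illustrative choices.

```python
def truncate(weights, budget):
    """Keep the `budget` largest-magnitude weights and zero the rest."""
    if budget >= len(weights):
        return list(weights)
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:budget])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def online_feature_selection(stream, n_features, budget, lr=0.2):
    """Process (x, y) pairs one at a time, with labels y in {-1, +1}.

    Hypothetical first-order scheme: a hinge-loss gradient step followed
    by truncation, so at most `budget` features are ever active.
    """
    w = [0.0] * n_features
    for x, y in stream:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # hinge loss is active: update, then truncate
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)
    return w


# Feature 0 carries the label; features 1 and 2 are small noise.
stream = [([1.0, 0.1, 0.0], 1), ([-1.0, 0.0, 0.1], -1)] * 5
selected = online_feature_selection(stream, n_features=3, budget=1)
```

Each example is seen once and the model is never retrained from scratch, which is the efficiency argument the abstract makes for the online setting.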


Detection Of Seagrass Scars Using Sparse Coding And Morphological Filter, Ender Oguslu, Sertan Erkanli, Victoria J. Hill, W. Paul Bissett, Richard C. Zimmerman, Jiang Li, Charles R. Bostater Jr. (Ed.), Stelios P. Mertikas (Ed.), Xavier Neyt (Ed.) Jan 2014

OES Faculty Publications

We present a two-step algorithm for the detection of seafloor propeller seagrass scars in shallow water using panchromatic images. The first step is to classify image pixels into scar and non-scar categories based on a sparse coding algorithm. The first step produces an initial scar map in which false positive scar pixels may be present. In the second step, local orientation of each detected scar pixel is computed using the morphological directional profile, which is defined as outputs of a directional filter with a varying orientation parameter. The profile is then utilized to eliminate false positives and generate the final …
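
The second step can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the directional filter is approximated by the fraction of detected pixels along a short centered line at four fixed orientations, and the elimination rule (a `min_peak` threshold on the strongest orientation) is hypothetical, since the abstract is truncated before stating the actual rule.

```python
# Unit steps (row, col) for four line orientations, in degrees.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}


def directional_profile(mask, row, col, half_len):
    """Fraction of scar pixels along a centered line, per orientation.

    Cells outside the image count as non-scar.
    """
    rows, cols = len(mask), len(mask[0])
    profile = {}
    for angle, (dr, dc) in DIRECTIONS.items():
        hits = 0
        for step in range(-half_len, half_len + 1):
            r, c = row + step * dr, col + step * dc
            if 0 <= r < rows and 0 <= c < cols and mask[r][c]:
                hits += 1
        profile[angle] = hits / (2 * half_len + 1)
    return profile


def remove_false_positives(mask, half_len=2, min_peak=0.8):
    """Keep a detected pixel only if some orientation is strongly supported."""
    cleaned = [[0] * len(mask[0]) for _ in mask]
    for r, row_vals in enumerate(mask):
        for c, value in enumerate(row_vals):
            if value:
                peak = max(directional_profile(mask, r, c, half_len).values())
                if peak >= min_peak:
                    cleaned[r][c] = 1
    return cleaned


# Initial scar map: one horizontal scar on row 3 plus an isolated false positive.
scar_map = [[0] * 7 for _ in range(7)]
scar_map[3] = [1] * 7
scar_map[0][0] = 1
final_map = remove_false_positives(scar_map)
```

In this toy map the isolated pixel has no supporting orientation and is dropped, while pixels in the interior of the linear scar are kept. Pixels near the image border lose support because out-of-image cells count as misses, a simplification of this sketch.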


Hyperspectral Image Classification Using A Spectral-Spatial Sparse Coding Model, Ender Oguslu, Guoqing Zhou, Jiang Li, Lorenzo Bruzzone (Ed.) Jan 2013

Electrical & Computer Engineering Faculty Publications

We present a sparse-coding-based spectral-spatial classification model for hyperspectral image (HSI) datasets. The proposed method consists of an efficient sparse coding method in which the l1/lq regularized multi-class logistic regression technique was utilized to achieve a compact representation of hyperspectral image pixels for land cover classification. We applied the proposed algorithm to an HSI dataset collected at the Kennedy Space Center and compared our algorithm to a recently proposed method, the Gaussian process maximum likelihood (GP-ML) classifier. Experimental results show that the proposed method can achieve significantly better performance than the GP-ML classifier when training data …


Vegetation Identification Based On Satellite Imagery, Vamsi K.R. Mantena, Ramu Pedada, Srinivas Jakkula, Yuzhong Shen, Jiang Li, Hamid R. Arabnia (Ed.) Jan 2008

Electrical & Computer Engineering Faculty Publications

Automatic vegetation identification plays an important role in many applications including remote sensing and high performance flight simulations. This paper presents a method to automatically identify vegetation based upon satellite imagery. First, we utilize the ISODATA algorithm to cluster pixels in the images where the number of clusters is determined by the algorithm. We then apply morphological operations to the clustered images to smooth the boundaries between clusters and to fill holes inside clusters. After that, we compute six features for each cluster. These six features then go through a feature selection algorithm and three of them are determined to …
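
The morphological smoothing and hole-filling step might look roughly like the sketch below. It assumes a single cluster has been turned into a binary mask and uses a 3x3 structuring element. Both are illustrative choices, as the abstract does not specify which operations or element sizes the paper uses.

```python
def _neighborhood(mask, r, c):
    """Values in the 3x3 window around (r, c); out-of-image cells are skipped."""
    rows, cols = len(mask), len(mask[0])
    return [mask[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[int(any(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[int(all(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(mask))


def opening(mask):
    """Erosion followed by dilation: removes small specks and spurs."""
    return dilate(erode(mask))


# A cluster with a one-pixel hole, and an isolated one-pixel speck.
holed = [[1] * 5 for _ in range(5)]
holed[2][2] = 0
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1
filled = closing(holed)
despeckled = opening(speck)
```

Closing fills the hole inside the cluster and opening removes the isolated speck, which corresponds to the "fill holes" and "smooth the boundaries" goals described above.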