Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Series

Feature selection

Articles 1 - 30 of 49

Full-Text Articles in Physical Sciences and Mathematics

Time-Series Feature Selection For Solar Flare Forecasting, Yagnashree Velanki, Pouya Hosseinzadeh, Soukaina Filali Boubrahimi, Shah Muhammad Hamdi Sep 2024

Computer Science Student Research

Solar flares are significant occurrences in solar physics, impacting space weather and terrestrial technologies. Accurate classification of solar flares is essential for predicting space weather and minimizing potential disruptions to communication, navigation, and power systems. This study addresses the challenge of selecting the most relevant features from multivariate time-series data, specifically focusing on solar flares. We employ methods such as Mutual Information (MI), Minimum Redundancy Maximum Relevance (mRMR), and Euclidean Distance to identify key features for classification. Recognizing the performance variability of different feature selection techniques, we introduce an ensemble approach to compute feature weights. By combining outputs from multiple …


Improved Binary Differential Evolution With Dimensionality Reduction Mechanism And Binary Stochastic Search For Feature Selection, Behrouz Ahadzadeh, Moloud Abdar, Fatemeh Safara, Leyla Aghaei, Seyedali Mirjalili, Abbas Khosravi, Salvador García, Fakhri Karray, U. Rajendra Acharya Jan 2024

Machine Learning Faculty Publications

Computer systems store massive amounts of data with numerous features, leading to the need to extract the most important features for better classification in a wide variety of applications. Poor performance of various machine learning algorithms may be caused by unimportant features that increase the time and memory required to build a classifier. Feature selection (FS) is an efficient approach to removing unimportant features. This paper, therefore, presents a new FS method, named BDE-BSS-DR, that utilizes Binary Differential Evolution (BDE), the Binary Stochastic Search (BSS) algorithm, and a Dimensionality Reduction (DR) mechanism. The BSS algorithm increases the search capability of …


Using Feature Selection Enhancement To Evaluate Attack Detection In The Internet Of Things Environment, Khawlah Harahsheh, Rami Al-Naimat, Chung-Hao Chen Jan 2024

Electrical & Computer Engineering Faculty Publications

The rapid evolution of technology has given rise to a connected world where billions of devices interact seamlessly, forming what is known as the Internet of Things (IoT). While the IoT offers incredible convenience and efficiency, it presents a significant challenge to cybersecurity and is characterized by various power, capacity, and computational process limitations. Machine learning techniques, particularly those encompassing supervised classification techniques, offer a systematic approach to training models using labeled datasets. These techniques enable intrusion detection systems (IDSs) to discern patterns indicative of potential attacks amidst the vast amounts of IoT data. Our investigation delves into various aspects …


Selecting And Evaluating Key MDS-UPDRS Activities Using Wearable Devices For Parkinson's Disease Self-Assessment, Yuting Zhao, Xulong Wang, Xiyang Peng, Ziheng Li, Fengtao Nan, Menghui Zhuo, Jun Qi, Yun Yang, Zhong Zhao, Lida Xu, Po Yang Jan 2024

Information Technology & Decision Sciences Faculty Publications

Parkinson's disease (PD) is a complex neurodegenerative disease in the elderly. The disease has no cure, but assessing its motor symptoms can help slow its progression. Inertial sensing-based wearable devices (ISWDs) such as mobile phones and smartwatches have been widely employed to analyse the condition of PD patients. However, most studies have focused purely on a single activity or symptom, which may ignore the correlation between activities and their complementary characteristics. In this paper, a novel technical pipeline is proposed for fine-grained classification of PD severity grades, which identifies the most representative activities. We also propose a multi-activities combination scheme based …


OSFS-Vague: Online Streaming Feature Selection Algorithm Based On A Vague Set, Jie Yang, Zhijun Wang, Guoyin Wang, Yanmin Liu, Yi He, Di Wu Jan 2024

Computer Science Faculty Publications

Online streaming feature selection (OSFS), an online learning approach for handling streaming features, is critical in addressing high-dimensional data. In real big data-related applications, the patterns and distributions of streaming features constantly change over time due to dynamic data generation environments. However, existing OSFS methods rely on preset, fixed hyperparameters, which undoubtedly leads to poor selection performance when encountering dynamic features. To address these shortcomings, the authors propose a novel OSFS algorithm based on vague sets, named OSFS-Vague. Its main idea is to combine uncertainty and three-way decision theories to improve feature selection from the …


Malware Detection With Artificial Intelligence: A Systematic Literature Review, Matthew G. Gaber, Mohiuddin Ahmed, Helge Janicke Jan 2024

Research outputs 2022 to 2026

In this survey, we review the key developments in the field of malware detection using AI and analyze core challenges. We systematically survey state-of-the-art methods across five critical aspects of building an accurate and robust AI-powered malware-detection model: malware sophistication, analysis techniques, malware repositories, feature selection, and machine learning vs. deep learning. The effectiveness of an AI model is dependent on the quality of the features it is trained with. In turn, the quality and authenticity of these features is dependent on the quality of the dataset and the suitability of the analysis tool. Static analysis is fast but is …


An Improved Dandelion Optimizer Algorithm For Spam Detection: Next-Generation Email Filtering System, Mohammad Tubishat, Feras Al-Obeidat, Ali Safaa Sadiq, Seyedali Mirjalili Sep 2023

All Works

Spam emails have become a pervasive issue in recent years, as internet users receive increasing amounts of unwanted or fake emails. To combat this issue, automatic spam detection methods have been proposed, which aim to classify emails into spam and non-spam categories. Machine learning techniques have been utilized for this task with considerable success. In this paper, we introduce a novel approach to spam email detection by presenting significant advancements to the Dandelion Optimizer (DO) algorithm. The DO is a relatively new nature-inspired optimization algorithm inspired by the flight of dandelion seeds. While the DO shows promise, it faces challenges, …


A Study On Feature Selection Using Multi-Domain Feature Extraction For Automated K-Complex Detection, Yabing Li, Xinglong Dong, Kun Song, Xiangyun Bai, Hongye Li, Fakhreddine Karray Sep 2023

Machine Learning Faculty Publications

Background: K-complex detection plays a significant role in the field of sleep research. However, manual annotation of electroencephalography (EEG) recordings by expert visual inspection is time-consuming and subjective. Therefore, there is a need for automatic detection methods based on classical machine learning algorithms. However, due to the complexity of EEG signals, current feature extraction methods produce features with low relevance to k-complex detection, which leads to a substantial loss in detection performance. Hence, finding compact yet effective integrated feature vectors becomes a core task in k-complex detection. Method: In this paper, we first extract multi-domain features based …


Network Intrusion Detection With Two-Phased Hybrid Ensemble Learning And Automatic Feature Selection, Asanka Kavinda Mananayaka, Sunnie S. Chung Jan 2023

Electrical and Computer Engineering Faculty Publications

The use of network connected devices has grown exponentially in recent years revolutionizing our daily lives. However, it has also attracted the attention of cybercriminals making the attacks targeted towards these devices increase not only in numbers but also in sophistication. To detect such attacks, a Network Intrusion Detection System (NIDS) has become a vital component in network applications. However, network devices produce large scale high-dimensional data which makes it difficult to accurately detect various known and unknown attacks. Moreover, the complex nature of network data makes the feature selection process of a NIDS a challenging task. In this study, …


An Explainable Artificial Intelligence Framework For The Predictive Analysis Of Hypo And Hyper Thyroidism Using Machine Learning Algorithms, Md. Bipul Hossain, Anika Shama, Apurba Adhikary, Avi Deb Raha, K. M. Aslam Uddin, Mohammad Amzad Hossain, Imtia Islam, Saydul Akbar Murad, Md. Shirajum Munir, Anupam Kumur Bairagi Jan 2023

Electrical & Computer Engineering Faculty Publications

The thyroid gland is a crucial organ in the human body, secreting two hormones that help to regulate the body's metabolism. Thyroid disease is a severe medical complaint that can be caused by high Thyroid Stimulating Hormone (TSH) levels or an infection in the thyroid tissues. Hypothyroidism and hyperthyroidism are two critical conditions caused by insufficient thyroid hormone production and excessive thyroid hormone production, respectively. Machine learning models can be used to precisely process the data generated from different medical sectors and to build a model to predict several diseases. In this paper, we use different machine-learning algorithms to …


Predicting The Level Of Respiratory Support In COVID-19 Patients Using Machine Learning, Hisham Abdeltawab, Fahmi Khalifa, Yaser Elnakieb, Ahmed Elnakib, Fatma Taher, Norah Saleh Alghamdi, Harpal Singh Sandhu, Ayman El-Baz Oct 2022

All Works

In this paper, a machine learning-based system for the prediction of the required level of respiratory support in COVID-19 patients is proposed. The level of respiratory support is divided into three classes: class 0 which refers to minimal support, class 1 which refers to non-invasive support, and class 2 which refers to invasive support. A two-stage classification system is built. First, the classification between class 0 and others is performed. Then, the classification between class 1 and class 2 is performed. The system is built using a dataset collected retrospectively from 3491 patients admitted to tertiary care hospitals at the …
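The two-stage scheme described in this abstract (class 0 versus the rest, then class 1 versus class 2) can be illustrated with a minimal sketch. The function name `two_stage_predict`, the two threshold models, and the one-value "severity score" features are hypothetical stand-ins chosen for illustration; the paper's actual classifiers and features are not given in this listing.

```python
from typing import Callable, Sequence

Features = Sequence[float]
BinaryModel = Callable[[Features], bool]


def two_stage_predict(x: Features,
                      needs_support: BinaryModel,
                      needs_invasive: BinaryModel) -> int:
    """Cascade two binary decisions into one of three classes (0, 1, 2)."""
    # Stage 1: separate class 0 (minimal support) from classes 1 and 2.
    if not needs_support(x):
        return 0
    # Stage 2: among the remaining cases, separate class 1 (non-invasive)
    # from class 2 (invasive).
    return 2 if needs_invasive(x) else 1


# Hypothetical stand-in models: simple thresholds on a made-up severity score.
def stage1(x: Features) -> bool:
    return x[0] >= 0.3


def stage2(x: Features) -> bool:
    return x[0] >= 0.7


patients = [[0.1], [0.5], [0.9]]
predictions = [two_stage_predict(p, stage1, stage2) for p in patients]
print(predictions)  # [0, 1, 2]
```

In practice each stage would be a trained binary classifier; the cascade logic stays the same, and the second model only ever sees cases that the first stage did not assign to class 0.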


A GPU-Based Machine Learning Approach For Detection Of Botnet Attacks, Michal Motylinski, Áine Macdermott, Farkhund Iqbal, Babar Shah Sep 2022

All Works

Rapid development and adaptation of the Internet of Things (IoT) has created new problems for securing these interconnected devices and networks. There are hundreds of thousands of IoT devices with underlying security vulnerabilities, such as insufficient device authentication/authorisation, making them vulnerable to malware infection. IoT botnets are designed to grow and compete with one another over unsecure devices and networks. Once infected, the device will monitor a Command-and-Control (C&C) server indicating the target of an attack via Distributed Denial of Service (DDoS) attack. These security issues, coupled with the continued growth of IoT, present a much larger attack surface for …


Distinctive Features Of Nonverbal Behavior And Mimicry In Application Interviews Through Data Analysis And Machine Learning, Sanne Rogiers, Elias Corneillie, Filip Lievens, Frederik Anseel, Peter Veelaert, Wilfried Philips Sep 2022

Research Collection Lee Kong Chian School Of Business

This paper reveals the characteristics and effects of nonverbal behavior and human mimicry in the context of application interviews. It presents a novel analysis method for psychological research that uses machine learning. In comparison to traditional manual data analysis, machine learning proves able to analyze the data more deeply and to discover connections in the data invisible to the human eye. The paper describes an experiment to measure and analyze the reactions of evaluators to job applicants who adopt specific behaviors: mimicry, suppress, immediacy and natural behavior. First, evaluation of the applicant qualifications by the interviewer reveals …


Wrapper And Hybrid Feature Selection Methods Using Metaheuristic Algorithms For English Text Classification: A Systematic Review, Osamah Mohammed Alyasiri, Yu N. Cheah, Ammar Kamal Abasi, Omar Mustafa Al-Janabi Apr 2022

Machine Learning Faculty Publications

Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting, and improve classification accuracy. It has demonstrated its efficacy in a myriad of domains, ranging from text classification (TC) and text mining to image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. …


Using Feature Selection With Machine Learning For Generation Of Insurance Insights, Ayman Taha, Bernard Cosgrave, Susan Mckeever Jan 2022

Articles

Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector …


Hybrid Feature Selection Approach To Identify Optimal Features Of Profile Metadata To Detect Social Bots In Twitter, Eiman Alothali, Kadhim Hayawi, Hany Alashwal Dec 2021

All Works

The last few years have revealed that social bots in social networks have become more sophisticated in design as they adapt their features to avoid detection systems. The deceptive nature of bots to mimic human users is due to the advancement of artificial intelligence and chatbots, where these bots learn and adjust very quickly. Therefore, finding the optimal features needed to detect them is an area for further investigation. In this paper, we propose a hybrid feature selection (FS) method to evaluate profile metadata features to find these optimal features, which are evaluated using random forest, naïve Bayes, support vector …


Feature Engineering Vs Feature Selection Vs Hyperparameter Optimization In The Spotify Song Popularity Dataset, Alan Cueva Mora, Brendan Tierney Oct 2021

Conference Papers

Research in Feature Engineering has been part of the data pre-processing phase of machine learning projects for many years. It can be challenging for people new to machine learning to understand its importance, along with the various approaches to finding an optimized model. This work uses the Spotify Song Popularity dataset to compare and evaluate Feature Engineering, Feature Selection and Hyperparameter Optimization. The results demonstrate that Feature Engineering has a greater effect on model efficiency than the alternative approaches.


Infrequent Pattern Detection For Reliable Network Traffic Analysis Using Robust Evolutionary Computation, A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Al-Sakib K. Pathan Jan 2021

Research outputs 2014 to 2021

While anomaly detection is very important in many domains, such as in cybersecurity, there are many rare anomalies or infrequent patterns in cybersecurity datasets. Detection of infrequent patterns is computationally expensive. Cybersecurity datasets consist of many features, mostly irrelevant, resulting in lower classification performance by machine learning algorithms. Hence, a feature selection (FS) approach, i.e., selecting relevant features only, is an essential preprocessing step in cybersecurity data analysis. Despite many FS approaches proposed in the literature, cooperative co-evolution (CC)-based FS approaches can be more suitable for cybersecurity data preprocessing considering the Big Data scenario. Accordingly, in this paper, we have …


Improving Binary Classification Using Filtering Based On K-NN Proximity Graphs, Maher Ala’Raj, Munir Majdalawieh, Maysam F. Abbod Dec 2020

All Works

One way of increasing recognition ability in classification problems is to remove outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse the classifier and lead it to catch the wrong patterns in the training data. The common approach in data filtering is to use proximity graphs. However, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper, a filtering procedure based on the k-nearest neighbours proximity graph was used. Filtering parameters …


The Impact Of Automated Feature Selection Techniques On The Interpretation Of Defect Models, Jirayus Jiarpakdee, Chakkrit Tantithamthavorn, Christoph Treude Sep 2020

Research Collection School Of Computing and Information Systems

The interpretation of defect models heavily relies on software metrics that are used to construct them. Prior work often uses feature selection techniques to remove metrics that are correlated and irrelevant in order to improve model performance. Yet, conclusions that are derived from defect models may be inconsistent if the selected metrics are inconsistent and correlated. In this paper, we systematically investigate 12 automated feature selection techniques with respect to the consistency, correlation, performance, computational cost, and the impact on the interpretation dimensions. Through an empirical investigation of 14 publicly-available defect datasets, we find that (1) 94–100% of the selected …


A XGBoost Risk Model Via Feature Selection And Bayesian Hyper-Parameter Optimization, Yan Wang, Sherry Ni Jan 2020

Published and Grey Literature from PhD Candidates

This paper aims to explore models based on the extreme gradient boosting (XGBoost) approach for business risk classification. Feature selection (FS) algorithms and hyper-parameter optimizations are simultaneously considered during model training. The five most commonly used FS methods, including weight by Gini, weight by Chi-square, hierarchical variable clustering, weight by correlation, and weight by information, are applied to alleviate the effect of redundant features. Two hyper-parameter optimization approaches, random search (RS) and Bayesian tree-structured Parzen Estimator (TPE), are applied in XGBoost. The effects of different FS and hyper-parameter optimization methods on the model performance are investigated by the Wilcoxon Signed Rank …


Cooperative Co-Evolution For Feature Selection In Big Data With Random Feature Grouping, A.N.M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland Jan 2020

Research outputs 2014 to 2021

A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consists of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because …


Optimal Feature Selection For Learning-Based Algorithms For Sentiment Classification, Zhaoxia Wang, Zhiping Lin Jan 2020

Research Collection School Of Computing and Information Systems

Sentiment classification is an important branch of cognitive computation, so further study of the properties of sentiment analysis is important. Sentiment classification on text data has been an active topic for the last two decades, and learning-based methods are very popular and widely used in various applications. For learning-based methods, many enhanced technical strategies have been used to improve performance. Feature selection is one of these strategies and it has been studied by many researchers. However, an unsolved difficult problem is the choice of a suitable number of features for obtaining the best sentiment …


Low-Rank Sparse Subspace For Spectral Clustering, Xiaofeng Zhu, Shichao Zhang, Yonggang Li, Jilian Zhang, Lifeng Yang, Yue Fang Aug 2019

Research Collection School Of Computing and Information Systems

Current two-step clustering methods separately learn the similarity matrix and conduct k-means clustering. Moreover, the similarity matrix is learnt from the original data, which usually contain noise. As a consequence, these clustering methods cannot achieve good clustering results. To address these issues, this paper proposes a new graph clustering method, namely Low-rank Sparse Subspace clustering (LSS), to simultaneously learn the similarity matrix and conduct the clustering from the low-dimensional feature space of the original data. Specifically, the proposed LSS integrates the learning of similarity matrix of the original feature space, the learning of similarity matrix of the low-dimensional …


Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, …


Feature Selection For Longitudinal Data By Using Sign Averages To Summarize Gene Expression Values Over Time, Suyan Tian, Chi Wang Mar 2019

Biostatistics Faculty Publications

With the rapid evolution of high-throughput technologies, time series/longitudinal high-throughput experiments have become possible and affordable. However, the development of statistical methods dealing with gene expression profiles across time points has not kept up with the explosion of such data. The feature selection process is of critical importance for longitudinal microarray data. In this study, we proposed aggregating a gene’s expression values across time into a single value using the sign average method, thereby reducing a longitudinal feature selection process to a classic one. Regularized logistic regression models with pseudogenes (i.e., the sign average of genes across time as predictors) …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


An Investigation Into The Predictive Capability Of Customer Spending In Modelling Mortgage Default, Donal Finn [Thesis] Jan 2019

Dissertations

The mortgage arrears crisis in Ireland was and is among the most severe experienced on record and although there has been a decreasing trend in the number of mortgages in default in the past four years, it still continues to cause distress to borrowers and vulnerabilities to lenders. There are indications that one of the main factors associated with mortgage default is loan affordability, of which the level of disposable income is a driver. Additionally, guidelines set out by the European Central Bank instructed financial institutions to adopt measures to further reduce and prevent loans defaulting, including the implementation and …


A Longitudinal Feature Selection Method Identifies Relevant Genes To Distinguish Complicated Injury And Uncomplicated Injury Over Time, Suyan Tian, Chi Wang, Howard H. Chang Dec 2018

Biostatistics Faculty Publications

Background: Feature selection and gene set analysis are of increasing interest in the field of bioinformatics. While these two approaches have been developed for different purposes, we describe how some gene set analysis methods can be utilized to conduct feature selection.

Methods: We adopted a gene set analysis method, the significance analysis of microarray gene set reduction (SAMGSR) algorithm, to carry out feature selection for longitudinal gene expression data.

Results: Using a real-world application and simulated data, it is demonstrated that the proposed SAMGSR extension outperforms other relevant methods. In this study, we illustrate that a gene’s expression profiles over …


Compressive Representation For Device-Free Activity Recognition With Passive RFID Signal Strength, Lina Yao, Quan Z. Sheng, Xue Li, Tao Gu, Mingkui Tan, Xianzhi Wang, Sen Wang, Wenjie Ruan Feb 2018

Research Collection School Of Computing and Information Systems

Understanding and recognizing human activities is a fundamental research topic for a wide range of important applications such as fall detection and remote health monitoring and intervention. Despite active research in human activity recognition over the past years, existing approaches based on computer vision or wearable sensor technologies present several significant issues such as privacy (e.g., using video camera to monitor the elderly at home) and practicality (e.g., not possible for an older person with dementia to remember wearing devices). In this paper, we present a low-cost, unobtrusive, and robust system that supports independent living of older people. The system …