Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 30 of 45
Full-Text Articles in Physical Sciences and Mathematics
Learning Mortality Risk For Covid-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang
Electronic Thesis and Dissertation Repository
This research investigates the mortality risk of COVID-19 patients across different variant waves, using the data from Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.
To explore features linked to COVID-19 mortality, we employ different techniques such as Filter, Wrapper, and Embedded methods for feature selection. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …
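As a rough illustration of the "Filter" family of feature selection methods mentioned in the abstract above, the sketch below ranks features by the absolute Pearson correlation with the outcome. This is only one possible filter criterion and is not taken from the thesis; the feature names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def filter_rank(columns, target):
    """Rank feature names by |correlation| with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three features for six patients and a binary outcome.
columns = {
    "age": [30, 45, 50, 62, 70, 81],
    "vaccinated": [1, 1, 1, 0, 0, 0],
    "site_id": [3, 1, 2, 3, 1, 2],
}
died = [0, 0, 0, 1, 1, 1]
ranking = filter_rank(columns, died)
```

A filter method like this scores each feature independently of any model, which is what distinguishes it from wrapper and embedded approaches.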
Feature Selection From Clinical Surveys Using Semantic Textual Similarity, Benjamin Warner
McKelvey School of Engineering Theses & Dissertations
Survey data collected from human subjects can contain a high number of features while having a comparatively low quantity of examples. Machine learning models that attempt to predict outcomes from survey data under these conditions can overfit and result in poor generalizability. One remedy to this issue is feature selection, which attempts to select an optimal subset of features to learn upon. A relatively unexplored source of information in the feature selection process is the usage of textual names of features, which may be semantically indicative of which features are relevant to a target outcome. The relationships between feature names …
Tempering The Adversary: An Exploration Into The Applications Of Game Theoretic Feature Selection And Regression, Stephen Mcgee
All Dissertations
Most modern machine learning algorithms tend to focus on an "average-case" approach, where every data point contributes the same amount of influence towards calculating the fit of a model. This "per-data point" error (or loss) is averaged together into an overall loss and typically minimized with an objective function. However, this can be insensitive to valuable outliers. Inspired by game theory, the goal of this work is to explore the utility of incorporating an optimally-playing adversary into feature selection and regression frameworks. The adversary assigns weights to the data elements so as to degrade the modeler's performance in an optimal …
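The contrast between an average-case loss and an adversarially weighted loss, as described in the abstract above, can be sketched as follows. The constraint used here (weights sum to 1 and each weight is at most `cap`) is an assumption made for illustration; the abstract does not state how the thesis constrains the adversary.

```python
def average_loss(losses):
    """Average-case loss: every data point gets the same weight 1/n."""
    return sum(losses) / len(losses)


def adversarial_loss(losses, cap):
    """Worst-case weighted loss when weights sum to 1 and each is at most `cap`.

    The adversary's best response is greedy: put as much weight as allowed
    on the largest per-point losses first.
    """
    if cap * len(losses) < 1:
        raise ValueError("cap too small for the weights to sum to 1")
    remaining = 1.0
    total = 0.0
    for loss in sorted(losses, reverse=True):
        weight = min(cap, remaining)
        total += weight * loss
        remaining -= weight
        if remaining <= 0:
            break
    return total


# Hypothetical per-point losses with one outlier.
losses = [0.1, 0.2, 0.1, 2.0]
uniform = average_loss(losses)
worst_half = adversarial_loss(losses, cap=0.5)
worst_any = adversarial_loss(losses, cap=1.0)
```

With `cap = 1/n` the adversary is forced back to uniform weights, so the two losses coincide; loosening the cap lets the outlier dominate.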
Efficient Algorithms And Human-In-The-Loop Approaches For Attribute Design And Selection, Md Abdus Salam
Computer Science and Engineering Dissertations
Feature engineering and feature selection are two important aspects of the data science pipeline. Due to the advancement of data collection techniques in recent years, huge amounts of data are becoming available in different industries. Consequently, the importance of data science is increasing for business analytics purposes. Different tools and techniques are being developed to help data scientists complete their tasks efficiently. One of the main human involvements in the data science task is feature engineering and selection. These pre-processing steps prepare the data in the format desired to be fed into various machine learning algorithms to accomplish …
Generalized Robust Feature Selection, Bradford L. Lott
Theses and Dissertations
Feature selection may be summarized as identifying salient features to a given response. Understanding which features affect the response enables, in the future, only collecting consequential data; hence, the feature selection algorithm may lead to saving effort spent collecting data, storage resources, as well as computational resources for making predictions. We propose a generalized approach to select the salient features of data sets. Our approach may also be applied to unsupervised datasets to understand which data streams provide unique information. We contend our approach identifies salient features robust to the subsequent predictive model applied. The proposed algorithm considers all provided …
Local Feature Selection For Multiple Instance Learning With Applications., Aliasghar Shahrjooihaghighi
Electronic Theses and Dissertations
Feature selection is a data processing approach that has been successfully and effectively used in developing machine learning algorithms for various applications. It has been proven to effectively reduce the dimensionality of the data and increase the accuracy and interpretability of machine learning algorithms. Conventional feature selection algorithms assume that there is an optimal global subset of features for the whole sample space. Thus, only one global subset of relevant features is learned. An alternative approach is based on the concept of Local Feature Selection (LFS), where each training sample can have its own subset of relevant features. Multiple Instance …
Enhancing The Performance Of Text Mining, Farah Mahmoud Al Shanik
All Dissertations
The amount of text data produced in science, finance, social media, and medicine is growing at an unprecedented pace. The raw text data typically introduces major computational and analytical obstacles (e.g., extremely high dimensionality) to data mining and machine learning algorithms. Besides, the growth in the size of text data makes the search process more difficult for information retrieval systems, making retrieving relevant results to match the users’ search queries challenging. Moreover, the availability of text data in different languages creates the need to develop new methods to analyze multilingual topics to help policymakers in governmental and health systems to …
High-Dimensional Feature Selection And Multi-Level Causal Mediation Analysis With Applications To Human Aging And Cluster-Based Intervention Studies, Hachem Saddiki
Doctoral Dissertations
Many questions in public health and medicine are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. As a result, causal inference frameworks and methodologies have gained interest as a promising tool to reliably answer scientific questions. However, the tasks of identifying and efficiently estimating causal effects from observed data still pose significant challenges under complex data generating scenarios. We focus on (1) high-dimensional settings where the number of variables is orders of magnitude higher than the number of observations; and (2) multi-level settings, where study participants …
Comparative Study Of Machine Learning Models On Solar Flare Prediction Problem, Nikhil Sai Kurivella
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Solar flare events are explosions of energy and radiation from the Sun’s surface. These events occur due to the tangling and twisting of magnetic fields associated with sunspots. When coronal mass ejections accompany solar flares, solar storms could travel towards Earth at very high speeds, disrupting all earthly technologies and posing radiation hazards to astronauts. For this reason, the prediction of solar flares has become a crucial aspect of forecasting space weather. Our thesis utilized time-series data consisting of active solar region magnetic field parameters acquired from SDO that span more than eight years. The classification models take AR …
Designing Targeted Mobile Advertising Campaigns, Kimia Keshanian
USF Tampa Graduate Theses and Dissertations
With the proliferation of smart, handheld devices, there has been a multifold increase in the ability of firms to target and engage with customers through mobile advertising. Therefore, not surprisingly, mobile advertising campaigns have become an integral aspect of firms’ brand building activities, such as improving the awareness and overall visibility of firms' brands. In addition, retailers are increasingly using mobile advertising for targeted promotional activities that increase in-store visits and eventual sales conversions. However, in recent years, mobile or in general online advertising campaigns have been facing one major challenge and one major threat that can negatively impact the …
Feature Selection On Permissions, Intents And Apis For Android Malware Detection, Fred Guyton
CCE Theses and Dissertations
Malicious applications pose an enormous security threat to mobile computing devices. Currently 85% of all smartphones run Android, Google’s open-source operating system, making that platform the primary threat vector for malware attacks. Android is a platform that hosts roughly 99% of known malware to date, and is the focus of most research efforts in mobile malware detection due to its open source nature. One of the main tools used in this effort is supervised machine learning. While a decade of work has made a lot of progress in detection accuracy, there is an obstacle that each stream of research is …
Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi
Theses and Dissertations (Comprehensive)
This thesis addresses feature selection (FS) problems. FS is a primary stage in data mining: a significant pre-processing stage that improves computation cost and accuracy and offers a better comprehension of stored data by removing unnecessary and irrelevant features from the basic dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only be used to solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing FS problems. …
Automation Of Feature Selection And Generation Of Optimal Feature Subsets For Beehive Audio Sample Classification, Aditya Bhouraskar
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
The last couple of decades have witnessed an abnormal reduction in the bee population. This is a serious matter of concern, as three out of four crops available globally have the honey bee as their sole pollinator, and the decline causes significant economic losses and an imbalance in the ecosystem. There have been many theories about the cause of bee colony collapses, such as parasites, pesticides, and poor nutrition; however, conclusive evidence is yet to be identified.
Human inspection of beehives requires precision. It takes an experienced beekeeper to determine the health of a hive by the sounds generated …
Feature Selection And Data Reconstruction Via Robust And Flexible Learning Models, Di Ming
Computer Science and Engineering Dissertations
Feature selection and data reconstruction are very important topics in machine learning. In today's big data environment, much data is high-dimensional and comes with noise, corruption, etc. Thus, we develop robust and flexible learning models so as to select the relevant features from the high-dimensional data spaces and reconstruct the original clean data from the corrupted input data more efficiently and more effectively. To resolve the inflexibility of the widely used class-shared feature selection methods such as L21-norm, we derive LASSO from probabilistic selection on ridge regression which provides an independent point of view from the usual …
Sparsity And Weak Supervision In Quantum Machine Learning, Seyran Saeedi
Theses and Dissertations
Quantum computing is an interdisciplinary field at the intersection of computer science, mathematics, and physics that studies information processing tasks on a quantum computer. A quantum computer is a device whose operations are governed by the laws of quantum mechanics. As building quantum computers is nearing the era of commercialization and quantum supremacy, it is essential to think of potential applications that we might benefit from. Among many applications of quantum computation, one of the emerging fields is quantum machine learning. We focus on predictive models for binary classification and variants of Support Vector Machines that we expect to be …
Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper
All Master's Theses
Tuberculosis (TB) is a respiratory disease which affects millions of people each year, accounting for the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for the diagnosis of TB; however, a trained radiologist is required to interpret the results, and is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning based solution to automate the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …
Sensor - Based Human Activity Recognition Using Smartphones, Mustafa Badshah
Master's Projects
Providing precise information about the activity performed by a human and finding patterns in their behavior is a significant technical and computational task. With advancements in human activity recognition (HAR) systems, countless applications can be built and various problems in the domains of virtual reality, health and medicine, entertainment, and security can be solved. HAR has been an active field of research for more than a decade, but certain aspects need to be addressed to improve the system and revolutionize the way humans interact with smartphones. This research provides a holistic view of human activity recognition system architecture and discusses …
Streaming Feature Grouping And Selection (Sfgs) For Big Data Classification, Noura Helal Hamad Al Nuaimi
Dissertations
Real-time data has always been an essential element for organizations when the quickness of data delivery is critical to their businesses. Today, organizations understand the importance of real-time data analysis to maintain benefits from their generated data. Real-time data analysis is also known as real-time analytics, streaming analytics, real-time streaming analytics, and event processing. Stream processing is the key to getting results in real-time. It allows us to process the data stream in real-time as it arrives. The concept of streaming data means the data are generated dynamically, and the full stream is unknown or even infinite. This data becomes …
Distributed Multi-Label Learning On Apache Spark, Jorge Gonzalez Lopez
Theses and Dissertations
This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes …
Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis
Electronic Theses and Dissertations
Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …
Feature Set Selection For Improved Classification Of Static Analysis Alerts, Kathleen Goeschel
CCE Theses and Dissertations
With the extreme growth in third party cloud applications, increased exposure of applications to the internet, and the impact of successful breaches, improving the security of software being produced is imperative. Static analysis tools can alert to quality and security vulnerabilities of an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This problem may lead to the loss of confidence in the scanning tools, possibly resulting in the tools not being used. The discontinued use of these tools may increase the likelihood of insecure software being released into production. Insecure software …
Improving K-Nn Search And Subspace Clustering Based On Local Intrinsic Dimensionality, Arwa M. Wali
Dissertations
In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. The high-dimensional data is always a challenge for state-of-the-art algorithms, because of the so-called "curse of dimensionality". As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms, such as similarity search and clustering, that depend on them lose their effectiveness. One way to handle this challenge is by selecting the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering …
The Impact Of Cost On Feature Selection For Classifiers, Richard Clyde Mccrae
CCE Theses and Dissertations
Supervised machine learning models are increasingly being used for medical diagnosis. The diagnostic problem is formulated as a binary classification task in which trained classifiers make predictions based on a set of input features. In diagnosis, these features are typically procedures or tests with associated costs. The cost of applying a trained classifier for diagnosis may be estimated as the total cost of obtaining values for the features that serve as inputs for the classifier. Obtaining classifiers based on a low cost set of input features with acceptable classification accuracy is of interest to practitioners and researchers. What makes this …
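The cost notion in the abstract above (the cost of applying a classifier is the total cost of obtaining its input features) can be sketched in a few lines. The feature names, costs, accuracies, and the helper `cheapest_acceptable` are hypothetical and only illustrate the trade-off; they are not results from the thesis.

```python
def classifier_cost(selected_features, feature_costs):
    """Total cost of obtaining values for every feature the classifier needs."""
    return sum(feature_costs[name] for name in selected_features)


def cheapest_acceptable(candidates, feature_costs, min_accuracy):
    """Return the lowest-cost feature subset whose accuracy meets the threshold.

    `candidates` is a list of (feature_tuple, accuracy) pairs; returns None
    when no candidate is accurate enough.
    """
    acceptable = [feats for feats, accuracy in candidates if accuracy >= min_accuracy]
    if not acceptable:
        return None
    return min(acceptable, key=lambda feats: classifier_cost(feats, feature_costs))


# Hypothetical test costs and hypothetical accuracies of three trained classifiers.
feature_costs = {"blood_test": 40.0, "ecg": 75.0, "mri": 900.0}
candidates = [
    (("blood_test",), 0.71),
    (("blood_test", "ecg"), 0.84),
    (("blood_test", "ecg", "mri"), 0.86),
]
choice = cheapest_acceptable(candidates, feature_costs, min_accuracy=0.80)
```

In this toy setting the expensive third feature buys only a small accuracy gain, so the two-feature subset is preferred once the accuracy threshold is met.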
Feature Selection From Large Acoustic Feature Sets In Computational Paralinguistics, Dara Pir
Dissertations, Theses, and Capstone Projects
The burgeoning field of computational paralinguistics deals with the ways in which spoken words are uttered and attempts to recognize the states and traits of the speakers. Many areas of current scientific research, including computational paralinguistics, have started to employ datasets with an ever-increasing number of features. Using large feature sets has helped improve recognition performance. However, processing these large sets has given rise to various problems. Feature selection methods, which reduce the dimensionality of the original feature sets by removing irrelevant and/or redundant features, could be used to address these problems.
The two main methods for feature selection are …
Using Machine Learning To Predict Chemotherapy Response In Cell Lines And Patients Based On Genetic Expression, Dimo Angelov
Electronic Thesis and Dissertation Repository
The goal of this thesis was to examine different machine learning techniques for predicting chemotherapy response in cell lines and patients based on genetic expression. After trying regression, multi-class classification, and binary classification techniques, it was concluded that binary classification was the best method for training models due to the limited size of available cell line data. We found that support vector machine classifiers trained on cell line data were easier to use and produced better results compared to neural networks. Sequential backward feature selection was able to select genes for the models that produced good results; however, the greedy algorithm …
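Sequential backward feature selection, mentioned in the abstract above, is a greedy wrapper method: start from all features and repeatedly drop the one whose removal hurts the score least. The sketch below is a generic version under stated assumptions; the `toy_score` function stands in for a cross-validated model score, and the gene names are hypothetical.

```python
def backward_select(features, score, min_features=1):
    """Greedy sequential backward selection.

    Repeatedly removes the single feature whose removal yields the highest
    score, stopping when every removal would lower the score or when only
    `min_features` remain. Returns (selected_features, best_score).
    """
    current = list(features)
    best = score(current)
    while len(current) > min_features:
        trials = [(score([f for f in current if f != drop]), drop) for drop in current]
        trial_score, drop = max(trials, key=lambda t: t[0])
        if trial_score < best:
            break  # greedy stop: no single removal helps
        best = trial_score
        current.remove(drop)
    return current, best


# Hypothetical stand-in for a model score: reward informative genes,
# penalise subset size. A real wrapper would train and validate a model here.
INFORMATIVE = {"g1", "g3"}


def toy_score(subset):
    return 10 * len(INFORMATIVE & set(subset)) - len(subset)


selected, final_score = backward_select(["g1", "g2", "g3", "g4"], toy_score)
```

Because each step only considers single removals, the procedure can miss subsets that require dropping several features together, which is the usual caveat with greedy selection.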
Some Issues In Unsupervised Feature Selection Using Similarity., Partha Pratim Kundu Dr.
Doctoral Theses
Pattern recognition is what humans do most of the time, without any conscious effort, and fortunately excel in. Information is received through various sensory organs, processed simultaneously in the brain, and its source is instantaneously identified without any perceptible effort. The interesting issue is that recognition occurs even under non-ideal conditions, i.e., when information is vague, imprecise or incomplete. In reality, most human activities depend on the success in performing various pattern recognition tasks. Let us consider an example. Before boarding a train or bus, we first select the appropriate one by identifying either the route number or its destination …
On Supervised And Unsupervised Methodologies For Mining Of Text Data., Tanmay Basu Dr.
Doctoral Theses
The supervised and unsupervised methodologies of text mining using the plain text data of English language have been discussed. Some new supervised and unsupervised methodologies have been developed for effective mining of the text data after successfully overcoming some limitations of the existing techniques. The problems of unsupervised techniques of text mining, i.e., document clustering methods are addressed. A new similarity measure between documents has been designed to improve the accuracy of measuring the content similarity between documents. Further, a hierarchical document clustering technique is designed using this similarity measure. The main significance of the clustering algorithm is that the number …
Local Selection Of Features And Its Applications To Image Search And Annotation, Jichao Sun
Dissertations
In multimedia applications, direct representations of data objects typically involve hundreds or thousands of features. Given a query object, the similarity between the query object and a database object can be computed as the distance between their feature vectors. The neighborhood of the query object consists of those database objects that are close to the query object. The semantic quality of the neighborhood, which can be measured as the proportion of neighboring objects that share the same class label as the query object, is crucial for many applications, such as content-based image retrieval and automated image annotation. However, due to …
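The neighborhood quality measure described in the abstract above (the proportion of neighboring objects that share the query's class label) can be sketched as follows. Euclidean distance and a fixed neighborhood size `k` are assumptions made for the example, and the toy database is hypothetical.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_quality(query_vec, query_label, database, k):
    """Fraction of the k nearest database objects sharing the query's label.

    `database` is a non-empty list of (feature_vector, label) pairs.
    """
    nearest = sorted(database, key=lambda item: euclidean(query_vec, item[0]))[:k]
    same = sum(1 for _, label in nearest if label == query_label)
    return same / len(nearest)


# Hypothetical two-dimensional feature vectors with class labels.
database = [
    ((0.0, 0.0), "cat"),
    ((1.0, 0.0), "cat"),
    ((0.0, 2.0), "dog"),
    ((5.0, 5.0), "dog"),
]
quality_k3 = neighborhood_quality((0.0, 0.5), "cat", database, k=3)
quality_k2 = neighborhood_quality((0.0, 0.5), "cat", database, k=2)
```

Restricting the distance computation to a well-chosen subset of features changes which objects count as neighbors, and this measure is one way to compare such choices.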
Predictive Analytics For Disease Condition Of Patients In Emergency Department, Azade Tabaie
Wayne State University Theses
Emergency Departments (EDs) in hospitals are experiencing severe crowding and prolonged patient waiting times. The reported crowding in hospitals shows patients in hospital hallways, long waiting times and full occupancy of ED beds. ED crowding has several potential unfavorable effects, including patient and staff frustration, lower patient satisfaction, and poor health outcomes. The primary motivations behind this study are shortening the patients’ waiting time and improving patient satisfaction and level of care.
The very initial interaction between clinicians and a patient is recorded in nurse triage notes, which contain details of the reason for the patient’s visit, including specific symptoms and …
Use Of Entropy For Feature Selection With Intrusion Detection System Parameters, Frank Acker
CCE Theses and Dissertations
The metric of entropy provides a measure of the randomness of data and a measure of the information gained by comparing different attributes. Intrusion detection systems can collect very large amounts of data, which are not necessarily manageable by manual means. Collected intrusion detection data often contains redundant, duplicate, and irrelevant entries, which makes analysis computationally intensive and likely leads to unreliable results. Reducing the data to what is relevant and pertinent to the analysis requires the use of data mining techniques and statistics. Identifying patterns in the data is part of analysis for intrusion detections in which the patterns are categorized …
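The two quantities named in the abstract above, entropy as a measure of randomness and the information gained from an attribute, can be sketched with Shannon entropy and the standard information gain formula. This is a generic illustration rather than the procedure used in the thesis; the attribute names and records are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(attribute, labels):
    """Reduction in label entropy after splitting the records by `attribute`."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical intrusion detection records: one informative attribute and
# one attribute that is unrelated to the label.
labels = ["attack", "attack", "normal", "normal"]
protocol = ["udp", "udp", "tcp", "tcp"]
flag = ["a", "b", "a", "b"]
gain_protocol = information_gain(protocol, labels)
gain_flag = information_gain(flag, labels)
```

Attributes with a gain near zero, like `flag` here, are candidates for removal before further analysis.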