Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Automated Machine Learning: Intelligent Binning Data Preparation And Regularized Regression Classifier, Jianbin Zhu Jan 2023

Electronic Theses and Dissertations, 2020-

Automated machine learning (AutoML), the process of automating the complete pipeline from the raw dataset to the development of a machine learning model, has become a new trend. It not only relieves data scientists' workload but also allows non-experts to complete the work without solid knowledge and understanding of statistical inference and machine learning. One limitation of AutoML frameworks is that data quality differs significantly from batch to batch. Consequently, fitted model quality for some batches of data can be very poor due to distribution shift in some numerical predictors. In this dissertation, we develop an intelligent binning to …


Graph Neural Networks For Improved Interpretability And Efficiency, Patrick Pho Jan 2022

Electronic Theses and Dissertations, 2020-

An attributed graph is a powerful tool for modeling real-life systems in many domains, such as social science, biology, and e-commerce. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms or traditional network science tools have limitation …


Change Point Detection For Streaming Data Using Support Vector Methods, Charles Harrison Jan 2022

Electronic Theses and Dissertations, 2020-

Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus concerns Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound …


An Evaluation Of The Performance Of Proc Arima's Identify Statement: A Data-Driven Approach Using Covid-19 Cases And Deaths In Florida, Fahmida Akter Shahela Jan 2021

Electronic Theses and Dissertations, 2020-

Understanding data on the novel coronavirus (COVID-19) pandemic and modeling such data over time are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis looks at some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the present work is devoted to data collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling data on both cases and deaths, this thesis utilized an autoregressive integrated moving average (ARIMA) times …


Time Series Forecasting And Analysis: A Study Of American Clothing Retail Sales Data, Weijun Huang Jan 2019

Honors Undergraduate Theses

This paper addresses the effect of time on clothing retail sales from 2010 to May 2019. The data were retrieved from the US Census; N=113 observations were used and plotted to observe their trends. Once outliers were addressed and transformations were performed, the best model was fit and a diagnostic review was carried out. Inspections for seasonality and forecasting were also conducted. The final model was an ARIMA(2,0,1). Slight seasonality was present, but not enough to drastically influence the trends. Our results serve to highlight the economic growth of clothing retail sales for the past …


A Simulation-Based Task Analysis Using Agent-Based, Discrete Event And System Dynamics Simulation, Anastasia Angelopoulou Jan 2015

Electronic Theses and Dissertations

Recent advances in technology have increased the need for using simulation models to analyze tasks and obtain human performance data. A variety of task analysis approaches and tools have been proposed and developed over the years. Over 100 task analysis methods have been reported in the literature. However, most of the developed methods and tools allow for representation of the static aspects of the tasks performed by expert system-driven human operators, neglecting aspects of the work environment, i.e. physical layout, and dynamic aspects of the task. The use of simulation can help face the new challenges in the field of …


Data Mining Methods For Malware Detection, Muazzam Siddiqui Jan 2008

Electronic Theses and Dissertations

This research investigates the use of data mining methods for malware (malicious program) detection and proposes a framework as an alternative to traditional signature detection methods. Traditional approaches that use signatures to detect malicious programs fail for new and unknown malware, where signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed, and processed several thousand malicious and clean programs to find the best features and build models that can classify a given program as malware or clean. Our research is closely related to information retrieval …


Session-Based Intrusion Detection System To Map Anomalous Network Traffic, Bruce Caulkins Jan 2005

Electronic Theses and Dissertations

Computer crime is a large problem (CSI, 2004; Kabay, 2001a; Kabay, 2001b). Security managers have a variety of tools at their disposal to combat computer crime: firewalls, Intrusion Detection Systems (IDSs), encryption, authentication, and other hardware and software solutions. Many IDS variants exist which allow security managers and engineers to identify attack network packets primarily through the use of signature detection; i.e., the IDS recognizes attack packets due to their well-known "fingerprints" or signatures as those packets cross the network's gateway threshold. On the other hand, anomaly-based ID systems determine what is normal traffic within a network and reports …