Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 18 of 18

Full-Text Articles in Physical Sciences and Mathematics

Automated Machine Learning: Intelligent Binning Data Preparation And Regularized Regression Classifier, Jianbin Zhu Jan 2023

Electronic Theses and Dissertations, 2020-

Automated machine learning (AutoML), the process of automating the complete pipeline from the raw dataset to the development of a machine learning model, has become a new trend. It not only relieves data scientists' workload but also allows non-experts to complete the work without solid knowledge and understanding of statistical inference and machine learning. One limitation of AutoML frameworks is that data quality differs significantly from batch to batch. Consequently, fitted model quality for some batches of data can be very poor due to distribution shift in some numerical predictors. In this dissertation, we develop an intelligent binning to …


Graph Neural Networks For Improved Interpretability And Efficiency, Patrick Pho Jan 2022

Electronic Theses and Dissertations, 2020-

Attributed graphs are a powerful tool for modeling real-life systems that exist in many domains, such as social science, biology, and e-commerce. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph-structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms or traditional network science tools have limitation …


Change Point Detection For Streaming Data Using Support Vector Methods, Charles Harrison Jan 2022

Electronic Theses and Dissertations, 2020-

Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus concerns Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound …


An Evaluation Of The Performance Of PROC ARIMA's Identify Statement: A Data-Driven Approach Using COVID-19 Cases And Deaths In Florida, Fahmida Akter Shahela Jan 2021

Electronic Theses and Dissertations, 2020-

Understanding data on the novel coronavirus (COVID-19) pandemic and modeling such data over time are crucial for decision making in managing, fighting, and controlling the spread of this emerging disease. This thesis looks at some aspects of exploratory analysis and modeling of COVID-19 data obtained from the Florida Department of Health (FDOH). In particular, the present work is devoted to data collection, preparation, description, and modeling of COVID-19 cases and deaths reported by FDOH between March 12, 2020, and April 30, 2021. For modeling data on both cases and deaths, this thesis utilized an autoregressive integrated moving average (ARIMA) times …


Time Series Forecasting And Analysis: A Study Of American Clothing Retail Sales Data, Weijun Huang Jan 2019

Honors Undergraduate Theses

This paper addresses the effect of time on clothing retail sales from 2010 to May 2019. The data were retrieved from the US Census, where N=113 observations were used and plotted to observe their trends. Once outliers were addressed and transformations were performed, the best model was fit and a diagnostic review was carried out. Inspections for seasonality and forecasting were also conducted. The final model was an ARIMA(2,0,1). Slight seasonality was present, but not enough to drastically influence the trends. Our results serve to highlight the economic growth of clothing retail sales for the past …


Systematic Review And Meta-Analysis: Tuberculosis, TNFα Inhibitors, And Crohn's Disease, Brent L. Cao Jan 2018

Honors Undergraduate Theses

Inflammation is often a protective reaction against harmful foreign agents. However, in many disease conditions, the mechanisms behind the inflammatory response are poorly understood. Oftentimes, the inflammation causes adverse effects, such as joint pain, abdominal pain, fever, fatigue, and loss of appetite. Thus, many treatments aim to inhibit the inflammatory response in order to control adverse symptoms. Such treatments include TNFα inhibitors. However, a major risk associated with drugs inhibiting tumor necrosis factor alpha (TNFα) is serious infection, including tuberculosis (TB).

Anti-TNFα therapy is used to treat patients with Crohn’s disease, for which the risk of tuberculosis may be …


Psychometric Properties Of A Working Memory Span Task, Juan M. Alzate Vanegas Jan 2018

Honors Undergraduate Theses

The intent of this thesis is to examine the psychometric properties of a complex span task (CST) developed to measure working memory capacity (WMC) using measurements obtained from a sample of 68 undergraduate students at the University of Central Florida. The Grocery List Task (GLT) promises several design improvements over traditional CSTs in a prior study about individual differences in WMC and distraction effects on driving performance, and it offers potential benefits for studying WMC as well as the serial-position effect. Currently, the working memory system is composed of domain-general memorial storage processes and information-processing, which involves the use of …


To Hydrate Or Chlorinate: A Regression Analysis Of The Levels Of Chlorine In The Public Water Supply, Drew A. Doyle Dec 2015

HIM 1990-2015

Public water supplies contain disease-causing microorganisms in the water or distribution ducts. In order to kill off these pathogens, a disinfectant, such as chlorine, is added to the water. Chlorine is the most widely used disinfectant in all U.S. water treatment facilities. Chlorine is known to be one of the most powerful disinfectants to restrict harmful pathogens from reaching the consumer. In the interest of obtaining a better understanding of what variables affect the levels of chlorine in the water, this thesis will analyze a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water …


A Simulation-Based Task Analysis Using Agent-Based, Discrete Event And System Dynamics Simulation, Anastasia Angelopoulou Jan 2015

Electronic Theses and Dissertations

Recent advances in technology have increased the need for using simulation models to analyze tasks and obtain human performance data. A variety of task analysis approaches and tools have been proposed and developed over the years. Over 100 task analysis methods have been reported in the literature. However, most of the developed methods and tools allow for representation of the static aspects of the tasks performed by expert system-driven human operators, neglecting aspects of the work environment, i.e. physical layout, and dynamic aspects of the task. The use of simulation can help face the new challenges in the field of …


Mahalanobis Kernel-Based Support Vector Data Description For Detection Of Large Shifts In Mean Vector, Vu Nguyen Jan 2015

Electronic Theses and Dissertations

Statistical process control (SPC) applies the science of statistics to the control of various processes in order to provide higher-quality products and better services. The K chart is one of the many important tools that SPC offers. Creation of the K chart is based on Support Vector Data Description (SVDD), a popular data classification method inspired by the Support Vector Machine (SVM). Like any method associated with SVM, SVDD benefits from a wide variety of kernel choices, which determine the effectiveness of the whole model. Among the most popular choices is the Euclidean distance-based Gaussian kernel, which enables SVDD to obtain a …


Statistical Analysis Of Depression And Social Support Change In Arab Immigrant Women In USA, Hazhar Blbas Jan 2014

Electronic Theses and Dissertations

Arab Muslim immigrant women encounter many stressors and are at risk for depression. Social supports from husbands, family, and friends are generally considered mitigating resources for depression. However, changes in social support over time and the effects of such supports on depression at a future time period have not been fully addressed in the literature. This thesis investigated the relationship between demographic characteristics, changes in social support, and depression in Arab Muslim immigrant women to the USA. A sample of 454 married Arab Muslim immigrant women provided demographic data, scores on social support variables and depression at three time periods …


How Many Are Out There? A Novel Approach For Open And Closed Systems, Zia Rehman Jan 2014

Electronic Theses and Dissertations

We propose a ratio estimator to determine population estimates using capture-recapture sampling. It differs from traditional approaches in the following ways: (1) Ordering of recaptures: Currently, data sets do not take into account the "ordering" of the recaptures, although this crucial information is available to them at no cost. (2) Dependence of trials and cluster sampling: Our model explicitly considers trials to be dependent and improves on existing literature, which assumes independence. (3) Rate of convergence: The percentage sampled has an inverse relationship with population size, for a chosen degree of accuracy. (4) Asymptotic Attainment of Minimum Variance (Open Systems: (=population …


Sparse Ridge Fusion For Linear Regression, Nozad Mahmood Jan 2013

Electronic Theses and Dissertations

In linear regression, the traditional technique deals with the case where the number of observations n is greater than the number of predictor variables p (n > p). In the case n < p, the classical method fails to estimate the coefficients. A solution to this problem in the case of correlated predictors is provided in this thesis. A new regularization and variable selection method is proposed under the name Sparse Ridge Fusion (SRF). In the case of highly correlated predictors, the simulated examples and a real data set show that the SRF always outperforms the lasso, elastic net, and the S-Lasso, and the results show that the SRF can select more predictor variables than the sample size n, whereas the lasso can select at most n variables.
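
As a small illustration of why the classical method breaks down when n < p, the sketch below builds a toy design matrix with n = 2 observations and p = 3 predictors and checks that the Gram matrix X^T X is singular, so the least-squares normal equations have no unique solution. Adding a ridge penalty (lambda times the identity) makes the matrix invertible. This shows plain ridge regularization only, not the thesis's SRF method; the data, the helper names (`gram`, `det3`, `add_ridge`), and the penalty value 0.5 are all assumptions made for the example.

```python
def gram(X):
    """Return X^T X for a matrix X stored as a list of rows."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)]
            for i in range(p)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def add_ridge(M, lam):
    """Return M + lam * I (the ridge-penalized Gram matrix)."""
    size = len(M)
    return [[M[r][c] + (lam if r == c else 0.0) for c in range(size)]
            for r in range(size)]


# Hypothetical toy data: n = 2 observations (rows), p = 3 predictors (columns).
X = [[1.0, 2.0, 3.0],
     [2.0, 1.0, 0.0]]

G = gram(X)                            # 3x3 matrix of rank at most 2
det_ols = det3(G)                      # zero: normal equations are not uniquely solvable
det_ridge = det3(add_ridge(G, 0.5))    # positive: the penalized system is invertible

print("det(X^T X)         =", det_ols)
print("det(X^T X + 0.5 I) =", det_ridge)
```

Because X has only two rows, X^T X can have rank at most 2, which is why its 3x3 determinant vanishes regardless of the particular values chosen.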


An Analysis Of The Relationship Between Economic Development And Demographic Characteristics In The United States, Chad M. Heyne May 2011

HIM 1990-2015

Over the past several decades there has been extensive research done in an attempt to determine what demographic characteristics affect economic growth, measured in GDP per capita. Understanding what influences the growth of a country will vastly help policy makers enact policies to lead the country in a positive direction. This research focuses on isolating a new variable, women in the work force. As well as isolating a new variable, this research will modify a preexisting variable that was shown to be significant in order to make the variable more robust and sensitive to recessions. The intent of this thesis …


Data Mining Methods For Malware Detection, Muazzam Siddiqui Jan 2008

Electronic Theses and Dissertations

This research investigates the use of data mining methods for malware (malicious program) detection and proposes a framework as an alternative to the traditional signature detection methods. The traditional approaches, which use signatures to detect malicious programs, fail for new and unknown malware, where signatures are not available. We present a data mining framework to detect malicious programs. We collected, analyzed, and processed several thousand malicious and clean programs to find the best features and build models that can classify a given program as malware or clean. Our research is closely related to information retrieval …


Modeling And Characterizations Of New Notions In Life Testing With Statistical Applications, Mohammad Sepehrifar Jan 2006

Electronic Theses and Dissertations

Knowing the class to which a life distribution belongs gives us an idea about the aging of the device or system the life distribution represents, and enables us to compare the aging properties of different systems. This research intends to establish several new nonparametric classes of life distributions defined by the concept of inactivity time of a unit with a guaranteed minimum life length. These classes play an important role in the study of reliability theory, survival analysis, maintenance policies, economics, actuarial sciences and many other applied areas.


Session-Based Intrusion Detection System To Map Anomalous Network Traffic, Bruce Caulkins Jan 2005

Electronic Theses and Dissertations

Computer crime is a large problem (CSI, 2004; Kabay, 2001a; Kabay, 2001b). Security managers have a variety of tools at their disposal to combat computer crime -- firewalls, Intrusion Detection Systems (IDSs), encryption, authentication, and other hardware and software solutions. Many IDS variants exist which allow security managers and engineers to identify attack network packets primarily through the use of signature detection; i.e., the IDS recognizes attack packets due to their well-known "fingerprints" or signatures as those packets cross the network's gateway threshold. On the other hand, anomaly-based ID systems determine what is normal traffic within a network and report …


A Subset Selection Rule For Three Normal Populations, Bert Culpepper Jul 1982

Retrospective Theses and Dissertations

No abstract provided.