Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

PDF

Machine learning

Information Security

Theses/Dissertations

Articles 1 - 30 of 34

Full-Text Articles in Physical Sciences and Mathematics

Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna Nov 2023

Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna

Doctoral Dissertations

Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to …


Quantifying And Enhancing The Security Of Federated Learning, Virat Vishnu Shejwalkar Nov 2023

Quantifying And Enhancing The Security Of Federated Learning, Virat Vishnu Shejwalkar

Doctoral Dissertations

Federated learning is an emerging distributed learning paradigm that allows multiple users to collaboratively train a joint machine learning model without having to share their private data with any third party. Due to many of its attractive properties, federated learning has received significant attention from academia as well as industry and now powers major applications, e.g., Google's Gboard and Assistant, Apple's Siri, Owkin's health diagnostics, etc. However, federated learning is yet to see widespread adoption due to a number of challenges. One such challenge is its susceptibility to poisoning by malicious users who aim to manipulate the joint machine learning …


Cyber Attack Surface Mapping For Offensive Security Testing, Douglas Everson Aug 2023

Cyber Attack Surface Mapping For Offensive Security Testing, Douglas Everson

All Dissertations

Security testing consists of automated processes, like Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST), as well as manual offensive security testing, like Penetration Testing and Red Teaming. This nonautomated testing is frequently time-constrained and difficult to scale. Previous literature suggests that most research is spent in support of improving fully automated processes or in finding specific vulnerabilities, with little time spent improving the interpretation of the scanned attack surface critical to nonautomated testing. In this work, agglomerative hierarchical clustering is used to compress the Internet-facing hosts of 13 representative companies as collected by the Shodan search …


On Phishing: Proposing A Host-Based Multi-Layer Passive/Active Anti-Phishing Approach Combating Counterfeit Websites, Wesam Harbi Fadheel Aug 2023

On Phishing: Proposing A Host-Based Multi-Layer Passive/Active Anti-Phishing Approach Combating Counterfeit Websites, Wesam Harbi Fadheel

Dissertations

Phishing is the starting point of most cyberattacks, mainly categorized as Email, Websites, Social Networks, Phone calls (Vishing), and SMS messaging (Smishing). Phishing refers to an attempt to collect sensitive data, typically in the form of usernames, passwords, credit card numbers, bank account information, etc., or other crucial facts, intending to use or sell the information obtained. Similar to how a fisherman uses bait to catch a fish, an attacker will pose as a trustworthy source to attract and deceive the victim.

This study explores the efficacy of host-side APT (Anti-Phishing Techniques) based onWebsite features like Lexical, Host-Based, or Content-Based …


Unlocking User Identity: A Study On Mouse Dynamics In Dual Gaming Environments For Continuous Authentication, Marcho Setiawan Handoko Jan 2023

Unlocking User Identity: A Study On Mouse Dynamics In Dual Gaming Environments For Continuous Authentication, Marcho Setiawan Handoko

All Graduate Theses, Dissertations, and Other Capstone Projects

With the surge in information management technology reliance and the looming presence of cyber threats, user authentication has become paramount in computer security. Traditional static or one-time authentication has its limitations, prompting the emergence of continuous authentication as a frontline approach for enhanced security. Continuous authentication taps into behavior-based metrics for ongoing user identity validation, predominantly utilizing machine learning techniques to continually model user behaviors. This study elucidates the potential of mouse movement dynamics as a key metric for continuous authentication. By examining mouse movement patterns across two contrasting gaming scenarios - the high-intensity "Team Fortress" and the low-intensity strategic …


Divide-And-Conquer Distributed Learning: Privacy-Preserving Offloading Of Neural Network Computations, Lewis C.L. Brown Dec 2022

Divide-And-Conquer Distributed Learning: Privacy-Preserving Offloading Of Neural Network Computations, Lewis C.L. Brown

Graduate Theses and Dissertations

Machine learning has become a highly utilized technology to perform decision making on high dimensional data. As dataset sizes have become increasingly large so too have the neural networks to learn the complex patterns hidden within. This expansion has continued to the degree that it may be infeasible to train a model from a singular device due to computational or memory limitations of underlying hardware. Purpose built computing clusters for training large models are commonplace while access to networks of heterogeneous devices is still typically more accessible. In addition, with the rise of 5G networks, computation at the edge becoming …


Dataset Evaluation For Data Trading Using Expected Loss And Homomorphic Encryption, Minsung Joo May 2022

Dataset Evaluation For Data Trading Using Expected Loss And Homomorphic Encryption, Minsung Joo

Senior Honors Papers / Undergraduate Theses

Supervised machine learning suffers from the ``garbage-in garbage-out" phenomenon where the performance of a model is limited by the quality of the data. While a myriad of data is collected every second, there is no general rigorous method of evaluating the quality of a given dataset. This hinders fair pricing of data in scenarios where a buyer may look to buy data for use with machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation …


A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo Jan 2022

A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo

Theses, Dissertations and Capstones

Cyberattack is a never-ending war that has greatly threatened secured information systems. The development of automated and intelligent systems provides more computing power to hackers to steal information, destroy data or system resources, and has raised global security issues. Statistical and Data mining tools have received continuous research and improvements. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, the advancement in technology and accessibility of information makes more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …


Faking Sensor Noise Information, Justin Chang Jan 2022

Faking Sensor Noise Information, Justin Chang

Master's Projects

Noise residue detection in digital images has recently been used as a method to classify images based on source camera model type. The meteoric rise in the popularity of using Neural Network models has also been used in conjunction with the concept of noise residuals to classify source camera models. However, many papers gloss over the details on the methods of obtaining noise residuals and instead rely on the self- learning aspect of deep neural networks to implicitly discover this themselves. For this project I propose a method of obtaining noise residuals (“noiseprints”) and denoising an image, as well as …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Achieving Differential Privacy And Fairness In Machine Learning, Depeng Xu May 2021

Achieving Differential Privacy And Fairness In Machine Learning, Depeng Xu

Graduate Theses and Dissertations

Machine learning algorithms are used to make decisions in various applications, such as recruiting, lending and policing. These algorithms rely on large amounts of sensitive individual information to work properly. Hence, there are sociological concerns about machine learning algorithms on matters like privacy and fairness. Currently, many studies only focus on protecting individual privacy or ensuring fairness of algorithms separately without taking consideration of their connection. However, there are new challenges arising in privacy preserving and fairness-aware machine learning. On one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in …


A Methodology For Detecting Credit Card Fraud, Kayode Ayorinde Jan 2021

A Methodology For Detecting Credit Card Fraud, Kayode Ayorinde

All Graduate Theses, Dissertations, and Other Capstone Projects

Fraud detection has appertained to many industries such as banking, retails, financial services, healthcare, etc. As we know, fraud detection is a set of campaigns undertaken to avert the acquisition of illegal means to obtain money or property under false pretense. With an unlimited and growing number of ways fraudsters commit fraud crimes, detecting online fraud was so tricky to achieve. This research work aims to examine feasible ways to identify credit card fraudulent activities that negatively impact financial institutes. In the United States, an average of U.S consumers lost a median of $429 from credit card fraud in 2017, …


Improving A Wireless Localization System Via Machine Learning Techniques And Security Protocols, Zachary Yorio Dec 2020

Improving A Wireless Localization System Via Machine Learning Techniques And Security Protocols, Zachary Yorio

Masters Theses, 2020-current

The recent advancements made in Internet of Things (IoT) devices have brought forth new opportunities for technologies and systems to be integrated into our everyday life. In this work, we investigate how edge nodes can effectively utilize 802.11 wireless beacon frames being broadcast from pre-existing access points in a building to achieve room-level localization. We explain the needed hardware and software for this system and demonstrate a proof of concept with experimental data analysis. Improvements to localization accuracy are shown via machine learning by implementing the random forest algorithm. Using this algorithm, historical data can train the model and make …


The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung Jul 2020

The Limits Of Location Privacy In Mobile Devices, Keen Yuun Sung

Doctoral Dissertations

Mobile phones are widely adopted by users across the world today. However, the privacy implications of persistent connectivity are not well understood. This dissertation focuses on one important concern of mobile phone users: location privacy. I approach this problem from the perspective of three adversaries that users are exposed to via smartphone apps: the mobile advertiser, the app developer, and the cellular service provider. First, I quantify the proportion of mobile users who use location permissive apps and are able to be tracked through their advertising identifier, and demonstrate a mark and recapture attack that allows continued tracking of users …


Superb: Superior Behavior-Based Anomaly Detection Defining Authorized Users' Traffic Patterns, Daniel Karasek May 2020

Superb: Superior Behavior-Based Anomaly Detection Defining Authorized Users' Traffic Patterns, Daniel Karasek

Master of Science in Computer Science Theses

Network anomalies are correlated to activities that deviate from regular behavior patterns in a network, and they are undetectable until their actions are defined as malicious. Current work in network anomaly detection includes network-based and host-based intrusion detection systems. However, network anomaly detection schemes can suffer from high false detection rates due to the base rate fallacy. When the detection rate is less than the false positive rate, which is found in network anomaly detection schemes working with live data, a high false detection rate can occur. To overcome such a drawback, this paper proposes a superior behavior-based anomaly detection …


Data Mining Of Chinese Social Networks: Factors That Indicate Post Deletion, Meisam Navaki Arefi Mar 2020

Data Mining Of Chinese Social Networks: Factors That Indicate Post Deletion, Meisam Navaki Arefi

Computer Science ETDs

Widespread Chinese social media applications such as Sina Weibo (Chinese Twitter), the most popular social network in China, are widely known for monitoring and deleting posts to conform to Chinese government requirements. Censorship of Chinese social media is a complex process that involves many factors. There are multiple stakeholders and many different interests: economic, political, legal, personal, etc., which means that there is not a single strategy dictated by a single government authority. Moreover, sometimes Chinese social media do not follow the directives of government, out of concern that they are more strictly censoring than their competitors.

One crucial question …


Countering Cybersecurity Vulnerabilities In The Power System, Fengli Zhang Dec 2019

Countering Cybersecurity Vulnerabilities In The Power System, Fengli Zhang

Graduate Theses and Dissertations

Security vulnerabilities in software pose an important threat to power grid security, which can be exploited by attackers if not properly addressed. Every month, many vulnerabilities are discovered and all the vulnerabilities must be remediated in a timely manner to reduce the chance of being exploited by attackers. In current practice, security operators have to manually analyze each vulnerability present in their assets and determine the remediation actions in a short time period, which involves a tremendous amount of human resources for electric utilities. To solve this problem, we propose a machine learning-based automation framework to automate vulnerability analysis and …


Intelligent Log Analysis For Anomaly Detection, Steven Yen May 2019

Intelligent Log Analysis For Anomaly Detection, Steven Yen

Master's Projects

Computer logs are a rich source of information that can be analyzed to detect various issues. The large volumes of logs limit the effectiveness of manual approaches to log analysis. The earliest automated log analysis tools take a rule-based approach, which can only detect known issues with existing rules. On the other hand, anomaly detection approaches can detect new or unknown issues. This is achieved by looking for unusual behavior different from the norm, often utilizing machine learning (ML) or deep learning (DL) models. In this project, we evaluated various ML and DL techniques used for log anomaly detection. We …


Machine Learning Versus Deep Learning For Malware Detection, Parth Jain May 2019

Machine Learning Versus Deep Learning For Malware Detection, Parth Jain

Master's Projects

It is often claimed that the primary advantage of deep learning is that such models can continue to learn as more data is available, provided that sufficient computing power is available for training. In contrast, for other forms of machine learning it is claimed that models ‘‘saturate,’’ in the sense that no additional learning can occur beyond some point, regardless of the amount of data or computing power available. In this research, we compare the accuracy of deep learning to other forms of machine learning for malware detection, as a function of the training dataset size. We experiment with a …


Multifamily Malware Models, Samanvitha Basole May 2019

Multifamily Malware Models, Samanvitha Basole

Master's Projects

When training a machine learning model, there is likely to be a tradeoff between the accuracy of the model and the generality of the dataset. Previous research has shown that if we train a model to detect one specific malware family, we obtain stronger results as compared to a case where we train a single model on multiple diverse families. During the detection phase, it would be more efficient to have a single model that could detect multiple families, rather than having to score each sample against multiple models. In this research, we conduct experiments to quantify the relationship between …


Evaluating Machine Learning Techniques For Smart Home Device Classification, Angelito E. Aragon Jr. Mar 2019

Evaluating Machine Learning Techniques For Smart Home Device Classification, Angelito E. Aragon Jr.

Theses and Dissertations

Smart devices in the Internet of Things (IoT) have transformed the management of personal and industrial spaces. Leveraging inexpensive computing, smart devices enable remote sensing and automated control over a diverse range of processes. Even as IoT devices provide numerous benefits, it is vital that their emerging security implications are studied. IoT device design typically focuses on cost efficiency and time to market, leading to limited built-in encryption, questionable supply chains, and poor data security. In a 2017 report, the United States Government Accountability Office recommended that the Department of Defense investigate the risks IoT devices pose to operations security, …


Confidence Inference In Defensive Cyber Operator Decision Making, Graig S. Ganitano Mar 2019

Confidence Inference In Defensive Cyber Operator Decision Making, Graig S. Ganitano

Theses and Dissertations

Cyber defense analysts face the challenge of validating machine generated alerts regarding network-based security threats. Operations tempo and systematic manpower issues have increased the importance of these individual analyst decisions, since they typically are not reviewed or changed. Analysts may not always be confident in their decisions. If confidence can be accurately assessed, then analyst decisions made under low confidence can be independently reviewed and analysts can be offered decision assistance or additional training. This work investigates the utility of using neurophysiological and behavioral correlates of decision confidence to train machine learning models to infer confidence in analyst decisions. Electroencephalography …


The Benefits Of Artificial Intelligence In Cybersecurity, Ricardo Calderon Jan 2019

The Benefits Of Artificial Intelligence In Cybersecurity, Ricardo Calderon

Economic Crime Forensics Capstones

Cyberthreats have increased extensively during the last decade. Cybercriminals have become more sophisticated. Current security controls are not enough to defend networks from the number of highly skilled cybercriminals. Cybercriminals have learned how to evade the most sophisticated tools, such as Intrusion Detection and Prevention Systems (IDPS), and botnets are almost invisible to current tools. Fortunately, the application of Artificial Intelligence (AI) may increase the detection rate of IDPS systems, and Machine Learning (ML) techniques are able to mine data to detect botnets’ sources. However, the implementation of AI may bring other risks, and cybersecurity experts need to find a …


Learning-Based Analysis On The Exploitability Of Security Vulnerabilities, Adam Bliss Dec 2018

Learning-Based Analysis On The Exploitability Of Security Vulnerabilities, Adam Bliss

Computer Science and Computer Engineering Undergraduate Honors Theses

The purpose of this thesis is to develop a tool that uses machine learning techniques to make predictions about whether or not a given vulnerability will be exploited. Such a tool could help organizations such as electric utilities to prioritize their security patching operations. Three different models, based on a deep neural network, a random forest, and a support vector machine respectively, are designed and implemented. Training data for these models is compiled from a variety of sources, including the National Vulnerability Database published by NIST and the Exploit Database published by Offensive Security. Extensive experiments are conducted, including testing …


Malware Image Classification Using Machine Learning With Local Binary Pattern, Jhu-Sin Luo, Dan Lo May 2018

Malware Image Classification Using Machine Learning With Local Binary Pattern, Jhu-Sin Luo, Dan Lo

Master of Science in Computer Science Theses

Malware classification is a critical part in the cybersecurity.

Traditional methodologies for the malware classification

typically use static analysis and dynamic analysis to identify malware.

In this paper, a malware classification methodology based

on its binary image and extracting local binary pattern (LBP)

features are proposed. First, malware images are reorganized into

3 by 3 grids which is mainly used to extract LBP feature. Second,

the LBP is implemented on the malware images to extract features

in that it is useful in pattern or texture classification. Finally,

Tensorflow, a library for machine learning, is applied to classify

malware images with …


Applying Machine Learning To Advance Cyber Security: Network Based Intrusion Detection Systems, Hassan Hadi Latheeth Al-Maksousy Jan 2018

Applying Machine Learning To Advance Cyber Security: Network Based Intrusion Detection Systems, Hassan Hadi Latheeth Al-Maksousy

Computer Science Theses & Dissertations

Many new devices, such as phones and tablets as well as traditional computer systems, rely on wireless connections to the Internet and are susceptible to attacks. Two important types of attacks are the use of malware and exploiting Internet protocol vulnerabilities in devices and network systems. These attacks form a threat on many levels and therefore any approach to dealing with these nefarious attacks will take several methods to counter. In this research, we utilize machine learning to detect and classify malware, visualize, detect and classify worms, as well as detect deauthentication attacks, a form of Denial of Service (DoS). …


Dynamic Adversarial Mining - Effectively Applying Machine Learning In Adversarial Non-Stationary Environments., Tegjyot Singh Sethi Aug 2017

Dynamic Adversarial Mining - Effectively Applying Machine Learning In Adversarial Non-Stationary Environments., Tegjyot Singh Sethi

Electronic Theses and Dissertations

While understanding of machine learning and data mining is still in its budding stages, the engineering applications of the same has found immense acceptance and success. Cybersecurity applications such as intrusion detection systems, spam filtering, and CAPTCHA authentication, have all begun adopting machine learning as a viable technique to deal with large scale adversarial activity. However, the naive usage of machine learning in an adversarial setting is prone to reverse engineering and evasion attacks, as most of these techniques were designed primarily for a static setting. The security domain is a dynamic landscape, with an ongoing never ending arms race …


Problems In Graph-Structured Modeling And Learning, James Atwood Jul 2017

Problems In Graph-Structured Modeling And Learning, James Atwood

Doctoral Dissertations

This thesis investigates three problems in graph-structured modeling and learning. We first present a method for efficiently generating large instances from nonlinear preferential attachment models of network structure. This is followed by a description of diffusion-convolutional neural networks, a new model for graph-structured data which is able to outperform probabilistic relational models and kernel-on-graph methods at node classification tasks. We conclude with an optimal privacy-protection method for users of online services that remains effective when users have poor knowledge of an adversary's behavior.


Image Spam Detection, Aneri Chavda May 2017

Image Spam Detection, Aneri Chavda

Master's Projects

Email is one of the most common forms of digital communication. Spam can be de ned as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers so as to evade text-based spam lters and hence it poses a threat to email based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.


Malware Detection Using The Index Of Coincidence, Bhavna Gurnani Jan 2017

Malware Detection Using The Index Of Coincidence, Bhavna Gurnani

Master's Projects

In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We nd that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.