Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Machine learning

Theses/Dissertations

2021

Full-Text Articles in Computer Engineering

Statistics-Based Anomaly Detection And Correction Method For Amazon Customer Reviews, Ishani Chatterjee Dec 2021

Dissertations

People nowadays use the Internet to share their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a rich source of information for data analytics, sentiment analysis, natural language processing, etc. The most critical challenge is interpreting this data and capturing the sentiment behind these expressions. Sentiment analysis is the task of analyzing and processing subjective texts and drawing conclusions and inferences about the views they express. Companies use sentiment analysis to understand public opinions, perform market research, analyze brand reputation, recognize customer experiences, and study social media influence. According to the different needs for aspect granularity, …


On Resource-Efficiency And Performance Optimization In Big Data Computing And Networking Using Machine Learning, Wuji Liu Dec 2021

Dissertations

Due to the rapid transition from traditional experiment-based approaches to large-scale, computationally intensive simulations, next-generation scientific applications typically involve complex numerical modeling and extreme-scale simulations. Such model-based simulations oftentimes generate colossal amounts of data, which must be transferred over high-performance network (HPN) infrastructures to remote sites and analyzed against experimental or observation data on high-performance computing (HPC) facilities. Optimizing the performance of both data transfer in HPN and simulation-based model development on HPC is critical to enabling and accelerating knowledge discovery and scientific innovation. However, such processes generally involve an enormous set of attributes including domain-specific model parameters, network transport …


Detecting Malware In Memory With Memory Object Relationships, Demarcus M. Thomas Sr. Dec 2021

Theses and Dissertations

Malware is a growing concern that affects not only large businesses but also ordinary consumers. As a result, there is a need to develop tools that can identify the malicious activities of malware authors. A useful technique to achieve this is memory forensics. Memory forensics is the study of volatile data and its structures in Random Access Memory (RAM). It can be utilized to pinpoint what actions have occurred on a computer system.

This dissertation utilizes memory forensics to extract relationships between objects and supervised machine learning as a novel method for identifying malicious processes in a system …


Network Management, Optimization And Security With Machine Learning Applications In Wireless Networks, Mariam Nabil Dec 2021

Theses and Dissertations

Wireless communication networks are emerging fast with a lot of challenges and ambitions. Requirements that are expected to be delivered by modern wireless networks are complex, multi-dimensional, and sometimes contradictory. In this thesis, we investigate several types of emerging wireless networks and tackle some challenges of these various networks. We focus on three main challenges: Resource Optimization, Network Management, and Cyber Security. We present multiple views of these three aspects and propose solutions to probable scenarios. The first challenge (Resource Optimization) is studied in Wireless Powered Communication Networks (WPCNs). WPCNs are considered a very promising approach towards sustainable, …


Deepfakes Generated By Generative Adversarial Networks, Olympia A. Paul Nov 2021

Honors College Theses

Deep learning is a type of Artificial Intelligence (AI) that mimics the workings of the human brain in processing data such as speech recognition, visual object recognition, object detection, language translation, and making decisions. A Generative adversarial network (GAN) is a special type of deep learning, designed by Goodfellow et al. (2014), which is what we call convolution neural networks (CNN). Given a training set, a GAN can generate new data with the same information as the training set, and this output is what we often refer to as deepfakes. CNN takes an input …


Benchmarking Small-Dataset Structure-Activity-Relationship Models For Prediction Of Wnt Signaling Inhibition, Mahtab Kokabi Oct 2021

Masters Theses

Quantitative structure-activity relationship (QSAR) models based on machine learning algorithms are powerful tools to expedite drug discovery processes and therapeutics development. Given the cost in acquiring large-sized training datasets, it is useful to examine if QSAR analysis can reasonably predict drug activity with only a small-sized dataset (size < 100) and benchmark these small-dataset QSAR models in application-specific studies. To this end, here we present a systematic benchmarking study on small-dataset QSAR models built for prediction of effective Wnt signaling inhibitors, which are essential to therapeutics development in prevalent human diseases (e.g., cancer). Specifically, we examined a total of 72 two-dimensional (2D) QSAR models based on 4 best-performing algorithms, 6 commonly used molecular fingerprints, and 3 typical fingerprint lengths. We trained these models using a training dataset (56 compounds), benchmarked their performance on 4 figures-of-merit (FOMs), and examined their prediction accuracy using an external validation dataset (14 compounds). Our data show that the model performance is maximized when: 1) molecular fingerprints are selected to provide sufficient, unique, and not overly detailed representations of the chemical structures of drug compounds; 2) algorithms are selected to reduce the number of false predictions due to class imbalance in the dataset; and 3) models are selected to reach balanced performance on all 4 FOMs. These results may provide general guidelines in developing high-performance small-dataset QSAR models for drug activity prediction.
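The counts quoted in this abstract can be checked with a few lines of Python. This is only an illustrative sketch, not the author's code: the abstract gives the number of algorithms, fingerprints, and fingerprint lengths but not their names, so every label below is a hypothetical placeholder.

```python
from itertools import product

# Placeholder labels (assumptions): the abstract states only the counts
# (4 algorithms, 6 fingerprints, 3 lengths), not the actual names.
algorithms = [f"algorithm_{i}" for i in range(1, 5)]
fingerprints = [f"fingerprint_{i}" for i in range(1, 7)]
lengths = [f"length_{i}" for i in range(1, 4)]

# One model per (algorithm, fingerprint, length) combination.
model_grid = list(product(algorithms, fingerprints, lengths))
n_models = len(model_grid)  # 4 * 6 * 3 = 72

# Dataset sizes as stated in the abstract.
n_train, n_validation = 56, 14
n_compounds = n_train + n_validation  # 70, under the "size < 100" threshold
validation_fraction = n_validation / n_compounds

print(n_models, n_compounds, validation_fraction)
```

The full grid yields the 72 models mentioned in the abstract, and the 56 training plus 14 external validation compounds give 70 compounds in total, with one fifth held out for validation.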


Data-Driven Learning For Robot Physical Intelligence, Leidi Zhao Aug 2021

Dissertations

Physical intelligence, which emphasizes physical capabilities such as dexterous manipulation and dynamic mobility, is essential for robots to physically coexist with humans. Much research on robot physical intelligence has achieved success on hyper robot motor capabilities, but mostly through heavily case-specific engineering. Meanwhile, in terms of robots acquiring skills in a ubiquitous manner, robot learning from human demonstration (LfD) has achieved great progress, but still has limitations in handling dynamic skills and compound actions. In this dissertation, a composite learning scheme which goes beyond LfD and integrates robot learning from human definition, demonstration, and evaluation is proposed. This method tackles …


Machine Learning For Analog/Mixed-Signal Integrated Circuit Design Automation, Weidong Cao Aug 2021

McKelvey School of Engineering Theses & Dissertations

Analog/mixed-signal (AMS) integrated circuits (ICs) play an essential role in electronic systems by processing analog signals and performing data conversion to bridge the analog physical world and our digital information world. Their ubiquity powers diverse applications ranging from smart devices and autonomous cars to crucial infrastructures. Despite such critical importance, conventional design strategies of AMS circuits still follow an expensive and time-consuming manual process and are unable to meet the exponentially growing productivity demands from industry and satisfy the rapidly changing design specifications from many emerging applications. Design automation of AMS IC is thus the key to tackling these challenges and has been …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Off-Chain Transaction Routing In Payment Channel Networks: A Machine Learning Approach, Heba Kadry Jun 2021

Theses and Dissertations

Blockchain is a foundational technology that has the potential to create new prospects for our economic and social systems. However, the scalability problem limits the capability to deliver a target throughput and latency, compared to the traditional financial systems, with increasing workload. Layer-two is a collective term for solutions designed to help solve the scalability problem by handling transactions off the main chain, also known as layer one. These solutions have the capability to achieve high throughput, fast settlement, and cost efficiency without sacrificing network security. For example, bidirectional payment channels are utilized to allow the execution of fast transactions between …


Data Mining Of Unstructured Textual Information In Transportation Safety Domain: Exploring Methods, Opportunities And Limitations, Keneth Morgan Kwayu Jun 2021

Dissertations

The unprecedented increase in volume and influx of structured and unstructured data has overwhelmed conventional data management system capabilities in organizing, analyzing, and procuring useful information in a timely fashion. Structured data sources have a pre-defined pattern that makes data preprocessing and information retrieval tasks relatively easy for the current technologies that have been designed to handle structured and repeatable data. Unlike structured data, unstructured data usually exists in an unorganized format that offers no or little insight unless indexed and stored in an organized fashion. The inherent format of unstructured data exacerbates difficulties in data preprocessing and information extraction. …


Impact Assessment, Detection, And Mitigation Of False Data Attacks In Electrical Power Systems, Sagnik Basumallik May 2021

Dissertations - ALL

The global energy market has seen a massive increase in investment and capital flow in the last few decades. This has completely transformed the way power grids operate - legacy systems are now being replaced by advanced smart grid infrastructures that attest to better connectivity and increased reliability. One popular example is the extensive deployment of phasor measurement units (PMUs), which constantly provide time-synchronized phasor measurements at a high resolution compared to conventional meters. This enables system operators to monitor in real time the vast electrical network spanning thousands of miles. However, a targeted cyber attack on …


RedAI: A Machine Learning Approach To Cyber Threat Intelligence, Luke Noel May 2021

Masters Theses, 2020-current

The world is continually demanding more effective and intelligent solutions and strategies to combat adversary groups across the cyber defense landscape. Cyber Threat Intelligence (CTI) is a field within the domain of cyber security that allows for organizations to utilize threat intelligence and serves as a tool for organizations to proactively harden their defense posture. However, there is a large volume of CTI and it is often a daunting task for organizations to effectively consume, utilize, and apply it to their defense strategies. In this thesis we develop a machine learning solution, named RedAI, to investigate whether open-source intelligence (OSINT) …


Human Fatigue Predictions In Complex Aviation Crew Operational Impact Conditions, Suresh Rangan May 2021

Doctoral Dissertations

In this last decade, several regulatory frameworks across the world in all modes of transportation have brought fatigue and its risk management in operations to the forefront. Of all transportation modes, air travel has been the safest. Still, as part of continuous improvement efforts, regulators are insisting that operators adopt strong fatigue science and its foundational principles to reinforce safety risk assessment and management. Fatigue risk management is a data-driven system that finds a realistic balance between safety and productivity in an organization. This work discusses the effects of mathematical modeling of fatigue and its …


Multi-Style Explainable Matrix Factorization Techniques For Recommender Systems, Olurotimi Nugbepo Seton May 2021

Electronic Theses and Dissertations

Black-box recommender system models are machine learning models that generate personalized recommendations without explaining how the recommendations were generated to the user or giving them a way to correct wrong assumptions made about them by the model. However, compared to white-box models, which are transparent and scrutable, black-box models are generally more accurate. Recent research has shown that accuracy alone is not sufficient for user satisfaction. One such black-box model is Matrix Factorization, a state-of-the-art recommendation technique that is widely used due to its ability to deal with sparse data sets and to produce accurate recommendations. Recent …


Machine Learning Approaches For Lung Cancer Diagnosis, Ahmed Mahmoud Ahmed Shaffie May 2021

Electronic Theses and Dissertations

The enormity of changes and development in the field of medical imaging technology is hard to fathom, as it does not just represent the technique and process of constructing visual representations of the body from inside for medical analysis and to reveal the internal structure of different organs under the skin, but also provides a noninvasive way to diagnose various diseases and suggests efficient ways to treat them. While data surrounding all of our lives are stored and collected to be ready for analysis by data scientists, medical images are considered a rich source that could provide …


An Inside Vs. Outside Classification System For Wi-Fi IoT Devices, Paul Gralla Apr 2021

Dartmouth College Undergraduate Theses

We are entering an era in which Smart Devices are increasingly integrated into our daily lives. Everyday objects are gaining computational power to interact with their environments and communicate with each other and the world via the Internet. While the integration of such devices offers many potential benefits to their users, it also gives rise to a unique set of challenges. One of those challenges is to detect whether a device belongs to one’s own ecosystem, or to a neighbor – or represents an unexpected adversary. An important part of determining whether a device is friend or adversary is to …


A Tiered Recommender System For Cost-Effective Cloud Instance Selection, Xusheng Ai Jan 2021

University of the Pacific Theses and Dissertations

Cloud computing has greatly impacted the scientific community and end users. By leveraging cloud computing, small research institutions and undergraduate colleges are able to alleviate costs and achieve research goals without purchasing and maintaining all the hardware and software. In addition, cloud computing allows researchers to access resources as their teams require and allows real-time collaboration with team members across the globe. Nowadays, however, users are easily overwhelmed by the wide range of cloud servers and instances. Due to differences between the cloud server platforms and between instances within the platform, users find it difficult to identify the right …


IoT Malicious Traffic Classification Using Machine Learning, Michael Austin Jan 2021

Graduate Theses, Dissertations, and Problem Reports

Although desktops and laptops have historically composed the bulk of botnet nodes, Internet of Things (IoT) devices have become more recent targets. Lightbulbs, outdoor cameras, watches, and many other small items are connected to WiFi and each other; and few have well-developed security or hardening. Research on botnets typically leverages honeypots, PCAPs, and network traffic analysis tools to develop detection models. The research questions addressed in this Problem Report are: (1) What machine learning algorithm performs the best in a binary classification task for a representative dataset of malicious and benign IoT traffic; and (2) What features have the most …


Unobtrusive Assessment Of Student Engagement Levels In Online Classroom Environment Using Emotion Analysis, Sasirekha Anbusegaran Jan 2021

Electronic Theses and Dissertations

Measuring student engagement has emerged as a significant factor in the process of learning and a good indicator of the knowledge retention capacity of the student. As synchronous online classes have become more prevalent in recent years, gauging a student's attention level is more critical in validating the progress of every student in an online classroom environment. This paper details the study on profiling the student attentiveness to different gradients of engagement level using multiple machine learning models. Results from the high accuracy model and the confidence score obtained from the cloud-based computer vision platform - Amazon Rekognition were then …


Texture-Driven Image Clustering In Laser Powder Bed Fusion, Alexander H. Groeger Jan 2021

Browse all Theses and Dissertations

The additive manufacturing (AM) field is striving to identify anomalies in laser powder bed fusion (LPBF) using multi-sensor in-process monitoring paired with machine learning (ML). In-process monitoring can reveal the presence of anomalies, but creating an ML classifier requires labeled data. The present work approaches this problem by printing hundreds of Inconel-718 coupons with different processing parameters to capture a wide range of process monitoring imagery with multiple sensor types. Afterwards, the process monitoring images are encoded into feature vectors and clustered to isolate groups in each sensor modality. Four texture representations were learned by training two convolutional neural network …


Analysis Of Classifier Weaknesses Based On Patterns And Corrective Methods, Nicholas Skapura Jan 2021

Browse all Theses and Dissertations

Classification is an important branch of machine learning that impacts many areas of modern life. Many classification algorithms (classifiers for short) have been developed. They differ widely in sophistication and classification accuracy, and classification problems differ widely in hardness and complexity. Practitioners of classification modeling need a better understanding of those algorithms in order to select the optimal algorithm for given classification problems. Researchers of classification need new insight on how given classifiers are weak and how they can be improved by correcting their classification errors. This dissertation introduces new tools and concepts to analyze classifier weakness …


Deep Learning For Compressive SAR Imaging With Train-Test Discrepancy, Morgan R. Mccamey Jan 2021

Browse all Theses and Dissertations

We consider the problem of compressive synthetic aperture radar (SAR) imaging with the goal of reconstructing SAR imagery in the presence of undersampled phase history. While this problem is typically considered in compressive sensing (CS) literature, we consider a variety of deep learning approaches where a deep neural network (DNN) is trained to form SAR imagery from limited data. At the cost of computationally intensive offline training, online test-time DNN-SAR has demonstrated orders of magnitude faster reconstruction than standard CS algorithms. A limitation of the DNN approach is that any change to the operating conditions necessitates a costly retraining …


Determination Of Hydrogel Degradation By Passive Mechanical Testing, Avery Rosh-Gorsky Jan 2021

Honors Theses

This paper details a new technique to measure the mechanical properties of ETTMP PEGDA hydrogels using Hertz Contact Theory and simultaneously analyze both the model drug release and gel erosion in situ. This method involves curing a drug loaded hydrogel in a standard cuvette and placing a glass bead and phosphate buffer solution (PBS). Over time, the cross-linked network of the hydrogel breaks down, and, as a result, the ball sinks into the hydrogel. This method provides a macroscopic and inexpensive way to continuously and passively measure properties of the hydrogel as the hydrogel degrades. By plotting both the …