Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Southern Methodist University

Articles 1 - 30 of 84

Full-Text Articles in Computer Sciences

Intelligent Solutions For Retroactive Anomaly Detection And Resolution With Log File Systems, Derek G. Rogers, Chanvo Nguyen, Abhay Sharma May 2024

SMU Data Science Review

This paper explores the intricate challenges log files pose from data science and machine learning perspectives. Drawing inspiration from existing methods, LAnoBERT, PULL, LLMs, and the breadth of recent research, this paper aims to push the boundaries of machine learning for log file systems. Our study comprehensively examines the unique challenges presented in our problem setup, delineates the limitations of existing methods, and introduces innovative solutions. These contributions are organized to offer valuable insights, predictions, and actionable recommendations tailored for Microsoft's engineers working on log data analysis.


Leveraging Transformer Models For Genre Classification, Andreea C. Craus, Ben Berger, Yves Hughes, Hayley Horn May 2024

SMU Data Science Review

As the digital music landscape continues to expand, the need for effective methods to understand and contextualize the diverse genres of lyrical content becomes increasingly critical. This research focuses on the application of transformer models in the domain of music analysis, specifically in the task of lyric genre classification. By leveraging the advanced capabilities of transformer architectures, this project aims to capture intricate linguistic nuances within song lyrics, thereby enhancing the accuracy and efficiency of genre classification. The relevance of this project lies in its potential to contribute to the development of automated systems for music recommendation and genre-based playlist …


Context Aware Music Recommendation And Playlist Generation, Elias Mann May 2024

SMU Journal of Undergraduate Research

There are many reasons people listen to music, and the type of music is largely determined by what the listener may be doing while they listen. For example, one may listen to one type of music while commuting, another while exercising, and yet another while relaxing. Without access to the physiological state of the user, current music recommendation methods rely on collaborative filtering - recommending music based on what other similar users listen to - and content based filtering - recommending songs based on their similarities to songs the user already prefers. With the rise in popularity of smart devices …
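
The two baseline strategies named in this abstract can be sketched as follows. This is a minimal illustration, not the author's method: the function names, the toy play counts, and the two audio features are all hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def collaborative_scores(plays, user):
    """Collaborative filtering: score unheard songs by the
    similarity-weighted play counts of the other users."""
    scores = {}
    for other, history in plays.items():
        if other == user:
            continue
        weight = cosine(plays[user], history)
        for song, count in history.items():
            if song not in plays[user]:
                scores[song] = scores.get(song, 0.0) + weight * count
    return scores


def content_scores(features, liked, candidates):
    """Content-based filtering: score each candidate by its best
    feature similarity to a song the user already likes."""
    return {
        c: max(cosine(features[c], features[s]) for s in liked)
        for c in candidates
    }


# Hypothetical toy data: play counts per user, two audio features per song.
PLAYS = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 5},
    "u3": {"d": 4},
}
FEATURES = {
    "a": {"tempo": 0.9, "energy": 0.8},
    "b": {"tempo": 0.8, "energy": 0.9},
    "c": {"tempo": 0.85, "energy": 0.85},
    "d": {"tempo": 0.1, "energy": 0.2},
}
```

With this toy data, both `collaborative_scores(PLAYS, "u1")` and `content_scores(FEATURES, ["a", "b"], ["c", "d"])` rank song `c` above song `d`. Neither uses any signal about what the listener is currently doing.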


Predicting Biomolecular Properties And Interactions Using Numerical, Statistical And Machine Learning Methods, Elyssa Sliheet Apr 2024

Mathematics Theses and Dissertations

We investigate machine learning and electrostatic methods to predict biophysical properties of proteins, such as solvation energy and protein ligand binding affinity, for the purpose of drug discovery/development. We focus on the Poisson-Boltzmann model and various high performance computing considerations such as parallelization schemes.


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

This research used deep learning for image analysis by isolating and characterizing distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …


Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater Aug 2023

SMU Data Science Review

Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …


Visualized Algorithm Engineering On Two Graph Partitioning Problems, Zizhen Chen May 2023

Computer Science and Engineering Theses and Dissertations

Concepts of graph theory are frequently used by computer scientists as abstractions when modeling a problem. Partitioning a graph (or a network) into smaller parts is one of the fundamental algorithmic operations that plays a key role in classifying and clustering. Since the early 1970s, graph partitioning has expanded rapidly into a wide range of application areas, in both engineering and research. Current technology generates massive data (“Big Data”) from business interactions and social exchanges, so high-performance algorithms for partitioning graphs are a critical need.

This dissertation presents engineering models for two graph partitioning problems arising from completely …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Data Leakage In Isolated Virtualized Enterprise Computing Systems, Zechariah D.J. Wolf Apr 2023

Computer Science and Engineering Theses and Dissertations

Virtualization and cloud computing have become critical parts of modern enterprise computing infrastructure. One of the benefits of using cloud infrastructure over in-house computing infrastructure is the offloading of security responsibilities. By hosting one’s services on the cloud, the responsibility for the security of the infrastructure is transferred to a trusted third party. As such, security of customer data in cloud environments is of critical importance. Side channels and covert channels have proven to be dangerous avenues for the leakage of sensitive information from computing systems. In this work, we propose and perform two experiments to investigate side and covert …


Practical Implementation Of The Immersed Interface Method With Triangular Meshes For 3D Rigid Solids In A Fluid Flow, Norah Hakami Apr 2023

Mathematics Theses and Dissertations

When employing the immersed interface method (IIM) to simulate a fluid flow around a moving rigid object, the immersed object can be replaced by a virtual fluid enclosed by singular forces on the interface between the real and virtual fluids. These forces represent the impact of the rigid motion on the fluid flow and cause jump discontinuities across the interface in the whole flow field. Then, the IIM resolves the fluid flow on a fixed computational domain by directly incorporating the jump conditions across the interface into numerical schemes. Previous development of the method is limited to simple smooth boundaries. …


Fraud Pattern Detection For NFT Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba Mar 2023

SMU Data Science Review

Non-Fungible Tokens (NFTs) enable ownership and transfer of digital assets using blockchain technology. As a relatively new financial asset class, NFTs lack robust oversight and regulations. These conditions create an environment that is susceptible to fraudulent activity and market manipulation schemes. This study examines the buyer-seller network transactional data from some of the most popular NFT marketplaces (e.g., AtomicHub, OpenSea) to identify and predict fraudulent activity. To accomplish this goal multiple features such as price, volume, and network metrics were extracted from NFT transactional data. These were fed into a Multiple-Scale Convolutional Neural Network that predicts suspected fraudulent activity based …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data; thus, enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


Phishing Detection Using Natural Language Processing And Machine Learning, Apurv Mittal, Dr Daniel Engels, Harsha Kommanapalli, Ravi Sivaraman, Taifur Chowdhury Sep 2022

SMU Data Science Review

Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult since many phishing emails have composite features such as body text and metadata that are nearly indistinguishable from valid emails. This paper presents a novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, with one model for each composite feature, that enables the accurate identification of phishing emails. The framework analyses each composite feature independently utilizing a multi-faceted approach using Natural Language …


Using Natural Language Processing To Increase Modularity And Interpretability Of Automated Essay Evaluation And Student Feedback, Chris Roche, Nathan Deinlein, Darryl Dawkins, Faizan Javed Sep 2022

SMU Data Science Review

For English teachers and students who are dissatisfied with the one-size-fits-all approach of current Automated Essay Scoring (AES) systems, this research uses Natural Language Processing (NLP) techniques that provide a focus on configurability and interpretability. Unlike traditional AES models which are designed to provide an overall score based on pre-trained criteria, this tool allows teachers to tailor feedback based upon specific focus areas. The tool implements a user-interface that serves as a customizable rubric. Students’ essays are inputted into the tool either by the student or by the teacher via the application’s user-interface. Based on the rubric settings, the tool …


Cov-Inception: COVID-19 Detection Tool Using Chest X-Ray, Aswini Thota, Ololade Awodipe, Rashmi Patel Sep 2022

SMU Data Science Review

Since the pandemic started, researchers have been trying to find a cost-effective, fast, and reliable way to detect COVID-19 in order to keep the economy viable and running. This research details how chest X-ray radiography can be utilized to detect the infection. This approach could be implemented in airports, schools, and places of business. Currently, chest imaging is not a first-line test for COVID-19 due to low diagnostic accuracy and confounding with other viral pneumonia. Different pre-trained algorithms were fine-tuned and applied to the images to train the model and the best model obtained was a fine-tuned InceptionV3 model …


Application Of Probabilistic Ranking Systems On Women’s Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler Sep 2022

SMU Data Science Review

Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, which is the current market leader in the ranking space of junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements to the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …


Human Trafficking And Machine Learning: A Data Pipeline From Law Agencies To Research Groups, Nathaniel Hites May 2022

Computer Science and Engineering Theses and Dissertations

Human trafficking is a form of modern-day slavery that, while highly illegal, has become more dangerous with the advancement of modern technology (such as the Internet), which allows the practice to spread more easily and quickly all over the world. While the number of victims of human trafficking is large (according to non-profit organization Safe House, there are estimated to be about 20.5 million human trafficking victims worldwide (“Human Trafficking Statistics & Facts.” Safe Horizon), coerced or manipulated by traffickers into either forced labor or sexual exploitation and encounters), the number of heard cases is proportionally low: several thousand successful …


Real-Time Voice Biometric Speaker Verification, Inderbir Dhillon, Jason Rupp, Aniketh Vankina, Robert Slater Dec 2021

SMU Data Science Review

Automated speaker verification has been an area of increased research in the last few years, with a special interest in metric learning approaches that compute distances between speaker voiceprints. In this paper, three metric learning systems are built and compared in a one-shot speaker verification task using contrastive max-margin loss, triplet loss, and quadruplet loss. For all the models, spectrograms are created from speaker audio. Convolutional Neural Network embedding layers are trained to produce compact voiceprints that allow users to be distinguished using distance calculations. Performances of the three models were similar, but the model with the best EER …
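
As a rough illustration of the metric learning idea in this abstract, the sketch below shows one common form of the triplet loss and a distance-threshold verification decision. It is not the paper's implementation: the embeddings are plain lists standing in for CNN outputs, and the function names, the Euclidean distance, the margin, and the threshold are all assumptions made for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """One common triplet loss form: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the same-speaker pair (anchor, positive) is
    closer than the different-speaker pair (anchor, negative) by at
    least the margin. The margin value here is a hypothetical default.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def same_speaker(enrolled, probe, threshold):
    """Accept the probe if its voiceprint lies within a distance
    threshold of the enrolled voiceprint (threshold is an assumption)."""
    return euclidean(enrolled, probe) <= threshold
```

For example, `triplet_loss([0, 0], [0, 1], [3, 4])` is zero because the negative is already far enough away, while swapping the positive and negative gives a loss of 5.0. Sweeping the threshold in `same_speaker` trades false accepts against false rejects, which is the trade-off an equal error rate summarizes.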


A Fast Method For Computing Volume Potentials In The Galerkin Boundary Element Method In 3D Geometries, Sasan Mohyaddin Aug 2021

Mathematics Theses and Dissertations

We discuss how the Fast Multipole Method (FMM) applied to a boundary concentrated mesh can be used to evaluate volume potentials that arise in the boundary element method. If $h$ is the meshwidth near the boundary, then the algorithm can compute the potential in nearly $\mathcal{O}(h^{-2})$ operations while maintaining an $\mathcal{O}(h^p)$ convergence of the error. The effectiveness of the algorithm is demonstrated by solving boundary integral equations of the Poisson equation.


Fast Multipole Methods For Wave And Charge Source Interactions In Layered Media And Deep Neural Network Algorithms For High-Dimensional PDEs, Wenzhong Zhang Aug 2021

Mathematics Theses and Dissertations

In this dissertation, we develop fast algorithms for large scale numerical computations, including the fast multipole method (FMM) in layered media, and the forward-backward stochastic differential equation (FBSDE) based deep neural network (DNN) algorithms for high-dimensional parabolic partial differential equations (PDEs), addressing the issues of real-world challenging computational problems in various computation scenarios.

We develop the FMM in layered media, by first studying analytical and numerical properties of the Green's functions in layered media for the 2-D and 3-D Helmholtz equation, the linearized Poisson-Boltzmann equation, Laplace's equation, and the tensor Green's functions for the time-harmonic Maxwell's equations and the …


Electricity Market Operations With Massive Renewable Integration: New Designs, Shengfei Yin Jul 2021

Electrical Engineering Theses and Dissertations

The electricity market has been transitioning from conventional, deterministic operation to stochastic operation under the increasing penetration of renewable energy. Industry-level solutions for future electricity market operation call for both accuracy and efficiency while maintaining model interpretability. Hence, reliable stochastic optimization techniques are of primary importance for such a complex and dynamic problem.

This work starts by proposing a solution strategy for the uncertainty-based power system planning problem, which serves as a preliminary step and informs the electricity market operation. Considering 100% renewable penetration in the future, it analyzes the cost-effectiveness of renewable energy from a long-term …


High-Order Flexible Multirate Integrators For Multiphysics Applications, Rujeko Chinomona May 2021

Mathematics Theses and Dissertations

Traditionally, time integration methods within multiphysics simulations have been chosen to cater to the most restrictive dynamics, sometimes at a great computational cost. Multirate integrators accurately and efficiently solve systems of ordinary differential equations that exhibit different time scales using two or more time steps. In this thesis, we explore three classes of time integrators that can be classified as one-step multi-stage multirate methods for which the slow dynamics are evolved using a traditional one step scheme and the fast dynamics are solved through a sequence of modified initial value problems. Practically, the fast dynamics are subcycled using a small …


Automated Analysis Of RFPs Using Natural Language Processing (NLP) For The Technology Domain, Sterling Beason, William Hinton, Yousri A. Salamah, Jordan Salsman May 2021

SMU Data Science Review

Much progress has been made in text analysis, specifically within the statistical domain of Term Frequency (TF) and Inverse Document Frequency (IDF). However, there is much room for improvement especially within the area of discovering Emerging Trends. Emerging Trend Detection Systems (ETDS) depend on ingesting a collection of textual data and TF/IDF to identify new or up-trending topics within the Corpus. However, the tremendous rate of change and the amount of digital information presents a challenge that makes it almost impossible for a human expert to spot emerging trends without relying on an automated ETD system. Since the U.S. Government …
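
For readers unfamiliar with the TF/IDF weighting this abstract builds on, here is a minimal sketch of one textbook variant (relative term frequency times the natural log of inverse document frequency). It is not the authors' pipeline: the function name and the three toy documents are hypothetical, and production systems usually add smoothing and normalization.

```python
from collections import Counter
from math import log


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * log(n / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy corpus of tokenized request titles.
DOCS = [
    "cloud security audit".split(),
    "cloud migration plan".split(),
    "cloud quantum pilot".split(),
]
```

In this toy corpus, `tf_idf(DOCS)[2]` gives "cloud" a weight of zero because it appears in every document, while "quantum" receives a positive weight because it appears in only one. Tracking how such weights shift across time-ordered batches of documents is one simple way a term could be flagged as up-trending.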


Multi-Modal Classification Using Images And Text, Stuart J. Miller, Justin Howard, Paul Adams, Mel Schwan, Robert Slater Jan 2021

SMU Data Science Review

This paper proposes a method for the integration of natural language understanding in image classification to improve classification accuracy by making use of associated metadata. Traditionally, only image features have been used in the classification process; however, metadata accompanies images from many sources. This study implemented a multi-modal image classification model that combines convolutional methods with natural language understanding of descriptions, titles, and tags to improve image classification. The novelty of this approach was to learn from additional external features associated with the images using natural language understanding with transfer learning. It was found that the combination of ResNet-50 image …


SARS-CoV-2 Pandemic Analytical Overview With Machine Learning Predictability, Anthony Tanaydin, Jingchen Liang, Daniel W. Engels Jan 2021

SMU Data Science Review

Understanding diagnostic tests and examining important features of novel coronavirus (COVID-19) infection are essential steps for controlling the current pandemic of 2020. In this paper, we study the relationship between clinical diagnosis and analytical features of patient blood panels from the US, Mexico, and Brazil. Our analysis confirms that among adults, the risk of severe illness from COVID-19 increases with pre-existing conditions such as diabetes and immunosuppression. More than eight months into the pandemic, more data have become available, indicating that more young adults were getting infected. In addition, we expand on the definition of a COVID-19 test and discuss …


Analysis Of GitHub Pull Requests, Canon Ellis Dec 2020

Computer Science and Engineering Theses and Dissertations

The popularity of the software repository site GitHub has led to a rise in the use of pull-based development models. An essential part of pull-based development is the creation of pull requests. Pull requests often have to be reviewed by an individual to be approved and accepted into the master branch of a software repository. The reviewing process can often be time-consuming and introduce a relatively high level of lost development time. This paper examines thousands of pull requests to understand the most valuable metadata of pull requests. We then introduce metrics for comparing the metadata of pull requests to understand …


Analyzing Performance, Energy Consumption, And Reliability Of Mobile Applications, Osama Barack Dec 2020

Computer Science and Engineering Theses and Dissertations

Mobile applications have become a high priority for software developers. Researchers and practitioners are working toward improving and optimizing the energy efficiency and performance of mobile applications due to the capacity limitation of mobile device processors and batteries. In addition, as mobile applications have become popular among end-users, developers have introduced a wide range of features that increase the complexity of application code.

To improve and enhance the maintainability, extensibility, and understandability of application code, refactoring techniques were introduced. However, applying such techniques to mobile applications affects energy efficiency and performance. To evaluate and categorize software implementation and optimization efficiency, several …


Deep Neural Network Based Student Response Modeling With Uncertainty, Multimodality And Attention, Xinyi Ding Dec 2020

Computer Science and Engineering Theses and Dissertations

In this thesis, I investigate deep neural network based student response modeling, more specifically Knowledge Tracing (KT). Knowledge Tracing allows Intelligent Tutoring Systems to infer which topics or skills a student has mastered, thus adjusting curriculum accordingly. Deep neural network based knowledge tracing models like Deep Knowledge Tracing (DKT) and Dynamic Key-Value Memory Network (DKVMN) have achieved significant improvements compared with conventional probabilistic models. There are mainly two goals in this thesis: 1) To have a better understanding of existing deep neural network based models and their predictions through visualization and through incorporating uncertainties. 2) To improve the performance of …


Multigrid For The Nonlinear Power Flow Equations, Enrique Pereira Batista Dec 2020

Mathematics Theses and Dissertations

The continuously changing structure of power systems and the inclusion of renewable energy sources are leading to changes in the dynamics of the modern power grid, which have brought renewed attention to the solution of the AC power flow equations. In particular, development of fast and robust solvers for the power flow problem continues to be actively investigated. A novel multigrid technique for coarse-graining dynamic power grid models has been developed recently. This technique uses an algebraic multigrid (AMG) coarsening strategy applied to the weighted graph Laplacian that arises from the power network's topology for the construction of coarse-grain approximations to …


Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed Sep 2020

SMU Data Science Review

Music is incorporated into our daily lives, whether intentionally or unintentionally. It evokes responses and behavior, so much so that there is an entire field of study dedicated to the psychology of music. Music creates the mood for dancing, exercising, creative thought or even relaxation. It is a powerful tool that can be used in various venues and through advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused …