Full-Text Articles in Computer Sciences
Intelligent Solutions For Retroactive Anomaly Detection And Resolution With Log File Systems, Derek G. Rogers, Chanvo Nguyen, Abhay Sharma
SMU Data Science Review
This paper explores the intricate challenges log files pose from data science and machine learning perspectives. Drawing inspiration from existing methods, LAnoBERT, PULL, LLMs, and the breadth of recent research, this paper aims to push the boundaries of machine learning for log file systems. Our study comprehensively examines the unique challenges presented in our problem setup, delineates the limitations of existing methods, and introduces innovative solutions. These contributions are organized to offer valuable insights, predictions, and actionable recommendations tailored for Microsoft's engineers working on log data analysis.
Leveraging Transformer Models For Genre Classification, Andreea C. Craus, Ben Berger, Yves Hughes, Hayley Horn
SMU Data Science Review
As the digital music landscape continues to expand, the need for effective methods to understand and contextualize the diverse genres of lyrical content becomes increasingly critical. This research focuses on the application of transformer models in the domain of music analysis, specifically in the task of lyric genre classification. By leveraging the advanced capabilities of transformer architectures, this project aims to capture intricate linguistic nuances within song lyrics, thereby enhancing the accuracy and efficiency of genre classification. The relevance of this project lies in its potential to contribute to the development of automated systems for music recommendation and genre-based playlist …
Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam
SMU Data Science Review
This research used deep learning for image analysis by isolating and characterizing distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …
Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater
SMU Data Science Review
Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …
Fraud Pattern Detection For NFT Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba
SMU Data Science Review
Non-Fungible Tokens (NFTs) enable ownership and transfer of digital assets using blockchain technology. As a relatively new financial asset class, NFTs lack robust oversight and regulations. These conditions create an environment that is susceptible to fraudulent activity and market manipulation schemes. This study examines the buyer-seller network transactional data from some of the most popular NFT marketplaces (e.g., AtomicHub, OpenSea) to identify and predict fraudulent activity. To accomplish this goal, multiple features such as price, volume, and network metrics were extracted from NFT transactional data. These were fed into a Multiple-Scale Convolutional Neural Network that predicts suspected fraudulent activity based …
Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn
SMU Data Science Review
Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data, thus enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …
Phishing Detection Using Natural Language Processing And Machine Learning, Apurv Mittal, Dr Daniel Engels, Harsha Kommanapalli, Ravi Sivaraman, Taifur Chowdhury
SMU Data Science Review
Phishing emails are a primary mode of entry for attackers into an organization. A successful phishing attempt leads to unauthorized access to sensitive information and systems. However, automatically identifying phishing emails is often difficult since many phishing emails have composite features such as body text and metadata that are nearly indistinguishable from valid emails. This paper presents a novel machine learning-based framework, the DARTH framework, that characterizes and combines multiple models, with one model for each composite feature, that enables the accurate identification of phishing emails. The framework analyses each composite feature independently utilizing a multi-faceted approach using Natural Language …
Using Natural Language Processing To Increase Modularity And Interpretability Of Automated Essay Evaluation And Student Feedback, Chris Roche, Nathan Deinlein, Darryl Dawkins, Faizan Javed
SMU Data Science Review
For English teachers and students who are dissatisfied with the one-size-fits-all approach of current Automated Essay Scoring (AES) systems, this research uses Natural Language Processing (NLP) techniques that provide a focus on configurability and interpretability. Unlike traditional AES models, which are designed to provide an overall score based on pre-trained criteria, this tool allows teachers to tailor feedback based upon specific focus areas. The tool implements a user interface that serves as a customizable rubric. Students’ essays are entered into the tool either by the student or by the teacher via the application’s user interface. Based on the rubric settings, the tool …
Cov-Inception: COVID-19 Detection Tool Using Chest X-Ray, Aswini Thota, Ololade Awodipe, Rashmi Patel
SMU Data Science Review
Since the pandemic started, researchers have been trying to find a cost-effective, fast, and reliable way to detect COVID-19 in order to keep the economy viable and running. This research details how chest X-ray radiography can be utilized to detect the infection. This approach could be implemented in airports, schools, and places of business. Currently, chest imaging is not a first-line test for COVID-19 due to low diagnostic accuracy and confounding with other viral pneumonia. Different pre-trained algorithms were fine-tuned and applied to the images to train the model, and the best model obtained was a fine-tuned InceptionV3 model …
Application Of Probabilistic Ranking Systems On Women’s Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler
SMU Data Science Review
Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, which is the current market leader in the ranking space of junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements for the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …
Real-Time Voice Biometric Speaker Verification, Inderbir Dhillon, Jason Rupp, Aniketh Vankina, Robert Slater
SMU Data Science Review
Automated speaker verification has been an area of increased research in the last few years, with a special interest in metric learning approaches that compute distances between speaker voiceprints. In this paper, three metric learning systems are built and compared in a one-shot speaker verification task using contrastive max-margin loss, triplet loss, and quadruplet loss. For all the models, spectrograms are created from speaker audio. Convolutional Neural Network embedding layers are trained to produce compact voiceprints that allow users to be distinguished using distance calculations. Performances of the three models were similar, but the model with the best EER …
Automated Analysis Of RFPs Using Natural Language Processing (NLP) For The Technology Domain, Sterling Beason, William Hinton, Yousri A. Salamah, Jordan Salsman
SMU Data Science Review
Much progress has been made in text analysis, specifically within the statistical domain of Term Frequency (TF) and Inverse Document Frequency (IDF). However, there is much room for improvement, especially in the area of discovering emerging trends. Emerging Trend Detection Systems (ETDS) depend on ingesting a collection of textual data and on TF/IDF to identify new or up-trending topics within the corpus. However, the tremendous rate of change and the amount of digital information present a challenge that makes it almost impossible for a human expert to spot emerging trends without relying on an automated ETD system. Since the U.S. Government …
Multi-Modal Classification Using Images And Text, Stuart J. Miller, Justin Howard, Paul Adams, Mel Schwan, Robert Slater
SMU Data Science Review
This paper proposes a method for the integration of natural language understanding in image classification to improve classification accuracy by making use of associated metadata. Traditionally, only image features have been used in the classification process; however, metadata accompanies images from many sources. This study implemented a multi-modal image classification model that combines convolutional methods with natural language understanding of descriptions, titles, and tags to improve image classification. The novelty of this approach was to learn from additional external features associated with the images using natural language understanding with transfer learning. It was found that the combination of ResNet-50 image …
SARS-CoV-2 Pandemic Analytical Overview With Machine Learning Predictability, Anthony Tanaydin, Jingchen Liang, Daniel W. Engels
SMU Data Science Review
Understanding diagnostic tests and examining important features of novel coronavirus (COVID-19) infection are essential steps for controlling the current pandemic of 2020. In this paper, we study the relationship between clinical diagnosis and analytical features of patient blood panels from the US, Mexico, and Brazil. Our analysis confirms that among adults, the risk of severe illness from COVID-19 increases with pre-existing conditions such as diabetes and immunosuppression. More than eight months into the pandemic, more data have become available indicating that more young adults were being infected. In addition, we expand on the definition of COVID-19 test and discuss …
Cover Song Identification - A Novel Stem-Based Approach To Improve Song-To-Song Similarity Measurements, Lavonnia Newman, Dhyan Shah, Chandler Vaughn, Faizan Javed
SMU Data Science Review
Music is incorporated into our daily lives, whether intentionally or unintentionally. It evokes responses and behavior, so much so that there is an entire field of study dedicated to the psychology of music. Music creates the mood for dancing, exercising, creative thought, or even relaxation. It is a powerful tool that can be used in various venues and through advertisements to influence and guide human reactions. Music is also often "borrowed" in the industry today. The practices of sampling and remixing music in the digital age have made cover song identification an active area of research. While most of this research is focused …
Advancing Performance Of Retail Recommendation Systems, Lisa Leininger, Johnny Gipson, Kito Patterson, Brad Blanchard
SMU Data Science Review
This paper presents two recommendation models, one traditional and one novel, for a retail men's clothing company. J. Hilburn is a custom-fit menswear company headquartered in Dallas, Texas. J. Hilburn employs stylists across the United States, who engage directly with customers to assist in selecting clothes that fit their size and style. J. Hilburn tasked the authors of this paper with applying data science techniques to the given data set to provide stylists with more insight into clients’ purchase patterns and increase overall sales. This paper presents two recommendation systems which provide stylists with automatic predictions about possible clothing …
Improving Syntactic Relationships Between Language And Objects, Benjamin Wilke, Tej Tenmattam, Anand Rajan, Andrew Pollock, Joel Lindsey
SMU Data Science Review
This paper presents the integration of natural language processing and computer vision to improve the syntax of the language generated when describing objects in images. The goal was to understand not only the objects in an image but also the interactions and activities occurring between the objects. We implemented a multi-modal neural network combining convolutional and recurrent neural network architectures to create a model that can maximize the likelihood of word combinations given a training image. The outcome was an image captioning model that leveraged transfer learning techniques for architecture components. Our novelty was to quantify the effectiveness of transfer learning …
Personalized Detection Of Anxiety Provoking News Events Using Semantic Network Analysis, Jacquelyn Cheun PhD, Luay Dajani, Quentin B. Thomas
SMU Data Science Review
In the age of hyper-connectivity, 24/7 news cycles, and instant news alerts via social media, mental health researchers don't have a way to automatically detect news content which is associated with triggering anxiety or depression in mental health patients. Using the Associated Press news wire, a semantic network was built with 1,056 news articles containing over 500,000 connections across multiple topics to provide a personalized algorithm which detects problematic news content for a given reader. We make use of Semantic Network Analysis to surface the relationship between news article text and anxiety in readers who struggle with mental health disorders. …
A Data Science Approach To Defining A Data Scientist, Andy Ho, An Nguyen, Jodi L. Pafford, Robert Slater
SMU Data Science Review
In this paper, we present a common definition and list of skills for a Data Scientist using online job postings. The overlap and ambiguity of various roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician motivate the problem. To arrive at a single Data Scientist definition, we collect over 8,000 job postings from Indeed.com for the six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our data science methodology and analysis rendered the single definition of a data scientist: A data scientist codes, collaborates, and communicates – …
A Data Driven Approach To Forecast Demand, Hannah Kosinovsky, Sita Daggubati, Kumar Ramasundaram, Brent Allen
SMU Data Science Review
In this paper, we present a model and methodology for accurately predicting the following quarter’s sales volume of individual products given the previous five years of sales data. Forecasting product demand for a single supplier is complicated by seasonal demand variation, business cycle impacts, and customer churn. We developed a novel prediction methodology using machine learning, based upon a dense neural network (DNN) model that implicitly considers cyclical demand variation and explicitly considers customer churn while minimizing the absolute error between predicted demand and actual sales. Using parts sales data for a supplier to the oil and gas …
A Machine Learning Model For Clustering Securities, Vanessa Torres, Travis Deason, Michael Landrum, Nibhrat Lohia
SMU Data Science Review
In this paper, we evaluate the self-declared industry classifications and industry relationships between companies listed on either the Nasdaq or the New York Stock Exchange (NYSE) markets. Large corporations typically operate in multiple industries simultaneously; however, for investment purposes they are classified as belonging to a single industry. This simple classification obscures the actual industries within which a company operates, and, therefore, the investment risks of that company.
By using Natural Language Processing (NLP) techniques on Securities and Exchange Commission (SEC) filings, we obtained self-defined industry classifications per company. Using clustering techniques such as Hierarchical Agglomerative and k-means clustering we …
Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan
SMU Data Science Review
In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …
Machine Learning Predicts Aperiodic Laboratory Earthquakes, Olha Tanyuk, Daniel Davieau, Charles South, Daniel W. Engels
SMU Data Science Review
In this paper, we find a pattern of aperiodic seismic signals that precede earthquakes at any time in a laboratory earthquake’s cycle using a small window of time. We use a data set that comes from a classic laboratory experiment having several stick-slip displacements (earthquakes), a type of experiment which has been studied as a simulation of seismologic faults for decades. This data exhibits similar behavior to natural earthquakes, so the same approach may work in predicting their timing. Here we show that by applying a random forest machine learning technique to the acoustic signal emitted by a laboratory …
Longitudinal Analysis With Modes Of Operation For AES, Dana Geislinger, Cory Thigpen, Daniel W. Engels
SMU Data Science Review
In this paper, we present an empirical evaluation of the randomness of the ciphertext blocks generated by the Advanced Encryption Standard (AES) cipher in Counter (CTR) mode and in Cipher Block Chaining (CBC) mode. Vulnerabilities have been found in the AES cipher that may lead to a reduction in the randomness of the generated ciphertext blocks that can result in a practical attack on the cipher. We evaluate the randomness of the AES ciphertext using the standard key length and NIST randomness tests. We evaluate the randomness through a longitudinal analysis on 200 billion ciphertext blocks using logistic regression and …
Pristine Sentence Translation: A New Approach To A Timeless Problem, Meenu Ahluwalia, Brian Coari, Ben Brock
SMU Data Science Review
Pristine Sentence Translation (PST) is a new approach to language translation based upon sentence-level granularity. Traditional translation approaches, including those utilizing advanced machine learning or neural network-based approaches, translate on a word-by-word or phrase-by-phrase basis, thereby potentially missing the context or meaning of the complete sentence. Instead of these piecewise translations, PST utilizes deep learning and predictive modeling techniques to translate complete sentences from their source language into their target language. With these approaches we were able to translate sentences that closely conveyed the meaning of the original sentences. Our results demonstrated that PST’s method of translating an entire …
Self-Driving Cars: Evaluation Of Deep Learning Techniques For Object Detection In Different Driving Conditions, Ramesh Simhambhatla, Kevin Okiah, Shravan Kuchkula, Robert Slater
SMU Data Science Review
Deep Learning has revolutionized Computer Vision, and it is the core technology behind the capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection. A number of successful object detection systems based on CNNs have been proposed in recent years. In this paper, an empirical evaluation of three recent meta-architectures, SSD (Single Shot multi-box Detector), R-CNN (Region-based CNN), and R-FCN (Region-based Fully Convolutional Networks), was conducted to measure how fast and accurate they are in identifying objects on the road, such as vehicles, pedestrians, …
Identification And Classification Of Poultry Eggs: A Case Study Utilizing Computer Vision And Machine Learning, Jeremy Lubich, Kyle Thomas, Daniel W. Engels
SMU Data Science Review
We developed a method to identify, count, and classify chickens and eggs inside nesting boxes of a chicken coop. Utilizing an IoT AWS DeepLens camera for data capture and inference, we trained and deployed a custom single-shot multibox detector (SSD) object detection and classification model. This allows us to monitor a complex environment with multiple chickens and eggs moving and appearing simultaneously within the video frames. The models can label video frames with classifications for 8 breeds of chickens and/or 4 colors of eggs, with 98% accuracy on chickens or eggs alone and 82.5% accuracy while detecting both types of …
Analysis Of Computer Audit Data To Create Indicators Of Compromise For Intrusion Detection, Steven Millett, Michael Toolin, Justin Bates
SMU Data Science Review
Network security systems are designed to identify and, if possible, prevent unauthorized access to computer and network resources. Today most network security systems consist of hardware and software components that work in conjunction with one another to present a layered line of defense against unauthorized intrusions. Software provides user-interactive layers such as password authentication, and system-level layers for monitoring network activity. This paper examines an application monitoring network traffic that attempts to identify Indicators of Compromise (IOC) by extracting patterns in the network traffic that likely correspond to unauthorized access. Typical network log data and construct indicators are …
Visualization And Machine Learning Techniques For NASA’s EM-1 Big Data Problem, Antonio P. Garza III, Jose Quinonez, Misael Santana, Nibhrat Lohia
SMU Data Science Review
In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data (about 90 TB) to determine available launch opportunities. We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …
Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi
SMU Data Science Review
Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data regarding celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …