Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Malware Classification Using API Call Information And Word Embeddings, Sahil Aggarwal Jan 2023

Master's Projects

Malware classification is the process of classifying malware into recognizable categories and is an integral part of implementing computer security. In recent times, machine learning has emerged as one of the most suitable techniques to perform this task. Models can be trained on various malware features, such as opcodes and API calls, among many others, to deduce information that is helpful for classification.

Word embeddings are a key part of natural language processing and can be seen as a representation of text wherein similar words will have closer representations. These embeddings can be used to discover a quantifiable …
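The idea that similar words have closer representations can be illustrated with cosine similarity. The sketch below is not taken from the project: the three-dimensional vectors are hand-picked, hypothetical values attached to API call names purely for illustration, and `most_similar` is a helper introduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical embeddings (assumption): real embeddings would be learned
# from API call sequences and have far more dimensions.
embeddings = {
    "CreateFileW": [0.9, 0.1, 0.0],
    "WriteFile": [0.8, 0.2, 0.1],
    "RegSetValueExW": [0.1, 0.9, 0.3],
}

def most_similar(name, table):
    """Return the other key whose vector is closest to table[name]."""
    others = [key for key in table if key != name]
    return max(others, key=lambda key: cosine_similarity(table[name], table[key]))

nearest = most_similar("CreateFileW", embeddings)
```

With these toy vectors, the two file-related calls end up closer to each other than either is to the registry call.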


Advancing The Ability To Predict Cognitive Decline And Alzheimer’s Disease Based On Genetic Variants Beyond Amyloid-Beta And Tau, Naveen Rawat Jun 2021

Master's Projects

A growing amount of neurodegenerative R&D is focused on identifying genomic-based explanations of AD that go beyond amyloid-beta (Aβ) and tau. The proposed effort involves identifying some of the genomic variations, such as single nucleotide polymorphisms (SNPs), allele, chromosome, and epigenetic contributors to MCI and AD, that go beyond Aβ and tau.

The project involves building a prediction model based on a support vector machine (SVM) classifier that takes into account the genomic variations and epigenetic factors to predict the early stage of mild cognitive impairment (MCI) and Alzheimer disease (AD). To achieve this, picking up important feature sets which …


Malware Classification Based On Hidden Markov Model And Word2vec Features, Aparna Sunil Kale May 2020

Master's Projects

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on a wide variety of features, including opcode sequences, API calls, and byte n-grams, among many others. In this research, we implement hybrid machine learning techniques, where we train hidden Markov models (HMM) and compute Word2Vec encodings based on opcode sequences. The resulting trained HMMs and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and deep neural network (DNN) classifiers. …
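As background on how an HMM relates to an opcode sequence, the sketch below computes the log-likelihood of a sequence under a given model with the scaled forward algorithm. It is an illustration under stated assumptions, not the project's code: the model parameters and the opcode-to-integer mapping are hypothetical, and training (for example with Baum-Welch) is not shown.

```python
import math

def forward_log_likelihood(pi, A, B, observations):
    """Return log P(observations | model) using the scaled forward algorithm.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    """
    if not observations:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(observations)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][observations[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow on long sequences; accumulate the log.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob

# Toy 2-state model over 3 hypothetical opcode symbols: 0=mov, 1=push, 2=call.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
score = forward_log_likelihood(pi, A, B, [0, 1, 2, 2])
```

A higher (less negative) score means the sequence fits the model better.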


JavaScript Metamorphic Malware Detection Using Machine Learning Techniques, Aakash Wadhwani May 2019

Master's Projects

Various factors, such as defects in the operating system, email attachments from unknown sources, and downloading and installing software from untrusted sites, make computers vulnerable to malware attacks. Current antivirus techniques lack the ability to detect metamorphic viruses, which vary the internal structure of the original malware code across versions while keeping exactly the same behavior. Antivirus software typically relies on signature detection to identify a virus, but code morphing evades signature detection quite effectively.

JavaScript is used to generate metamorphic malware by changing the code’s Abstract Syntax Tree without changing the actual functionality, making it very difficult …


Classification Of Malware Models, Akriti Sethi May 2019

Master's Projects

Automatically classifying similar malware families is a challenging problem. In this research, we attempt to classify malware families by applying machine learning to machine learning models. Specifically, we train hidden Markov models (HMM) for each malware family in our dataset. The resulting models are then compared in two ways. First, we treat the HMM matrices as images and experiment with convolutional neural networks (CNN) for image classification. Second, we apply support vector machines (SVM) to classify the HMMs. We analyze the results and discuss the relative advantages and disadvantages of each approach.
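One way to picture treating HMM matrices as images is to place the transition and emission matrices side by side and scale each probability to an 8-bit grayscale value. The abstract does not say how the matrices were laid out or scaled, so the layout below and the function name `hmm_to_image` are assumptions made only for this sketch.

```python
def hmm_to_image(A, B):
    """Map HMM matrices to a grid of grayscale pixel values in 0..255.

    A is the N x N transition matrix and B is the N x M emission matrix.
    Row i of the result is row i of A followed by row i of B.
    """
    if len(A) != len(B):
        raise ValueError("A and B must have the same number of rows (states)")
    image = []
    for a_row, b_row in zip(A, B):
        row = list(a_row) + list(b_row)
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError("matrix entries must be probabilities in [0, 1]")
        image.append([round(255 * p) for p in row])
    return image

# Hypothetical 2-state model with 3 emission symbols.
A = [[1.0, 0.0], [0.2, 0.8]]
B = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
pixels = hmm_to_image(A, B)
```

The resulting grid has one row per hidden state and could be passed to any image classifier that accepts fixed-size grayscale input.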


Topic Classification Using Hybrid Of Unsupervised And Supervised Learning, Jayant Shelke May 2019

Master's Projects

There has been research around the idea of representing words in text as vectors, and many models have been proposed that vary in performance as well as applications. Text processing is used for content recommendation, sentiment analysis, plagiarism detection, content creation, and language translation, to name a few. Specifically, we want to look at the problem of topic detection in the text content of articles, blogs, and summaries. With the huge amount of text content published every minute on the internet, it is imperative that we have very good algorithms and approaches to analyze all the content and be able to classify most of …


Support Vector Machines For Image Spam Analysis, Aneri Chavda, Katerina Potika, Fabio Di Troia, Mark Stamp Jan 2018

Faculty Publications, Computer Science

Email is one of the most common forms of digital communication. Spam is unsolicited bulk email, while image spam consists of spam text embedded inside an image. Image spam is used as a means to evade text-based spam filters, and hence image spam poses a threat to email-based communication. In this research, we analyze image spam detection using support vector machines (SVMs), which we train on a wide variety of image features. We use a linear SVM to quantify the relative importance of the features under consideration. We also develop and analyze a realistic “challenge” dataset that illustrates the limitations …
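Using a linear SVM to gauge feature importance usually means comparing the magnitudes of the learned weights. The sketch below shows that idea on made-up data with a small subgradient-descent trainer. It is not the paper's procedure: the data, the feature names, and the hyperparameters are all hypothetical, and the comparison assumes features on comparable scales.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by full-batch subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors; y holds labels in {-1, +1}.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to hinge loss
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n_samples
                grad_b -= yi / n_samples
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def rank_features(w, names):
    """Return (name, weight) pairs sorted by decreasing absolute weight."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)

# Made-up data: the first feature separates the classes, the second is noise.
names = ["informative", "noise"]
X = [[2.0, 0.3], [1.5, -0.2], [1.8, 0.1], [-2.0, 0.2], [-1.6, -0.3], [-1.9, 0.1]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
ranking = rank_features(w, names)
```

Weight magnitudes are only comparable when features share a scale, so standardizing the inputs first is a common precaution.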


Predicting Pancreatic Cancer Using Support Vector Machine, Akshay Bodkhe May 2017

Master's Projects

This report presents an approach to predicting pancreatic cancer using the support vector machine (SVM) classification algorithm. The research objective of this project is to predict pancreatic cancer using genomic data alone, clinical data alone, and a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to assess prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. In our results, we observed that …