
Artificial Intelligence and Robotics

San Jose State University

Word2Vec

Articles 1 - 5 of 5

Full-Text Articles in Physical Sciences and Mathematics

Malware Classification Using API Call Information And Word Embeddings, Sahil Aggarwal Jan 2023

Master's Projects

Malware classification is the process of classifying malware into recognizable categories and is an integral part of implementing computer security. In recent times, machine learning has emerged as one of the most suitable techniques to perform this task. Models can be trained on various malware features, such as opcodes and API calls among many others, to deduce information that is helpful in the classification.

Word embeddings are a key part of natural language processing and can be seen as a representation of text wherein similar words will have closer representations. These embeddings can be used to discover a quantifiable …
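
To make the approach described in this abstract concrete, here is a minimal sketch (not code from the project itself) of embedding API call sequences with gensim's Word2Vec and averaging the resulting vectors into fixed-length features for a classifier; the API names, labels, and hyperparameters below are illustrative assumptions.

from gensim.models import Word2Vec
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy API-call traces (placeholders); real data would come from sandbox reports.
traces = [
    ["CreateFileA", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExA", "RegSetValueExA", "RegCloseKey"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
]
labels = [0, 0, 1]  # hypothetical family labels

# Learn an embedding for each API call from its co-occurrence context.
w2v = Word2Vec(sentences=traces, vector_size=32, window=2, min_count=1, epochs=50)

# Represent each sample as the mean of its API-call vectors.
def embed(trace):
    return np.mean([w2v.wv[call] for call in trace], axis=0)

X = np.vstack([embed(t) for t in traces])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict(X))

Averaging the per-call vectors is only one possible pooling choice; the project may use a different way of turning call-level embeddings into sample-level features.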


Malware Classification Using Graph Neural Networks, Manasa Mananjaya Jan 2023

Master's Projects

Word embeddings are widely recognized as important in natural language processing for capturing semantic relationships between words. In this study, we conduct experiments to explore the effectiveness of word embedding techniques in classifying malware. Specifically, we evaluate the performance of a Graph Neural Network (GNN) applied to knowledge graphs constructed from opcode sequences of malware files. In the first set of experiments, a Graph Convolutional Network (GCN) is applied to knowledge graphs built with different word embedding techniques such as Bag-of-words, TF-IDF, and Word2Vec. Our results indicate that Word2Vec produces the most effective word embeddings, serving as a baseline for comparison …
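
As a rough sketch of the kind of pipeline this abstract describes (an illustration under assumptions, not the project's code): build a co-occurrence graph over opcodes and apply one graph-convolution propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), here in plain NumPy with a toy opcode sequence.

import numpy as np

# Toy opcode sequence; a real graph would be built from many malware samples.
opcodes = ["mov", "push", "call", "mov", "add", "call", "ret"]
vocab = sorted(set(opcodes))
idx = {op: i for i, op in enumerate(vocab)}
n = len(vocab)

# Adjacency from adjacent-opcode co-occurrence (undirected).
A = np.zeros((n, n))
for a, b in zip(opcodes, opcodes[1:]):
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0

# Normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2
A_hat = A + np.eye(n)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

# Node features H: one-hot here; Word2Vec or TF-IDF opcode vectors could be
# substituted, which is the comparison the abstract describes.
H = np.eye(n)
W = np.random.randn(n, 16) * 0.1        # learnable weights in a real model

H_next = np.maximum(A_norm @ H @ W, 0)  # one GCN layer with ReLU
print(H_next.shape)                      # (num_opcodes, 16)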


Malware Classification With BERT, Joel Lawrence Alvares May 2021

Master's Projects

Malware classification is used to distinguish different types of malware from one another.

This project aims to carry out malware classification using word embeddings, which are used in Natural Language Processing (NLP) to identify and evaluate the relationships between the words of a sentence. Word embeddings are generated by BERT and Word2Vec for malware samples to carry out multi-class classification. BERT is a transformer-based pre-trained natural language processing (NLP) model that can be used for a wide range of tasks such as question answering, paraphrase generation, and next-sentence prediction. However, the attention mechanism of a pre-trained BERT model can …
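
A minimal sketch of extracting BERT embeddings for classification with the Hugging Face transformers library; treating each malware sample as a whitespace-joined opcode string, the model checkpoint, and the labels are assumptions made here for illustration, not details taken from the project.

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# Hypothetical samples: opcode sequences rendered as text.
samples = ["mov push call add ret", "push push call jmp ret"]
labels = [0, 1]  # placeholder family labels

with torch.no_grad():
    enc = tokenizer(samples, padding=True, truncation=True, return_tensors="pt")
    out = bert(**enc)
    # Use the [CLS] token's final hidden state as a fixed-length embedding.
    feats = out.last_hidden_state[:, 0, :].numpy()

clf = LogisticRegression(max_iter=1000).fit(feats, labels)
print(clf.predict(feats))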


Word Embedding Techniques For Malware Classification, Aniket Chandak May 2020

Master's Projects

Word embeddings are often used in natural language processing as a means to quantify relationships between words. More generally, these same word embedding techniques can be used to quantify relationships between features. In this paper, we conduct a series of experiments that are designed to determine the effectiveness of word embeddings in the context of malware classification. First, we conduct experiments where hidden Markov models (HMMs) are directly applied to opcode sequences. These results serve to establish a baseline for comparison with our subsequent word embedding experiments. We then experiment with word embedding vectors derived from HMMs, a technique that …
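
To make the HMM baseline concrete, here is a hedged sketch (my own illustration, assuming a recent hmmlearn with CategoricalHMM and a toy four-opcode alphabet) of training a discrete HMM on an opcode sequence; the converged emission matrix can also be read column-wise as per-opcode feature vectors, in the spirit of the HMM-derived embeddings mentioned above.

import numpy as np
from hmmlearn import hmm

# Toy opcode sequence encoded as integer symbols (assumed 4-symbol alphabet).
opcode_ids = np.array([[0], [1], [2], [0], [3], [2], [1], [0]])

# Train a 2-hidden-state discrete HMM on the sequence.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(opcode_ids)

# Log-likelihood scores like this can be used to rank candidate families.
print(model.score(opcode_ids))

# Columns of the emission matrix give one vector per opcode symbol, usable as
# HMM-derived features for a downstream classifier.
print(model.emissionprob_.T.shape)  # (n_symbols, n_states)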


Comparison Of Word2vec With Hash2vec For Machine Translation, Neha Gaikwad May 2020

Master's Projects

Machine Translation is the study of computer translation of a text written in one human language into text in a different language. Within this field, a word embedding is a mapping from terms in a language into low-dimensional vectors that can be processed using mathematical operations. Two traditional word embedding approaches are word2vec, which uses a neural network, and hash2vec, which is based on a simpler hashing algorithm. In this project, we have explored the relative suitability of each approach to sequence-to-sequence text translation using a Recurrent Neural Network (RNN). We also carried out experiments to test …
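
As a rough illustration of the difference between the two approaches (my own sketch, not the project's code): hash2vec-style vectors can be built without any training by hashing each context word into a fixed number of buckets, with a second hash choosing a sign, whereas word2vec vectors are learned by training a shallow neural network (for example with gensim). The dimension, corpus, and hash choices below are assumptions.

import hashlib
from collections import defaultdict

DIM = 16  # embedding dimension (number of hash buckets); an assumption

def bucket_and_sign(word):
    # Deterministic hash of a context word -> (bucket index, +1/-1 sign).
    h = int(hashlib.md5(word.encode()).hexdigest(), 16)
    return h % DIM, 1.0 if (h >> 16) % 2 == 0 else -1.0

def hash2vec(sentences, window=2):
    # Accumulate signed context counts into fixed-size vectors, no training.
    vecs = defaultdict(lambda: [0.0] * DIM)
    for sent in sentences:
        for i, target in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i == j:
                    continue
                b, s = bucket_and_sign(sent[j])
                vecs[target][b] += s
    return vecs

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vectors = hash2vec(corpus)
print(vectors["cat"])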