Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

San Jose State University

2017

Machine learning

Articles 1 - 3 of 3

Full-Text Articles in Physical Sciences and Mathematics

Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu May 2017

Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu

Master's Projects

Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets …


Image Spam Detection, Aneri Chavda May 2017

Image Spam Detection, Aneri Chavda

Master's Projects

Email is one of the most common forms of digital communication. Spam can be de ned as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers so as to evade text-based spam lters and hence it poses a threat to email based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.


Malware Detection Using The Index Of Coincidence, Bhavna Gurnani Jan 2017

Malware Detection Using The Index Of Coincidence, Bhavna Gurnani

Master's Projects

In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We nd that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.