Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Discipline
Articles 1 - 3 of 3
Full-Text Articles in Physical Sciences and Mathematics
Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu
Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu
Master's Projects
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets …
Image Spam Detection, Aneri Chavda
Image Spam Detection, Aneri Chavda
Master's Projects
Email is one of the most common forms of digital communication. Spam can be de ned as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers so as to evade text-based spam lters and hence it poses a threat to email based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.
Malware Detection Using The Index Of Coincidence, Bhavna Gurnani
Malware Detection Using The Index Of Coincidence, Bhavna Gurnani
Master's Projects
In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We nd that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.