Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 86

Full-Text Articles in Physical Sciences and Mathematics

Image-Based Malware Classification With Convolutional Neural Networks And Extreme Learning Machines, Mugdha Jain Dec 2019

Master's Projects

Research in the field of malware classification often relies on machine learning models that are trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this research, we conduct experiments to train and evaluate machine learning models for malware classification, based on features that can be obtained without disassembly or execution of code. Specifically, we visualize malware samples as images and employ image analysis techniques. In this context, we focus on two machine learning models, namely, Convolutional Neural Networks (CNN) and Extreme …
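
As an illustration of how a binary can be viewed as an image without disassembly, the sketch below lays out raw bytes as rows of grayscale pixels. The row width of 256 and the zero padding of the last row are assumptions made for this example; the abstract does not state how the project builds its images.

```python
def bytes_to_image_rows(data: bytes, width: int = 256) -> list[list[int]]:
    """Lay out raw bytes as rows of grayscale pixel values (0-255).

    `width` is a hypothetical choice; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    remainder = len(data) % width
    if remainder:
        # Pad the tail with zero bytes so every row has the same length.
        data = data + bytes(width - remainder)
    return [list(data[i:i + width]) for i in range(0, len(data), width)]


# Toy input standing in for the contents of an executable file.
rows = bytes_to_image_rows(bytes(range(10)), width=4)
```

The resulting rows can be passed to any image library or fed directly to a classifier as a 2D array.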


Hot Fusion Vs Cold Fusion For Malware Detection, Snehal Bichkar Dec 2019

Master's Projects

A fundamental problem in malware research consists of malware detection, that is, distinguishing malware samples from benign samples. This problem becomes more challenging when we consider multiple malware families. A typical approach to this multi-family detection problem is to train a machine learning model for each malware family and score each sample against all models. The resulting scores are then used for classification. We refer to this approach as “cold fusion,” since we combine previously trained models; no retraining of these base models is required when additional malware families are considered. An alternative approach is to train a single model …
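
The cold fusion idea can be sketched as follows. The per-family scorers here are hypothetical toy functions, and picking the highest-scoring family is only one simple way to use the scores; the abstract does not say how the project combines them.

```python
from typing import Callable, Dict, Sequence, Tuple

# A per-family model is represented as a scoring function (hypothetical).
Scorer = Callable[[Sequence[float]], float]


def cold_fusion_classify(
    sample: Sequence[float], models: Dict[str, Scorer]
) -> Tuple[str, Dict[str, float]]:
    """Score a sample against every family model and pick the best score."""
    scores = {family: score(sample) for family, score in models.items()}
    best_family = max(scores, key=scores.get)
    return best_family, scores


# Toy scorers: higher means "more like this family".
toy_models: Dict[str, Scorer] = {
    "family_a": lambda x: -abs(x[0] - 1.0),
    "family_b": lambda x: -abs(x[0] - 5.0),
}
label, scores = cold_fusion_classify([4.2], toy_models)

# Adding a family only adds a new scorer; existing ones are untouched.
toy_models["family_c"] = lambda x: -abs(x[0] - 4.0)
label_after, _ = cold_fusion_classify([4.2], toy_models)
```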


Detecting Myocardial Infarctions Using Machine Learning Methods, Aniruddh Mathur Dec 2019

Master's Projects

Myocardial Infarction (MI), commonly known as a heart attack, occurs when one of the three major blood vessels carrying blood to the heart gets blocked, causing the death of myocardial (heart) cells. If not treated immediately, MI may cause cardiac arrest, which can ultimately cause death. Risk factors for MI include diabetes, family history, and an unhealthy diet and lifestyle. Medical treatments include various types of drugs and surgeries, which can prove very expensive for patients due to high healthcare costs. Therefore, it is imperative that MI is diagnosed at the right time. Electrocardiography (ECG) is commonly used to detect MI. ECG …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Master's Projects

Inadequate experimental drug data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Assessing Wildfire Damage From High Resolution Satellite Imagery Using Classification Algorithms, Ai-Linh Alten Dec 2019

Master's Projects

Wildfire damage assessments are important information for first responders, government agencies, and insurance companies to estimate the cost of damages and to help provide relief to those affected by a wildfire. With the help of Earth Observation satellite technology, determining the burn area extent of a fire can be done with traditional remote sensing methods like Normalized Burn Ratio. Using Very High Resolution satellites can help give even more accurate damage assessments but will come with some tradeoffs; these satellites can provide higher spatial and temporal resolution at the expense of spectral resolution. As a wildfire burn area …
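
For reference, the Normalized Burn Ratio mentioned above is conventionally computed per pixel from near-infrared (NIR) and shortwave-infrared (SWIR) reflectance as (NIR - SWIR) / (NIR + SWIR), and burn severity is often estimated from the pre-fire minus post-fire difference (dNBR). The sketch below uses made-up reflectance values and is not taken from the project itself.

```python
def normalized_burn_ratio(nir: float, swir: float) -> float:
    """NBR = (NIR - SWIR) / (NIR + SWIR) for one pixel's reflectances."""
    denom = nir + swir
    if denom == 0:
        return 0.0  # Convention chosen here for pixels with no signal.
    return (nir - swir) / denom


def delta_nbr(pre: tuple, post: tuple) -> float:
    """dNBR = pre-fire NBR minus post-fire NBR; larger suggests more burn."""
    return normalized_burn_ratio(*pre) - normalized_burn_ratio(*post)


# Hypothetical (NIR, SWIR) reflectances before and after a fire.
pre_fire = (0.5, 0.1)
post_fire = (0.2, 0.3)
severity = delta_nbr(pre_fire, post_fire)
```

Healthy vegetation reflects strongly in NIR and weakly in SWIR, so NBR drops after a burn and dNBR is positive for burned pixels.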


Toward Early Detection Of Pancreatic Cancer: An Evidence-Based Approach, Omid Sharagi Dec 2019

Master's Projects

This study examines how an evidential reasoning approach can be used as a diagnostic tool for early detection of pancreatic cancer. The evidential reasoning model combines the output of a linear Support Vector Classifier (SVC) with factors such as smoking history, health history, biopsy location, NGS technology used, and more to predict the likelihood of the disease. The SVC was trained using genomic data of pancreatic cancer patients derived from the National Cancer Institute (NCI) Genomic Data Commons (GDC). To test the evidential reasoning model, a variety of synthetic data was compiled to test the impact of combinations of different …


Image-Based Localization Of User-Interfaces, Riti Gupta Dec 2019

Master's Projects

Image localization corresponds to translating the text present in images from one language to another. The aim of the project is to develop a methodology to translate the text in image captions from English to Hindi by taking the context of the images into account. A lot of work has been done in this field [22], but our aim was to explore whether the accuracy can be further improved by considering the additional information imparted by the images apart from the text. We have explored Deep Learning using neural networks for this project. In particular, Recurrent Neural Networks …


3D Shape Prediction On Convolutional Deep Belief Networks, Gregory Y. Enriquez Dec 2019

Master's Projects

The field of image recognition software has grown immensely in recent years with the emergence of new deep learning techniques. Deep belief networks inspired by Hinton [11] were one of the earliest methodologies of deep learning in the late 2000s. More recently, convolutional neural networks have been used in deep learning techniques, architecture, and software to identify patterns in imagery in order to make predictions such as classification, image segmentation, etc. Traditional two-dimensional (2D) images, stored as picture files, typically contain red, green, and blue color data for each individual pixel in the picture. However, more recent commercial 2.5D …


Music Retrieval System Using Query-By-Humming, Parth Patel Dec 2019

Master's Projects

Music Information Retrieval (MIR) is a particular research area of great interest because there are various strategies to retrieve music. To retrieve music, it is important to find a similarity between the input query and the matching music. Several solutions have been proposed that are currently being used in application domains such as Query-by-Example (QBE), which takes a sample of an audio recording playing in the background and retrieves the result. However, there is no efficient approach to solve this problem in a Query-by-Humming (QBH) application. In a Query-by-Humming application, the aim is to retrieve music that is …


A Hybrid Approach For Multi-Document Text Summarization, Rashmi Varma Dec 2019

Master's Projects

Text summarization has been a long-studied topic in the field of natural language processing. There have been various approaches for both extractive and abstractive text summarization. Summarizing text for a single document is a methodical task, but summarizing multiple documents poses a greater challenge. This thesis explores the application of Latent Semantic Analysis, Text-Rank, Lex-Rank and Reduction algorithms for single document text summarization and compares them with the proposed approach of creating a hybrid system combining each of the above algorithms, individually, with Restricted Boltzmann Machines for multi-document text summarization and analyzing how all …


Predicting Switch-Like Behavior In Proteins Using Logistic Regression On Sequence-Based Descriptors, Benjamin Strauss Jul 2019

Master's Projects

Ligands can bind at specific protein locations, inducing conformational changes such as those involving secondary structure. Identifying these possible switches from sequence, including homology, is an important ongoing area of research. We attempt to predict possible secondary structure switches from sequence in proteins using machine learning, specifically a logistic regression approach with 48 N-acetyltransferases as our learning set and 5 sirtuins as our test set. Validated residue binary assignments of 0 (no change in secondary structure) and 1 (change in secondary structure) were determined (DSSP) from 3D X-ray structures for sets of virtually identical chains crystallized under different conditions. Our …


Fast High Resolution Image Completion, Chinmay Mishra May 2019

Master's Projects

This paper presents a method for image completion, an active research area in the field of computer vision. The method described in the paper aims to achieve results comparable to other state-of-the-art methods with an approximately 4.5-fold reduction in training time. It is a two-step procedure that involves image completion and enhancing the resolution of the completed image. We use the SSIM metric to evaluate the quality of the completed image, and we also time our model against other image completion models.
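
To make the SSIM metric concrete, the sketch below computes a simplified, single-window SSIM over two flattened grayscale images. This is an assumption-laden simplification: the usual SSIM averages the same formula over local sliding windows, and the abstract does not say which implementation the project uses. The constants 0.01 and 0.03 are the commonly used stabilizers.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized flattened images."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilize the ratio when means/variances are tiny.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy 2x2 images, flattened row by row (hypothetical pixel values).
original = [10.0, 50.0, 200.0, 120.0]
completed = [12.0, 45.0, 190.0, 130.0]
same_score = global_ssim(original, original)
diff_score = global_ssim(original, completed)
```

Identical images score 1, and the score falls as the completed image diverges from the original in luminance, contrast, or structure.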


Learning For Free – Object Detectors Trained On Synthetic Data, Charles Thane Mackay May 2019

Master's Projects

A picture is worth a thousand words, or if you want it labeled, it’s worth about four cents per bounding box. Data is the fuel that powers modern technologies run by artificial intelligence engines, and it is increasingly valuable in today’s industry. High-quality labeled data is the most important factor in producing accurate machine learning models, which can be used to make powerful predictions and identify patterns humans may not see. Acquiring high-quality labeled data, however, can be expensive and time-consuming. For small companies, academic researchers, or machine learning hobbyists, gathering large datasets for a specific task that …


Detecting CRISPR Arrays Using Long-Short Term Memory Network, Shantanu Deshmukh May 2019

Master's Projects

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) is a sequence found in the DNA of an organism. It provides immunity to the organism. Recently, it was found that the CRISPR-based immunity mechanism can be manipulated to perform genome editing. The problem is that it is hard to know the specificity of this system, which in turn makes it difficult to render the system highly specific. More research is required to improve this CRISPR-based genome editing. Detecting CRISPR arrays in the DNA sequence is the first step towards this research. In this work, a CRISPR array detection pipeline, CRISPRLstm, is proposed. …


Designing Single Guide RNAs For CRISPR/Cas9, Neha Atul Bhagwat May 2019

Master's Projects

Researchers have been working towards the development of tools to facilitate regular use of genome engineering techniques. In recent years, the focus of these efforts has been the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. These systems, while found naturally in bacteria and archaea as an immunity mechanism, can be used for genome engineering in eukaryotes.

There are three major computational challenges associated with the use of CRISPR/Cas9 in genome engineering for mammals: identification of CRISPR arrays, single guide RNA design, and minimizing off-target effects. This project attempts to solve the problem of single guide RNA design using a novel …


Machine Learning In Crop Classification Of Temporal Multispectral Satellite Image, Ravali Koppaka May 2019

Master's Projects

Recently, there has been a remarkable growth in Artificial Intelligence (AI) with the development of efficient AI models and high-power computational resources for processing complex datasets. There has been a growing number of applications of machine learning in satellite remote sensing image data processing. In this work, machine learning methods were applied for crop classification of temporal multispectral satellite images to achieve better prediction of crop-wise area statistics. In India, agriculture has a huge impact on the national economy and most of the critical decisions are dependent on agricultural statistics. Sentinel-2 satellite image data for the Guntur district region …


Randition: Random Blockchain Partitioning For Write Throughput, David Nguyen May 2019

Master's Projects

This paper proposes to support dynamic runtime partitioning of Tendermint, which is an in-development state machine replication algorithm that uses the blockchain model to provide Byzantine fault tolerance. We call this variation Randition. We incorporate recent research from blockchain consensus and replicated state machine partitioning to allow Randition users to partition their blockchain for improved write performance at the cost of some Byzantine fault tolerance. We conduct an experiment to compare the raw write throughput of Randition and Tendermint. Finally, we discuss the experimental results and further improvements to Randition.


Context-Based Multi-Stage Offline Handwritten Mathematical Symbol Recognition Using Deep Learning, Sui Kun Guan May 2019

Master's Projects

We propose a multi-stage machine learning (ML) architecture to improve the accuracy of offline handwritten mathematical symbol recognition. In the first stage, we train and assemble multiple deep convolutional neural networks to classify isolated mathematical symbols. However, certain ambiguous symbols are hard to classify without the context information of the mathematical expressions where the symbols belong. In the second stage, we train a deep convolutional neural network that further classifies the ambiguous symbols based on the context information of the symbols. To further improve the classification accuracy, in the third stage, we develop a set of rules to classify the …


Music Mood Classification Using Convolutional Neural Networks, Revanth Akella May 2019

Master's Projects

Grouping music into moods is useful as music migrates to online streaming services, since it can help in recommendations. To establish the connection between music and mood, we develop an end-to-end, open source approach for mood classification using lyrics. We develop a pipeline for tag extraction, lyric extraction, and establishing classification models for classifying music into moods. We investigate techniques to classify music into moods using lyrics and audio features. Using various natural language processing methods with machine learning and deep learning, we perform a comparative study across different classification and mood models. The results indicate that features …


An Industry Driven Genre Classification Application Using Natural Language Processing, Sharan Duggirala May 2019

Master's Projects

With the advent of digitized music, many online streaming companies such as Spotify have capitalized on a listener’s need for a common streaming platform. An essential component of such a platform is the recommender system that suggests related tracks, albums, and artists to the user base. In order to sustain such a recommender system, labeling data to indicate which genre it belongs to is essential. Most recent academic publications that deal with music genre classification focus on the use of deep neural networks developed and applied within the music genre classification domain. This thesis attempts to use some of …


SQL Injection Detection Using Machine Learning, Sonali Mishra May 2019

Master's Projects

Sharing information over the Internet across multiple platforms and web applications has become quite common in recent times. Web-based applications that accept critical information from users store this information in databases. These applications and the databases connected to them are susceptible to all kinds of information security threats because they are accessible through the Internet. The threats include attacks such as Cross-Site Scripting (XSS), Denial of Service (DoS), and Structured Query Language (SQL) Injection attacks. SQL Injection attacks rank among the top ten vulnerabilities of web-based applications. Through this kind of attack, …


A WebRTC Video Chat Implementation Within The Yioop Search Engine, Yangcha Ho May 2019

Master's Projects

Web real-time communication (abbreviated as WebRTC) is one of the latest Web application technologies that allows voice, video, and data to work collectively in a browser without a need for third-party plugins or proprietary software installation. When two browsers from different locations communicate with each other, they must know how to locate each other, bypass security and firewall protections, and transmit all multimedia communications in real time. This project not only illustrates how WebRTC technology works but also walks through a real example of a video chat-style application. The application communicates between two remote users using WebSocket and the data encryption …


Learning To Play The Trading Game, Neeraj Kulkarni May 2019

Master's Projects

Can we train a stock trading bot that can make decisions in high-entropy environments like stock markets to generate profits based on some optimal policy? Can we further extend this learning for any general trading problem? Quantitative algorithms are responsible for more than 75% of the stock trading around the world. Creating a stock market prediction model is comparatively easy. But creating a profitable prediction model is still considered a challenging task in the field of machine learning and deep learning due to the unpredictability of the financial markets. Using biologically inspired computing techniques of …


Intelligent Log Analysis For Anomaly Detection, Steven Yen May 2019

Master's Projects

Computer logs are a rich source of information that can be analyzed to detect various issues. The large volumes of logs limit the effectiveness of manual approaches to log analysis. The earliest automated log analysis tools take a rule-based approach, which can only detect known issues with existing rules. On the other hand, anomaly detection approaches can detect new or unknown issues. This is achieved by looking for unusual behavior different from the norm, often utilizing machine learning (ML) or deep learning (DL) models. In this project, we evaluated various ML and DL techniques used for log anomaly detection. We …


Improving Steering Ability Of An Autopilot In A Fully Autonomous Car, Shivanku Mahna May 2019

Master's Projects

The world we live in is developing at a rapid pace, and the technology we use is developing along with it. We have clearly come a long way from calling a car modern because it had a touch screen infotainment system to calling it modern because it drives on its own. The progress has been so rapid that it demands that we analyze it and try to improve a small part of this journey. With the same thought in mind, this project focuses on improving the steering ability of an autonomous car. In order to make more …


Breaking Audio Captcha Using Machine Learning/Deep Learning And Related Defense Mechanism, Heemany Shekhar May 2019

Master's Projects

CAPTCHA is a web-based authentication method used by websites to distinguish between humans (valid users) and bots (attackers). Audio captcha is an accessible captcha meant for visually impaired users, such as color-blind, blind, or near-sighted users. In this project, I analyzed the security of audio captchas against attacks that employ machine learning and deep learning models. Audio captchas of varying lengths (5, 7 and 10) and varying background noise (no noise, medium noise or high noise) were analyzed. I found that audio captchas with no background noise or medium background noise were easily attacked with 99% - 100% accuracy. …


Deep Learning Based Real Time Devanagari Character Recognition, Aseem Chhabra May 2019

Master's Projects

Advances in the technology behind optical character recognition (OCR) have helped it become a technology with many uses across industry. Today, OCR is available for several languages and has the capability to recognize characters in real time, but there are some languages for which this technology has not developed much. All these advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep Neural Networks have proven to be the best choice when it comes to a task involving recognition. There are …


Graph Classification Using Machine Learning Algorithms, Monica Golahalli Seenappa May 2019

Master's Projects

In the graph classification problem, we are given a family of graphs and a set of categories, and we aim to classify all the graphs of the family into the given categories. Earlier approaches, such as graph kernels and graph embedding techniques, have focused on extracting certain features by processing the entire graph. However, real-world graphs are complex and noisy, and these traditional approaches are computationally intensive. With the introduction of the deep learning framework, there have been numerous attempts to create more efficient classification approaches.

For this project, we will be focusing on modifying an existing kernel graph …


Tsar : A System For Defending Hate Speech Detection Models Against Adversaries, Brian Tuan Khieu May 2019

Master's Projects

Although current state-of-the-art hate speech detection models achieve praiseworthy results, these models have shown themselves to be vulnerable to attack. Easy-to-execute lexical manipulations, such as the removal of whitespace from a given text, create significant issues for word-based hate speech detection models. In this paper, we reproduce the results of five cutting-edge models as well as four significant evasion schemes from prior work. Only a limited number of evasion schemes that also maintain readability exist, and this works to our advantage in the recreation of the original data. Furthermore, we demonstrate that each lexical attack or evasion …


Influence Analysis Based On Political Twitter Data, Jace Rose May 2019

Master's Projects

Studies of online behavior often consider how users interact online, their posting behaviors, what they are tweeting about, and how likely they are to follow other people. The problem is that there is no deeper study of the people that a user has interacted with and how these other users affect them. This study examines whether it is possible to draw similar sentiment from users with whom the target user has interacted. The data collection process gathers data from Twitter users posting to popular political hashtags, the most popular of which at the time of publication were #MAGA and #TRUMP, as well …