Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 30 of 70

Full-Text Articles in Physical Sciences and Mathematics

Image-Based Malware Classification With Convolutional Neural Networks And Extreme Learning Machines, Mugdha Jain Dec 2019


Master's Projects

Research in the field of malware classification often relies on machine learning models that are trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this research, we conduct experiments to train and evaluate machine learning models for malware classification, based on features that can be obtained without disassembly or execution of code. Specifically, we visualize malware samples as images and employ image analysis techniques. In this context, we focus on two machine learning models, namely, Convolutional Neural Networks (CNN) and Extreme …
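
As a rough illustration of the idea of viewing a binary as an image without disassembly, the sketch below reshapes raw bytes into rows of grayscale pixel values. The function name `bytes_to_image`, the fixed row width, and the zero-padding of the last row are assumptions made for this example; the abstract does not state how the project builds its images.

```python
import math
from typing import List


def bytes_to_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    Hypothetical helper: the last row is zero-padded so every row has
    the same length. No disassembly or execution is involved.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Toy input standing in for the bytes of an executable file.
sample = bytes(range(40))
image = bytes_to_image(sample, width=16)
print(len(image), "rows x", len(image[0]), "columns")
```

An image classifier such as a CNN could then be trained on arrays of this shape; that step is outside this sketch.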


Hot Fusion Vs Cold Fusion For Malware Detection, Snehal Bichkar Dec 2019


Master's Projects

A fundamental problem in malware research consists of malware detection, that is, distinguishing malware samples from benign samples. This problem becomes more challenging when we consider multiple malware families. A typical approach to this multi-family detection problem is to train a machine learning model for each malware family and score each sample against all models. The resulting scores are then used for classification. We refer to this approach as “cold fusion,” since we combine previously trained models; no retraining of these base models is required when additional malware families are considered. An alternative approach is to train a single model …
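
The "cold fusion" idea of scoring a sample against every per-family model and classifying from the resulting scores can be sketched as follows. Everything concrete here is an assumption for illustration: the toy centroid-based scorers, the family names, the two-dimensional feature vectors, and the threshold rule are hypothetical and are not taken from the project.

```python
from typing import Callable, Dict, Sequence

Scorer = Callable[[Sequence[float]], float]


def make_centroid_scorer(centroid: Sequence[float]) -> Scorer:
    """Toy stand-in for a previously trained per-family model.

    Returns a scorer whose output is higher when the sample lies
    closer to the family centroid (negative squared distance).
    """
    def score(sample: Sequence[float]) -> float:
        return -sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return score


# One independently built model per family (hypothetical families).
family_models: Dict[str, Scorer] = {
    "family_a": make_centroid_scorer([0.9, 0.1]),
    "family_b": make_centroid_scorer([0.1, 0.9]),
}


def score_vector(sample: Sequence[float], models: Dict[str, Scorer]) -> Dict[str, float]:
    """Score one sample against every base model."""
    return {name: model(sample) for name, model in models.items()}


def classify(sample: Sequence[float], models: Dict[str, Scorer],
             threshold: float = -0.1) -> str:
    """Pick the best-scoring family, or 'benign' if no score clears the threshold."""
    scores = score_vector(sample, models)
    best = max(scores, key=lambda name: scores[name])
    return best if scores[best] >= threshold else "benign"


print(classify([0.85, 0.15], family_models))
print(classify([0.5, 0.5], family_models))

# Adding a family only adds a new base model; existing ones are untouched.
family_models["family_c"] = make_centroid_scorer([0.5, 0.5])
print(classify([0.5, 0.5], family_models))
```

The last three lines show the property the abstract highlights: a new family is handled by adding one more base model, with no retraining of the others.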


Detecting Myocardial Infarctions Using Machine Learning Methods, Aniruddh Mathur Dec 2019


Master's Projects

Myocardial Infarction (MI), commonly known as a heart attack, occurs when one of the three major blood vessels carrying blood to the heart gets blocked, causing the death of myocardial (heart) cells. If not treated immediately, MI may cause cardiac arrest, which can ultimately cause death. Risk factors for MI include diabetes, family history, and an unhealthy diet and lifestyle. Medical treatments include various types of drugs and surgeries, which can prove very expensive for patients due to high healthcare costs. Therefore, it is imperative that MI is diagnosed at the right time. Electrocardiography (ECG) is commonly used to detect MI. ECG …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019


Master's Projects

Inadequate experimental drug data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Assessing Wildfire Damage From High Resolution Satellite Imagery Using Classification Algorithms, Ai-Linh Alten Dec 2019


Master's Projects

Wildfire damage assessments give first responders, government agencies, and insurance companies important information for estimating the cost of damages and for helping provide relief to those affected by a wildfire. With the help of Earth Observation satellite technology, the burn area extent of a fire can be determined with traditional remote sensing methods like Normalized Burn Ratio. Using Very High Resolution satellites can give even more accurate damage assessments but comes with some tradeoffs; these satellites can provide higher spatial and temporal resolution at the expense of spectral resolution. As a wildfire burn area …
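
For readers unfamiliar with the Normalized Burn Ratio mentioned above, it is commonly defined per pixel as NBR = (NIR - SWIR) / (NIR + SWIR), and burn severity is often estimated from the pre-fire minus post-fire difference (dNBR). The sketch below computes both for a single pixel; the reflectance values are made up for illustration, and returning 0.0 when both bands are zero is an assumption of this example.

```python
from typing import Tuple


def nbr(nir: float, swir: float) -> float:
    """Normalized Burn Ratio for one pixel from NIR and SWIR reflectance."""
    denom = nir + swir
    # Assumption for this sketch: treat an all-zero pixel as NBR = 0.
    return 0.0 if denom == 0 else (nir - swir) / denom


def dnbr(pre: Tuple[float, float], post: Tuple[float, float]) -> float:
    """Differenced NBR: pre-fire NBR minus post-fire NBR.

    Each argument is a (nir, swir) pair; larger values suggest
    more severe burning.
    """
    return nbr(*pre) - nbr(*post)


# Made-up reflectances: healthy vegetation before, burned surface after.
pre_fire = (0.5, 0.1)
post_fire = (0.2, 0.3)
print(round(nbr(*pre_fire), 3), round(nbr(*post_fire), 3), round(dnbr(pre_fire, post_fire), 3))
```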


Toward Early Detection Of Pancreatic Cancer: An Evidence-Based Approach, Omid Sharagi Dec 2019


Master's Projects

This study examines how an evidential reasoning approach can be used as a diagnostic tool for early detection of pancreatic cancer. The evidential reasoning model combines the output of a linear Support Vector Classifier (SVC) with factors such as smoking history, health history, biopsy location, NGS technology used, and more to predict the likelihood of the disease. The SVC was trained using genomic data of pancreatic cancer patients derived from the National Cancer Institute (NCI) Genomic Data Commons (GDC). To test the evidential reasoning model, a variety of synthetic data was compiled to test the impact of combinations of different …


Image-Based Localization Of User-Interfaces, Riti Gupta Dec 2019


Master's Projects

Image localization corresponds to translating the text present in images from one language to another. The aim of the project is to develop a methodology to translate the text in image captions from English to Hindi by taking the context of the images into account. A lot of work has been done in this field [22], but our aim was to explore whether the accuracy can be further improved by considering the additional information imparted by the images apart from the text. We have explored Deep Learning using neural networks for this project. In particular, Recurrent Neural Networks …


A Hybrid Approach For Multi-Document Text Summarization, Rashmi Varma Dec 2019


Master's Projects

Text summarization has been a long-studied topic in the field of natural language processing. There have been various approaches to both extractive and abstractive text summarization. Summarizing a single document is a methodical task, but summarizing multiple documents poses a greater challenge. This thesis explores the application of Latent Semantic Analysis, Text-Rank, Lex-Rank and Reduction algorithms for single document text summarization and compares it with the proposed approach of creating a hybrid system combining each of the above algorithms, individually, with Restricted Boltzmann Machines for multi-document text summarization and analyzing how all …


3D Shape Prediction On Convolutional Deep Belief Networks, Gregory Y. Enriquez Dec 2019


Master's Projects

The field of image recognition software has grown immensely in recent years with the emergence of new deep learning techniques. Deep belief networks inspired by Hinton [11] were one of the earliest methodologies of deep learning in the late 2000s. More recently, convolutional neural networks have been used in deep learning techniques, architecture, and software to identify patterns in imagery in order to make predictions such as classification, image segmentation, etc. Traditional two-dimensional, or 2D, images stored as picture files, typically contain red, green, and blue color data for each individual pixel in the picture. However, more recent commercial 2.5D …


Music Retrieval System Using Query-By-Humming, Parth Patel Dec 2019


Master's Projects

Music Information Retrieval (MIR) is a particular research area of great interest because there are various strategies to retrieve music. To retrieve music, it is important to find a similarity between the input query and the matching music. Several solutions have been proposed that are currently being used in the application domain(s) such as Query-by-Example (QBE) which takes a sample of an audio recording playing in the background and retrieves the result. However, there is no efficient approach to solve this problem in a Query-by-Humming (QBH) application. In a Query-by-Humming application, the aim is to retrieve music that is …


Designing Single Guide RNAs For CRISPR/Cas9, Neha Atul Bhagwat May 2019


Master's Projects

Researchers have been working towards the development of tools to facilitate regular use of genome engineering techniques. In recent years, the focus of these efforts has been the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. These systems, while found naturally in bacteria and archaea as an immunity mechanism, can be used for genome engineering in eukaryotes.

There are three major computational challenges associated with the use of CRISPR/Cas9 in genome engineering for mammals: identification of CRISPR arrays, single guide RNA design, and minimizing off-target effects. This project attempts to solve the problem of single guide RNA design using a novel …


Context-Based Multi-Stage Offline Handwritten Mathematical Symbol Recognition Using Deep Learning, Sui Kun Guan May 2019


Master's Projects

We propose a multi-stage machine learning (ML) architecture to improve the accuracy of offline handwritten mathematical symbol recognition. In the first stage, we train and assemble multiple deep convolutional neural networks to classify isolated mathematical symbols. However, certain ambiguous symbols are hard to classify without the context information of the mathematical expressions where the symbols belong. In the second stage, we train a deep convolutional neural network that further classifies the ambiguous symbols based on the context information of the symbols. To further improve the classification accuracy, in the third stage, we develop a set of rules to classify the …


Fast High Resolution Image Completion, Chinmay Mishra May 2019


Master's Projects

This paper presents a method for image completion, an active research area in the field of computer vision. The method described in the paper aims at achieving results comparable to other state-of-the-art methods with an approximately four-and-a-half-fold reduction in training time. It is a two-step procedure which involves image completion and enhancing the resolution of the completed image. We use the SSIM metric to evaluate the quality of the completed image and also time our model against other image completion models.


Machine Learning In Crop Classification Of Temporal Multispectral Satellite Image, Ravali Koppaka May 2019


Master's Projects

Recently, there has been a remarkable growth in Artificial Intelligence (AI) with the development of efficient AI models and high-power computational resources for processing complex datasets. There has been a growing number of applications of machine learning in satellite remote sensing image data processing. In this work, machine learning methods were applied for crop classification of temporal multispectral satellite image to achieve better prediction of crop-wise area statistics. In India, agriculture has a huge impact on the national economy and most of the critical decisions are dependent on agricultural statistics. Sentinel-2 satellite image data for the Guntur district region …


Detecting CRISPR Arrays Using Long-Short Term Memory Network, Shantanu Deshmukh May 2019


Master's Projects

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) is a sequence found in the DNA sequence of an organism. It provides immunity to the organism. Recently, it was found that the CRISPR-based immunity mechanism can be manipulated to perform genome editing. The problem is that it is hard to know the specificity of this system and, in turn, making it highly specific is difficult. More research is required to improve this CRISPR-based genome editing. Detecting CRISPR arrays in the DNA sequence is the first step towards this research. In this work, a CRISPR array detection pipeline, CRISPRLstm, is proposed. …


Music Mood Classification Using Convolutional Neural Networks, Revanth Akella May 2019


Master's Projects

Grouping music into moods is useful as music migrates to online streaming services, since it can help in recommendations. To establish the connection between music and mood, we develop an end-to-end, open source approach for mood classification using lyrics. We develop a pipeline for tag extraction, lyric extraction, and establishing classification models for classifying music into moods. We investigate techniques to classify music into moods using lyrics and audio features. Using various natural language processing methods with machine learning and deep learning, we perform a comparative study across different classification and mood models. The results suggest that features …


Learning For Free – Object Detectors Trained On Synthetic Data, Charles Thane Mackay May 2019


Master's Projects

A picture is worth a thousand words, or if you want it labeled, it’s worth about four cents per bounding box. Data is the fuel that powers modern technologies run by artificial intelligence engines, and it is increasingly valuable in today’s industry. High-quality labeled data is the most important factor in producing accurate machine learning models which can be used to make powerful predictions and identify patterns humans may not see. Acquiring high-quality labeled data, however, can be expensive and time consuming. For small companies, academic researchers, or machine learning hobbyists, gathering large datasets for a specific task that …


SQL Injection Detection Using Machine Learning, Sonali Mishra May 2019


Master's Projects

Sharing information over the Internet across multiple platforms and web applications has become quite common in recent times. The web-based applications that accept critical information from users store this information in databases. These applications and the databases connected to them are susceptible to all kinds of information security threats due to being accessible through the Internet. The threats include attacks such as Cross-Site Scripting (XSS), Denial of Service (DoS), and Structured Query Language (SQL) Injection attacks. SQL Injection attacks are among the top ten vulnerabilities of web-based applications. Through this kind of attack, …
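
To make the attack class concrete, the sketch below contrasts a query built by string concatenation with a parameterized query, using Python's standard `sqlite3` module and an in-memory database. The table, column names, and payload are invented for this example; it illustrates the vulnerability itself, not the machine learning detector the project describes.

```python
import sqlite3
from typing import List

# Throwaway in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)


def lookup_unsafe(name: str) -> List[str]:
    """Vulnerable: user input is concatenated into the SQL text."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return sorted(row[0] for row in conn.execute(query))


def lookup_safe(name: str) -> List[str]:
    """Safer: the driver binds the input as a value, never as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return sorted(row[0] for row in conn.execute(query, (name,)))


# The injected OR clause makes the WHERE condition true for every row.
payload = "nobody' OR '1'='1"
print("unsafe:", lookup_unsafe(payload))
print("safe:  ", lookup_safe(payload))
```

With the concatenated query, the payload returns every stored secret; with the bound parameter, it is treated as an ordinary (non-matching) name.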


Poriferal Vision, Saketh Saxena May 2019


Master's Projects

Sponges provide nourishment as well as a habitat for various aquatic organisms. Anatomically, sponges are made up of soft tissue with a silica based exoskeleton which serves both as support and protection for the underlying tissue. The exoskeleton persists after the tissue decomposes, and microscopic parts of the exoskeleton break away to form spicules. Oceanographic studies have shown that the density of the sponge spicules is a good indicator of the sponge population in an area. This measure can be used to study sponge population dynamics over time. The spicule density is measured by imaging spicules from samples of water …


Classification Of Humans Into Ayurvedic Prakruti Types Using Computer Vision, Gayatri Gadre May 2019


Master's Projects

Ayurveda, a 5,000-year-old Indian medical science, holds that the universe, and hence humans, are made up of five elements, namely ether, fire, water, earth, and air. The three Doshas (Tridosha) Vata, Pitta, and Kapha originated from the combinations of these elements. Every person has a unique combination of Tridosha elements contributing to a person’s ‘Prakruti’. Prakruti governs the physiological and psychological tendencies in all living beings as well as the way they interact with the environment. This balance influences their physiological features like the texture and colour of skin, hair, eyes, length of fingers, the shape of the …


Using Computer Vision To Quantify Coral Reef Biodiversity, Niket Bhodia May 2019


Master's Projects

The preservation of the world’s oceans is crucial to human survival on this planet, yet we know too little to begin to understand anthropogenic impacts on marine life. This is especially true for coral reefs, which are the most diverse marine habitat per unit area (if not overall) as well as the most sensitive. To address this gap in knowledge, simple field devices called autonomous reef monitoring structures (ARMS) have been developed, which provide standardized samples of life from these complex ecosystems. ARMS have now become successful to the point that the amount of data collected through them has outstripped …


Over Speed Detection Using Artificial Intelligence, Samkit Patira May 2019


Master's Projects

Over-speeding is one of the most common traffic violations. Around 41 million people are issued speeding tickets each year in the USA, i.e., roughly one every second. Existing approaches to detecting over-speeding are not scalable and require manual effort. In this project, using computer vision and artificial intelligence, I have tried to detect over-speeding and report the violation to the law enforcement officer. It was observed that predictions made using YOLOv3 gave the best results.


Robust Lightweight Object Detection, Siddharth Kumar May 2019


Master's Projects

Object detection is a very challenging problem in computer vision and has been a prominent subject of research for nearly three decades. There has been a promising increase in the accuracy and performance of object detectors ever since deep convolutional networks (CNN) were introduced. CNNs can be trained on large datasets made of high resolution images without flattening them, thereby using the spatial information. Their superior learning ability also makes them ideal for image classification and object detection tasks. Unfortunately, this power comes at a high cost in compute and memory. For instance, the Faster R-CNN detector required …


Improving Steering Ability Of An Autopilot In A Fully Autonomous Car, Shivanku Mahna May 2019


Master's Projects

The world we live in is developing at a rapid pace, and so is the technology we use. We have come a long way from calling a car modern because it had a touch-screen infotainment system to calling it modern because it drives on its own. The progress has been so rapid that it calls for analysis and for an attempt to improve a small part of this journey. With that in mind, this project focuses on improving the steering ability of an autonomous car. In order to make more …


Deep Learning Based Real Time Devanagari Character Recognition, Aseem Chhabra May 2019


Master's Projects

Advances in the technology behind optical character recognition (OCR) have helped it find many uses across industry. Today, OCR is available for several languages and can recognize characters in real time, but there are some languages for which this technology has not developed much. All these advancements have been possible because of the introduction of concepts like artificial intelligence and deep learning. Deep Neural Networks have proven to be the best choice when it comes to a task involving recognition. There are …


Predicting Off-Target Potential Of CRISPR-Cas9 Single Guide RNA, Ishita Mathur May 2019


Master's Projects

With advancements in the field of genome engineering, researchers have come up with potential ways for site-specific gene editing. One of the methods uses the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas technology. It consists of a Cas9 nuclease and a single guide RNA (sgRNA) that cleaves the DNA at the intended target site. However, the target genome could contain multiple potential off-target sites, and cleaving an off-target site can have deleterious effects in the case of gene editing in humans.

Lab-based assays have been developed to test the off-target effects of guide RNAs. However, it is not feasible …
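
One simple way to see why a genome can contain many potential off-target sites is to count base mismatches between a guide sequence and candidate sites of the same length. The sketch below does exactly that. The sequences, the function names, and the mismatch cutoff of 3 are illustrative assumptions; this is a naive proxy and not the prediction method developed in the project.

```python
from typing import List


def mismatches(guide: str, site: str) -> int:
    """Count positions where the guide and a candidate site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(1 for g, s in zip(guide.upper(), site.upper()) if g != s)


def potential_off_targets(guide: str, sites: List[str],
                          max_mismatches: int = 3) -> List[str]:
    """Return sites that are close to, but not identical to, the guide.

    The cutoff is a hypothetical parameter chosen for this sketch.
    """
    return [s for s in sites if 0 < mismatches(guide, s) <= max_mismatches]


# Made-up 20-nucleotide sequences.
guide = "GACGTTACCGGATTCAGCTA"
candidates = [
    "GACGTTACCGGATTCAGCTA",  # identical: the intended target
    "GTCGTTACCGGATTCAGCTT",  # two mismatches: flagged
    "CTGCAATGGCCTAAGTCGAT",  # differs everywhere: ignored
]
print(potential_off_targets(guide, candidates))
```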


Benchmarking Optimization Algorithms For Capacitated Vehicle Routing Problems, Pratik Surana May 2019


Master's Projects

The Vehicle Routing Problem (VRP) originated in the 1950s when algorithms and mathematical approaches were applied to find solutions for routing vehicles. Since then, there has been extensive research in the field of VRPs to solve real-life problems. The process of generating an optimal routing schedule for a VRP is complex due to two reasons. First, VRP is considered to be an NP-Hard problem. Second, there are several constraints involved, such as the number of available vehicles, the vehicle capacities, time-windows for pickup or delivery etc.

The main goal for this project was to compare different optimization algorithms for solving …
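
To illustrate how a capacity constraint shapes a routing solution, here is a minimal greedy nearest-neighbour construction for a capacitated VRP with a single depot. It is offered only as a baseline sketch: the abstract does not say which algorithms were benchmarked, and the coordinates, demands, and capacity below are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def greedy_cvrp(coords: Sequence[Point], demands: Sequence[int],
                capacity: int) -> List[List[int]]:
    """Build routes by always visiting the nearest customer that still fits.

    Index 0 is the depot. Each inner list is one vehicle's customer
    order; every route implicitly starts and ends at the depot.
    """
    unvisited = set(range(1, len(coords)))
    routes: List[List[int]] = []
    while unvisited:
        route: List[int] = []
        load, current = 0, 0
        while True:
            # Customers whose demand still fits in this vehicle.
            feasible = [c for c in sorted(unvisited)
                        if load + demands[c] <= capacity]
            if not feasible:
                break
            nxt = min(feasible,
                      key=lambda c: math.dist(coords[current], coords[c]))
            route.append(nxt)
            load += demands[nxt]
            unvisited.remove(nxt)
            current = nxt
        if not route:
            raise ValueError("a customer's demand exceeds the vehicle capacity")
        routes.append(route)
    return routes


# Toy instance: depot at the origin, four unit-demand customers.
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
demands = [0, 1, 1, 1, 1]
routes = greedy_cvrp(coords, demands, capacity=2)
print(routes)
```

A greedy construction like this gives no optimality guarantee, which is one reason the problem's NP-hardness motivates comparing more sophisticated optimization algorithms.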


Deep Learning On Graphs Using Graph Convolutional Networks, Saurabh Mithe May 2019


Master's Projects

Graphs are a powerful way to model network data with the objects as nodes and the relationship between the various objects as links. Such graphs contain a plethora of valuable information about the underlying data which can be extracted, analyzed, and visualized using Machine Learning (ML). The challenge to this task is that graphs are non-Euclidean structures which means that they cannot be directly used with ML techniques because ML techniques only work with Euclidean structures like grids or sequences. In order to overcome this challenge, the graph structure first needs to be encoded into an equivalent Euclidean representation in …
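
As a rough picture of how a graph convolutional layer turns graph structure into vector representations, the sketch below implements the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W) in plain Python, where D is the degree matrix of A + I. The abstract is truncated before it describes the project's actual encoding, so this rule, the helper names, and the tiny two-node graph are assumptions chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph-convolution step with self-loops and symmetric normalization."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    aggregated = matmul(norm, features)    # mix neighbour features
    transformed = matmul(aggregated, weights)  # linear map (learned in practice)
    return [[max(0.0, v) for v in row] for row in transformed]  # ReLU


# Two connected nodes with one-hot features and a fixed toy weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [1.0, -1.0]]
print(gcn_layer(adj, features, weights))
```

In a real model the weight matrix would be learned by gradient descent and several such layers would be stacked.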


Glovenor - Global Vectors For Node Representations, Shishir Kulkarni May 2019


Master's Projects

A graph is a very powerful abstract data type that can be used to model entities (nodes) and relationships (edges). Many real-world networks like biological, computer and friendship networks can be represented as graphs. Graphs can be mined to extract interesting patterns and interactions between the participating entities. Recently, various Artificial Intelligence (AI) and Machine Learning (ML) techniques have been used for this purpose. In order to do that, the nodes of a graph have to be represented as low dimensional feature vectors. Node embedding is the process of generating a d-dimensional feature vector corresponding to each node of a …


Learning To Play The Trading Game, Neeraj Kulkarni May 2019


Master's Projects

Can we train a stock trading bot that can take decisions in high-entropy environments like stock markets to generate profits based on some optimal policy? Can we further extend this learning for any general trading problem? Quantitative Algorithms are responsible for more than 75% of the stock trading around the world. Creating a stock market prediction model is comparatively easy. But creating a profitable prediction model is still considered a challenging task in the field of machine learning and deep learning due to the unpredictability of the financial markets. Using biologically inspired computing techniques of …