Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 85

Full-Text Articles in Physical Sciences and Mathematics

Image-Based Malware Classification With Convolutional Neural Networks And Extreme Learning Machines, Mugdha Jain Dec 2019

Master's Projects

Research in the field of malware classification often relies on machine learning models that are trained on high level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this research, we conduct experiments to train and evaluate machine learning models for malware classification, based on features that can be obtained without disassembly or execution of code. Specifically, we visualize malware samples as images and employ image analysis techniques. In this context, we focus on two machine learning models, namely, Convolutional Neural Networks (CNN) and Extreme …
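As a rough illustration of what "visualizing a sample as an image" can mean, the sketch below maps each byte of a file to one grayscale pixel and wraps the byte stream into fixed-width rows. This is a common convention and only an assumption here; the abstract does not say which mapping the project uses, and the function name `bytes_to_image` and the default width are hypothetical.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Reshape raw bytes into rows of `width` grayscale pixels (0-255).

    Assumed mapping: one byte = one pixel intensity. The last row is
    zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Number of zero bytes needed to reach a multiple of `width`.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # Five bytes wrapped into rows of two pixels each.
    for row in bytes_to_image(b"\x00\x01\x02\x03\x04", width=2):
        print(row)
```

No disassembly or execution is involved: the image is derived from the raw file contents alone, which is the property the abstract emphasizes.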


Hot Fusion Vs Cold Fusion For Malware Detection, Snehal Bichkar Dec 2019

Master's Projects

A fundamental problem in malware research consists of malware detection, that is, distinguishing malware samples from benign samples. This problem becomes more challenging when we consider multiple malware families. A typical approach to this multi-family detection problem is to train a machine learning model for each malware family and score each sample against all models. The resulting scores are then used for classification. We refer to this approach as “cold fusion,” since we combine previously-trained models—no retraining of these base models is required when additional malware families are considered. An alternative approach is to train a single model …
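The "cold fusion" idea described above can be sketched as follows. Each family has its own previously trained scorer, a sample is scored against all of them, and the scores are combined. How the scores are combined is not stated in the abstract, so the rule used here (take the highest-scoring family and call the sample benign if that score falls below a threshold) is an assumption, as are the names `cold_fusion_scores` and `classify` and the toy scorers.

```python
from typing import Callable, Dict, Sequence

# A scorer maps a feature vector to a score; higher means "more like this family".
Scorer = Callable[[Sequence[float]], float]


def cold_fusion_scores(sample: Sequence[float],
                       models: Dict[str, Scorer]) -> Dict[str, float]:
    """Score one sample against every per-family model."""
    return {family: score(sample) for family, score in models.items()}


def classify(sample: Sequence[float],
             models: Dict[str, Scorer],
             threshold: float = 0.5) -> str:
    """Assumed fusion rule: best-scoring family, or 'benign' below threshold."""
    scores = cold_fusion_scores(sample, models)
    best_family = max(scores, key=scores.get)
    return best_family if scores[best_family] >= threshold else "benign"


if __name__ == "__main__":
    # Toy stand-ins for trained models (hypothetical family names).
    models = {
        "family_a": lambda x: x[0],
        "family_b": lambda x: x[1],
    }
    print(classify([0.2, 0.9], models))
    print(classify([0.1, 0.2], models))
```

Adding a new family only means adding one entry to `models`; the existing scorers are left untouched, which matches the "no retraining" property the abstract attributes to cold fusion.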


Detecting Myocardial Infarctions Using Machine Learning Methods, Aniruddh Mathur Dec 2019

Master's Projects

Myocardial Infarction (MI), commonly known as a heart attack, occurs when one of the three major blood vessels carrying blood to the heart becomes blocked, causing the death of myocardial (heart) cells. If not treated immediately, MI may cause cardiac arrest, which can ultimately cause death. Risk factors for MI include diabetes, family history, and an unhealthy diet and lifestyle. Medical treatments include various types of drugs and surgeries, which can prove very expensive for patients due to high healthcare costs. Therefore, it is imperative that MI is diagnosed at the right time. Electrocardiography (ECG) is commonly used to detect MI. ECG …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Master's Projects

Inadequate experimental drug data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Assessing Wildfire Damage From High Resolution Satellite Imagery Using Classification Algorithms, Ai-Linh Alten Dec 2019

Master's Projects

Wildfire damage assessments provide important information for first responders, government agencies, and insurance companies to estimate the cost of damages and to help provide relief to those affected by a wildfire. With the help of Earth Observation satellite technology, the burn area extent of a fire can be determined with traditional remote sensing methods such as the Normalized Burn Ratio. Very High Resolution satellites can give even more accurate damage assessments, but with some tradeoffs: these satellites provide higher spatial and temporal resolution at the expense of spectral resolution. As a wildfire burn area …
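For readers unfamiliar with the Normalized Burn Ratio mentioned above, the sketch below computes it per pixel from near-infrared (NIR) and shortwave-infrared (SWIR) reflectance using the standard definition NBR = (NIR - SWIR) / (NIR + SWIR), and the pre-fire minus post-fire difference (dNBR) often used to gauge burn severity. The function names and the choice to return 0.0 when both bands are zero are assumptions made for this illustration, not details from the abstract.

```python
def nbr(nir: float, swir: float) -> float:
    """Normalized Burn Ratio for one pixel: (NIR - SWIR) / (NIR + SWIR)."""
    denominator = nir + swir
    if denominator == 0:
        # Assumed convention for empty pixels; other choices (e.g. NaN) are possible.
        return 0.0
    return (nir - swir) / denominator


def dnbr(pre_nir: float, pre_swir: float,
         post_nir: float, post_swir: float) -> float:
    """Differenced NBR: pre-fire NBR minus post-fire NBR."""
    return nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)


if __name__ == "__main__":
    # Healthy vegetation reflects strongly in NIR and weakly in SWIR;
    # burned ground tends to show the opposite pattern.
    print(nbr(0.5, 0.1))
    print(dnbr(0.5, 0.1, 0.2, 0.4))
```

Because the index needs a SWIR band, it depends on spectral coverage, which is the resource the abstract says Very High Resolution satellites trade away.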


Toward Early Detection Of Pancreatic Cancer: An Evidence-Based Approach, Omid Sharagi Dec 2019

Master's Projects

This study examines how an evidential reasoning approach can be used as a diagnostic tool for early detection of pancreatic cancer. The evidential reasoning model combines the output of a linear Support Vector Classifier (SVC) with factors such as smoking history, health history, biopsy location, NGS technology used, and more to predict the likelihood of the disease. The SVC was trained using genomic data of pancreatic cancer patients derived from the National Cancer Institute (NCI) Genomic Data Commons (GDC). To evaluate the evidential reasoning model, a variety of synthetic data was compiled to test the impact of combinations of different …


Image-Based Localization Of User-Interfaces, Riti Gupta Dec 2019

Master's Projects

Image localization refers to translating the text present in images from one language to another. The aim of the project is to develop a methodology to translate the text in image captions from English to Hindi by taking the context of the images into account. A lot of work has been done in this field [22], but our aim was to explore whether the accuracy can be further improved by considering the additional information imparted by the images apart from the text. We have explored Deep Learning using neural networks for this project. In particular, Recurrent Neural Networks …


A Hybrid Approach For Multi-Document Text Summarization, Rashmi Varma Dec 2019

Master's Projects

Text summarization has been a long-studied topic in the field of natural language processing. There have been various approaches to both extractive and abstractive text summarization. Summarizing a single document is a methodical task, but summarizing multiple documents poses a greater challenge. This thesis explores the application of Latent Semantic Analysis, Text-Rank, Lex-Rank and Reduction algorithms for single document text summarization and compares them with the proposed approach of creating a hybrid system combining each of the above algorithms, individually, with Restricted Boltzmann Machines for multi-document text summarization and analyzing how all …


3D Shape Prediction On Convolutional Deep Belief Networks, Gregory Y. Enriquez Dec 2019

Master's Projects

The field of image recognition software has grown immensely in recent years with the emergence of new deep learning techniques. Deep belief networks inspired by Hinton [11] were one of the earliest methodologies of deep learning in the late 2000s. More recently, convolutional neural networks have been used in deep learning techniques, architecture, and software to identify patterns in imagery in order to make predictions such as classification and image segmentation. Traditional two-dimensional (2D) images stored as picture files typically contain red, green, and blue color data for each individual pixel in the picture. However, more recent commercial 2.5D …


Music Retrieval System Using Query-By-Humming, Parth Patel Dec 2019

Master's Projects

Music Information Retrieval (MIR) is a research area of great interest because there are various strategies to retrieve music. To retrieve music, it is important to find a similarity between the input query and the matching music. Several solutions have been proposed that are currently being used in application domains such as Query-by-Example (QBE), which takes a sample of an audio recording playing in the background and retrieves the result. However, there is no efficient approach to solve this problem in a Query-by-Humming (QBH) application. In a Query-by-Humming application, the aim is to retrieve music that is …


Predicting Switch-Like Behavior In Proteins Using Logistic Regression On Sequence-Based Descriptors, Benjamin Strauss Jul 2019

Master's Projects

Ligands can bind at specific protein locations, inducing conformational changes such as those involving secondary structure. Identifying these possible switches from sequence, including homology, is an important ongoing area of research. We attempt to predict possible secondary structure switches from sequence in proteins using machine learning, specifically a logistic regression approach with 48 N-acetyltransferases as our learning set and 5 sirtuins as our test set. Validated residue binary assignments of 0 (no change in secondary structure) and 1 (change in secondary structure) were determined (DSSP) from 3D X-ray structures for sets of virtually identical chains crystallized under different conditions. Our …


Designing Single Guide RNAs For CRISPR/Cas9, Neha Atul Bhagwat May 2019

Master's Projects

Researchers have been working towards the development of tools to facilitate regular use of genome engineering techniques. In recent years, the focus of these efforts has been the Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated (Cas) systems. These systems, while found naturally in bacteria and archaea as an immunity mechanism, can be used for genome engineering in eukaryotes.

There are three major computational challenges associated with the use of CRISPR/Cas9 in genome engineering for mammals: identification of CRISPR arrays, single guide RNA design, and minimization of off-target effects. This project attempts to solve the problem of single guide RNA design using a novel …


Fast High Resolution Image Completion, Chinmay Mishra May 2019

Master's Projects

This paper presents a method for image completion, an active research area in the field of computer vision. The method aims to achieve results comparable to other state-of-the-art methods with an approximately 4.5-fold reduction in training time. It is a two-step procedure that involves completing the image and then enhancing the resolution of the completed image. We use the SSIM metric to evaluate the quality of the completed image, and we also time our model against other image completion models.


Machine Learning In Crop Classification Of Temporal Multispectral Satellite Image, Ravali Koppaka May 2019

Master's Projects

Recently, there has been a remarkable growth in Artificial Intelligence (AI) with the development of efficient AI models and high-power computational resources for processing complex datasets. There has been a growing number of applications of machine learning in satellite remote sensing image data processing. In this work, machine learning methods were applied for crop classification of temporal multispectral satellite image to achieve better prediction of crop-wise area statistics. In India, agriculture has a huge impact on the national economy and most of the critical decisions are dependent on agricultural statistics. Sentinel-2 satellite image data for the Guntur district region …


Detecting CRISPR Arrays Using Long-Short Term Memory Network, Shantanu Deshmukh May 2019

Master's Projects

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) is a sequence found in the DNA of an organism. It provides immunity to the organism. Recently, it was found that the CRISPR-based immunity mechanism can be manipulated to perform genome editing. The problem is that it is hard to know the specificity of this system and, in turn, difficult to make it highly specific. More research is required to improve CRISPR-based genome editing. Detecting CRISPR arrays in the DNA sequence is the first step towards this research. In this work, a CRISPR array detection pipeline, CRISPRLstm, is proposed. …


Music Mood Classification Using Convolutional Neural Networks, Revanth Akella May 2019

Master's Projects

Grouping music into moods is useful as music migrates to online streaming services, since it can help in recommendations. To establish the connection between music and mood, we develop an end-to-end, open source approach for mood classification using lyrics. We develop a pipeline for tag extraction, lyric extraction, and establishing classification models for classifying music into moods. We investigate techniques to classify music into moods using lyrics and audio features. Using various natural language processing methods with machine learning and deep learning, we perform a comparative study across different classification and mood models. The results suggest that features …


An Industry Driven Genre Classification Application Using Natural Language Processing, Sharan Duggirala May 2019

Master's Projects

With the advent of digitized music, many online streaming companies such as Spotify have capitalized on a listener’s need for a common streaming platform. An essential component of such a platform is the recommender system that suggests related tracks, albums, and artists to its user base. In order to sustain such a recommender system, labeling data to indicate which genre it belongs to is essential. Most recent academic publications that deal with music genre classification focus on the use of deep neural networks developed and applied within the music genre classification domain. This thesis attempts to use some of …


Learning For Free – Object Detectors Trained On Synthetic Data, Charles Thane Mackay May 2019

Master's Projects

A picture is worth a thousand words, or if you want it labeled, it’s worth about four cents per bounding box. Data is the fuel that powers modern technologies run by artificial intelligence engines, and it is increasingly valuable in today’s industry. High-quality labeled data is the most important factor in producing accurate machine learning models, which can be used to make powerful predictions and identify patterns humans may not see. Acquiring high-quality labeled data, however, can be expensive and time consuming. For small companies, academic researchers, or machine learning hobbyists, gathering large datasets for a specific task that …


Randition: Random Blockchain Partitioning For Write Throughput, David Nguyen May 2019

Master's Projects

This paper proposes to support dynamic runtime partitioning of Tendermint, which is an in-development state machine replication algorithm that uses the blockchain model to provide Byzantine fault tolerance. We call this variation Randition. We incorporate recent research from blockchain consensus and replicated state machine partitioning to allow Randition users to partition their blockchain for improved write performance at the cost of some Byzantine fault tolerance. We conduct an experiment to compare the raw write throughput of Randition and Tendermint. Finally, we discuss the experiment results and further improvements to Randition.


Context-Based Multi-Stage Offline Handwritten Mathematical Symbol Recognition Using Deep Learning, Sui Kun Guan May 2019

Master's Projects

We propose a multi-stage machine learning (ML) architecture to improve the accuracy of offline handwritten mathematical symbol recognition. In the first stage, we train and assemble multiple deep convolutional neural networks to classify isolated mathematical symbols. However, certain ambiguous symbols are hard to classify without the context information of the mathematical expressions where the symbols belong. In the second stage, we train a deep convolutional neural network that further classifies the ambiguous symbols based on the context information of the symbols. To further improve the classification accuracy, in the third stage, we develop a set of rules to classify the …


A WebRTC Video Chat Implementation Within The Yioop Search Engine, Yangcha Ho May 2019

Master's Projects

Web real-time communication (abbreviated as WebRTC) is one of the latest Web application technologies that allows voice, video, and data to work collectively in a browser without a need for third-party plugins or proprietary software installation. When two browsers from different locations communicate with each other, they must know how to locate each other, bypass security and firewall protections, and transmit all multimedia communications in real time. This project not only illustrates how WebRTC technology works but also walks through a real example of a video chat-style application. The application communicates between two remote users using WebSocket and the data encryption …


SQL Injection Detection Using Machine Learning, Sonali Mishra May 2019

Master's Projects

Sharing information over the Internet across multiple platforms and web applications has become quite common in recent times. Web-based applications that accept critical information from users store this information in databases. These applications and the databases connected to them are susceptible to all kinds of information security threats because they are accessible through the Internet. The threats include attacks such as Cross-Site Scripting (XSS), Denial of Service (DoS), and Structured Query Language (SQL) Injection attacks. SQL Injection attacks are among the top ten vulnerabilities for web-based applications. Through this kind of attack, …


Poriferal Vision, Saketh Saxena May 2019

Master's Projects

Sponges provide nourishment as well as a habitat for various aquatic organisms. Anatomically, sponges are made up of soft tissue with a silica based exoskeleton which serves both as support and protection for the underlying tissue. The exoskeleton persists after the tissue decomposes, and microscopic parts of the exoskeleton break away to form spicules. Oceanographic studies have shown that the density of the sponge spicules is a good indicator of the sponge population in an area. This measure can be used to study sponge population dynamics over time. The spicule density is measured by imaging spicules from samples of water …


Classification Of Humans Into Ayurvedic Prakruti Types Using Computer Vision, Gayatri Gadre May 2019

Master's Projects

Ayurveda, a 5,000-year-old Indian medical science, holds that the universe, and hence humans, are made up of five elements, namely ether, fire, water, earth, and air. The three Doshas (Tridosha), Vata, Pitta, and Kapha, originate from combinations of these elements. Every person has a unique combination of Tridosha elements contributing to that person’s ‘Prakruti’. Prakruti governs the physiological and psychological tendencies in all living beings as well as the way they interact with the environment. This balance influences their physiological features like the texture and colour of skin, hair, eyes, length of fingers, the shape of the …


Using Computer Vision To Quantify Coral Reef Biodiversity, Niket Bhodia May 2019

Master's Projects

The preservation of the world’s oceans is crucial to human survival on this planet, yet we know too little to begin to understand anthropogenic impacts on marine life. This is especially true for coral reefs, which are the most diverse marine habitat per unit area (if not overall) as well as the most sensitive. To address this gap in knowledge, simple field devices called autonomous reef monitoring structures (ARMS) have been developed, which provide standardized samples of life from these complex ecosystems. ARMS have now become successful to the point that the amount of data collected through them has outstripped …


Over Speed Detection Using Artificial Intelligence, Samkit Patira May 2019

Master's Projects

Overspeeding is one of the most common traffic violations. Around 41 million people are issued speeding tickets each year in the USA, i.e., about one every second. Existing approaches to detecting overspeeding are not scalable and require manual effort. In this project, I use computer vision and artificial intelligence to detect overspeeding and report the violation to a law enforcement officer. The best results were observed when predictions were made using YoloV3.


R*-Tree Index In Cassandra For Geospatial Processing, Avinashilingam Nanjappan May 2019

Master's Projects

Geospatial data has garnered so much attention in recent times that it is being used everywhere, from simple applications such as booking a taxi ride to complex applications such as autonomous driving. Though the attention towards geospatial processing is new, substantial research has been going on for years. With the evolution of NoSQL databases in recent times, geospatial processing has attained a new dimension concerning its applications and capability. The most popular NoSQL database used for geospatial processing is MongoDB, followed by Cassandra. It is the indexing process that is important concerning the data at hand …


Schema Migration From Relational Databases To NoSQL Databases With Graph Transformation And Selective Denormalization, Krishna Chaitanya Mullapudi May 2019

Master's Projects

We have witnessed a dramatic increase in the volume, variety, and velocity of data, leading to the era of big data. The structure of data has become highly flexible, leading to the development of many storage systems that differ from traditional structured relational databases, where data is stored in “tables,” with columns representing the lowest granularity of data. Although relational databases are still predominant in the industry, there has been a major drift towards alternative database systems that support unstructured data with better scalability, leading to the popularity of “Not Only SQL.”

Migration from relational databases to NoSQL databases …


Robust Lightweight Object Detection, Siddharth Kumar May 2019

Master's Projects

Object detection is a very challenging problem in computer vision and has been a prominent subject of research for nearly three decades. There has been a promising increase in the accuracy and performance of object detectors ever since deep convolutional neural networks (CNNs) were introduced. CNNs can be trained on large datasets made of high resolution images without flattening them, thereby using the spatial information. Their superior learning ability also makes them ideal for image classification and object detection tasks. Unfortunately, this power comes at a high cost in compute and memory. For instance, the Faster R-CNN detector required …


Improving Steering Ability Of An Autopilot In A Fully Autonomous Car, Shivanku Mahna May 2019

Master's Projects

The world we live in is developing at a rapid pace, and so is the technology that we use. We have clearly come a long way from calling a car modern because it had a touch screen infotainment system to calling it modern because it drives on its own. The progress has been so rapid that it demands that we analyze it and try to improve a small part of this journey. With that thought in mind, this project focuses on improving the steering ability of an autonomous car. In order to make more …