Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 63

Full-Text Articles in Physical Sciences and Mathematics

Pattern Discovery In Dna Using Stochastic Automata, Shweta Shweta Dec 2015

Master's Projects

We consider the problem of identifying similarities between the DNA of different species. To do this, we infer a stochastic finite automaton from given training data and compare it with test data. The training and test data consist of DNA sequences of different species. Our method first identifies sentences in DNA. To identify sentences, we read a DNA sequence one character at a time; three characters form a codon, and codons form proteins (also known as amino acid chains). Each amino acid in a protein belongs to a group. In total we have five groups: polar, non-polar, acidic, basic, and stop codons. …


Cryptanalysis Of The Purple Cipher Using Random Restarts, Aparna Shikhare Oct 2015

Master's Projects

Cryptanalysis is the process of analyzing ciphers, ciphertext, and cryptosystems with the aim of exploiting any loopholes or weaknesses in the systems and so gaining an understanding of the key used to encrypt the data. This project uses an Expectation Maximization (EM) approach with numerous restarts to attack decipherment problems such as the Purple cipher. In this research, we perform cryptanalysis of the Purple cipher using genetic algorithms and hidden Markov models (HMMs). If the Purple cipher has a fixed plugboard, we show that genetic algorithms are successful in retrieving the plaintext from ciphertext with high accuracy. On …


Implementing Type Inference In Jedi, Vaibhav Kamble Oct 2015

Master's Projects

This thesis begins with an overview of type systems: their evolution, concepts, and problems. This survey is based on the type systems of modern languages like Scala and Haskell. Scala has a very sophisticated type system that includes generics, polymorphism, and closures. It has a built-in type inference mechanism that enables the programmer to omit certain type annotations. Scala often does not require the type of a variable to be stated, because the compiler can infer the type from the initialization of the variable. The study of such a type system is demonstrated by implementing one. A type system …


Predicting Autism Over Large-Scale Child Dataset, Arpit Arya Oct 2015

Master's Projects

Data analytics and machine learning in healthcare are among the fastest-emerging and most needed fields today, and a great deal of research has been and is still being done in them. In healthcare, the days when only the doctor examines and the patient listens are gone. Doctors now have many technologies that can assist them in accurately diagnosing the disease from which a patient is suffering. The backbone of such technologies is data analytics and machine learning, with which we can draw many inferences from the large volume of patient data already available. This …


Bitfed, A Centralized Cryptocurrency With Distributed Miners, Shruti Sharma Oct 2015

Master's Projects

Bitcoin is a decentralized peer-to-peer electronic currency in which all payments are sent directly from one transactor to another [1]. Financial institutions are not part of the protocol; hence, processing fees are lower. The distributed nature provides resilience to Bitcoin transactions, and it operates on mathematical principles and cryptographic proofs. As per the Bitcoin generation algorithm, the number of bitcoins in existence will never surpass 21 million, which will lead to deflation and encourage hoarding. In this project, we have implemented a Bitcoin-like currency in order to mitigate the issue of deflation [7]. The idea for our protocol is based …
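
The 21 million cap mentioned in this abstract follows from Bitcoin's block subsidy schedule. The sketch below is an illustration only, not part of the Bitfed project: it assumes Bitcoin's widely documented parameters (an initial subsidy of 50 BTC, halved every 210,000 blocks, with integer arithmetic in satoshis), and the function name `total_supply_satoshis` is hypothetical.

```python
# Illustrative sketch of Bitcoin's issuance schedule (values in satoshis).
# The constants are Bitcoin's published parameters, not taken from the abstract.
SATOSHIS_PER_BTC = 100_000_000
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC   # subsidy per block in the first era
HALVING_INTERVAL = 210_000                # number of blocks in each era


def total_supply_satoshis() -> int:
    """Sum the block subsidy over every halving era until it rounds down to zero."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += subsidy * HALVING_INTERVAL
        subsidy //= 2  # integer halving, so fractions of a satoshi are dropped
    return total


if __name__ == "__main__":
    supply = total_supply_satoshis()
    print(f"{supply / SATOSHIS_PER_BTC:.8f} BTC")
```

Because each halving drops fractions of a satoshi, the sum stays slightly below 21 million rather than reaching it exactly.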


A Javascript And Php Epub Reader Web Application, Xiaqing He Oct 2015

Master's Projects

EPUB is one of the most popular ebook formats. It is supported by many e-readers, such as Apple’s iBooks, BlackBerry Playbooks, Sony Reader, Kobo eReader, Amazon Kindle Fire, and the Mozilla Firefox add-on EPUBReader. In this report, we describe our implementation of a JavaScript and PHP EPUB reader web application. This web application allows users to read EPUB books easily across multiple devices without being locked in to a particular platform. Our application provides an EPUB library. When a user logs in, they can borrow an EPUB book from the library and save it to their own bookshelf, then choose any one …


A Completely Covert Audio Channel In Android, Sukanya Thakur Oct 2015

Master's Projects

Exfiltration of private data is a potential security threat against mobile devices. Previous research concerning such threats has generally focused on techniques that are only valid over short distances (NFC, Bluetooth, electromagnetic emanations, and so on). In this research, we develop and analyze an exfiltration attack that has no distance limitation. Specifically, we take advantage of vulnerabilities in Android that enable us to covertly record and exfiltrate a voice call. This paper presents a successful implementation of our attack, which records a call (both uplink and downlink voice streams), and inaudibly transmits the recorded voice over a subsequent inaudible call, …


Recommendation System Using Collaborative Filtering, Yunkyoung Lee Oct 2015

Master's Projects

Collaborative filtering is one of the best-known and most widely used techniques in recommendation systems. Its basic idea is to predict which items a user would be interested in based on their preferences. Recommendation systems using collaborative filtering can provide accurate predictions when enough data is available, because the technique is based on users’ preferences. User-based collaborative filtering has been very successful in predicting customer behavior, the most important part of a recommendation system. However, its widespread use has revealed some real challenges, such as data sparsity and data scalability, with …
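
As a rough illustration of the user-based idea described in this abstract, the sketch below predicts a rating as a similarity-weighted average of other users' ratings. It is not the author's system: the toy `RATINGS` data, the function names, and the choice of cosine similarity over co-rated items are all assumptions made for the example.

```python
from math import sqrt
from typing import Dict, Optional

# Hypothetical toy data: user -> {item: rating}. Ratings are assumed positive.
RATINGS: Dict[str, Dict[str, float]] = {
    "ann": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob": {"a": 4.0, "b": 3.0, "c": 5.0, "d": 4.0},
    "cat": {"a": 1.0, "b": 5.0, "d": 2.0},
}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings: Dict[str, Dict[str, float]], user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        weighted_sum += weight * their_ratings[item]
        weight_total += weight
    # None signals that no similar user has rated the item (the sparsity problem).
    return weighted_sum / weight_total if weight_total else None


if __name__ == "__main__":
    print(predict(RATINGS, "ann", "d"))
```

In this toy data "ann" rates items much as "bob" does, so the prediction for item "d" lands closer to bob's rating than to cat's. The `None` return path shows in miniature why sparse data is a challenge for this technique.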


Pattern-Aided Regression Modelling And Prediction Model Analysis, Naresh Avva Oct 2015

Master's Projects

In this research, we develop an application for generating a pattern aided regression (PXR) model, a new type of regression model designed to be an accurate and interpretable prediction model. Our goal is to generate a PXR model using the Contrast Pattern Aided Regression (CPXR) method and compare it with the multiple linear regression method. The PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by large margins. CPXR is especially effective for high-dimensional data. We use pruning to improve the classification accuracy and to remove outliers from the dataset. We provide implementation details and give …


Question Answering System For Yioop, Niravkumar Patel Oct 2015

Master's Projects

Yioop is an open source search engine developed and managed by Dr. Christopher Pollett. Currently, Yioop returns the search results of a query in the form of a list of URLs, just like other search engines (Google, Bing, DuckDuckGo, etc.). This project created a new module for Yioop. This new module, known as the Question-Answering (QA) System, takes search queries in the form of natural language questions and returns results in the form of a short answer that is appropriate to the question asked. This feature is achieved by implementing various functionalities of Natural Language Processing (NLP). By using NLP, …


Ssct Score For Malware Detection, Srividhya Srinivasan Oct 2015

Master's Projects

Metamorphic malware transforms its internal structure when it propagates, making detection of such malware a challenging research problem. Previous research considered a score based on simple substitution cryptanalysis, which was applied to the metamorphic detection problem. In this research, we analyze a new score based on a combined simple substitution and column transposition (SSCT) cryptanalysis. We show that this SSCT score significantly outperforms the simple substitution score, and other malware detection scores, in many cases.


Metamorphic Java Engine, Sailee Choudhary Oct 2015

Master's Projects

Malware is a software program designed to damage a computer system or perform other unwanted actions on it. Metamorphic malware is a category of malicious software that has the ability to change its code as it propagates. A hidden Markov model (HMM) is a statistical model in which the system is assumed to be a Markov process with unseen states. An HMM uses statistics to detect patterns, and hence is useful in metamorphic virus detection. Previous work has been done to create morphing engines using the LLVM bytecode format. This project includes the creation of a morphing engine …


Predicting 'Attention Deficit Hyperactive Disorder' Using Large Scale Child Data Set, Arpi Shah Oct 2015

Master's Projects

Attention deficit hyperactivity disorder (ADHD) is a disorder found in children, affecting about 9.5% of American children aged 13 years or more. Every year, the number of children diagnosed with ADHD is increasing. There is no single test that can diagnose ADHD. In fact, a health practitioner has to analyze the behavior of the child to determine if the child has ADHD. The practitioner has to gather information about the child and the child’s behavior and environment. Because of all these difficulties in diagnosis, I propose to use machine learning techniques to predict ADHD using a large-scale child data set. Machine …


Neural Network Captcha Cracker, Geetika Garg Oct 2015

Master's Projects

A CAPTCHA (acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used to determine whether or not a user providing the response is human. In this project, we used a deep neural network framework for CAPTCHA recognition. The core idea of the project is to learn a model that breaks image-based CAPTCHAs. We used convolutional neural networks and recurrent neural networks instead of the conventional methods of CAPTCHA breaking based on segmenting and recognizing a CAPTCHA. Our models consist of two convolutional layers to learn image …


Author Recognition Using Locality Sensitive Hashing & Alergia (Stochastic Finite Automata), Prashanth Sandela Oct 2015

Master's Projects

In today’s world, data grows very fast. It is difficult to answer questions such as: 1) Was the content written entirely by this author? 2) Did the author take a few sentences or pages from another author? 3) Is there any way to identify the actual author? Many plagiarism detection tools on the market identify duplicate content, but they do not understand the writing pattern involved. There is always a need to make an effort to find the original author. Locality sensitive hashing is one such technique: it applies hashing to recognize an author’s writing pattern.


A Secured Cloud System Based On Log Analysis, Sindhusha Doddapaneni Oct 2015

Master's Projects

Nowadays, enterprise adoption of the Cloud is increasing, but businesses are finding issues related to security. Every day, users store a large amount of data in the Cloud, and user input may be malicious. Therefore, security has become a critical feature of applications stored in the Cloud. Though many existing systems provide different encryption algorithms and security methods, there is still a possibility of attacks on applications and of increasing data modifications. The idea behind this project is to find attacks and protect the applications stored in the Cloud using log analysis. The proposed solution detects …


Extensible Authentication Protocol Vulnerabilities And Improvements, Akshay Baheti Oct 2015

Master's Projects

Extensible Authentication Protocol (EAP) is a widely used security protocol for wireless networks around the world. The project examines different security issues with the EAP-based protocols, the family of security protocols for wireless LANs. The project discovers an attack on the subscriber identity module (SIM) based extension of EAP. The attack is a denial-of-service attack that exploits the error handling mechanism in EAP protocols. The project further proposes countermeasures for detecting and defending against the discovered attack. The discovered attack can be prevented by changing the protocol to delay the processing of protocol error messages.


Collaboration Prototyper: Automatic Generation Of Prototypes From Uml Collaborations, Ramya Badthody Shenoy Oct 2015

Master's Projects

The thesis begins with a discussion of the use of designing versus prototyping in the initial stages of the software development lifecycle. We present the idea of generating a working prototype directly from a UML analysis model to reduce the time spent on creating a prototype. UML diagrams can be made to give sufficient information to generate code. Collaborations form the backbone of analysis models. A collaboration can be defined as a UML class diagram together with one or more sequence diagrams. Collaborations are the input of the Collaboration Prototyper (CP). CP transforms a collaboration into a Java prototype.


Pattern-Driven Programming In Scala, Huaxin Pang Oct 2015

Master's Projects

This is an experimental exploration of the pattern-driven programming paradigm: the sole use of pattern matching to determine the next instruction to execute. We define a pure pattern-driven programming language named PA-Scala by defining a subset of the Scala programming language, which restricts sequence control to the powerful pattern matching facilities in Scala. We use PA-Scala to explore the strengths and limitations of pattern-driven programming. By implementing a phrase structure grammar solver in PA-Scala, we show that pattern-driven programming can be used to solve general computation problems. We then implement a Prolog interpreter in PA-Scala, which demonstrates how resolution and unification …


Scalable Techniques For Similarity Search, Siddartha Reddy Nagireddy Oct 2015

Master's Projects

Document similarity is closely related to the nearest neighbour problem and has applications in various domains. To determine the similarity or dissimilarity of documents, they first need to be converted into sets of shingles. Each document is converted into k-shingles, k being the length of each shingle. The similarity is calculated using the Jaccard distance between sets and output into a characteristic matrix. The complexity of parsing this matrix is significantly high, especially when the sets are large. In this project we explore various approaches, such as min hashing, LSH, and Bloom filters, to decrease the matrix size and …
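
The pipeline this abstract describes (k-shingles, Jaccard comparison, then min hashing to shrink the representation) can be sketched as follows. This is a minimal illustration rather than the project's implementation: it assumes character-level shingles, and it simulates a family of hash functions by prefixing a seed to each shingle before hashing with SHA-1. All function names are hypothetical.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int) -> Set[str]:
    """Return the set of all length-k character substrings (k-shingles)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; the Jaccard distance is 1 minus this."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed: int, item: str) -> int:
    """One member of a simulated hash family, selected by `seed`."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """Keep the minimum hash value per hash function (items must be non-empty)."""
    return [min(_seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    a = shingles("the quick brown fox", 3)
    b = shingles("the quick brown dog", 3)
    print(jaccard(a, b))
    print(estimate_similarity(minhash_signature(a), minhash_signature(b)))
```

Each signature has a fixed length regardless of how many shingles a document has, which is what reduces the size of the matrix being compared. The estimate approaches the exact Jaccard similarity as `num_hashes` grows.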


Load Balancing For Entity Matching Over Big Data Using Sorted Neighborhood, Yogesh Wattamwar Oct 2015

Master's Projects

Entity matching, also known as entity resolution, duplicate identification, reference reconciliation, or record linkage, is a critically important task for data cleaning and data integration. One can think of it as the task of finding entities that match the same entity in the real world. These entities can belong to a single source of data or to distributed data sources. It takes structured data as input, and the process includes comparing that structured data (an entity or database record) with entities present in the knowledge base. For large-scale entity matching, data has to go through a sequence of steps, which includes …


Graph Based Word Sense Disambiguation For Clinical Abbreviations Using Apache Spark, Veebha Padavkar Oct 2015

Master's Projects

Identification of the correct sense for an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the surrounding context of the word. Like narrative documents, clinical documents suffer from ambiguity issues that impact automatic extraction of the correct sense from the document. In this project, we propose a graph-based solution based on an algorithm originally implemented by Osmar R. Zaine et al. for word sense disambiguation, specifically focusing on clinical text. The algorithm makes use of proposed …


Energy Efficiency And Quality Of Services In Virtualized Cloud Radio Access Network, Khushbu Mohta Oct 2015

Master's Projects

Cloud Radio Access Network (C-RAN) is being widely studied for soft and green fifth generation of Long Term Evolution - Advanced (LTE-A). Recent technology advances in network function virtualization (NFV) and software defined radio (SDR) have enabled virtualization of Baseband Units (BBU) and sharing of the underlying general purpose processing (GPP) infrastructure. Also, new innovations in optical transport networks (OTN), such as Dark Fiber, provide low-latency and high-bandwidth channels that can support C-RAN over a radius of more than forty kilometers. All these advancements make C-RAN feasible and practical. Several virtualization strategies and architectures are proposed for C-RAN and it has …


Measuring Malware Evolution, Poonkodi Ponnambalam Oct 2015

Master's Projects

In this research, we simulate the effect of code evolution by applying a variety of code morphing strategies. Specifically, we consider code substitution, transposition, insertion, and deletion. We then analyze the effect of these code morphing strategies relative to a variety of malware scores that have been considered in previous research. Our goal is to gain a better understanding of the strengths and weaknesses of these various malware scoring techniques. This research should prove useful in designing more robust scores for detecting malware.


A Recommendation Engine Using Apache Spark, Swapna Kulkarni Oct 2015

Master's Projects

The volume of structured and unstructured data has grown exponentially in recent years. As a result of this rapid data growth, we are inundated with a plethora of choices for any product or service. It is very natural to get lost in the Amazon of such choices and find it hard to make decisions. The project aims at addressing this problem by using entity recommendation. The two main aspects that the project concentrates on are presenting more accurate entity recommendations to the user and dealing with vast amounts of data. The project aims at presenting …


Designing A Programming Contract Library For Java, Neha Rajkumar Oct 2015

Master's Projects

Programmers are now developing large and complex software systems, so it’s important to have software that is consistent, efficient, and robust. Programming contracts allow developers to specify preconditions, postconditions, and invariants in order to more easily identify programming errors. The design by contract principle [1] was first used in the Eiffel programming language [2], and has since been extended to libraries in many other languages. The purpose of my project is to design a programming contract library for Java. The library supports a set of preconditions, postconditions, and invariants that are specified in Java annotations. It incorporates contract checking for …
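
To make the notions of precondition and postcondition in this abstract concrete, here is a small sketch of contract checking. It is not the author's Java annotation library: the `contract` decorator, its `pre` and `post` parameters, and the `integer_sqrt` example are hypothetical, written in Python purely for illustration, and invariants are left out for brevity.

```python
import functools
import math


def contract(pre=None, post=None):
    """Wrap a function with optional precondition and postcondition checks.

    `pre` receives the call arguments; `post` receives the return value.
    Both are hypothetical hooks for this sketch and return True when satisfied.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition failed for {func.__name__}")
            result = func(*args, **kwargs)
            if post is not None and not post(result):
                raise AssertionError(f"postcondition failed for {func.__name__}")
            return result
        return wrapper
    return decorate


@contract(pre=lambda x: x >= 0, post=lambda r: r >= 0)
def integer_sqrt(x: int) -> int:
    """Largest integer r such that r * r <= x."""
    return math.isqrt(x)


if __name__ == "__main__":
    print(integer_sqrt(17))
    try:
        integer_sqrt(-1)
    except AssertionError as err:
        print(err)  # the violated precondition is reported at the call site
```

The point of the pattern is that a violated precondition is reported where the bad call is made, before the function body runs, which is how contracts help localize programming errors.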


Sharedwealth: A Cryptocurrency To Reward Miners Evenly, Siddiq Ahmed Syed Oct 2015

Master's Projects

Bitcoin [19] is a decentralized cryptocurrency that has recently gained popularity and has emerged as a popular medium of exchange. The total market capitalization is around 1.5 billion US dollars as of October 2013 [28]. All the operations of Bitcoin are maintained in a distributed public global ledger known as a block chain which consists of all the successful transactions that have ever taken place. The security of a block chain is maintained by a chain of cryptographic puzzles solved by participants called miners, who in return are rewarded with bitcoins. To be successful, the miner has to put in …


Function Call Graph Score For Malware Detection, Deebiga Rajeswaran Oct 2015

Master's Projects

Metamorphic malware changes its internal structure with each infection, while maintaining its core functionality. Detecting such malware is a challenging research problem. Function call graph analysis has previously shown promise in detecting such malware. In this research, we analyze the robustness of a function call graph score with respect to various code morphing strategies. We also consider modifications of the score that make it more robust in the face of such morphing.


Clustering Web Concepts Using Algebraic Topology, Harleen Kaur Ahuja Oct 2015

Master's Projects

In the world of the Internet, data is growing rapidly in both size and dimension. It consists of web pages that represent human thoughts. These thoughts involve concepts and associations that we can capture. Using mathematics, we can perform meaningful clustering of these pages. This project aims at providing a new problem-solving paradigm in data science known as algebraic topology. Professor Vasant Dhar, Editor-in-Chief of Big Data (Professor at NYU), defines data science as a generalizable extraction of knowledge from data. The core concept of semantic based search engine project developed by my …


Entity And Relational Queries Over Big Data Storage, Nachappa Achakalera Ponnappa Oct 2015

Master's Projects

Big data storage involves using NoSQL technologies to handle and process huge volumes of data. NoSQL databases are non-relational and schema-free, with data stored as key-value pairs. The aim of the thesis is to implement entity and relational queries on top of Big Data storage. To achieve this, we use NoSQL technologies like MongoDB and HBase. We implement various methodologies and solutions on top of MongoDB and HBase to map data across different tables and implement entity and relational queries to retrieve entities from huge volumes of data. We also measure the performance of both the technologies and …