Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 32

Full-Text Articles in Physical Sciences and Mathematics

Real-Time Online Chinese Character Recognition, Wenlong Zhang Dec 2016

Master's Projects

In this project, I built a web application for real-time recognition of handwritten Chinese characters. The system identifies a Chinese character while a user is drawing or writing it. The techniques and steps used to build the recognition system include data preparation, preprocessing, feature extraction, and classification. To increase accuracy, two different types of neural networks are used in the system: a multi-layer neural network and a convolutional neural network.


Cryptanalysis Of Homophonic Substitution Cipher Using Hidden Markov Models, Guannan Zhong Dec 2016

Master's Projects

We investigate the effectiveness of a Hidden Markov Model (HMM) with random restarts as a means of breaking a homophonic substitution cipher. Based on extensive experiments, we find that such an HMM-based attack outperforms a previously developed nested hill climb approach, particularly when the ciphertext message is short. We then consider a combination cipher, consisting of a homophonic substitution and a column transposition. We develop and analyze an attack on such a cipher. This attack employs an HMM (with random restarts), together with a hill climb to recover the column permutation. We show that this attack can succeed on …


Deep Data Analysis On The Web, Xuanyu Liu Dec 2016

Master's Projects

Search engines are well known to people all over the world. People prefer to search by keyword to open websites or retrieve information rather than type URLs. Therefore, collecting finite sequences of keywords that represent important concepts within a set of authors is important; in other words, we need knowledge mining. We use a simplicial concept method to speed up concept mining. A previous CS 298 project studied this approach under Dr. Lin. This method is very fast: for example, to mine the concepts from a database with 1257 columns and 65k rows, FP-growth takes 876 seconds, while simplicial complex only …


Predicting User's Future Requests Using Frequent Patterns, Marc Nipuna Dominic Savio Dec 2016

Master's Projects

In this research, we predict a user's future requests using a data mining algorithm. Usage of the World Wide Web has resulted in a huge amount of data, and handling this data is getting harder day by day. All this data is stored as web logs, and each web log is stored in a different format with different field names, such as search string, URL with its corresponding timestamp, user IDs that help with session identification, status code, etc. Whenever a user requests a URL, there is a delay in getting the requested page, and sometimes the request is denied. Our …


Handling Relationships In A Wiki System, Yashi Kamboj Dec 2016

Master's Projects

Wiki software enables users to manage content on the web and to create or edit web pages freely. Most wiki systems support the creation of hyperlinks on pages and have a simple text syntax for page formatting. A common, more advanced feature is to allow pages to be grouped together as categories. Currently, wiki systems support categorization of pages in a very traditional way, by specifying whether a wiki page belongs to a category or not. Categorization represents a unary relationship and is not sufficient to represent n-ary relationships, those involving links between multiple wiki pages.

In this project, we extend Yioop, …


Web-Based Integrated Development Environment, Hien T. Vu Dec 2016

Master's Projects

As tablets become more powerful and more economical, students are attracted to them and are moving away from desktops and laptops. Their compact size and easy-to-use Graphical User Interface (GUI) reduce the learning and adoption barriers for new users. This also changes the environment in which undergraduate Computer Science students learn how to program. Popular Integrated Development Environments (IDEs) such as Eclipse and NetBeans require disk space for local installations as well as an external compiler. These requirements cannot be met by current tablets and thus drive the need for a web-based IDE. There are also many other …


Analyzing Clustered Web Concepts With Homology, Eric Nam Jul 2016

Master's Projects

As data is being mined more and more from the Internet today, Data Science has become an important field of computing to make that data useful. Data Science allows people to turn all of that data into structured knowledge that is easily utilized, validated, and understandable. There are many known theories to analyze data, but this project will focus on a recently introduced method: analyzing text data with homology from mathematics to understand relationships between keyword-sets.

Using structures of algebraic topology as a starting point, keyword-sets in the text are represented by simplexes based on what they are and what …


DNA Analysis Using Grammatical Inference, Cory Cook Jun 2016

Master's Projects

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.

An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.

Testing shows that the accuracy of inferred languages for …


Analysis On Alergia Algorithm: Pattern Recognition By Automata Theory, Xuanyi Qi Jun 2016

Master's Projects

Based on Kolmogorov complexity, a finite set x of strings has a pattern if x can be output by a Turing machine whose length is less than the minimum of all |x|; this Turing machine, which may not be unique, is called a pattern of the finite set of strings. To find a pattern of a given finite set of strings (assuming such a pattern exists), the ALERGIA algorithm is used to approximate such a pattern (Turing machine) in terms of finite automata. Note that each finite automaton defines a partition on formal language Σ*, ALERGIA …


Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le Jun 2016

Master's Projects

This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching is thus turned into a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique (new in computer science) that improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …


Static And Dynamic Analysis For Android Malware Detection, Ankita Kapratwar Jun 2016

Master's Projects

Static analysis relies on features extracted without executing code, while dynamic analysis extracts features based on code execution (or emulation). In general, static analysis is more efficient, while dynamic analysis is often more informative, particularly in cases of highly obfuscated code. Static analysis of an Android application can rely on features extracted from the manifest file or the Java bytecode, while dynamic analysis of Android applications can deal with features involving dynamic code loading and system calls that are collected while the application is running. In this research, we analyzed the effectiveness of combining static and dynamic features …


Vigenère Score For Malware Detection, Suchita Deshmukh Jun 2016

Master's Projects

Previous research has applied classic cryptanalytic techniques to the malware detection problem. Specifically, scores based on simple substitution cipher cryptanalysis and various generalizations have been considered. In this research, we analyze two new malware scoring techniques based on classic cryptanalysis. Our first approach relies on the Index of Coincidence, which is used, for example, to determine the length of the keyword in a Vigenère ciphertext. We also consider a score based on a more complete cryptanalysis of a Vigenère cipher. We find that the Vigenère score is competitive with previous statistical-based malware scores.
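
As an illustration of the Index of Coincidence mentioned in this abstract, the following is a minimal sketch that computes it over the letters A to Z of a text. The function name is hypothetical, and the abstract does not say how the project applies the statistic to malware samples, so this shows only the classical definition: the probability that two symbols drawn without replacement from the text are equal.

```python
from collections import Counter


def index_of_coincidence(text):
    """Classical Index of Coincidence over the letters A-Z of `text`.

    Hypothetical helper for illustration; not taken from the project.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; return 0 by convention
    counts = Counter(letters)
    # Number of ordered pairs of equal letters, divided by all ordered pairs.
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

For reference, English plaintext typically gives a value near 0.067, while uniformly random letters give about 1/26 ≈ 0.038; a Vigenère keyword length is usually guessed by finding the period at which the ciphertext columns look like the former rather than the latter.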


Machine Learning On The Cloud For Pattern Recognition, Tien Nguyen Jun 2016

Master's Projects

Pattern recognition is a field of machine learning with applications to areas such as text recognition and computer vision. Machine learning algorithms, such as convolutional neural networks, may be trained to classify images. However, such tasks may be computationally intensive for a commercial computer for larger volumes or larger sizes of images. Cloud computing allows one to overcome the processing and memory constraints of average commercial computers, allowing computations on larger amounts of data. In this project, we developed a system for detection and tracking of moving human and vehicle objects in videos in real time or near real time. …


Image Spam Analysis, Annapurna Sowmya Annadatha Jun 2016

Master's Projects

Image spam is unsolicited bulk email, where the message is embedded in an image. This technique is used to evade text-based spam filters. In this research, we analyze and compare two novel approaches for detecting spam images. Our first approach focuses on the extraction of a broad set of image features and selection of an optimal subset using a Support Vector Machine (SVM). Our second approach is based on Principal Component Analysis (PCA), where we determine eigenvectors for a set of spam images and compute scores by projecting images onto the resulting eigenspace. Both approaches provide high accuracy with low …


Defeating N-Gram Scores For HTTP Attack Detection, Samyuktha Sridharan Jun 2016

Master's Projects

Web applications that generate malicious HTTP requests provide a platform that attackers use to exploit vulnerable machines. Such malicious traffic should be identified by network intrusion detection systems, based on traffic analysis. Previous research has shown that n-gram techniques can be successfully applied to detect HTTP attacks. In this research, we analyze the robustness of these n-gram techniques. We show that n-gram scores are surprisingly robust, but can be defeated using certain obfuscation strategies. We also consider the need for a costlier HMM-based intrusion detection system.
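
To make the idea of an n-gram score concrete, here is a minimal sketch of one simple variant: collect the character n-grams seen in benign requests, then score a new request by the fraction of its n-grams that were never seen. This is an assumption for illustration only; the abstract does not specify which n-gram scores the research evaluates, and all function names here are hypothetical.

```python
def ngrams(data, n=3):
    """Return the list of overlapping character n-grams in `data`."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train(benign_requests, n=3):
    """Collect the set of n-grams observed in benign traffic."""
    seen = set()
    for request in benign_requests:
        seen.update(ngrams(request, n))
    return seen


def anomaly_score(request, seen, n=3):
    """Fraction of the request's n-grams not observed during training."""
    grams = ngrams(request, n)
    if not grams:
        return 0.0  # request shorter than n: nothing to score
    unseen = sum(1 for g in grams if g not in seen)
    return unseen / len(grams)
```

A score of this kind also suggests why obfuscation can work in principle: padding a malicious request with content whose n-grams are common in benign traffic lowers the unseen fraction.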


Multi Faceted Text Classification Using Supervised Machine Learning Models, Abhiteja Gajjala Jun 2016

Master's Projects

In recent years, document management tasks (known as information retrieval) have increased greatly due to the availability of digital documents everywhere. The need for automatic methods of extracting document information has become prominent for organizing information and knowledge discovery. Text classification is one such solution, wherein natural language text is assigned to one or more predefined categories based on its content. In my research, classification of text is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers discussed the classification of sentiment either positive or …


Supervised Learning For Multi-Domain Text Classification, Siva Charan Reddy Gangireddy Jun 2016

Master's Projects

Digital information available on the Internet is increasing day by day. As a result, the demand for tools that help people find and analyze all these resources is also growing. Text classification, in particular, has been very useful in managing the information. Text classification is the process of assigning natural language text to one or more categories based on the content. It has many important applications in the real world. For example, finding the sentiment of reviews posted by people about restaurants, movies, and other such things is an application of text classification. In …


Hybrid Similarity Function For Big Data Entity Matching With R-Swoosh, Vimal Chandra Gorijala May 2016

Master's Projects

Entity Matching (EM) is the problem of determining if two entities in a data set refer to the same real-world object. For example, it decides if two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and on the emerging Semantic Web. This project deals with the similarity functions and thresholds utilized in them to determine the similarity of the entities. The work contains two major parts: …


Efficient Pair-Wise Similarity Computation Using Apache Spark, Parineetha Gandhi Tirumali May 2016

Master's Projects

Entity matching is the process of identifying different manifestations of the same real-world entity. These entities can be referred to as objects (strings) or data instances. These entities are in turn split over several databases or clusters based on the signatures of the entities. When entity matching algorithms are run on these databases or clusters, there is a high possibility that a particular entity pair is compared more than once. The number of comparisons for any two entities depends on the number of common signatures or keys they possess. This affects the performance of any entity matching algorithm. This paper …
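
The redundancy described in this abstract can be seen in a small sketch: if entities are grouped by signature and every group is compared pairwise, a pair that shares k signatures is compared k times. The code below counts the naive comparisons and the unique pairs for a hypothetical grouping; it is a single-machine illustration under assumed data structures, not the Apache Spark method of the paper, which the truncated abstract does not describe.

```python
from itertools import combinations


def candidate_pairs(blocks):
    """Count naive pairwise comparisons and collect the unique pairs.

    `blocks` is a hypothetical dict mapping a signature (key) to the
    list of entity ids that carry that signature.
    """
    naive = 0
    unique = set()
    for ids in blocks.values():
        # sorted(set(...)) gives each pair a canonical (smaller, larger) order.
        for a, b in combinations(sorted(set(ids)), 2):
            naive += 1
            unique.add((a, b))
    return naive, unique
```

For example, with `{"hunt": [1, 2, 3], "helen": [1, 2]}` the pair (1, 2) shares two signatures, so the naive scheme performs one more comparison than there are unique pairs.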


Tracking User Activity While Safeguarding Data From Attackers, Justin Dahmubed May 2016

Master's Projects

Companies constantly look for ways to better understand customer activity on their websites. Website owners may want to be able to analyze customer activity without having to concern themselves with a government agency forcing them to reveal their information. Multiple analytical tools have been created, most notably Google Analytics.

In my thesis, I demonstrate how analytics data can be stored so that only the site owners can view the data about their customers. With my design, even the analytics site itself cannot decrypt the data after a given window of time has elapsed. The novel aspect of my design is …


Movie Script Shot Lister, David Robert Smith May 2016

Master's Projects

The making of a motion picture almost always starts with the script, the written version of a story envisioned within the mind of its creator. The script is then broken down into shots. Each individual shot is filmed, and the shots are then edited together to create the motion picture. The goal of the Movie Script Shot Lister thesis project is to be able to read in a script for a movie or television show and automatically generate a shot list. While a script is text, a shot list is the blueprint for how to visualize that script, so the …


Library Writers Reward Project, Saravana Kumar Gajendran May 2016

Master's Projects

Open-source library development exploits the distributed intelligence of participants in Internet communities. Nowadays, contribution to the open-source community is fading [16] (Stackalytics, 2016) as there is not much recognition for library writers. They can start exploring ways to generate revenue as they actively contribute to the open-source community.

This project helps library writers to generate revenue in the form of bitcoins for their contribution. Our solution to generate revenue for library writers is to integrate bitcoin mining with existing JavaScript libraries, such as jQuery. More use of the library leads to more revenue for the library writers. It uses the …


Malicious JavaScript Detection Using Statistical Language Model, Anumeha Shah May 2016

Master's Projects

The Internet has immense importance in our day-to-day life, but at the same time, it has become the medium for infecting computers, attacking users, and distributing malicious code. As JavaScript is the principal language of client-side programming, it is frequently used in conducting such attacks. Various approaches have been made to overcome the JavaScript security issues. Some advanced approaches utilize machine learning technology in combination with de-obfuscation and emulation. Many methods of analysis incorporate static analysis and dynamic analysis. Our solution is entirely based on static analysis, which avoids unnecessary runtime overhead.

The central …


Multiple Sequence Alignment With Profile Hidden Markov Models, Shubhangi Rakhonde May 2016

Master's Projects

The human genome consists of various patterns and sequences that are of biological significance. Capturing these patterns can help us in resolving various mysteries related to the genome, like how genomes evolve, how diseases occur due to genetic mutation, how viruses mutate to cause new disease and what is the cure for these diseases. All these applications are covered in the study of bioinformatics.

One of the very common tasks in bioinformatics involves simultaneous alignment of a number of biological sequences. In bioinformatics, this is widely known as Multiple Sequence Alignment. Multiple sequence alignments help in grouping …


Processing Posting Lists Using OpenCL, Radha Kotipalli May 2016

Master's Projects

One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphical Processing Unit (GPU) of a computer system for performance improvements.

Some of the critical functions in search engines are resource-intensive in terms of processing power, …


Hive - An Agent Based Modeling Framework, Roohi Bharti May 2016

Master's Projects

This thesis begins by defining agent based modeling. Agent based models are used to model the emergent behavior of complex systems with many interacting components, known as agents. Several model examples are given using NetLogo, which is a popular agent-based modeling platform. A model of concurrent computation is described that uses message passing as the only form of communication between the model’s components, which are called actors. The model is called an actor model. Actors are primitive objects of concurrency in an actor model. In particular, we describe the actor model implemented by Akka, which is Scala’s new actor library. …


Secure Declassification In Faceted JavaScript, Tam Wing May 2016

Master's Projects

Information leaks currently represent a major security vulnerability. Malicious code, when injected into a trusted environment and executed in the context of the victim’s privileges, often results in the loss of sensitive information. To address this security issue, this paper focuses on the idea of information flow control using faceted execution [3]. This mechanism allows the interpreter to efficiently keep track of variables across multiple security levels, achieving termination-insensitive non-interference (TINI). With TINI, a program can only leak one bit of data, caused by the termination of a program. One key benefit of having faceted execution is that flow policy …


Interactive Computer Science Exercises In edX, Hong Le May 2016

Master's Projects

This project focuses on improving online learning courses for Computer Science. My approach is to create a platform in which interactive exercises can be implemented for students to work on. The methodology includes creating plugins for interactive exercises using XBlock, a component architecture for building independent online courses on edX. The exercises are based on existing exercises like CodeCheck and Wiley’s InterActivities Exercise System. In order to integrate these exercises, I implemented CodeCheck XBlock and Interactive XBlock. These XBlocks allow students to work on interactive exercises on edX, and instructors to view and download students’ submissions.


Detection Of Locations Of Key Points On Facial Images, Manoj Gyanani May 2016

Master's Projects

In the field of computer vision research, one of the most important branches is face recognition. It aims to find the size and location of human faces in a digital image by identifying and separating faces from surrounding objects such as buildings and plants. For the purpose of developing an advanced face recognition algorithm, detection of facial key points is a basic and very important task: it is about finding the location of specific key points on facial images. These key points can be mouths, noses, left eyes, right eyes, and so on.

For the implementation of the solution, I have used Amazon …


Taint And Information Flow Analysis Using Sweet.js Macros, Prakasam Kannan May 2016

Master's Projects

JavaScript has been the primary language for application development in browsers, and with the advent of JIT compilers, it is becoming increasingly popular for server-side development as well. However, JavaScript suffers from vulnerabilities like cross-site scripting and malicious advertisement code on the client side, and from SQL injection on the server side.

In this paper, we present a dynamic approach to efficiently track information flow and taint detection to aid in mitigation and prevention of such attacks using JavaScript based hygienic macros. We use Sweet.js and object proxies to override built-in JavaScript operators to track information flow …