Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 60 of 66

Full-Text Articles in Physical Sciences and Mathematics

Entity And Relational Queries Over Big Data Storage, Nachappa Achakalera Ponnappa Oct 2015

Master's Projects

Big data storage involves using NoSQL technologies to handle and process huge volumes of data. NoSQL databases are non-relational and schema-free, with data stored as key-value pairs. The aim of the thesis is to implement entity and relational queries on top of big data storage. To achieve this, we use NoSQL technologies such as MongoDB and HBase. We implement various methodologies and solutions on top of MongoDB and HBase to map data across different tables and implement entity and relational queries to retrieve entities from huge volumes of data. We also measure the performance of both the technologies and …


Predicting Autism Over Large-Scale Child Dataset, Arpit Arya Oct 2015

Master's Projects

Data analytics and machine learning in healthcare are among the fastest-emerging and most needed fields today, and a great deal of research has been and is still being done in this area. In healthcare, gone are the days when only the doctor examines and the patient listens. Doctors now have many technologies that can assist them in accurately diagnosing the disease from which a patient is suffering. The backbone of such technologies is data analytics and machine learning, with which we can draw many inferences from the large volume of patients’ data already available. This …


Load Balancing For Entity Matching Over Big Data Using Sorted Neighborhood, Yogesh Wattamwar Oct 2015

Master's Projects

Entity matching, also known as entity resolution, duplicate identification, reference reconciliation, or record linkage, is a critically important task for data cleaning and data integration. One can think of it as the task of finding records that refer to the same entity in the real world. These records can belong to a single data source or to distributed data sources. It takes structured data as input, and the process includes comparing that structured data (an entity or database record) with entities present in the knowledge base. For large-scale entity matching, data has to go through a sequence of steps, which includes …


Relationship Based Entity Recommendation System, Rakhi Poonam Verma Oct 2015

Master's Projects

With the increasing use of the internet as a place to search for information, the importance of the relevance of the results returned by search engines has increased manyfold in recent years. In this paper, we propose techniques to improve the relevance of results shown by a search engine by using the kinds of relationships between entities that a user is interested in. We propose a technique that uses relationships between entities to recommend related entities from a knowledge base, which is a collection of entities and the relationships with which they are connected to other …


Graph Based Word Sense Disambiguation For Clinical Abbreviations Using Apache Spark, Veebha Padavkar Oct 2015

Master's Projects

Identification of the correct sense for an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the surrounding context of the word. Similar to narrative documents, clinical documents suffer from ambiguity issues that impact automatic extraction of the correct sense from the document. In this project, we propose a graph-based solution based on an algorithm originally implemented by Osmar R. Zaïane et al. for word sense disambiguation, specifically focusing on clinical text. The algorithm makes use of proposed …


A Recommendation Engine Using Apache Spark, Swapna Kulkarni Oct 2015

Master's Projects

The volume of structured and unstructured data has grown at an exponential rate in recent times. As a result of this rapid data growth, we are inundated with a plethora of choices for any product or service. It is natural to get lost in the Amazon of such choices and to find it hard to make decisions. The project aims to address this problem by using entity recommendation. The two main aspects the project concentrates on are presenting more accurate entity recommendations to the user and dealing with vast amounts of data. The project aims at presenting …


Study Of Big Data Architecture Lambda Architecture, Jaideep Katkar Oct 2015

Master's Projects

The lambda architecture introduced by Marz is a generic, scalable, and fault-tolerant data processing architecture. It aims to satisfy the need for a robust system that is fault-tolerant against both hardware failures and human mistakes and is able to serve a wide range of workloads and use cases. The architecture proposal decomposes the problem into three layers: a) the batch layer focuses on fault tolerance and optimizes for precise results, b) the speed layer is optimized for short response times and only takes into account the most recent data, and c) the serving layer provides low-latency views to the results of the …


Designing A Programming Contract Library For Java, Neha Rajkumar Oct 2015

Master's Projects

Programmers are now developing large and complex software systems, so it’s important to have software that is consistent, efficient, and robust. Programming contracts allow developers to specify preconditions, postconditions, and invariants in order to more easily identify programming errors. The design by contract principle [1] was first used in the Eiffel programming language [2], and has since been extended to libraries in many other languages. The purpose of my project is to design a programming contract library for Java. The library supports a set of preconditions, postconditions, and invariants that are specified in Java annotations. It incorporates contract checking for …
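As an illustration of the contract idea described in this abstract, here is a minimal sketch of precondition and postcondition checking. It is written in Python with a decorator rather than with the Java annotations the project uses, and the names `contract`, `ContractError`, and `integer_sqrt` are hypothetical, chosen only for this example; it is not the library's API.

```python
import functools


class ContractError(AssertionError):
    """Raised when a precondition or postcondition is violated (hypothetical name)."""


def contract(pre=None, post=None):
    """Attach an optional precondition and postcondition to a function.

    `pre` receives the call arguments; `post` receives the return value.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check the precondition before running the function body.
            if pre is not None and not pre(*args, **kwargs):
                raise ContractError(f"precondition of {func.__name__} failed")
            result = func(*args, **kwargs)
            # Check the postcondition on the returned value.
            if post is not None and not post(result):
                raise ContractError(f"postcondition of {func.__name__} failed")
            return result
        return wrapper
    return decorate


@contract(pre=lambda x: x >= 0, post=lambda r: r >= 0)
def integer_sqrt(x):
    """Return the largest integer r with r * r <= x."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

Calling `integer_sqrt(10)` passes both checks, while `integer_sqrt(-1)` raises `ContractError` before the function body runs, which is how a contract pinpoints the caller as the source of the error.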


Mining Concept In Big Data, Jingjing Yang May 2015

Master's Projects

To use big data fruitfully, data mining is necessary. There are two well-known methods: one is based on the apriori principle, and the other is based on the FP-tree. In this project we explore a new approach based on the simplicial complex, which is a combinatorial form of polyhedron used in algebraic topology. Our approach, like FP-tree, is top-down; at the same time, it is based on the apriori principle in geometric form, called the closed condition in a simplicial complex. Our method is almost 300 times faster than FP-growth on a real-world database using an SJSU laptop. The database …


Cryptanalysis Of Classic Ciphers Using Hidden Markov Models, Rohit Vobbilisetty May 2015

Master's Projects

Cryptanalysis is the study of identifying weaknesses in the implementation of cryptographic algorithms. This process would improve the complexity of such algorithms, making the system secure.

In this research, we apply Hidden Markov Models (HMMs) to classic cryptanalysis problems. We show that with sufficient ciphertext, an HMM can be used to break a simple substitution cipher. We also show that when limited ciphertext is available, using multiple random restarts for the HMM increases our chance of successful decryption.


Adding Syntax Parameters To The Sweet.Js Macro Library For Javascript, Vimal Kumar May 2015

Master's Projects

Lisp and Scheme have demonstrated the power of macros to enable programmers to evolve and craft languages. A macro is a rule or pattern that specifies how a certain input sequence should be mapped to an output sequence according to some defined procedure. Using a macro system a programmer can introduce new syntactic elements to the programming language. Macros found in a program are expanded by a macro expander and allow a programmer to enable code reuse. Mozilla Sweet.JS provides a way for developers to enrich their JavaScript code by adding new syntax to the language through the use of …


Introducing Faceted Exception Handling For Dynamic Information Flow, Sri Tej Narala May 2015

Master's Projects

JavaScript is most commonly used as a part of web browsers, especially in client-side scripts interacting with the user. JavaScript is also the source of many security problems, including cross-site scripting attacks. The primary challenge is that code from untrusted sources runs with full privileges on the client side, thus leading to security breaches. This paper develops information flow controls with proper exception handling to prevent violations of data confidentiality and integrity.

Faceted values are a mechanism to handle dynamic information flow security in a way that overcomes the limitations caused by dynamic execution, but previous work has …


Support Vector Machines And Metamorphic Malware Detection, Tanuvir Singh May 2015

Master's Projects

Metamorphic malware changes its internal structure with each infection, which makes it challenging to detect. In this research, we test several scoring techniques that have shown promise in metamorphic detection. We then perform a careful robustness analysis by employing morphing strategies that cause each score to fail. Finally, we show that combining scores using a Support Vector Machine (SVM) yields results that are significantly more robust than we obtained using any of the individual scores.


Malware Detection Using Dynamic Analysis, Swapna Vemparala May 2015

Master's Projects

In this research, we explore the field of dynamic analysis, which has shown promising results in the field of malware detection. Here, we extract dynamic software birthmarks during malware execution and apply machine learning based detection techniques to the resulting feature set. Specifically, we consider Hidden Markov Models and Profile Hidden Markov Models. To determine the effectiveness of this dynamic analysis approach, we compare our detection results to the results obtained by using static analysis. We show that in some cases, significantly stronger results can be obtained using our dynamic approach.


Clustering Versus Svm For Malware Detection, Usha Narra May 2015

Master's Projects

Previous work has shown that we can effectively cluster certain classes of malware into their respective families. In this research, we extend this previous work to the problem of developing an automated malware detection system. We first compute clusters for a collection of malware families. Then we analyze the effectiveness of classifying new samples based on these existing clusters. We compare results obtained using k-means and Expectation Maximization (EM) clustering to those obtained using Support Vector Machines (SVM). Using clustering, we are able to detect some malware families with an accuracy comparable to that of SVMs. One …


Optimization Of Scheduling And Dispatching Cars On Demand, Vu Tran May 2015

Master's Projects

The taxicab is the most common type of on-demand transportation service in the city because its dispatching system offers better service in terms of shorter wait times. However, the wait time and travel time for multiple passengers and destinations are very considerable. Recently, companies have implemented a real-time ridesharing model that is expected to reduce the riding cost when passengers are willing to share their rides with others. This model does not shorten the wait time and travel time when there are multiple passengers and destinations. This paper investigates how ridesharing can be improved by using the …


A Comparison Of Clustering Techniques For Malware Analysis, Swathi Pai May 2015

Master's Projects

In this research, we apply clustering techniques to the malware detection problem. Our goal is to classify malware as part of a fully automated detection strategy. We compute clusters using the well-known k-means and EM clustering algorithms, with scores obtained from Hidden Markov Models (HMM). Previous work in this area used HMMs and k-means clustering for the same purpose. The current effort extends it to use EM clustering for detection and compares this technique with k-means clustering.
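To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) applied to one-dimensional scores. The score values, the function name `kmeans_1d`, and the deterministic initialization are assumptions made for illustration; the abstract does not state how the project computes or initializes its clusters, and real HMM scores would come from trained models.

```python
def kmeans_1d(scores, k, iterations=100):
    """Cluster scalar scores with Lloyd's algorithm; return (centroids, labels)."""
    ordered = sorted(scores)
    # Assumed initialization: spread the starting centroids evenly
    # across the sorted scores so the result is deterministic.
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]

    labels = []
    for _ in range(iterations):
        # Assignment step: each score joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(s - centroids[c]))
            for s in scores
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [s for s, label in zip(scores, labels) if label == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical per-sample HMM scores for two malware families.
example_scores = [-2.1, -2.0, -1.9, -9.8, -10.1, -10.0]
example_centroids, example_labels = kmeans_1d(example_scores, k=2)
```

With these made-up scores, the first three samples fall into one cluster and the last three into the other. EM clustering differs in that it assigns each sample a probability of belonging to every cluster rather than a single hard label.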


Firefox Add-On For Metamorphic Javascript Malware Detection, Sravan Kumar Reddy Javaji May 2015

Master's Projects

With the increasing use of the Internet, malicious software has more frequently been designed to take control of users’ computers for illicit purposes. Cybercriminals are putting a lot of effort into making malware difficult to detect. In this study, we demonstrate how metamorphic JavaScript malware can affect a victim’s machine using a malicious or compromised Firefox add-on. Following the same methodology, we develop another add-on with a static malware detection technique to detect metamorphic JavaScript malware.


Index Strategies For Efficient And Effective Entity Search, Huy T. Vu May 2015

Master's Projects

The volume of structured data has grown rapidly in recent years, as the data entity has emerged as an abstraction that captures almost every piece of data. As a result, searching for a desired piece of information on the web can be a challenge in terms of time and relevancy, because the number of matching entities can be very large for a given query. This project is concerned with the efficiency and effectiveness of such entity queries. The work contains two major parts: implementing inverted indexing strategies so that queries can be answered in minimal time, and ranking results based on features that are independent …


Context-Based Autosuggest On Graph Data, Hai Nguyen May 2015

Master's Projects

Autosuggest is an important feature in any search application. Currently, most applications only suggest a single term based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the …


A Scalable Search Engine Aggregator, Pooja Mishra May 2015

Master's Projects

The ability to display different media sources in an appropriate way is an integral part of search engines such as Google, Yahoo, and Bing, as well as social networking sites like Facebook, etc. This project explores and implements various media-updating features of the open source search engine Yioop [1]. These include news aggregation, video conversion and email distribution. An older, preexisting news update feature of Yioop was modified and scaled so that it can work on many machines. We redesigned and modified the user interface associated with a distributed news updater feature in Yioop. This project also introduced a video …


An Open Source Advertisement Server, Pushkar Umaranikar May 2015

Master's Projects

This report describes a new online advertisement system and its implementation for the Yioop open source search engine. This system was implemented for my CS298 project. It supports both selling advertisements and displaying them within search results. The selling of advertisements is done using a novel auction system, which we describe in this paper. With this auction system, it is possible to create an advertisement, attach keywords to it, and add it to the advertisement inventory. An advertisement is displayed on a search results page if the search keyword matches the keywords attached to the advertisement. Display of advertisements is …


Cheating Detection In Online Examinations, Gaurav Kasliwal May 2015

Master's Projects

In this research, we develop and analyze a tool that monitors student browsing activity during online examinations. Our goal is to detect cheating in real time. In our design, a server captures packets using KISMET and detects cheating based on either a whitelist or a blacklist of URLs. We provide implementation details and give experimental results, and we analyze various attack strategies. Finally, we show that the system is practical and lightweight in comparison to other available tools.
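The whitelist and blacklist decision described above can be sketched as follows. This is a minimal illustration that assumes the requested URLs have already been extracted from captured traffic; it does not perform any packet capture, and the function names, domain-suffix matching rule, and example domains are assumptions rather than details of the project's tool.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host name of a URL, or "" if none can be parsed."""
    return (urlsplit(url).hostname or "").lower()


def matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def flag_requests(urls, whitelist=None, blacklist=None):
    """Return the URLs that should be flagged as possible cheating.

    Whitelist mode: anything outside the allowed domains is flagged.
    Blacklist mode: only requests to the listed domains are flagged.
    """
    flagged = []
    for url in urls:
        host = host_of(url)
        if whitelist is not None:
            suspicious = not matches(host, whitelist)
        else:
            suspicious = matches(host, blacklist or ())
        if suspicious:
            flagged.append(url)
    return flagged


# Hypothetical URLs observed during an exam session.
observed = [
    "http://exam.example.edu/quiz/1",
    "http://www.answers.example.com/q?id=7",
]
flagged_by_whitelist = flag_requests(observed, whitelist={"exam.example.edu"})
flagged_by_blacklist = flag_requests(observed, blacklist={"answers.example.com"})
```

In this sketch both modes flag only the second URL. A whitelist is the stricter choice, since any site not explicitly allowed is reported, while a blacklist reports only sites that were anticipated in advance.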


Driver Telematics Analysis, Karthik Vakati May 2015

Master's Projects

For automobile insurance firms, telemetric analysis represents a valuable and growing way to identify the risk associated with each driver. The pricing decisions of an insurer are best accounted for if they are made considering the driver’s behavior instead of just the vehicle characteristics and the best way to understand a driver’s behavior is to leverage the telemetric analysis. Decisions made on such factors can eventually lead to increased premium or reduced liability for unsafe or reckless drivers and can also help in transitioning the burden to the policies that lead to increased liability.

The dataset provided for this project …


Maximizing The Speed Of Influence In Social Networks, Yubo Wang May 2015

Master's Projects

Influence maximization in social networks is the problem of selecting a limited size of influential users as seed nodes so that the influence from these seed nodes can propagate to the largest number of other nodes in the network. Previous studies in influence maximization focused on three areas, i.e., designing propagation models, improving algorithms of seed-node selection and exploiting the structure of social networks. However, most of these studies ignored the time constraint in influence propagation. In this paper, I studied how to maximize influence propagation in a given time, i.e., maximizing the speed of influence propagation in social networks. …


Using Neural Networks For Image Classification, Tim Kang May 2015

Master's Projects

This paper will focus on applying neural network machine learning methods to images for the purpose of automatic detection and classification. The main advantage of using neural network methods in this project is their adeptness at fitting nonlinear data and their ability to work as an unsupervised algorithm. The algorithms will be run on common, publicly available datasets, namely MNIST and CIFAR-10, so that our results will be easily reproducible.


Static Analysis Of Malicious Java Applets, Nikitha Ganesh May 2015

Master's Projects

In this research, we consider the problem of detecting malicious Java applets, based on static analysis. In general, dynamic analysis is more informative, but static analysis is more efficient, and hence more practical. Consequently, static analysis is preferred, provided we can obtain results comparable to those obtained using dynamic analysis. We conducted experiments with the machine learning technique, Hidden Markov Model (HMM). We show that in some cases a static technique can detect malicious Java applets with greater accuracy than previously published research that relied on dynamic analysis.


Combining Dynamic And Static Analysis For Malware Detection, Anusha Damodaran May 2015

Master's Projects

Well-designed malware can evade static detection techniques, such as signature scanning. Dynamic analysis strips away one layer of obfuscation and hence such an approach can potentially provide more accurate detection results. However, dynamic analysis is generally more costly than static analysis. In this research, we analyze the effectiveness of using dynamic analysis to enhance the training phase, while using only static techniques in the detection phase. Relative to a fully static approach, the additional overhead is minimal, since training is essentially one-time work.


Sociobot: Twitter For Command And Control Of A Botnet, Ismeet Kaur Makkar May 2015

Master's Projects

A botnet is a collection of computers controlled by a botmaster, often used for malicious activity. Social networks provide an ideal medium for botnets to spread their reach. In this research, we develop and analyze a botnet that uses Twitter for its command and control channel. We use this botnet to perform a distributed denial of service attack on a web server, and we use biological epidemic models to analyze the spread of the botnet using Twitter.


Operational Semantics For Featherweight Lua, Hanshu Lin May 2015

Master's Projects

Lua is a small, embedded language to provide scripting in other languages. Despite a clean, minimal syntax, it is still too complex for formal reasoning because of some syntactic sugar or specific syntax structures in Lua.

This thesis develops Featherweight Lua (FWLua), following the tradition of languages like Featherweight Java[1] and Featherweight JavaScript[2]. The goal is to develop a core of language features that, while remaining simple enough for formal reasoning, also remain faithful to the central characteristics of the language. Specifically for Lua, the core features that are essential for our modeling include:

∙ First-class functions …