Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Autosuggest entities solr (1)
- Botnet Control Social Networks (1)
- Car telementrics data analysis (1)
- Classic Ciphers Hidden Markov Models (1)
- Clustering hmm malware detection (1)
- Computer Auctions Online Advertising (1)
- Concept Mining Simplicial Complex (1)
- DNA Species Similarity Stochastic Finite Automata (1)
- Distributed Computing (1)
- Entity search indexing (1)
- HMM DNA Motifs Bioinformatics (1)
- Java Malware Code Obfuscation Signature Statistical Detection (1)
- Javascript Macro Library Syntax Parameters (1)
- Javascript faceted exceptions (1)
- Lua Featherwight Language Semantics (1)
- Machine learning profile HMM malware detection (1)
- Malware Detection Obfuscation HMM (1)
- Neural nets image rcognition (1)
- News Aggregation (1)
- Online exams Cheating detection (1)
- Probabilistic Graphical Models NP-complete (1)
- Scheduling genetic algorithms (1)
- Social Network Models Influence (1)
- Support vector machines clustering malware detection (1)
- Support vector machines metamorphic malware detection (1)
- Publication Type
Articles 31 - 60 of 66
Full-Text Articles in Physical Sciences and Mathematics
Entity And Relational Queries Over Big Data Storage, Nachappa Achakalera Ponnappa
Master's Projects
Big data storage involves using NoSQL technologies to handle and process huge volumes of data. NoSQL databases are non-relational and schema-free, with data stored as key-value pairs. The aim of the thesis is to implement entity and relational queries on top of Big Data storage. In order to achieve this, we use NoSQL technologies like MongoDB and HBase. We implement various methodologies and solutions on top of MongoDB and HBase to map data across different tables and implement entity and relational queries to retrieve entities from huge volumes of data. We also measure the performance of both technologies and …
Predicting Autism Over Large-Scale Child Dataset, Arpit Arya
Master's Projects
Data analytics and machine learning in healthcare are among the most rapidly emerging and needed fields today. A lot of research has been performed and is still being done in this field. In healthcare, gone are the days when only the doctor examines and the patient listens. Doctors now have many technologies that can assist them in accurately diagnosing the disease from which a patient is suffering. The backbone of such technologies is data analytics and machine learning, with which we can draw many inferences from the large volume of patients' data already available. This …
Load Balancing For Entity Matching Over Big Data Using Sorted Neighborhood, Yogesh Wattamwar
Master's Projects
Entity matching, also known as entity resolution, duplicate identification, reference reconciliation, or record linkage, is a critically important task for data cleaning and data integration. One can think of it as the task of finding entities that correspond to the same entity in the real world. These entities can belong to a single source of data or to distributed data sources. It takes structured data as input, and the process includes comparing that structured data (an entity or database record) with entities present in the knowledge base. For large-scale entity matching, data has to go through a sequence of steps, which includes …
Relationship Based Entity Recommendation System, Rakhi Poonam Verma
Master's Projects
With the increase in usage of the internet as a place to search for information, the importance of the relevance of the results returned by search engines has increased manyfold in recent years. In this paper, we propose techniques to improve the relevance of results shown by a search engine by using the kinds of relationships between entities a user is interested in. We propose a technique that uses relationships between entities to recommend related entities from a knowledge base, which is a collection of entities and the relationships by which they are connected to other …
Graph Basesd Word Sense Disambiguation For Clinical Abbreviations Using Apache Spark, Veebha Padavkar
Master's Projects
Identification of the correct sense for an ambiguous word is one of the major challenges for language processing in all domains. Word Sense Disambiguation is the task of identifying the correct sense of an ambiguous word by referencing the surrounding context of the word. Similar to the narrative documents, clinical documents suffer from ambiguity issues that impact automatic extraction of correct sense from the document. In this project, we propose a graph-based solution based on an algorithm originally implemented by Osmar R. Zaine et al. for word sense disambiguation specifically focusing on clinical text. The algorithm makes use of proposed …
A Recommendation Engine Using Apache Spark, Swapna Kulkarni
Master's Projects
The volume of structured and unstructured data has grown at an exponential scale in recent years. As a result of this rapid data growth, we are always inundated with a plethora of choices in any product or service. It is very natural to get lost in the amazon of such choices and find it hard to make decisions. The project aims at addressing this problem by using entity recommendation. The project concentrates on two main aspects: implementing and presenting more accurate entity recommendations to the user, and dealing with vast amounts of data. The project aims at presenting …
Study Of Big Data Arhitecture Lambda Arhitecture, Jaideep Katkar
Master's Projects
The lambda architecture introduced by Marz is a generic, scalable, and fault-tolerant data processing architecture. It aims to satisfy the needs for a robust system that is fault-tolerant, both against hardware failures and human mistakes, and able to serve a wide range of workloads and use cases. The architecture proposal decomposes the problem into three layers: a) the batch layer focuses on fault tolerance and optimizes for precise results, b) the speed layer is optimized for short response times and only takes into account the most recent data, and c) the serving layer provides low-latency views to the results of the …
Designing A Programming Contract Library For Java, Neha Rajkumar
Master's Projects
Programmers are now developing large and complex software systems, so it’s important to have software that is consistent, efficient, and robust. Programming contracts allow developers to specify preconditions, postconditions, and invariants in order to more easily identify programming errors. The design by contract principle [1] was first used in the Eiffel programming language [2], and has since been extended to libraries in many other languages. The purpose of my project is to design a programming contract library for Java. The library supports a set of preconditions, postconditions, and invariants that are specified in Java annotations. It incorporates contract checking for …
Mining Concept In Big Data, Jingjing Yang
Master's Projects
To make fruitful use of big data, data mining is necessary. There are two well-known methods: one is based on the apriori principle, and the other is based on the FP-tree. In this project we explore a new approach based on the simplicial complex, which is a combinatorial form of polyhedron used in algebraic topology. Our approach, like FP-tree, is top-down; at the same time, it is based on the apriori principle in geometric form, called the closed condition in a simplicial complex. Our method is almost 300 times faster than FP-growth on a real-world database using an SJSU laptop. The database …
Cryptanalysis Of Classic Ciphers Using Hidden Markov Models, Rohit Vobbilisetty
Master's Projects
Cryptanalysis is the study of identifying weaknesses in the implementation of cryptographic algorithms. This process would improve the complexity of such algorithms, making the system secure.
In this research, we apply Hidden Markov Models (HMMs) to classic cryptanalysis problems. We show that with sufficient ciphertext, an HMM can be used to break a simple substitution cipher. We also show that when limited ciphertext is available, using multiple random restarts for the HMM increases our chance of successful decryption.
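For readers unfamiliar with the target of the attack, the following minimal Python sketch shows what a simple substitution cipher is: each plaintext letter is replaced by a fixed letter from a permuted alphabet. It does not implement the HMM attack described in the abstract. The function names and the example key are hypothetical and chosen only for illustration.

```python
import string

ALPHABET = string.ascii_lowercase


def make_tables(key):
    """Build encryption and decryption tables from a 26-letter permutation key."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of the lowercase alphabet")
    return str.maketrans(ALPHABET, key), str.maketrans(key, ALPHABET)


def encrypt(plaintext, key):
    """Substitute each lowercase letter; other characters pass through unchanged."""
    enc_table, _ = make_tables(key)
    return plaintext.lower().translate(enc_table)


def decrypt(ciphertext, key):
    """Invert the substitution using the same key."""
    _, dec_table = make_tables(key)
    return ciphertext.translate(dec_table)


# Hypothetical key: the alphabet in keyboard order.
EXAMPLE_KEY = "qwertyuiopasdfghjklzxcvbnm"
```

Because the key is one of 26! permutations, exhaustive search is impractical, which is why statistical methods that exploit letter frequencies are the usual way to attack such ciphers.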
Adding Syntax Parameters To The Sweet.Js Macro Library For Javascript, Vimal Kumar
Master's Projects
Lisp and Scheme have demonstrated the power of macros to enable programmers to evolve and craft languages. A macro is a rule or pattern that specifies how a certain input sequence should be mapped to an output sequence according to some defined procedure. Using a macro system a programmer can introduce new syntactic elements to the programming language. Macros found in a program are expanded by a macro expander and allow a programmer to enable code reuse. Mozilla Sweet.JS provides a way for developers to enrich their JavaScript code by adding new syntax to the language through the use of …
Introducing Faceted Exception Handling For Dynamic Information Flow, Sri Tej Narala
Master's Projects
JavaScript is most commonly used as a part of web browsers, especially in client-side scripts interacting with the user. JavaScript is also the source of many security problems, including cross-site scripting attacks. The primary challenge is that code from untrusted sources runs with full privileges on the client side, thus leading to security breaches. This paper develops information flow controls with proper exception handling to prevent violations of data confidentiality and integrity.
Faceted values are a mechanism to handle dynamic information flow security in a way that overcomes the limitations caused by dynamic execution, but previous work has …
Support Vector Machines And Metamorphic Malware Detection, Tanuvir Singh
Master's Projects
Metamorphic malware changes its internal structure with each infection, which makes it challenging to detect. In this research, we test several scoring techniques that have shown promise in metamorphic detection. We then perform a careful robustness analysis by employing morphing strategies that cause each score to fail. Finally, we show that combining scores using a Support Vector Machine (SVM) yields results that are significantly more robust than we obtained using any of the individual scores.
Malware Detection Using Dynamic Analysis, Swapna Vemparala
Master's Projects
In this research, we explore the field of dynamic analysis which has shown promising results in the field of malware detection. Here, we extract dynamic software birthmarks during malware execution and apply machine learning based detection techniques to the resulting feature set. Specifically, we consider Hidden Markov Models and Profile Hidden Markov Models. To determine the effectiveness of this dynamic analysis approach, we compare our detection results to the results obtained by using static analysis. We show that in some cases, significantly stronger results can be obtained using our dynamic approach.
Clustering Versus Svm For Malware Detection, Usha Narra
Master's Projects
Previous work has shown that we can effectively cluster certain classes of malware into their respective families. In this research, we extend this previous work to the problem of developing an automated malware detection system. We first compute clusters for a collection of malware families. Then we analyze the effectiveness of classifying new samples based on these existing clusters. We compare results obtained using k-means and Expectation Maximization (EM) clustering to those obtained using Support Vector Machines (SVM). Using clustering, we are able to detect some malware families with an accuracy comparable to that of SVMs. One …
Optimization Of Scheduling And Dispatching Cars On Demand, Vu Tran
Master's Projects
Taxicabs are the most common type of on-demand transportation service in the city because their dispatching systems offer better service in terms of shorter wait times. However, the wait time and travel time for multiple passengers and destinations remain considerable. Recently, companies have implemented a real-time ridesharing model that is expected to reduce the riding cost when passengers are willing to share their rides with others. This model does not reduce the wait time and travel time when there are multiple passengers and destinations. This paper investigates how ridesharing can be improved by using the …
A Comparison Of Clustering Techniques For Malware Analysis, Swathi Pai
Master's Projects
In this research, we apply clustering techniques to the malware detection problem. Our goal is to classify malware as part of a fully automated detection strategy. We compute clusters using the well-known k-means and EM clustering algorithms, with scores obtained from Hidden Markov Models (HMM). The previous work in this area consists of using HMM and k-means clustering technique to achieve the same. The current effort aims to extend it to use EM clustering technique for detection and also compare this technique with the k-means clustering.
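As a rough illustration of the k-means step on scalar scores, the sketch below runs Lloyd's algorithm on a list of numbers. It assumes each sample has already been reduced to a single score (for example, a log-likelihood from a trained model); the function name, the starting centroids, and the sample scores are hypothetical and do not come from the project.

```python
def kmeans_1d(scores, centroids, iterations=20):
    """Lloyd's algorithm on scalar scores, starting from the given centroids.

    Returns the final centroids and the cluster label assigned to each score.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each score to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(s - centroids[j]))
            for s in scores
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [s for s, label in zip(scores, labels) if label == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = sum(members) / len(members)
    return centroids, labels


# Hypothetical scores: two well-separated groups.
example_scores = [-2.1, -2.0, -1.9, -9.0, -9.2, -8.8]
final_centroids, final_labels = kmeans_1d(example_scores, [-1.0, -10.0])
```

A new sample can then be assigned to the cluster whose centroid is closest to its score. EM clustering differs in that it fits a probability distribution per cluster and assigns soft memberships instead of hard labels.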
Firefox Add-On For Metamorphic Javascript Malware Detection, Sravan Kumar Reddy Javaji
Master's Projects
With the increasing use of the Internet, malicious software has more frequently been designed to take control of users' computers for illicit purposes. Cybercriminals are putting a lot of effort into making malware difficult to detect. In this study, we demonstrate how metamorphic JavaScript malware can affect a victim’s machine using a malicious or compromised Firefox add-on. Following the same methodology, we develop another add-on with a static malware detection technique to detect metamorphic JavaScript malware.
Index Strategies For Efficient And Effective Entity Search, Huy T. Vu
Master's Projects
The volume of structured data has grown rapidly in recent years, as the data entity has emerged as an abstraction that captures almost every piece of data. As a result, searching for a desired piece of information on the web can be a challenge in terms of time and relevancy because the number of matching entities can be very large for a given query. This project is concerned with the efficiency and effectiveness of such entity queries. The work contains two major parts: implementing inverted indexing strategies so that queries can be searched in minimal time, and ranking results based on features that are independent …
Context-Based Autosuggest On Graph Data, Hai Nguyen
Master's Projects
Autosuggest is an important feature in any search application. Currently, most applications only suggest a single term based on how frequently that term appears in the indexed documents or how often it is searched for. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the …
A Scalable Search Engine Aggregator, Pooja Mishra
Master's Projects
The ability to display different media sources in an appropriate way is an integral part of search engines such as Google, Yahoo, and Bing, as well as social networking sites like Facebook, etc. This project explores and implements various media-updating features of the open source search engine Yioop [1]. These include news aggregation, video conversion and email distribution. An older, preexisting news update feature of Yioop was modified and scaled so that it can work on many machines. We redesigned and modified the user interface associated with a distributed news updater feature in Yioop. This project also introduced a video …
An Open Source Advertisement Server, Pushkar Umaranikar
Master's Projects
This report describes a new online advertisement system and its implementation for the Yioop open source search engine. This system was implemented for my CS298 project. It supports both selling advertisements and displaying them within search results. The selling of advertisement is done using a novel auction system, which we describe in this paper. With this auction system, it is possible to create an advertisement, attach keywords to it, and add it to the advertisement inventory. An advertisement is displayed on a search results page if the search keyword matches the keywords attached to the advertisement. Display of advertisements is …
Cheating Detection In Online Examinations, Gaurav Kasliwal
Master's Projects
In this research, we develop and analyze a tool that monitors student browsing activity during online examinations. Our goal is to detect cheating in real time. In our design, a server captures packets using KISMET and detects cheating based on either a whitelist or blacklist of URLs. We provide implementation details and give experimental results, and we analyze various attack strategies. Finally, we show that the system is practical and lightweight in comparison to other available tools.
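The whitelist/blacklist decision can be sketched in a few lines of Python. This is an illustration only: it assumes the URLs have already been extracted from captured traffic (the packet capture itself is not shown), and the function names, mode strings, and host names are hypothetical rather than taken from the project.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_flagged(url, hosts, mode):
    """Decide whether a visited URL should be flagged.

    mode == "whitelist": flag anything whose host is NOT in `hosts`.
    mode == "blacklist": flag anything whose host IS in `hosts`.
    """
    host = host_of(url)
    if mode == "whitelist":
        return host not in hosts
    if mode == "blacklist":
        return host in hosts
    raise ValueError("mode must be 'whitelist' or 'blacklist'")


# Hypothetical policy: only the exam site is allowed.
allowed_hosts = {"exam.example.edu"}
```

A whitelist is the stricter policy, since any host not explicitly allowed is flagged, while a blacklist only catches hosts known in advance.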
Driver Telematics Analysis, Karthik Vakati
Master's Projects
For automobile insurance firms, telemetric analysis represents a valuable and growing way to identify the risk associated with each driver. The pricing decisions of an insurer are best accounted for if they are made considering the driver’s behavior instead of just the vehicle characteristics and the best way to understand a driver’s behavior is to leverage the telemetric analysis. Decisions made on such factors can eventually lead to increased premium or reduced liability for unsafe or reckless drivers and can also help in transitioning the burden to the policies that lead to increased liability.
The dataset provided for this project …
Maximizing The Speed Of Influence In Social Networks, Yubo Wang
Master's Projects
Influence maximization in social networks is the problem of selecting a limited size of influential users as seed nodes so that the influence from these seed nodes can propagate to the largest number of other nodes in the network. Previous studies in influence maximization focused on three areas, i.e., designing propagation models, improving algorithms of seed-node selection and exploiting the structure of social networks. However, most of these studies ignored the time constraint in influence propagation. In this paper, I studied how to maximize influence propagation in a given time, i.e., maximizing the speed of influence propagation in social networks. …
Using Neural Networks For Image Classification, Tim Kang
Master's Projects
This paper will focus on applying neural network machine learning methods to images for the purpose of automatic detection and classification. The main advantages of using neural network methods in this project are their adeptness at fitting nonlinear data and their ability to work as unsupervised algorithms. The algorithms will be run on common, publicly available datasets, namely MNIST and CIFAR10, so that our results will be easily reproducible.
Static Analysis Of Malicious Java Applets, Nikitha Ganesh
Master's Projects
In this research, we consider the problem of detecting malicious Java applets, based on static analysis. In general, dynamic analysis is more informative, but static analysis is more efficient, and hence more practical. Consequently, static analysis is preferred, provided we can obtain results comparable to those obtained using dynamic analysis. We conducted experiments with the machine learning technique, Hidden Markov Model (HMM). We show that in some cases a static technique can detect malicious Java applets with greater accuracy than previously published research that relied on dynamic analysis.
Combining Dynamic And Static Analysis For Malware Detection, Anusha Damodaran
Master's Projects
Well-designed malware can evade static detection techniques, such as signature scanning. Dynamic analysis strips away one layer of obfuscation and hence such an approach can potentially provide more accurate detection results. However, dynamic analysis is generally more costly than static analysis. In this research, we analyze the effectiveness of using dynamic analysis to enhance the training phase, while using only static techniques in the detection phase. Relative to a fully static approach, the additional overhead is minimal, since training is essentially one-time work.
Sociobot: Twitter For Command And Control Of A Botnet, Ismeet Kaur Makkar
Master's Projects
A botnet is a collection of computers controlled by a botmaster, often used for malicious activity. Social networks provide an ideal medium for botnets to spread their reach. In this research, we develop and analyze a botnet that uses Twitter for its command and control channel. We use this botnet to perform a distributed denial of service attack on a web server, and we utilize biological epidemic models to analyze the spread of the botnet using Twitter.
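The abstract does not say which epidemic model was used. As one hedged illustration of this kind of analysis, the sketch below simulates a discrete-time SIR (susceptible, infected, recovered) model, a standard compartmental model from epidemiology. The function name and all parameter values are assumptions for illustration, not results from the project.

```python
def simulate_sir(s, i, r, beta, gamma, steps):
    """Discrete-time SIR model.

    s, i, r : initial susceptible, infected, and recovered counts
    beta    : infection rate per step (assumed between 0 and 1)
    gamma   : recovery rate per step (assumed between 0 and 1)
    Returns a list of (s, i, r) tuples, one per time step including the start.
    """
    n = s + i + r  # total population stays constant
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 990 susceptible hosts, 10 infected, none recovered.
example_run = simulate_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, steps=50)
```

In this model the infected population grows while `beta * s / n` exceeds `gamma` and declines afterward, which is the kind of behavior such an analysis would compare against observed spread.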
Operational Semantics For Featherweight Lua, Hanshu Lin
Master's Projects
Lua is a small, embedded language to provide scripting in other languages. Despite a clean, minimal syntax, it is still too complex for formal reasoning because of some syntactic sugar or specific syntax structures in Lua.
This thesis develops Featherweight Lua (FWLua), following the tradition of languages like Featherweight Java[1] and Featherweight JavaScript[2]. The goal is to develop a core of language features that, while remaining simple enough for formal reasoning, also remain faithful to the central characteristics of the language. Specifically for Lua, the core features that are essential for our modeling include:
∙ First-class functions …