Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Agent Based Modeling Akka Scala (1)
- Alergia Granular Rough Computing (1)
- Android bytecode malware machine learning (1)
- Association rule search geometric traversal problem (1)
- CNN Chinese character recognition (1)
- CloudCoder CodeCheck Web-based IDE (1)
- Coding versus non-coding dna grammatical inference (1)
- Concept Web Search Engine (1)
- Data Mining World Wide Web Frequent Itemset (1)
- Entity Matching Semantic Web (1)
- Facial Recognition Neural Nets GPU programming (1)
- HTTP Attack n-grams obfuscation (1)
- Image feature tracking classification (1)
- Image-based bulk email spam SVM PCA (1)
- Inverted Index Construction (1)
- Javascript Malware Detection N-grams (1)
- Javascript flow analysis security (1)
- Knowledge mining simplicial complexes (1)
- Learning Management System MOOCs interactive computer exercises (1)
- Malware detection cryptanalysis Vigenère cipher (1)
- Masquerade intrusion detection Hidden Markov Model (1)
- MediaWiki Binary Relationship Categories (1)
- Movie Script Parsing and Shot Creation Naive Bayes (1)
- Multiple Sequence Alignment Profile Hidden Markov Model (1)
- OpenCL (1)
- Pairwise Similarity Apache Spark (1)
- Posting List Compression (1)
- Revenue source library writers bitcoin (1)
- SVM text Classification (1)
- Security Declassification Information Flow Faceted Typed Javascript (1)
Articles 1 - 30 of 32
Full-Text Articles in Physical Sciences and Mathematics
Real-Time Online Chinese Character Recognition, Wenlong Zhang
Master's Projects
In this project, I built a web application for real-time recognition of handwritten Chinese characters. The system determines a Chinese character while a user is drawing/writing it. The techniques and steps I use to build the recognition system include data preparation, preprocessing, feature extraction, and classification. To increase the accuracy, two different types of neural networks are used in the system: a multi-layer neural network and a convolutional neural network.
Cryptanalysis Of Homophonic Substitution Cipher Using Hidden Markov Models, Guannan Zhong
Master's Projects
We investigate the effectiveness of a Hidden Markov Model (HMM) with random restarts as a means of breaking a homophonic substitution cipher. Based on extensive experiments, we find that such an HMM-based attack outperforms a previously developed nested hill climb approach, particularly when the ciphertext message is short. We then consider a combination cipher, consisting of a homophonic substitution and a column transposition. We develop and analyze an attack on such a cipher. This attack employs an HMM (with random restarts), together with a hill climb to recover the column permutation. We show that this attack can succeed on …
Deep Data Analysis On The Web, Xuanyu Liu
Master's Projects
Search engines are well known to people all over the world. People prefer to search by keyword to open websites or retrieve information rather than type URLs. Therefore, collecting finite sequences of keywords that represent important concepts within a set of authors is important; in other words, we need knowledge mining. We use a simplicial concept method to speed up concept mining. A previous CS 298 project studied this approach under Dr. Lin. This method is very fast: for example, to mine the concept, FP-growth takes 876 seconds from a database with 1257 columns and 65k rows, simplicial complex only …
Predicting User's Future Requests Using Frequent Patterns, Marc Nipuna Dominic Savio
Master's Projects
In this research, we predict a user's future requests using data mining algorithms. Usage of the World Wide Web has resulted in a huge amount of data, and handling this data is getting harder day by day. All this data is stored as web logs, and each web log is stored in a different format with different field names, such as search string, URL with its corresponding timestamp, user IDs that help with session identification, status code, etc. Whenever a user requests a URL, there is a delay in getting the requested page, and sometimes the request is denied. Our …
Handling Relationships In A Wiki System, Yashi Kamboj
Master's Projects
Wiki software enables users to manage content on the web, and create or edit web pages freely. Most wiki systems support the creation of hyperlinks on pages and have a simple text syntax for page formatting. A common, more advanced feature is to allow pages to be grouped together as categories. Currently, wiki systems support categorization of pages in a very traditional way by specifying whether a wiki page belongs to a category or not. Categorization represents a unary relationship and is not sufficient to represent n-ary relationships, those involving links between multiple wiki pages.
In this project, we extend Yioop, …
Web-Based Integrated Development Environment, Hien T. Vu
Master's Projects
As tablets become more powerful and more economical, students are attracted to them and are moving away from desktops and laptops. Their compact size and easy-to-use Graphical User Interface (GUI) reduce the learning and adoption barriers for new users. This also changes the environment in which undergraduate Computer Science students learn how to program. Popular Integrated Development Environments (IDEs) such as Eclipse and NetBeans require disk space for local installations as well as an external compiler. These requirements cannot be met by current tablets and thus drive the need for a web-based IDE. There are also many other …
Analyzing Clustered Web Concepts With Homology, Eric Nam
Master's Projects
As data is being mined more and more from the Internet today, Data Science has become an important field of computing to make that data useful. Data Science allows people to turn all of that data into structured knowledge that is easily utilized, validated, and understandable. There are many known theories to analyze data, but this project will focus on a recently introduced method: analyzing text data with homology from mathematics to understand relationships between keyword-sets.
Using structures of algebraic topology as a starting point, keyword-sets in the text are represented by simplexes based on what they are and what …
DNA Analysis Using Grammatical Inference, Cory Cook
Master's Projects
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the accuracy of inferred languages for …
Analysis On Alergia Algorithm: Pattern Recognition By Automata Theory, Xuanyi Qi
Master's Projects
Based on Kolmogorov complexity, a finite set x of strings has a pattern if the set x can be output by a Turing machine of length less than the minimum of all |x|; this Turing machine, which may not be unique, is called a pattern of the finite set of strings. To find a pattern of a given finite set of strings (assuming such a pattern exists), the ALERGIA algorithm is used to approximate such a pattern (Turing machine) in terms of finite automata. Note that each finite automaton defines a partition on formal language Σ*, ALERGIA …
Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le
Master's Projects
This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). So, conceptually, association rule searching is turned into a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a new technique (in computer science) that improves the performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …
Static And Dynamic Analysis For Android Malware Detection, Ankita Kapratwar
Master's Projects
Static analysis relies on features extracted without executing code, while dynamic analysis extracts features based on code execution (or emulation). In general, static analysis is more efficient, while dynamic analysis is often more informative, particularly in cases of highly obfuscated code. Static analysis of an Android application can rely on features extracted from the manifest file or the Java bytecode, while dynamic analysis of Android applications can deal with features involving dynamic code loading and system calls that are collected while the application is running. In this research, we analyzed the effectiveness of combining static and dynamic features …
Vigenère Score For Malware Detection, Suchita Deshmukh
Master's Projects
Previous research has applied classic cryptanalytic techniques to the malware detection problem. Specifically, scores based on simple substitution cipher cryptanalysis and various generalizations have been considered. In this research, we analyze two new malware scoring techniques based on classic cryptanalysis. Our first approach relies on the Index of Coincidence, which is used, for example, to determine the length of the keyword in a Vigenère ciphertext. We also consider a score based on a more complete cryptanalysis of a Vigenère cipher. We find that the Vigenère score is competitive with previous statistical-based malware scores.
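The Index of Coincidence mentioned in this abstract can be illustrated with a minimal sketch of its classic use on text. The function names `index_of_coincidence` and `average_ic_for_key_length` are hypothetical and chosen for illustration; the abstract does not say how the score is adapted to malware, so this shows only the textbook keyword-length test, not the author's scoring method.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def average_ic_for_key_length(ciphertext: str, key_length: int) -> float:
    """Split the ciphertext into key_length columns and average their IC.

    For a Vigenere ciphertext, the correct key length tends to give
    columns whose IC is close to that of the plaintext language.
    """
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    columns = ["".join(letters[i::key_length]) for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length
```

In practice one would compute `average_ic_for_key_length` for a range of candidate key lengths and look for the lengths where the average rises noticeably above the value for a uniformly random text.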
Machine Learning On The Cloud For Pattern Recognition, Tien Nguyen
Master's Projects
Pattern recognition is a field of machine learning with applications to areas such as text recognition and computer vision. Machine learning algorithms, such as convolutional neural networks, may be trained to classify images. However, such tasks may be computationally intensive for a commercial computer for larger volumes or larger sizes of images. Cloud computing allows one to overcome the processing and memory constraints of average commercial computers, allowing computations on larger amounts of data. In this project, we developed a system for detection and tracking of moving human and vehicle objects in videos in real time or near real time. …
Image Spam Analysis, Annapurna Sowmya Annadatha
Master's Projects
Image spam is unsolicited bulk email, where the message is embedded in an image. This technique is used to evade text-based spam filters. In this research, we analyze and compare two novel approaches for detecting spam images. Our first approach focuses on the extraction of a broad set of image features and selection of an optimal subset using a Support Vector Machine (SVM). Our second approach is based on Principal Component Analysis (PCA), where we determine eigenvectors for a set of spam images and compute scores by projecting images onto the resulting eigenspace. Both approaches provide high accuracy with low …
Defeating N-Gram Scores For HTTP Attack Detection, Samyuktha Sridharan
Master's Projects
Web applications that generate malicious HTTP requests provide a platform that attackers use to exploit vulnerable machines. Such malicious traffic should be identified by network intrusion detection systems, based on traffic analysis. Previous research has shown that n-gram techniques can be successfully applied to detect HTTP attacks. In this research, we analyze the robustness of these n-gram techniques. We show that n-gram scores are surprisingly robust, but can be defeated using certain obfuscation strategies. We also consider the need for a costlier HMM-based intrusion detection system.
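To make the idea of an n-gram score concrete, here is a minimal sketch of one simple variant: build a profile of character n-grams from known-benign requests, then score a new request by the fraction of its n-grams that never appeared in the profile. The helper names (`ngrams`, `build_profile`, `anomaly_score`), the choice of n = 3, and the scoring rule are all assumptions made for illustration; the abstract does not specify which n-gram score was studied.

```python
from collections import Counter


def ngrams(data: str, n: int = 3) -> list:
    """Return all overlapping character n-grams of data."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def build_profile(requests, n: int = 3) -> Counter:
    """Count the n-grams seen across a set of benign requests."""
    profile = Counter()
    for request in requests:
        profile.update(ngrams(request, n))
    return profile


def anomaly_score(request: str, profile: Counter, n: int = 3) -> float:
    """Fraction of the request's n-grams that are absent from the profile."""
    grams = ngrams(request, n)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in profile)
    return unseen / len(grams)
```

A score near 0 means the request looks like the training traffic, and a score near 1 means almost none of its n-grams were seen before. An obfuscation strategy of the kind the abstract alludes to would try to pad a malicious request with benign-looking n-grams to pull this ratio down.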
Multi Faceted Text Classification Using Supervised Machine Learning Models, Abhiteja Gajjala
Master's Projects
In recent years, document management tasks (known as information retrieval) have increased greatly due to the availability of digital documents everywhere. The need for automatic methods of extracting document information has become prominent for organizing information and for knowledge discovery. Text classification is one such solution, wherein natural language text is assigned to one or more predefined categories based on its content. In my research, classification of text is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers discussed the classification of sentiment either positive or …
Supervised Learning For Multi-Domain Text Classification, Siva Charan Reddy Gangireddy
Master's Projects
Digital information available on the Internet is increasing day by day. As a result, the demand for tools that help people find and analyze all these resources is also growing. Text classification, in particular, has been very useful in managing the information. Text classification is the process of assigning natural language text to one or more categories based on the content. It has many important applications in the real world. For example, finding the sentiment of reviews posted by people on restaurants, movies, and other such things is an application of text classification. In …
Hybrid Similarity Function For Big Data Entity Matching With R-Swoosh, Vimal Chandra Gorijala
Master's Projects
Entity Matching (EM) is the problem of determining if two entities in a data set refer to the same real-world object. For example, it decides if two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and on the emerging Semantic Web. This project deals with the similarity functions and thresholds utilized in them to determine the similarity of the entities. The work contains two major parts: …
Efficient Pair-Wise Similarity Computation Using Apache Spark, Parineetha Gandhi Tirumali
Master's Projects
Entity matching is the process of identifying different manifestations of the same real-world entity. These entities can be referred to as objects (strings) or data instances. These entities are in turn split over several databases or clusters based on the signatures of the entities. When entity matching algorithms are performed on these databases or clusters, there is a high possibility that a particular entity pair is compared more than once. The number of comparisons for any two entities depends on the number of common signatures or keys they possess. This affects the performance of any entity matching algorithm. This paper …
Tracking User Activity While Safeguarding Data From Attackers, Justin Dahmubed
Master's Projects
Companies constantly look for ways to better understand customer activity on their websites. Website owners may want to be able to analyze customer activity without having to concern themselves with a government agency forcing them to reveal their information. Multiple analytical tools have been created, most notably Google Analytics.
In my thesis, I demonstrate how analytics data can be stored so that only the site owners can view the data about their customers. With my design, even the analytics site itself cannot decrypt the data after a given window of time has elapsed. The novel aspect of my design is …
Movie Script Shot Lister, David Robert Smith
Master's Projects
The making of a motion picture almost always starts with the script, the written version of a story envisioned within the mind of its creator. The script is then broken down into shots. Each individual shot is filmed and then they are edited together to create the motion picture. The goal of the Movie Script Shot Lister thesis project is to be able to read in a script for a movie or television show, and automatically generate a shot list. While a script is text, a shot list is the blueprint for how to visualize that script, so the …
Library Writers Reward Project, Saravana Kumar Gajendran
Master's Projects
Open-source library development exploits the distributed intelligence of participants in Internet communities. Nowadays, contribution to the open-source community is fading [16] (Stackalytics, 2016) as there is not much recognition for library writers. They can start exploring ways to generate revenue as they actively contribute to the open-source community.
This project helps library writers to generate revenue in the form of bitcoins for their contribution. Our solution to generate revenue for library writers is to integrate bitcoin mining with existing JavaScript libraries, such as jQuery. More use of the library leads to more revenue for the library writers. It uses the …
Malicious JavaScript Detection Using Statistical Language Model, Anumeha Shah
Master's Projects
The Internet has immense importance in our day-to-day life, but at the same time, it has become the medium for infecting computers, attacking users, and distributing malicious code. As JavaScript is the principal language of client-side programming, it is frequently used in conducting such attacks. Various approaches have been proposed to overcome JavaScript security issues. Some advanced approaches utilize machine learning technology in combination with de-obfuscation and emulation. Many methods of analysis incorporate static analysis and dynamic analysis. Our solution is entirely based on static analysis, which avoids unnecessary runtime overhead.
The central …
Multiple Sequence Alignment With Profile Hidden Markov Models, Shubhangi Rakhonde
Master's Projects
The human genome consists of various patterns and sequences that are of biological significance. Capturing these patterns can help us in resolving various mysteries related to the genome, like how genomes evolve, how diseases occur due to genetic mutation, how viruses mutate to cause new disease and what is the cure for these diseases. All these applications are covered in the study of bioinformatics.
One of the very common tasks in bioinformatics involves simultaneous alignment of a number of biological sequences. In bioinformatics, this is widely known as Multiple Sequence Alignment. Multiple sequence alignments help in grouping …
Processing Posting Lists Using OpenCL, Radha Kotipalli
Master's Projects
One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphical Processing Unit (GPU) of a computer system for performance improvements.
Some of the critical functions in search engines are resource-intensive in terms of processing power, …
Hive - An Agent Based Modeling Framework, Roohi Bharti
Master's Projects
This thesis begins by defining agent based modeling. Agent based models are used to model the emergent behavior of complex systems with many interacting components, known as agents. Several model examples are given using NetLogo, which is a popular agent-based modeling platform. A model of concurrent computation is described that uses message passing as the only form of communication between the model’s components, which are called actors. The model is called an actor model. Actors are primitive objects of concurrency in an actor model. In particular, we describe the actor model implemented by Akka, which is Scala’s new actor library. …
Secure Declassification In Faceted JavaScript, Tam Wing
Master's Projects
Information leaks currently represent a major security vulnerability. Malicious code, when injected into a trusted environment and executed in the context of the victim’s privileges, often results in the loss of sensitive information. To address this security issue, this paper focuses on the idea of information flow control using faceted execution [3]. This mechanism allows the interpreter to efficiently keep track of variables across multiple security levels, achieving termination-insensitive non-interference (TINI). With TINI, a program can only leak one bit of data, caused by the termination of a program. One key benefit of having faceted execution is that flow policy …
Interactive Computer Science Exercises In edX, Hong Le
Master's Projects
This project focuses on improving online learning courses for Computer Science. My approach is to create a platform in which interactive exercises can be implemented for students to work on. The methodology includes creating plugins for interactive exercises using XBlock, a component architecture for building independent online courses on edX. The exercises are based on existing exercises like CodeCheck and Wiley’s InterActivities Exercise System. In order to integrate these exercises, I implemented CodeCheck XBlock and Interactive XBlock. These XBlocks allow students to work on interactive exercises on edX, and instructors to view and download students’ submissions.
Detection Of Locations Of Key Points On Facial Images, Manoj Gyanani
Master's Projects
In the field of computer vision research, one of the most important branches is face recognition. It aims to find the size and location of a human face in a digital image by identifying and separating faces from surrounding objects such as buildings and plants. For the purpose of developing an advanced face recognition algorithm, detection of facial key points is a basic and very important task: it is about finding the location of specific key points on facial images. These key points can be mouths, noses, left eyes, right eyes, and so on.
For implementation of solution, I have used amazon …
Taint And Information Flow Analysis Using Sweet.js Macros, Prakasam Kannan
Master's Projects
JavaScript has been the primary language for application development in browsers, and with the advent of JIT compilers, it is becoming increasingly popular for server-side development as well. However, JavaScript suffers from vulnerabilities like cross-site scripting and malicious advertisement code on the client side and from SQL injection on the server side.
In this paper, we present a dynamic approach to efficiently track information flow and taint detection to aid in mitigation and prevention of such attacks using JavaScript based hygienic macros. We use Sweet.js and object proxies to override built-in JavaScript operators to track information flow …