Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

Deep Data Analysis On The Web, Xuanyu Liu Dec 2016


Master's Projects

Search engines are well known to people all over the world. People prefer keyword searches to typing URLs when they open websites or retrieve information. Collecting finite sequences of keywords that represent important concepts within a set of authors is therefore important; in other words, we need knowledge mining. We use a simplicial concept method to speed up concept mining. A previous CS 298 project studied this approach under Dr. Lin. This method is very fast: for example, to mine the concepts, FP-growth takes 876 seconds on a database with 1257 columns and 65k rows, simplicial complex only …


Predicting User's Future Requests Using Frequent Patterns, Marc Nipuna Dominic Savio Dec 2016


Master's Projects

In this research, we predict a user's future requests using a data mining algorithm. Usage of the World Wide Web has produced a huge amount of data, and handling this data is getting harder day by day. All this data is stored as web logs, and each web log is stored in a different format with different field names, such as the search string, the URL with its corresponding timestamp, user IDs that help with session identification, the status code, etc. Whenever a user requests a URL, there is a delay in getting the requested page, and sometimes the request is denied. Our …
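The session identification step mentioned above can be illustrated with a minimal sketch. The record layout, the sample data, and the 30-minute idle gap are all assumptions made for illustration; the abstract does not say how the project itself splits sessions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (user_id, timestamp, url, status_code).
# The field layout and the sample values are assumptions for illustration.
LOG = [
    ("u1", "2016-12-01 10:00:00", "/home", 200),
    ("u1", "2016-12-01 10:05:00", "/search", 200),
    ("u2", "2016-12-01 10:06:00", "/home", 200),
    ("u1", "2016-12-01 11:30:00", "/home", 200),
]


def sessionize(records, gap=timedelta(minutes=30)):
    """Group each user's requests into sessions, split at long idle gaps."""
    by_user = defaultdict(list)
    for user, stamp, url, _status in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_user[user].append((when, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            if time - prev_time > gap:
                # Idle period exceeded the (assumed) gap: close the session.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


if __name__ == "__main__":
    for user, urls in sessionize(LOG):
        print(user, urls)
```

Sessions produced this way are the kind of input on which a frequent-pattern miner could then look for recurring request sequences.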


Handling Relationships In A Wiki System, Yashi Kamboj Dec 2016


Master's Projects

Wiki software enables users to manage content on the web and to create or edit web pages freely. Most wiki systems support the creation of hyperlinks on pages and have a simple text syntax for page formatting. A common, more advanced feature is to allow pages to be grouped together as categories. Currently, wiki systems support categorization of pages in a very traditional way, by specifying whether a wiki page belongs to a category or not. Categorization represents a unary relationship and is not sufficient to represent n-ary relationships, that is, those involving links between multiple wiki pages.

In this project, we extend Yioop, …
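The difference between unary categorization and n-ary relationships described above can be made concrete with a small data-structure sketch. This is not Yioop's implementation; the `Wiki` class, its method names, and the page names are hypothetical and exist only to illustrate the distinction.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    # Unary case: category name -> set of pages that belong to it.
    categories: dict = field(default_factory=dict)
    # N-ary case: list of (relation name, tuple of participating pages).
    relations: list = field(default_factory=list)

    def categorize(self, page, category):
        """Record that a single page belongs to a category."""
        self.categories.setdefault(category, set()).add(page)

    def relate(self, name, *pages):
        """Record a named relationship among any number of pages."""
        self.relations.append((name, pages))

    def related_to(self, page):
        """Return every relationship in which the page takes part."""
        return [(name, pages) for name, pages in self.relations if page in pages]


# Hypothetical pages and relation names, for illustration only.
wiki = Wiki()
wiki.categorize("San Jose", "Cities")
wiki.relate("located_in", "San Jose", "California")
wiki.relate("flight", "San Jose", "Seattle", "Alaska Airlines")
```

A category answers only "is this page in the set or not", while each entry in `relations` ties two or more pages together under a named link.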


Web-Based Integrated Development Environment, Hien T. Vu Dec 2016


Master's Projects

As tablets become more powerful and more economical, students are attracted to them and are moving away from desktops and laptops. Their compact size and easy-to-use Graphical User Interface (GUI) lower the learning and adoption barriers for new users. This also changes the environment in which undergraduate Computer Science students learn how to program. Popular Integrated Development Environments (IDEs) such as Eclipse and NetBeans require disk space for local installations as well as an external compiler. Current tablets cannot meet these requirements, which drives the need for a web-based IDE. There are also many other …


Analyzing Clustered Web Concepts With Homology, Eric Nam Jul 2016


Master's Projects

As more and more data is mined from the Internet, Data Science has become an important field of computing for making that data useful. Data Science allows people to turn that data into structured knowledge that is easily used, validated, and understood. There are many known approaches to analyzing data, but this project focuses on a recently introduced method: analyzing text data with homology from mathematics to understand relationships between keyword-sets.

Using structures of algebraic topology as a starting point, keyword-sets in the text are represented by simplexes based on what they are and what …


Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le Jun 2016


Master's Projects

This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching thus becomes a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique (new to computer science) that improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …
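One way to picture a relational table as a simplicial complex is sketched below, under stated assumptions: each (column, value) pair is treated as a vertex, each row as a simplex on its vertices, and the support of a face is the number of rows containing it. This brute-force enumeration is only an illustration of the idea; it is not the paper's algorithm and does not reflect its performance or memory claims. The table contents and the support threshold of 2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy table; column names and values are illustrative only.
TABLE = [
    {"color": "red", "size": "S", "shape": "cube"},
    {"color": "red", "size": "S", "shape": "ball"},
    {"color": "blue", "size": "S", "shape": "cube"},
]


def face_support(rows, max_vertices=2):
    """Count, for every face of every row-simplex, how many rows contain it."""
    support = Counter()
    for row in rows:
        # Each (column, value) pair is a vertex; sorting gives stable face keys.
        vertices = sorted(row.items())
        for k in range(1, max_vertices + 1):
            for face in combinations(vertices, k):
                support[face] += 1
    return support


SUPPORT = face_support(TABLE)
# Faces shared by at least two rows play the role of frequent itemsets.
FREQUENT = {face for face, count in SUPPORT.items() if count >= 2}
```

In this picture, looking for association rules amounts to walking the faces of the complex and reading off their support counts.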


Hybrid Similarity Function For Big Data Entity Matching With R-Swoosh, Vimal Chandra Gorijala May 2016


Master's Projects

Entity Matching (EM) is the problem of determining whether two entities in a data set refer to the same real-world object. For example, it decides whether two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World Wide Web, and the emerging Semantic Web. This project deals with similarity functions and the thresholds they use to determine the similarity of entities. The work contains two major parts: …
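The role of a similarity function and a threshold can be sketched as follows. This is a minimal illustration, not the project's hybrid function or R-Swoosh: it mixes a character-level ratio from Python's standard `difflib` with a token-level Jaccard overlap, and the function names, the 0.5 weight, and the 0.6 threshold are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Character-level similarity in [0, 1], computed by the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a, b):
    """Jaccard overlap of whitespace-separated tokens, ignoring periods."""
    tokens_a = set(a.lower().replace(".", "").split())
    tokens_b = set(b.lower().replace(".", "").split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def combined_score(a, b, weight=0.5):
    """Weighted mix of the two measures; the weight is an assumption."""
    return weight * name_similarity(a, b) + (1 - weight) * token_overlap(a, b)


def is_match(a, b, threshold=0.6):
    """Declare a match when the combined score reaches the (assumed) threshold."""
    return combined_score(a, b) >= threshold


if __name__ == "__main__":
    for pair in [("Helen Hunt", "H. M. Hunt"), ("Helen Hunt", "Helen Hunt")]:
        print(pair, round(combined_score(*pair), 3), is_match(*pair))
```

Whether a borderline pair such as “Helen Hunt” and “H. M. Hunt” is accepted depends entirely on the chosen weight and threshold, which is why tuning them matters.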


Efficient Pair-Wise Similarity Computation Using Apache Spark, Parineetha Gandhi Tirumali May 2016


Master's Projects

Entity matching is the process of identifying different manifestations of the same real-world entity. These entities can be referred to as objects (strings) or data instances. They are in turn split over several databases or clusters based on their signatures. When entity matching algorithms are run on these databases or clusters, there is a high possibility that a particular entity pair is compared more than once. The number of comparisons for any two entities depends on the number of common signatures or keys they possess. This affects the performance of any entity matching algorithm. This paper …
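The redundancy described above can be seen in a small single-machine sketch. It uses plain Python rather than Apache Spark, and the entity names and signatures are hypothetical; deduplicating with a set is just one simple way to avoid repeated comparisons, not necessarily the approach the paper takes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entities mapped to their signatures (blocking keys).
SIGNATURES = {
    "e1": {"hun", "hel"},
    "e2": {"hun", "hel"},
    "e3": {"hun"},
}


def build_blocks(signatures):
    """Place each entity in one block per signature it carries."""
    blocks = defaultdict(list)
    for entity, keys in signatures.items():
        for key in keys:
            blocks[key].append(entity)
    return blocks


def naive_comparisons(blocks):
    """Every pair in every block: a pair sharing n keys is emitted n times."""
    return [
        tuple(sorted(pair))
        for members in blocks.values()
        for pair in combinations(members, 2)
    ]


def unique_comparisons(blocks):
    """Each candidate pair exactly once, regardless of shared keys."""
    return sorted(set(naive_comparisons(blocks)))


BLOCKS = build_blocks(SIGNATURES)
```

Here `e1` and `e2` share two signatures, so the naive scheme compares them twice, while the deduplicated list contains each pair only once.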


Library Writers Reward Project, Saravana Kumar Gajendran May 2016


Master's Projects

Open-source library development exploits the distributed intelligence of participants in Internet communities. Nowadays, contributions to the open-source community are fading [16] (Stackalytics, 2016) because library writers receive little recognition. They can start exploring ways to generate revenue as they actively contribute to the open-source community.

This project helps library writers to generate revenue in the form of bitcoins for their contribution. Our solution to generate revenue for library writers is to integrate bitcoin mining with existing JavaScript libraries, such as jQuery. More use of the library leads to more revenue for the library writers. It uses the …


Processing Posting Lists Using OpenCL, Radha Kotipalli May 2016


Master's Projects

One of the main requirements of internet search engines is the ability to retrieve relevant results with fast response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C-based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphics Processing Unit (GPU) of a computer system for performance improvements.

Some of the critical functions in search engines are resource-intensive in terms of processing power, …


Concept Based Search Engine: Concept Creation, Aishwarya Rastogi Mar 2016


Master's Projects

Data on the internet is increasing exponentially every single second. There are billions and billions of documents on the World Wide Web (The Internet). Each document on the internet contains multiple concepts (an abstract or general idea inferred from specific instances).

In this paper, we show how we created and implemented an algorithm for extracting concepts from a set of documents. A search engine can use these concepts to generate search results that cater to the needs of the user. The search results will then be more targeted than those of the usual keyword search.

The main problem was to extract …