Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Open Access. Powered by Scholars. Published by Universities.®

San Jose State University

Discipline
Keyword
Publication Year
Publication
Publication Type
File Type

Articles 31 - 60 of 83

Full-Text Articles in Databases and Information Systems

Predictive Analysis For Cloud Infrastructure Metrics, Paridhi Agrawal May 2019

Predictive Analysis For Cloud Infrastructure Metrics, Paridhi Agrawal

Master's Projects

In a cloud computing environment, enterprises have the flexibility to request resources according to their application demands. This elastic feature of cloud computing makes it an attractive option for enterprises to host their applications on the cloud. Cloud providers usually exploit this elasticity by auto-scaling the application resources for quality assurance. However, there is a setup-time delay that may take minutes between the demand for a new resource and it being prepared for utilization. This causes the static resource provisioning techniques, which request allocation of a new resource only when the application breaches a specific threshold, to be slow and …


Gradubique: An Academic Transcript Database Using Blockchain Architecture, Thinh Nguyen Dec 2018

Gradubique: An Academic Transcript Database Using Blockchain Architecture, Thinh Nguyen

Master's Projects

Blockchain has been widely adopted in the last few years even though it is in its infancy. The first well-known application built on blockchain technology was Bitcoin, which is a decentralized and distributed ledger to record crypto-currency transactions. All of the transactions in Bitcoin are anonymously transferred and validated by participants in the network. Bitcoin protocol and its operations are so reliable that technologists have been inspired to enhance blockchain technologies and deploy it outside of the crypto-currency world. The demand for private and non-crypto-currency solutions have surged among consortiums because of the security and fault tolerant features of blockchain. …


Recommender Systems For Large-Scale Social Networks: A Review Of Challenges And Solutions, Magdalini Eirinaki, Jerry Gao, Iraklis Varlamis, Konstantinos Tserpes Jan 2018

Recommender Systems For Large-Scale Social Networks: A Review Of Challenges And Solutions, Magdalini Eirinaki, Jerry Gao, Iraklis Varlamis, Konstantinos Tserpes

Faculty Publications

Social networks have become very important for networking, communications, and content sharing. Social networking applications generate a huge amount of data on a daily basis and social networks constitute a growing field of research, because of the heterogeneity of data and structures formed in them, and their size and dynamics. When this wealth of data is leveraged by recommender systems, the resulting coupling can help address interesting problems related to social engagement, member recruitment, and friend recommendations.In this work we review the various facets of large-scale social recommender systems, summarizing the challenges and interesting problems and discussing some of the …


Support Vector Machines For Image Spam Analysis, Aneri Chavda, Katerina Potika, Fabio Di Troia, Mark Stamp Jan 2018

Support Vector Machines For Image Spam Analysis, Aneri Chavda, Katerina Potika, Fabio Di Troia, Mark Stamp

Faculty Publications, Computer Science

Email is one of the most common forms of digital communication. Spam is unsolicited bulk email, while image spam consists of spam text embedded inside an image. Image spam is used as a means to evade text-based spam filters, and hence image spam poses a threat to email-based communication. In this research, we analyze image spam detection using support vector machines (SVMs), which we train on a wide variety of image features. We use a linear SVM to quantify the relative importance of the features under consideration. We also develop and analyze a realistic “challenge” dataset that illustrates the limitations …


Question Type Recognition Using Natural Language Input, Aishwarya Soni Jun 2017

Question Type Recognition Using Natural Language Input, Aishwarya Soni

Master's Projects

Recently, numerous specialists are concentrating on the utilization of Natural Language Processing (NLP) systems in various domains, for example, data extraction and content mining. One of the difficulties with these innovations is building up a precise Question and Answering (QA) System. Question type recognition is the most significant task in a QA system, for example, chat bots. Organization such as National Institute of Standards (NIST) hosts a conference series called as Text REtrieval Conference (TREC) series which keeps a competition every year to encourage and improve the technique of information retrieval from a large corpus of text. When a user …


Improving Text Classification With Word Embedding, Lihao Ge Jun 2017

Improving Text Classification With Word Embedding, Lihao Ge

Master's Projects

One challenge in text classification is that it is hard to make feature reduction basing upon the meaning of the features. An improper feature reduction may even worsen the classification accuracy. Word2Vec, a word embedding method, has recently been gaining popularity due to its high precision rate of analyzing the semantic similarity between words at relatively low computational cost. However, there are only a limited number of researchers focusing on feature reduction using Word2Vec. In this project, we developed a Word2Vec based method to reduce the feature size while increasing the classification accuracy. The feature reduction is achieved by loosely …


An Open Source Discussion Group Recommendation System, Sarika Padmashali May 2017

An Open Source Discussion Group Recommendation System, Sarika Padmashali

Master's Projects

A recommendation system analyzes user behavior on a website to make suggestions about what a user should do in the future on the website. It basically tries to predict the “rating” or “preference” a user would have for an action. Yioop is an open source search engine, wiki system, and user discussion group system managed by Dr. Christopher Pollett at SJSU. In this project, we have developed a recommendation system for Yioop where users are given suggestions about the threads and groups they could join based on their user history. We have used collaborative filtering techniques to make recommendations and …


Adding Differential Privacy In An Open Board Discussion Board System, Pragya Rana May 2017

Adding Differential Privacy In An Open Board Discussion Board System, Pragya Rana

Master's Projects

This project implements a privacy system for statistics generated by the Yioop search and discussion board system. Statistical data for such a system consists of various counts, sums, and averages that might be displayed for groups, threads, etc. When statistical data is made publicly available, there is no guarantee of preserving the privacy of an individual. Ideally, any data extracted should not reveal any sensitive information about an individual. In order to help achieve this, we implemented a Differential Privacy mechanism for Yioop. Differential privacy preserves privacy up to some controllable parameters of the number of items or individuals being …


Document Classification Using Machine Learning, Ankit Basarkar May 2017

Document Classification Using Machine Learning, Ankit Basarkar

Master's Projects

To perform document classification algorithmically, documents need to be represented such that it is understandable to the machine learning classifier. The report discusses the different types of feature vectors through which document can be represented and later classified. The project aims at comparing the Binary, Count and TfIdf feature vectors and their impact on document classification. To test how well each of the three mentioned feature vectors perform, we used the 20-newsgroup dataset and converted the documents to all the three feature vectors. For each feature vector representation, we trained the Naïve Bayes classifier and then tested the generated classifier …


Headline Generation Using Deep Neural Networks, Dhruven Vora May 2017

Headline Generation Using Deep Neural Networks, Dhruven Vora

Master's Projects

News headline generation is one of the important text summarization tasks. Human generated news headlines are generally intended to catch the eye rather than provide useful information. There have been many approaches to generate meaningful headlines by either using neural networks or using linguistic features. In this report, we are proposing a novel approach based on integrating Hedge Trimmer, which is a grammar based extractive summarization system with a deep neural network abstractive summarization system to generate meaningful headlines. We analyze the results against current recurrent neural network based headline generation system.


Reducing Query Latency For Information Retrieval, Swapnil Satish Kamble May 2017

Reducing Query Latency For Information Retrieval, Swapnil Satish Kamble

Master's Projects

As the world is moving towards Big Data, NoSQL (Not only SQL) databases are gaining much more popularity. Among the other advantages of NoSQL databases, one of their key advantage is that they facilitate faster retrieval for huge volumes of data, as compared to traditional relational databases. This project deals with one such popular NoSQL database, Apache HBase. It performs quite efficiently in cases of retrieving information using the rowkey (similar to a primary key in a SQL database). But, in cases where one needs to get information based on non-rowkey columns, the response latency is higher than what we …


A Chatbot Framework For Yioop, Harika Nukala May 2017

A Chatbot Framework For Yioop, Harika Nukala

Master's Projects

Over the past few years, messaging applications have become more popular than Social networking sites. Instead of using a specific application or website to access some service, chatbots are created on messaging platforms to allow users to interact with companies’ products and also give assistance as needed. In this project, we designed and implemented a chatbot Framework for Yioop. The goal of the Chatbot Framework for Yioop project is to provide a platform for developers in Yioop to build and deploy chatbot applications. A chatbot is a web service that can converse with users using artificial intelligence in messaging platforms. …


Named Entity Recognition And Classification For Natural Language Inputs At Scale, Shreeraj Dabholkar May 2017

Named Entity Recognition And Classification For Natural Language Inputs At Scale, Shreeraj Dabholkar

Master's Projects

Natural language processing (NLP) is a technique by which computers can analyze, understand, and derive meaning from human language. Phrases in a body of natural text that represent names, such as those of persons, organizations or locations are referred to as named entities. Identifying and categorizing these named entities is still a challenging task, research on which, has been carried out for many years. In this project, we build a supervised learning based classifier which can perform named entity recognition and classification (NERC) on input text and implement it as part of a chatbot application. The implementation is then scaled …


Intelligent Web Crawler For Semantic Search Engine, Shujia Zhang Feb 2017

Intelligent Web Crawler For Semantic Search Engine, Shujia Zhang

Master's Projects

A Semantic Search Engine (SSE) is a program that produces semantic-oriented concepts from the Internet. A web crawler is the front end of our SSE; its primary goal is to supply important and necessary information to the data analysis component of SSE. The main function of the analysis component is to produce the concepts (moderately frequent finite sequences of keywords) from the input; it uses some variants of TF-IDF as a primary tool to remove stop words. However, it is a very expensive way to filter out stop words using the idea of TF-IDF. The goal of this project is …


Deep Data Analysis On The Web, Xuanyu Liu Dec 2016

Deep Data Analysis On The Web, Xuanyu Liu

Master's Projects

Search engines are well known to people all over the world. People prefer to use keywords searching to open websites or retrieve information rather than type typical URLs. Therefore, collecting finite sequences of keywords that represent important concepts within a set of authors is important, in other words, we need knowledge mining. We use a simplicial concept method to speed up concept mining. Previous CS 298 project has studied this approach under Dr. Lin. This method is very fast, for example, to mine the concept, FP-growth takes 876 seconds from a database with 1257 columns 65k rows, simplicial complex only …


Handling Relationships In A Wiki System, Yashi Kamboj Dec 2016

Handling Relationships In A Wiki System, Yashi Kamboj

Master's Projects

Wiki software enables users to manage content on the web, and create or edit web pages freely. Most wiki systems support the creation of hyperlinks on pages and have a simple text syntax for page formatting. A common, more advanced feature is to allow pages to be grouped together as categories. Currently, wiki systems support categorization of pages in a very traditional way by specifying whether a wiki page belongs to a category or not. Categorization represents unary relationship and is not sufficient to represent n-ary relationships, those involving links between multiple wiki pages.

In this project, we extend Yioop, …


Predicting User's Future Requests Using Frequent Patterns, Marc Nipuna Dominic Savio Dec 2016

Predicting User's Future Requests Using Frequent Patterns, Marc Nipuna Dominic Savio

Master's Projects

In this research, we predict User's Future Request using Data Mining Algorithm. Usage of the World Wide Web has resulted in a huge amount of data and handling of this data is getting hard day by day. All this data is stored as Web Logs and each web log is stored in a different format with different Field names like search string, URL with its corresponding timestamp, User ID’s that helps for session identification, Status code, etc. Whenever a user requests for a URL there is a delay in getting the page requested and sometimes the request is denied. Our …


Web-Based Integrated Development Environment, Hien T. Vu Dec 2016

Web-Based Integrated Development Environment, Hien T. Vu

Master's Projects

As tablets become more powerful and more economical, students are attracted to them and are moving away from desktops and laptops. Their compact size and easy to use Graphical User Interface (GUI) reduce the learning and adoption barriers for new users. This also changes the environment in which undergraduate Computer Science students learn how to program. Popular Integrated Development Environments (IDE) such as Eclipse and NetBeans require disk space for local installations as well as an external compiler. These requirements cannot be met by current tablets and thus drive the need for a web-based IDE. There are also many other …


Analyzing Clustered Web Concepts With Homology, Eric Nam Jul 2016

Analyzing Clustered Web Concepts With Homology, Eric Nam

Master's Projects

As data is being mined more and more from the Internet today, Data Science has become an important field of computing to make that data useful. Data Science allows people to turn all of that data into structured knowledge that is easily utilized, validated, and understandable. There are many known theories to analyze data, but this project will focus on a recently introduced method: analyzing text data with homology from mathematics to understand relationships between keyword-sets.

Using structures of algebraic topology as a starting point, keyword-sets in the text are represented by simplexes based on what they are and what …


Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le Jun 2016

Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le

Master's Projects

This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high- complexity datasets using algebraic topology, namely simplicial complexes. Based on concept of isomorphism of relations, our method turn a relational table into a geometric object (a simplicial complex is a polyhedron). So, conceptually association rule searching is turned into a geometric traversal problem. By leveraging on the core concepts behind Simplicial Complex, we use a new technique (in computer science) that improves the performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …


Efficient Pair-Wise Similarity Computation Using Apache Spark, Parineetha Gandhi Tirumali May 2016

Efficient Pair-Wise Similarity Computation Using Apache Spark, Parineetha Gandhi Tirumali

Master's Projects

Entity matching is the process of identifying different manifestations of the same real world entity. These entities can be referred to as objects(string) or data instances. These entities are in turn split over several databases or clusters based on the signatures of the entities. When entity matching algorithms are performed on these databases or clusters, there is a high possibility that a particular entity pair is compared more than once. The number of comparison for any two entities depend on the number of common signatures or keys they possess. This effects the performance of any entity matching algorithm. This paper …


Hybrid Similarity Function For Big Data Entity Matching With R-Swoosh, Vimal Chandra Gorijala May 2016

Hybrid Similarity Function For Big Data Entity Matching With R-Swoosh, Vimal Chandra Gorijala

Master's Projects

Entity Matching (EM) is the problem of determining if two entities in a data set refer to the same real-world object. For example, it decides if two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and on the emerging Semantic Web. This project deals with the similarity functions and thresholds utilized in them to determine the similarity of the entities. The work contains two major parts: …


Library Writers Reward Project, Saravana Kumar Gajendran May 2016

Library Writers Reward Project, Saravana Kumar Gajendran

Master's Projects

Open-source library development exploits the distributed intelligence of participants in Internet communities. Nowadays, contribution to the open-source community is fading [16] (Stackalytics, 2016) as there is not much recognition for library writers. They can start exploring ways to generate revenue as they actively contribute to the open-source community.

This project helps library writers to generate revenue in the form of bitcoins for their contribution. Our solution to generate revenue for library writers is to integrate bitcoin mining with existing JavaScript libraries, such as jQuery. More use of the library leads to more revenue for the library writers. It uses the …


Processing Posting Lists Using Opencl, Radha Kotipalli May 2016

Processing Posting Lists Using Opencl, Radha Kotipalli

Master's Projects

One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphical Processing Unit (GPU) of a computer system for performance improvements.

Some of the critical functions in search engines are resource-intensive in terms of processing power, …


Concept Based Search Engine: Concept Creation, Aishwarya Rastogi Mar 2016

Concept Based Search Engine: Concept Creation, Aishwarya Rastogi

Master's Projects

Data on the internet is increasing exponentially every single second. There are billions and billions of documents on the World Wide Web (The Internet). Each document on the internet contains multiple concepts (an abstract or general idea inferred from specific instances).

In this paper, we show how we created and implemented an algorithm for extracting concepts from a set of documents. These concepts can be used by a search engine for generating search results to cater the needs of the user. The search result will then be more targeted than the usual keyword search.

The main problem was to extract …


Mining Concept In Big Data, Jingjing Yang May 2015

Mining Concept In Big Data, Jingjing Yang

Master's Projects

To fruitful using big data, data mining is necessary. There are two well-known methods, one is based on apriori principle, and the other one is based on FP-tree. In this project we explore a new approach that is based on simplicial complex, which is a combinatorial form of polyhedron used in algebraic topology. Our approach, similar to FP-tree, is top down, at the same time, it is based on apriori principle in geometric form, called closed condition in simplicial complex. Our method is almost 300 times faster than FP-growth on a real world database using a SJSU laptop. The database …


A Scalable Search Engine Aggregator, Pooja Mishra May 2015

A Scalable Search Engine Aggregator, Pooja Mishra

Master's Projects

The ability to display different media sources in an appropriate way is an integral part of search engines such as Google, Yahoo, and Bing, as well as social networking sites like Facebook, etc. This project explores and implements various media-updating features of the open source search engine Yioop [1]. These include news aggregation, video conversion and email distribution. An older, preexisting news update feature of Yioop was modified and scaled so that it can work on many machines. We redesigned and modified the user interface associated with a distributed news updater feature in Yioop. This project also introduced a video …


An Open Source Advertisement Server, Pushkar Umaranikar May 2015

An Open Source Advertisement Server, Pushkar Umaranikar

Master's Projects

This report describes a new online advertisement system and its implementation for the Yioop open source search engine. This system was implemented for my CS298 project. It supports both selling advertisements and displaying them within search results. The selling of advertisement is done using a novel auction system, which we describe in this paper. With this auction system, it is possible to create an advertisement, attach keywords to it, and add it to the advertisement inventory. An advertisement is displayed on a search results page if the search keyword matches the keywords attached to the advertisement. Display of advertisements is …


Index Strategies For Efficient And Effective Entity Search, Huy T. Vu May 2015

Index Strategies For Efficient And Effective Entity Search, Huy T. Vu

Master's Projects

The volume of structured data has rapidly grown in recent years, when data-entity emerged as an abstraction that captures almost every data pieces. As a result, searching for a desired piece of information on the web could be a challenge in term of time and relevancy because the number of matching entities could be very large for a given query. This project concerns with the efficiency and effectiveness of such entity queries. The work contains two major parts: implement inverted indexing strategies so that queries can be searched in minimal time, and rank results based on features that are independent …


Context-Based Autosuggest On Graph Data, Hai Nguyen May 2015

Context-Based Autosuggest On Graph Data, Hai Nguyen

Master's Projects

Autosuggest is an important feature in any search applications. Currently, most applications only suggest a single term based on how frequent that term appears in the indexed documents or how often it is searched upon. These approaches might not provide the most relevant suggestions because users often enter a series of related query terms to answer a question they have in mind. In this project, we implemented the Smart Solr Suggester plugin using a context-based approach that takes into account the relationships among search keywords. In particular, we used the keywords that the user has chosen so far in the …