Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 83

Full-Text Articles in Physical Sciences and Mathematics

NoSQL Databases In Kubernetes, Parth Sandip Mehta Jan 2023

Master's Projects

With the increasing popularity of deploying applications in containers, Kubernetes (K8s) has become one of the most accepted container orchestration systems. Kubernetes helps maintain containers smoothly and simplifies DevOps with powerful automations. It was originally developed as a tool to manage stateless microservices that run seamlessly in containers. The ephemeral nature of pods, the smallest deployable unit, in Kubernetes was well-aligned with stateless applications since destroying and recreating pods didn’t impact applications. There was a need to provision solutions around stateful workloads like databases so as to take advantage of K8s. This project explores this need, the challenges associated and …


High Performance Distributed File System Based On Blockchain, Ajinkya Rajguru Jan 2023

Master's Projects

Distributed filesystem architectures use commodity hardware to store data on a large scale with maximum consistency and availability. Blockchain makes it possible to store information that can never be tampered with and incentivizes a traditional decentralized storage system. This project aimed to implement a decentralized filesystem that leverages the blockchain to keep a record of all the transactions on it. A conventional filesystem such as GFS [1] or HDFS [2] uses designated servers owned by its organization to store the data and is governed by a master service. This project aimed at removing a single point of failure and makes use …


Robust Cache System For Web Search Engine Yioop, Rushikesh Padia Jan 2023

Master's Projects

Caches are the most effective mechanism utilized by web search engines to optimize the performance of search queries. Search engines employ caching at multiple levels to improve their performance, for example, caching posting lists and caching result sets. Caching query results reduces the overhead of processing frequent queries and thus saves a lot of time and computing power. Yioop is an open-source web search engine which utilizes a result cache to optimize searches. The current implementation utilizes a single dynamic cache based on Marker’s algorithm. The goal of the project is to improve the performance of the cache in Yioop. To choose a …
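
For readers unfamiliar with the cache policy named in this abstract, the following is a minimal sketch of the textbook randomized marking (Marker) algorithm. It is not Yioop's code: the class name `MarkerCache`, its methods, and the optional `rng` parameter are hypothetical and chosen only for illustration.

```python
import random


class MarkerCache:
    """Illustrative result cache using the randomized marking algorithm.

    Hypothetical sketch; not taken from Yioop's implementation.
    """

    def __init__(self, capacity, rng=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.data = {}        # key -> cached value
        self.marked = set()   # keys touched during the current phase
        self.rng = rng or random.Random()

    def get(self, key):
        """Return the cached value and mark the key, or None on a miss."""
        if key in self.data:
            self.marked.add(key)
            return self.data[key]
        return None

    def put(self, key, value):
        """Insert a value, evicting a random unmarked key if the cache is full."""
        if key not in self.data and len(self.data) >= self.capacity:
            unmarked = [k for k in self.data if k not in self.marked]
            if not unmarked:
                # Every entry is marked: start a new phase by clearing all marks.
                self.marked.clear()
                unmarked = list(self.data)
            victim = self.rng.choice(unmarked)
            del self.data[victim]
        self.data[key] = value
        self.marked.add(key)
```

Entries used in the current phase are protected from eviction; only when every entry is marked does a new phase begin and all entries become eviction candidates again.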


Caption And Image Based Next-Word Auto-Completion, Meet Patel Jan 2022

Master's Projects

With the increasing number of options or choices in terms of entities like products, movies, songs, etc. which are now available to users, they try to save time by looking for an application or system that provides automatic recommendations. Recommender systems are automated computing processes that leverage concepts of Machine Learning, Data Mining and Artificial Intelligence towards generating product recommendations based on a user’s preferences. These systems have given a significant boost to businesses across multiple segments as a result of reduced human intervention. One similar aspect of this is content writing. It would save users a lot of time …


Improving User Experiences For Wiki Systems, Parth Patel Jan 2022

Master's Projects

Wiki systems are web applications that allow users to collaboratively manage content. Such systems enable users to read and write information in the form of web pages and share media items like videos, audio, books, etc. Yioop is an open-source web portal with features of a search engine, a wiki system and discussion groups. In this project I have enhanced Yioop’s features to improve the user experience. The preliminary work introduced new features such as an emoji picker tool for the direct messaging system, a unit testing framework for automating the UI testing of Yioop, and the ability to redeem advertisement credits back into real money. …


Cloud Provisioning And Management With Deep Reinforcement Learning, Alexandru Tol Jan 2022

Master's Projects

The first web applications appeared in the early 1990s. These applications were hosted entirely in house by the companies that developed them. In the mid-2000s, the concept of a digital cloud was introduced by Eric Schmidt, then CEO of Google. Today, most companies at least partially host their applications on proprietary servers hosted at data centers or on commercial clouds like Amazon Web Services (AWS) or Heroku.

This arrangement seems like a straightforward win-win for both parties: the customer gets rid of the hassle of maintaining a live server for their applications and …


Whole File Chunk Based Deduplication Using Reinforcement Learning, Xincheng Yuan Jan 2022

Master's Projects

Deduplication is the process of removing replicated data content from storage facilities like online databases, cloud datastores, local file systems, etc. It is commonly performed as part of data preprocessing to eliminate redundant data that consumes unnecessary storage space and computing power. Deduplication is particularly important for file backup systems, since duplicated files will presumably consume more storage space, especially with a short backup period such as daily [8]. A common technique in this field involves splitting files into chunks whose hashes can be compared using data structures or techniques like clustering. In this project we explore the possibility …
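
The chunk-and-hash technique mentioned in this abstract can be sketched as follows. This is a simplified illustration assuming fixed-size chunks, SHA-256 digests, and exact hash matching in a dictionary; it does not reproduce the project's reinforcement learning approach, and the function names `chunk_bytes`, `deduplicate`, and `restore` are hypothetical.

```python
import hashlib


def chunk_bytes(data, chunk_size=4096):
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def deduplicate(files, chunk_size=4096):
    """Store each distinct chunk once and record a per-file list of chunk hashes.

    `files` maps a file name to its bytes (an assumed in-memory stand-in
    for real storage).
    """
    store = {}    # chunk hash -> chunk bytes, one copy per distinct chunk
    recipes = {}  # file name -> ordered list of chunk hashes
    for name, data in files.items():
        hashes = []
        for chunk in chunk_bytes(data, chunk_size):
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            hashes.append(digest)
        recipes[name] = hashes
    return store, recipes


def restore(name, store, recipes):
    """Rebuild a file by concatenating its chunks in recorded order."""
    return b"".join(store[h] for h in recipes[name])
```

With two files that share a 4096-byte block, the shared block is stored once and both files can still be rebuilt from their hash lists.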


Benchmarking NewSQL Database VoltDB, Kevin Schumacher Jan 2022

Master's Projects

NewSQL is a type of relational database that is able to horizontally scale while retaining linearizable consistency. This is an improvement over a traditional SQL relational database because SQL databases cannot effectively scale across multiple machines. This is also an improvement over NoSQL databases because NewSQL databases are designed from the ground up to be consistent and have ACID guarantees. However, it should be noted that NewSQL databases are not one-size-fits-all: each specific database is designed to perform well on specific workloads. This project will evaluate a NewSQL database, VoltDB, with a focus …


An Open Source Direct Messaging And Enhanced Recommendation System For Yioop, Aniruddha Dinesh Mallya Dec 2021

Master's Projects

Recommendation systems and direct messaging systems are two popular components of web portals. A recommendation system is an information filtering system that seeks to predict the "rating" or "preference" a user would give to an item, and a direct messaging system allows private communication between users of any platform. Yioop is an open source PHP search engine and web portal that can be configured to allow users to create discussion groups, blogs, wikis, etc.

In this project, we expanded on Yioop’s group system so that every user now has a personal group. Personal groups were then used to add user …


High Performance Document Store Implementation In Rust, Ishaan Aggarwal Dec 2021

Master's Projects

Databases are a core part of any application which requires persistence of data. The performance of applications involving the use of database systems is directly proportional to how fast their database read-write operations are. The aim of this project was to build a high-performance document store which can support a variety of applications that require data storage and retrieval of some kind. This document store can be used as an independently running backend service which can be utilized by search engines, applications which deal with keeping records, etc. We used Rust to make this document store which is fast, robust, …


Node.js Based Document Store For Web Crawling, David Bui Dec 2021

Master's Projects

WARC files are central to internet preservation projects. They contain the raw resources of web crawled data and can be used to create windows into the past of web pages at the time they were accessed. Yet there are few tools that manipulate WARC files outside of basic parsing. The creation of our tool WARC-KIT gives users in the Node.js JavaScript environment a tool kit to interact with and manipulate WARC files.

Included with WARC-KIT is a WARC parsing tool known as WARCFilter that can be used as a standalone tool to parse, filter, and create new WARC files. WARCFilter can also, …


Mapping E-Commerce Locally And Beyond: CITT K12 Special Investigation Project, Thomas O’Brien, Deanna Matsumoto Nov 2021

Mineta Transportation Institute Publications

As all aspects of the American workplace become automated or digitally enhanced to some degree, K12 educators have an increasing responsibility to help their students acquire the technical skills necessary to organize and interpret information. Increasingly, this is done through Geographic Information Systems (GIS), especially in careers related to transportation and logistics. The Center for International Trade & Transportation (CITT) at CSU Long Beach has developed this K12 Special Investigation Project to introduce ArcGIS StoryMaps, an engaging, accessible and sophisticated web-based GIS application. The lessons center on e-commerce and its accompanying environmental and economic impact. Still, the activities can be …


Using Oracle To Solve ZooKeeper On Two-Replica Problems, Ching-Chan Lee May 2021

Master's Projects

The project introduces an Oracle, a failure detector, in Apache ZooKeeper and makes it fault-tolerant in a two-node system. The project demonstrates that the Oracle authorizes the primary process to maintain liveness when the majority rule becomes an obstacle to continuing Apache ZooKeeper service. In addition to the properties of accuracy and completeness from Chandra et al.’s research, the project proposes the property of see to avoid losing transactions and the property of mutual exclusion to avoid split-brain issues. The hybrid properties provide not only greater flexibility in the implementation but also stronger guarantees on safety. Thus, the Oracle …


Translating Natural Language Queries To SPARQL, Shreya Satish Bhajikhaye May 2021

Master's Projects

The Semantic Web is an extensive knowledge base that contains facts in the form of RDF triples. These facts are not easily accessible to the average user because to use them requires an understanding of ontologies and a query language like SPARQL. Question answering systems form a layer of abstraction on linked data to overcome these issues. These systems allow the user to input a question in a natural language and receive the equivalent SPARQL query. The user can then execute the query on the database to fetch the desired results. The standard techniques involved in translating natural language questions …


Hybrid Cloud Workload Monitoring As A Service, Shreya Kundu Feb 2021

Master's Projects

Cloud computing and cloud-based hosting have become embedded in our daily lives. It is imperative for cloud providers to make sure all services used by both enterprises and consumers have high availability and elasticity to prevent any downtime, which negatively impacts any business. To ensure cloud infrastructures are working reliably, cloud monitoring becomes an essential need for both the provider and the consumer. This thesis project reports on the need for efficient, scalable monitoring, enumerating the necessary types of metrics of interest to be collected. Current understanding of various architectures designed to collect, store and process monitoring data …


Cat Tracks – Tracking Wildlife Through Crowdsourcing Using Firebase, Tracy Ho Dec 2020

Master's Projects

Many mountain lions become roadkill in the state of California every year. To reduce these numbers, it is important that a system be built to track where these mountain lions have been. One such system could be built using the platform-as-a-service Firebase. Firebase is a platform service that collects and manages data that comes in through a mobile application. For the development of cross-platform mobile applications, Flutter is used as a toolkit for developers for both iOS and Android. This entire system, Cat Tracks, is proposed as a crowdsourcing platform to track wildlife, with the current focus …


Improved Chinese Language Processing For An Open Source Search Engine, Xianghong Sun May 2020

Master's Projects

Natural Language Processing (NLP) is the process of computers analyzing human languages. NLP spans many areas, including speech recognition, natural language understanding, and natural language generation.

Information retrieval and natural language processing for Asian languages have their own unique set of challenges not present for Indo-European languages. Some of these are text segmentation, named entity recognition in unsegmented text, and part of speech tagging. In this report, we describe our implementation of and experiments with improving the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote …


Benchmarking MongoDB Multi-Document Transactions In A Sharded Cluster, Tushar Panpaliya May 2020

Master's Projects

Relational databases like Oracle, MySQL, and Microsoft SQL Server offer transaction processing as an integral part of their design. These databases have been a primary choice among developers for business-critical workloads that need the highest form of consistency. On the other hand, the distributed nature of NoSQL databases makes them suitable for scenarios needing scalability, faster data access, and flexible schema design. Recent developments in the NoSQL database community show that NoSQL databases have started to incorporate transactions in their drivers to let users work on business-critical scenarios without compromising the power of distributed NoSQL features [1].

MongoDB is …


Improved User News Feed Customization For An Open Source Search Engine, Timothy Chow May 2020

Master's Projects

Yioop is an open source search engine project hosted on the site of the same name. It offers several features outside of searching, with one such feature being a news feed. The current news feed system aggregates articles from a curated list of news sites determined by the owner. However, in its current state, the feed list is limited in size, constrained by the hardware that the aggregator is run on. The goal of my project was to overcome this limit by improving the current storage method used. The solution was derived by making use of IndexArchiveBundles and IndexShards, both of …


Developing A MongoDB Monitoring System Using NoSQL Databases For Monitored Data Management, Anjitha Karattu Thodi May 2020

Master's Projects

MongoDB is a NoSQL database, specifically used to efficiently store and access a large quantity of unstructured data over a distributed cluster of nodes. As the number of nodes in the cluster increases, it becomes difficult to manually monitor different components of the database. This poses an interesting problem of monitoring the MongoDB database to view the state of the system at any point. Although a few proprietary monitoring tools exist to monitor MongoDB clusters, they are not freely available for use in academia. Therefore, the focus of this project is to create a monitoring system that is completely built …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Master's Projects

Inadequate drug experimental data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric populations. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Music Retrieval System Using Query-By-Humming, Parth Patel Dec 2019

Master's Projects

Music Information Retrieval (MIR) is a research area of great interest because there are various strategies to retrieve music. To retrieve music, it is important to find a similarity between the input query and the matching music. Several solutions have been proposed that are currently being used in the application domain(s), such as Query-by-Example (QBE), which takes a sample of an audio recording playing in the background and retrieves the result. However, there is no efficient approach to solve this problem in a Query-by-Humming (QBH) application. In a Query-by-Humming application, the aim is to retrieve music that is …


A Hybrid Approach For Multi-Document Text Summarization, Rashmi Varma Dec 2019

Master's Projects

Text summarization has been a long-studied topic in the field of natural language processing. There have been various approaches for both extractive and abstractive text summarization. Summarizing text for a single document is a methodical task, but summarizing multiple documents poses a greater challenge. This thesis explores the application of Latent Semantic Analysis, Text-Rank, Lex-Rank and Reduction algorithms for single document text summarization and compares them with the proposed approach of creating a hybrid system combining each of the above algorithms, individually, with Restricted Boltzmann Machines for multi-document text summarization and analyzing how all …


Influence Analysis Based On Political Twitter Data, Jace Rose May 2019

Master's Projects

Studies of online behavior often consider how users interact online, their posting behaviors, what they are tweeting about, and how likely they are to follow other people. The problem is that there is no deeper study of the people that a user has interacted with and how these other users affect them. This study examines whether it is possible to draw similar sentiment from users with whom the target user has interacted. The data collection process gathers data from Twitter users posting to popular political hashtags, the most popular of which at the time were #MAGA and #TRUMP, as well …


Schema Migration From Relational Databases To NoSQL Databases With Graph Transformation And Selective Denormalization, Krishna Chaitanya Mullapudi May 2019

Master's Projects

We witnessed a dramatic increase in the volume, variety and velocity of data leading to the era of big data. The structure of data has become highly flexible leading to the development of many storage systems that are different from the traditional structured relational databases where data is stored in “tables,” with columns representing the lowest granularity of data. Although relational databases are still predominant in the industry, there has been a major drift towards alternative database systems that support unstructured data with better scalability leading to the popularity of “Not Only SQL.”

Migration from relational databases to NoSQL databases …


Image Retrieval Using Image Captioning, Nivetha Vijayaraju May 2019

Master's Projects

The rapid growth in the availability of the Internet and smartphones has resulted in increased usage of social media in recent years. This increased usage has in turn resulted in exponential growth in the number of digital images available. Therefore, image retrieval systems play a major role in fetching images relevant to the query provided by the users. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies like deep learning and image captioning. This report aims at understanding the purpose of image retrieval and various research held in …


Sentiment Analysis For Search Engine, Saravana Gunaseelan May 2019

Master's Projects

The chief purpose of this study is to detect and eliminate sentiment bias in a search engine. Sentiment bias means a bias induced in the search results based on the sentiment of the user’s search query. As people increasingly depend on search engines for information, it is important to understand the quality of results produced by the search engines. This study does not try to build a search engine but leverages existing search engines to provide better results to the user. In this study, only the queries that have high sentiment polarity are analyzed and the machine learning …


An Ensemble Model For Click Through Rate Prediction, Muthaiah Ramanathan May 2019

Master's Projects

The Internet has become the most prominent and accessible way to spread the news about an event or to pitch, advertise and sell a product globally. The success of any advertisement campaign lies in reaching the right class of target audience and eventually converting them into potential customers in the future. Search engines like Google, Yahoo, and Bing are a few of those most used by businesses to market their products. Apart from this, certain high-traffic websites like www.alibaba.com also offer services for B2B customers to set up their advertisement campaigns. The look of the advertisement, …


Topic Classification Using Hybrid Of Unsupervised And Supervised Learning, Jayant Shelke May 2019

Master's Projects

There has been research around the idea of representing words in text as vectors, and many models have been proposed that vary in performance as well as applications. Text processing is used for content recommendation, sentiment analysis, plagiarism detection, content creation, and language translation, to name a few. Specifically, we want to look at the problem of topic detection in the text content of articles, blogs, and summaries. With the enormous amount of text content published every minute on the internet, it is imperative that we have very good algorithms and approaches to analyze all the content and be able to classify most of …


Benchmarking Scalability Of NoSQL Databases For Geospatial Queries, Yuvraj Singh Kanwar May 2019

Master's Projects

NoSQL databases provide an edge when it comes to dealing with big unstructured data. Flexibility, agility, and scalability offered by NoSQL databases become increasingly essential when dealing with geospatial data. The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of data that the data stores must manage. Such characteristics of big spatial data surpassed the capability and anticipated use cases of relational databases. Because we can choose from an extensive collection of NoSQL databases these days, it becomes vital for organizations to make an informed decision. NoSQL Database benchmarks provide system architects, who shoulder a considerable …