Full-Text Articles in Artificial Intelligence and Robotics

Wikipedia Web Table Interpretation, Keyword-Based Search, And Ranking, Kartikee Dabir Jan 2023

Master's Projects

Information retrieval and data interpretation on the web, for the purpose of gaining knowledgeable insights, have been widely researched topics since the onset of the World Wide Web. Web tables are structured tabular data present amidst unstructured, heterogeneous data on the web. This makes web tables a rich source of information for a variety of tasks like data analysis, data interpretation, and information retrieval pertaining to extracting knowledge from information present on the web. Wikipedia tables, which are a subset of web tables, hold a huge amount of useful data, …


Graph Deep Learning Based Hashtag Recommender For Reels On Social Media, Sriya Balineni Jan 2023

Master's Projects

Many businesses, including Facebook, Netflix, and YouTube, rely heavily on a recommendation system. Recommendation systems are algorithms that attempt to provide consumers with relevant suggestions for items such as movies, videos, or reels (microvideos) to watch, hashtags for their posts, songs to listen to, and products to purchase. In many businesses, recommender systems are essential because they can generate enormous amounts of revenue and make the platform stand out when compared to others. Reels are a feature of the social media platforms that enable users to create and share videos of up to sixty seconds in length. Individuals, businesses, and …


Caption And Image Based Next-Word Auto-Completion, Meet Patel Jan 2022

Master's Projects

With the increasing number of options or choices in terms of entities like products, movies, songs, etc. which are now available to users, they try to save time by looking for an application or system that provides automatic recommendations. Recommender systems are automated computing processes that leverage concepts of Machine Learning, Data Mining and Artificial Intelligence towards generating product recommendations based on a user’s preferences. These systems have given a significant boost to businesses across multiple segments as a result of reduced human intervention. One similar aspect of this is content writing. It would save users a lot of time …


Improving User Experiences For Wiki Systems, Parth Patel Jan 2022

Master's Projects

Wiki systems are web applications that allow users to collaboratively manage content. Such systems enable users to read and write information in the form of web pages and share media items like videos, audio, books, etc. Yioop is an open-source web portal with the features of a search engine, a wiki system, and discussion groups. In this project I have enhanced Yioop’s features to improve the user experience. The preliminary work introduced new features like an emoji picker tool for the direct messaging system, a unit testing framework for automating the UI testing of Yioop, and redeeming advertisement credits back into real money. …


Cloud Provisioning And Management With Deep Reinforcement Learning, Alexandru Tol Jan 2022

Master's Projects

The first web applications appeared in the early 1990s. These applications were hosted entirely in-house by the companies that developed them. In the mid-2000s the concept of a digital cloud was introduced by the then CEO of Google, Eric Schmidt. Today most companies at least partially host their applications on proprietary servers hosted at data centers or commercial clouds like Amazon Web Services (AWS) or Heroku.

This arrangement seems like a straightforward win-win for both parties: the customer gets rid of the hassle of maintaining a live server for their applications and …


Whole File Chunk Based Deduplication Using Reinforcement Learning, Xincheng Yuan Jan 2022

Master's Projects

Deduplication is the process of removing replicated data content from storage facilities like online databases, cloud datastores, local file systems, etc. It is commonly performed as part of data preprocessing to eliminate redundant data that requires unnecessary storage space and computing power. Deduplication is especially important for file backup systems, since duplicated files consume more storage space, particularly with a short backup period such as daily [8]. A common technique in this field involves splitting files into chunks whose hashes can be compared using data structures or techniques like clustering. In this project we explore the possibility …
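
The chunk-and-hash technique mentioned in this abstract can be sketched as follows. This is a minimal illustration only, not the project's method: it assumes fixed-size chunks, SHA-256 digests, and a plain dictionary lookup in place of clustering or reinforcement learning. The function names `deduplicate` and `restore` and the tiny chunk size are hypothetical choices made for the example.

```python
import hashlib


def deduplicate(files, chunk_size=4):
    """Store each distinct chunk once and describe every file by its chunk hashes.

    Assumption: fixed-size chunking; the chunk size is tiny only for illustration.
    """
    store = {}    # digest -> chunk bytes, one copy per distinct chunk
    recipes = {}  # file name -> ordered list of chunk digests
    for name, data in files.items():
        recipe = []
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(name, store, recipes):
    """Rebuild a file by concatenating its chunks in recipe order."""
    return b"".join(store[digest] for digest in recipes[name])


files = {"a.txt": b"AAAABBBBCCCC", "b.txt": b"AAAABBBBDDDD"}
store, recipes = deduplicate(files)
print(len(store), "distinct chunks stored for", len(files), "files")
```

In this toy input the two files share their first two chunks, so six chunks in total collapse to four stored ones while both files remain fully restorable.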


An Open Source Direct Messaging And Enhanced Recommendation System For Yioop, Aniruddha Dinesh Mallya Dec 2021

Master's Projects

Recommendation systems and direct messaging systems are two popular components of web portals. A recommendation system is an information filtering system that seeks to predict the "rating" or "preference" a user would give to an item, and a direct messaging system allows private communication between users of any platform. Yioop is an open source PHP search engine and web portal that can be configured to allow users to create discussion groups, blogs, wikis, etc.

In this project, we expanded on Yioop’s group system so that every user now has a personal group. Personal groups were then used to add user …


Translating Natural Language Queries To SPARQL, Shreya Satish Bhajikhaye May 2021

Master's Projects

The Semantic Web is an extensive knowledge base that contains facts in the form of RDF triples. These facts are not easily accessible to the average user because to use them requires an understanding of ontologies and a query language like SPARQL. Question answering systems form a layer of abstraction on linked data to overcome these issues. These systems allow the user to input a question in a natural language and receive the equivalent SPARQL query. The user can then execute the query on the database to fetch the desired results. The standard techniques involved in translating natural language questions …


Improved Chinese Language Processing For An Open Source Search Engine, Xianghong Sun May 2020

Master's Projects

Natural Language Processing (NLP) is the process by which computers analyze human languages. NLP covers many areas, including speech recognition, natural language understanding, and natural language generation.

Information retrieval and natural language processing for Asian languages have their own unique set of challenges not present for Indo-European languages. Some of these are text segmentation, named entity recognition in unsegmented text, and part-of-speech tagging. In this report, we describe our implementation of and experiments with improving the Chinese language processing sub-component of an open source search engine, Yioop. In particular, we rewrote …


Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Master's Projects

Inadequate drug experimental data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in the pediatric population. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


A Hybrid Approach For Multi-Document Text Summarization, Rashmi Varma Dec 2019

Master's Projects

Text summarization has long been a studied topic in the field of natural language processing. There have been various approaches to both extractive and abstractive text summarization. Summarizing the text of a single document is a methodical task, but summarizing multiple documents poses a greater challenge. This thesis explores the application of Latent Semantic Analysis, Text-Rank, Lex-Rank and Reduction algorithms for single document text summarization and compares them with the proposed approach of creating a hybrid system combining each of the above algorithms, individually, with Restricted Boltzmann Machines for multi-document text summarization and analyzing how all …


Music Retrieval System Using Query-By-Humming, Parth Patel Dec 2019

Master's Projects

Music Information Retrieval (MIR) is a research area of great interest because there are various strategies to retrieve music. To retrieve music, it is important to find a similarity between the input query and the matching music. Several solutions have been proposed and are currently used in application domains such as Query-by-Example (QBE), which takes a sample of an audio recording playing in the background and retrieves the result. However, there is no efficient approach to solve this problem in a Query-by-Humming (QBH) application. In a Query-by-Humming application, the aim is to retrieve music that is …


Influence Analysis Based On Political Twitter Data, Jace Rose May 2019

Master's Projects

Studies of online behavior often consider how users interact online, their posting behaviors, what they are tweeting about, and how likely they are to follow other people. The problem is that there is no deeper study of the people that a user has interacted with and how these other users affect them. This study examines whether it is possible to draw similar sentiment from users with whom the target user has interacted. The data collection process gathers data from Twitter users posting to popular political hashtags, the most popular of which at the time of publication were #MAGA and #TRUMP, as well …


Topic Classification Using Hybrid Of Unsupervised And Supervised Learning, Jayant Shelke May 2019

Master's Projects

There has been research around the idea of representing words in text as vectors, and many models have been proposed that vary in performance as well as applications. Text processing is used for content recommendation, sentiment analysis, plagiarism detection, content creation, and language translation, to name a few. Specifically, we want to look at the problem of topic detection in the text content of articles, blogs, and summaries. With the humongous amount of text content published every minute on the internet, it is imperative that we have very good algorithms and approaches to analyze all the content and be able to classify most of …


Image Retrieval Using Image Captioning, Nivetha Vijayaraju May 2019

Master's Projects

The rapid growth in the availability of the Internet and smartphones has resulted in an increase in the usage of social media in recent years. This increased usage has in turn resulted in exponential growth in the number of digital images available. Therefore, image retrieval systems play a major role in fetching images relevant to the query provided by the users. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies like deep learning and image captioning. This report aims at understanding the purpose of image retrieval and various research held in …


An Ensemble Model For Click Through Rate Prediction, Muthaiah Ramanathan May 2019

Master's Projects

The Internet has become the most prominent and accessible way to spread the news about an event or to pitch, advertise, and sell a product globally. The success of any advertisement campaign lies in reaching the right class of target audience and eventually converting them into potential customers in the future. Search engines like Google, Yahoo, and Bing are among those most used by businesses to market their products. Apart from these, certain high-traffic websites like www.alibaba.com also offer services for B2B customers to set up their advertisement campaigns. The look of the advertisement, …


Sentiment Analysis For Search Engine, Saravana Gunaseelan May 2019

Master's Projects

The chief purpose of this study is to detect and eliminate the sentiment bias in a search engine. Sentiment bias means a bias induced in the search results based on the sentiment of the user’s search query. As people increasingly depend on search engines for information, it is important to understand the quality of results produced by the search engines. This study does not try to build a search engine but leverages existing search engines to provide better results to the user. In this study, only the queries that have high sentiment polarity are analyzed and the machine learning …


Predictive Analysis For Cloud Infrastructure Metrics, Paridhi Agrawal May 2019

Master's Projects

In a cloud computing environment, enterprises have the flexibility to request resources according to their application demands. This elastic feature of cloud computing makes it an attractive option for enterprises to host their applications on the cloud. Cloud providers usually exploit this elasticity by auto-scaling the application resources for quality assurance. However, there is a setup-time delay that may take minutes between the demand for a new resource and it being prepared for utilization. This causes the static resource provisioning techniques, which request allocation of a new resource only when the application breaches a specific threshold, to be slow and …
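
The effect of the setup-time delay on threshold-based provisioning can be illustrated with a toy simulation. This is a hedged sketch, not the project's model: the function name `simulate_threshold_scaling`, the per-server capacity, the 80% threshold, and the load values are all assumptions chosen for the example. A new server is requested only after utilization breaches the threshold, and it becomes usable only after `setup_delay` time steps.

```python
def simulate_threshold_scaling(load, capacity_per_server=100,
                               threshold=0.8, setup_delay=2):
    """Reactive scaling: request one server only after the threshold is breached.

    Returns (final server count, number of steps where demand exceeded capacity).
    All parameter values are illustrative assumptions.
    """
    servers = 1
    pending = []          # time steps at which requested servers become ready
    overloaded_steps = 0
    for t, demand in enumerate(load):
        # Servers requested earlier come online once their setup time has passed.
        servers += sum(1 for ready_at in pending if ready_at <= t)
        pending = [ready_at for ready_at in pending if ready_at > t]

        utilization = demand / (servers * capacity_per_server)
        if utilization > 1.0:
            overloaded_steps += 1
        # Request a new server only on a breach, and only one at a time.
        if utilization > threshold and not pending:
            pending.append(t + setup_delay)
    return servers, overloaded_steps


load = [50, 90, 120, 130, 130, 130]
print(simulate_threshold_scaling(load, setup_delay=2))
print(simulate_threshold_scaling(load, setup_delay=1))
```

With this load, the longer delay leaves one step where demand exceeds the single available server, while the shorter delay brings the second server online in time.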


Question Type Recognition Using Natural Language Input, Aishwarya Soni Jun 2017

Master's Projects

Recently, numerous specialists have been concentrating on the use of Natural Language Processing (NLP) systems in various domains, for example, data extraction and content mining. One of the difficulties with these innovations is building a precise Question Answering (QA) system. Question type recognition is the most significant task in a QA system such as a chatbot. Organizations such as the National Institute of Standards and Technology (NIST) host a conference series called the Text REtrieval Conference (TREC), which holds a competition every year to encourage and improve techniques for information retrieval from a large corpus of text. When a user …


Improving Text Classification With Word Embedding, Lihao Ge Jun 2017

Master's Projects

One challenge in text classification is that it is hard to perform feature reduction based on the meaning of the features. An improper feature reduction may even worsen the classification accuracy. Word2Vec, a word embedding method, has recently been gaining popularity due to its high precision in analyzing the semantic similarity between words at relatively low computational cost. However, only a limited number of researchers have focused on feature reduction using Word2Vec. In this project, we developed a Word2Vec based method to reduce the feature size while increasing the classification accuracy. The feature reduction is achieved by loosely …


An Open Source Discussion Group Recommendation System, Sarika Padmashali May 2017

Master's Projects

A recommendation system analyzes user behavior on a website to make suggestions about what a user should do in the future on the website. It basically tries to predict the “rating” or “preference” a user would have for an action. Yioop is an open source search engine, wiki system, and user discussion group system managed by Dr. Christopher Pollett at SJSU. In this project, we have developed a recommendation system for Yioop where users are given suggestions about the threads and groups they could join based on their user history. We have used collaborative filtering techniques to make recommendations and …


Document Classification Using Machine Learning, Ankit Basarkar May 2017

Master's Projects

To perform document classification algorithmically, documents need to be represented in a form that is understandable to the machine learning classifier. The report discusses the different types of feature vectors through which documents can be represented and later classified. The project aims at comparing the Binary, Count, and TfIdf feature vectors and their impact on document classification. To test how well each of the three feature vectors performs, we used the 20-newsgroup dataset and converted the documents to all three feature vectors. For each feature vector representation, we trained the Naïve Bayes classifier and then tested the generated classifier …
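
The three feature vector types compared in this abstract can be sketched on a toy corpus. This is an illustrative sketch under stated assumptions, not the project's code: tokenization is plain whitespace splitting, and the TfIdf weight is taken as raw term count times log(N / document frequency), which is only one of several common variants. All function names are hypothetical, and the Naïve Bayes training step is left out.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of distinct lowercase tokens across all documents."""
    return sorted({token for doc in docs for token in doc.lower().split()})


def count_vector(doc, vocab):
    """Count vector: how many times each vocabulary term occurs in the document."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocab]


def binary_vector(doc, vocab):
    """Binary vector: 1 if the term occurs in the document, else 0."""
    return [1 if count > 0 else 0 for count in count_vector(doc, vocab)]


def tfidf_vectors(docs, vocab):
    """TfIdf vectors, assuming weight = count * log(N / document frequency)."""
    n_docs = len(docs)
    counts = [count_vector(doc, vocab) for doc in docs]
    # Every vocabulary term occurs in at least one document, so df is never 0.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    return [[count * weight for count, weight in zip(row, idf)] for row in counts]


docs = ["the cat sat", "the dog sat sat"]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vector(docs[1], vocab))
print(binary_vector(docs[1], vocab))
print(tfidf_vectors(docs, vocab)[0])
```

Terms that appear in every document, such as "the" and "sat" here, receive a TfIdf weight of zero under this variant, which is the main way it differs from the Binary and Count representations.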


A Chatbot Framework For Yioop, Harika Nukala May 2017

Master's Projects

Over the past few years, messaging applications have become more popular than social networking sites. Instead of requiring a specific application or website to access some service, chatbots are created on messaging platforms to allow users to interact with companies’ products and to give assistance as needed. In this project, we designed and implemented a chatbot framework for Yioop. The goal of the Chatbot Framework for Yioop project is to provide a platform for developers in Yioop to build and deploy chatbot applications. A chatbot is a web service that can converse with users using artificial intelligence in messaging platforms. …


Named Entity Recognition And Classification For Natural Language Inputs At Scale, Shreeraj Dabholkar May 2017

Master's Projects

Natural language processing (NLP) is a technique by which computers can analyze, understand, and derive meaning from human language. Phrases in a body of natural text that represent names, such as those of persons, organizations, or locations, are referred to as named entities. Identifying and categorizing these named entities is still a challenging task, on which research has been carried out for many years. In this project, we build a supervised learning based classifier which can perform named entity recognition and classification (NERC) on input text and implement it as part of a chatbot application. The implementation is then scaled …


Headline Generation Using Deep Neural Networks, Dhruven Vora May 2017

Master's Projects

News headline generation is one of the important text summarization tasks. Human-generated news headlines are generally intended to catch the eye rather than provide useful information. There have been many approaches to generating meaningful headlines, using either neural networks or linguistic features. In this report, we propose a novel approach that integrates Hedge Trimmer, a grammar-based extractive summarization system, with a deep neural network abstractive summarization system to generate meaningful headlines. We compare the results against a current recurrent neural network based headline generation system.


Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le Jun 2016

Master's Projects

This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching is thus turned into a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique that is new to computer science, improves performance over existing methods, and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …