Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 211 - 240 of 267

Full-Text Articles in Physical Sciences and Mathematics

Question Type Recognition Using Natural Language Input, Aishwarya Soni Jun 2017

Master's Projects

Recently, many researchers have focused on applying Natural Language Processing (NLP) techniques in various domains, for example, data extraction and content mining. One of the difficulties with these technologies is building an accurate Question Answering (QA) system. Question type recognition is the most significant task in a QA system such as a chatbot. Organizations such as the National Institute of Standards and Technology (NIST) host the Text REtrieval Conference (TREC) series, which holds a competition every year to encourage and improve techniques for information retrieval from a large corpus of text. When a user …


Improving Text Classification With Word Embedding, Lihao Ge Jun 2017

Master's Projects

One challenge in text classification is that it is hard to perform feature reduction based on the meaning of the features. An improper feature reduction may even worsen the classification accuracy. Word2Vec, a word embedding method, has recently been gaining popularity due to its high precision in analyzing the semantic similarity between words at relatively low computational cost. However, only a limited number of researchers have focused on feature reduction using Word2Vec. In this project, we developed a Word2Vec-based method to reduce the feature size while increasing the classification accuracy. The feature reduction is achieved by loosely …


Housing Price Prediction Using Support Vector Regression, Jiao Yang Wu May 2017

Master's Projects

The relationship between house prices and the economy is an important motivating factor for predicting house prices. Housing price trends are not only the concern of buyers and sellers; they also indicate the current economic situation. Therefore, it is important to predict housing prices without bias to help both buyers and sellers make their decisions. This project uses an open source dataset, which includes 20 explanatory features and 21,613 entries of housing sales in King County, USA. We compare different feature selection methods and feature extraction algorithms with Support Vector Regression (SVR) to predict the house prices in …


Path-Finding Methodology For Visually-Impaired Patients Based On Image-Processing, Abhilash Goyal May 2017

Master's Projects

The objective of this project is to propose and develop a path-finding methodology for visually impaired patients. The proposed novel methodology is based on image processing and is targeted at patients who are not completely blind. The major problem faced by visually impaired patients is walking independently, mainly because these patients cannot see obstacles in front of them due to the degradation of their eyesight. This degradation occurs mainly because either light does not focus on the retina properly or the photoreceptor cells on the retina malfunction, …


Predicting Pancreatic Cancer Using Support Vector Machine, Akshay Bodkhe May 2017

Master's Projects

This report presents an approach to predicting pancreatic cancer using the Support Vector Machine classification algorithm. The research objective of this project is to predict pancreatic cancer using only genomic data, only clinical data, and a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. In our results, we observed that …


An Open Source Discussion Group Recommendation System, Sarika Padmashali May 2017

Master's Projects

A recommendation system analyzes user behavior on a website to make suggestions about what a user should do in the future on the website. It basically tries to predict the “rating” or “preference” a user would have for an action. Yioop is an open source search engine, wiki system, and user discussion group system managed by Dr. Christopher Pollett at SJSU. In this project, we have developed a recommendation system for Yioop where users are given suggestions about the threads and groups they could join based on their user history. We have used collaborative filtering techniques to make recommendations and …


Neural Net Stock Trend Predictor, Sonal Kabra May 2017

Master's Projects

This report analyzes new and existing stock market prediction techniques. Traditional technical analysis was combined with various machine-learning approaches such as artificial neural networks, k-nearest neighbors, and decision trees. The experiments we conducted show that technical analysis together with machine learning can be used to profitably direct an investor’s trading decisions. We measure the profitability of the experiments by calculating the percentage weekly return for each stock under study. Our algorithms and simulations are developed using Python. The technical analysis methodology combined with machine learning algorithms shows promising results, which we discuss in this report.


AI For Classic Video Games Using Reinforcement Learning, Shivika Sodhi May 2017

Master's Projects

Deep reinforcement learning is a technique for teaching machines tasks through trial-and-error experience, in the way humans learn. In this paper, we carry out some preliminary research to understand how reinforcement learning and deep learning techniques can be combined to train an agent to play Archon, a classic video game. We compare two methods of estimating a Q function, the function used to compute the best action to take at each point in the game. In the first approach, we used a Q table to store the states and weights of the corresponding actions. In our experiments, this …
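
The Q-table approach mentioned above can be illustrated with a minimal sketch of the textbook tabular Q-learning update. This is not taken from the project: the state and action names, the learning rate `alpha`, and the discount factor `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update to the table q and return the new value.

    q maps (state, action) pairs to estimated values; missing pairs default to 0.0.
    alpha and gamma are illustrative defaults, not values from the project.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward the observed reward plus discounted future value.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


def best_action(q, state, actions):
    """Return the action with the highest stored value for the given state."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: two actions and string-labelled states.
q_table = defaultdict(float)
moves = ["left", "right"]
q_update(q_table, "s0", "right", reward=1.0, next_state="s1", actions=moves)
```

A table keyed by (state, action) pairs only works when the number of distinct states is small enough to enumerate, which is the usual motivation for replacing the table with a learned function approximator.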


Document Classification Using Machine Learning, Ankit Basarkar May 2017

Master's Projects

To perform document classification algorithmically, documents need to be represented in a form that is understandable to the machine learning classifier. The report discusses the different types of feature vectors through which documents can be represented and later classified. The project aims at comparing the Binary, Count and TfIdf feature vectors and their impact on document classification. To test how well each of the three feature vectors performs, we used the 20-newsgroup dataset and converted the documents to all three feature vectors. For each feature vector representation, we trained the Naïve Bayes classifier and then tested the generated classifier …
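
As a rough illustration of how the three representations named above differ, the sketch below builds Binary, Count and TfIdf vectors for a toy corpus. The whitespace tokenizer, the function name `build_vectors`, and the unsmoothed weighting tf × log(N / df) are assumptions made for this example; the project may well use a different TfIdf variant.

```python
import math
from collections import Counter


def build_vectors(docs):
    """Return (vocab, binary, count, tfidf) for a list of document strings.

    Tokenization is a plain lowercase whitespace split, and TfIdf is the
    unsmoothed tf * log(N / df) variant; both are illustrative assumptions.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    binary, count, tfidf = [], [], []
    for toks in tokenized:
        tf = Counter(toks)
        binary.append([1 if tf[t] else 0 for t in vocab])   # term present or not
        count.append([tf[t] for t in vocab])                # raw term frequency
        tfidf.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, binary, count, tfidf


vocab, binary, count, tfidf = build_vectors(["spam spam ham", "ham eggs"])
```

In this toy corpus the term "ham" occurs in every document, so its TfIdf weight is zero even though its Binary and Count entries are not, which is the kind of difference such a comparison is meant to expose.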


Credit Scoring Using Logistic Regression, Ansen Mathew May 2017

Master's Projects

This report presents an approach to predict the credit scores of customers using the Logistic Regression machine learning algorithm. The research objective of this project is to perform a comparative study between feature selection and feature extraction, against the same dataset using the Logistic Regression machine learning algorithm. For feature selection, we have used Stepwise Logistic Regression. For feature extraction, we have used Singular Value Decomposition (SVD) and Weighted Singular Value Decomposition (SVD). In order to test the accuracy obtained using feature selection and feature extraction, we used a public credit dataset having 11 features and 150,000 records. After performing …


A Chatbot Framework For Yioop, Harika Nukala May 2017

Master's Projects

Over the past few years, messaging applications have become more popular than social networking sites. Instead of requiring a specific application or website to access a service, chatbots are created on messaging platforms to allow users to interact with companies’ products and to give assistance as needed. In this project, we designed and implemented a chatbot framework for Yioop. The goal of the Chatbot Framework for Yioop project is to provide a platform for developers in Yioop to build and deploy chatbot applications. A chatbot is a web service that can converse with users using artificial intelligence in messaging platforms. …


Cascaded Facial Detection Algorithms To Improve Recognition, Edmund Yee May 2017

Master's Projects

The ability to use computer programs to recognize certain biometric qualities of people has been desired by several different types of organizations. One such quality that has been worked on with moderate success is facial detection and recognition. The need to use computers to determine where a face is and whose it is has generated several different algorithms, each with different benefits and drawbacks. At the core of each algorithm is the desire for it to be quick and accurate. By cascading face detection algorithms, accuracy can be improved, but runtime will consequently increase. Neural …


Named Entity Recognition And Classification For Natural Language Inputs At Scale, Shreeraj Dabholkar May 2017

Master's Projects

Natural language processing (NLP) is a technique by which computers can analyze, understand, and derive meaning from human language. Phrases in a body of natural text that represent names, such as those of persons, organizations or locations, are referred to as named entities. Identifying and categorizing these named entities is still a challenging task, on which research has been carried out for many years. In this project, we build a supervised learning-based classifier which can perform named entity recognition and classification (NERC) on input text and implement it as part of a chatbot application. The implementation is then scaled …


Comparing Authentic And Cryptic 5’ Splice Sites Using Hidden Markov Models And Decision Trees, Pratikshya Mishra May 2017

Master's Projects

Splicing is the editing of the precursor mRNA produced during transcription. The mRNA contains a large number of nucleotides in the introns and exons, which are spliced to remove the introns and bind the exons to produce the mature mRNA that is translated to generate proteins. Hence, accurate splicing at 5’ and 3’ splice sites (authentic splice sites (AuthSS)) is of foremost importance. The 5’ and 3’ splice sites are characterized by consensus sequences. Eukaryotic genomes also contain splice sites known as Cryptic Splice Sites (CSS) that match the consensus. But the CSS are activated only when there is a …


Application Of Computational Methods To Study The Selection Of Authentic And Cryptic Splice Sites, Tapomay Dey May 2017

Master's Projects

Proteins are building blocks of the bodies of eukaryotes, and the process of synthesizing proteins from DNA is crucial for the good health of an organism [13]. However, some mutations in the DNA may disrupt the selection of 5’ or 3’ splice sites by a spliceosome. An important research question is whether the disruptions have a stochastic relation to the position of nucleotides in the vicinity of the known authentic and cryptic splice sites. This can be achieved by proving that the authentic and cryptic splice sites are intrinsically different. However, the behavior of the spliceosome is not accurately known. …


Computational Analysis Of Cryptic Splice Sites, Remya Mohanan May 2017

Master's Projects

DNA in the nucleus of all eukaryotes is transcribed into mRNA, which is then translated into proteins. The DNA which is transcribed into mRNA is composed of coding and non-coding regions called exons and introns, respectively. It undergoes a post-transcriptional process called splicing, in which the introns, or non-coding regions, are removed from the pre-mRNA to give the mature mRNA. Splicing of pre-mRNAs at the 5’ and 3’ ends is a crucial step in the gene expression pathway. The mis-splicing by the spliceosome at different sites known as cryptic splice sites is caused by mutations which will affect the …


Headline Generation Using Deep Neural Networks, Dhruven Vora May 2017

Master's Projects

News headline generation is one of the important text summarization tasks. Human-generated news headlines are generally intended to catch the eye rather than provide useful information. There have been many approaches to generating meaningful headlines, using either neural networks or linguistic features. In this report, we propose a novel approach based on integrating Hedge Trimmer, a grammar-based extractive summarization system, with a deep neural network abstractive summarization system to generate meaningful headlines. We analyze the results against a current recurrent neural network-based headline generation system.


Shopbot: An Image Based Search Application For E-Commerce Domain, Nishant Goel May 2017

Master's Projects

For the past few years, e-commerce has changed the way people buy and sell products. People use this business model to do business over the Internet. In this domain, Human-Computer Interaction has been gaining momentum. Lately, there has been an upsurge in agent-based applications in the form of intelligent personal assistants (also known as chatbots), which make it easier for users to interact with digital services via a conversation, in the same way we talk to humans. In e-commerce, these assistants offer mainly text-based or speech-based search capabilities. They can handle search for most products, but cannot …


Generic Online Learning For Partial Visible & Dynamic Environment With Delayed Feedback, Behrooz Shahriari May 2017

Master's Projects

Reinforcement learning (RL) has been applied to robotics and many other domains in which a system must learn in real time and interact with a dynamic environment. In most studies, the state-action space that is the key part of RL is predefined. The integration of RL with deep learning methods has, however, taken a tremendous leap forward in solving novel challenging problems such as mastering the board game of Go. The environment surrounding the agent may not be fully visible, the environment can change over time, and the feedback that the agent receives for its actions can have a fluctuating delay. In …


Masquerade Detection On Mobile Devices, Swathi Nambiar Kadala Manikoth May 2017

Master's Projects

A masquerade is an attack in which the attacker avoids detection by impersonating an authorized user of a system. In this research we consider the problem of masquerade detection on mobile devices. Our goal is to improve on previous work by considering more features and a wide variety of machine learning techniques. Our approach consists of verifying the authenticity of users based on individual features and combinations of features for all users to determine which features contribute the most to masquerade detection. Also, we determine which of the two approaches, combining features or using individual features, has performed …


Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu May 2017

Master's Projects

Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. One possible way to find a larger sample of reports is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports of drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentiment analysis-based mining, processing, and extracting tweets …


Image Spam Detection, Aneri Chavda May 2017

Master's Projects

Email is one of the most common forms of digital communication. Spam can be defined as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers to evade text-based spam filters and hence poses a threat to email-based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.


Malware Detection Using The Index Of Coincidence, Bhavna Gurnani Jan 2017

Master's Projects

In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We find that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.
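
For readers unfamiliar with the measure, the following is a minimal sketch of the classical Index of Coincidence: the probability that two symbols drawn without replacement from a sequence are equal. It illustrates only the basic statistic; how the project turns it into a malware score, and which symbols it is computed over, are not stated in the abstract, so the inputs shown here are assumptions.

```python
from collections import Counter


def index_of_coincidence(symbols):
    """Return the probability that two distinct positions hold the same symbol.

    Computes sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is the count of
    symbol i and N is the sequence length. Works on any sequence of hashable
    items (characters, bytes, opcode mnemonics, and so on).
    """
    n = len(symbols)
    if n < 2:
        return 0.0  # no pair of positions exists
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical inputs: a highly repetitive sequence and one with no repeats.
repetitive = index_of_coincidence("aaaa")
no_repeats = index_of_coincidence("abcd")
```

A sequence dominated by a few symbols scores close to 1, while a sequence whose symbols are spread evenly over a large alphabet scores close to 0.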


Real-Time Online Chinese Character Recognition, Wenlong Zhang Dec 2016

Master's Projects

In this project, I built a web application for real-time recognition of handwritten Chinese characters. The system identifies a Chinese character while the user is drawing or writing it. The techniques and steps I use to build the recognition system include data preparation, preprocessing, feature extraction, and classification. To increase accuracy, two different types of neural networks are used in the system: a multi-layer neural network and a convolutional neural network.


DNA Analysis Using Grammatical Inference, Cory Cook Jun 2016

Master's Projects

An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.

An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.

Testing shows that the accuracy of inferred languages for …


Analysis On Alergia Algorithm: Pattern Recognition By Automata Theory, Xuanyi Qi Jun 2016

Master's Projects

Based on Kolmogorov Complexity, a finite set x of strings has a pattern if the set x can be output by a Turing machine of length that is less than the minimum of all |x|; this Turing machine, which may not be unique, is called a pattern of the finite set of strings. In order to find a pattern of a given finite set of strings (assuming such a pattern exists), the ALERGIA algorithm is used to approximate such a pattern (Turing machine) in terms of finite automata. Note that each finite automaton defines a partition on formal language Σ*, ALERGIA …


Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le Jun 2016

Master's Projects

This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching is thus turned into a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique (new in computer science) that improves performance over existing methods and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …


Machine Learning On The Cloud For Pattern Recognition, Tien Nguyen Jun 2016

Master's Projects

Pattern recognition is a field of machine learning with applications to areas such as text recognition and computer vision. Machine learning algorithms, such as convolutional neural networks, may be trained to classify images. However, such tasks may be computationally intensive for a commercial computer for larger volumes or larger sizes of images. Cloud computing allows one to overcome the processing and memory constraints of average commercial computers, allowing computations on larger amounts of data. In this project, we developed a system for detection and tracking of moving human and vehicle objects in videos in real time or near real time. …


Multi Faceted Text Classification Using Supervised Machine Learning Models, Abhiteja Gajjala Jun 2016

Master's Projects

In recent years, document management tasks (known as information retrieval) have increased greatly due to the availability of digital documents everywhere. The need for automatic methods of extracting document information has become prominent for organizing information and knowledge discovery. Text classification is one such solution, wherein natural language text is assigned to one or more predefined categories based on its content. In my research, classification of text is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers discussed the classification of sentiment either positive or …


Supervised Learning For Multi-Domain Text Classification, Siva Charan Reddy Gangireddy Jun 2016

Master's Projects

Digital information available on the Internet is increasing day by day. As a result, the demand for tools that help people find and analyze all these resources is also growing. Text classification, in particular, has been very useful in managing this information. Text classification is the process of assigning natural language text to one or more categories based on its content. It has many important applications in the real world. For example, finding the sentiment of reviews posted by people about restaurants, movies and other such things is an application of text classification. In …