Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 211 - 240 of 267
Full-Text Articles in Physical Sciences and Mathematics
Question Type Recognition Using Natural Language Input, Aishwarya Soni
Master's Projects
Recently, numerous specialists have concentrated on the use of Natural Language Processing (NLP) systems in various domains, for example, data extraction and content mining. One of the difficulties with these technologies is building a precise Question and Answering (QA) system. Question type recognition is the most significant task in a QA system such as a chatbot. Organizations such as the National Institute of Standards and Technology (NIST) host a conference series called the Text REtrieval Conference (TREC), which holds a competition every year to encourage and improve techniques for information retrieval from a large corpus of text. When a user …
Improving Text Classification With Word Embedding, Lihao Ge
Master's Projects
One challenge in text classification is that it is hard to perform feature reduction based on the meaning of the features. An improper feature reduction may even worsen the classification accuracy. Word2Vec, a word embedding method, has recently been gaining popularity due to its high precision in analyzing the semantic similarity between words at relatively low computational cost. However, only a limited number of researchers have focused on feature reduction using Word2Vec. In this project, we developed a Word2Vec-based method to reduce the feature size while increasing the classification accuracy. The feature reduction is achieved by loosely …
Housing Price Prediction Using Support Vector Regression, Jiao Yang Wu
Master's Projects
The relationship between house prices and the economy is an important motivating factor for predicting house prices. Housing price trends are not only the concern of buyers and sellers; they also indicate the current economic situation. Therefore, it is important to predict housing prices without bias to help both buyers and sellers make their decisions. This project uses an open source dataset, which includes 20 explanatory features and 21,613 entries of housing sales in King County, USA. We compare different feature selection methods and feature extraction algorithms with Support Vector Regression (SVR) to predict the house prices in …
Path-Finding Methodology For Visually-Impaired Patients Based On Image-Processing, Abhilash Goyal
Master's Projects
The objective of this project is to propose and develop a path-finding methodology for visually impaired patients. The proposed novel methodology is based on image processing and is targeted at patients who are not completely blind. The major problem faced by visually impaired patients is walking independently, mainly because these patients cannot see obstacles in front of them due to the degradation of their eyesight. This degradation occurs mainly because either light does not focus properly on the retina or the photoreceptor cells on the retina malfunction, …
Predicting Pancreatic Cancer Using Support Vector Machine, Akshay Bodkhe
Master's Projects
This report presents an approach to predicting pancreatic cancer using the Support Vector Machine classification algorithm. The research objective of this project is to predict pancreatic cancer using genomic data alone, clinical data alone, and a combination of genomic and clinical data. We have used real genomic data with 22,763 samples and 154 features per sample. We have also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we have combined the synthetic clinical data with a subset of features from the real genomic data. In our results, we observed that …
An Open Source Discussion Group Recommendation System, Sarika Padmashali
Master's Projects
A recommendation system analyzes user behavior on a website to make suggestions about what a user should do in the future on the website. It basically tries to predict the “rating” or “preference” a user would have for an action. Yioop is an open source search engine, wiki system, and user discussion group system managed by Dr. Christopher Pollett at SJSU. In this project, we have developed a recommendation system for Yioop where users are given suggestions about the threads and groups they could join based on their user history. We have used collaborative filtering techniques to make recommendations and …
Neural Net Stock Trend Predictor, Sonal Kabra
Master's Projects
This report analyzes new and existing stock market prediction techniques. Traditional technical analysis was combined with various machine-learning approaches such as artificial neural networks, k-nearest neighbors, and decision trees. The experiments we conducted show that technical analysis together with machine learning can be used to profitably direct an investor’s trading decisions. We measure the profitability of experiments by calculating the percentage weekly return for each stock under study. Our algorithms and simulations are developed in Python. The technical analysis methodology combined with machine learning algorithms shows promising results, which we discuss in this report.
AI For Classic Video Games Using Reinforcement Learning, Shivika Sodhi
Master's Projects
Deep reinforcement learning is a technique to teach machines tasks based on trial and error experiences in the way humans learn. In this paper, some preliminary research is done to understand how reinforcement learning and deep learning techniques can be combined to train an agent to play Archon, a classic video game. We compare two methods to estimate a Q function, the function used to compute the best action to take at each point in the game. In the first approach, we used a Q table to store the states and weights of the corresponding actions. In our experiments, this …
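As a rough illustration of the Q-table idea mentioned in this abstract, the following is a minimal sketch of a standard tabular Q-learning update. It is not taken from the project: the state names, action names, learning rate `alpha`, and discount `gamma` are all hypothetical, and the project's actual representation for Archon is not described in the excerpt.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value.

    q maps (state, action) pairs to estimated values; unseen pairs default to 0.
    alpha (learning rate) and gamma (discount) are illustrative values only.
    """
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Move the current estimate toward reward + discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-state, two-action example.
q_table = defaultdict(float)
actions = ["left", "right"]
first = q_update(q_table, "s0", "right", 1.0, "s1", actions)
second = q_update(q_table, "s1", "left", 0.0, "s0", actions)
```

A dictionary keyed by (state, action) keeps the sketch short; it only scales to small state spaces, which is the usual motivation for replacing the table with a learned function approximator.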
Document Classification Using Machine Learning, Ankit Basarkar
Master's Projects
To perform document classification algorithmically, documents need to be represented in a form that is understandable to the machine learning classifier. The report discusses the different types of feature vectors through which a document can be represented and later classified. The project aims at comparing the Binary, Count and TfIdf feature vectors and their impact on document classification. To test how well each of the three feature vectors performs, we used the 20-newsgroup dataset and converted the documents to all three feature vectors. For each feature vector representation, we trained the Naïve Bayes classifier and then tested the generated classifier …
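To make the three representations named in this abstract concrete, here is a small sketch that builds binary, count, and TF-IDF vectors for tokenized documents. The toy documents and function names are hypothetical, and the TF-IDF weighting used here (raw count times log(N/df)) is only one common variant; the excerpt does not say which weighting the project used.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})


def count_vector(doc, vocab):
    """Number of occurrences of each vocabulary term in the document."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


def binary_vector(doc, vocab):
    """1 if the vocabulary term occurs in the document, otherwise 0."""
    present = set(doc)
    return [1 if t in present else 0 for t in vocab]


def tfidf_vectors(docs, vocab):
    """Raw term count times log(N / document frequency), one vector per document."""
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}
    return [
        [c * idf[t] for c, t in zip(count_vector(doc, vocab), vocab)]
        for doc in docs
    ]


# Hypothetical toy corpus, already tokenized.
docs = [["spam", "ham", "spam"], ["ham", "eggs"]]
vocab = build_vocab(docs)
counts = count_vector(docs[0], vocab)
binary = binary_vector(docs[0], vocab)
tfidf = tfidf_vectors(docs, vocab)
```

In this variant a term that appears in every document (here "ham") receives a TF-IDF weight of zero, which shows how the representation discounts terms that do not discriminate between documents.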
Credit Scoring Using Logistic Regression, Ansen Mathew
Master's Projects
This report presents an approach to predicting the credit scores of customers using the Logistic Regression machine learning algorithm. The research objective of this project is to perform a comparative study between feature selection and feature extraction on the same dataset using the Logistic Regression machine learning algorithm. For feature selection, we have used Stepwise Logistic Regression. For feature extraction, we have used Singular Value Decomposition (SVD) and Weighted Singular Value Decomposition (weighted SVD). In order to test the accuracy obtained using feature selection and feature extraction, we used a public credit dataset having 11 features and 150,000 records. After performing …
A Chatbot Framework For Yioop, Harika Nukala
Master's Projects
Over the past few years, messaging applications have become more popular than social networking sites. Instead of using a specific application or website to access some service, chatbots are created on messaging platforms to allow users to interact with companies’ products and also give assistance as needed. In this project, we designed and implemented a Chatbot Framework for Yioop. The goal of the Chatbot Framework for Yioop project is to provide a platform for developers in Yioop to build and deploy chatbot applications. A chatbot is a web service that can converse with users using artificial intelligence in messaging platforms. …
Cascaded Facial Detection Algorithms To Improve Recognition, Edmund Yee
Master's Projects
The ability to use computer programs to recognize certain biometric qualities of people has been desired by several different types of organizations. One of these qualities that has been worked on and has achieved moderate success is facial detection and recognition. Using computers to determine where a face is and whose it is has generated several different algorithms to solve this problem, with different benefits and drawbacks. At the backbone of each algorithm is the desire for it to be quick and accurate. By cascading face detection algorithms, accuracy can be improved but runtime will consequently be increased. Neural …
Named Entity Recognition And Classification For Natural Language Inputs At Scale, Shreeraj Dabholkar
Master's Projects
Natural language processing (NLP) is a technique by which computers can analyze, understand, and derive meaning from human language. Phrases in a body of natural text that represent names, such as those of persons, organizations or locations, are referred to as named entities. Identifying and categorizing these named entities is still a challenging task, on which research has been carried out for many years. In this project, we build a supervised-learning-based classifier that can perform named entity recognition and classification (NERC) on input text and implement it as part of a chatbot application. The implementation is then scaled …
Comparing Authentic And Cryptic 5’ Splice Sites Using Hidden Markov Models And Decision Trees, Pratikshya Mishra
Master's Projects
Splicing is the editing of the precursor mRNA produced during transcription. The mRNA contains a large number of nucleotides in the introns and exons, which are spliced to remove the introns and bind the exons to produce the mature mRNA that is translated to generate proteins. Hence, accurate splicing at the 5’ and 3’ splice sites (authentic splice sites (AuthSS)) is of foremost importance. The 5’ and 3’ splice sites are characterized by consensus sequences. Eukaryotic genomes also contain splice sites known as Cryptic Splice Sites (CSS) that match the consensus. However, the CSS are activated only when there is a …
Application Of Computational Methods To Study The Selection Of Authentic And Cryptic Splice Sites, Tapomay Dey
Master's Projects
Proteins are building blocks of the bodies of eukaryotes, and the process of synthesizing proteins from DNA is crucial for the good health of an organism [13]. However, some mutations in the DNA may disrupt the selection of 5’ or 3’ splice sites by a spliceosome. An important research question is whether the disruptions have a stochastic relation to the position of nucleotides in the vicinity of the known authentic and cryptic splice sites. This can be achieved by proving that the authentic and cryptic splice sites are intrinsically different. However, the behavior of the spliceosome is not accurately known. …
Computational Analysis Of Cryptic Splice Sites, Remya Mohanan
Master's Projects
DNA in the nucleus of all eukaryotes is transcribed into mRNA, which is then translated into proteins. The DNA which is transcribed into mRNA is composed of coding and non-coding regions called exons and introns, respectively. It undergoes a post-transcriptional process called splicing, in which the introns, or non-coding regions, are removed from the pre-mRNA to give the mature mRNA. Splicing of pre-mRNAs at the 5’ and 3’ ends is a crucial step in the gene expression pathway. Mis-splicing by the spliceosome at different sites, known as cryptic splice sites, is caused by mutations which will affect the …
Headline Generation Using Deep Neural Networks, Dhruven Vora
Master's Projects
News headline generation is one of the important text summarization tasks. Human-generated news headlines are generally intended to catch the eye rather than provide useful information. There have been many approaches to generating meaningful headlines using either neural networks or linguistic features. In this report, we propose a novel approach based on integrating Hedge Trimmer, a grammar-based extractive summarization system, with a deep neural network abstractive summarization system to generate meaningful headlines. We analyze the results against a current recurrent neural network-based headline generation system.
Shopbot: An Image Based Search Application For E-Commerce Domain, Nishant Goel
Master's Projects
For the past few years, e-commerce has changed the way people buy and sell products. People use this business model to do business over the Internet. In this domain, Human-Computer Interaction has been gaining momentum. Lately, there has been an upsurge in agent-based applications in the form of intelligent personal assistants (also known as chatbots), which make it easier for users to interact with digital services via a conversation, in the same way we talk to humans. In e-commerce, these assistants offer mainly text-based or speech-based search capabilities. They can handle search for most products, but cannot …
Generic Online Learning For Partial Visible & Dynamic Environment With Delayed Feedback, Behrooz Shahriari
Master's Projects
Reinforcement learning (RL) has been applied to robotics and many other domains in which a system must learn in real time and interact with a dynamic environment. In most studies, the state-action space that is the key part of RL is predefined. The integration of RL with deep learning methods has, however, taken a tremendous leap forward in solving novel, challenging problems such as mastering the board game Go. The environment surrounding the agent may not be fully visible, the environment can change over time, and the feedback that the agent receives for its actions can have a fluctuating delay. In …
Masquerade Detection On Mobile Devices, Swathi Nambiar Kadala Manikoth
Master's Projects
A masquerade is an attack where the attacker avoids detection by impersonating an authorized user of a system. In this research we consider the problem of masquerade detection on mobile devices. Our goal is to improve on previous work by considering more features and a wide variety of machine learning techniques. Our approach consists of verifying the authenticity of users based on individual features and combinations of features for all users to determine which features contribute the most to masquerade detection. Also, we determine which of the two approaches, the combination of features or the use of individual features, has performed …
Mining Frequency Of Drug Side Effects Over A Large Twitter Dataset Using Apache Spark, Dennis Hsu
Master's Projects
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. One possible way to find a larger sample of reports is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports of drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentiment analysis-based mining, processing, and extracting tweets …
Image Spam Detection, Aneri Chavda
Master's Projects
Email is one of the most common forms of digital communication. Spam can be defined as unsolicited bulk email, while image spam includes spam text embedded inside images. Image spam is used by spammers so as to evade text-based spam filters and hence it poses a threat to email-based communication. In this research, we analyze image spam detection methods based on various combinations of image processing and machine learning techniques.
Malware Detection Using The Index Of Coincidence, Bhavna Gurnani
Master's Projects
In this research, we apply the Index of Coincidence (IC) to problems in malware analysis. The IC, which is often used in cryptanalysis of classic ciphers, is a technique for measuring the repeat rate in a string of symbols. A score based on the IC is applied to a variety of challenging malware families. We find that this relatively simple IC score performs surprisingly well, with superior results in comparison to various machine learning based scores, at least in some cases.
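For readers unfamiliar with the IC, the sketch below computes the classical index of coincidence: the probability that two symbols drawn at random, without replacement, from a sequence are identical. It illustrates the general measure only. The abstract does not say which symbols the project scores or how the IC is turned into a malware score, so the inputs here are hypothetical.

```python
from collections import Counter


def index_of_coincidence(symbols):
    """Return sum(n_i * (n_i - 1)) / (N * (N - 1)) over the symbol counts n_i.

    Works on any sequence of hashable symbols (characters, bytes, tokens).
    Sequences shorter than two symbols have no pairs, so 0.0 is returned.
    """
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical inputs: a sequence with no repeats, one with some, one with only repeats.
no_repeats = index_of_coincidence("abcd")
some_repeats = index_of_coincidence("aabb")
all_repeats = index_of_coincidence("aaaa")
```

The value ranges from 0 for a sequence with no repeated symbols to 1 for a sequence consisting of a single repeated symbol, which is why it serves as a simple measure of repeat rate.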
Real-Time Online Chinese Character Recognition, Wenlong Zhang
Master's Projects
In this project, I built a web application for handwritten Chinese character recognition in real time. This system determines a Chinese character while a user is drawing/writing it. The techniques and steps I use to build the recognition system include data preparation, preprocessing, feature extraction, and classification. To increase the accuracy, two different types of neural networks are used in the system: a multi-layer neural network and a convolutional neural network.
DNA Analysis Using Grammatical Inference, Cory Cook
Master's Projects
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for the searching of an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee the finding of the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that the accuracy of inferred languages for …
Analysis On Alergia Algorithm: Pattern Recognition By Automata Theory, Xuanyi Qi
Master's Projects
Based on Kolmogorov Complexity, a finite set x of strings has a pattern if the set x can be output by a Turing machine whose length is less than the minimum of all |x|; this Turing machine, which may not be unique, is called a pattern of the finite set of strings. In order to find a pattern of a given finite set of strings (assuming such a pattern exists), the ALERGIA algorithm is used to approximate such a pattern (Turing machine) in terms of finite automata. Note that each finite automaton defines a partition on the formal language Σ*, ALERGIA …
Analyze Large Multidimensional Datasets Using Algebraic Topology, David Le
Master's Projects
This paper presents an efficient algorithm to extract knowledge from high-dimensionality, high-complexity datasets using algebraic topology, namely simplicial complexes. Based on the concept of isomorphism of relations, our method turns a relational table into a geometric object (a simplicial complex is a polyhedron). Conceptually, association rule searching is thus turned into a geometric traversal problem. By leveraging the core concepts behind simplicial complexes, we use a technique that is new in computer science, improves performance over existing methods, and uses far less memory. It was designed and developed with a strong emphasis on scalability, reliability, and extensibility. This paper …
Machine Learning On The Cloud For Pattern Recognition, Tien Nguyen
Master's Projects
Pattern recognition is a field of machine learning with applications to areas such as text recognition and computer vision. Machine learning algorithms, such as convolutional neural networks, may be trained to classify images. However, such tasks may be computationally intensive for a commercial computer for larger volumes or larger sizes of images. Cloud computing allows one to overcome the processing and memory constraints of average commercial computers, allowing computations on larger amounts of data. In this project, we developed a system for detection and tracking of moving human and vehicle objects in videos in real time or near real time. …
Multi Faceted Text Classification Using Supervised Machine Learning Models, Abhiteja Gajjala
Master's Projects
In recent years, document management tasks (known as information retrieval) have increased greatly due to the availability of digital documents everywhere. Automatic methods for extracting document information have become prominent for organizing information and for knowledge discovery. Text classification is one such solution, wherein natural language text is assigned to one or more predefined categories based on its content. In my research, classification of text is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers discussed the classification of sentiment either positive or …
Supervised Learning For Multi-Domain Text Classification, Siva Charan Reddy Gangireddy
Master's Projects
Digital information available on the Internet is increasing day by day. As a result, the demand for tools that help people find and analyze all these resources is also growing. Text classification, in particular, has been very useful in managing this information. Text classification is the process of assigning natural language text to one or more categories based on its content. It has many important applications in the real world. For example, finding the sentiment of reviews posted by people about restaurants, movies and other such things is an application of text classification. In …