Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences

2019

Machine Learning

Institution
Publication
Publication Type
File Type

Articles 1 - 30 of 70

Full-Text Articles in Physical Sciences and Mathematics

Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg Dec 2019

Information Extraction From Biomedical Text Using Machine Learning, Deepti Garg

Master's Projects

Inadequate drug experimental data and the use of unlicensed drugs may cause adverse drug reactions, especially in pediatric populations. Every year the U.S. Food and Drug Administration approves human prescription drugs for marketing. The labels associated with these drugs include information about clinical trials and drug response in pediatric population. In order for doctors to make an informed decision about the safety and effectiveness of these drugs for children, there is a need to analyze complex and often unstructured drug labels. In this work, first, an exploratory analysis of drug labels using a Natural Language Processing pipeline is performed. Second, …


Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Ordinal Hyperplane Loss, Bob Vanderheyden

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


Finding A Viable Neural Network Architecture For Use With Upper Limb Prosthetics, Maxwell Lavin Dec 2019

Finding A Viable Neural Network Architecture For Use With Upper Limb Prosthetics, Maxwell Lavin

Master of Science in Computer Science Theses

This paper attempts to answer the question of if it’s possible to produce a simple, quick, and accurate neural network for the use in upper-limb prosthetics. Through the implementation of convolutional and artificial neural networks and feature extraction on electromyographic data different possible architectures are examined with regards to processing time, complexity, and accuracy. It is found that the most accurate architecture is a multi-entry categorical cross entropy convolutional neural network with 100% accuracy. The issue is that it is also the slowest method requiring 9 minutes to run. The next best method found was a single-entry binary cross entropy …


Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira Dec 2019

Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira

Dissertations

Cardiovascular disease (CVD) is the most common cause of death in Ireland, and probably, worldwide. According to the Health Service Executive (HSE) cardiovascular disease accounting for 36% of all deaths, and one important fact, 22% of premature deaths (under age 65) are from CVD.

Using data from the Heart Disease UCI Data Set (UCI Machine Learning), we use machine learning techniques to detect the presence or absence of heart disease in the patient according to 14 features provide for this dataset. The different results are compared based on accuracy performance, confusion matrix and area under the Receiver Operating Characteristics (ROC) …


Factor Analysis Of Mixed Data (Famd) And Multiple Linear Regression In R, Nestor Pereira Dec 2019

Factor Analysis Of Mixed Data (Famd) And Multiple Linear Regression In R, Nestor Pereira

Dissertations

In the previous projects, it has been worked to statistically analysis of the factors to impact the score of the subjects of Mathematics and Portuguese for several groups of the student from secondary school from Portugal.

In this project will be interested in finding a model, hypothetically multiple linear regression, to predict the final score, dependent variable G3, of the student according to some features divide into two groups. One group, analyses the features or predictors which impact in the final score more related to the performance of the students, means variables like study time or past failures. The second …


Sensor Emulation With Physiolocal Data In Immersive Virtual Reality Driving Simulator, Jungsu Pak, Oliver Mathias, Ariane Guirguis, Uri Maoz Dec 2019

Sensor Emulation With Physiolocal Data In Immersive Virtual Reality Driving Simulator, Jungsu Pak, Oliver Mathias, Ariane Guirguis, Uri Maoz

Student Scholar Symposium Abstracts and Posters

Can we enhance the safety and comfort of AVs by training AVs with physiological data of human drivers? We will train and compare AV algorithm with/without physiological data.


Classifying Fiction And Non-Fiction Works Using Machine Learning, Rachna Gupta '21 Oct 2019

Classifying Fiction And Non-Fiction Works Using Machine Learning, Rachna Gupta '21

Student Publications & Research

The objective of this project was to create a program that can determine whether an unknown text is a work of fiction or non-fiction using machine learning. Various datasets of speeches, ebooks, poems, scientific papers, and texts from Project Gutenberg and the Wolfram Example Data were utilized to train and test a Markov Chain machine learning model. A microsite was deployed with the final product that returns a probability of fictionality based on input from the user with 95% accuracy.


Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru Oct 2019

Enhancing Timeliness Of Drug Overdose Mortality Surveillance: A Machine Learning Approach, Patrick J. Ward, Peter J. Rock, Svetla Slavova, April M. Young, Terry L. Bunn, Ramakanth Kavuluru

Kentucky Injury Prevention and Research Center Faculty Publications

BACKGROUND: Timely data is key to effective public health responses to epidemics. Drug overdose deaths are identified in surveillance systems through ICD-10 codes present on death certificates. ICD-10 coding takes time, but free-text information is available on death certificates prior to ICD-10 coding. The objective of this study was to develop a machine learning method to classify free-text death certificates as drug overdoses to provide faster drug overdose mortality surveillance.

METHODS: Using 2017–2018 Kentucky death certificate data, free-text fields were tokenized and features were created from these tokens using natural language processing (NLP). Word, bigram, and trigram features were created …


Automatic Inference Of Causal Reasoning Chains From Student Essays, Simon Mark Hughes Oct 2019

Automatic Inference Of Causal Reasoning Chains From Student Essays, Simon Mark Hughes

College of Computing and Digital Media Dissertations

While there has been an increasing focus on higher-level thinking skills arising from the Common Core Standards, many high-school and middle-school students struggle to combine and integrate information from multiple sources when writing essays. Writing is an important learning skill, and there is increasing evidence that writing about a topic develops a deeper understanding in the student. However, grading essays is time consuming for teachers, resulting in an increasing focus on shallower forms of assessment that are easier to automate, such as multiple-choice tests. Existing essay grading software has attempted to ease this burden but relies on shallow lexico-syntactic features …


Development Of An Autonomous Aerial Toolset For Agricultural Applications, Terrance Life Oct 2019

Development Of An Autonomous Aerial Toolset For Agricultural Applications, Terrance Life

Mahurin Honors College Capstone Experience/Thesis Projects

According to the United Nations, the world population is expected to grow from its current 7 billion to 9.7 billion by the year 2050. During this time, global food demand is also expected to increase by between 59% and 98% due to the population increase, accompanied by an increasing demand for protein due to a rising standard of living throughout developing countries. [1] Meeting this increase in required food production using present agricultural practices would necessitate a similar increase in farmland; a resource which does not exist in abundance. Therefore, in order to meet growing food demands, new methods will …


Hypergraph Partitioning With Embeddings, Justin Sybrandt, Ruslan Shaydulin, Ilya Safro Sep 2019

Hypergraph Partitioning With Embeddings, Justin Sybrandt, Ruslan Shaydulin, Ilya Safro

Publications

The problem of placing circuits on a chip or distributing sparse matrix operations can be modeled as the hypergraph partitioning problem. A hypergraph is a generalization of the traditional graph wherein each "hyperedge" may connect any number of nodes. Hypergraph partitioning, therefore, is the NP-Hard problem of dividing nodes into k" role="presentation" style="box-sizing: border-box; display: inline-table; line-height: normal; font-size: 16px; overflow-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; color: rgb(93, 93, 93); font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; position: relative;">kk similarly sized disjoint sets while …


Prediction Of Hierarchical Classification Of Transposable Elements Using Machine Learning Techniques, Manisha Panta Aug 2019

Prediction Of Hierarchical Classification Of Transposable Elements Using Machine Learning Techniques, Manisha Panta

University of New Orleans Theses and Dissertations

Transposable Elements (TEs) or jumping genes are the DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even promote gross genetic arrangements. Thus, the proper classification of the identified jumping genes is important to understand their genetic and evolutionary effects. While computational methods have been developed that perform either binary classification or multi-label classification of TEs, few studies …


Static Malware Detection Using Deep Neural Networks On Portable Executables, Piyush Aniruddha Puranik Aug 2019

Static Malware Detection Using Deep Neural Networks On Portable Executables, Piyush Aniruddha Puranik

UNLV Theses, Dissertations, Professional Papers, and Capstones

There are two main components of malware analysis. One is static malware analysis and the other is dynamic malware analysis. Static malware analysis involves examining the basic structure of the malware executable without executing it, while dynamic malware analysis relies on examining malware behavior after executing it in a controlled environment. Static malware analysis is typically done by modern anti-malware software by using signature-based analysis or heuristic-based analysis.

This thesis proposes the use of deep neural networks to learn features from a malware’s portable executable (PE) to minimize the occurrences of false positives when recognizing new malware. We use the …


Feature Selection And Analysis For Standard Machine Learning Classification Of Audio Beehive Samples, Chelsi Gupta Aug 2019

Feature Selection And Analysis For Standard Machine Learning Classification Of Audio Beehive Samples, Chelsi Gupta

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The beekeepers need to inspect their hives regularly in order to protect them from various stressors. Manual inspection of hives require a lot of time and effort. Hence, many researchers have started using electronic beehive monitoring (EBM) systems to collect critical information from beehives, so as to alert the beekeepers of possible threats to the hive. EBM collects information by applying multiple sensors into the hive. The sensors collect information in the form of video, audio or temperature data from the hives.

This thesis involves the automatic classification of audio samples from a beehive into bee buzzing, cricket chirping and …


Enhancing Scalability In Genetic Programming With Adaptable Constraints, Type Constraints And Automatically Defined Functions, George Gerules Jul 2019

Enhancing Scalability In Genetic Programming With Adaptable Constraints, Type Constraints And Automatically Defined Functions, George Gerules

Dissertations

Genetic Programming is a type of biological inspired machine learning. It is composed of a population of stochastic individuals. Those individuals can exchange portions of themselves with others in the population through the crossover operation that draws its inspiration from biology. Other biologically inspired operations include mutation and reproduction. The form an individual takes can be many things. It, however, is represented most of the time as a computer program. Constructing correct efficient programs can be notoriously difficult. Various grammar, typing, function constraint, or counting mechanisms can guide creation and evolution of those individuals. These mechanisms can reduce search space …


Df 2.0: An Automated, Privacy Preserving, And Efficient Digital Forensic Framework That Leverages Machine Learning For Evidence Prediction And Privacy Evaluation, Robin Verma, Jayaprakash Govindaraj Dr, Saheb Chhabra, Gaurav Gupta Jun 2019

Df 2.0: An Automated, Privacy Preserving, And Efficient Digital Forensic Framework That Leverages Machine Learning For Evidence Prediction And Privacy Evaluation, Robin Verma, Jayaprakash Govindaraj Dr, Saheb Chhabra, Gaurav Gupta

Journal of Digital Forensics, Security and Law

The current state of digital forensic investigation is continuously challenged by the rapid technological changes, the increase in the use of digital devices (both the heterogeneity and the count), and the sheer volume of data that these devices could contain. Although data privacy protection is not a performance measure, however, preventing privacy violations during the digital forensic investigation, is also a big challenge. With a perception that the completeness of investigation and the data privacy preservation are incompatible with each other, the researchers have provided solutions to address the above-stated challenges that either focus on the effectiveness of the investigation …


Supervised Machine Learning Models For Fake News Detection, Andrea Lopez, Adelo Vieira, Zafar Ahsan, Farooq Sabib, Shirley Marinho Jun 2019

Supervised Machine Learning Models For Fake News Detection, Andrea Lopez, Adelo Vieira, Zafar Ahsan, Farooq Sabib, Shirley Marinho

ICT

Fake news or the distribution of disinformation has become one of the most challenging issues in society. News and information are churned out across online websites and platforms in real-time, with little or no way for the viewing public to determine what is real or manufactured. But an awareness of what we are consuming online is becoming apparent and efforts are underway to explore how we separate fake content from genuine and truthful information. The most challenging part of fake news is determining how to spot it. In technology, there are ways to help us do this. Supervised machine learning …


Multimodal Data Analytics And Fusion For Data Science, Haiman Tian Jun 2019

Multimodal Data Analytics And Fusion For Data Science, Haiman Tian

FIU Electronic Theses and Dissertations

Advances in technologies have rapidly accumulated a zettabyte of “new” data every two years. The huge amount of data have a powerful impact on various areas in science and engineering and generates enormous research opportunities, which calls for the design and development of advanced approaches in data analytics. Given such demands, data science has become an emerging hot topic in both industry and academia, ranging from basic business solutions, technological innovations, and multidisciplinary research to political decisions, urban planning, and policymaking. Within the scope of this dissertation, a multimodal data analytics and fusion framework is proposed for data-driven knowledge discovery …


Identifying Hourly Traffic Patterns With Python Deep Learning, Christopher L. Leavitt Jun 2019

Identifying Hourly Traffic Patterns With Python Deep Learning, Christopher L. Leavitt

Computer Engineering

This project was designed to explore and analyze the potential abilities and usefulness of applying machine learning models to data collected by parking sensors at a major metro shopping mall. By examining patterns in rates at which customer enter and exit parking garages on the campus of the Bellevue Collection shopping mall in Bellevue, Washington, a recurrent neural network will use data points from the previous hours will be trained to forecast future trends.


Classifying Challenging Behaviors In Autism Spectrum Disorder With Neural Document Embeddings, Abigail Atchison May 2019

Classifying Challenging Behaviors In Autism Spectrum Disorder With Neural Document Embeddings, Abigail Atchison

Computational and Data Sciences (MS) Theses

The understanding and treatment of challenging behaviors in individuals with Autism Spectrum Disorder is paramount to enabling the success of behavioral therapy; an essential step in this process being the labeling of challenging behaviors demonstrated in therapy sessions. These manifestations differ across individuals and within individuals over time and thus, the appropriate classification of a challenging behavior when considering purely qualitative factors can be unclear. In this thesis we seek to add quantitative depth to this otherwise qualitative task of challenging behavior classification. We do so through the application of natural language processing techniques to behavioral descriptions extracted from the …


Classifying Classic Ciphers Using Machine Learning, Nivedhitha Ramarathnam Krishna May 2019

Classifying Classic Ciphers Using Machine Learning, Nivedhitha Ramarathnam Krishna

Master's Projects

We consider the problem of identifying the classic cipher that was used to generate a given ciphertext message. We assume that the plaintext is English and we restrict our attention to ciphertext consisting only of alphabetic characters. Among the classic ciphers considered are the simple substitution, Vigenère cipher, playfair cipher, and column transposition cipher. The problem of classification is approached in two ways. The first method uses support vector machines (SVM) trained directly on ciphertext to classify the ciphers. In the second approach, we train hidden Markov models (HMM) on each ciphertext message, then use these trained HMMs as features …


Emulation Vs Instrumentation For Android Malware Detection, Anukriti Sinha May 2019

Emulation Vs Instrumentation For Android Malware Detection, Anukriti Sinha

Master's Projects

In resource constrained devices, malware detection is typically based on offline analysis using emulation. In previous work it has been claimed that such emulation fails for a significant percentage of Android malware because well-designed malware detects that the code is being emulated. An alternative to emulation is malware analysis based on code that is executing on an actual Android device. In this research, we collect features from a corpus of Android malware using both emulation and on-phone instrumentation. We train machine learning models based on emulated features and also train models based on features collected via instrumentation, and we compare …


Differential Estimation Of Audiograms Using Gaussian Process Active Model Selection, Trevor Larsen May 2019

Differential Estimation Of Audiograms Using Gaussian Process Active Model Selection, Trevor Larsen

McKelvey School of Engineering Theses & Dissertations

Classical methods for psychometric function estimation either require excessive resources to perform, as in the method of constants, or produce only a low resolution approximation of the target psychometric function, as in adaptive staircase or up-down procedures. This thesis makes two primary contributions to the estimation of the audiogram, a clinically relevant psychometric function estimated by querying a patient’s for audibility of a collection of tones. First, it covers the implementation of a Gaussian process model for learning an audiogram using another audiogram as a prior belief to speed up the learning procedure. Second, it implements a use case of …


Visualization And Machine Learning Techniques For Nasa’S Em-1 Big Data Problem, Antonio P. Garza Iii, Jose Quinonez, Misael Santana, Nibhrat Lohia May 2019

Visualization And Machine Learning Techniques For Nasa’S Em-1 Big Data Problem, Antonio P. Garza Iii, Jose Quinonez, Misael Santana, Nibhrat Lohia

SMU Data Science Review

In this paper, we help NASA solve three Exploration Mission-1 (EM-1) challenges: data storage, computation time, and visualization of complex data. NASA is studying one year of trajectory data to determine available launch opportunities (about 90TBs of data). We improve data storage by introducing a cloud-based solution that provides elasticity and server upgrades. This migration will save $120k in infrastructure costs every four years, and potentially avoid schedule slips. Additionally, it increases computational efficiency by 125%. We further enhance computation via machine learning techniques that use the classic orbital elements to predict valid trajectories. Our machine learning model decreases trajectory …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi

SMU Data Science Review

Planet identification has typically been a tasked performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data regarding celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …


Supervised Machine Learning Models For Fake News Detection, Gofaas Group, Andrea Lopez, Adelo Vieira, Zafar Ahsan, Farooq Saqib, Shirley Marinho May 2019

Supervised Machine Learning Models For Fake News Detection, Gofaas Group, Andrea Lopez, Adelo Vieira, Zafar Ahsan, Farooq Saqib, Shirley Marinho

ICT

Fake news or the distribution of disinformation has become one of the most challenging issues in society. News and information are churned out across online websites and platforms in real-time, with little or no way for the viewing public to determine what is real or manufactured. But an awareness of what we are consuming online is becoming apparent and efforts are underway to explore how we separate fake content from genuine and truthful information.

The most challenging part of fake news is determining how to spot it. In technology, there are ways to help us do this. Supervised machine learning …


Teaching Computers To Teach Themselves: Synthesizing Training Data Based On Human-Perceived Elements, James Little May 2019

Teaching Computers To Teach Themselves: Synthesizing Training Data Based On Human-Perceived Elements, James Little

Honors Projects

Isolation-Based Scene Generation (IBSG) is a process for creating synthetic datasets made to train machine learning detectors and classifiers. In this project, we formalize the IBSG process and describe the scenarios—object detection and object classification given audio or image input—in which it can be useful. We then look at the Stanford Street View House Number (SVHN) dataset and build several different IBSG training datasets based on existing SVHN data. We try to improve the compositing algorithm used to build the IBSG dataset so that models trained with synthetic data perform as well as models trained with the original SVHN training …


Commonsense Knowledge In Sentiment Analysis Of Ordinance Reactions For Smart Governance, Manish Puri May 2019

Commonsense Knowledge In Sentiment Analysis Of Ordinance Reactions For Smart Governance, Manish Puri

Theses, Dissertations and Culminating Projects

Smart Governance is an emerging research area which has attracted scientific as well as policy interests, and aims to improve collaboration between government and citizens, as well as other stakeholders. Our project aims to enable lawmakers to incorporate data driven decision making in enacting ordinances. Our first objective is to create a mechanism for mapping ordinances (local laws) and tweets to Smart City Characteristics (SCC). The use of SCC has allowed us to create a mapping between a huge number of ordinances and tweets, and the use of Commonsense Knowledge (CSK) has allowed us to utilize human judgment in mapping. …


Deep Embedding Kernel, Linh Le Apr 2019

Deep Embedding Kernel, Linh Le

Doctor of Data Science and Analytics Dissertations

Kernel methods and deep learning are two major branches of machine learning that have achieved numerous successes in both analytics and artificial intelligence. While having their own unique characteristics, both branches work through mapping data to a feature space that is supposedly more favorable towards the given task. This dissertation addresses the strengths and weaknesses of each mapping method through combining them and forming a family of novel deep architectures that center around the Deep Embedding Kernel (DEK). In short, DEK is a realization of a kernel function through a newly deep architecture. The mapping in DEK is both implicit …


Detection And Prevention Of Abuse In Online Social Networks, Sajedul Karim Talukder Mar 2019

Detection And Prevention Of Abuse In Online Social Networks, Sajedul Karim Talukder

FIU Electronic Theses and Dissertations

Adversaries leverage social networks to collect sensitive data about regular users and target them with abuse that includes fake news, cyberbullying, malware distribution, and propaganda. Such behavior is more effective when performed by the social network friends of victims. In two preliminary user studies we found that 71 out of 80 participants have at least 1 Facebook friend with whom (1) they never interact, either in Facebook or in real life, or whom they believe is (2) likely to abuse their posted photos or status updates, or (3) post offensive, false or malicious content. Such friend abuse is often considered …