Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2019

Classification

Articles 1 - 30 of 51

Full-Text Articles in Physical Sciences and Mathematics

A Data Science Approach To Defining A Data Scientist, Andy Ho, An Nguyen, Jodi L. Pafford, Robert Slater Dec 2019

SMU Data Science Review

In this paper, we present a common definition and list of skills for a Data Scientist using online job postings. The overlap and ambiguity of various roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician motivate the problem. To arrive at a single Data Scientist definition, we collect over 8,000 job postings from Indeed.com for the six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our data science methodology and analysis rendered the single definition of a data scientist: A data scientist codes, collaborates, and communicates – …


Detecting Myocardial Infarctions Using Machine Learning Methods, Aniruddh Mathur Dec 2019

Master's Projects

Myocardial Infarction (MI), commonly known as a heart attack, occurs when one of the three major blood vessels carrying blood to the heart gets blocked, causing the death of myocardial (heart) cells. If not treated immediately, MI may cause cardiac arrest, which can ultimately cause death. Risk factors for MI include diabetes, family history, unhealthy diet, and lifestyle. Medical treatments include various types of drugs and surgeries, which can prove very expensive for patients due to high healthcare costs. Therefore, it is imperative that MI is diagnosed at the right time. Electrocardiography (ECG) is commonly used to detect MI. ECG …


Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira Dec 2019

Dissertations

Cardiovascular disease (CVD) is the most common cause of death in Ireland, and probably worldwide. According to the Health Service Executive (HSE), cardiovascular disease accounts for 36% of all deaths, and 22% of premature deaths (under age 65) are from CVD.

Using data from the Heart Disease UCI Data Set (UCI Machine Learning Repository), we use machine learning techniques to detect the presence or absence of heart disease in a patient from the 14 features provided in this dataset. The results are compared based on accuracy, the confusion matrix, and the area under the Receiver Operating Characteristic (ROC) …
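
The three comparison criteria named above can be sketched in plain Python for a binary (disease / no disease) task. This is an illustrative sketch, not the dissertation's code: the labels and scores are made up, the 0.5 threshold is an assumption, and AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half), which equals the area under the ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0 (absent) / 1 (present)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via pairwise ranking: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example: four patients, classifier scores, threshold 0.5.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_matrix(y_true, y_pred))  # (2, 0, 1, 1)
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # 0.75
```

Unlike accuracy and the confusion matrix, AUC does not depend on the chosen threshold, which is why the three are usually reported together.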


Characterizing Dryland Ecosystems Using Remote Sensing And Dynamic Global Vegetation Modeling, Abdolhamid Dashtiahangar Dec 2019

Boise State University Theses and Dissertations

Drylands include all terrestrial regions where the production of crops, forage, wood, and other ecosystem services is limited by water. These ecosystems cover approximately 40% of the Earth's terrestrial surface and are home to more than 2 billion people (Millennium Ecosystem Assessment, 2005). Moreover, the interannual variability of the global carbon budget is strongly regulated by vegetation dynamics in drylands. Understanding the dynamics of such ecosystems is important for assessing the potential for and impacts of natural or anthropogenic disturbances, for mitigation planning, and as a necessary step toward enhancing the economic and social well-being of dryland communities in a sustainable manner (Global …


Convergence Rates For Empirical Estimation Of Binary Classification Bounds, Salimeh Yasaei Sekeh, Morteza Noshad, Kevin R. Moon, Alfred O. Hero Nov 2019

Mathematics and Statistics Faculty Publications

Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze–Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman–Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is …
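
The Friedman–Rafsky (FR) estimator mentioned above can be sketched as follows: pool the two samples, build a Euclidean minimum spanning tree (MST), and count the edges R that join a point of one sample to a point of the other. The normalization used here, D̂ = 1 − R·(m + n)/(2mn), is the form commonly paired with this estimator in the HP-divergence literature; treat it as an assumption rather than the paper's exact definition. The O(N²) Prim's algorithm and the toy points are illustrative choices.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges as (i, j) pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # add the closest vertex that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def fr_hp_divergence(x, y):
    """FR estimate of the HP divergence from samples x (size m) and y (size n)."""
    m, n = len(x), len(y)
    pooled = list(x) + list(y)
    label = [0] * m + [1] * n
    # R = number of MST edges that connect a point of x to a point of y
    r = sum(1 for i, j in mst_edges(pooled) if label[i] != label[j])
    return 1.0 - r * (m + n) / (2.0 * m * n)


# Two well-separated hypothetical clusters: the MST crosses between them once (R = 1).
x = [(0, 0), (0, 1), (1, 0)]
y = [(10, 10), (10, 11), (11, 10)]
print(fr_hp_divergence(x, y))  # 1 - 1*6/18, i.e. about 0.667
```

Few cross edges (separated classes) push the estimate toward 1, while well-mixed samples produce many cross edges and an estimate near 0; the paper's contribution concerns how fast such an estimate converges as m and n grow.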


Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Nov 2019

CSE Conference and Workshop Papers

Includes framing, overview, and discussion of the explorations pursued as part of the Digital Libraries, Intelligent Data Analytics, and Augmented Description demonstration project, pursued by members of the Aida digital libraries research team at the University of Nebraska-Lincoln through a research services contract with the Library of Congress. This presentation covered: Aida research team and background for the demonstration project; broad outlines of “Digital Libraries, Intelligent Data Analytics, and Augmented Description”; what changed for us as a research team over the collaboration and why; deliverables of our work; thoughts toward “What next”; and deep-dives into the explorations. The machine learning …


Fractional Random Weighted Bootstrapping For Classification On Imbalanced Data With Ensemble Decision Tree Methods, Sean Charles Carter Nov 2019

USF Tampa Graduate Theses and Dissertations

Ensemble methods are commonly used for building predictive models for classification. Models that are unstable under perturbations of the training set, such as decision trees, often see considerable reductions in error when many of them are trained on bootstrapped resamples of the training data and grouped into an ensemble. The non-parametric bootstrap, however, has limited efficacy when used on severely imbalanced data, especially when the number of observations of one or more classes is exceptionally small. We explore the fractional random weighted bootstrap, which randomly assigns fractional weights to observations, as an alternative resampling procedure in training machine learning ensembles, particularly decision tree …
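
A minimal sketch of the contrast described above, assuming the common formulation in which fractional random weights are n times a Dirichlet(1, …, 1) draw (generated here by normalizing i.i.d. Exp(1) variables). The classical bootstrap gives each observation an integer count, and any given observation is left out with probability (1 − 1/n)ⁿ ≈ 0.37; fractional weights are positive (almost surely), so every minority-class observation still contributes to every tree. The function names are hypothetical, not from the thesis.

```python
import random


def fractional_weights(n, rng=random):
    """n fractional weights summing to n, i.e. n * Dirichlet(1, ..., 1)."""
    # Normalized i.i.d. Exp(1) draws are Dirichlet(1, ..., 1) distributed.
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [n * d / total for d in draws]


def nonparametric_weights(n, rng=random):
    """Classical bootstrap: each observation's weight is its count in a resample of size n."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


rng = random.Random(0)
n = 10  # e.g. 8 majority-class and 2 minority-class observations
print(nonparametric_weights(n, rng))  # integer counts; some may be 0
print(fractional_weights(n, rng))     # all positive, summing to n
```

Each tree in the ensemble would then be fit with one such weight vector, for example through the `sample_weight` argument that scikit-learn's `DecisionTreeClassifier.fit` accepts, instead of on a resampled copy of the data.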


Classifying Fiction And Non-Fiction Works Using Machine Learning, Rachna Gupta '21 Oct 2019

Student Publications & Research

The objective of this project was to create a program that uses machine learning to determine whether an unknown text is a work of fiction or non-fiction. Various datasets of speeches, ebooks, poems, scientific papers, and texts from Project Gutenberg and the Wolfram Example Data were used to train and test a Markov chain model. The final product was deployed as a microsite that returns a probability of fictionality for text entered by the user, with 95% accuracy.
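
One way a Markov chain can yield a "probability of fictionality" is to train one chain per class and compare the likelihoods each assigns to a new text. The sketch below makes several assumptions not stated in the abstract: a first-order, character-level chain, add-one smoothing, and equal class priors. The class and function names are hypothetical, and the toy training strings stand in for the real corpora.

```python
import math
from collections import Counter, defaultdict


class CharMarkovModel:
    """First-order character Markov chain with add-one (Laplace) smoothing."""

    def __init__(self, text):
        self.counts = defaultdict(Counter)  # counts[a][b] = times b followed a
        self.alphabet = set(text)
        for a, b in zip(text, text[1:]):
            self.counts[a][b] += 1

    def log_prob(self, text, vocab_size):
        """Log-likelihood of the transitions in `text` under this chain."""
        lp = 0.0
        for a, b in zip(text, text[1:]):
            row = self.counts[a]
            lp += math.log((row[b] + 1) / (sum(row.values()) + vocab_size))
        return lp


def prob_fiction(text, fiction_model, nonfiction_model):
    """P(fiction | text) from the two log-likelihoods, assuming equal class priors."""
    vocab = len(fiction_model.alphabet | nonfiction_model.alphabet | set(text))
    diff = nonfiction_model.log_prob(text, vocab) - fiction_model.log_prob(text, vocab)
    # logistic of the log-likelihood ratio, arranged so math.exp cannot overflow
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


fiction = CharMarkovModel("once upon a time the dragon spoke to the princess " * 5)
nonfiction = CharMarkovModel("the results indicate a significant correlation " * 5)
print(prob_fiction("the dragon and the princess", fiction, nonfiction))
```

Working in log space keeps long documents from underflowing to zero probability, which a product of raw transition probabilities would quickly do.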


Multimodal Emotion Recognition Using 3d Facial Landmarks, Action Units, And Physiological Data, Diego Fabiano Oct 2019

USF Tampa Graduate Theses and Dissertations

To fully understand the complexities of human emotion, integrating multiple physical features from different modalities can be advantageous. With this in mind, this thesis presents an approach to emotion recognition using handcrafted features that consist of 3D facial data, action units, and physiological data. Each modality was analyzed independently for recognizing human emotion, as was the combination of all modalities.

This analysis includes the use of principal component analysis to determine which dimensions of the feature vector are most important for emotion recognition. The results show that the proposed features can be used to accurately recognize emotion and that …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, visual interpretation of ultrasound images is time-consuming and usually produces a high rate of false alarms due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, these have not solved the problem, due to the complex nature of the images and the exponential growth of biomedical datasets. Second, the target class in real-world biomedical datasets, which is the focus of interest of a biopsy, is usually …


Regional Scale Dryland Vegetation Classification With An Integrated Lidar-Hyperspectral Approach, Hamid Dashti, Andrew Poley, Nancy Glenn, Nayani Ilangakoon, Lucas Spaete, Dar Roberts, et al. Sep 2019

Michigan Tech Publications

The sparse canopy cover and large contribution of bright background soil, along with the heterogeneous vegetation types in close proximity, are common challenges for mapping dryland vegetation with remote sensing. Consequently, the results of a single classification algorithm or one type of sensor to characterize dryland vegetation typically show low accuracy and lack robustness. In our study, we improved classification accuracy in a semi-arid ecosystem based on the use of vegetation optical (hyperspectral) and structural (lidar) information combined with the environmental characteristics of the landscape. To accomplish this goal, we used both spectral angle mapper (SAM) and multiple endmember spectral …
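
The spectral angle mapper (SAM) named above assigns a pixel to the reference spectrum that forms the smallest angle with it, θ = arccos(t·r / (‖t‖‖r‖)). Because the angle ignores vector length, SAM is insensitive to a uniform scaling of a spectrum (for example, an overall brightness difference). The sketch below is illustrative only: the four-band "shrub" and "soil" library spectra are made up, and the optional `max_angle` rejection threshold is an assumption.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # clamp to [-1, 1] so floating-point round-off cannot break math.acos
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_theta)


def sam_classify(pixel, library, max_angle=None):
    """Return the library class with the smallest angle, or None if it exceeds max_angle."""
    best_class, best_angle = None, math.inf
    for name, ref in library.items():
        angle = spectral_angle(pixel, ref)
        if angle < best_angle:
            best_class, best_angle = name, angle
    if max_angle is not None and best_angle > max_angle:
        return None
    return best_class


# Hypothetical reflectance spectra (4 bands).
library = {"shrub": [0.1, 0.3, 0.5, 0.6], "soil": [0.3, 0.35, 0.4, 0.45]}
pixel = [0.05, 0.15, 0.25, 0.3]  # a darker copy of the shrub spectrum
print(sam_classify(pixel, library))  # shrub
```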


Regional Scale Dryland Vegetation Classification With An Integrated Lidar-Hyperspectral Approach, Hamid Dashti, Nancy F. Glenn, Nayani Ilangakoon, Josh Enterkine, Alejandro N. Flores Sep 2019

Geosciences Faculty Publications and Presentations

The sparse canopy cover and large contribution of bright background soil, along with the heterogeneous vegetation types in close proximity, are common challenges for mapping dryland vegetation with remote sensing. Consequently, the results of a single classification algorithm or one type of sensor to characterize dryland vegetation typically show low accuracy and lack robustness. In our study, we improved classification accuracy in a semi-arid ecosystem based on the use of vegetation optical (hyperspectral) and structural (lidar) information combined with the environmental characteristics of the landscape. To accomplish this goal, we used both spectral angle mapper (SAM) and multiple endmember spectral …


The Application Of Synthetic Signals For Ecg Beat Classification, Elliot Morgan Brown Sep 2019

Theses and Dissertations

A brief overview of electrocardiogram (ECG) properties and the characteristics of various cardiac conditions is given. Two different models are used to generate synthetic ECG signals, and domain knowledge is used with these models to create synthetic examples of 16 different heartbeat types. Other techniques for synthesizing ECG signals are also explored. Various machine learning models are trained on different combinations of real and synthetic data to classify individual heartbeats. The performance of the different methods and models is compared, and synthetic data is shown to be useful in beat classification.


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure also places a burden on customers and can pose serious risks to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Learnfca: A Fuzzy Fca And Probability Based Approach For Learning And Classification, Suraj Ketan Samal Aug 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Formal concept analysis (FCA) is a mathematical theory based on lattice and order theory, used for data analysis and knowledge representation. Over the past several years, many of its extensions have been proposed and applied in domains including data mining, machine learning, knowledge management, the semantic web, software development, chemistry, biology, medicine, data analytics, and ontology engineering.

This thesis reviews the state of the art in the theory of Formal Concept Analysis (FCA) and its various extensions that have been developed and well studied in the past several years. We discuss their historical roots and reproduce the original definitions and derivations with illustrative examples. Further, we provide …
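
The core FCA definitions can be made concrete with a small sketch. For a formal context (objects, attributes, and an incidence relation), the derivation operators map a set of objects A to A′, the attributes they all share, and a set of attributes B to B′, the objects having all of them. A formal concept is a pair (A, B) with A′ = B and B′ = A. The brute-force enumeration below (every (A″, A′) is a concept, and every concept arises this way) is exponential and only suitable for tiny contexts; it does not cover the fuzzy or probabilistic extensions of LearnFCA, and the animal context is hypothetical.

```python
from itertools import combinations


def intent(objects, context, attributes):
    """A' : the attributes shared by every object in `objects`."""
    result = set(attributes)
    for o in objects:
        result &= context[o]
    return result


def extent(attrs, context):
    """B' : the objects that have every attribute in `attrs`."""
    return {o for o, has in context.items() if attrs <= has}


def formal_concepts(context):
    """All formal concepts (A, B) of a small context, by closing every object subset."""
    objects = list(context)
    attributes = set().union(*context.values())
    found = set()
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            b = intent(subset, context, attributes)
            a = extent(b, context)  # a = subset'' (the closure of subset)
            found.add((frozenset(a), frozenset(b)))
    return found


context = {
    "frog": {"aquatic", "land"},
    "fish": {"aquatic"},
    "dog": {"land", "mammal"},
}
for a, b in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice that FCA and its extensions build on.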


Classification With Measurement Error In Covariates Or Response, With Application To Prostate Cancer Imaging Study, Kexin Luo Aug 2019

Electronic Thesis and Dissertation Repository

The research is motivated by the prostate cancer imaging study conducted at the University of Western Ontario to classify cancer status using multiple in-vivo images. The prostate cancer histological image and the in-vivo images are subject to misalignment in the co-registration procedure, which can be viewed as measurement error in covariates or response. We investigate methods to correct this problem.

The first proposed method corrects the predicted class probability when the data has misclassified labels. The correction equation is derived from the relationship between the true response and the error-prone response. The probability for the observed class label is adjusted …
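
A minimal sketch of the kind of correction described above, using one standard form of the relationship between the true label Y and the error-prone label Y*. It assumes binary labels and misclassification rates that do not depend on the covariates x: π10 = P(Y* = 1 | Y = 0) and π01 = P(Y* = 0 | Y = 1). Then P(Y* = 1 | x) = (1 − π01)·P(Y = 1 | x) + π10·(1 − P(Y = 1 | x)), which can be solved for P(Y = 1 | x) when π01 + π10 < 1. The thesis's own derivation may differ; the function name and rates are hypothetical.

```python
def corrected_probability(p_observed, pi10, pi01):
    """Recover P(Y=1 | x) from P(Y*=1 | x), given label-flip rates pi10 and pi01."""
    denom = 1.0 - pi01 - pi10
    if denom <= 0:
        raise ValueError("need pi01 + pi10 < 1 for the correction to be identifiable")
    p_true = (p_observed - pi10) / denom
    # an estimated p_observed can put p_true outside [0, 1]; clip it back
    return min(1.0, max(0.0, p_true))


# Hypothetical rates: 10% of true negatives and 20% of true positives are mislabeled.
# If P(Y=1 | x) = 0.7, the observed-label probability is 0.8*0.7 + 0.1*0.3 = 0.59.
print(corrected_probability(0.59, pi10=0.1, pi01=0.2))  # approximately 0.7
```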


Classification Of Isometry Algebras Of Solutions Of Einstein's Field Equations, Eugene Hwang Aug 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Since Schwarzschild found the first solution of Einstein’s equations, more than 800 solutions have been found. Solutions of Einstein’s equations are classified according to their Lie algebras of isometries and their isotropy subalgebras. Solutions were taken from the USU electronic library of solutions of Einstein’s field equations, and the classification used Maple code developed at USU. This classification adds to the data contained in the library of solutions and provides additional tools for addressing the equivalence problem for solutions to the Einstein field equations. In this thesis, homogeneous spacetimes, hypersurface-homogeneous spacetimes, Robinson-Trautman solutions, and some famous black hole solutions have …


A Machine Learning Approach To Predicting Community Engagement On Social Media During Disasters, Adel Alshehri Jul 2019

USF Tampa Graduate Theses and Dissertations

The use of social media is expanding significantly and can serve a variety of purposes. Over the last few years, users of social media have played an increasing role in the dissemination of emergency and disaster information. It is becoming more common for affected populations and other stakeholders to turn to Twitter to gather information about a crisis when decisions need to be made and action taken. However, social media platforms, especially Twitter, present some drawbacks when it comes to gathering information during disasters. These drawbacks include information overload, messages written in an informal format, the presence …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …
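
Frequent itemset mining, the technique named above, finds sets of items that co-occur in at least a given fraction (the support) of transactions. The sketch below is a plain level-wise, Apriori-style search. How the paper encodes topological relations as items is not stated in this abstract; here each transaction is assumed to be the set of relations one surveyed object has with other classes, written as hypothetical `relation:class` strings.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support} for support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    level = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while level:
        frequent = {s: support(s) for s in level}
        frequent = {s: sup for s, sup in frequent.items() if sup >= min_support}
        result.update(frequent)
        k += 1
        # join frequent (k-1)-itemsets into k-item candidates; prune any candidate
        # that has an infrequent (k-1)-subset, since it cannot be frequent itself
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = candidates
    return result


# Hypothetical surveyed "bridge" objects and their topological relations.
bridges = [
    {"crosses:river", "meets:road"},
    {"crosses:river", "meets:road"},
    {"crosses:river", "meets:road", "within:park"},
    {"crosses:rail", "meets:road"},
]
for itemset, sup in frequent_itemsets(bridges, min_support=0.5).items():
    print(sorted(itemset), sup)
```

A pattern that holds for most instances of a class, such as the pair above, is the kind of candidate topological constraint the abstract describes.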


Coral Reef Change Detection In Remote Pacific Islands Using Support Vector Machine Classifiers, Justin J. Gapper, Hesham El-Askary, Erik Linstead, Thomas Piechota Jun 2019

Mathematics, Physics, and Computer Science Faculty Articles and Research

Despite the abundance of research on coral reef change detection, few studies have been conducted to assess the spatial generalization principles of a live coral cover classifier trained using remote sensing data from multiple locations. The aim of this study is to develop a machine learning classifier for the coral-dominated benthic cover-type class (CDBCTC) based on ground truth observations and Landsat images, evaluate the performance of this classifier when tested against new data, and then deploy the classifier to perform CDBCTC change analysis of multiple locations. The proposed framework includes image calibration, support vector machine (SVM) training and tuning, statistical assessment …


An Adaptive Weighted Average (Wav) Reprojection Algorithm For Image Denoising, Halimah Alsurayhi May 2019

Electronic Thesis and Dissertation Repository

Patch-based denoising algorithms have substantially improved performance in the image denoising domain. The Non-Local Means (NLM) algorithm is the most popular patch-based spatial-domain denoising algorithm, and many variants of it have been proposed to improve its performance. The Weighted Average (WAV) reprojection algorithm is one of the most effective improvements of the NLM denoising algorithm. Unlike the NLM algorithm, the WAV reprojection algorithm lets all the pixels in the patch contribute to the averaging process, which enhances the denoising performance. The key parameters in the WAV reprojection algorithm are kept fixed regardless of the image structure. In this thesis, …
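
For reference, the baseline NLM idea can be sketched on a 1-D signal: each sample is replaced by a weighted mean of the samples in a search window, with weights exp(−d²/h²) that decay with the mean squared difference d² between the patches centred on the two samples. This shows only classic pixelwise NLM, in which just the centre of each patch is averaged; it is not the WAV reprojection step or the adaptive method of the thesis. The 1-D setting, edge padding, and parameter defaults are assumptions for illustration.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=5, h=0.5):
    """Classic Non-Local Means on a 1-D signal (centre-pixel estimate only)."""
    n = len(signal)

    def patch(i):
        # edge padding: indices outside the signal are clamped to the border
        return [signal[min(max(j, 0), n - 1)]
                for j in range(i - patch_radius, i + patch_radius + 1)]

    out = []
    for i in range(n):
        p_i = patch(i)
        num = den = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            p_j = patch(j)
            d2 = sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
            w = math.exp(-d2 / (h * h))  # similar patches get weights near 1
            num += w * signal[j]
            den += w  # den >= 1 because j == i contributes weight 1
        out.append(num / den)
    return out


noisy_step = [0.1, -0.05, 0.02, 0.08, -0.1, 1.05, 0.9, 1.1, 0.95, 1.0]
print([round(v, 3) for v in nlm_1d(noisy_step, h=0.3)])
```

Because the weights depend on patch similarity rather than on distance alone, samples on either side of the step contribute little to each other, so the edge is largely preserved while each flat segment is smoothed.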


Classifying Challenging Behaviors In Autism Spectrum Disorder With Neural Document Embeddings, Abigail Atchison May 2019

Computational and Data Sciences (MS) Theses

The understanding and treatment of challenging behaviors in individuals with Autism Spectrum Disorder is paramount to the success of behavioral therapy, and an essential step in this process is labeling the challenging behaviors demonstrated in therapy sessions. These manifestations differ across individuals and within individuals over time; thus, the appropriate classification of a challenging behavior can be unclear when only qualitative factors are considered. In this thesis we seek to add quantitative depth to this otherwise qualitative task of challenging behavior classification. We do so through the application of natural language processing techniques to behavioral descriptions extracted from the …


Human Activity Recognition Based On Multimodal Body Sensing, Anish Hemant Narkhede May 2019

Master's Projects

In recent years, human activity recognition has been widely popularized by many smartphone manufacturers and fitness tracking companies. It has allowed us to gain deeper insight into our physical health on a daily basis. However, with the evolution of fitness tracking devices and smartphones, the amount of data captured by these devices is growing exponentially. This paper aims to understand dimensionality reduction techniques such as PCA, along with novel techniques using autoencoders with different activation functions, so that the data can be used to make meaningful predictions. The paper also looks …
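
PCA, the dimensionality reduction technique named above, can be sketched without external libraries for the first principal component: centre the data, form the sample covariance matrix, find its leading eigenvector by power iteration, and project each row onto it. This is an illustrative sketch, not the project's pipeline (which also uses autoencoders). It assumes the covariance matrix is nonzero and that the all-ones starting vector is not orthogonal to the leading eigenvector. The toy sensor rows are made up.

```python
import math


def pca_first_component(rows, iters=200):
    """Return (mean, unit leading eigenvector, explained-variance ratio)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        # power iteration: multiply by the covariance matrix and renormalize
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue; the trace is the total variance
    eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total = sum(cov[k][k] for k in range(d))
    return mean, v, eigval / total


def project(rows, mean, component):
    """1-D scores: each centred row projected onto the component."""
    return [sum((r[k] - mean[k]) * component[k] for k in range(len(mean))) for r in rows]


# Hypothetical 2-D readings lying on a line, so one component explains all variance.
rows = [[0, 0], [1, 2], [2, 4], [3, 6]]
mean, pc1, ratio = pca_first_component(rows)
print(pc1, ratio)  # about [0.447, 0.894] and 1.0
print(project(rows, mean, pc1))
```

Keeping only the leading components whose explained-variance ratios add up to most of the total is the usual way PCA shrinks high-dimensional sensor features before classification.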


Toward On-Demand Profile Hidden Markov Models For Genetic Barcode Identification, Jessica Sheu May 2019

Master's Projects

Genetic identification aims to overcome the shortcomings of morphological identification. By using the cytochrome c oxidase subunit 1 (COI) gene as the eukaryotic “barcode,” scientists hope to research species that may be morphologically ambiguous, elusive, or otherwise difficult to identify visually. Current COI databases allow users to search only for existing database records. However, as the number of sequenced, potential COI genes increases, COI identification tools should ideally also be informative about novel, previously unreported sequences that may represent new species. If an unknown COI sequence does not represent a reported organism, an ideal identification tool would report taxonomic ranks …


Species Classification Using Dna Barcoding And Profile Hidden Markov Models, Sphoorti Poojary May 2019

Master's Projects

Traditional classification systems for living organisms, such as the Linnaean taxonomy, classified species based on their morphological features. This traditional system is being replaced by molecular approaches that use gene sequences. The COI gene, also known as the “DNA barcode” because it is unique to each species, can be used to uniquely identify organisms and thus classify them. Classifying using gene sequences has many advantages, including correct identification of cryptic species (individuals that appear similar but belong to different species) and of species that are extremely small in size. In this project, I worked on classifying COI sequences of unknown species …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

SMU Data Science Review

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data on celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are used to assign a probability of an observation being an exoplanet. …


Teaching Computers To Teach Themselves: Synthesizing Training Data Based On Human-Perceived Elements, James Little May 2019

Honors Projects

Isolation-Based Scene Generation (IBSG) is a process for creating synthetic datasets made to train machine learning detectors and classifiers. In this project, we formalize the IBSG process and describe the scenarios—object detection and object classification given audio or image input—in which it can be useful. We then look at the Stanford Street View House Number (SVHN) dataset and build several different IBSG training datasets based on existing SVHN data. We try to improve the compositing algorithm used to build the IBSG dataset so that models trained with synthetic data perform as well as models trained with the original SVHN training …


Classification Of Vegetation In Aerial Imagery Via Neural Network, Gevand Balayan May 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

This thesis focuses on finding the neural network best suited for identifying vegetation in aerial imagery. The goal is to find a way to quickly classify items in an image as highly likely to be vegetation (trees, grass, bushes, and shrubs), then interpolate that data and use it to mark sections of an image as vegetation. This has practical applications as well. The main motivation for this work came from our town's efforts to conserve water. By creating an AI that can easily recognize plants, we can better monitor the …


Watersheds For Semi-Supervised Classification, Aditya Challa, Sravan Danda, B. S. Daya Sagar, Laurent Najman May 2019

Journal Articles

The watershed technique from mathematical morphology (MM) is one of the most widely used operators for image segmentation. Recently, watersheds have been adapted to edge-weighted graphs, allowing for wider applicability. However, a few questions remain to be answered: How do the boundaries of the watershed operator behave? Which loss function does the watershed operator optimize? How does the watershed operator relate to existing ideas from machine learning? In this letter, a framework is developed that allows one to answer these questions. This is achieved by generalizing the maximum margin principle to maximum margin partition and proposing a generic solution, morphMedian, resulting …


A Comparison Of Machine Learning Techniques For Taxonomic Classification Of Teeth From The Family Bovidae, Gregory J. Matthews, Juliet K. Brophy, Maxwell Luetkemeier, Hongie Gu, George K. Thiruvathukal Apr 2019

George K. Thiruvathukal

This study explores the performance of machine learning algorithms on the classification of fossil teeth in the Family Bovidae. Isolated bovid teeth are typically the most common fossils found in southern Africa, and they often constitute the basis for paleoenvironmental reconstructions. Taxonomic identification of fossil bovid teeth, however, is often imprecise and subjective. Using modern teeth from known taxa, machine learning algorithms can be trained to classify fossils. Previous work by Brophy et al. [Quantitative morphological analysis of bovid teeth and implications for paleoenvironmental reconstruction of plovers lake, Gauteng Province, South Africa, J. Archaeol. Sci. 41 (2014), pp. …