Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

2019

Classification

Full-Text Articles in Physical Sciences and Mathematics

A Data Science Approach To Defining A Data Scientist, Andy Ho, An Nguyen, Jodi L. Pafford, Robert Slater Dec 2019

SMU Data Science Review

In this paper, we present a common definition and list of skills for a Data Scientist using online job postings. The overlap and ambiguity of various roles such as data scientist, data engineer, data analyst, software engineer, database administrator, and statistician motivate the problem. To arrive at a single Data Scientist definition, we collect over 8,000 job postings from Indeed.com for the six job titles. Each corpus contains text on job qualifications, skills, responsibilities, educational preferences, and requirements. Our data science methodology and analysis rendered the single definition of a data scientist: A data scientist codes, collaborates, and communicates – …


Detecting Myocardial Infarctions Using Machine Learning Methods, Aniruddh Mathur Dec 2019

Master's Projects

Myocardial Infarction (MI), commonly known as a heart attack, occurs when one of the three major blood vessels carrying blood to the heart gets blocked, causing the death of myocardial (heart) cells. If not treated immediately, MI may cause cardiac arrest, which can ultimately cause death. Risk factors for MI include diabetes, family history, and an unhealthy diet and lifestyle. Medical treatments include various types of drugs and surgeries, which can prove very expensive for patients due to high healthcare costs. Therefore, it is imperative that MI is diagnosed at the right time. Electrocardiography (ECG) is commonly used to detect MI. ECG …


Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira Dec 2019

Dissertations

Cardiovascular disease (CVD) is the most common cause of death in Ireland and probably worldwide. According to the Health Service Executive (HSE), cardiovascular disease accounts for 36% of all deaths and, importantly, for 22% of premature deaths (under age 65).

Using data from the Heart Disease UCI Data Set (UCI Machine Learning), we use machine learning techniques to detect the presence or absence of heart disease in the patient according to the 14 features provided in this dataset. The different results are compared based on accuracy performance, confusion matrix and area under the Receiver Operating Characteristics (ROC) …
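The three comparison criteria named in this abstract (accuracy, confusion matrix, and area under the ROC curve) can be illustrated with a minimal pure-Python sketch. This is not the dissertation's code; the function names and the toy labels in the usage note are assumptions made only for illustration.

```python
from typing import Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means disease present."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    tp, _, tn, _ = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical labels `[1, 0, 1, 0]`, hard predictions `[1, 0, 0, 0]`, and scores `[0.9, 0.1, 0.4, 0.5]`, the confusion matrix is one true positive, no false positives, two true negatives, and one false negative.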


Virtual Wrap-Up Presentation: Digital Libraries, Intelligent Data Analytics, And Augmented Description, Elizabeth Lorang, Leen-Kiat Soh, Yi Liu, Chulwoo Pack Nov 2019

CSE Conference and Workshop Papers

Includes framing, overview, and discussion of the explorations pursued as part of the Digital Libraries, Intelligent Data Analytics, and Augmented Description demonstration project, pursued by members of the Aida digital libraries research team at the University of Nebraska-Lincoln through a research services contract with the Library of Congress. This presentation covered: Aida research team and background for the demonstration project; broad outlines of “Digital Libraries, Intelligent Data Analytics, and Augmented Description”; what changed for us as a research team over the collaboration and why; deliverables of our work; thoughts toward “What next”; and deep-dives into the explorations. The machine learning …


Classifying Fiction And Non-Fiction Works Using Machine Learning, Rachna Gupta '21 Oct 2019

Student Publications & Research

The objective of this project was to create a program that can determine whether an unknown text is a work of fiction or non-fiction using machine learning. Various datasets of speeches, ebooks, poems, scientific papers, and texts from Project Gutenberg and the Wolfram Example Data were utilized to train and test a Markov Chain machine learning model. A microsite was deployed with the final product that returns a probability of fictionality based on input from the user with 95% accuracy.
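As a rough illustration of how a Markov chain can score a text as fiction or non-fiction, the sketch below trains one word-bigram chain per class and compares their likelihoods. It is an assumption-laden toy, not the project's model: the class `MarkovTextModel`, the function `fiction_probability`, add-one smoothing, and equal class priors are all choices made here for illustration.

```python
import math
from collections import defaultdict


class MarkovTextModel:
    """First-order (word-bigram) Markov chain with add-one smoothing."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, text: str) -> None:
        words = text.lower().split()
        self.vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def log_likelihood(self, text: str, vocab_size: int) -> float:
        """Sum of smoothed log transition probabilities along the text."""
        words = text.lower().split()
        total = 0.0
        for prev, nxt in zip(words, words[1:]):
            row = self.counts.get(prev, {})
            total += math.log((row.get(nxt, 0) + 1) / (sum(row.values()) + vocab_size))
        return total


def fiction_probability(text: str, fiction: MarkovTextModel, nonfiction: MarkovTextModel) -> float:
    """Posterior probability of 'fiction' under equal priors (an assumption)."""
    vocab_size = len(fiction.vocab | nonfiction.vocab) + 1  # +1 slot for unseen words
    diff = nonfiction.log_likelihood(text, vocab_size) - fiction.log_likelihood(text, vocab_size)
    # Numerically stable logistic of -diff.
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))
```

Each model would be trained on its own corpus with `train`, after which `fiction_probability` returns a value between 0 and 1 for an unseen text.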


Multimodal Emotion Recognition Using 3D Facial Landmarks, Action Units, And Physiological Data, Diego Fabiano Oct 2019

USF Tampa Graduate Theses and Dissertations

To fully understand the complexities of human emotion, the integration of multiple physical features from different modalities can be advantageous. Considering this, this thesis presents an approach to emotion recognition using handcrafted features that consist of 3D facial data, action units, and physiological data. Each modality was analyzed independently, as well as in combination with the others, for recognizing human emotion.

This analysis includes the use of principal component analysis to determine which dimensions of the feature vector are most important for emotion recognition. The proposed features are shown to be able to be used to accurately recognize emotion and that …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually yields a high rate of false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


LearnFCA: A Fuzzy FCA And Probability Based Approach For Learning And Classification, Suraj Ketan Samal Aug 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Formal concept analysis (FCA) is a mathematical theory based on lattice and order theory used for data analysis and knowledge representation. Over the past several years, many of its extensions have been proposed and applied in several domains including data mining, machine learning, knowledge management, semantic web, software development, chemistry, biology, medicine, data analytics, and ontology engineering.

This thesis reviews the state of the art of the theory of Formal Concept Analysis (FCA) and its various extensions that have been developed and well-studied in the past several years. We discuss their historical roots, reproduce the original definitions and derivations with illustrative examples. Further, we provide …


A Machine Learning Approach To Predicting Community Engagement On Social Media During Disasters, Adel Alshehri Jul 2019

USF Tampa Graduate Theses and Dissertations

The use of social media is expanding significantly and can serve a variety of purposes. Over the last few years, users of social media have played an increasing role in the dissemination of emergency and disaster information. It is becoming more common for affected populations and other stakeholders to turn to Twitter to gather information about a crisis when decisions need to be made and action needs to be taken. However, social media platforms, especially Twitter, present some drawbacks when it comes to gathering information during disasters. These drawbacks include information overload, messages written in an informal format, the presence …


Discovery Of Topological Constraints On Spatial Object Classes Using A Refined Topological Model, Ivan Majic, Elham Naghizade, Stephan Winter, Martin Tomko Jun 2019

Journal of Spatial Information Science

In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced if the cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in …
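Frequent itemset mining, the technique this abstract builds on, can be sketched with a small level-wise (Apriori-style) search. The code below is a generic illustration under stated assumptions and is not the paper's method; the function name `frequent_itemsets` and the idea of encoding each surveyed object as a set of observed relation labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def frequent_itemsets(transactions: Iterable[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it)
    is at least min_support, using a level-wise Apriori-style search."""
    data = [frozenset(t) for t in transactions]
    n = len(data)
    result: Dict[FrozenSet[str], float] = {}
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in data for item in t}
    k = 1
    while candidates:
        # Count support for each candidate and keep the frequent ones.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in data if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-item subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result
```

In a setting like the one described, each transaction might hold the relation labels observed for one object of a class, so that itemsets with high support suggest patterns shared across the class.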


An Adaptive Weighted Average (WAV) Reprojection Algorithm For Image Denoising, Halimah Alsurayhi May 2019

Electronic Thesis and Dissertation Repository

Patch-based denoising algorithms have brought effective improvements to the image denoising domain. The Non-Local Means (NLM) algorithm is the most popular patch-based spatial domain denoising algorithm. Many variants of the NLM algorithm have been proposed to improve its performance. The Weighted Average (WAV) reprojection algorithm is one of the most effective improvements of the NLM denoising algorithm. Contrary to the NLM algorithm, all the pixels in the patch contribute to the averaging process in the WAV reprojection algorithm, which enhances the denoising performance. The key parameters in the WAV reprojection algorithm are kept fixed regardless of the image structure. In this thesis, …


Classifying Challenging Behaviors In Autism Spectrum Disorder With Neural Document Embeddings, Abigail Atchison May 2019

Computational and Data Sciences (MS) Theses

The understanding and treatment of challenging behaviors in individuals with Autism Spectrum Disorder is paramount to enabling the success of behavioral therapy; an essential step in this process being the labeling of challenging behaviors demonstrated in therapy sessions. These manifestations differ across individuals and within individuals over time and thus, the appropriate classification of a challenging behavior when considering purely qualitative factors can be unclear. In this thesis we seek to add quantitative depth to this otherwise qualitative task of challenging behavior classification. We do so through the application of natural language processing techniques to behavioral descriptions extracted from the …


Human Activity Recognition Based On Multimodal Body Sensing, Anish Hemant Narkhede May 2019

Master's Projects

In recent years, human activity recognition has been widely popularized by many smartphone manufacturers and fitness tracking companies. It has allowed us to gain a deeper insight into our physical health on a daily basis. However, with the evolution of fitness tracking devices and smartphones, the amount of data that is being captured by these devices is growing exponentially. This paper aims at understanding dimensionality reduction methods such as PCA, so that the data can be used to make meaningful predictions, along with novel techniques using autoencoders with different activation functions. The paper also looks …


Toward On-Demand Profile Hidden Markov Models For Genetic Barcode Identification, Jessica Sheu May 2019

Master's Projects

Genetic identification aims to solve the shortcomings of morphological identification. By using the cytochrome c oxidase subunit 1 (COI) gene as the Eukaryotic “barcode,” scientists hope to research species that may be morphologically ambiguous, elusive, or similarly difficult to visually identify. Current COI databases allow users to search only for existing database records. However, as the number of sequenced, potential COI genes increases, COI identification tools should ideally also be informative of novel, previously unreported sequences that may represent new species. If an unknown COI sequence does not represent a reported organism, an ideal identification tool would report taxonomic ranks …


Species Classification Using DNA Barcoding And Profile Hidden Markov Models, Sphoorti Poojary May 2019

Master's Projects

Traditional classification systems for living organisms like the Linnaean taxonomy involved classification based on morphological features of species. This traditional system is being replaced by molecular approaches which involve using gene sequences. The COI gene, also known as the “DNA barcode” since it is unique in every species, can be used to uniquely identify organisms and thus, classify them. Classifying using gene sequences has many advantages, including correct identification of cryptic species (individuals which appear similar but belong to different species) and species which are extremely small in size. In this project, I worked on classifying COI sequences of unknown species …


Machine Learning Pipeline For Exoplanet Classification, George Clayton Sturrock, Brychan Manry, Sohail Rafiqi May 2019

SMU Data Science Review

Planet identification has typically been a task performed exclusively by teams of astronomers and astrophysicists using methods and tools accessible only to those with years of academic education and training. NASA’s Exoplanet Exploration program has introduced modern satellites capable of capturing a vast array of data regarding celestial objects of interest to assist with researching these objects. The availability of satellite data has opened up the task of planet identification to individuals capable of writing and interpreting machine learning models. In this study, several classification models and datasets are utilized to assign a probability of an observation being an exoplanet. …


Teaching Computers To Teach Themselves: Synthesizing Training Data Based On Human-Perceived Elements, James Little May 2019

Honors Projects

Isolation-Based Scene Generation (IBSG) is a process for creating synthetic datasets made to train machine learning detectors and classifiers. In this project, we formalize the IBSG process and describe the scenarios—object detection and object classification given audio or image input—in which it can be useful. We then look at the Stanford Street View House Number (SVHN) dataset and build several different IBSG training datasets based on existing SVHN data. We try to improve the compositing algorithm used to build the IBSG dataset so that models trained with synthetic data perform as well as models trained with the original SVHN training …


Classification Of Vegetation In Aerial Imagery Via Neural Network, Gevand Balayan May 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

This thesis focuses on the task of trying to find a Neural Network that is best suited for identifying vegetation from aerial imagery. The goal is to find a way to quickly classify items in an image as highly likely to be vegetation (trees, grass, bushes and shrubs) and then interpolate that data and use it to mark sections of an image as vegetation. This has practical applications as well. The main motivation of this work came from the effort that our town takes in conserving water. By creating an AI that can easily recognize plants, we can better monitor the …


Watersheds For Semi-Supervised Classification, Aditya Challa, Sravan Danda, B. S. Daya Sagar, Laurent Najman May 2019

Journal Articles

The watershed technique from mathematical morphology (MM) is one of the most widely used operators for image segmentation. Recently, watersheds have been adapted to edge-weighted graphs, allowing for wider applicability. However, a few questions remain to be answered: How do the boundaries of the watershed operator behave? Which loss function does the watershed operator optimize? How does the watershed operator relate to existing ideas from machine learning? In this letter, a framework is developed, which allows one to answer these questions. This is achieved by generalizing the maximum margin principle to maximum margin partition and proposing a generic solution, morphMedian, resulting …


Generating Classification Rules From Training Samples, Arun D. Kulkarni Mar 2019

Arun Kulkarni

In this paper, we describe an algorithm to extract classification rules from training samples using fuzzy membership functions. The algorithm includes steps for generating classification rules, eliminating duplicate and conflicting rules, and ranking extracted rules. We have developed software to implement the algorithm using MATLAB scripts. As an illustration, we have used the algorithm to classify pixels in two multispectral images representing areas in New Orleans and Alaska. For each scene, we randomly selected 10 per cent of the samples from our training set data for generating an optimized rule set and used the remaining 90 per cent of samples …


Multiple-Attribute Entity Recommendation Based On Classification, Meina Song, Xuejun Zhao, Haihong E Jan 2019

Journal of System Simulation

Abstract: In the process of exploring entity recommendation, entities containing diverse attributes have gained more and more attention. Most current researchers select only one attribute and embody it in the related algorithms and their extensions, even though the entity combines multiple attributes in entity recommendation. In this paper, on the basis of the classification method, we delve into the physical properties of the recommended entities and divide the entity’s attribute information network into multiple sub-networks. In each sub-network, bounded by the number of attributes, the single attribute and even multiple attributes can be diverted into diverse …


Transfer Learning For Detecting Unknown Network Attacks, Juan Zhao, Sachin Shetty, Jan Wei Pan, Charles Kamhoua, Kevin Kwiat Jan 2019

VMASC Publications

Network attacks are serious concerns in today’s increasingly interconnected society. Recent studies have applied conventional machine learning to network attack detection by learning the patterns of the network behaviors and training a classification model. These models usually require large labeled datasets; however, the rapid pace and unpredictability of cyber attacks make this labeling impossible in real time. To address these problems, we proposed utilizing transfer learning for detecting new and unseen attacks by transferring the knowledge of the known attacks. In our previous work, we have proposed a transfer learning-enabled framework and approach, called HeTL, which can find the common …


Computer-Aided Classification Of Impulse Oscillometric Measures Of Respiratory Small Airways Function In Children, Nancy Selene Avila Jan 2019

Open Access Theses & Dissertations

Computer-aided classification of respiratory small airways dysfunction is not an easy task. There is a need to develop more robust classifiers, specifically for children as the classification studies performed to date have the following limitations: 1) they include features derived from tests that are not suitable for children and 2) they cannot distinguish between mild and severe small airway dysfunction.

This dissertation describes classification algorithms with high discriminative capacity to distinguish different levels of respiratory small airways function in children (Asthma, Small Airways Impairment, Possible Small Airways Impairment, and Normal lung function). This ability came from innovative feature selection, …


Classification Of Generic System Dynamics Model Outputs Via Supervised Time Series Pattern Discovery, Mert Edali, Mustafa Gökçe Baydoğan, Gönenç Yücel Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

System dynamics (SD) is a simulation-based approach for analyzing feedback-rich systems. An ideal SD modeling cycle requires evaluating the qualitative pattern characteristics of a large set of time series model output for testing, validation, scenario analysis, and policy analysis purposes. This traditionally requires expert judgement, which limits the extent of experimentation due to time constraints. Although time series recognition approaches can help to automate such an evaluation, utilization of them has been limited to a hidden Markov model classifier, namely the Indirect Structure Testing Software (ISTS) algorithm. Despite being used within several automated model-analysis tools, ISTS has several shortcomings. In …


Polyhedral Conic Kernel-Like Functions For SVMs, Gürkan Öztürk, Emre Çimen Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

In this study, we propose a new approach that can be used as a kernel-like function for support vector machines (SVMs) in order to get nonlinear classification surfaces. We combined polyhedral conic functions (PCFs) with the SVM method. To get nonlinear classification surfaces, kernel functions are used with SVMs. However, the parameter selection of the kernel function affects the classification accuracy. Generally, in order to get successful classifiers which can predict unknown data accurately, best parameters are explored with the grid search method which is computationally expensive. We solved this problem with the proposed method. There is no need to …


A Novel Hybrid Teaching-Learning-Based Optimization Algorithm For The Classification Of Data By Using Extreme Learning Machines, Ender Sevinç, Tansel Dökeroğlu Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Data classification is the process of organizing data by relevant categories. In this way, the data can be understood and used more efficiently by scientists. Numerous studies have been proposed in the literature for the problem of data classification. However, with recently introduced metaheuristics, it has continued to be riveting to revisit this classical problem and investigate the efficiency of new techniques. Teaching-learning-based optimization (TLBO) is a recent metaheuristic that has been reported to be very effective for combinatorial optimization problems. In this study, we propose a novel hybrid TLBO algorithm with extreme learning machines (ELM) for the solution of …


Classification Of The Likelihood Of Colon Cancer With Machine Learning Techniques Using FTIR Signals Obtained From Plasma, Suat Toraman, Mustafa Girgin, Bilal Üstündağ, İbrahim Türkoğlu Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

Colon cancer is one of the major causes of human mortality worldwide and the same can be said for Turkey. Various methods are used for the determination of cancer. One of these methods is Fourier transform infrared (FTIR) spectroscopy, which has the ability to reveal biochemical changes. The most common features used to distinguish patients with cancer and healthy subjects are peak densities, peak height ratios, and peak area ratios. The greatest challenge of studies conducted to distinguish cancer patients from healthy subjects using FTIR signals is that the signals of cancer patients and healthy subjects are similar. In the …


Extracting Accent Information From Urdu Speech For Forensic Speaker Recognition, Falak Tahir, Sajid Saleem, Ayaz Ahmad Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

This paper presents a new method for extraction of accent information from Urdu speech signals. Accent is used in speaker recognition systems, especially in forensic cases, and plays a vital role in discriminating people of different groups, communities and origins due to their different speaking styles. The proposed method is based on Gaussian mixture model-universal background model (GMM-UBM), mel-frequency cepstral coefficients (MFCC), and a data augmentation (DA) process. The DA process appends features to base MFCC features and improves the accent extraction and forensic speaker recognition performances of GMM-UBM. Experiments are performed on an Urdu forensic speaker corpus. The experimental …


Deep Neural Network Learning-Based Classifier Design For Big-Data Analytics, Krishnan Raghavan Jan 2019

Doctoral Dissertations

"In this digital age, big-data sets are commonly found in the field of healthcare, manufacturing and others where sustainable analysis is necessary to create useful information. Big-data sets are often characterized by high-dimensionality and massive sample size. High dimensionality refers to the presence of unwanted dimensions in the data where challenges such as noise, spurious correlation and incidental endogeneity are observed. Massive sample size, on the other hand, introduces the problem of heterogeneity because complex and unstructured data types must be analyzed. To mitigate the impact of these challenges while considering the application of classification, a two-step analysis approach is …