Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 60 of 74

Full-Text Articles in Physical Sciences and Mathematics

Semantic, Integrated Keyword Search Over Structured And Loosely Structured Databases, Xinge Lu Dec 2020

Dissertations

Keyword search has been seen in recent years as an attractive way of querying data with some form of structure. Indeed, it allows casual users to extract information from databases without mastering a complex structured query language and without knowledge of the data's schema. It also allows for integrated search of heterogeneous data sources. However, as keyword queries are ambiguous and not expressive enough, keyword search does not scale satisfactorily to big datasets, and its answers are, in general, of low accuracy. Therefore, flat keyword search alone cannot efficiently return high-quality results on large structured data. …


Hybrid Deep Neural Networks For Mining Heterogeneous Data, Xiurui Hou Aug 2020

Dissertations

In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.

The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and …


Energy And Performance-Optimized Scheduling Of Tasks In Distributed Cloud And Edge Computing Systems, Haitao Yuan Aug 2020

Dissertations

Infrastructure resources in distributed cloud data centers (CDCs) are shared by heterogeneous applications in a high-performance and cost-effective way. Edge computing has emerged as a new paradigm to provide access to computing capacities in end devices. Yet it suffers from such problems as load imbalance, long scheduling time, and limited power of its edge nodes. Therefore, intelligent task scheduling in CDCs and edge nodes is critically important to construct energy-efficient cloud and edge computing systems. Current approaches cannot smartly minimize the total cost of CDCs, maximize their profit and improve quality of service (QoS) of tasks because of aperiodic arrival …


Changing The Focus: Worker-Centric Optimization In Human-In-The-Loop Computations, Mohammadreza Esfandiari Aug 2020

Dissertations

A myriad of emerging applications, from simple to complex, involve human cognizance in the computation loop. Using the wisdom of human workers, researchers have solved a variety of problems, termed “micro-tasks”, such as captcha recognition, sentiment analysis, image categorization and query processing, as well as “complex tasks” that are often collaborative, such as classifying craters on planetary surfaces, discovering new galaxies (Galaxyzoo) and performing text translation. The current view of “humans-in-the-loop” tends to see humans as machines, robots, or low-level agents used or exploited in the service of broader computation goals. This dissertation is developed to shift the focus back …


Towards Practical Homomorphic Encryption And Efficient Implementation, Gyana R. Sahu Aug 2020

Dissertations

Cloud computing has gained significant traction over the past few years, and its use continues to soar, as is evident from its rapid adoption across various industries. One of the major challenges in cloud computing services is the security of sensitive information, as cloud servers have often been found vulnerable to snooping by malicious adversaries. Such data privacy concerns can be addressed to a large extent by enforcing cryptographic measures. Fully homomorphic encryption (FHE), a special form of public-key encryption, has emerged as a primary tool for deploying such cryptographic security assurances without sacrificing many of the …


Image Instance Segmentation: Using The CIRSY System To Identify Small Objects In Low Resolution Images, Orghomisan William Omatsone Jan 2020

Dissertations

The CIRSY system (Chick Instance Recognition System) is an image processing system, developed as part of this research, that detects chicks in densely populated images using Mask R-CNN, the leading algorithm for instance segmentation tasks. Mask R-CNN extends the Faster R-CNN framework used in object detection tasks by adding a branch that predicts an object's mask alongside the bounding box prediction. Mask R-CNN has proven effective in instance segmentation and object detection tasks after outperforming all existing models on evaluation of the Microsoft Common Objects in Context (MS COCO) …


Brain Disease Detection From EEGs: Comparing Spiking And Recurrent Neural Networks For Non-Stationary Time Series Classification, Hristo Stoev Jan 2020

Dissertations

Modeling non-stationary time series data is a difficult problem area in AI, due to the fact that the statistical properties of the data change as the time series progresses. This complicates the classification of non-stationary time series, which is a method used in the detection of brain diseases from EEGs. Various techniques have been developed in the field of deep learning for tackling this problem, with recurrent neural networks (RNN) approaches utilising Long short-term memory (LSTM) architectures achieving a high degree of success. This study implements a new, spiking neural network-based approach to time series classification for the purpose of …


An Evaluation Of Text Representation Techniques For Fake News Detection Using: TF-IDF, Word Embeddings, Sentence Embeddings With Linear Support Vector Machine, Sangita Sriram Jan 2020

Dissertations

In a world where anybody can share their views and opinions and make them sound like facts about the current state of the world, fake news poses a huge threat, especially to the reputations of prominent people and organizations. In the political world, this could lead to opposition parties using the opportunity to gain popularity in their elections. In the medical world, a fake scandalous message about a medicine causing side effects, a hospital treatment gone wrong or even a false message against a practicing doctor could become a serious menace to everyone involved in …


Drug Reviews: Cross-Condition And Cross-Source Analysis By Review Quantification Using Regional CNN-LSTM Models, Ajith Mathew Thoomkuzhy Jan 2020

Dissertations

Pharmaceutical drugs are usually rated by customers or patients (i.e. on a scale from 1 to 10). Often, they also give reviews or comments on the drug and its side effects. It is desirable to quantify these reviews to help analyze drug favorability in the market in the absence of ratings. Since the reviews are in the form of text, lexical methods should be used for the analysis. The intent of this study was two-fold: first, to understand how much efficiency improves when CNN-LSTM models are used to predict ratings or sentiment from reviews. These models are known …


Classification Of Animal Sound Using Convolutional Neural Network, Neha Singh Jan 2020

Dissertations

Recently, labeling of acoustic events has emerged as an active topic covering a wide range of applications. High-level semantic inference can be conducted based on the main audio effects to facilitate various content-based applications for analysis, efficient recovery and content management. This paper proposes a flexible convolutional neural network-based framework for animal audio classification. The work takes inspiration from various deep neural networks recently developed for multimedia classification. The model is designed to identify the animal sound in an audio file by forcing the network to pay attention to the core audio effects present in the audio to generate a Mel-spectrogram. …


A Comparative Study Of Text Summarization On E-Mail Data Using Unsupervised Learning Approaches, Tijo Thomas Jan 2020

Dissertations

Over the last few years, email has enjoyed enormous popularity. People send and receive many messages every day to connect with colleagues and friends and to share files and information. Unfortunately, email overload has developed into a personal problem for users as well as a financial concern for businesses. Accessing an ever-increasing number of lengthy emails has become a major concern for many users today. Email text summarization is a promising approach to resolving this challenge. Email messages are general-domain text, unstructured and not always well formed syntactically. Such elements introduce challenges for study …


Content-Based Filtering Recommendation Approach To Label Irish Legal Judgements, Sandesh Gangadhar Jan 2020

Dissertations

Machine learning approaches are applied across several domains to either simplify or automate tasks, directly saving time or cost. Text document labelling is one such task: it requires extensive domain knowledge and effort to review, understand and label the documents. The company Stare Decisis summarises legal judgements and labels them as they are made available on the Irish public legal source www.courts.ie. This research presents a recommendation-based approach to reduce the time solicitors at Stare Decisis spend labelling, by narrowing the large number of available labels down to a concentrated few that potentially contain the …


Customer Churn Prediction, Deepshikha Wadikar Jan 2020

Dissertations

Identifying churned customers plays an essential role in the functioning and growth of any business. It can help a business understand the reasons for churn and plan its market strategies accordingly to support growth. This research aims to develop a machine learning model that can precisely predict the churned customers among all customers of a Credit Union financial institution. Quantitative and deductive research strategies are employed to build a supervised machine learning model that addresses the class imbalance problem, handles feature selection and efficiently predicts the …


An Examination Of The SMOTE And Other SMOTE-Based Techniques That Use Synthetic Data To Oversample The Minority Class In The Context Of Credit-Card Fraud Classification, Eduardo Parkinson De Castro Jan 2020

Dissertations

This research project investigates several sampling techniques that generate and use synthetic data to oversample the minority class as a means of handling the imbalanced distribution between the non-fraudulent (majority) and fraudulent (minority) classes in a credit-card fraud dataset. The purpose of the project is to assess the effectiveness of these techniques in the context of fraud detection, where the data are highly imbalanced and misclassification is cost-sensitive. Learning from highly imbalanced datasets is difficult, since many traditional learning algorithms are not designed to cope …
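
The abstract does not describe an implementation, but the core SMOTE step it refers to (creating a synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours) can be sketched as follows. This is a minimal, illustrative Python version using only the standard library; the function name `smote`, the toy `fraud` points and the parameter values are hypothetical and are not taken from the dissertation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE idea (illustrative sketch): interpolate between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded so results are reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical two-feature fraud (minority-class) samples
fraud = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(fraud, n_synthetic=6, k=2)
```

Every synthetic point lies between two real minority samples, which is what distinguishes SMOTE from simply duplicating minority rows. For real experiments, the imbalanced-learn library provides a maintained `SMOTE` class (used via `fit_resample`) along with several SMOTE-based variants.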


Machine Learning Assisted Gait Analysis For The Determination Of Handedness In Able-Bodied People, Hugh Gallagher Jan 2020

Dissertations

This study has investigated the potential application of machine learning for video analysis, with a view to creating a system which can determine a person’s hand laterality (handedness) from the way that they walk (their gait). To this end, the convolutional neural network model VGG16 underwent transfer learning in order to classify videos under two ‘activities’: “walking left-handed” and “walking right-handed”. This saw varying degrees of success across five transfer learning trained models: Everything – the entire dataset; FiftyFifty – the dataset with enough right-handed samples removed to produce a set with parity between activities; Female – only the female …


Identifying Online Sexual Predators Using Support Vector Machine, Yifan Li Jan 2020

Dissertations

A two-stage classification model is built in the research for online sexual predator identification. The first stage identifies the suspicious conversations that have predator participants. The second stage identifies the predators in suspicious conversations. Support vector machines are used with word and character n-grams, combined with behavioural features of the authors to train the final classifier. The unbalanced dataset is downsampled to test the performance of re-balancing an unbalanced dataset. An age group classification model is also constructed to test the feasibility of extracting the age profile of the authors, which can be used as features for classifier training. The …


Transformer Neural Networks For Automated Story Generation, Kemal Araz Jan 2020

Dissertations

Over the last two decades, Artificial Intelligence (AI) has proven its usefulness on tasks such as image recognition, natural language processing and automated driving. As Moore's law describes, computational power increased rapidly over the past few decades (Moore, 1965), making it possible to use techniques that had been computationally expensive. These techniques include Deep Learning (DL), which changed the field of AI and outperformed other models in many fields, some of which are mentioned above. However, in natural language generation, especially for creative tasks that require artificial intelligence models to have not only a precise understanding of the given …


Quantitative Metrics For Mutation Testing, Amani M. Ayad Dec 2019

Dissertations

Program mutation is the process of generating versions of a base program by applying elementary syntactic modifications; this technique has been used in program testing in a variety of applications, most notably to assess the quality of a test data set. A good test set will distinguish the original program from a mutant, unless the mutant is semantically equivalent to the original program despite being syntactically distinct.

Equivalent mutants are a major nuisance in the practice of mutation testing, because they introduce a significant amount of bias and uncertainty in the analysis of test results; indeed, mutants …
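
The killing of mutants, and the way equivalent mutants bias a naive score, can be illustrated with a small sketch. It assumes the common textbook definition of mutation score (killed mutants divided by non-equivalent mutants); it is not one of the quantitative metrics proposed in the dissertation, and the base program, the mutant names and the test inputs are all hypothetical.

```python
def original(x):
    return abs(x)


# Hypothetical mutants, each the result of one elementary syntactic change
mutants = {
    "negate_result": lambda x: -abs(x),
    "drop_abs": lambda x: x,
    # Syntactically distinct from `original` but semantically equivalent to it
    "abs_of_abs": lambda x: abs(abs(x)),
}


def killed(mutant, tests):
    """A mutant is killed if some test input makes it differ from the original."""
    return any(mutant(t) != original(t) for t in tests)


def mutation_score(mutants, tests, equivalent=()):
    """Fraction of mutants (minus those known to be equivalent) killed by the tests."""
    candidates = {name: m for name, m in mutants.items() if name not in equivalent}
    kills = sum(killed(m, tests) for m in candidates.values())
    return kills / len(candidates)


print(mutation_score(mutants, [3, -2]))  # equivalent mutant caps the naive score at 2/3
print(mutation_score(mutants, [3, -2], equivalent=("abs_of_abs",)))  # 1.0
```

No test input can ever kill `abs_of_abs`, so a naive score never reaches 1.0 even for an adequate test set; excluding it requires knowing which mutants are equivalent, which is exactly the uncertainty the abstract describes.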


Using Machine Learning Classification Methods To Detect The Presence Of Heart Disease, Nestor Pereira Dec 2019

Dissertations

Cardiovascular disease (CVD) is the most common cause of death in Ireland and probably worldwide. According to the Health Service Executive (HSE), cardiovascular disease accounts for 36% of all deaths and, notably, for 22% of premature deaths (under age 65).

Using data from the Heart Disease UCI Data Set (UCI Machine Learning Repository), we use machine learning techniques to detect the presence or absence of heart disease in a patient based on the 14 features provided in this dataset. The different results are compared based on accuracy, confusion matrix and area under the Receiver Operating Characteristics (ROC) …
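
As a hedged illustration of how the three evaluation measures named above relate, the sketch below computes them from scratch for a binary target (1 = heart disease present). The labels, the classifier scores and the 0.5 decision threshold are made up for the example and are not results from the dissertation. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, for illustration only
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```

Accuracy and the confusion matrix depend on the chosen threshold, while AUC summarises ranking quality over all thresholds, which is why the three are often reported together. scikit-learn's `sklearn.metrics` module offers `confusion_matrix`, `accuracy_score` and `roc_auc_score` for the same quantities (note that its confusion matrix is laid out as `[[TN, FP], [FN, TP]]`).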


Factor Analysis Of Mixed Data (FAMD) And Multiple Linear Regression In R, Nestor Pereira Dec 2019

Dissertations

Previous projects statistically analysed the factors that impact students' scores in Mathematics and Portuguese for several groups of secondary-school students in Portugal.

This project aims to find a model, hypothetically a multiple linear regression, to predict a student's final score (the dependent variable G3) from features divided into two groups. The first group covers the predictors of the final score that are more related to student performance, such as study time or past failures. The second …


Optimal Sampling Paths For Autonomous Vehicles In Uncertain Ocean Flows, Andrew J. De Stefan Aug 2019

Dissertations

Despite an extensive history of oceanic observation, researchers have only begun to build a complete picture of oceanic currents. Sparsity of instrumentation has created the need to maximize the information extracted from every source of data in building this picture. Within the last few decades, autonomous vehicles, or AVs, have been employed as tools to aid in this research initiative. Unmanned and self-propelled, AVs are capable of spending weeks, if not months, exploring and monitoring the oceans. However, the quality of data acquired by these vehicles is highly dependent on the paths along which they collect their observational data. The …


Forecasting Anomalous Events And Performance Correlation Analysis In Event Data, Sonya Leech [Thesis] Jan 2019

Dissertations

Classical and Deep Learning methods are common approaches to anomaly detection. Extensive research has been conducted on single point anomalies. Collective anomalies that occur over a set of two or more durations are less likely to happen by chance than single point anomalies. Being able to observe and predict these anomalous events may reduce the risk of degraded server performance. This paper presents a comparative analysis of time-series forecasting of collective anomalous events using two procedures: a classical SARIMA model and a deep learning Long Short-Term Memory (LSTM) model. It then …


An Investigation Into The Predictive Capability Of Customer Spending In Modelling Mortgage Default, Donal Finn [Thesis] Jan 2019

Dissertations

The mortgage arrears crisis in Ireland was, and is, among the most severe on record, and although the number of mortgages in default has been decreasing over the past four years, it continues to cause distress to borrowers and vulnerabilities for lenders. There are indications that one of the main factors associated with mortgage default is loan affordability, of which the level of disposable income is a driver. Additionally, guidelines set out by the European Central Bank instructed financial institutions to adopt measures to further reduce and prevent loan defaults, including the implementation and …


An Evaluation Of The Information Security Awareness Of University Students, Alan Pike Jan 2019

Dissertations

Between January 2017 and March 2018, it is estimated that more than 1.9 billion personal and sensitive data records were compromised online. The average cost of a data breach in 2018 was reported to be in the region of US$3.62 million. These figures alone highlight the need for computer users to have a high level of information security awareness (ISA). This research was conducted to establish the ISA of students in a university. There were three aspects to this piece of research. The first was to review and analyse the security habits of students in terms of their own personal …


Noise Reduction In EEG Signals Using Convolutional Autoencoding Techniques, Conor Hanrahan Jan 2019

Dissertations

The presence of noise in electroencephalography (EEG) signals can significantly reduce the accuracy of signal analysis. This study assesses to what extent stacked autoencoders built from one-dimensional convolutional neural network layers can reduce noise in EEG signals. The EEG signals, obtained from 81 people who performed three independent button-pressing tasks, were processed by a two-layer one-dimensional convolutional autoencoder (CAE). The signal-to-noise ratios (SNRs) of the signals before and after processing were calculated, and the distributions of the SNRs were compared. The performance of the model was compared to the noise reduction performance of Principal Component Analysis, with …
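
The abstract compares SNRs before and after denoising but does not state how they were computed. One common definition, assumed here, is SNR in decibels, 10 * log10(P_signal / P_noise), where power is the mean squared amplitude and the noise is the difference between an observed signal and a known clean reference. The sketch below uses a synthetic sine wave as a hypothetical "clean" signal and simply adds weaker Gaussian noise as a stand-in for a denoiser's output; it does not implement the study's CAE or use its data.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise component (assumption)."""
    noise = [o - c for o, c in zip(observed, clean)]
    return 10 * math.log10(power(clean) / power(noise))


rng = random.Random(42)
# Hypothetical clean reference: a 10-cycle sine wave over 256 samples
clean = [math.sin(2 * math.pi * 10 * t / 256) for t in range(256)]
noisy = [c + rng.gauss(0, 0.5) for c in clean]
# Stand-in for a denoiser's output: the same signal with weaker residual noise
denoised = [c + rng.gauss(0, 0.1) for c in clean]

print(f"before: {snr_db(clean, noisy):.1f} dB, after: {snr_db(clean, denoised):.1f} dB")
```

Because the decibel scale is logarithmic, reducing the noise power by a factor of 100 raises the SNR by 20 dB. For real EEG, where no clean reference exists, SNR has to be estimated differently (for example, from task-related versus baseline segments), so this definition is only one possible choice.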


Predicting Violent Crime Reports From Geospatial And Temporal Attributes Of US 911 Emergency Call Data, Vincent Corcoran Jan 2019

Dissertations

The aim of this study is to create a model to predict which 911 calls will result in crime reports of a violent nature. Such a prediction model could be used by the police to prioritise calls which are most likely to lead to violent crime reports. The model will use geospatial and temporal attributes of the call to predict whether a crime report will be generated. To create this model, a dataset of characteristics relating to the neighbourhood where the 911 call originated will be created and combined with characteristics related to the time of the 911 call. Geospatial …


Enhancing Partially Labelled Data: Self Learning And Word Vectors In Natural Language Processing, Eamon Mcentee Jan 2019

Dissertations

There has been an explosion in unstructured text data in recent years, with services like Twitter, Facebook and WhatsApp helping drive this growth. Many of these companies are facing pressure to monitor the content on their platforms, and as such Natural Language Processing (NLP) techniques are more important than ever. Applications of NLP range from spam filtering and sentiment analysis of social media to automatic text summarisation and document classification.


Detection Of Offensive YouTube Comments, A Performance Comparison Of Deep Learning Approaches, Priyam Bansal Jan 2019

Dissertations

Social media data is open, free and available in massive quantities. However, making sense of this data is significantly limited by its high volume, variety, uncertain veracity, velocity, value and variability. This work provides a comprehensive framework of text processing and analysis performed on YouTube comments containing offensive and non-offensive content.

YouTube is a platform where people of every age group log in and find the type of content that most appeals to them. At the same time, a massive increase in the use of offensive language has become apparent. As there is a massive volume of new …


Performance Comparison Of Hybrid CNN-SVM And CNN-XGBoost Models In Concrete Crack Detection, Sahana Thiyagarajan Jan 2019

Dissertations

Crack detection has been an essential step in visual inspection in construction engineering, as concrete is the most commonly used building material and cracks in it are an early sign of degradation. Finding cracks through visual checks is hard for massive structures, so the development of crack detection systems has been a critical issue. The use of contextual image processing in crack detection is constrained, as image data taken under real-world conditions vary widely, and the task also involves complex modelling of cracks and the extraction of handcrafted features. Therefore the …


An Evaluation Of Learning Employing Natural Language Processing And Cognitive Load Assessment, Mrunal Tipari Jan 2019

Dissertations

One of the key goals of Pedagogy is to assess learning. Various paradigms exist; one of these is Cognitivism. It essentially sees a human learner as an information processor and the mind as a black box with limited capacity that should be understood and studied. Within this paradigm, one approach is to employ the construct of cognitive load to assess a learner's experience and, in turn, design instruction better aligned to the human mind. However, cognitive load assessment is not an easy activity, especially in a traditional classroom setting. This research proposes a novel method for evaluating learning …