Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Data Heterogeneity And Its Implications For Fairness, Ghazaleh Noroozi Aug 2023

Electronic Thesis and Dissertation Repository

Data heterogeneity, referring to the differences in underlying generative processes that produce the data, presents challenges in analyzing and utilizing datasets for decision-making tasks. This thesis examines the impact of data heterogeneity on biases and fairness in predictive models. The research investigates the correlation between heterogeneity and protected attributes, such as race and gender, and explores the implications of such heterogeneity on biases that may arise in downstream applications.

The contributions of this thesis are fourfold. Firstly, a comprehensive definition of data heterogeneity based on differences in underlying generative processes is provided, establishing a conceptual framework for understanding and quantifying …


On Computing Optimal Repairs For Conditional Independence, Alireza Pirhadi Aug 2023

Electronic Thesis and Dissertation Repository

This thesis focuses on the concept of Conditional Independence (CI) and its testing, which holds immense significance across various fields, including economics, social sciences, and biomedical research. Notably, within computer science, CI has become an integral part of building probabilistic and causal models. It aids efficient inference and plays a key role in uncovering causal relationships.

The primary aim of this thesis is to broaden the scope of CI beyond its testing aspect. We introduce the pioneering problem of data repair, designed to adhere to particular CI constraints. The value and pertinence of this problem are highlighted through two contrasting …


Improving Deep Entity Resolution By Constraints, Soudeh Nilforoushan Aug 2022

Electronic Thesis and Dissertation Repository

Entity resolution (ER) is the problem of finding duplicate data in a dataset and resolving possible differences and inconsistencies. ER is a long-standing data management and information retrieval problem and a core data integration and cleaning task. There are diverse solutions for ER that apply rule-based techniques, pairwise binary classification, clustering, and probabilistic inference, among other techniques. Deep learning (DL) has been extensively used for ER and has shown competitive performance compared to conventional ER solutions. The state-of-the-art (SOTA) ER solutions using DL are based on pairwise comparison and binary classification. They transform pairs of records into a latent space that can …


Reputation-Based Trust Assessment Of Transacting Service Components, Konstantinos Tsiounis Jul 2022

Electronic Thesis and Dissertation Repository

As Service-Oriented Systems rely for their operation on many different, and most often, distributed software components, a key issue that emerges is how one component can trust the services offered by another component. Here, the concept of trust is considered in the context of reputation systems and is viewed as a meta-requirement, that is, the level of belief a service requestor has that a service provider will provide the service in a way that meets the requestor’s expectations. We refer to the service offering components as service providers (SPs) and the service requesting components as service clients (SCs).

In this …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021

Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …


Regional Integration: Physician Perceptions On Electronic Medical Record Use And Impact In South West Ontario, Sadiq Raji Dec 2020

Electronic Thesis and Dissertation Repository

Regional initiatives in the health care context in Canada are typically organized and administered along geographic boundaries or operational units. Regional integration of Electronic Medical Records (EMR) has been continuing across Canadian provinces in recent years, yet the use and impact of regionally integrated EMRs are not routinely assessed and questions remain about their impact on and use in physicians’ practices. Are stated goals of simplifying connections and sharing of electronic health information collected and managed by many health services providers being met? What are physicians’ perspectives on the use and impact of regionally integrated EMR? In this thesis, I …


Making Sense Of Online Public Health Debates With Visual Analytics Systems, Anton Ninkov Nov 2020

Electronic Thesis and Dissertation Repository

Online debates occur frequently and on a wide variety of topics. Particularly, online debates about various public health topics (e.g., vaccines, statins, cannabis, dieting plans) are prevalent in today’s society. These debates are important because of the real-world implications they can have on public health. Therefore, it is important for public health stakeholders (i.e., those with a vested interest in public health) and the general public to have the ability to make sense of these debates quickly and effectively. This dissertation investigates ways of enabling sense-making of these debates with the use of visual analytics systems (VASes). VASes are computational …


Visual Analytics Of Electronic Health Records With A Focus On Acute Kidney Injury, Sheikh S. Abdullah Jul 2020

Electronic Thesis and Dissertation Repository

The increasing use of electronic platforms in healthcare has resulted in the generation of unprecedented amounts of data in recent years. The amount of data available to clinical researchers, physicians, and healthcare administrators continues to grow, which creates an untapped resource with the ability to improve the healthcare system drastically. Despite the enthusiasm for adopting electronic health records (EHRs), some recent studies have shown that EHR-based systems hardly improve the ability of healthcare providers to make better decisions. One reason for this inefficacy is that these systems do not allow for human-data interaction in a manner that fits and supports …


Hierarchical Group And Attribute-Based Access Control: Incorporating Hierarchical Groups And Delegation Into Attribute-Based Access Control, Daniel Servos Mar 2020

Electronic Thesis and Dissertation Repository

Attribute-Based Access Control (ABAC) is a promising alternative to traditional models of access control (i.e. Discretionary Access Control (DAC), Mandatory Access Control (MAC) and Role-Based Access Control (RBAC)) that has drawn attention in both recent academic literature and industry application. However, formalization of a foundational model of ABAC and large-scale adoption are still in their infancy. The relatively recent popularity of ABAC still leaves a number of problems unexplored. Issues like delegation, administration, auditability, scalability, hierarchical representations, etc. have been largely ignored or left to future work. This thesis seeks to aid in the adoption of ABAC by filling in …


A Visual Analytics System For Making Sense Of Real-Time Twitter Streams, Amir Haghighatimaleki Jan 2020

Electronic Thesis and Dissertation Repository

Through social media platforms, massive amounts of data are being produced. Twitter, as one such platform, enables users to post “tweets” on an unprecedented scale. Once analyzed by machine learning (ML) techniques and in aggregate, Twitter data can be an invaluable resource for gaining insight. However, when applied to real-time data streams, due to covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms), existing ML approaches result in different types of biases and provide uncertain outputs. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and …


Automatic Recall Of Software Lessons Learned For Software Project Managers, Tamer Mohamed Abdellatif Mohamed, Luiz Fernando Capretz, Danny Ho Nov 2019

Electrical and Computer Engineering Publications

Context: Lessons learned (LL) records constitute the software organization memory of successes and failures. LL are recorded within the organization repository for future reference to optimize planning, gain experience, and elevate market competitiveness. However, manually searching this repository is a daunting task, so it is often disregarded. This can lead to the repetition of previous mistakes or even missing potential opportunities. This, in turn, can negatively affect the organization’s profitability and competitiveness.

Objective: We aim to present a novel solution that provides an automatic process to recall relevant LL and to push those LL to project managers. This will dramatically …


Ml4iot: A Framework To Orchestrate Machine Learning Workflows On Internet Of Things Data, Jose Miguel Alves, Leonardo Honorio, Miriam A M Capretz Oct 2019

Electrical and Computer Engineering Publications

Internet of Things (IoT) applications generate vast amounts of real-time data. Temporal analysis of these data series to discover behavioural patterns may lead to qualified knowledge affecting a broad range of industries. Hence, the use of machine learning (ML) algorithms over IoT data has the potential to improve safety, economy, and performance in critical processes. However, creating ML workflows at scale is a challenging task that depends upon both production and specialized skills. Such tasks require investigation, understanding, selection, and implementation of specific ML workflows, which often lead to bottlenecks, production issues, and code management complexity and even then may …


Secured Data Masking Framework And Technique For Preserving Privacy In A Business Intelligence Analytics Platform, Osama Ali Dec 2018

Electronic Thesis and Dissertation Repository

The main concept behind business intelligence (BI) is how to use integrated data across different business systems within an enterprise to make strategic decisions. It is difficult to map internal and external BI users to subsets of the enterprise’s data warehouse (DW), so protecting the privacy of this data while maintaining its utility is a challenging task. Today, such DW systems constitute one of the most serious privacy breach threats that an enterprise might face when many internal users of different security levels have access to BI components. This thesis proposes a data masking framework (iMaskU: Identify, Map, Apply, …


Text Mining In Chinese Ancient Attires, Lu Wang Mar 2018

Western Research Forum

Starting from the Shang Dynasty (1600-1046 BCE), when a writing system appeared in China, clothing was recorded as symbols to denote social statuses. The hierarchical signification of clothing remained in the following dynasties until the end of imperial China in 1911. The imperial period produced twenty-five official dynastic histories with rich corpuses on the subject of attire, documenting regulations and prohibitions of detailed dress codes, a subject scarcely studied and treated with assumptions today. This research will use text mining tools to identify descriptive words of clothing that reflect Chinese hierarchical ideology from the twenty-five histories. The method is to …


Nbpmf: Novel Network-Based Inference Methods For Peptide Mass Fingerprinting, Zhewei Liang Nov 2017

Electronic Thesis and Dissertation Repository

Proteins are large, complex molecules that perform a vast array of functions in every living cell. A proteome is a set of proteins produced in an organism, and proteomics is the large-scale study of proteomes. Several high-throughput technologies have been developed in proteomics, of which the most commonly applied are mass spectrometry (MS) based approaches. MS is an analytical technique for determining the composition of a sample. Recently it has become a primary tool for protein identification, quantification, and post-translational modification (PTM) characterization in proteomics research. There are usually two different ways to identify proteins: top-down and bottom-up. Top-down approaches …


Mining Of Primary Healthcare Patient Data With Selective Multimorbid Diseases, Annette Megerdichian Azad May 2017

Electronic Thesis and Dissertation Repository

Despite a large volume of research on the prognosis, diagnosis and overall burden of multimorbidity, very little is known about socio-demographic characteristics of multimorbid patients. This thesis aims to analyze the socio-demographic characteristics of patients with multiple chronic conditions (multimorbidity), focusing on patient groups sharing the same combination of diseases. Several methods were explored to analyze the co-occurrence of multiple chronic diseases as well as the associations between socio-demographics and chronic conditions. These methods include disease pair distributions over gender, age groups and income level quintiles, Multimorbidity Coefficients for measuring the concurrence of disease pairs and triples, and k-modes clustering …


A Gamification Framework For Sensor Data Analytics, Alexandra L'Heureux, Katarina Grolinger, Wilson A. Higashino, Miriam A. M. Capretz Jan 2017

Electrical and Computer Engineering Publications

The Internet of Things (IoT) enables connected objects to capture, communicate, and collect information over the network through a multitude of sensors, setting the foundation for applications such as smart grids, smart cars, and smart cities. In this context, large scale analytics is needed to extract knowledge and value from the data produced by these sensors. The ability to perform analytics on these data, however, is highly limited by the difficulties of collecting labels. Indeed, the machine learning techniques used to perform analytics rely upon data labels to learn and to validate results. Historically, crowdsourcing platforms have been used to …


Complex Event Processing As A Service In Multi-Cloud Environments, Wilson A. Higashino Aug 2016

Electronic Thesis and Dissertation Repository

The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, has created a new Big Data world in which the volume and velocity of data generation have achieved an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has often been related to Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are costly and inflexible or too low-level and hard to operate.

To address these problems, this research …


Advanced Driving Assistance Prediction Systems, Maedeh Hesabgar Apr 2016

Electronic Thesis and Dissertation Repository

Future automobiles are going to experience a fundamental evolution by installing semiotic predictor driver assistance equipment. To meet the needs of this equipment, continuous driving-behavioral data have to be observed and processed to construct powerful predictive driving assistants. In this thesis, we focus on raw driving-behavioral data and present a prediction method that is able to predict the next driving-behavioral state. This method has been constructed based on the unsupervised double articulation analyzer (DAA) method, which is able to segment meaningless continuous driving-behavioral data into a meaningful sequence of driving situations. Thereafter, our novel model by mining the sequences of driving situations can …


Clustering-Based Personalization, Seyed Nima Mirbakhsh Sep 2015

Electronic Thesis and Dissertation Repository

Recommendation systems have been among the fastest-emerging technologies of the last decade and are a key part of the e-commerce ecosystem. Businesses offer a wide variety of items and content through different channels such as the Internet, smart TVs, digital screens, etc. For some businesses, the number of these items runs into the millions. Therefore, users can have trouble finding the products that they are looking for. Recommendation systems address this problem by providing powerful methods that enable users to filter through a large information and product space based on their preferences. Moreover, users have different preferences. Thus, businesses can employ recommendation …


Application Of Risk Metrics For Role Mining, Sharmin Ahmed Aug 2014

Electronic Thesis and Dissertation Repository

Incorporating risk consideration in access control systems has recently become a popular research topic. Related to this is risk awareness which is needed to enable access control in an agile and dynamic way. While risk awareness is probably known for an established access control system, being aware of risk even before the access control system is defined can mean identification of users and permissions that are most likely to lead to dangerous or error-prone situations from an administration point of view. Having this information available during the role engineering phase allows data analysts and role engineers to highlight potentially risky …


Contextual Anomaly Detection In Big Sensor Data, Michael Hayes, Miriam A M Capretz Jun 2014

Electrical and Computer Engineering Publications

Performing predictive modelling, such as anomaly detection, in Big Data is a difficult task. This problem is compounded as more and more sources of Big Data are generated from environmental sensors, logging applications, and the Internet of Things. Further, most current techniques for anomaly detection only consider the content of the data source, i.e. the data itself, without concern for the context of the data. As data becomes more complex it is increasingly important to bias anomaly detection techniques for the context, whether it is spatial, temporal, or semantic. The work proposed in this paper outlines a contextual anomaly detection …


Semantic Privacy Policies For Service Description And Discovery In Service-Oriented Architecture, Diego Z. Garcia, Miriam A M Capretz, M. Beatriz F. Toledo Mar 2014

Electrical and Computer Engineering Publications

Privacy preservation in Service-Oriented Architecture (SOA) is an open problem. This paper focuses on the areas of service description and discovery. The problems in these areas are that it is currently not possible to describe how a service provider deals with information received from a service consumer, or to discover a service that satisfies the privacy preferences of a consumer. There is currently no framework that offers a solution supporting a rich description of privacy policies and their integration in the process of service discovery. Thus, the main goal of this paper is to propose a privacy preservation …


Data Management In Cloud Environments: Nosql And Newsql Data Stores, Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, Miriam Am Capretz Dec 2013

Electrical and Computer Engineering Publications

Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volumes of data. Because of the …


Disaster Data Management In Cloud Environments, Katarina Grolinger Dec 2013

Electronic Thesis and Dissertation Repository

Facilitating decision-making in a vital discipline such as disaster management requires information gathering, sharing, and integration on a global scale and across governments, industries, communities, and academia. A large quantity of immensely heterogeneous disaster-related data is available; however, current data management solutions offer few or no integration capabilities and limited potential for collaboration. Moreover, recent advances in cloud computing, Big Data, and NoSQL have opened the door for new solutions in disaster data management.

In this thesis, a Knowledge as a Service (KaaS) framework is proposed for disaster cloud data management (Disaster-CDM) with the objectives of 1) facilitating information gathering …


An Access Control Model For Nosql Databases, Motahera Shermin Dec 2013

Electronic Thesis and Dissertation Repository

Current development platforms are web scale, unlike recent platforms which were just network scale. There has been a rapid evolution in computing paradigms that has created the need for data storage that is as agile and scalable as the applications it supports. Relational databases, with their joins and locks, negatively affect performance in web scale systems. Thus, various types of non-relational databases have emerged in recent years, commonly referred to as NoSQL databases. To fill the gaps left by their relational counterparts, they trade consistency and security for performance and scalability. With NoSQL databases being adopted by an increasing number of organizations, …


Hierarchical Classification And Its Application In University Search, Xiao Li Aug 2013

Electronic Thesis and Dissertation Repository

Web search engines have been adopted by most universities for searching webpages in their own domains. Basically, a user sends keywords to the search engine and the search engine returns a flat ranked list of webpages. However, in university search, user queries are usually related to topics. Simple keyword queries are often insufficient to express topics as keywords. On the other hand, most e-commerce sites allow users to browse and search products in various hierarchies. It would be ideal if hierarchical browsing and keyword search could be seamlessly combined for university search engines. The main difficulty is to automatically classify …


Knowledge As A Service Framework For Disaster Data Management, Katarina Grolinger, Emna Mezghani, Miriam Am Capretz, Ernesto Exposito Jan 2013

Electrical and Computer Engineering Publications

Each year, a number of natural disasters strike across the globe, killing hundreds and causing billions of dollars in property and infrastructure damage. Minimizing the impact of disasters is imperative in today’s society. As the capabilities of software and hardware evolve, so does the role of information and communication technology in disaster mitigation, preparation, response, and recovery. A large quantity of disaster-related data is available, including response plans, records of previous incidents, simulation data, social media data, and Web sites. However, current data management solutions offer few or no integration capabilities. Moreover, recent advances in cloud computing, big data, and …


Hearts And Minds: Examining The Evolution Of The Egyptian Excerebration And Evisceration Traditions Through The Impact Mummy Database, Andrew D. Wade Apr 2012

Electronic Thesis and Dissertation Repository

Egyptian mummification and funerary rituals were a transformative process, making the deceased a pure being, free of disease, injury, and disfigurements, as well as ethical and moral impurities. Consequently, the features of mummification available to specific categories of individuals hold social and ideological significance. This study refutes long-held classical stereotypes, particularly dogmatic class associations; demonstrates the apocryphal nature of universal heart retention; and expands on the purposes of excerebration and evisceration implied by synthetic and radiological analyses.

Features of the embalming traditions, specifically the variable excerebration and evisceration traditions, represented the Egyptian view of death. Fine-grain analyses, through primary imaging …


Autonomic Database Management: State Of The Art And Future Trends, Katarina Grolinger, Miriam Am Capretz Jan 2012

Electrical and Computer Engineering Publications

In recent years, Database Management Systems (DBMS) have increased significantly in size and complexity, making database administration an increasingly time-consuming and expensive task. Database Administrator (DBA) expenses have become a significant part of the total cost of ownership. This results in the need to develop Autonomous Database Management Systems (ADBMS) that would manage themselves without human intervention. Accordingly, this paper evaluates the current state of autonomous database systems and identifies gaps and challenges in the achievement of fully autonomic databases. In addition to highlighting technical challenges and gaps, we identify one human factor, gaining the trust …