Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee Dec 2021

Dissertations

A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems, exemplified by Hadoop, that are built on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …


Binary Classifiers For Noisy Datasets: A Comparative Study Of Existing Quantum Machine Learning Frameworks And Some New Approaches, Nikolaos Schetakis, Davit Aghamalyan, Paul Robert Griffin, Michael Boguslavsky Nov 2021

Research Collection School Of Computing and Information Systems

This technology offer is a quantum machine learning algorithm applied to binary classification models for noisy datasets, which are prevalent in finance and other domains. By combining hybrid neural networks, quantum parametric circuits, and data re-uploading, we have improved the classification of non-convex 2-dimensional figures by understanding learning stability as noise increases in the dataset. The metric we use for assessing the performance of our quantum classifiers is the area under the receiver operating characteristic curve (ROC AUC). We are interested in collaborating with partners with use cases for binary classification of noisy data. Also, as quantum technology is still insufficient for …
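
As an illustration of the ROC AUC metric named in the abstract above, here is a minimal sketch in plain Python. It is not the authors' evaluation code. The function name `roc_auc` and the sample labels and scores are assumptions made for this example. It uses the pairwise interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher means "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric a convenient way to track how a classifier degrades as label or feature noise grows.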


Building Legal Datasets, Jerrold Soh Nov 2021

Research Collection Yong Pung How School Of Law

Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of “better”. To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.


Measuring Data Collection Diligence For Community Healthcare, Galawala Ramesha Samurdhi Karunasena, M. S. Ambiya, Arunesh Sinha, R. Nagar, S. Dalal, Abdullah. H., D. Thakkar, D. Narayanan, M. Tambe Oct 2021

Research Collection School Of Computing and Information Systems

Data analytics has tremendous potential to provide targeted benefit in low-resource communities. However, the availability of high-quality public health data is a significant challenge in developing countries, primarily due to non-diligent data collection by community health workers (CHWs). Our use of the word non-diligence here is to emphasize that poor data collection is often not a deliberate action by the CHW but arises due to a myriad of factors, sometimes beyond the control of the CHW. In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is handled by building upon domain experts' guidance …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021

Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Nowadays, industries are collecting a massive and exponentially growing amount of data that can be used to extract useful insights for improving various aspects of our lives. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real-world applications. However, it is challenging for resource-limited clients to analyze their data efficiently when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Moreover, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Designing Targeted Mobile Advertising Campaigns, Kimia Keshanian Jun 2021

USF Tampa Graduate Theses and Dissertations

With the proliferation of smart, handheld devices, there has been a multifold increase in the ability of firms to target and engage with customers through mobile advertising. Not surprisingly, mobile advertising campaigns have therefore become an integral aspect of firms' brand-building activities, such as improving the awareness and overall visibility of firms' brands. In addition, retailers are increasingly using mobile advertising for targeted promotional activities that increase in-store visits and eventual sales conversions. However, in recent years, mobile (and, more generally, online) advertising campaigns have been facing one major challenge and one major threat that can negatively impact the …


Data-Driven Recommendation Of Academic Options Based On Personality Traits, Aashish Ghimire May 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The choice of academic major and, subsequently, an academic institution has a massive effect on a person's career. It determines not only their career path but also their earning potential, professional happiness, etc. [1] About 40% of people who are admitted to a college do not graduate within six years. Yet very limited resources are available to help students make those decisions, and each guidance counselor is responsible for roughly 400 to 900 students across the United States. A tool to help with these decisions would benefit students, parents, and guidance counselors.

Various research studies have shown that personality traits affect …


Achieving Differential Privacy And Fairness In Machine Learning, Depeng Xu May 2021

Graduate Theses and Dissertations

Machine learning algorithms are used to make decisions in various applications, such as recruiting, lending and policing. These algorithms rely on large amounts of sensitive individual information to work properly. Hence, there are sociological concerns about machine learning algorithms on matters like privacy and fairness. Currently, many studies focus only on protecting individual privacy or ensuring fairness of algorithms separately, without considering their connection. However, there are new challenges arising in privacy-preserving and fairness-aware machine learning. On the one hand, there is fairness within the private model, i.e., how to meet both privacy and fairness requirements simultaneously in …


Revman: Revenue-Aware Multi-Task Online Insurance Recommendation, Yu Li, Yi Zhang, Lu Gan, Gengwei Hong, Zimu Zhou, Qiang Li Feb 2021

Research Collection School Of Computing and Information Systems

Online insurance is a new type of e-commerce with exponential growth. An effective recommendation model that maximizes the total revenue of insurance products listed in multiple customized sales scenarios is crucial for the success of an online insurance business. Prior recommendation models are ineffective because they fail to characterize the complex relatedness of insurance products in multiple sales scenarios, and because they maximize the overall conversion rate rather than the total revenue. Even worse, it is impractical to collect training data online for total revenue maximization due to the business logic of online insurance. We propose RevMan, a Revenue-aware Multi-task Network for online …


Information Architecture For A Chemical Modeling Knowledge Graph, Adam R. Luxon Jan 2021

Theses and Dissertations

Machine learning models for chemical property predictions are high-dimensional design challenges spanning multiple disciplines. Free and open-source software libraries have streamlined the model implementation process, but the design complexity remains. In order to better navigate and understand the machine learning design space, model information needs to be organized and contextualized. In this work, instances of chemical property models and their associated parameters were stored in a Neo4j property graph database. Machine learning model instances were created with permutations of dataset, learning algorithm, molecular featurization, data scaling, data splitting, hyperparameters, and hyperparameter optimization techniques. The resulting graph contains over 83,000 nodes …
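
To make the idea of generating model instances from combinations of design choices more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the author's pipeline. The dictionary `design_space`, all option names in it, and the helper `enumerate_model_instances` are hypothetical, and only four of the design dimensions listed in the abstract are shown. Each resulting dictionary could be stored as the properties of one node in a property graph; no graph database calls are made here.

```python
from itertools import product

# Hypothetical design dimensions and options (placeholders, not from the thesis).
design_space = {
    "dataset": ["dataset_a", "dataset_b"],
    "learning_algorithm": ["random_forest", "neural_network"],
    "featurization": ["featurizer_a", "featurizer_b"],
    "data_scaling": ["none", "standard"],
}


def enumerate_model_instances(space):
    """Return one dict per combination of options (Cartesian product)."""
    keys = list(space)
    option_lists = [space[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*option_lists)]


instances = enumerate_model_instances(design_space)
# 2 * 2 * 2 * 2 = 16 candidate model configurations
```

The number of configurations grows multiplicatively with each added dimension, which is one reason an organized, queryable store for model metadata becomes useful as the design space expands.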