Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee Dec 2021

On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee

Dissertations

A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems exemplified by Hadoop based on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …


Performance Optimization Of Big Data Computing Workflows For Batch And Stream Data Processing In Multi-Clouds, Huiyan Cao Dec 2020

Performance Optimization Of Big Data Computing Workflows For Batch And Stream Data Processing In Multi-Clouds, Huiyan Cao

Dissertations

Workflow techniques have been widely used as a major computing solution in many science domains. With the rapid deployment of cloud infrastructures around the globe and the economic benefits of cloud-based computing and storage services, an increasing number of scientific workflows have migrated or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy various data- and network-intensive computing workflows such as serial computing workflows, MapReduce/Spark-based workflows, and Storm-based stream data processing workflows in multi-cloud environments, where inter-cloud data transfer oftentimes plays a significant role in both workflow performance …


Dimension Reduction Techniques For High Dimensional And Ultra-High Dimensional Data, Subha Datta Dec 2019

Dimension Reduction Techniques For High Dimensional And Ultra-High Dimensional Data, Subha Datta

Dissertations

This dissertation introduces two statistical techniques to tackle high-dimensional data, which is very commonplace nowadays. It consists of two topics which are inter-related by a common link, dimension reduction.

The first topic is a recently introduced classification technique, the weighted principal support vector machine (WPSVM), which is incorporated into a spatial point process framework. The WPSVM possesses an additional parameter, a weight parameter, besides the regularization parameter. Most statistical techniques, including WPSVM, have an inherent assumption of independence, which means the data points are not connected with each other in any manner. But spatial data violates this assumption. Correlation between …


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


Hierarchical Cluster Analysis: A New Type Of Ranking Criteria Based On Arwu Ranking Data, Zhengshuo Li Jan 2019

Hierarchical Cluster Analysis: A New Type Of Ranking Criteria Based On Arwu Ranking Data, Zhengshuo Li

Dissertations

The advent of big data leads to many applications of Machine Learning techniques. University rankings is one of the applicable domains, which is currently playing a crucial role in the assessment of the universities' performance. Currently, the rankings are usually carried out by some authoritative ranking institutions by means of weighting techniques and the results are conveyed in numerical rankings. Three of the most famous university ranking institutions have been introduced from a technical perspective. However, these institutions have been proven to be subjective in relation to their data selection and weighting method.


Performance Optimization And Energy Efficiency Of Big-Data Computing Workflows, Tong Shu Jul 2017

Performance Optimization And Energy Efficiency Of Big-Data Computing Workflows, Tong Shu

Dissertations

Next-generation e-science is producing colossal amounts of data, now frequently termed as Big Data, on the order of terabyte at present and petabyte or even exabyte in the predictable future. These scientific applications typically feature data-intensive workflows comprised of moldable parallel computing jobs, such as MapReduce, with intricate inter-job dependencies. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on workflow completion time, energy consumption, and financial cost if executed in clouds, which remains largely unexplored. This dissertation conducts an in-depth investigation into the properties of moldable jobs and provides an …


Big Data Analytics In Computational Biology And Bioinformatics, Kevin Byron Apr 2017

Big Data Analytics In Computational Biology And Bioinformatics, Kevin Byron

Dissertations

Big data analytics in computational biology and bioinformatics refers to an array of operations including biological pattern discovery, classification, prediction, inference, clustering as well as data mining in the cloud, among others. This dissertation addresses big data analytics by investigating two important operations, namely pattern discovery and network inference.

The dissertation starts by focusing on biological pattern discovery at a genomic scale. Research reveals that the secondary structure in non-coding RNA (ncRNA) is more conserved during evolution than its primary nucleotide sequence. Using a covariance model approach, the stems and loops of an ncRNA secondary structure are represented as a …


Cognitive Big Data Analytics And Persuasive Social Influence Diffusion, Eman Ahmed Ghanim Abukhousa May 2016

Cognitive Big Data Analytics And Persuasive Social Influence Diffusion, Eman Ahmed Ghanim Abukhousa

Dissertations

Current demands in local and global economies and the pursuit of competitiveness are calling for data-driven strategies. Data-driven solutions analyze trends, make predictions about future events, and prescribe what to do next in an actionable manner. However, cognitive and behavioral data are distinguished by their multiplicity and rapid changes to meet evolving and dynamic goals of individuals. This research work is concerned with the utility of analytical solutions to synthesize and influence cognitive and behavioral adoption. We propose a multidimensional data model to identify and extract cognitive indicators for analysis and persuasive interventions. The process starts by discovering behavioral features …