Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons


Articles 1 - 2 of 2

Full-Text Articles in Engineering

Scheduling And Prefetching In Hadoop With Block Access Pattern Awareness And Global Memory Sharing With Load Balancing Scheme, Sai Suman Jun 2019


Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Although several scheduling and prefetching algorithms have been proposed to improve data locality in Hadoop, little research has targeted cluster performance by addressing data locality while jointly considering 1) cluster memory, 2) data access patterns, and 3) real-time scheduling issues.

Firstly, considering the data access patterns is crucial because the computation might access some portion of the data in the cluster only once while the rest could be accessed multiple times. Blindly retaining data in memory might eventually lead to inefficient memory utilization.
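The retention decision described above can be illustrated with a small sketch. This is a hypothetical, simplified cache (not the dissertation's actual scheme), assuming blocks accessed only once should be evicted before blocks accessed repeatedly:

```python
from collections import OrderedDict

class AccessAwareCache:
    """Toy in-memory block cache: blocks seen only once are evicted
    before frequently accessed blocks (LFU-style, insertion-order tiebreak).
    Illustrative only -- not the algorithm proposed in the thesis."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> access count

    def access(self, block_id):
        if block_id in self.blocks:
            self.blocks[block_id] += 1
            self.blocks.move_to_end(block_id)  # mark most recently used
            return True  # cache hit
        if len(self.blocks) >= self.capacity:
            # Evict the least-frequently accessed block, so one-shot
            # blocks do not crowd hot blocks out of cluster memory.
            victim = min(self.blocks, key=self.blocks.get)
            del self.blocks[victim]
        self.blocks[block_id] = 1
        return False  # cache miss
```

A plain LRU policy would treat a block read once and a block read a hundred times identically once both go cold, which is exactly the inefficiency the abstract points at.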

Secondly, several studies have found that the cluster memory goes …


Big Data Investment And Knowledge Integration In Academic Libraries, Saher Manaseer, Afnan R. Alawneh, Dua Asoudi Jan 2019


Copyright, Fair Use, Scholarly Communication, etc.

Recently, big data investment has become important for organizations, especially with the rapid growth of data driven by the huge expansion in the use of social media applications and websites. Many organizations depend on this data to extract the reports and statistics they need. As investment in big data and its storage has become a major challenge for organizations, many technologies and methods have been developed to tackle those challenges.

One such technology is Hadoop, a framework that divides big data into blocks and distributes those blocks across nodes for processing, at lower cost than traditional storage …
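The divide-and-distribute model described above can be sketched in a few lines. This is a toy illustration, not Hadoop's actual implementation; the functions and the round-robin placement are assumptions for clarity (HDFS additionally replicates each block, typically three times, and its default block size is on the order of 128 MB):

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split a byte stream into fixed-size blocks, HDFS-style.
    The final block may be smaller than block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def assign_to_nodes(blocks: list, nodes: list) -> dict:
    """Place blocks across nodes round-robin (a simplification:
    real HDFS placement also considers racks and replication)."""
    placement = {node: [] for node in nodes}
    for i, block in enumerate(blocks):
        placement[nodes[i % len(nodes)]].append(block)
    return placement
```

Processing then runs where the data lives, which is why the data-locality concerns raised in the first abstract matter for cluster performance.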