Engineering Commons

Computer Engineering

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Hadoop

Articles 1 - 3 of 3

Full-Text Articles in Engineering

Scheduling And Prefetching In Hadoop With Block Access Pattern Awareness And Global Memory Sharing With Load Balancing Scheme, Sai Suman Jun 2019

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Although several scheduling and prefetching algorithms have been proposed to improve data locality in Hadoop, there has been little research on increasing cluster performance by addressing data locality while considering (1) cluster memory, (2) data access patterns, and (3) real-time scheduling issues together.

Firstly, considering data access patterns is crucial because a computation might access some portion of the data in the cluster only once, while the rest could be accessed multiple times. Blindly retaining data in memory might eventually lead to inefficient memory utilization.

Secondly, several studies found that the cluster memory goes …
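To make the access-pattern point above concrete, here is a minimal, hypothetical sketch (not the algorithm proposed in the thesis) of an in-memory block cache that counts accesses and evicts the least-reused resident block first, so that data touched only once does not crowd out frequently reused blocks. All class, method, and block names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy access-pattern-aware block cache: blocks that have been read only
 * once are the first eviction candidates, so one-shot data does not
 * displace blocks that are reused many times.
 */
public class AccessAwareBlockCache {
    private final Map<String, byte[]> blocks = new HashMap<>();
    private final Map<String, Integer> accessCounts = new HashMap<>();
    private final int capacity;

    public AccessAwareBlockCache(int capacity) {
        this.capacity = capacity;
    }

    /** Cache a block, evicting the least-reused resident block if full. */
    public void put(String blockId, byte[] data) {
        if (!blocks.containsKey(blockId) && blocks.size() >= capacity) {
            evictLeastReused();
        }
        blocks.put(blockId, data);
        accessCounts.merge(blockId, 1, Integer::sum);
    }

    /** Fetch a block from memory, recording the access. */
    public byte[] get(String blockId) {
        byte[] data = blocks.get(blockId);
        if (data != null) {
            accessCounts.merge(blockId, 1, Integer::sum);
        }
        return data;
    }

    /** Evict the resident block with the fewest recorded accesses. */
    private void evictLeastReused() {
        String victim = null;
        int fewest = Integer.MAX_VALUE;
        for (String id : blocks.keySet()) {
            int count = accessCounts.getOrDefault(id, 0);
            if (count < fewest) {
                fewest = count;
                victim = id;
            }
        }
        if (victim != null) {
            blocks.remove(victim);
        }
    }

    public static void main(String[] args) {
        AccessAwareBlockCache cache = new AccessAwareBlockCache(2);
        cache.put("blk-1", new byte[8]);
        cache.put("blk-2", new byte[8]);
        cache.get("blk-2");               // blk-2 is now reused; blk-1 is not
        cache.put("blk-3", new byte[8]);  // evicts blk-1, the one-shot block
        System.out.println("blk-1 cached: " + (cache.get("blk-1") != null));
        System.out.println("blk-2 cached: " + (cache.get("blk-2") != null));
    }
}
```

A real scheduler would also weigh prefetch cost and locality, but the sketch captures why raw retention without access-pattern awareness wastes memory.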


DNN: A Distributed NameNode Filesystem For Hadoop, Ziling Huang May 2014

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single-NameNode architecture has long been viewed as the Achilles' heel of the Hadoop Distributed File System, as it not only represents a single point of failure but also limits the scalability of the storage tier in the system stack. Since Hadoop is now being deployed at increasing scale, …
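The centralization described in this abstract is visible in Hadoop's standard client API: every metadata operation is an RPC to the single NameNode, and DataNodes are contacted only when block contents are actually transferred. A minimal sketch using Hadoop's FileSystem API follows; the host name and paths are placeholders, not values from the thesis:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeMetadataDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the single NameNode; every metadata call
        // below is an RPC to that one node, which is why it is both a
        // single point of failure and a scalability bottleneck.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:9000"); // hypothetical host

        FileSystem fs = FileSystem.get(conf);

        // Metadata lookup: answered entirely by the NameNode.
        FileStatus status = fs.getFileStatus(new Path("/data/example.txt")); // hypothetical path
        System.out.println("size=" + status.getLen()
                + " replication=" + status.getReplication());

        // Listing a directory is also a pure NameNode operation; DataNodes
        // are only involved later, when block contents are actually read.
        for (FileStatus child : fs.listStatus(new Path("/data"))) {
            System.out.println(child.getPath());
        }
        fs.close();
    }
}
```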


Managing Large Data Sets Using Support Vector Machines, Ranjini Srinivas Aug 2010

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage every day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually, and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for …
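As a rough illustration of the general approach (the thesis's actual features, training data, and kernel are not shown in this excerpt), a linear SVM's keep/delete decision reduces to the sign of w·x + b evaluated over per-data-set features. The weights and features below are invented purely for demonstration:

```java
/** Toy linear SVM decision function for keep/delete classification of data sets. */
public class DatasetClassifier {
    // Hypothetical pre-trained weights over three illustrative features:
    // [days since last access, accesses in last 90 days, size in TB]
    private static final double[] W = {-0.02, 0.15, -0.01};
    private static final double B = 0.5;

    /** Returns true if the decision function scores the data set as worth keeping. */
    static boolean keep(double[] features) {
        double score = B;
        for (int i = 0; i < W.length; i++) {
            score += W[i] * features[i]; // w . x + b
        }
        return score >= 0; // the sign of the score is the SVM class decision
    }

    public static void main(String[] args) {
        double[] hotDataset  = {3, 40, 2.0};   // recently and frequently accessed
        double[] coldDataset = {400, 0, 8.0};  // untouched for over a year
        System.out.println("hot  -> keep=" + keep(hotDataset));   // true
        System.out.println("cold -> keep=" + keep(coldDataset));  // false
    }
}
```

In practice the weights would come from training on historical access logs rather than being hand-set, but the decision step itself is this simple once a linear model is trained.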