Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 2 of 2

Full-Text Articles in Computer Engineering

Scheduling In Mapreduce Clusters, Chen He Feb 2018

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

MapReduce is a framework introduced by Google for processing huge amounts of data in a distributed environment. The simplicity of its programming model and its built-in fault tolerance have made it very popular for Big Data processing.

As MapReduce clusters grow in popularity, scheduling them becomes increasingly important. On one hand, many MapReduce applications have demanding performance requirements, for example on response time and/or throughput. On the other hand, as MapReduce clusters grow in size, energy-efficient scheduling becomes a necessity. These scheduling challenges, however, have not been systematically studied.

The objective of this dissertation is to …


Managing Large Data Sets Using Support Vector Machines, Ranjini Srinivas Aug 2010

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are steadily accumulating in storage at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and takes a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for …