Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 12 of 12

Full-Text Articles in Engineering

Autopath: Harnessing Parallel Execution Paths For Efficient Resource Allocation In Multi-Stage Big Data Frameworks, Han Gao, Zhengyu Yang, Janki Bhimani, Teng Wang, Jiayin Wang, Ningfang Mi, Bo Sheng Dec 2016

Zhengyu Yang

Due to the flexibility of its data operations and the scalability of its in-memory cache, Spark has shown the potential to become the standard distributed framework to replace Hadoop for data-intensive processing in both industry and academia. However, we observe that the built-in scheduling algorithms in Spark (i.e., FIFO and FAIR) are not optimized for applications with multiple parallel and independent branches in stages. Specifically, the child stage needs to wait and collect data from all its parent branches, but this wait has no guaranteed upper bound since it is tightly coupled with each branch’s workload characteristics, stage order, and their corresponding …


Finite Element Simulation Of Prevention Thermal Cracking In Mass Concrete, Juncai Xu, Zhengzhong Shen, Song Yang, Xin Xie, Zhengyu Yang Dec 2016

Zhengyu Yang

Mass concrete structures play a very important role in civil engineering, and the cracking of concrete is regarded as one of the biggest engineering problems. Crack control analysis of mass concrete is therefore necessary. Several factors should be considered in this analysis, mainly the heat release model of the concrete, the mechanical model of the concrete, and the temperature control process in the pipe model. A differential evolution algorithm and an equivalent algorithm are adopted to solve for the coefficient of adiabatic temperature and the cooling water effect. In the paper, stress field calculation, back …


Fim: Performance Prediction For Parallel Computation In Iterative Data Processing Applications, Janki Bhimani, Ningfang Mi, Miriam Leeser, Zhengyu Yang Dec 2016

Zhengyu Yang

Predicting the performance of an application running on high performance computing (HPC) platforms in a cloud environment is becoming increasingly important because of its influence on development time and resource management. However, predicting the performance with respect to parallel processes is complex for iterative, multi-stage applications. This research proposes a performance approximation approach, FiM, to model the computing performance of iterative, multi-stage applications running on a master-compute framework. FiM consists of two key components that are coupled with each other: 1) a Stochastic Markov Model to capture non-deterministic runtime that often depends on parallel resources, e.g., the number of processes; 2) Machine Learning …


Ea2s2: An Efficient Application-Aware Storage System For Big Data Processing In Heterogeneous Clusters, Teng Wang, Jiayin Wang, Son Nam Nguyen, Zhengyu Yang, Ningfang Mi, Bo Sheng Dec 2016

Zhengyu Yang

Big data processing frameworks such as Hadoop have been widely adopted to process a large volume of data. A lot of prior work has focused on the allocation of resources and the execution order of jobs/tasks to improve the performance in a homogeneous cluster. In this paper, we investigate storage layer design in a heterogeneous system considering a new type of bundled jobs where the input data and associated application jobs are submitted in a bundle. Our goal is to break the barrier between resource management and the underlying storage layer, and improve data locality, an important performance factor for …


Accelerating Big Data Applications Using Lightweight Virtualization Framework On Enterprise Cloud, Janki Bhimani, Zhengyu Yang, Miriam Leeser, Ningfang Mi Dec 2016

Zhengyu Yang

Hypervisor-based virtualization technology has been successfully used to deploy high-performance and scalable infrastructure for Hadoop, and now Spark applications. Container-based virtualization techniques are becoming an important option and are increasingly used due to their lightweight operation and better scaling compared to virtual machines (VMs). With containerization techniques such as Docker becoming mature and promising better performance, we can use Docker to speed up big data applications. However, as applications have different behaviors and resource requirements, before replacing traditional hypervisor-based virtual machines with Docker, it is important to analyze and compare the performance of applications running in the cloud with VMs and …


Enhancing Ssds With Multi-Stream: What? Why? How?, Janki Bhimani, Jingpei Yang, Zhengyu Yang, Ningfang Mi, N. H. V. Krishna Giri, Rajinikanth Pandurangan, Changho Choi, Vijay Balakrishnan Dec 2016

Zhengyu Yang

SSDs have been widely adopted, but they still face challenges in controlling write amplification. Traditional SSDs have a single active append point where new data writes can be stored. Storing data with different lifetimes together causes high write amplification. Recently, multi-stream SSDs have been developed that allow multiple active append points. These append points can be used to store data with different lifetimes in different locations within the SSD. Placing data according to its lifetime in this way would considerably reduce the internal write amplification of the SSD. To use multi-stream SSDs, it is required to attach stream-id …


Seina: A Stealthy And Effective Internal Attack In Hadoop Systems, Jiayin Wang, Teng Wang, Zhengyu Yang, Ying Mao, Ningfang Mi, Bo Sheng Dec 2016

Zhengyu Yang

Big data processing frameworks such as Hadoop [1] have been widely adopted in the past few years. However, the security issues in such large scale systems have not been well studied yet. While most of the prior work is focused on the data privacy and protection, this paper investigates a potential attack from a compromised internal node against the overall system performance. We explore the vulnerabilities of the existing Hadoop system, and develop an effective attack launched from the compromised node that can significantly degrade the data processing performance of the cluster without being detected and blacklisted for job execution. …


An Algorithm For Non-Steady Thermal Dynamics Finite Element Simulation And Differential Evolution, Juncai Xu, Zhenzhong Shen, Qingwen Ren, Xin Xie, Zhengyu Yang Dec 2016

Zhengyu Yang

The thermodynamic parameters of concrete are a significant condition in computations for preventing mass concrete cracking. The inversion of these thermodynamic parameters is a multi-parameter optimization problem. Differential Evolution (DE) is an evolutionary optimization method developed from the genetic algorithm (GA). In this paper, non-steady temperature field finite element simulation and DE are combined to establish a DE inverse solution for the thermodynamic parameters of concrete, including the equivalent heat source method to realize water-pipe cooling simulation. The procedure was implemented with high computational efficiency and high accuracy. It is an effective way to select thermal parameters …
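
As a rough illustration of how Differential Evolution can drive a parameter inversion of this kind, the sketch below fits two parameters of a toy heat-rise curve to synthetic measurements. It is not the authors' procedure: the forward model `forward` (an exponential temperature-rise curve), the parameter names `theta0` and `m`, the bounds, and the synthetic data are all assumptions made for the example, and the finite element simulation described in the abstract is replaced by this closed-form stand-in.

```python
import math
import random


def differential_evolution(cost, bounds, pop_size=30, generations=200,
                           F=0.7, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin minimizer (illustrative, standard library only)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the search bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = cost(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def forward(params, t):
    """Hypothetical forward model: temperature rise at time t (an assumption)."""
    theta0, m = params
    return theta0 * (1.0 - math.exp(-m * t))


# Synthetic "measurements" generated from known parameters (assumed values).
TIMES = [0.5 * k for k in range(1, 41)]
TRUE_PARAMS = (40.0, 0.3)
MEASURED = [forward(TRUE_PARAMS, t) for t in TIMES]


def misfit(params):
    """Sum of squared differences between simulated and measured temperatures."""
    return sum((forward(params, t) - y) ** 2 for t, y in zip(TIMES, MEASURED))


best_params, best_cost = differential_evolution(
    misfit, bounds=[(10.0, 80.0), (0.05, 1.0)])
```

In an actual inversion, `forward` would be replaced by the temperature-field simulation, which makes each cost evaluation far more expensive than in this toy setting.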


Automatic And Scalable Data Replication Manager In Distributed Computation And Storage Infrastructure Of Cyber-Physical Systems, Zhengyu Yang, Janki Bhimani, Jiayin Wang, David Evans, Ningfang Mi Dec 2016

Zhengyu Yang

Cyber-Physical System (CPS) is a rising technology that utilizes computation and storage resources for sensing, processing, analyzing, predicting, and understanding field data, then uses communication resources for interaction, intervention, and interface management, and finally provides control for systems so that they can inter-operate, evolve, and run in a stable evidence-based environment. There are two major demands when building the storage infrastructure for a CPS cluster to support the above-mentioned functionalities: (1) high I/O and network throughput requirements during runtime, and (2) low latency demand for disaster recovery. To address challenges brought by these demands, in this paper, we propose a complete …


Efficient Data Caching Management In Scalable Multi-Stage Data Processing Systems, Jiayin Wang, Zhengyu Yang, David Evans Dec 2016

Zhengyu Yang

According to some example embodiments, a method includes: receiving, by a processor, from a data source, a processing profile comprising input data blocks and a plurality of operations for executing using the input data blocks; executing, by the processor, one or more of the operations of the processing profile to generate a new output data after each of the executed one or more operations; storing, by the processor, the new output data from at least one of the one or more operations as intermediate cache data; and transmitting, by the processor, the new output data from a final operation from …
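
The steps listed in this abstract can be pictured with a small sketch: run each operation of a processing profile in order, keep the non-final outputs as intermediate cache data, and return the final output. The function name `run_profile`, the dictionary used as a cache, and the sample operations are hypothetical; the abstract does not specify how the cache is organized or which outputs are kept.

```python
from typing import Any, Callable, Dict, List


def run_profile(input_blocks: Any,
                operations: List[Callable[[Any], Any]],
                cache: Dict[int, Any]) -> Any:
    """Execute operations in order, caching every non-final output.

    `cache` maps the operation index to the output it produced
    (an assumed layout for this sketch).
    """
    data = input_blocks
    last = len(operations) - 1
    for index, operation in enumerate(operations):
        data = operation(data)       # new output data after each operation
        if index < last:
            cache[index] = data      # store as intermediate cache data
    return data                      # output of the final operation


# Hypothetical profile: double each value, add one, then sum.
operations = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x + 1 for x in xs],
    sum,
]
cache: Dict[int, Any] = {}
result = run_profile([1, 2, 3], operations, cache)
```

A later profile that shares the first operations could, under this layout, start from a cached intermediate result instead of recomputing it.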


Automatic Data Replica Manager In Distributed Caching And Data Processing System, Zhengyu Yang, Jiayin Wang, David Evans Dec 2016

Zhengyu Yang

A method of data storage includes determining a latency distance from a primary node to each of two or more replica nodes, choosing a preferred replica node of the two or more replica nodes based on the determined latency distances, and write-caching data into the preferred replica node.
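
A minimal sketch of the selection step described above, assuming the latency distance is available as a single number per replica node (for example a measured round-trip time in milliseconds) and that the preferred node is simply the one with the smallest value. The node names, the `write_cache` helper, and the in-memory dictionary standing in for each node's cache are hypothetical.

```python
from typing import Dict, List


def choose_preferred_replica(latency_ms: Dict[str, float]) -> str:
    """Return the replica node with the smallest latency distance."""
    if len(latency_ms) < 2:
        raise ValueError("expected two or more replica nodes")
    return min(latency_ms, key=latency_ms.__getitem__)


def write_cache(data: bytes,
                latency_ms: Dict[str, float],
                caches: Dict[str, List[bytes]]) -> str:
    """Write-cache `data` into the preferred replica and return its name."""
    node = choose_preferred_replica(latency_ms)
    caches.setdefault(node, []).append(data)
    return node


# Assumed latency distances from the primary node to each replica.
latencies = {"replica-a": 4.2, "replica-b": 1.7, "replica-c": 9.0}
caches: Dict[str, List[bytes]] = {}
chosen = write_cache(b"block-0", latencies, caches)
```

Here `chosen` is `"replica-b"`, the node with the lowest assumed latency.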


Adaptive Caching Replacement Manager With Dynamic Updating Granularities And Partitions For Shared Flash-Based Storage System, Zhengyu Yang, David Evans, Jiayin Wang Dec 2016

Zhengyu Yang

A method of adjusting temporal and spatial granularities associated with operation of a virtualized file system, the method including analyzing past workloads of a plurality of virtual machines associated with the virtualized file system, and adjusting the temporal and spatial granularities to be similar to average re-access temporal and spatial distances of data sets corresponding to the past workloads.
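
One way to picture the adjustment is to compute average re-access distances from a past access trace and use them as the new granularities. The definitions below are assumptions for illustration only: the temporal distance is taken as the time between consecutive accesses to the same block, the spatial distance as the absolute block-address gap between consecutive accesses, and the trace covers a single workload rather than a plurality of virtual machines.

```python
from statistics import mean
from typing import List, Optional, Tuple


def average_reaccess_distances(
        trace: List[Tuple[int, int]]) -> Tuple[Optional[float], Optional[float]]:
    """Return (average temporal distance, average spatial distance).

    `trace` is a list of (timestamp, block_address) pairs in access order.
    """
    last_seen = {}
    temporal = []
    spatial = []
    previous_block = None
    for timestamp, block in trace:
        if block in last_seen:
            # Time since this block was last accessed.
            temporal.append(timestamp - last_seen[block])
        last_seen[block] = timestamp
        if previous_block is not None:
            # Address gap between consecutive accesses.
            spatial.append(abs(block - previous_block))
        previous_block = block
    return (mean(temporal) if temporal else None,
            mean(spatial) if spatial else None)


# Hypothetical past workload: (timestamp, block address).
trace = [(0, 10), (1, 12), (4, 10), (5, 12), (9, 20)]
temporal_granularity, spatial_granularity = average_reaccess_distances(trace)
```

For this trace the temporal granularity comes out as 4 time units and the spatial granularity as 3.5 blocks.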