Physical Sciences and Mathematics | Open Access Articles

Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Sjalander, Gary Tyson Jun 2013

Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Sjalander, Gary Tyson

Gang-Ryung Uh

A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that …

Go to article

Tuning The Wcet Of Embedded Applications, Wankang Zhao, Prasad Kulkarni, David Whalley, Christopher Healy, Frank Mueller, Gang-Ryung Uh Sep 2011

Tuning The Wcet Of Embedded Applications, Wankang Zhao, Prasad Kulkarni, David Whalley, Christopher Healy, Frank Mueller, Gang-Ryung Uh

Gang-Ryung Uh

It is advantageous to not only calculate the WCET of an application, but to also perform transformations to reduce the WCET since an application with a lower WCET will be less likely to violate its timing constraints. In this paper we describe an environment consisting of an interactive compilation system and a timing analyzer, where a user can interactively tune the WCET of an application. After each optimization phase is applied, the timing analyzer is automatically invoked to calculate the WCET of the function being tuned. Thus, a user can easily gauge the progress of reducing the WCET. In addition, …

Go to article

Preprocessing Strategy For Effective Modulo Scheduling On Multi-Issue Digital Signal Processors, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek Sep 2011

Preprocessing Strategy For Effective Modulo Scheduling On Multi-Issue Digital Signal Processors, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek

Gang-Ryung Uh

To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic …

Go to article

Analyzing Dynamic Binary Instrumentation Overhead, Gang-Ryung Uh, Robert Cohn, Bharadwaj Yadavalli, Ramesh Peri, Ravi Ayyagari Sep 2011

Analyzing Dynamic Binary Instrumentation Overhead, Gang-Ryung Uh, Robert Cohn, Bharadwaj Yadavalli, Ramesh Peri, Ravi Ayyagari

Gang-Ryung Uh

Robust and powerful software instrumentation tools are essential for dynamic program analysis tasks such as profiling, performance evaluation, and bug detection. Dynamic binary instrumentation (DBI) is a general purpose technique that eases the development of program analysis tools by facilitating automatic low-level instrumentation. DBI-based program analysis can introduce high overhead and it is crucial for tool writers to minimize the cost. Analyzing the performance of instrumentation tools is challenging because most systems use a just-in-time compiler (JIT) to dynamically generate code. In this paper, we describe our method for analyzing the performance of instrumentation tools. The instrumented code is itself …

Go to article

Techniques For Effectively Exploiting A Zero Overhead Loop Buffer, Gang-Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Chris Burns, Vincent Cao Feb 2000

Techniques For Effectively Exploiting A Zero Overhead Loop Buffer, Gang-Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Chris Burns, Vincent Cao

Gang-Ryung Uh

A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common …

Go to article

Physical Sciences and Mathematics Commons^™

Full-Text Articles in Physical Sciences and Mathematics

Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Sjalander, Gary Tyson

Gang-Ryung Uh

Tuning The Wcet Of Embedded Applications, Wankang Zhao, Prasad Kulkarni, David Whalley, Christopher Healy, Frank Mueller, Gang-Ryung Uh

Gang-Ryung Uh

Preprocessing Strategy For Effective Modulo Scheduling On Multi-Issue Digital Signal Processors, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek

Gang-Ryung Uh

Analyzing Dynamic Binary Instrumentation Overhead, Gang-Ryung Uh, Robert Cohn, Bharadwaj Yadavalli, Ramesh Peri, Ravi Ayyagari

Gang-Ryung Uh

Techniques For Effectively Exploiting A Zero Overhead Loop Buffer, Gang-Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Chris Burns, Vincent Cao

Gang-Ryung Uh