Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Scheduling Instruction Effects For A Statically Pipelined Processor, F. Rasapour, G. Cook, G.-R. Uh May 2016

Gang-Ryung Uh

Statically pipelined processors have a fully exposed datapath where all portions of the pipeline are directly controlled by effects within an instruction, which simplifies hardware and enables a new level of compiler optimizations. This paper describes an effect scheduling strategy to aggressively compact instructions, which has a critical impact on code size and performance. Unique scheduling challenges include more frequent name dependences and fewer renaming opportunities due to static pipeline (SP) registers being dedicated for specific operations. We also realized the SP in a hardware description language (VHDL) to evaluate the real energy benefits. Despite the compiler challenges, we achieve …


Instruction Re-Selection For Iterative Modulo Scheduling On High Performance Multi-Issue DSPs, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek Jun 2015

Gang-Ryung Uh

Iterative modulo scheduling is very important for compilers targeting high-performance multi-issue digital signal processors (DSPs). These processors are often severely limited by idle functional units, so reducing the number of idle units can have a significant positive impact on performance. However, complex instructions such as mac, which are used in most recent DSPs, usually increase data dependence complexity. Such complex dependences in signal processing applications often restrict modulo scheduling freedom and therefore become a limiting factor for the iterative modulo scheduler.

In this work, we propose a technique that efficiently reselects instructions …


Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Själander, Gary Tyson Sep 2013

Gang-Ryung Uh

A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that …


Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Själander, Gary Tyson Jun 2013

Gang-Ryung Uh

A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that …


An Overview Of Static Pipelining, Ian Finlayson, Gang-Ryung Uh, David Whalley, Gary Tyson Dec 2011

Gang-Ryung Uh

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate …


Tuning The WCET Of Embedded Applications, Wankang Zhao, Prasad Kulkarni, David Whalley, Christopher Healy, Frank Mueller, Gang-Ryung Uh Sep 2011

Gang-Ryung Uh

It is advantageous to not only calculate the WCET of an application, but to also perform transformations to reduce the WCET since an application with a lower WCET will be less likely to violate its timing constraints. In this paper we describe an environment consisting of an interactive compilation system and a timing analyzer, where a user can interactively tune the WCET of an application. After each optimization phase is applied, the timing analyzer is automatically invoked to calculate the WCET of the function being tuned. Thus, a user can easily gauge the progress of reducing the WCET. In addition, …


Branch Elimination Via Multi-Variable Condition Merging, William Kreahling, David Whalley, Mark Bailey, Xin Yuan, Gang-Ryung Uh, Robert Van Engelen Sep 2011

Gang-Ryung Uh

Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code-improving transformations from being applied. In this paper we describe profile-based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. First, we gather profile information to detect the frequently executed paths in a program. Second, we detect sets of conditions in frequently executed paths that can be merged into a single …
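
As a minimal illustration of what replacing two branches with a single branch can look like, the sketch below merges two tests against zero using a bitwise OR. This is only one simple form of condition merging and is not necessarily the transformation the paper applies; the function names are hypothetical, and the nested `if` statements stand in for two separate conditional branches.

```python
def both_zero_two_branches(a: int, b: int) -> bool:
    # Original form: two conditions, evaluated as two separate branches.
    if a == 0:
        if b == 0:
            return True
    return False


def both_zero_merged(a: int, b: int) -> bool:
    # (a | b) is zero exactly when no bit is set in either operand,
    # so a single comparison covers both conditions.
    return (a | b) == 0
```

Both functions return the same result for every pair of integers, but the merged form needs only one comparison on the path where both values are zero.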


Code Optimizations For A VLIW-Style Network Processing Unit, Jinhwan Kim, Yunheung Paek, Gang-Ryung Uh Sep 2011

Gang-Ryung Uh

The explosive growth in network bandwidth and Internet services such as QoS (quality of service) and SLA (service level agreement) monitoring has created the need for new networking hardware called a Network Processing Unit (NPU). In order to rapidly reconfigure the NPU for frequently varying Internet services and technologies, a high-performance C compiler is urgently needed. Several code generation techniques, which are intended to meet the high code quality demands of other types of application specific instruction-set processors (ASIPs) like digital signal processors (DSPs), have already been developed. However, these techniques are insufficient for NPUs due to striking architectural …


Preprocessing Strategy For Effective Modulo Scheduling On Multi-Issue Digital Signal Processors, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek Sep 2011

Gang-Ryung Uh

To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic …
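
To make the link between cyclic dependences and the initiation interval concrete, the sketch below computes the standard lower bound used in the modulo scheduling literature: the larger of a resource bound and a recurrence bound. It is an illustration under stated assumptions, not the paper's algorithm; the function names, the unit types, and the example numbers are all hypothetical.

```python
from math import ceil


def res_mii(op_counts: dict[str, int], unit_counts: dict[str, int]) -> int:
    """Resource bound: operations of each type must fit on the units of that type."""
    return max(
        (ceil(op_counts[kind] / unit_counts[kind]) for kind in op_counts),
        default=1,
    )


def rec_mii(cycles: list[tuple[int, int]]) -> int:
    """Recurrence bound over dependence cycles.

    Each cycle is (total latency, total iteration distance).
    """
    return max((ceil(latency / distance) for latency, distance in cycles), default=1)


def min_ii(op_counts, unit_counts, cycles) -> int:
    """Lower bound on the initiation interval of a modulo-scheduled loop."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))
```

For a hypothetical loop with `{"alu": 4, "mul": 2}` operations on `{"alu": 2, "mul": 2}` units, the resource bound is 2. Adding one dependence cycle with latency 6 and distance 1 raises the bound to 6, which leaves the replicated units idle for most of each interval.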


Analyzing Dynamic Binary Instrumentation Overhead, Gang-Ryung Uh, Robert Cohn, Bharadwaj Yadavalli, Ramesh Peri, Ravi Ayyagari Sep 2011

Gang-Ryung Uh

Robust and powerful software instrumentation tools are essential for dynamic program analysis tasks such as profiling, performance evaluation, and bug detection. Dynamic binary instrumentation (DBI) is a general purpose technique that eases the development of program analysis tools by facilitating automatic low-level instrumentation. DBI-based program analysis can introduce high overhead and it is crucial for tool writers to minimize the cost. Analyzing the performance of instrumentation tools is challenging because most systems use a just-in-time compiler (JIT) to dynamically generate code. In this paper, we describe our method for analyzing the performance of instrumentation tools. The instrumented code is itself …


Efficient And Effective Branch Reordering Using Profile Data, Minghui Yang, Gang-Ryung Uh, David B. Whalley Aug 2011

Gang-Ryung Uh

The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed often results in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches that compare a common variable to constants. The goal is to obtain an ordering where the lowest average number of branches in the sequence will be executed. First, sequences of branches that can be reordered are detected in the control flow. Second, …
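
The goal of executing the lowest average number of branches can be illustrated with a small model. The sketch below assumes mutually exclusive tests of one variable, equal cost per branch, and exit probabilities taken from a profile; under those assumptions, testing the most likely case first minimizes the average. The function names and probabilities are hypothetical, and the model is a simplification rather than the paper's method.

```python
def expected_branches(probs: list[float]) -> float:
    """Average number of branches executed when tests run in list order.

    probs[i] is the probability that the i-th test is the one that succeeds.
    Any remaining probability falls through every test in the sequence.
    """
    total = 0.0
    for position, p in enumerate(probs, start=1):
        total += p * position  # exits after `position` branches
    fall_through = 1.0 - sum(probs)
    total += fall_through * len(probs)  # no test succeeded
    return total


def reorder_by_probability(probs: list[float]) -> list[float]:
    """Place the most frequently taken exits first (equal-cost assumption)."""
    return sorted(probs, reverse=True)
```

With made-up exit probabilities of 0.1, 0.2, and 0.6 in source order, the model gives an average of 2.6 branches per execution; reordering to 0.6, 0.2, 0.1 lowers it to 1.6.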


Experience With A Retargetable Compiler For A Commercial Network Processor, Jinhwan Kim, Sungjoon Jung, Yunheung Paek, Gang-Ryung Uh Aug 2011

Gang-Ryung Uh

The Paion PPII network processor is designed to meet the growing need for new high-bandwidth network equipment. In order to rapidly reconfigure the processor for frequently varying Internet services and technologies, a high-performance compiler is urgently needed. Although various code generation techniques have been proposed for DSPs or ASIPs, we found that these techniques are not easily tailored to the target Paion PPII processor due to striking architectural differences. First, we show the architectural challenges posed by the target processor. Second, we describe novel compiler techniques that effectively exploit non-orthogonal architectural features. The techniques include virtual data …


Improving Low Power Processor Efficiency With Static Pipelining, Ian Finlayson, Gang-Ryung Uh, David Whalley, Gary Tyson Aug 2011

Gang-Ryung Uh

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate …


Lego Rover Control, Gang-Ryung Uh Dec 2010

Gang-Ryung Uh

Four-wheel Lego Mars Rover that can be wirelessly controlled over Bluetooth.


Use A Mobile Phone To Control Zigbee Enabled Harex Dynamic Natural Light LEDs, Gang-Ryung Uh Aug 2010

Gang-Ryung Uh

Zigbee-based wireless ad hoc LED (light-emitting diode) network that can be accessed and controlled using smartphones.


Techniques For Effectively Exploiting A Zero Overhead Loop Buffer, Gang-Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Chris Burns, Vincent Cao Feb 2000

Gang-Ryung Uh

A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common …
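
The trade-off between loop unrolling and a loop buffer can be sketched with a back-of-envelope instruction count. The model below is an assumption-laden illustration, not a description of any particular DSP: it treats loop overhead as a fixed number of instructions per iteration, charges the ZOLB a one-time setup cost, and uses hypothetical function names and numbers.

```python
def plain_loop(body: int, overhead: int, trips: int) -> tuple[int, int]:
    """Return (instructions executed, static code size) for an ordinary loop."""
    return trips * (body + overhead), body + overhead


def unrolled_loop(body: int, overhead: int, trips: int, factor: int) -> tuple[int, int]:
    """Same loop unrolled by `factor`; assumes trips is divisible by factor."""
    iterations = trips // factor
    return iterations * (factor * body + overhead), factor * body + overhead


def zolb_loop(body: int, trips: int, setup: int) -> tuple[int, int]:
    """Loop held in a zero overhead loop buffer: no per-iteration overhead."""
    return setup + trips * body, setup + body
```

With a 4-instruction body, 2 overhead instructions, 100 iterations, an unroll factor of 4, and a 1-instruction setup, the model gives (600, 6) for the plain loop, (450, 18) for the unrolled loop, and (401, 5) for the loop buffer: unrolling trims overhead at the cost of tripling code size, while the buffer removes the per-iteration overhead without growing the code.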


Improving Performance By Branch Reordering, Minghui Yang, Gang-Ryung Uh, David B. Whalley May 1998

Gang-Ryung Uh

The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed can often result in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch …