Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 17 of 17
Full-Text Articles in Physical Sciences and Mathematics
Scheduling Instruction Effects For A Statically Pipelined Processor, F. Rasapour, G. Cook, G.-R. Uh
Gang-Ryung Uh
Statically pipelined processors have a fully exposed datapath where all portions of the pipeline are directly controlled by effects within an instruction, which simplifies hardware and enables a new level of compiler optimizations. This paper describes an effect scheduling strategy to aggressively compact instructions, which has a critical impact on code size and performance. Unique scheduling challenges include more frequent name dependences and fewer renaming opportunities due to static pipeline (SP) registers being dedicated for specific operations. We also realized the SP in a hardware description language (VHDL) to evaluate the real energy benefits. Despite the compiler challenges, we achieve …
Instruction Re-Selection For Iterative Modulo Scheduling On High Performance Multi-Issue DSPs, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek
Gang-Ryung Uh
Iterative modulo scheduling is very important for compilers targeting high-performance multi-issue digital signal processors (DSPs). These processors are often severely limited by idle functional units, so reducing the number of idle units can have a significant positive impact on performance. However, complex instructions such as mac, which are used in most recent DSPs, usually increase data dependence complexity. Such complex dependences, which exist in signal processing applications, often restrict modulo scheduling freedom and therefore become a limiting factor for the iterative modulo scheduler.
In this work, we propose a technique that efficiently reselects instructions …
Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Själander, Gary Tyson
Gang-Ryung Uh
A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that …
Improving Processor Efficiency By Statically Pipelining Instructions, Ian Finlayson, Brandon Davis, Peter Gavin, Gang-Ryung Uh, David Whalley, Magnus Sjalander, Gary Tyson
Gang-Ryung Uh
A new generation of applications requires reduced power consumption without sacrificing performance. Instruction pipelining is commonly used to meet application performance requirements, but some implementation aspects of pipelining are inefficient with respect to energy usage. We propose static pipelining as a new instruction set architecture to enable more efficient instruction flow through the pipeline, which is accomplished by exposing the pipeline structure to the compiler. While this approach simplifies hardware pipeline requirements, significant modifications to the compiler are required. This paper describes the code generation and compiler optimizations we implemented to exploit the features of this architecture. We show that …
An Overview Of Static Pipelining, Ian Finlayson, Gang-Ryung Uh, David Whalley, Gary Tyson
Gang-Ryung Uh
A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and the ability of the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate …
Tuning The WCET Of Embedded Applications, Wankang Zhao, Prasad Kulkarni, David Whalley, Christopher Healy, Frank Mueller, Gang-Ryung Uh
Gang-Ryung Uh
It is advantageous to not only calculate the WCET of an application, but to also perform transformations to reduce the WCET since an application with a lower WCET will be less likely to violate its timing constraints. In this paper we describe an environment consisting of an interactive compilation system and a timing analyzer, where a user can interactively tune the WCET of an application. After each optimization phase is applied, the timing analyzer is automatically invoked to calculate the WCET of the function being tuned. Thus, a user can easily gauge the progress of reducing the WCET. In addition, …
Branch Elimination Via Multi-Variable Condition Merging, William Kreahling, David Whalley, Mark Bailey, Xin Yuan, Gang-Ryung Uh, Robert Van Engelen
Gang-Ryung Uh
Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code-improving transformations from being applied. In this paper we describe profile-based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. First, we gather profile information to detect the frequently executed paths in a program. Second, we detect sets of conditions in frequently executed paths that can be merged into a single …
Code Optimizations For A VLIW-Style Network Processing Unit, Jinhwan Kim, Yunheung Paek, Gang-Ryung Uh
Gang-Ryung Uh
The explosive growth in network bandwidth and Internet services such as QoS (quality of service) and SLA (service level agreement) monitoring has created the need for new networking hardware called a Network Processing Unit (NPU). In order to rapidly reconfigure the NPU for frequently varying Internet services and technologies, a high-performance C compiler is urgently needed. Several code generation techniques, which are intended to meet the high code quality demands of other types of application-specific instruction-set processors (ASIPs) like digital signal processors (DSPs), have already been developed. However, these techniques are insufficient for NPUs due to striking architectural …
Preprocessing Strategy For Effective Modulo Scheduling On Multi-Issue Digital Signal Processors, Doosan Cho, Ravi Ayyagari, Gang-Ryung Uh, Yunheung Paek
Gang-Ryung Uh
To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic …
Analyzing Dynamic Binary Instrumentation Overhead, Gang-Ryung Uh, Robert Cohn, Bharadwaj Yadavalli, Ramesh Peri, Ravi Ayyagari
Gang-Ryung Uh
Robust and powerful software instrumentation tools are essential for dynamic program analysis tasks such as profiling, performance evaluation, and bug detection. Dynamic binary instrumentation (DBI) is a general purpose technique that eases the development of program analysis tools by facilitating automatic low-level instrumentation. DBI-based program analysis can introduce high overhead and it is crucial for tool writers to minimize the cost. Analyzing the performance of instrumentation tools is challenging because most systems use a just-in-time compiler (JIT) to dynamically generate code. In this paper, we describe our method for analyzing the performance of instrumentation tools. The instrumented code is itself …
Efficient And Effective Branch Reordering Using Profile Data, Minghui Yang, Gang-Ryung Uh, David B. Whalley
Gang-Ryung Uh
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines now rely on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed often results in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches that compare a common variable to constants. The goal is to obtain an ordering where the lowest average number of branches in the sequence will be executed. First, sequences of branches that can be reordered are detected in the control flow. Second, …
Experience With A Retargetable Compiler For A Commercial Network Processor, Jinhwan Kim, Sungjoon Jung, Yunheung Paek, Gang-Ryung Uh
Gang-Ryung Uh
The Paion PPII network processor is designed to meet the growing need for new high bandwidth network equipment. In order to rapidly reconfigure the processor for frequently varying internet services and technologies, a high performance compiler is urgently needed. Although various code generation techniques have been proposed for DSPs or ASIPs, we found that these techniques are not easily tailored to the target Paion PPII processor due to striking architectural differences. First, we show the architectural challenges posed by the target processor. Second, we describe novel compiler techniques that effectively exploit non-orthogonal architectural features. The techniques include virtual data …
Improving Low Power Processor Efficiency With Static Pipelining, Ian Finlayson, Gang-Ryung Uh, David Whalley, Gary Tyson
Gang-Ryung Uh
A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and the ability of the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate …
Lego Rover Control, Gang-Ryung Uh
Gang-Ryung Uh
Use A Mobile Phone To Control Zigbee Enabled Harex Dynamic Natural Light LEDs, Gang-Ryung Uh
Gang-Ryung Uh
Techniques For Effectively Exploiting A Zero Overhead Loop Buffer, Gang-Ryung Uh, Yuhong Wang, David Whalley, Sanjay Jinturkar, Chris Burns, Vincent Cao
Gang-Ryung Uh
Improving Performance By Branch Reordering, Minghui Yang, Gang-Ryung Uh, David B. Whalley
Gang-Ryung Uh