Can Fine-Grain Multi-threading Subsume VLIW?
We explore the question: “Can a fine-grain multi-threaded architecture form the basis for an efficient, VLIW-style, statically scheduled architecture?” We show that the operations comprising a VLIW instruction can indeed be viewed as belonging to separate threads, such that the number of such operations is equivalent to the number of threads representing the program's semantics. However, realizing the lock-step execution model of VLIW processors requires a synchronization mechanism more efficient than data synchronization. This synchronization is accomplished through the instruction space, using a small number of bits in each instruction under compiler control. We call the resulting architecture a “Synchronized Lane Architecture (SLA)”.
The SLA approach makes embedding no-operations in the code unnecessary in most cases, and the architecture can dynamically adapt to changing levels of ILP while permitting code compiled for a narrow-width processor to run unmodified on a wider processor.
We present this novel architecture paradigm, as well as the mechanism for transforming traditional VLIW code so that it can be executed by our SLA processor. We evaluate the new paradigm against a conventional VLIW architecture and demonstrate that the SLA approach delivers performance similar to that of VLIW processors while providing significant energy and code size savings.
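The abstract does not specify the SLA encoding, so the following is only a toy sketch of the general idea that a few compiler-set bits per operation can replace explicit no-operations. It assumes a generic one-bit "stop" marker that ends an issue group; this is not the paper's mechanism, and the names `compress`, `issue_groups`, and the sample operations are hypothetical.

```python
# Toy model only: a generic "stop bit" scheme, NOT the SLA encoding.
# All names and the sample schedule below are hypothetical.

NOP = None  # an empty slot in a fixed-width VLIW bundle


def compress(bundles):
    """Flatten fixed-width bundles into (op, stop) pairs, dropping NOP slots.

    stop=True marks the last operation of an issue group.
    """
    stream = []
    for bundle in bundles:
        ops = [op for op in bundle if op is not NOP]
        if not ops:
            # A bundle that is entirely NOPs (a pure delay cycle) still
            # needs one explicit placeholder in this toy scheme.
            ops = ["nop"]
        for i, op in enumerate(ops):
            stream.append((op, i == len(ops) - 1))
    return stream


def issue_groups(stream, width):
    """Rebuild the issue groups on a machine with `width` lanes."""
    groups, current = [], []
    for op, stop in stream:
        current.append(op)
        if len(current) > width:
            raise ValueError("issue group is wider than the machine")
        if stop:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # tolerate a stream with no final stop bit
    return groups


# A 4-wide schedule with many empty slots.
bundles = [
    ("add", "mul", NOP, NOP),
    ("ld", NOP, NOP, NOP),
    ("sub", "st", "br", NOP),
]
stream = compress(bundles)
vliw_slots = sum(len(b) for b in bundles)  # 12 slots in the padded form
print(vliw_slots, len(stream))             # 12 6
print(issue_groups(stream, 4))
print(issue_groups(stream, 8) == issue_groups(stream, 4))  # True
```

In this sketch the same compressed stream decodes to identical issue groups on a 4-lane and an 8-lane machine, which mirrors, in a much simplified way, the claim that code for a narrow machine can run unmodified on a wider one. An all-NOP bundle still costs one placeholder here, so padding is avoided in most cases rather than all.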
Tue 16 Jun
Displayed time zone: Mountain Time (US & Canada)
13:40 - 15:20 | Session 4: Specialized Hardware and Accelerator Design | LCTES at Flatirons 3 | Chair(s): Jongouk Choi (University of Central Florida)
13:40 | 22m | Talk | Can Fine-Grain Multi-threading Subsume VLIW? | LCTES | Scott Pomerville (Northern Michigan University), Soner Onder (Michigan Technological University), Gang-Ryung Uh (Florida State University), David Whalley (Florida State University)
14:02 | 22m | Talk | Sirop: A Small IR for HLS with Parallel Patterns | LCTES
14:24 | 22m | Talk | A Functional Approach to Synthesizing Routable Programmable Accelerators for Neural Networks | LCTES | Tzung-Han Juang (McGill University), Paul Teng (McGill University, Canada), Christophe Dubach (McGill University)
14:46 | 22m | Talk (Remote) | LoopHint: A Compiler-Assisted Loop Branch Predictor for Embedded DSPs | LCTES | Yuanyang Xiang (Institute of Automation, Chinese Academy of Sciences), Chen Xu, xiaoruozhou (Institute of Automation, Chinese Academy of Sciences), Zhiwei Zhang (Institute of Automation, Chinese Academy of Sciences)