We explore the question: "Can a fine-grain multi-threaded architecture form the basis for an efficient, VLIW-style, statically scheduled architecture?" We show that the operations comprising a VLIW instruction can indeed be viewed as belonging to separate threads, so that the number of such operations equals the number of threads representing the program's semantics. However, realizing the lock-step execution model of VLIW processors requires a synchronization mechanism more efficient than data synchronization. This synchronization is accomplished through the instruction space, using a small number of bits in each instruction under compiler control. We call the resulting architecture a "Synchronized Lane Architecture" (SLA).
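The abstract does not spell out the SLA instruction encoding, so the following Python sketch is only a toy model of the idea in the paragraph above: each slot of a VLIW bundle becomes an operation in its own lane (thread), NOP slots are dropped, and each surviving operation carries one compiler-set sync bit. The names (`Op`, `LaneInstr`, `to_lanes`) and the rule for setting the bit are assumptions for illustration, not the paper's mechanism. Here the bit is set when an operation reads a register last written by a different lane; a real scheme would also have to handle anti- and output dependences and operation latencies, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    dest: str              # register written by the operation
    srcs: Tuple[str, ...]  # registers read by the operation
    text: str              # mnemonic, for display only


@dataclass(frozen=True)
class LaneInstr:
    op: Op
    sync: bool  # hypothetical 1-bit, compiler-set synchronization flag


Bundle = List[Optional[Op]]  # one slot per lane; None stands for a NOP


def to_lanes(bundles: List[Bundle]) -> List[List[LaneInstr]]:
    """Split VLIW bundles into per-lane streams, dropping NOP slots.

    Assumed rule: set the sync bit when an operation reads a register whose
    most recent writer (in bundle order) sits in a different lane.
    """
    width = len(bundles[0])
    lanes: List[List[LaneInstr]] = [[] for _ in range(width)]
    last_writer_lane: Dict[str, int] = {}  # register -> lane that last wrote it
    for bundle in bundles:
        # Operations in one bundle read their operands before any of them
        # writes, so only writers from earlier bundles are consulted here.
        # A register never written so far counts as local (no sync needed).
        for lane, op in enumerate(bundle):
            if op is not None:
                cross_lane = any(
                    last_writer_lane.get(reg, lane) != lane for reg in op.srcs
                )
                lanes[lane].append(LaneInstr(op, cross_lane))
        # Record this bundle's writes after all of its reads were checked.
        for lane, op in enumerate(bundle):
            if op is not None:
                last_writer_lane[op.dest] = lane
    return lanes


# Usage: a 2-wide program with a load-latency gap filled by NOPs.
ld = Op("r1", ("r0",), "ld r1, 0(r0)")
mul = Op("r3", ("r0",), "mul r3, r0, r0")
add = Op("r2", ("r1",), "add r2, r1, 1")
program = [
    [ld, mul],     # bundle 0: both lanes busy
    [None, None],  # bundle 1: all NOPs
    [None, add],   # bundle 2: lane 1 consumes r1, produced in lane 0
]
lanes = to_lanes(program)
for i, stream in enumerate(lanes):
    print(i, [(ins.op.text, ins.sync) for ins in stream])
# 0 [('ld r1, 0(r0)', False)]
# 1 [('mul r3, r0, r0', False), ('add r2, r1, 1', True)]
```

The all-NOP bundle disappears entirely, and only the `add`, which depends on a value from another lane, is flagged; the `mul` in lane 1 needs no synchronization because it reads only `r0`.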

In most cases, the SLA approach makes it unnecessary to embed no-operations (NOPs) in the code. The architecture can also adapt dynamically to changing levels of ILP, and code compiled for a narrow-width processor runs unmodified on a wider processor.
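To make the code-size and portability claims concrete, here is a toy Python sketch under stated assumptions: uncompressed VLIW bundles in which every slot, NOPs included, is encoded at a fixed `op_bits` width, versus lane streams that keep only real operations, each widened by a hypothetical `sync_bits` field. The function names and the lane-placement rule (a narrow program simply occupies the first lanes of a wider machine, leaving the rest idle) are illustrative assumptions, not the paper's design. Many real VLIW ISAs already compress NOPs, so the gap computed here overstates what such a comparison would show for them; the numbers are not results from the paper.

```python
from typing import List, Optional

Bundle = List[Optional[str]]  # one slot per lane; None is a NOP


def vliw_code_bits(bundles: List[Bundle], op_bits: int) -> int:
    """Size if every slot, NOPs included, is encoded at full width."""
    return sum(len(bundle) for bundle in bundles) * op_bits


def sla_code_bits(bundles: List[Bundle], op_bits: int, sync_bits: int) -> int:
    """Size if only real operations are kept, each widened by sync_bits."""
    real_ops = sum(1 for bundle in bundles for slot in bundle if slot is not None)
    return real_ops * (op_bits + sync_bits)


def lanes_of(bundles: List[Bundle]) -> List[List[str]]:
    """Per-lane operation streams with NOP slots dropped."""
    width = len(bundles[0])
    return [[b[lane] for b in bundles if b[lane] is not None] for lane in range(width)]


def place_on_machine(streams: List[List[str]], machine_width: int) -> List[List[str]]:
    """Assumed mapping: a narrower program occupies the first lanes; extra lanes idle."""
    if len(streams) > machine_width:
        raise ValueError("program needs more lanes than the machine has")
    return streams + [[] for _ in range(machine_width - len(streams))]


# Usage: a 4-wide program with mostly empty slots.
program = [
    ["ld", "mul", None, None],
    [None, None, None, None],
    [None, "add", None, None],
]
print(vliw_code_bits(program, 32))    # 384 (12 slots x 32 bits)
print(sla_code_bits(program, 32, 1))  # 99 (3 ops x 33 bits)

# A 2-wide program placed, unmodified, on a hypothetical 4-lane machine.
two_wide = lanes_of([["ld", "mul"], [None, "add"]])
print(place_on_machine(two_wide, 4))  # [['ld'], ['mul', 'add'], [], []]
```

The savings in this model grow with the fraction of empty slots, which is why NOP-heavy, low-ILP code benefits most.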

We present this novel architecture paradigm, as well as the mechanism for transforming traditional VLIW code so that it can be executed by our SLA processor. We evaluate the new paradigm against a conventional VLIW architecture and demonstrate that the SLA approach delivers performance similar to that of VLIW processors while providing significant energy and code-size savings.

Tue 16 Jun

Displayed time zone: Mountain Time (US & Canada)

13:40 - 15:20
Session 4: Specialized Hardware and Accelerator Design, LCTES at Flatirons 3
Chair(s): Jongouk Choi (University of Central Florida)
13:40
22m
Talk
Can Fine-Grain Multi-threading Subsume VLIW?
LCTES
Scott Pomerville (Northern Michigan University), Soner Onder (Michigan Technological University), Gang-Ryung Uh (Florida State University), David Whalley (Florida State University)