Producing optimized accelerators is tedious, as even modern HDLs (Hardware Description Languages), such as Chisel, require reasoning about low-level concepts. Recent functional approaches, such as Aetherling and SHIR, treat hardware as a composition of pure operators. This raises the abstraction level, allowing for systematic optimizations through rewrite rules for FPGAs (Field Programmable Gate Arrays).

These approaches have so far been limited to small, fixed-function accelerators. Recent work maps neural networks to FPGAs by sharing coarse-grained functions via the Let construct. However, as the number of call sites or parallelism increases, synthesis fails due to increased routing congestion.

These limitations are addressed with a new way to express sharing in a functional IR (Intermediate Representation). By combining the Reduce and SwitchApply primitives over an instruction stream, functions become programmable, with shared control logic and a datapath, reducing routing pressure. Upper-bounded streams further enable sharing across varying input sizes. Across networks from LeNet-5 to ResNet, the resulting FPGA designs remain routable, delivering speedups of 1.1× to 3.4× over prior work.
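As a rough software analogy for the idea above, the sketch below folds a stream of instructions over some data, with each instruction selecting one of several shared operators. Only the names Reduce and SwitchApply come from the abstract; the Python functions `switch_apply` and `run_program`, the three toy operators, and the integer instruction encoding are hypothetical and say nothing about the actual IR semantics or the generated hardware.

```python
from functools import reduce

# Hypothetical stand-ins for coarse-grained functions that would be shared
# in hardware. They are toy list operations, not real network layers.
def scale(xs):
    return [2 * v for v in xs]

def relu(xs):
    return [max(0, v) for v in xs]

def pool_pairs(xs):
    # Max over adjacent pairs; a trailing unpaired element is dropped.
    return [max(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]

OPS = (scale, relu, pool_pairs)

def switch_apply(ops, selector, data):
    """Apply the operator chosen by `selector` (assumed: an index into `ops`)."""
    return ops[selector](data)

def run_program(ops, instructions, data):
    """Fold the instruction stream over the data, one shared dispatch per step."""
    return reduce(
        lambda state, instr: switch_apply(ops, instr, state),
        instructions,
        data,
    )

# The same dispatcher runs different "programs" by changing only the stream.
result = run_program(OPS, [0, 1, 2], [1, -2, 3, -4])
```

In this analogy, every call site goes through the single `switch_apply` dispatcher instead of holding its own copy of each operator, which mirrors the sharing the abstract describes only loosely.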

Tue 16 Jun

Displayed time zone: Mountain Time (US & Canada)

13:40 - 15:20
Session 4: Specialized Hardware and Accelerator Design
LCTES at Flatirons 3
Chair(s): Jongouk Choi University of Central Florida
13:40
22m
Talk
Can Fine-Grain Multi-threading Subsume VLIW?
LCTES
Scott Pomerville Northern Michigan University, Soner Onder Michigan Technological University, Gang-Ryung Uh Florida State University, David Whalley Florida State University
14:02
22m
Talk
Sirop: A Small IR for HLS with Parallel Patterns
Results Reproduced, Artifacts Available, Artifacts Evaluated
LCTES
Louis Hildebrand McGill University, Christophe Dubach McGill University
14:24
22m
Talk
A Functional Approach to Synthesizing Routable Programmable Accelerators for Neural Networks
Results Reproduced, Artifacts Available, Artifacts Evaluated
LCTES
Tzung-Han Juang McGill University, Paul Teng McGill University, Canada, Christophe Dubach McGill University
14:46
22m
Talk
LoopHint: A Compiler-Assisted Loop Branch Predictor for Embedded DSPs
Remote
LCTES
Yuanyang Xiang Institute of Automation, Chinese Academy of Sciences, Chen Xu, xiaoruozhou Institute of Automation, Chinese Academy of Sciences, Zhiwei Zhang Institute of Automation, Chinese Academy of Sciences