PACT'18 tutorial: Tensor Programming with SaC

This tutorial gives a brief introduction to the high-level array language SaC and its attendant tool chain. We show how this technology can be leveraged to conveniently express tensor computations very close to abstract mathematical formulae and, at the same time, how such specifications can be mapped into efficient parallel code for heterogeneous systems that include GPUs.

We intend to combine a quick introduction with a live coding session, demonstrating the high-productivity and high-performance aspects of SaC by implementing a few neural network algorithms in less than two hours.

14:30 - 16:30 Getting started with array programming in SaC.
16:30 - 17:00 Coffee break
17:00 - 18:30 Live coding session “How to implement your parallel CNN from scratch in 90 minutes”
  • Log in to JupyterHub here. Pick any username and use the password that will be announced during the tutorial. This creates a session, which persists for 24 hours, where you can run notebooks or work in a terminal.
  • The notebook with code snippets from the first part of the tutorial is here.
  • Full code for the CNN is here. cnn.sac is a version without set comprehensions that uses a few tricks to run faster. cnn_template.sac replaces all the with-loops with set comprehensions; this is more readable but may incur some additional runtime overhead.
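
The difference between the two CNN variants can be pictured with a loose analogy. The sketch below is Python, not SaC, and is not taken from cnn.sac or cnn_template.sac; the function names and the tiny input are hypothetical and exist only for illustration. The first function spells out the output index space and the result allocation explicitly, while the second states each output element as an expression of its index, which is closer in spirit to a set comprehension.

```python
# Hypothetical illustration in Python; this is NOT SaC syntax and NOT the tutorial's code.

def conv2d_loops(img, ker):
    """'Valid' 2D convolution (no kernel flip) with explicit loops over the output indices."""
    kh, kw = len(ker), len(ker[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    # Allocate the result up front, then fill it index by index.
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += img[i + di][j + dj] * ker[di][dj]
            out[i][j] = acc
    return out


def conv2d_comprehension(img, ker):
    """Same computation, stated as 'element [i][j] is this expression'."""
    kh, kw = len(ker), len(ker[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(img[i + di][j + dj] * ker[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Made-up 3x3 input and 2x2 kernel.
img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
ker = [[1.0, 0.0],
       [0.0, 1.0]]

print(conv2d_loops(img, ker))
print(conv2d_comprehension(img, ker))
```

Both functions compute the same result. The comprehension form is shorter and reads closer to the mathematical definition, which mirrors the readability argument made for cnn_template.sac; any performance difference between the two SaC files depends on the SaC compiler and cannot be inferred from this Python sketch.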

General overview of the compiler

  • SaC: A Functional Array Language for Efficient Multithreaded Execution. Clemens Grelck, Sven-Bodo Scholz (2006). International Journal of Parallel Programming 34 (4) pp. 383–427. pdf

Use cases and performance evaluations

  • Combining High Productivity and High Performance in Image Processing Using Single Assignment C on Multi-core CPUs and Many-core GPUs. V. Wieser, C. Grelck, P. Haslinger, J. Guo, F. Korzeniowski, R. Bernecky, B. Moser, S.B. Scholz (2012). Journal of Electronic Imaging 21 (2). pdf
  • SaC/C Formulations of the All-pair N-body Problem and Their Performance on SMPs and GPGPUs. A. Šinkarovs, S.B. Scholz, R. Bernecky, R. Douma, C. Grelck (2014). Concurrency and Computation: Practice and Experience 26 (4) pp. 952–971. pdf

Code generation

  • For SMPs: Shared Memory Multiprocessor Support for Functional Array Processing in SaC. Clemens Grelck (2005). Journal of Functional Programming 15 (3) pp. 353–401. pdf
  • For GPUs: Breaking the GPU Programming Barrier with the Auto-parallelising SaC Compiler. Jing Guo, Jeyarajan Thiyagalingam, Sven-Bodo Scholz (2011). In 6th Workshop on Declarative Aspects of Multicore Programming (DAMP'11), Austin, USA. pp. 15–24. ACM Press. pdf

Some key optimisations

  • With-loop-folding in SaC — Condensing Consecutive Array Operations. Sven-Bodo Scholz (1998). In Implementation of Functional Languages, 9th International Workshop (IFL'97), St. Andrews, UK, Selected Papers. pp. 72–92. Springer. pdf
  • With-loop Scalarization: Merging Nested Array Operations. Clemens Grelck, Sven-Bodo Scholz, Kai Trojahner (2004). In Implementation of Functional Languages, 15th International Workshop (IFL'03), Edinburgh, Scotland, UK, Revised Selected Papers. Springer. pdf
  • With-loop Fusion for Data Locality and Parallelism. Clemens Grelck, Karsten Hinckfuß, Sven-Bodo Scholz (2006). In Implementation and Application of Functional Languages, 17th International Workshop (IFL'05), Dublin, Ireland, Revised Selected Papers. pp. 178–195. Springer. pdf
  • Index Vector Elimination: Making Index Vectors Affordable. Robert Bernecky, Stephan Herhut, Sven-Bodo Scholz, Kai Trojahner, Clemens Grelck, Alex Shafarenko (2007). In Implementation and Application of Functional Languages, 18th International Symposium (IFL'06), Budapest, Hungary, Revised Selected Papers. pp. 19–36. Springer. pdf

For more SaC-related publications, please see here.

Arrays have always been a key data structure in high-performance computing. With the increased interest in neural networks and deep learning, the question of effective abstractions for array programming, and of their translation into highly efficient parallel code for heterogeneous systems, has gained fresh urgency. With compute power now cheap, productivity becomes an increasingly important aspect of high-performance computing. At the same time, performance portability takes centre stage as the range of parallel hardware architectures becomes increasingly heterogeneous.

The SaC approach offers a solution to this Performance-Productivity-Portability challenge. The functional nature of the language offers very powerful declarative abstractions while still allowing the compiler to apply aggressive optimisations, such as auto-parallelisation, auto-offloading to accelerators, and loop optimisations, which lead to code that performs on par with hand-optimised programs. The design of SaC builds on the psi-calculus as its underlying array theory and follows a rich history of array languages such as APL, SISAL, Fortran 90, and MATLAB, but provides a more generic substrate, suitable for conveniently expressing the high-dimensional tensor operations that lie at the core of deep learning applications. Potentially, this turns SaC into an efficient DSL for deep learning.

We believe PACT provides the ideal audience for this tutorial as it primarily attracts compiler experts with practical expertise at the forefront of modern parallel architectures.

Sven-Bodo Scholz and Artjoms Šinkarovs are computer science researchers working in the area of high-performance compilers and functional languages. Sven-Bodo Scholz is the original author of the SaC compiler and Artjoms Šinkarovs is one of the main contributors.