MAERI 2.0 Tutorial @ ICS 2022

Overview

In this tutorial, we will educate the research community about the challenges in the emerging domain of Neural Network Accelerators, demonstrate the capabilities of MAERI 2.0 with examples, and discuss ongoing development efforts.

Date

  • Jun 27th, 2022, 11:00 - 12:30 (CET) / 14:00 - 15:30 (ET)
  • Virtual event.

Organizers

Tushar Krishna (Georgia Tech) | Website | LinkedIn

Tushar Krishna has been an Associate Professor in the School of ECE at Georgia Tech since 2015. He also holds the ON Semiconductor Endowed Junior Professorship. He has a Ph.D. in Electrical Engineering and Computer Science from MIT (2014), an M.S.E. in Electrical Engineering from Princeton University (2009), and a B.Tech. in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi (2007). Dr. Krishna’s research spans the computing stack: from circuits/physical design to microarchitecture to system software. His key focus area is architecting the interconnection networks and communication protocols for efficient data movement within computer systems, both on-chip and in the cloud.
Jianming Tong (Georgia Tech) | Website | LinkedIn

Jianming Tong is a Ph.D. student at the Georgia Institute of Technology, which he joined in Spring 2021, advised by Prof. Tushar Krishna. He received his bachelor's degree from Xi'an Jiaotong University in 2020, supervised by Prof. Pengju Ren. He was a Research Assistant at Tsinghua University working with Prof. Yu Wang. His research interests include on-chip networks and efficient, high-performance domain-specific architecture design for neural network processing and homomorphic encryption acceleration.

Description

Machine learning (ML) has become pervasive today. ML for science relies on leveraging Deep Neural Networks (DNNs) in scientific problems, which makes it crucial for HPC systems to run DNNs efficiently. To this end, modern HPC systems are increasingly deploying customized high-performance dataflow accelerators (e.g., Cerebras, SambaNova) and FPGAs for running next-generation DNNs. The focus of this tutorial is on demonstrating the design and deployment of a dataflow accelerator called MAERI on FPGAs. Existing frameworks like Gemmini, ESP, and OpenCGRA all support end-to-end deployment of DNN models, either through simulation or on a real FPGA. However, all of them assume a fixed underlying accelerator and can only explore mapping choices on that fixed accelerator architecture. There is still a need to answer which hardware architecture to choose and which hardware parameters are optimal for a given pattern of workloads.

As an ongoing project, we have been developing a framework, MAERI-FPGA, to deploy software models on a flexible-template FPGA accelerator with parallelism, scheduling, and hardware size as controllable knobs. Currently, MAERI-FPGA is deployed on Xilinx FPGAs (both embedded boards and PCI-E accelerator cards) to support deep learning models directly from the PyTorch framework. To the best of our knowledge, MAERI-FPGA is the first open-source FPGA framework for exploring the dataflow accelerator design space on a real FPGA with an end-to-end setup. In this tutorial, we will educate the research community about the impact of different design choices on accelerators, demonstrate the capabilities of MAERI-FPGA with examples, and discuss ongoing development efforts.
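The MAERI-FPGA code is distributed on request (see Repositories below), so the sketch below is only illustrative of the flow described above: steps 1 and 2 use standard PyTorch calls, while the `maeri_fpga` module and the `configure_accelerator` / `compile_and_deploy` functions are hypothetical placeholder names for the configuration and deployment steps, not the released API.

```python
# Illustrative sketch of the PyTorch-to-FPGA flow; `maeri_fpga` and its
# functions below are hypothetical placeholders, not the actual API.
import torch
import torchvision.models as models

# 1. Start from a standard PyTorch model.
model = models.resnet18(pretrained=True).eval()

# 2. Quantize to a fixed-point format suitable for the accelerator datapath
#    (PyTorch's built-in dynamic quantization shown as a stand-in for the
#    tutorial's quantization flow).
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# 3. Pick the controllable knobs mentioned above: parallelism, scheduling,
#    and hardware size (names and values are illustrative only).
# accel_cfg = maeri_fpga.configure_accelerator(
#     num_multipliers=64,            # hardware size
#     dataflow="weight-stationary",  # scheduling / mapping choice
#     on_chip_buffer_kb=512,         # explored in DEMO 2
# )

# 4. Compile for the chosen configuration and deploy to a Xilinx board
#    (embedded or PCI-E), as in DEMO 1.
# maeri_fpga.compile_and_deploy(quantized, accel_cfg, target="xilinx-board")
```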

Target Audience:

The tutorial targets students, faculty, and researchers who want to

  • understand how to design DNN accelerators, or
  • study performance implications of dataflow mapping strategies, or
  • prototype a DNN accelerator (on ASIC or FPGA)

Schedule

Start (ET) End (ET) Agenda Presenter Resources
14:00 14:30 Introduction and Background of NN Accelerator Tushar Slide
    Overview of Neural Network Inference Workload    
    Dataflow and Mapping in Neural Network    
    DNN Accelerator Flexibility    
14:30 14:40 Break    
14:40 15:10 MAERI 2.0: Flexible NN Accelerator Framework Jianming Slide
    Supported Neural Network Model    
    Quantization Flow    
    Memory Layout    
    Heterogeneous Scheduling    
    MAERI 2.0 Microarchitecture    
15:10 15:30 Demo Jianming Slide
    DEMO 1: Deploy MAERI 2.0 on Xilinx Embedded FPGA Board    
    DEMO 2: Explore Performance Benefit of on-chip Buffer using MAERI 2.0    

Resources

Installation


Repositories

To be released; currently available upon request. Please email jianming.tong@gatech.edu if you need the source code (MAERI-verilog).

Core Developers

  • Jianming Tong (jianming.tong@gatech.edu, Georgia Tech)
  • Yangyu Chen (yangyuchen@gatech.edu, Georgia Tech)

Additional Contributors

  • Tushar Krishna (tushar@ece.gatech.edu, Georgia Tech)
  • Abhimanyu Bambhaniya (abambhaniya3@gatech.edu, Georgia Tech)
  • Taekyung Heo (taekyung@gatech.edu, Georgia Tech)
  • Yue Pan (ypan331@gatech.edu, Georgia Tech)