New Course on Accelerator Architectures
This spring I am introducing a new grad course! Details below.
EECE 571T: Compute Accelerator Architectures
Thursdays from 9:30 to 12:30 in FSC 1402 (first meeting Jan 10)
Summary
With the approaching end of Moore's Law, computer systems developers are confronted with the challenge of increasing computing performance without relying on faster and more plentiful transistors. This course explores the leading approach to this problem that has emerged in both industry and academic research: the use of computation accelerator architectures.
This course will provide students with a foundation for understanding both programmable and fixed-function accelerator architectures. The initial portion of the course will cover graphics processing units (GPUs), which are commonly used today for training deep neural networks. The later portion will focus on more specialized accelerators, with an emphasis on machine learning accelerators.
The course will involve programming assignments (to get familiar with using computer architecture simulators), research paper readings and presentations, and a final project.
Evaluation
Assignments: 15%
Weekly paper reading quizzes: 25%
Presentations: 20%
Project: 40%
Topics

Course overview
Review of computer architecture
 Instructions
 Pipelining
 Caches
 Memory and memory-access scheduling
 Multicore and multithreading
Graphics processing unit architectures
 GPU programming model
 GPU instruction set architecture
 One-loop approximation (multithreading and the SIMT model)
 Two-loop approximation (register scoreboard, operand collector)
 Three-loop approximation (caches, pending memory request tables, memory controller)
 Introduction to the GPGPU-Sim simulator
Machine learning accelerators
 Brief review of deep neural networks
  Linear regression and classification
  Single-layer networks
  Multilayer networks and backpropagation
  Convolutional neural networks
  Survey of some recent deep networks
 Inference acceleration architectures
  Approximation (bit-width reduction)
  Ineffectual computations (skipping multiplication by zero)
  Memory organization for faster acceleration
  Industry examples (whatever is publicly known about them)
 Semi-programmable ML accelerators
Other compute accelerators
 Media encoders/decoders (e.g., H.264)
 Network switches and network processors
 Digital signal processors
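To give a flavor of the "approximation (bit-width reduction)" topic above, here is a minimal software sketch of symmetric linear quantization of weights to 8 bits. The function names and the example values are my own illustration, not course material; real inference accelerators implement this mapping in fixed-function datapaths.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float32 weights to int8.

    Illustrative helper (not from the course): maps the
    largest-magnitude weight to +/-127 and rounds everything
    else to the nearest quantization step.
    """
    scale = np.max(np.abs(w)) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# 32-bit weights shrink to 8 bits at a small accuracy cost.
w = np.array([0.81, -0.43, 0.002, -1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The point accelerators exploit is that an 8-bit multiplier is far smaller and lower-energy than a 32-bit floating-point one, while the reconstruction error stays within one quantization step.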
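The "ineffectual computations" topic above can likewise be sketched in software. This hypothetical function mimics the hardware idea of gating off multiply-accumulates whose activation operand is zero (common after ReLU), which zero-skipping accelerators such as Cnvlutin exploit to save cycles and energy; the hedged sketch only counts skipped MACs rather than modeling any real datapath.

```python
def sparse_dot(activations, weights):
    """Dot product that skips ineffectual (zero-valued) multiplications.

    Illustrative only: a real accelerator would detect zeros in
    hardware and keep its multipliers busy with effectual work.
    """
    total = 0.0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0.0:  # ineffectual: the product is zero, skip the MAC
            skipped += 1
            continue
        total += a * w
    return total, skipped

# Half the activations are zero, so half the MACs are skipped.
total, skipped = sparse_dot([0.0, 1.5, 0.0, 2.0], [0.3, 0.2, 0.9, 0.5])
```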