Professor Guy Lemieux

I am a Professor in the Department of Electrical and Computer Engineering at the University of British Columbia in Vancouver, British Columbia, Canada.

My research is concerned with programmable chips known as FPGAs, which is short for Field-Programmable Gate Arrays. These are universal chips, capable of emulating any other digital chip. Of course, this emulation capability comes with some overhead in the form of cost and performance -- a key goal of my research is to drive down the cost as well as improve the speed and power dissipation of these chips. I have done this through optimization at various levels including the transistor-level design and architecture (internal organization) of the device, as well as the CAD tools that map circuits into the device.

My latest work focuses on improving designer productivity, primarily by making FPGAs easier to use. I'm especially interested in compute-oriented applications. FPGAs have a reputation for being very difficult to "program", particularly among software-oriented designers, the key users in compute-oriented applications. To help, I am a strong advocate for the use of overlay architectures, which are digital circuits built on top of FPGAs that make them easier to program. Overlays are like a new type of FPGA, in that they themselves are programmable, but they are more application-specific and have fewer users, making them unlikely to be built as custom chips. Nevertheless, my research has shown that regular C programs can be easily mapped to processor-like overlays, and they can be accelerated significantly.

I co-founded VectorBlox Computing, which was acquired by Microchip in September 2019. At VectorBlox, we designed a vector accelerator system known as the VectorBlox MXP (MatriX Processor) that operates directly on 1D, 2D and 3D tensors. MXP has four key architectural features that improve efficiency: a scratchpad, hardware DMA, sub-word SIMD, and custom instructions. Instead of traditional named vector registers, MXP uses an addressable scratchpad. Because the scratchpad is addressable, vectors are simply pointers in C; this allows any number of vectors of arbitrary length to be formed without any internal fragmentation, reduces the need for data duplication and data movement, and allows use of a stack-based ABI for nesting accelerated vector functions. Hardware DMA makes efficient use of a wide, dedicated path to external memory for transferring 1D and 2D tensors, and it operates concurrently with computation. Sub-word SIMD provides increased parallelism for operations on byte and halfword elements. Custom instructions make it easy to attach highly pipelined hardware into a C-programmed environment, where the hardware design effort focuses on data operations, not on data storage, staging, or movement (which remain in C). Furthermore, the MXP is fully portable, allowing the use of almost any host processor and almost any C compiler without the need for any compiler modifications. We measured a speedup of nearly 10,000 times on an N-body physics problem. With this level of acceleration, compiler autovectorization is almost useless, because reaching it demands careful planning of code structure and data layout, which are best done manually using compiler intrinsics.
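
Two of the ideas above, vectors as plain pointers into an addressable scratchpad and sub-word SIMD on byte elements, can be modelled in ordinary C. The sketch below is only a software illustration under stated assumptions: it is not MXP code, it does not use the VectorBlox API, and every name in it (scratchpad_t, sp_alloc, sp_release, swar_add_u8x4, vec_add_u8) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of an addressable scratchpad.
   None of these names come from the real MXP API. */
#define SP_BYTES 1024u

typedef struct {
    uint8_t mem[SP_BYTES];
    size_t  top;              /* next free byte; grows like a stack */
} scratchpad_t;

/* A vector is just a pointer into the scratchpad. Exactly n bytes are
   reserved, so vectors of any length pack back to back. Returns NULL
   when the scratchpad is full. */
static uint8_t *sp_alloc(scratchpad_t *sp, size_t n)
{
    if (n > SP_BYTES - sp->top) return NULL;
    uint8_t *v = &sp->mem[sp->top];
    sp->top += n;
    return v;
}

/* Stack-style release: a nested function restores the mark it saved
   on entry, freeing everything it allocated. */
static void sp_release(scratchpad_t *sp, size_t mark)
{
    assert(mark <= sp->top);
    sp->top = mark;
}

/* Sub-word SIMD in software: add four packed bytes held in one 32-bit
   word, each lane wrapping modulo 256, with no carry between lanes. */
static uint32_t swar_add_u8x4(uint32_t x, uint32_t y)
{
    uint32_t low7 = (x & 0x7F7F7F7Fu) + (y & 0x7F7F7F7Fu);
    return low7 ^ ((x ^ y) & 0x80808080u);
}

/* Element-wise byte add over n elements, four lanes per word.
   memcpy keeps the word accesses free of alignment assumptions. */
static void vec_add_u8(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    assert(dst && a && b);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t x, y, s;
        memcpy(&x, a + i, 4);
        memcpy(&y, b + i, 4);
        s = swar_add_u8x4(x, y);
        memcpy(dst + i, &s, 4);
    }
    for (; i < n; ++i)          /* leftover elements, one at a time */
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

A caller would allocate operands with sp_alloc, pass the returned pointers to vec_add_u8 like any other C arrays, and call sp_release with a saved value of top when finished. In real hardware the lanes are processed by parallel ALUs rather than by bit masking; the masking here only imitates that behaviour on a scalar processor.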

My work on interconnect design for FPGAs resulted in a book, published in November 2003. I received a Best Paper Award at the 2004 IEEE International Conference on Field-Programmable Technology. My 2001 paper, "Using Sparse Crossbars within LUT Clusters", is included as part of FPGA20, the Top 25 contributions in the First 20 Years of the International Symposium on FPGAs between 1992 and 2011.

Some of my past work on multiprocessing can be found at the University of Toronto.

Affiliations

UBC System-on-Chip Research Group
Institute for Computing, Information & Cognitive Systems (ICICS)
CMC Microsystems
IEEE
ACM
Conference organization and program committees...
IEEE International Conference on Field-Programmable Custom Computing Machines (FCCM)
International Conference on Field-Programmable Technology (ICFPT)
International Symposium on Field-Programmable Logic and Applications (FPL)

Other Distinctions...
Associate Editor, Hindawi International Journal of Reconfigurable Computing
IEEE Senior Member (December 2007)
ACM Senior Member (December 2009)
Registered Professional Engineer with Association of Professional Engineers and Geoscientists of BC (APEGBC)

Students

Current Students

Degree	Name	Email	Graduation	Thesis Topic
Ph.D.	Zhonghua (Sebastian) Zhou	tbd	est. 2021	Machine Learning for ASIC Routing
M.A.Sc.	Mariko Tatsumi	tbd	est. 2021	Machine Learning on FPGAs
M.A.Sc.	Fredy Augusto Maciel Alves	tbd	est. 2021	Processor Design for FPGAs
M.A.Sc.	Caroline White	tbd	est. 2021	tbd
M.A.Sc.	John Deppe	tbd	est. 2021	tbd; co-supervised with Mieszko Lis

Completed Students

Degree	Name	Current Position	Completion	Thesis / Project
M.Sc.	May Young	Vancouver	April 2020	Dynamic Race Detection for Non-Coherent Accelerators pdf; primary supervisor was Alan Hu
Ph.D.	Hossein Omidian	Xilinx, San Jose	October 2018	Automated Space/Time Scaling of Streaming Task Graphs on Field Programmable Gate Arrays pdf
M.A.Sc.	Maximilian Golub	Mercedes-Benz, Seattle	August 2018	DropBack: Continuous Pruning During Deep Neural Network Training pdf
M.A.Sc.	Joseph Edwards	VectorBlox Computing, Vancouver	July 2018	Real-time Computer Vision in Software using Custom Vector Overlays pdf
M.Eng.	Nathan van Woudenberg	Programming + Machine Learning Support in ECE Robotics Control Lab, UBC	May 2016	n/a
M.Eng.	Gene Lai	unknown	May 2016	n/a
Ph.D.	Ameer Abdelhadi	ameer	June 2016	Architecture of Block-RAM-Based Massively Parallel Memory Structures: Multi-Ported Memories and Content-Addressable Memories pdf
M.A.Sc.	Keith Lee	Gumstix Inc., Vancouver	January 2016	The DEVBOX development environment: an environment for introducing Verilog to young students pdf video demo
M.Eng.	Danting Li	unknown	December 2015	n/a
Ph.D.	Aaron Severance	VectorBlox Computing Inc.	March 2015	Broadening the Applicability of FPGA-based Soft Vector Processors pdf
M.A.Sc.	Michael (Xi) Yue	unknown	October 2014	Rapid Overlay Building for FPGAs pdf
M.Eng.	Douglas (Hak Hian) Sim	Recon Instruments	May 2014	n/a
M.A.Sc.	Alex Brant	Altera Toronto	November 2012	Coarse and Fine Grain Programmable Overlay Architectures for FPGAs pdf; please check out the open source repository for the ZUMA FPGA Overlay
M.A.Sc.	Zhiduo Liu	upon graduation: Altera San Jose; currently: Google, CA	September 2012	Accelerator Compiler for the VENICE Vector Processor pdf
M.A.Sc.	Chris Wang	upon graduation: Xilinx, CA; currently: Google, CA	October 2011	Scalable and Deterministic Timing-driven Parallel Placement for FPGAs pdf
Ph.D.	David Grant	Altera Toronto	August 2011	CAD Algorithms and Performance of Malibu: An FPGA with Time-Multiplexed Coarse-Grained Elements pdf
Ph.D.	Usman Ahmed	Altera Toronto	April 2011	Impact of custom interconnect masks on cost and performance of structured ASICs pdf; co-supervised with Steve Wilton
M.A.Sc.	Chris Chou	PMC-Sierra	April 2010	VIPERS II: A Soft-core Vector Processor with Single-copy Scratchpad Memory pdf
M.A.Sc.	Darius Chiu	Independent	Sept 2009	Congestion-driven Re-clustering CAD Flow for Low-cost FPGAs pdf
M.A.Sc.	Johnny Ho	upon graduation: Ixia, CA; next: Quantlab, CA; currently: Microsoft, WA	Sept 2009	PERG-Rx: An FPGA-based pattern-matching engine with limited regular expression support for large pattern databases pdf
M.A.Sc.	Patrick Dong	Xilinx San Jose	Sept 2009	Period and Glitch Reduction via Clock Skew Scheduling, Delay Padding and GlitchLess pdf
M.A.Sc.	Paul Teehan	upon graduation: Ph.D. student, UBC; next: EnerNOC; currently: Travel Audience, Germany	October 2008	Reliable High-throughput FPGA Interconnect using Source-synchronous Surfing and Wave Pipelining pdf
Ph.D.	Mehdi Alimadadi	Linear Technology	July 2008	Recycling Clock Network Energy in High-performance Digital Designs using On-chip DC-DC Converters pdf; 90nm chip layout (4MB bitmap); co-supervised with Patrick Palmer
M.A.Sc.	Jason Yu	Intel Canada	May 2008	Vector Processing as a Soft-CPU Accelerator pdf
M.Eng.	Eric Lai	Amazon.com	April 2008	n/a
M.A.Sc.	Mark Yamashita	upon graduation: IBM Canada; next: Oxford MBA; currently: at large	November 2007	A Combined Clustering and Placement Algorithm for FPGAs pdf
M.Eng.	Shirley Ma	McKesson Canada	December 2007	n/a
M.Eng.	David Yeager	upon graduation: IBM Canada; currently: Dynimize	December 2006	Interconnect Estimation for FPGAs pdf
M.A.Sc.	David Leong	Nokia Canada	December 2006	Incremental Placement for FPGAs pdf
M.Eng.	Wilson Lo	unknown	November 2006	Power Model for Small Custom Embedded Memories; supervised by André Ivanov
M.A.Sc.	Edmund Lee	Altera Toronto	Summer 2006	Interconnect Driver Design for Long Wires in Field-Programmable Gate Arrays pdf; co-supervised with Shahriar Mirabbasi
M.A.Sc.	Marvin Tom	Major Tech Firm in USA	Spring 2006	Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays pdf
M.A.Sc.	Anthony Yu	Intel Canada	Fall 2005	Defect Tolerance for Yield Enhancement of FPGA Interconnect Using Fine-grain and Coarse-grain Redundancy pdf
M.A.Sc.	Victor Aken'Ova	PMC-Sierra	Spring 2005	Bridging the Gap between Soft and Hard eFPGA Design pdf; supervised by Resve Saleh

e-mail addresses above are @ece.ubc.ca (unless otherwise noted)

Funding

Financial support and donations from the following organizations are gratefully acknowledged.

Links

You might also enjoy my old home pages as a graduate student at the University of Toronto.

Try searching Library and Archives Canada. They have Canadian theses and other publications.

If you are looking for information about my book, try here

According to data in this list, my Erdős number is 4.
Lemieux → Sevcik → Klawe → Erdős
