Carl Pearson

Electrical and Computer Engineering Ph.D. Candidate

University of Illinois


I am a PhD candidate in the Electrical and Computer Engineering department at the University of Illinois at Urbana-Champaign and a member of the IMPACT Research Group led by Wen-Mei Hwu.

I am working on multi-GPU communication and scaling as part of the joint UIUC / IBM C3SR cognitive computing systems research center. The focus of these activities is to apply tools and techniques developed in the IMPACT group to improve the performance of real-world applications.


  • High-Performance Computing
  • GPU Communication
  • Application Acceleration


  • MS in Electrical and Computer Engineering, 2018

    University of Illinois

  • BSc with High Distinction in Engineering, 2013

    Harvey Mudd College

Recent Publications

At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge …

Node-Aware Stencil Communication on Heterogeneous Supercomputers

High-performance distributed computing systems increasingly feature nodes that have multiple CPU sockets and multiple GPUs. The …

Accelerating Sparse Deep Neural Networks on FPGAs

Deep neural networks (DNNs) have been widely adopted in many domains, including computer vision, natural language processing, and …

Update on k-truss Decomposition on GPU

In this paper, we present an update to our previous submission on k-truss decomposition from Graph Challenge 2018. For single GPU …

Update on Triangle Counting on GPU

This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge. This work is …


E. A. Reid Fellowship

Best Paper and ACM Artifact Evaluation Stamp for Evaluating CUDA Communication Primitives on High-Bandwidth Interconnects

Dan Vivoli Endowed Fellowship

Mavis Future Faculty Fellowship

Top-20 Poster

Teacher Ranked as Excellent by Students

Recent & Upcoming Talks

Using Nsight Compute and Nsight Systems

Introduction and walkthrough of Nvidia Nsight Compute and Nsight Systems profiling tools.

Optimizing Communication for CPU/GPU Nodes

Optimizing Multi-GPU Stencil Communication

Node-Aware Stencil Communication for Heterogeneous Supercomputers

Optimizing Multi-GPU Stencil Communication

Benchmarking CUDA Communication Primitives on High-Bandwidth Interconnects

Data-intensive applications such as machine learning and analytics have created a demand for faster interconnects to avert the memory …

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects

Data-intensive applications such as machine learning and analytics have created a demand for faster interconnects to avert the memory …

Industry Experience



University YMCA

Aug 2019 – Present Urbana, IL
Community member of the board of governors, serving as chair of the budget committee, as Treasurer, and on the Bailey Scholarship steering committee.

Research Intern


Aug 2018 – Dec 2018 T.J. Watson Research Center, Yorktown Heights, NY

Research Intern for Optimized CLOUD Systems


Jun 2017 – Sep 2017 T.J. Watson Research Center, Yorktown Heights, NY

Research Intern

MulticoreWare, Inc.

Jun 2015 – Sep 2015 Champaign, IL

Research Intern

MulticoreWare, Inc.

Jun 2014 – Sep 2014 Champaign, IL

Co-op Engineer Floating-Point RTL


Jun 2013 – Sep 2013 Fort Collins, CO

Co-op Engineer Floating-Point RTL


Jun 2012 – Sep 2012 Fort Collins, CO



CUDA Profiling Resources

Examples, Docker images, walkthroughs, and lectures for Nvidia’s profiling tools

Stencil Library

A CUDA+MPI stencil library with automatic data placement and optimized multi-GPU communication.


A cross-platform CLI app for managing graph datasets


The Nim murmurhash package: a pure-Nim MurmurHash implementation

GPU Neural Network for GPGPUSim

A from-scratch feed-forward network in CUDA 4.0 suitable for GPGPUSim

High-Performance Application Studies

Tools and Techniques for Code Acceleration

Graph Library

Accelerating Static Graph Operations


GPU Microbenchmarking

Teaching Tools

Software to support GPU programming classes

Academic Experience


  • 2018 Spring University of Illinois Project TA for ECE408/CS483
  • 2017 Fall University of Illinois Head TA for ECE408/CS483
  • 2017–2018 University of Illinois Mavis Future Faculty Fellow
  • 2015 Fall University of Illinois TA for ECE408

I have been a teaching assistant for the following courses:

  • ECE408/CS483: Heterogeneous Parallel Programming at the University of Illinois
  • E155: Microprocessor-based Systems: Design & Applications at Harvey Mudd College
  • E85: Digital Electronics and Computer Architecture at Harvey Mudd College

I have also been a teaching assistant for the Programming and Tuning Massively Parallel Systems (PUMPS) summer school in Barcelona since 2014.

I have also mentored undergraduates and a high school student, who is a co-author on two papers.

During the Mavis fellowship, I administered the ECE 408 GPU programming project in spring 2018. I created:

  • Four lectures on machine learning (1, 2, 3, 4)
  • A course project where students add a GPU convolution operator to MXNet
  • Project kickoff slides (repo)

I also created a set of resources on using Nvidia’s Nsight Compute and Nsight Systems performance profiling tools, including a 75-minute recorded lecture. See the GitHub repository to get started.


Web-based method for physical object delivery through use of 3D printing technology

Recent Posts

Improving MPI_Pack performance in CUDA-aware MPI

Improving CUDA-Aware MPI_Pack speed by 300,000x. Code available.

Nsight Systems and Nsight Compute Teaching Resources

I was invited to give a guest lecture for the Spring 2020 ECE 408 GPU programming course at the University of Illinois. This lecture covers some performance measurement techniques available in CUDA. The 75-minute lecture is available on YouTube in four parts, along with the slides (pdf): Part 1: Intro; Part 2: CUDA Events; Part 3: Nsight Compute; Part 4: Nsight Systems. There is also a repository, cwpearson/nvidia-performance-tools, which contains all the code examples used in the lecture.

PUMPS+AI 2019 Summer School

TA at PUMPS+AI 2019

Self-host GPU Continuous Integration with Azure Pipelines and Docker!

Host your own GPU continuous integration pipeline with a bit of Python, Docker, and Azure Pipelines

Best Paper award at ICPE!

Best research track paper at ICPE


  • pearson@illinois.edu
  • Coordinated Science Lab, 1308 W. Main St., Urbana, IL 61801
  • email to book an appointment