Publications
2022
[IPDPSw] Machine Learning for CUDA+MPI Design Rules
3/4/2022
2021
[HPDC] TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes
6/24/2021
[Ph.D. Dissertation] Movement and Placement of Non-Contiguous Data In Distributed GPU Computing
4/20/2021
2020
[HPEC] At-Scale Sparse Deep Neural Network Inference With Efficient GPU Implementation
9/23/2020
[iWAPT] Node-Aware Stencil Communication on Heterogeneous Supercomputers
3/9/2020
2019
[HPEC] Accelerating Sparse Deep Neural Networks on FPGAs
9/26/2019
[HPEC] Update on Triangle Counting on GPU
8/22/2019
[HPEC] Update on k-truss Decomposition on GPU
8/22/2019
[ICPE] Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects
4/9/2019
2018
[HPEC] Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition
9/25/2018
[Tech Report] SCOPE: C3SR Systems Characterization and Benchmarking Framework
9/18/2018
[IWOPH] NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems
6/28/2018
[Thesis] Heterogeneous Application and System Modeling
6/25/2018
[IPDPS] A Fast and Massively-Parallel Solver for Multiple-Scattering Tomographic Image Reconstruction
5/21/2018
2017
[ICRC] Rebooting the Data Access Hierarchy of Computing Systems
11/18/2017
[CEM] Thoughts on Massively-Parallel Heterogeneous Computing for Solving Large Problems
6/21/2017
[CEM] Scalable Parallel DBIM Solutions of Inverse-Scattering Problems
6/21/2017
[CEM] Comparative Performance Evaluation of Multi-GPU MLFMM Implementation for 2-D VIE Problems
6/21/2017
[IPDPSw] RAI: A Scalable Project Submission System for Parallel Programming Courses
5/29/2017
[ACES] Large Inverse-Scattering Solutions with DBIM on GPU-Enabled Supercomputers
3/28/2017
2016
[IPDPSw] WebGPU: A Scalable Online Development Platform for GPU Programming Courses
5/23/2016
2014
[MES] Adaptive Cache Bypass and Insertion for Many-Core Accelerators
6/1/2014