AI compiler
An AI compiler translates a neural network into low-level device code, e.g., CUDA. It plays a critical role in scaling neural networks efficiently.
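For concreteness, below is a minimal sketch of the kind of device code such a compiler might emit when lowering a fused add + ReLU operator to CUDA. The kernel name, shapes, and launch configuration are illustrative assumptions, not the output of any particular compiler.

```cuda
// Hypothetical example of compiler-emitted device code for a fused
// "add + ReLU" operator; names and shapes are assumptions for illustration.
__global__ void fused_add_relu(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = a[i] + b[i];
        out[i] = v > 0.0f ? v : 0.0f;  // ReLU fused into the same kernel
    }
}
// Example launch: fused_add_relu<<<(n + 255) / 256, 256>>>(a, b, out, n);
```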
In the past, we developed a series of compiler techniques that advocate a tile-based abstraction for canonical deep learning compilation on SIMT-based AI hardware (e.g., GPUs). These include Rammer (Ma et al., 2020), Roller (Zhu et al., 2022), Welder (Shi et al., 2023), and Cocktailer (Zhang et al., 2023). These techniques were covered in an MSR Research blog, and the tile abstraction is now well recognized in the systems community.
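To give a flavor of the tile abstraction on SIMT hardware, here is a minimal hand-written sketch (not the actual Rammer/Roller code): each thread block computes one output tile of a matrix multiplication, staging the corresponding input tiles in shared memory. The tile size and the assumption that matrix dimensions are multiples of it are simplifications for brevity.

```cuda
// Minimal tile-based matmul sketch: one thread block owns one TILE x TILE
// output tile of C = A * B and stages input tiles in shared memory.
// Assumes square matrices with N a multiple of TILE.
#define TILE 16

__global__ void tiled_matmul(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk over the K dimension one tile at a time.
    for (int k0 = 0; k0 < N; k0 += TILE) {
        As[threadIdx.y][threadIdx.x] = A[row * N + (k0 + threadIdx.x)];
        Bs[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * N + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```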
In addition, we envisioned early the importance of taking model sparsity into account in compiler techniques and developed the first sparsity-aware compilers, SparTA (Zheng et al., 2022), PIT (Zheng et al., 2023), and nmSPARSE (Lin et al., 2023), as well as compilers for low-bit neural models, e.g., Ladder (Wang et al., 2024). All of them are successfully unified under the tile abstraction.
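The following sketch suggests, under simplifying assumptions, how sparsity can be folded into the same tile abstraction: a per-tile bitmap (the hypothetical a_tile_nonzero array) marks which input tiles are all-zero, and the kernel skips the corresponding work. This is in the spirit of block-sparse execution, not the actual SparTA/PIT/nmSPARSE implementation.

```cuda
// Hypothetical sparsity-aware variant of the tiled matmul: tiles of A that
// are known to be all-zero (per a precomputed bitmap) are skipped entirely.
// The skip condition is uniform within a block, so __syncthreads stays safe.
#define TILE 16

__global__ void block_sparse_matmul(const float* A, const float* B, float* C,
                                    const unsigned char* a_tile_nonzero,  // 1 if the A tile has nonzeros
                                    int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int tiles_per_row = N / TILE;
    float acc = 0.0f;

    for (int t = 0; t < tiles_per_row; ++t) {
        // Skip tiles of A marked all-zero by the compiler/runtime.
        if (!a_tile_nonzero[blockIdx.y * tiles_per_row + t]) continue;

        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```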
Our next focus is compiler techniques for AI hardware with new architectures. For example, recent GPUs come with heterogeneous hardware units, including tensor cores, CUDA cores, and the Tensor Memory Accelerator (TMA). This opens up system opportunities to design new mechanisms that enable sophisticated compute schedules, e.g., pipelining, for extreme performance (Cheng et al., 2025). Another emerging hardware trend is the distributed memory architecture (i.e., non-SIMT) (He et al., 2025). Meanwhile, the programming interface of neural networks is an important topic closely related to compiler techniques. We will continue to investigate new programming models such as tile-lang and FractalTensor (Liu et al., 2024).
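As a simplified illustration of such compute schedules, the sketch below double-buffers input tiles in shared memory so that loading the next tile overlaps with computing on the current one. On recent GPUs a compiler would drive these copies through TMA/cp.async and feed tensor cores; here plain loads and FMAs stand in for those units, so this is an assumption-laden sketch, not the scheme of Cheng et al. (2025).

```cuda
// Double-buffered (pipelined) tiled matmul sketch: issue the next tile's
// loads before computing on the current tile so memory and compute overlap.
// Assumes square matrices with N a multiple of TILE.
#define TILE 16

__global__ void pipelined_matmul(const float* A, const float* B, float* C, int N) {
    __shared__ float As[2][TILE][TILE];   // two buffers: one in flight, one in use
    __shared__ float Bs[2][TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int tiles = N / TILE;
    float acc = 0.0f;

    // Prologue: stage the first tile into buffer 0.
    As[0][threadIdx.y][threadIdx.x] = A[row * N + threadIdx.x];
    Bs[0][threadIdx.y][threadIdx.x] = B[threadIdx.y * N + col];
    __syncthreads();

    for (int t = 0; t < tiles; ++t) {
        int cur = t & 1, nxt = 1 - cur;
        // Kick off the next tile's loads before computing on the current one.
        if (t + 1 < tiles) {
            As[nxt][threadIdx.y][threadIdx.x] = A[row * N + (t + 1) * TILE + threadIdx.x];
            Bs[nxt][threadIdx.y][threadIdx.x] = B[((t + 1) * TILE + threadIdx.y) * N + col];
        }
        for (int k = 0; k < TILE; ++k)
            acc += As[cur][threadIdx.y][k] * Bs[cur][k][threadIdx.x];
        __syncthreads();   // make the freshly loaded tile visible for the next step
    }
    C[row * N + col] = acc;
}
```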
Interestingly, we observe that compiler techniques are also useful in distributed deep learning training and automated machine learning. Based on this observation, we developed nnScaler (Lin et al., 2024), a flexible and efficient distributed training framework, and NNI (Zhang et al., 2020), a popular AutoML toolkit.