AI hardware | Fan Yang

The future of AI depends on whether we can design next-generation hardware that better supports the scaling laws. At some point, AI model architecture will itself be shaped by AI hardware design decisions. Co-design of AI and hardware will become the norm.

Here are some considerations on AI hardware.

A hardware architecture that accounts for activation outliers in LLM inference: OliVe (Guo et al., 2023).
Taking the low-bit trend of LLMs into account: an adaptive numerical data type, ANT (Guo et al., 2022); replacing multiply-accumulate (MAC) units with lookup tables (LUTs) (Mo et al., 2025); and vector quantization (Li et al., 2025).
Future AI chips will show stronger non-uniformity. Wafer-scale LLM systems are coming (He et al., 2025).
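To make the "replace MAC with a lookup table" idea concrete, here is a minimal Python sketch. It is not the design from any of the cited papers. It assumes 1-bit weights in {+1, -1}, a hypothetical group size `G = 4`, and an activation length that is a multiple of `G`; all function names are made up for illustration. For each group of activations it precomputes the dot product with every possible sign pattern, so each weight group then costs one table lookup instead of `G` multiply-accumulates.

```python
G = 4  # activations per lookup group (assumed value, for illustration only)


def build_table(acts):
    """Precompute the dot product of one activation group with all 2**G sign patterns."""
    table = []
    for idx in range(1 << len(acts)):
        total = 0.0
        for k, a in enumerate(acts):
            # Bit k of idx set -> weight +1, cleared -> weight -1.
            total += a if (idx >> k) & 1 else -a
        table.append(total)
    return table


def pack_weights(weights):
    """Pack a row of +1/-1 weights into one table index per group of G weights."""
    indices = []
    for start in range(0, len(weights), G):
        idx = 0
        for k, w in enumerate(weights[start:start + G]):
            if w == 1:
                idx |= 1 << k
        indices.append(idx)
    return indices


def lut_matvec(weight_rows, activations):
    """Matrix-vector product using table lookups instead of multiplications."""
    # Tables depend only on the activations, so they are shared by every weight row.
    tables = [build_table(activations[s:s + G])
              for s in range(0, len(activations), G)]
    return [sum(table[idx] for table, idx in zip(tables, pack_weights(row)))
            for row in weight_rows]


def mac_matvec(weight_rows, activations):
    """Reference implementation with ordinary multiply-accumulate."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weight_rows]


if __name__ == "__main__":
    acts = [0.5, -1.0, 2.0, 3.0, 1.5, 0.25, -0.75, 4.0]
    rows = [[1, -1, 1, 1, -1, -1, 1, -1],
            [-1, -1, -1, 1, 1, 1, -1, 1]]
    print(lut_matvec(rows, acts))
    print(mac_matvec(rows, acts))
```

The table-building cost is amortized across all weight rows, which is why the approach is attractive when weights are very low-bit and the weight matrix has many rows.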

References

2025

ISCA

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

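Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, and Mao Yang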

In Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA), 2025

@inproceedings{lutcore25,
  author = {Mo, Zhiwen and Wang, Lei and Wei, Jianyu and Zeng, Zhichen and Cao, Shijie and Ma, Lingxiao and Jing, Naifeng and Cao, Ting and Xue, Jilong and Yang, Fan and Yang, Mao},
  title = {LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference},
  year = {2025},
  booktitle = {Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA)},
}

HPCA

LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator

Guoyu Li, Shengyu Ye, Chunyun Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry, and Mao Yang

In 31st International Symposium on High-Performance Computer Architecture, HPCA, 2025

@inproceedings{lutdla25,
  title = {LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator},
  author = {Li, Guoyu and Ye, Shengyu and Chen, Chunyun and Wang, Yang and Yang, Fan and Cao, Ting and Liu, Cheng and Sabry, Mohamed M and Yang, Mao},
  booktitle = {31st International Symposium on High-Performance Computer Architecture, {HPCA}},
  year = {2025}
}

OSDI

WaferLLM: A Wafer-Scale LLM Inference System

Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, and Luo Mai

In 19th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2025

@inproceedings{waferllm25,
  title = {WaferLLM: A Wafer-Scale LLM Inference System},
  author = {He, Congjie and Huang, Yeqi and Mu, Pei and Miao, Ziming and Xue, Jilong and Ma, Lingxiao and Yang, Fan and Mai, Luo},
  year = {2025},
  booktitle = {19th {USENIX} Symposium on Operating Systems Design and Implementation, {OSDI}},
}

2023

ISCA

OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization

Cong Guo, Jiaming Tang, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu

In Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA, 2023

@inproceedings{DBLP:conf/isca/0003THL00LG023,
  title = {OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization},
  author = {Guo, Cong and Tang, Jiaming and Hu, Weiming and Leng, Jingwen and Zhang, Chen and Yang, Fan and Liu, Yunxin and Guo, Minyi and Zhu, Yuhao},
  year = {2023},
  booktitle = {Proceedings of the 50th Annual International Symposium on Computer Architecture, {ISCA}},
}

2022

MICRO

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization

Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu

In 55th IEEE/ACM International Symposium on Microarchitecture, MICRO, 2022

Highlighted as an IEEE Micro Top Picks Honorable Mention in the July/August 2023 special edition of IEEE Micro.

@inproceedings{DBLP:conf/micro/00030LL0LG022,
  title = {{ANT:} Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization},
  author = {Guo, Cong and Zhang, Chen and Leng, Jingwen and Liu, Zihan and Yang, Fan and Liu, Yunxin and Guo, Minyi and Zhu, Yuhao},
  year = {2022},
  booktitle = {55th {IEEE/ACM} International Symposium on Microarchitecture, {MICRO}},
}