publications

Please find the complete list here. Recent selected publications are shown below.

2024

  1. Amanda: Unified Instrumentation Framework for Deep Neural Networks
    Yue Guan, Yuxian Qiu, Jingwen Leng, Fan Yang, Shuo Yu, Yunxin Liu, Yu Feng, Yuhao Zhu, Lidong Zhou, Yun Liang, Chen Zhang, Chao Li, and Minyi Guo
    In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS’24, 2024
  2. Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation
    Guodong Liu, Youshan Miao, Zhiqi Lin, Xiaoxiang Shi, Saeed Maleki, Fan Yang, Yungang Bao, and Sa Wang
    In Proceedings of the Nineteenth European Conference on Computer Systems, EuroSys, 2024
  3. Tessel: Boosting Distributed DNN Execution with Flexible Schedule Search
    Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, and Fan Yang
    In 30th International Symposium on High-Performance Computer Architecture, HPCA, 2024

2023

  1. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity
    Qianxi Zhang, Shuotao Xu, Qi Chen, Guoxin Sui, Jiadong Xie, Zhizhen Cai, Yaoqi Chen, Yinxuan He, Yuqing Yang, Fan Yang, Mao Yang, and Lidong Zhou
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  2. Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning
    Chen Zhang, Lingxiao Ma, Jilong Xue, Yining Shi, Ziming Miao, Fan Yang, Jidong Zhai, Zhi Yang, and Mao Yang
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  3. Welder: Scheduling Deep Learning Memory Access via Tile-graph
    Yining Shi, Zhi Yang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Ziming Miao, Yuxiao Guo, Fan Yang, and Lidong Zhou
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  4. Optimizing Dynamic Neural Networks with Brainstorm
    Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, and Minyi Guo
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  5. PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
    Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, and Lidong Zhou
    In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP, 2023
  6. SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
    Yuming Xu, Hengyu Liang, Jin Li, Shuotao Xu, Qi Chen, Qianxi Zhang, Cheng Li, Ziyue Yang, Fan Yang, Yuqing Yang, Peng Cheng, and Mao Yang
    In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP, 2023
  7. SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters
    Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Mingxia Li, Fan Yang, Qianxi Zhang, Binyang Li, Yuqing Yang, Lili Qiu, Lintao Zhang, and Lidong Zhou
    In Proceedings of the Eighteenth European Conference on Computer Systems, EuroSys, 2023
  8. ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
    Diandian Gu, Yihao Zhao, Yinmin Zhong, Yifan Xiong, Zhenhua Han, Peng Cheng, Fan Yang, Gang Huang, Xin Jin, and Xuanzhe Liu
    In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS, 2023
  9. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
    Cong Guo, Jiaming Tang, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu
    In Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA, 2023
  10. On Modular Learning of Distributed Systems for Predicting End-to-End Latency
    Chieh-Jan Mike Liang, Zilin Fang, Yuqing Xie, Fan Yang, Zhao Lucis Li, Li Lyna Zhang, Mao Yang, and Lidong Zhou
    In 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2023
  11. Model-enhanced Vector Index
    Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, and Bin Cui
    In Advances in Neural Information Processing Systems, NeurIPS, 2023

2022

  1. SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
    Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, and Minyi Guo
    In The Tenth International Conference on Learning Representations, ICLR, 2022
  2. ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
    Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu
    In 55th IEEE/ACM International Symposium on Microarchitecture, MICRO, 2022
  3. ROLLER: Fast and Efficient Tensor Compilation for Deep Learning
    Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, and Gennady Pekhimenko
    In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2022
  4. PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
    Wei Zhang, Binghao Chen, Zhenhua Han, Quan Chen, Peng Cheng, Fan Yang, Ran Shu, Yuqing Yang, and Minyi Guo
    In 2022 USENIX Annual Technical Conference, USENIX ATC, 2022
  5. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
    Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, and Lidong Zhou
    In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2022

2020

  1. Capuchin: Tensor-based GPU Memory Management for Deep Learning
    Xuan Peng, Xuanhua Shi, Hulin Dai, Hai Jin, Weiliang Ma, Qian Xiong, Fan Yang, and Xuehai Qian
    In ASPLOS ’20: Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2020
  2. HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees
    Hanyu Zhao, Zhenhua Han, Zhi Yang, Quanlu Zhang, Fan Yang, Lidong Zhou, Mao Yang, Francis C. M. Lau, Yuqi Wang, Yifan Xiong, and Bin Wang
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020
  3. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
    Lingxiao Ma, Zhiqiang Xie, Zhi Yang, Jilong Xue, Youshan Miao, Wei Cui, Wenxiang Hu, Fan Yang, Lintao Zhang, and Lidong Zhou
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020
  4. Retiarii: A Deep Learning Exploratory-Training Framework
    Quanlu Zhang, Zhenhua Han, Fan Yang, Yuge Zhang, Zhe Liu, Mao Yang, and Lidong Zhou
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020

2019

  1. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
    Myeongjae Jeon, Shivaram Venkataraman, Amar Phanishayee, Junjie Qian, Wencong Xiao, and Fan Yang
    In 2019 USENIX Annual Technical Conference, USENIX ATC, 2019

2018

  1. Gandiva: Introspective Cluster Scheduling for Deep Learning
    Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou
    In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2018