publications

Please find the complete list here.

2024

  1. SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
    Yizhao Gao, and 7 more authors
    ArXiv, 2024
  2. RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
    Di Liu, and 13 more authors
    ArXiv, 2024
  3. LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration
    Zhiwen Mo, and 10 more authors
    ArXiv, 2024
  4. Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency
    Zenan Li, and 6 more authors
    In Advances in Neural Information Processing Systems, NeurIPS, 2024
  5. Neuro-Symbolic Data Generation for Math Reasoning
    Zenan Li, and 7 more authors
    In Advances in Neural Information Processing Systems, NeurIPS, 2024
  6. Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning
    Xijie Huang, and 4 more authors
    In EMNLP (Main), 2024
  7. Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor
    Siran Liu, and 7 more authors
    In SOSP, 2024
  8. IRGen: Generative Modeling for Image Retrieval
    Yidan Zhang, and 13 more authors
    In ECCV, 2024
  9. OneSparse: A Unified System for Multi-index Vector Search
    Yaoqi Chen, and 16 more authors
    In Companion Proceedings of the ACM Web Conference 2024, Singapore, Singapore, 2024
  10. MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels
    Qi Chen, and 30 more authors
    In Companion Proceedings of the ACM Web Conference 2024, Singapore, Singapore, 2024
  11. Understanding the Weakness of Large Language Model Agents within a Complex Android Environment
    Mingzhe Xing, and 5 more authors
    In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 2024
  12. Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
    Zhenting Qi, and 5 more authors
    ArXiv, 2024
  13. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
    Yiran Ding, and 7 more authors
    In ICML, 2024
  14. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
    Marah Abdin, and 83 more authors
    ArXiv (applying LongRoPE to Phi-3), 2024
  15. nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
    Zhiqi Lin, and 13 more authors
    In 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2024
  16. Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
    Chaofan Lin, and 6 more authors
    In 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2024
  17. Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
    Lei Wang, and 11 more authors
    In 18th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2024
  18. Amanda: Unified Instrumentation Framework for Deep Neural Networks
    Yue Guan, and 12 more authors
    In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2024
  19. Aceso: Efficient Parallel DNN Training through Iterative Bottleneck Alleviation
    Guodong Liu, and 7 more authors
    In Proceedings of the Nineteenth European Conference on Computer Systems, EuroSys, 2024
  20. Tessel: Boosting Distributed DNN Execution with Flexible Schedule Search
    Zhiqi Lin, and 6 more authors
    In 30th International Symposium on High-Performance Computer Architecture, HPCA, 2024
  21. Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
    Yijia Zhang, and 8 more authors
    In IEEE International Conference on Multimedia and Expo, ICME, 2024

2023

  1. Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
    Yijia Zhang, and 7 more authors
    2023
  2. SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction
    Zhiqi Lin, and 12 more authors
    2023
  3. VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity
    Qianxi Zhang, and 11 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  4. Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning
    Chen Zhang, and 8 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  5. Welder: Scheduling Deep Learning Memory Access via Tile-graph
    Yining Shi, and 8 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  6. Optimizing Dynamic Neural Networks with Brainstorm
    Weihao Cui, and 13 more authors
    In 17th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2023
  7. PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
    Ningxin Zheng, and 10 more authors
    In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP, 2023
  8. SPFresh: Incremental In-Place Update for Billion-Scale Vector Search
    Yuming Xu, and 11 more authors
    In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP, 2023
  9. SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters
    Hanyu Zhao, and 11 more authors
    In Proceedings of the Eighteenth European Conference on Computer Systems, EuroSys, 2023
  10. ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
    Diandian Gu, and 9 more authors
    In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS, 2023
  11. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
    Cong Guo, and 8 more authors
    In Proceedings of the 50th Annual International Symposium on Computer Architecture, ISCA, 2023
  12. On Modular Learning of Distributed Systems for Predicting End-to-End Latency
    Chieh-Jan Mike Liang, and 7 more authors
    In 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI, 2023
  13. Model-enhanced Vector Index
    Hailin Zhang, and 18 more authors
    In Advances in Neural Information Processing Systems, NeurIPS, 2023
  14. Tutel: Adaptive Mixture-of-Experts at Scale
    Changho Hwang, and 14 more authors
    In Proceedings of Machine Learning and Systems, MLSys, 2023
  15. Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning
    Bin Lin, and 10 more authors
    In Proceedings of Machine Learning and Systems, MLSys, 2023
  16. NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
    Shengming Yin, and 15 more authors
    In ACL, 2023
  17. Learning 3D photography videos via self-supervised diffusion on single images
    Xiaodong Wang, and 11 more authors
    In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI, Macao, P.R. China, 2023

2022

  1. SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
    Cong Guo, and 8 more authors
    In The Tenth International Conference on Learning Representations, ICLR, 2022
  2. Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training
    Cong Guo, and 8 more authors
    In IEEE 40th International Conference on Computer Design, ICCD, 2022
  3. ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
    Cong Guo, and 7 more authors
    In 55th IEEE/ACM International Symposium on Microarchitecture, MICRO, 2022
  4. ROLLER: Fast and Efficient Tensor Compilation for Deep Learning
    Hongyu Zhu, and 14 more authors
    In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2022
  5. PilotFish: Harvesting Free Cycles of Cloud Gaming with Deep Learning Training
    Wei Zhang, and 8 more authors
    In 2022 USENIX Annual Technical Conference, USENIX ATC, 2022
  6. SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute
    Ningxin Zheng, and 8 more authors
    In 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2022
  7. Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
    Shitao Xiao, and 10 more authors
    In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 2022
  8. NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
    Chenfei Wu, and 6 more authors
    In ECCV, 2022

2021

  1. GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions
    Chenfei Wu, and 7 more authors
    ArXiv, 2021

2020

  1. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
    Yaobo Liang, and 23 more authors
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP, 2020
  2. Capuchin: Tensor-based GPU Memory Management for Deep Learning
    Xuan Peng, and 7 more authors
    In Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2020
  3. HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees
    Hanyu Zhao, and 10 more authors
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020
  4. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
    Lingxiao Ma, and 9 more authors
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020
  5. Retiarii: A Deep Learning Exploratory-Training Framework
    Quanlu Zhang, and 6 more authors
    In 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2020

2019

  1. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads
    Myeongjae Jeon, and 5 more authors
    In 2019 USENIX Annual Technical Conference, USENIX ATC, 2019

2018

  1. Gandiva: Introspective Cluster Scheduling for Deep Learning
    Wencong Xiao, and 11 more authors
    In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI, 2018
  2. Scheduling CPU for GPU-based Deep Learning Jobs
    Wencong Xiao, and 6 more authors
    In Proceedings of the ACM Symposium on Cloud Computing, SoCC (Poster), Carlsbad, CA, USA, 2018

2015

  1. GraM: scaling graph computation to the trillions
    Ming Wu, and 8 more authors
    In Proceedings of the Sixth ACM Symposium on Cloud Computing, SoCC, Kohala Coast, Hawaii, 2015
  2. ImmortalGraph: A System for Storage and Analysis of Temporal Graphs
    Youshan Miao, and 8 more authors
    ACM Transactions on Storage, 2015

2014

  1. Chronos: a graph engine for temporal graph analysis
    Wentao Han, and 8 more authors
    In Proceedings of the Ninth European Conference on Computer Systems, EuroSys, Amsterdam, The Netherlands, 2014
  2. Arming Cloud Services with Task Aspects
    Zhenyu Guo, and 7 more authors
    Technical Report, 2014

2012

  1. Kineograph: taking the pulse of a fast-changing and connected world
    Raymond Cheng, and 9 more authors
    In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys, Bern, Switzerland, 2012
  2. Sonora: A Platform for Continuous Mobile-Cloud Computing
    Xiuwei Chen, and 7 more authors
    Technical Report, 2012

2007

  1. Distributed Cooperative Rate Adaptation for Energy Efficiency in IEEE 802.11-Based Multihop Networks
    Kun Wang, and 4 more authors
    IEEE Transactions on Vehicular Technology, 2007
  2. Cooperative and opportunistic transmission for wireless ad hoc networks
    Qian Zhang, and 4 more authors
    IEEE Network, 2007
  3. Modeling path capacity in multi-hop IEEE 802.11 networks for QoS services
    Kun Wang, and 3 more authors
    IEEE Transactions on Wireless Communications, 2007

2006

  1. Distributed cooperative rate adaptation for energy efficiency in IEEE 802.11-based multi-hop networks
    Kun Wang, and 4 more authors
    In Proceedings of the 3rd International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, QShine, Waterloo, Ontario, Canada, 2006
  2. On Improving the Throughput of Media Delivery Applications in Heterogeneous Overlay Network
    Jin Zhao, and 3 more authors
    In IEEE Global Telecommunications Conference, Globecom, 2006
  3. Distributed Channel Assignment and Routing in Multiradio Multichannel Multihop Wireless Networks
    H. Wu, and 5 more authors
    IEEE Journal on Selected Areas in Communications, 2006
  4. Next generation mobile multimedia communications: Media codec and media transport perspectives
    Feng Wu, and 4 more authors
    China Communications, 2006
  5. LION: Layered Overlay Multicast With Network Coding
    J. Zhao, and 4 more authors
    IEEE Transactions on Multimedia, 2006
  6. Impact of Power and Rate Selection on the Throughput of Ad Hoc Networks
    Cong Peng, and 5 more authors
    In IEEE International Conference on Communications, ICC, 2006

2005

  1. Cross-layer QoS support for multimedia delivery over wireless Internet
    Qian Zhang, and 2 more authors
    EURASIP Journal on Advances in Signal Processing, 2005
  2. AMTP: a multipath multimedia streaming protocol for mobile ad hoc networks
    K. Rojviboonchai, and 4 more authors
    In IEEE International Conference on Communications, ICC, 2005

2004

  1. End-to-end TCP-friendly streaming protocol and bit allocation for scalable video over wireless Internet
    Fan Yang, and 3 more authors
    IEEE Journal on Selected Areas in Communications, 2004
  2. Bit allocation for scalable video streaming over mobile wireless Internet
    Fan Yang, and 3 more authors
    In IEEE INFOCOM, 2004

2003

  1. An end-to-end TCP-friendly streaming protocol for multimedia over wireless Internet
    Fan Yang, and 3 more authors
    In International Conference on Multimedia and Expo, ICME, 2003

2001

  1. An efficient transport scheme for multimedia over wireless internet
    Fan Yang, and 3 more authors
    In Proceedings of the 2001 IEEE International Conference on 3G Wireless and Beyond, 2001