publications
Please find the complete list here.
2024
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs. ArXiv, 2024
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval. ArXiv, 2024
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers. ArXiv, 2024
2023
2022
- ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. In 55th IEEE/ACM International Symposium on Microarchitecture, MICRO, Jul 2022
2021
2020
2019
2018
- Scheduling CPU for GPU-based Deep Learning Jobs. In Proceedings of the ACM Symposium on Cloud Computing (SoCC) Poster, Carlsbad, CA, USA, Nov 2018
2015
2014
2012
2007
2006
- Distributed cooperative rate adaptation for energy efficiency in IEEE 802.11-based multi-hop networks. In Proceedings of the 3rd International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, QShine, Waterloo, Ontario, Canada, Mar 2006