publications
Please find the complete list here.
2025
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking. ArXiv, 2025
2024
- RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval. In NeurIPS Efficient Natural Language and Speech Processing workshop, NeurIPS ENLSP-IV, 2024
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers. ArXiv, 2024
2023
2022
- ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization. In 55th IEEE/ACM International Symposium on Microarchitecture, MICRO, Jul 2022
2021
2020
2019
2018
- Scheduling CPU for GPU-based Deep Learning Jobs. In Proceedings of the ACM Symposium on Cloud Computing (SoCC) Poster, Carlsbad, CA, USA, Nov 2018
2015
2014
2012
2007
2006
- Distributed cooperative rate adaptation for energy efficiency in IEEE 802.11-based multi-hop networks. In Proceedings of the 3rd International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks, QShine, Waterloo, Ontario, Canada, Mar 2006