SeerAttention-R: Sparse Attention Adaptation for Long Reasoning • Paper • 2506.08889 • Published Jun 10 • 23
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders • Paper • 2510.19779 • Published Oct 22 • 60