Bolmo: Byteifying the Next Generation of Language Models Paper • 2512.15586 • Published 17 days ago • 14
view article Article Why You Should Care About Partial Differential Equations (PDEs) 23 days ago • 35
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
view article Article LightOnOCR-1B: The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR Oct 23, 2025 • 62
view article Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training +3 Aug 8, 2025 • 90
nablaNABLA: Neighborhood Adaptive Block-Level Attention Paper • 2507.13546 • Published Jul 17, 2025 • 124
view article Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One Jun 26, 2025 • 48
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19, 2025 • 88
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22, 2025 • 34