InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search Paper • 2512.18745 • Published 14 days ago • 10
The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs Paper • 2507.07562 • Published Jul 10, 2025 • 1
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training Paper • 2412.09619 • Published Dec 12, 2024 • 28