The models come in Thinking and Instruct versions and use a new architecture, giving them ~10x faster inference than Qwen3-32B. Step-by-step Guide: https://docs.unsloth.ai/models/qwen3-next
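If you want to poke at the Instruct variant outside of Unsloth, a minimal transformers sketch is below; the repo id and the need for a very recent transformers build are assumptions on my part, so double-check the guide above for the exact setup.

```python
# Minimal sketch (not from the guide): loading a Qwen3-Next Instruct checkpoint
# with plain transformers. The repo id below is an assumption, and the new
# architecture likely needs a very recent transformers release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Instruct"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Give me a one-line summary of MoE models."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```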
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane that it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
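A rough usage sketch follows, assuming the custom `infer` helper and grounding prompt that ship as remote code with the checkpoint; treat it as a starting point and check the deepseek-ai/DeepSeek-OCR model card for the canonical call.

```python
# Hedged sketch: OCR a page image with DeepSeek-OCR via transformers remote code.
# The `infer` helper, its arguments, and the grounding prompt are assumptions
# from my reading of the model card; verify against the repo before relying on them.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model = model.eval().to(torch.bfloat16).cuda()

# Ask for a markdown rendering of the page (prompt format assumed).
prompt = "<image>\n<|grounding|>Convert the document to markdown."
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="page.png",    # hypothetical input image
    output_path="ocr_out/",   # hypothetical output directory
    save_results=True,
)
print(result)
```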
> Quality ≈ 3–4B dense, yet faster than Qwen3-1.7B
> MoE designed to run on phones/laptops (llama.cpp / vLLM; minimal vLLM sketch after this list)
> Pre-trained on 12T tokens → strong math/code/IF
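For the vLLM path mentioned above, offline inference is only a few lines; the model id below is a placeholder since the checkpoint isn't named here, so swap in the actual Hugging Face repo.

```python
# Minimal vLLM sketch for a small MoE checkpoint.
# The model id is a placeholder (hypothetical); replace it with the real repo.
from vllm import LLM, SamplingParams

llm = LLM(model="<moe-checkpoint-id>")  # hypothetical repo id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts in two sentences."], params)
print(outputs[0].outputs[0].text)
```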
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter, it can also do document question answering and understands multiple languages 🤯
> best part: released with an Apache 2.0 license, so you can use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go (see the transformers sketch below)
> built on SigLIP2 & granite-165M
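A minimal transformers sketch for the conversion use case, assuming the repo id `ibm-granite/granite-docling-258M` and a docling-family-style chat prompt; the model card has the canonical usage, including the vLLM and MLX paths.

```python
# Hedged sketch: page-to-DocTags conversion with granite-docling via transformers.
# The repo id and the "Convert this page to docling." prompt are assumptions
# based on similar docling-family models; see the model card for canonical usage.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "ibm-granite/granite-docling-258M"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to docling."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("page.png")  # hypothetical input page
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=1024)
new_tokens = generated[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=False)[0])
```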