OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value Paper • 2512.14051 • Published 10 days ago
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning Paper • 2510.04081 • Published Oct 5
DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively Paper • 2509.26603 • Published Sep 30
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning Paper • 2508.21589 • Published Aug 29
Middo Collection Dataset & Models for paper "Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning" • 10 items • Updated Oct 15
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning Paper • 2507.17512 • Published Jul 23
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once Paper • 2507.10541 • Published Jul 14
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis Paper • 2504.12322 • Published Apr 11
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding Paper • 2504.09925 • Published Apr 14
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion Paper • 2503.16212 • Published Mar 20
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer Paper • 2503.14891 • Published Mar 19