HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models Paper • 2512.09928 • Published about 14 hours ago • 4
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models Paper • 2512.08829 • Published 1 day ago • 8
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning Paper • 2512.03667 • Published 8 days ago • 3
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 6 days ago • 36
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture Paper • 2512.04810 • Published 7 days ago • 24
COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence Paper • 2512.04563 • Published 7 days ago • 13
PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling Paper • 2512.04784 • Published 9 days ago • 23
Article Building Jobly: Semantic Job Matching with RAG and Vector Embeddings 13 days ago • 12
Article TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval 7 days ago • 18
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published 9 days ago • 34
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion Paper • 2512.04926 • Published 7 days ago • 40
SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 7 days ago • 18
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 7 days ago • 165
DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle Paper • 2512.04324 • Published 7 days ago • 146