EasyV2V: A High-quality Instruction-based Video Editing Framework Paper • 2512.16920 • Published 18 days ago • 17
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models Paper • 2512.10655 • Published 25 days ago • 8
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published Oct 7, 2025 • 141
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 125
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 48
Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation Paper • 2509.21989 • Published Sep 26, 2025 • 22
Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale Paper • 2509.14008 • Published Sep 17, 2025 • 88
UItron: Foundational GUI Agent with Advanced Perception and Planning Paper • 2508.21767 • Published Aug 29, 2025 • 12
Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection Paper • 2508.20766 • Published Aug 28, 2025 • 14
Train Long, Think Short: Curriculum Learning for Efficient Reasoning Paper • 2508.08940 • Published Aug 12, 2025 • 27
Motion-Aware Concept Alignment for Consistent Video Editing Paper • 2506.01004 • Published Jun 1, 2025 • 8
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Paper • 2505.22642 • Published May 28, 2025 • 3
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions Paper • 2505.21724 • Published May 27, 2025 • 5
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think Paper • 2504.20708 • Published Apr 29, 2025 • 23
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17, 2025 • 51
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding Paper • 2503.17827 • Published Mar 22, 2025 • 8
Vivid-ZOO: Multi-View Video Generation with Diffusion Model Paper • 2406.08659 • Published Jun 12, 2024 • 8