PixelDiT: Pixel Diffusion Transformers for Image Generation Paper • 2511.20645 • Published Nov 25, 2025 • 30
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 30
Plan-X: Instruct Video Generation via Semantic Planning Paper • 2511.17986 • Published Nov 22, 2025 • 16
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11, 2025 • 110
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges Paper • 2311.05112 • Published Nov 9, 2023 • 1
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published Jun 3, 2025 • 58
ImgEdit: A Unified Image Editing Dataset and Benchmark Paper • 2505.20275 • Published May 26, 2025 • 18
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Paper • 2505.20292 • Published May 26, 2025 • 52
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension Paper • 2503.08689 • Published Mar 11, 2025 • 4
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published Jan 10, 2025 • 31
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published Jan 16, 2025 • 71
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion Paper • 2501.09019 • Published Jan 15, 2025 • 12
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 37
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity Paper • 2411.15411 • Published Nov 23, 2024 • 8