arxiv:2512.15702

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Published on Dec 17 · Submitted by Yuwei Guo on Dec 18

Abstract

AI-generated summary: Resampling Forcing is introduced as a teacher-free framework to train autoregressive video diffusion models with improved temporal consistency using self-resampling and history routing.

Autoregressive video diffusion models hold promise for world simulation but are vulnerable to exposure bias arising from the train-test mismatch. While recent works address this via post-training, they typically rely on a bidirectional teacher model or online discriminator. To achieve an end-to-end solution, we introduce Resampling Forcing, a teacher-free framework that enables training autoregressive video models from scratch and at scale. Central to our approach is a self-resampling scheme that simulates inference-time model errors on history frames during training. Conditioned on these degraded histories, a sparse causal mask enforces temporal causality while enabling parallel training with frame-level diffusion loss. To facilitate efficient long-horizon generation, we further introduce history routing, a parameter-free mechanism that dynamically retrieves the top-k most relevant history frames for each query. Experiments demonstrate that our approach achieves performance comparable to distillation-based baselines while exhibiting superior temporal consistency on longer videos owing to native-length training.
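The history routing idea in the abstract (retrieve the top-k most relevant history frames for each query) can be illustrated with a small sketch. The abstract does not say how relevance is scored, so the dot-product score over one pooled key vector per frame is an assumption here, and the names `route_history` and `dot` are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_history(
    query: Sequence[float],
    history_keys: List[Sequence[float]],
    k: int,
) -> List[int]:
    """Pick the k history frames most relevant to the query.

    Assumption: relevance is the dot product between the query and a
    single pooled key vector per history frame. There are no learned
    parameters, which matches the "parameter-free" description, but the
    scoring rule itself is a guess.
    """
    if k <= 0 or not history_keys:
        return []
    scores = [dot(query, key) for key in history_keys]
    # Rank frame indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the selected frames in temporal order.
    return sorted(ranked[:k])


if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(route_history([1.0, 0.0], history, k=2))
```

Because only k frames are attended to per query, the attention cost for a long history stays bounded by k rather than growing with the number of generated frames, which is presumably what makes this useful for long-horizon generation.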

Community

Hi!

The method is proposed to overcome the limitations of self-forcing (reliance on a teacher model, GAN loss, etc.). Why does it then need a warmup that adopts the self-forcing objective? What are the results without the self-forcing warmup?

The abstract claims to enable training AR models from scratch. Are there any results without pretrained weights?

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/end-to-end-training-for-autoregressive-video-diffusion-via-self-resampling-9902-d8f704fd

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
