
InternVLA-N1 Model Series



Model Description

InternVLA-N1 is a state-of-the-art navigation foundation model built on a multi-system design. It adopts a dual-system approach that jointly trains System 2 for high-level reasoning and System 1 for low-level action and control. This asynchronous architecture enables smooth, efficient, and robust instruction-following navigation in both simulated and real-world environments.


🔗 Resources

  • Code
  • Technical Report — InternVLA-N1
  • DualVLN Paper — arXiv
  • Project Page — InternVLA-N1
  • Project Page — DualVLN
  • Dataset


Key Features

  • 🧩 Modular Multi-System Support
    Combines System 2 (reasoning/planning) with System 1 (action/control) in an asynchronous framework, delivering the first Dual-System Vision-Language Navigation (VLN) Foundation Model.

  • 🚀 Zero-Shot Sim2Real Generalization
    Trained exclusively on simulation data (InternData-N1) while generalizing effectively to real-world deployments.

  • 🏆 State-of-the-Art Performance
    Achieves leading results on multiple VLN benchmarks, including VLN-CE R2R/RxR and VLN-PE.

  • ⚡ Asynchronous Inference
    Enables smooth execution and dynamic obstacle avoidance during navigation.


Model Variants

| Model Variant | Description | Key Characteristics |
| --- | --- | --- |
| InternVLA-N1 (S2) | Finetuned Qwen2.5-VL model for pixel-goal grounding | Strong System 2 module; compatible with decoupled System 1 controllers or joint optimization pipelines |
| InternVLA-N1 (Dual System) w/ NavDP* | Jointly tuned System 1 (NavDP*) and InternVLA-N1 (S2) | Optimized end-to-end performance; uses RGB-D observations |
| InternVLA-N1 (Dual System) DualVLN | Latest dual-system architecture | Optimized end-to-end performance and faster convergence; uses RGB observations |

The previously released version is now called InternVLA-N1-wo-dagger. The latest official release is recommended for best performance.


Usage

For inference, evaluation, and the Gradio demo, please refer to the InternNav repository.
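The InternNav repository is the officially supported inference path. Purely as an illustrative sketch, loading the System 2 (Qwen2.5-VL-based) checkpoint with Hugging Face transformers could look like the code below; note that the repo id, dtype choice, and chat payload shown here are assumptions, not the documented API.

```python
# Hypothetical sketch of loading InternVLA-N1 (S2) via transformers.
# REPO_ID is an assumed placeholder; see the InternNav repository for
# the supported inference, evaluation, and demo pipelines.

REPO_ID = "InternRobotics/InternVLA-N1"  # assumption, not verified


def load_s2(repo_id: str = REPO_ID):
    """Load the Qwen2.5-VL-based System 2 checkpoint in BF16.

    Imports are deferred so the lightweight helpers below can be used
    without torch/transformers installed. Downloads the checkpoint.
    """
    import torch
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    processor = AutoProcessor.from_pretrained(repo_id)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    return processor, model


def build_messages(instruction: str, image_path: str):
    """Build a Qwen2.5-VL chat-style payload pairing one RGB observation
    with a navigation instruction (format is an assumption)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": instruction},
            ],
        }
    ]


if __name__ == "__main__":
    # Construct the payload only; model loading requires GPU + download.
    msgs = build_messages("Walk past the sofa and stop at the door.", "obs.png")
    print(msgs)
```

For the jointly tuned dual-system variants, the System 1 controller runs asynchronously alongside System 2, so the end-to-end loop in InternNav is the recommended entry point rather than calling the model directly.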


Citation

If you find our work helpful, please consider starring this repository 🌟 and citing:

@misc{internvla-n1,
    title = {{InternVLA-N1: An} Open Dual-System Navigation Foundation Model with Learned Latent Plans},
    author = {InternVLA-N1 Team},
    year = {2025},
    howpublished = {arXiv},
}
@misc{internnav2025,
    title = {{InternNav: InternRobotics'} open platform for building generalized navigation foundation models},
    author = {InternNav Contributors},
    howpublished={\url{https://github.com/InternRobotics/InternNav}},
    year = {2025}
}
@misc{wei2025groundslowfastdualsystem,
      title={Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation}, 
      author={Meng Wei and Chenyang Wan and Jiaqi Peng and Xiqian Yu and Yuqiang Yang and Delin Feng and Wenzhe Cai and Chenming Zhu and Tai Wang and Jiangmiao Pang and Xihui Liu},
      year={2025},
      eprint={2512.08186},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2512.08186}, 
}

Model size: 8B params · Safetensors · BF16