Qwen2.5-1.5B-DLLM-BD3LM-Pretrain-5550-VI

Model Description

This is a Block Diffusion Language Model (BD3LM) pretrained on the Vietnamese Wikipedia dataset. The model is based on the Qwen2.5-1.5B architecture and trained with the block diffusion language modeling approach: blocks of tokens are decoded left to right, while the tokens within each block are produced by iterative (discrete diffusion) denoising.
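
Since A2D-Qwen2 is a custom architecture, loading the checkpoint most likely requires `trust_remote_code=True`. The snippet below is a minimal loading sketch, assuming a standard Hugging Face checkpoint layout; block-diffusion generation itself is driven by the dLLM library and is not shown here.

```python
# Minimal loading sketch (assumption: standard Hugging Face checkpoint layout;
# the custom A2D-Qwen2 model class ships with the repo).
from transformers import AutoModel, AutoTokenizer

repo = "ChaosAIVision/Qwen2.5-1.5B-ddlm-bd3lm-pretrain-5550-vi"

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,   # required for the custom A2D-Qwen2 class
    torch_dtype="bfloat16",   # weights are stored in BF16
)
# Block-wise denoising generation is handled by the dLLM library rather than
# a plain model.generate() call; see the dLLM documentation for sampling.
```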

Training Details

  • Base Model: Qwen/Qwen2.5-1.5B
  • Training Method: BD3LM (Block Diffusion Language Model)
  • Dataset: Vietnamese Wikipedia (vietgpt/wikipedia_vi) - 50,000 samples
  • Training Steps: 5,550
  • Max Length: 512 tokens
  • Block Size: 32 tokens
  • Batch Size: 2 per device × 8 gradient-accumulation steps = effective batch size of 16
  • Learning Rate: 2e-5
  • Epochs: ~3
  • Framework: dLLM (Diffusion Language Model Library); see the configuration sketch below
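
For convenience, the hyperparameters above are restated below as a single Python dictionary. This is an illustrative sketch only; the field names are hypothetical and do not claim to match the dLLM library's actual configuration schema.

```python
# Illustrative restatement of the training setup; field names are hypothetical
# and may differ from the real dLLM config schema.
training_config = {
    "base_model": "Qwen/Qwen2.5-1.5B",
    "method": "bd3lm",                  # Block Diffusion Language Model
    "dataset": "vietgpt/wikipedia_vi",  # 50,000 samples
    "max_steps": 5550,
    "max_length": 512,                  # tokens per sequence
    "block_size": 32,                   # tokens per diffusion block
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,   # 2 x 8 = 16 effective batch size
    "learning_rate": 2e-5,
    "num_epochs": 3,                    # approximate
}
```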

Model Architecture

  • Architecture: A2D-Qwen2 (Autoregressive-to-Diffusion)
  • Hidden Size: 1536
  • Num Layers: 28
  • Num Attention Heads: 12
  • Num KV Heads: 2
  • Intermediate Size: 8960
  • Vocab Size: 151,936
  • Weights: ~2B parameters, stored in BF16 (safetensors)
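
These dimensions are those of the Qwen2.5-1.5B backbone. As a sketch, they can be written out with the stock transformers `Qwen2Config`; the actual A2D-Qwen2 model class is custom and ships with the checkpoint, so this is illustrative only.

```python
# Backbone dimensions expressed with the stock Qwen2Config for illustration;
# the real A2D-Qwen2 config/model classes are custom (trust_remote_code).
from transformers import Qwen2Config

config = Qwen2Config(
    hidden_size=1536,
    num_hidden_layers=28,
    num_attention_heads=12,
    num_key_value_heads=2,   # grouped-query attention
    intermediate_size=8960,
    vocab_size=151936,
)
```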

License

Apache 2.0
