# Qwen2.5-1.5B-DLLM-BD3LM-Pretrain-5550-VI

## Model Description
This is a Block Diffusion Language Model (BD3LM) pretrained on the Vietnamese Wikipedia dataset. The model is based on the Qwen2.5-1.5B architecture and trained with the block diffusion language modeling approach: text is generated block by block, conditioning on previous blocks autoregressively while applying discrete denoising diffusion within each block.
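A minimal loading sketch with transformers is shown below. The repository path is a placeholder, and the use of `trust_remote_code` is an assumption since A2D-Qwen2 is not a stock transformers architecture; block-diffusion generation itself goes through the dLLM library rather than the standard autoregressive `generate` loop.

```python
# Minimal loading sketch (assumptions: the repo path is a placeholder, and
# trust_remote_code is needed because A2D-Qwen2 is a custom architecture).
from transformers import AutoModel, AutoTokenizer

repo_id = "<user>/Qwen2.5-1.5B-DLLM-BD3LM-Pretrain-5550-VI"  # placeholder namespace

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Generation is block-diffusion based and handled by the dLLM library,
# not by the standard autoregressive model.generate() loop.
```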
## Training Details
- Base Model: Qwen/Qwen2.5-1.5B
- Training Method: BD3LM (Block Diffusion Language Model)
- Dataset: Vietnamese Wikipedia (vietgpt/wikipedia_vi) - 50,000 samples
- Training Steps: 5,500
- Max Length: 512 tokens
- Block Size: 32 tokens
- Batch Size: 2 per device × 8 gradient accumulation steps = 16 effective
- Learning Rate: 2e-5
- Epochs: ~3
- Framework: dLLM (Diffusion Language Model Library)
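For reference, the hyperparameters above can be collected into a single configuration object. This is an illustrative sketch only; the field names are hypothetical and do not necessarily match the dLLM config schema.

```python
# Illustrative training configuration mirroring the list above.
# Field names are hypothetical, not the actual dLLM schema.
training_config = {
    "base_model": "Qwen/Qwen2.5-1.5B",
    "dataset": "vietgpt/wikipedia_vi",
    "num_samples": 50_000,
    "max_steps": 5_500,
    "max_length": 512,                 # tokens per sequence
    "block_size": 32,                  # diffusion block size in tokens
    "per_device_batch_size": 2,
    "gradient_accumulation_steps": 8,  # 2 × 8 = 16 effective batch size
    "learning_rate": 2e-5,
    "num_epochs": 3,                   # approximate
}
```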
## Model Architecture
- Architecture: A2D-Qwen2 (Autoregressive to Diffusion)
- Hidden Size: 1536
- Num Layers: 28
- Num Attention Heads: 12
- Num KV Heads: 2
- Intermediate Size: 8960
- Vocab Size: 151,936
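The numbers above correspond to the standard Qwen2.5-1.5B backbone and can be expressed with the stock `Qwen2Config` from transformers. This sketch covers the backbone only; the A2D wrapper is assumed to add diffusion-specific settings (such as the 32-token block size) on top of it.

```python
# Backbone configuration matching the figures above (stock Qwen2 config).
# The A2D-Qwen2 diffusion wrapper is assumed to extend this config.
from transformers import Qwen2Config

config = Qwen2Config(
    hidden_size=1536,
    num_hidden_layers=28,
    num_attention_heads=12,
    num_key_value_heads=2,   # grouped-query attention with 2 KV heads
    intermediate_size=8960,
    vocab_size=151_936,
)
```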
## License
Apache 2.0