Model Card for Helium-Nano (45M)
Helium-Nano is a 45-million-parameter Small Language Model (SLM) trained on the TinyStories dataset. It demonstrates how a carefully optimized custom Transformer architecture can achieve coherent English storytelling with minimal compute. The model was trained in under one hour on a single NVIDIA L4 GPU, reaching a throughput of 409k tokens/second through PyTorch 2.0's torch.compile and architectural optimizations.
Model Details
Model Description
Helium-Nano is a decoder-only Transformer designed to investigate training dynamics and scaling laws in low-resource environments. Despite its small size, it produces grammatically correct and narratively consistent short stories.
The primary goal of this model was engineering efficiency. By combining BFloat16 mixed precision, Flash Attention-style fused kernels, torch.compile (Inductor backend), and Rotary Position Embeddings (RoPE) computed in Float32, the training pipeline achieved a 16x speedup over a standard eager-mode baseline.
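For reference, the core of this recipe maps onto stock PyTorch APIs. The sketch below is illustrative rather than the actual training code: `TinySLM`, `config`, and `dataloader` are placeholders, and the attention and RoPE details live inside the model class.

```python
import torch
import torch.nn.functional as F

# Illustrative setup: TinySLM, config, and dataloader are placeholders.
model = TinySLM(config).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# torch.compile (Inductor backend) fuses kernels and removes Python
# overhead, which accounts for a large share of the reported speedup.
model = torch.compile(model)

for batch in dataloader:  # assumed (B, T+1) tensors of token ids
    inputs, targets = batch[:, :-1].cuda(), batch[:, 1:].cuda()
    # BFloat16 autocast: matmuls run in bf16 while reductions stay in
    # fp32; unlike fp16, bf16 needs no GradScaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits = model(inputs)  # assumed shape (B, T, vocab_size)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```

Inside the attention module, `F.scaled_dot_product_attention(q, k, v, is_causal=True)` dispatches to a fused Flash-Attention-style kernel on supported GPUs, and the RoPE sin/cos tables can be precomputed in Float32 so autocast does not degrade positional precision.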
- Developed by: Debmalya/batmanLovesAI
- Model type: Decoder-only Transformer (Custom Architecture)
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: N/A (Trained from scratch)
Model Sources
- Repository: Link to GitHub Repo
- Dataset Paper: TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- Optimization Techniques: Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation
Uses
Direct Use
- Story Generation: Generating simple, coherent short stories suitable for early childhood reading levels.
- Educational: A lightweight baseline for experimenting with model interpretation, quantization, or fine-tuning on consumer hardware.
- Performance Benchmarking: Testing the inference speed of small Transformers on various hardware; a rough timing sketch follows this list.
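For the benchmarking use case, a minimal decode-throughput harness might look like the following. It assumes a CUDA device and that the model maps `(B, T)` token ids to `(B, T, vocab)` logits; the function name and defaults are illustrative.

```python
import time
import torch

@torch.no_grad()
def tokens_per_second(model, input_ids, new_tokens=256):
    """Rough decode throughput via naive greedy decoding (no KV cache)."""
    ids = input_ids.clone()
    torch.cuda.synchronize()  # assumes a CUDA device
    start = time.perf_counter()
    for _ in range(new_tokens):
        logits = model(ids)  # assumed shape (B, T, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    torch.cuda.synchronize()
    return new_tokens / (time.perf_counter() - start)
```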
Out-of-Scope Use
- Factual Queries: The model is trained on fiction; it has no world knowledge and will hallucinate facts.
- Reasoning/Math: The model is not capable of complex logic or arithmetic.
- Harmful Content: While the dataset is heavily filtered, the model should not be used to generate toxic or biased content.
Bias, Risks, and Limitations
- Dataset Bias: The model reflects the vocabulary and concepts found in the TinyStories dataset, which focuses on simple, positive narratives using a limited vocabulary (roughly that of a 3- to 4-year-old).
- Repetition: Like many SLMs, the model may fall into repetitive loops if the sampling temperature is too low or no repetition penalty is applied during inference; see the sampling sketch after this list.
- Hallucinations: The model prioritizes grammatical structure over semantic logic, so generated stories can contain internally inconsistent details.
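Inference code typically mitigates the repetition failure mode by combining a sampling temperature with a CTRL-style repetition penalty. A minimal sketch, assuming a 1-D `logits` tensor over the vocabulary and a tensor of previously generated ids (hyperparameter values are illustrative, not tuned for this model):

```python
import torch

def sample_next(logits, generated_ids, temperature=0.8, repetition_penalty=1.2):
    logits = logits.clone()
    # CTRL-style penalty: dampen the logits of tokens already emitted.
    for token_id in set(generated_ids.tolist()):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty
    # Temperature < 1 sharpens the distribution; higher values add diversity.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```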
How to Get Started with the Model
Because this model uses a custom architecture, you must instantiate the model class yourself before loading the weights:
```python
import torch
from tokenizers import Tokenizer

# Assumes the TinySLM class is defined in your local files.

# 1. Load the tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# 2. Initialize the model with the training configuration
config = {
    "vocab_size": 32000,
    "d_model": 512,
    "n_head": 8,
    "n_layers": 10,
    "max_seq_len": 512,
}
model = TinySLM(config)

# 3. Load the trained weights
state_dict = torch.load("helium_nano_45m.pt", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict)
model.eval()

# 4. Generate
prompt = "Once upon a time, there was a little"
# ... inference code (see the sampling sketch below) ...
```
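The generation loop depends on TinySLM's forward signature. Assuming `model(ids)` returns `(batch, seq, vocab)` logits, a minimal temperature-sampling loop would be:

```python
@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200, temperature=0.8):
    ids = torch.tensor([tokenizer.encode(prompt).ids])  # shape (1, T)
    for _ in range(max_new_tokens):
        logits = model(ids[:, -512:])  # crop to max_seq_len; assumed (1, T, vocab)
        probs = torch.softmax(logits[:, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0].tolist())

print(generate(model, tokenizer, prompt))
```

For higher-quality samples, combine this with the repetition penalty shown in the Bias, Risks, and Limitations section.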