# Qwen3-Reranker-0.6B
Multi-format version of Qwen/Qwen3-Reranker-0.6B, repackaged as PyTorch SafeTensors (FP32 and FP16) for deployment.
## Model Information
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-Reranker-0.6B |
| Task | reranker-llm |
| Type | Text Model |
| Trust Remote Code | True |
## Available Versions
| Folder | Format | Description | Size |
|---|---|---|---|
| `safetensors-fp32/` | PyTorch FP32 | Baseline, highest accuracy | 2288 MB |
| `safetensors-fp16/` | PyTorch FP16 | GPU inference, ~50% smaller | 1152 MB |
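The FP32 weights are useful as a CPU baseline when comparing against the FP16 GPU path. The sketch below assumes the subfolder layout from the table above; only the `subfolder` and `torch_dtype` values differ from the GPU example that follows.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# CPU baseline with the FP32 weights (subfolder name taken from the table above)
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp32",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp32",
    trust_remote_code=True,
)
```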
## Usage

### PyTorch (GPU)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
```
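Because the base model is an LLM-style reranker, a query-document pair is typically scored by comparing the model's next-token probabilities for "yes" versus "no". The sketch below reuses the model and tokenizer loaded above; the prompt template is a simplified assumption, not the official one, so consult the Qwen/Qwen3-Reranker-0.6B card for the exact format.

```python
import torch

def rerank_score(query: str, document: str) -> float:
    # Simplified prompt (assumption): ask the model to answer "yes" or "no"
    prompt = (
        "Judge whether the Document answers the Query. "
        'Answer only "yes" or "no".\n'
        f"<Query>: {query}\n<Document>: {document}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    # Relevance = probability mass on "yes" relative to "no"
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

print(rerank_score("What is the capital of France?", "Paris is the capital of France."))
```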
## Notes
- SafeTensors FP16 is the primary format for GPU inference
- Requires `trust_remote_code=True` for this model
## License
Apache 2.0 (following the base model's license)
## Credits
- Base Model: Qwen/Qwen3-Reranker-0.6B
- Conversion: PyTorch + SafeTensors