# Qwen3-Reranker-0.6B
Multi-format version of Qwen/Qwen3-Reranker-0.6B, repackaged as PyTorch SafeTensors (FP32 and FP16) for deployment.
## Model Information
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-Reranker-0.6B |
| Task | reranker-llm |
| Type | Text Model |
| Trust Remote Code | True |
## Available Versions
| Folder | Format | Description | Size |
|---|---|---|---|
| `safetensors-fp32/` | PyTorch FP32 | Baseline, highest accuracy | 2288 MB |
| `safetensors-fp16/` | PyTorch FP16 | GPU inference, ~50% smaller | 1152 MB |
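The FP32 weights are useful as a CPU baseline when comparing against the FP16 GPU path. The sketch below assumes the subfolder layout from the table above; only the `subfolder` and `torch_dtype` values differ from the GPU example that follows.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# CPU baseline with the FP32 weights (subfolder name taken from the table above)
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp32",
    torch_dtype=torch.float32,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp32",
    trust_remote_code=True,
)
```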
## Usage

### PyTorch (GPU)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU inference with FP16
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
```
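Because the base model is an LLM-style reranker, a query-document pair is typically scored by comparing the model's next-token probabilities for "yes" versus "no". The sketch below reuses the model and tokenizer loaded above; the prompt template is a simplified assumption, not the official one, so consult the Qwen/Qwen3-Reranker-0.6B card for the exact format.

```python
import torch

def rerank_score(query: str, document: str) -> float:
    # Simplified prompt (assumption): ask the model to answer "yes" or "no"
    prompt = (
        "Judge whether the Document answers the Query. "
        'Answer only "yes" or "no".\n'
        f"<Query>: {query}\n<Document>: {document}\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    # Relevance = probability mass on "yes" relative to "no"
    probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
    return probs[0].item()

print(rerank_score("What is the capital of France?", "Paris is the capital of France."))
```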
## Notes
- SafeTensors FP16 is the primary format for GPU inference
- Requires `trust_remote_code=True` for this model
## License
Apache 2.0 (following the base model's license)
## Credits
- Base Model: Qwen/Qwen3-Reranker-0.6B
- Conversion: PyTorch + SafeTensors