SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L12-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
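The modules above can also be inspected programmatically after loading the model; a minimal sketch (the printed values reflect the configuration listed above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/model-a-baseline")

# Inputs longer than this are truncated by the Transformer module
print(model.max_seq_length)                      # 128
# Dimensionality of the pooled (and normalized) sentence embeddings
print(model.get_sentence_embedding_dimension())  # 384
# Module (1) is the Pooling layer; mean pooling is enabled
print(model[1].get_pooling_mode_str())           # mean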

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/model-a-baseline")
# Run inference
sentences = [
    'Why do onions make people cry?',
    'Why do onions sting?',
    'Can people with bipolar have healthy relationships?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.8663,  0.0078],
#         [ 0.8663,  1.0000, -0.0501],
#         [ 0.0078, -0.0501,  1.0000]])
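Because the embeddings are L2-normalized, cosine similarity doubles as a ranking score for retrieval. Below is a minimal semantic-search sketch using sentence_transformers.util.semantic_search; the corpus and query strings are made-up examples, not part of this model's training data:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("redis/model-a-baseline")

# Hypothetical corpus and query, purely for illustration
corpus = [
    "Onions release a sulfur compound that irritates the eyes.",
    "Bipolar disorder can be managed with therapy and medication.",
    "Pokémon GO was developed by Niantic over several years.",
]
query = "Why do onions make people cry?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Top-2 corpus entries for the query, ranked by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(round(hit["score"], 4), corpus[hit["corpus_id"]])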

Evaluation

Metrics

Information Retrieval

Metric NanoMSMARCO NanoNQ
cosine_accuracy@1 0.32 0.32
cosine_accuracy@3 0.56 0.54
cosine_accuracy@5 0.72 0.62
cosine_accuracy@10 0.82 0.68
cosine_precision@1 0.32 0.32
cosine_precision@3 0.1867 0.1933
cosine_precision@5 0.144 0.132
cosine_precision@10 0.082 0.072
cosine_recall@1 0.32 0.31
cosine_recall@3 0.56 0.53
cosine_recall@5 0.72 0.6
cosine_recall@10 0.82 0.66
cosine_ndcg@10 0.5574 0.4926
cosine_mrr@10 0.4747 0.4419
cosine_map@100 0.482 0.4462

Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with NanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "msmarco",
            "nq"
        ],
        "dataset_id": "lightonai/NanoBEIR-en"
    }
    
Metric Value
cosine_accuracy@1 0.32
cosine_accuracy@3 0.55
cosine_accuracy@5 0.67
cosine_accuracy@10 0.75
cosine_precision@1 0.32
cosine_precision@3 0.19
cosine_precision@5 0.138
cosine_precision@10 0.077
cosine_recall@1 0.315
cosine_recall@3 0.545
cosine_recall@5 0.66
cosine_recall@10 0.74
cosine_ndcg@10 0.525
cosine_mrr@10 0.4583
cosine_map@100 0.4641
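The numbers above were produced with the NanoBEIREvaluator shipped in Sentence Transformers. A minimal sketch of re-running a comparable evaluation (only dataset_names is passed here; the remaining parameters are left at their defaults):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("redis/model-a-baseline")

# Restrict NanoBEIR to the two datasets reported in this card
evaluator = NanoBEIREvaluator(dataset_names=["msmarco", "nq"])
results = evaluator(model)

# Metric keys follow the naming used in the tables above,
# e.g. "NanoBEIR_mean_cosine_ndcg@10"
print(results)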

Training Details

Training Dataset

Unnamed Dataset

  • Size: 89,998 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 5, mean 15.61, max 71 tokens
    positive (string): min 5, mean 15.72, max 71 tokens
    negative (string): min 4, mean 16.55, max 67 tokens
  • Samples:
    anchor: How long did it take to develop Pokémon GO?
    positive: How long did it take to develop Pokémon GO?
    negative: Can I take more than one gym in Pokémon GO?

    anchor: What is the best gift you've received?
    positive: What is the best tangible gift you've ever received?
    negative: Where can I download Chaayam Poosiya Veedu (The Painted House) malayalam movie for free?

    anchor: Why should I bother writing/editing a Wikipedia article when it can be overwritten by anyone?
    positive: Why should I bother writing/editing a Wikipedia article when it can be overwritten by anyone?
    negative: When I write a chapter, after I finish editing it, it is way too short. How can I lengthen a chapter?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 7.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
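    As a sketch, the same loss can be constructed directly from Sentence Transformers with these parameters (gather_across_devices is left at its default, False):

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# In-batch negatives ranking loss with the scale and similarity
# function listed above
loss = MultipleNegativesRankingLoss(model, scale=7.0, similarity_fct=util.cos_sim)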
    

Evaluation Dataset

Unnamed Dataset

  • Size: 10,000 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 3, mean 15.75, max 65 tokens
    positive (string): min 3, mean 15.86, max 65 tokens
    negative (string): min 5, mean 16.66, max 74 tokens
  • Samples:
    anchor: What's it like working in IT for Goldman Sachs?
    positive: What's it like working in IT for Goldman Sachs?
    negative: What is the work done at Goldman Sachs?

    anchor: How did Revan build his foundation of his army in Star Wars?
    positive: How did Revan build his foundation of his army in Star Wars?
    negative: What Star Wars character deserves his/her own movie?

    anchor: Is C++ the best programming language to learn first?
    positive: Is C++ the best programming language to learn first?
    negative: Which programming language is the best to learn first?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 7.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • learning_rate: 2e-05
  • weight_decay: 0.0001
  • max_steps: 3000
  • warmup_ratio: 0.1
  • fp16: True
  • dataloader_drop_last: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 1
  • load_best_model_at_end: True
  • optim: adamw_torch
  • ddp_find_unused_parameters: False
  • push_to_hub: True
  • hub_model_id: redis/model-a-baseline
  • eval_on_start: True
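Taken together, the non-default hyperparameters above correspond roughly to the following training setup. This is a hedged sketch, not the exact training script: the triplet datasets are stand-in placeholders (the real training data is the unnamed anchor/positive/negative dataset described earlier), and fp16=True assumes a CUDA device.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
loss = MultipleNegativesRankingLoss(model, scale=7.0)

# Placeholder triplets; substitute the real anchor/positive/negative datasets
train_dataset = Dataset.from_dict({
    "anchor": ["Why do onions make people cry?"],
    "positive": ["Why do onions sting?"],
    "negative": ["Can people with bipolar have healthy relationships?"],
})
eval_dataset = train_dataset

# Mirrors the non-default hyperparameters listed above
args = SentenceTransformerTrainingArguments(
    output_dir="model-a-baseline",
    eval_strategy="steps",
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=2e-5,
    weight_decay=0.0001,
    max_steps=3000,
    warmup_ratio=0.1,
    fp16=True,                     # assumes a CUDA device
    dataloader_drop_last=True,
    dataloader_num_workers=1,
    load_best_model_at_end=True,
    eval_on_start=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
# trainer.train() would launch the run once real datasets are in place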

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 3000
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 1
  • dataloader_prefetch_factor: 1
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: redis/model-a-baseline
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss NanoMSMARCO_cosine_ndcg@10 NanoNQ_cosine_ndcg@10 NanoBEIR_mean_cosine_ndcg@10
0 0 - 0.5972 0.5887 0.5786 0.5836
0.3556 250 0.5902 0.4140 0.5596 0.5395 0.5495
0.7112 500 0.5168 0.4000 0.5798 0.5206 0.5502
1.0669 750 0.4977 0.3934 0.5722 0.5079 0.5401
1.4225 1000 0.4825 0.3875 0.5612 0.5129 0.5370
1.7781 1250 0.4764 0.3843 0.5734 0.5179 0.5457
2.1337 1500 0.4672 0.3821 0.5740 0.5065 0.5402
2.4893 1750 0.4612 0.3804 0.5721 0.4950 0.5335
2.8450 2000 0.4576 0.3791 0.5588 0.4836 0.5212
3.2006 2250 0.4533 0.3775 0.5550 0.5005 0.5278
3.5562 2500 0.4491 0.3770 0.5604 0.4919 0.5262
3.9118 2750 0.4483 0.3763 0.5569 0.4897 0.5233
4.2674 3000 0.446 0.3760 0.5574 0.4926 0.5250

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.1+cu128
  • Accelerate: 1.12.0
  • Datasets: 2.21.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}