DeBERTa v3 Prompt Injection Detector - LoRA Fine-tuned

A LoRA adapter for prompt injection detection, fine-tuned on top of ProtectAI's DeBERTa v3 base model using a two-stage training approach.

Model Details

Model Description

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of ProtectAI/deberta-v3-base-prompt-injection for binary sequence classification (safe vs. injection). The model was trained in two stages on different prompt injection datasets to improve generalization.

  • Base Model: ProtectAI/deberta-v3-base-prompt-injection
  • Model Type: Binary Sequence Classification (DeBERTa v3 + LoRA)
  • Task: Prompt Injection Detection
  • Language: English
  • License: Inherits from base model

Training Details

Two-Stage Training Approach

Stage 1: Fine-tuned on xTRam1/safe-guard-prompt-injection

  • Training set: 90% of original train split
  • Validation set: 10% of original train split (for early stopping)
  • Test set: Original held-out test split (evaluated after training)
  • Early stopping with patience of 3 epochs
  • Split seed: 42 for reproducibility
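
The Stage 1 split can be sketched as follows. This is a minimal illustration rather than the original training script: the helper names `split_train_validation` and `load_stage1_splits` are hypothetical, and it assumes the dataset exposes `train` and `test` splits through the `datasets` library.

```python
SPLIT_SEED = 42  # split seed stated in this card


def split_train_validation(train_split, val_fraction=0.1, seed=SPLIT_SEED):
    """Carve a validation set out of a `datasets.Dataset` train split (90/10 by default)."""
    parts = train_split.train_test_split(test_size=val_fraction, seed=seed)
    # `train_test_split` names the held-out part "test"; here it serves as validation.
    return parts["train"], parts["test"]


def load_stage1_splits():
    """Hypothetical loader: returns (train, validation, test) for Stage 1."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    raw = load_dataset("xTRam1/safe-guard-prompt-injection")
    train_ds, val_ds = split_train_validation(raw["train"])
    # The original test split stays untouched until training has finished.
    return train_ds, val_ds, raw["test"]
```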

Stage 2: Continued training on reshabhs/SPML_Chatbot_Prompt_Injection

  • Combined System Prompt and User Prompt as input text
  • Uses predefined validation split if available, otherwise 90/10 split from train set
  • Test set: Original test split if available, otherwise validation set
  • Same early stopping approach (patience=3)
  • Split seed: 42 for reproducibility
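
The Stage 2 input construction and split fallback described above might look like the sketch below. Several details are assumptions, not taken from the training code: the separator used to join the two prompts, the output column name `text`, the split name `validation`, and both helper names.

```python
def combine_prompts(example, separator="\n\n"):
    """Join the System Prompt and User Prompt columns into one input string.

    The separator is an assumption; the card does not say how the prompts were joined.
    """
    system_prompt = example.get("System Prompt") or ""
    user_prompt = example.get("User Prompt") or ""
    return {"text": f"{system_prompt}{separator}{user_prompt}".strip()}


def select_splits(dataset_dict, val_fraction=0.1, seed=42):
    """Pick (train, validation, test) following the fallback rules listed above."""
    train = dataset_dict["train"]
    if "validation" in dataset_dict:
        # Use the predefined validation split when the dataset ships one.
        val = dataset_dict["validation"]
    else:
        # Otherwise carve 10% out of the train split with the fixed seed.
        parts = train.train_test_split(test_size=val_fraction, seed=seed)
        train, val = parts["train"], parts["test"]
    # Fall back to the validation set when there is no test split.
    test = dataset_dict["test"] if "test" in dataset_dict else val
    return train, val, test
```

With the `datasets` library, `combine_prompts` could be applied through `dataset.map(combine_prompts)` before tokenization.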

Training Hyperparameters

LoRA Configuration:

  • r: 16
  • lora_alpha: 32
  • lora_dropout: 0.1
  • target_modules: ["query_proj", "key_proj", "value_proj", "o_proj"]
  • bias: none
  • task_type: SEQ_CLS
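
As a rough sketch, the configuration above maps onto PEFT's `LoraConfig` as shown below. The values are copied from the list; the helper names `lora_scaling` and `build_peft_model` are hypothetical, and the exact code used for training may differ.

```python
# Values copied from the LoRA configuration listed in this card.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.1,
    "target_modules": ["query_proj", "key_proj", "value_proj", "o_proj"],
    "bias": "none",
    "task_type": "SEQ_CLS",
}


def lora_scaling(r, lora_alpha):
    """Default LoRA scaling applied to the low-rank update: lora_alpha / r."""
    return lora_alpha / r


def build_peft_model(base_model_name="ProtectAI/deberta-v3-base-prompt-injection"):
    """Hypothetical helper: wrap the base classifier with the LoRA adapter."""
    # Imported lazily so the constants above can be inspected without these packages.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForSequenceClassification

    base = AutoModelForSequenceClassification.from_pretrained(base_model_name)
    return get_peft_model(base, LoraConfig(**LORA_KWARGS))
```

With r = 16 and lora_alpha = 32, the default scaling factor works out to 32 / 16 = 2.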

Training Arguments (both stages):

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-4
  • num_train_epochs: 20 (with early stopping)
  • fp16: enabled (if CUDA available)
  • eval_strategy: epoch
  • save_strategy: epoch
  • load_best_model_at_end: True
  • metric_for_best_model: accuracy
  • early_stopping_patience: 3
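
The arguments above could be wired together roughly as follows. This is a sketch under assumptions: `build_training_setup` and `simulate_early_stopping` are hypothetical helpers, and the patience value is passed to `EarlyStoppingCallback` because it is not a `TrainingArguments` field. The simulation only illustrates how patience interacts with `load_best_model_at_end`.

```python
# Values copied from the training arguments listed in this card.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 2e-4,
    "num_train_epochs": 20,
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "accuracy",
}
EARLY_STOPPING_PATIENCE = 3


def simulate_early_stopping(accuracies, patience=EARLY_STOPPING_PATIENCE):
    """Return (best_epoch, stop_epoch), 1-indexed, for per-epoch validation accuracies."""
    best_acc, best_epoch, stale = float("-inf"), 0, 0
    for epoch, acc in enumerate(accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1  # no strict improvement this epoch
            if stale >= patience:
                return best_epoch, epoch
    return best_epoch, len(accuracies)


def build_training_setup(output_dir):
    """Hypothetical helper: build TrainingArguments plus the early-stopping callback."""
    # Imported lazily so the simulation above runs without torch/transformers.
    import torch
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(
        output_dir=output_dir,
        fp16=torch.cuda.is_available(),  # fp16 only when CUDA is available
        **TRAINING_KWARGS,
    )
    callbacks = [EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)]
    return args, callbacks
```

For example, with made-up validation accuracies `[0.90, 0.93, 0.92, 0.93, 0.91]`, `simulate_early_stopping` returns `(2, 5)`: training stops after epoch 5 and the epoch 2 checkpoint is the one restored.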

Preprocessing

  • Tokenizer: DeBERTa v3 tokenizer from base model
  • max_length: 256
  • truncation: True
  • padding: max_length
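
A minimal sketch of these preprocessing settings, assuming the input column is named `text` (an assumption about the dataset schema) and using a hypothetical helper `tokenize_batch`:

```python
# Settings copied from the preprocessing list in this card.
TOKENIZER_KWARGS = {"truncation": True, "padding": "max_length", "max_length": 256}


def tokenize_batch(tokenizer, batch, text_column="text"):
    """Tokenize a batch of examples, truncating and padding every input to 256 tokens."""
    return tokenizer(batch[text_column], **TOKENIZER_KWARGS)


def load_tokenizer(base_model_name="ProtectAI/deberta-v3-base-prompt-injection"):
    """Load the tokenizer that ships with the base model."""
    # Imported lazily so the helper above can be tested without `transformers`.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(base_model_name)
```

With the `datasets` library this could be applied as `dataset.map(lambda b: tokenize_batch(tokenizer, b), batched=True)`.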

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model and tokenizer
base_model_name = "ProtectAI/deberta-v3-base-prompt-injection"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSequenceClassification.from_pretrained(base_model_name)

# Load LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "path/to/deberta-pi-lora-final-adapter")
model.eval()  # disable dropout for inference

# Inference
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()  # 0 = safe, 1 = injection

Evaluation

  • Metric: Accuracy
  • Evaluation Strategy: Epoch-based evaluation with early stopping
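  • Each stage used its validation split for early stopping and best-checkpoint selection, and was evaluated on its test split after training (in Stage 2, the validation set stands in when no test split exists)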

Framework versions

  • PEFT 0.17.1