DeBERTa v3 Prompt Injection Detector - LoRA Fine-tuned

A LoRA adapter for prompt injection detection, fine-tuned on top of ProtectAI's DeBERTa v3 base model using a two-stage training approach.

Model Details

Model Description

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of ProtectAI/deberta-v3-base-prompt-injection for binary sequence classification (safe vs. injection). The model was trained in two stages on different prompt injection datasets to improve generalization.

Base Model: ProtectAI/deberta-v3-base-prompt-injection
Model Type: Binary Sequence Classification (DeBERTa v3 + LoRA)
Task: Prompt Injection Detection
Language: English
License: Inherits from base model

Training Details

Two-Stage Training Approach

Stage 1: Fine-tuned on xTRam1/safe-guard-prompt-injection

Training set: 90% of original train split
Validation set: 10% of original train split (for early stopping)
Test set: Original held-out test split (evaluated after training)
Early stopping with patience of 3 epochs
Split seed: 42 for reproducibility
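
The Stage 1 split can be sketched as follows. This is a minimal sketch, assuming the Hugging Face `datasets` API, where a `Dataset` exposes `train_test_split(test_size=..., seed=...)`. The function name `make_stage1_splits` is hypothetical, and the card does not state which splitting utility was actually used.

```python
def make_stage1_splits(dataset_dict, seed=42):
    """Carve a 10% validation set out of the train split; keep the test split untouched.

    `dataset_dict` is assumed to behave like a `datasets.DatasetDict`
    with "train" and "test" splits.
    """
    split = dataset_dict["train"].train_test_split(test_size=0.1, seed=seed)
    return {
        "train": split["train"],       # 90% of the original train split
        "validation": split["test"],   # 10%, used for early stopping
        "test": dataset_dict["test"],  # original held-out test split
    }
```

With `datasets` installed, passing the result of `load_dataset("xTRam1/safe-guard-prompt-injection")` to this function would yield the three splits listed above.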

Stage 2: Continued training on reshabhs/SPML_Chatbot_Prompt_Injection

Input text: System Prompt and User Prompt combined into a single string
Validation set: predefined validation split if available, otherwise a 90/10 split of the train set
Test set: original test split if available, otherwise the validation set
Early stopping with patience of 3 epochs (same as Stage 1)
Split seed: 42 for reproducibility
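
The Stage 2 input construction and split fallback rules might look like the sketch below. The column names `"System Prompt"` and `"User Prompt"`, the newline separator, the output column `"text"`, and both function names are assumptions for illustration; the card does not say how the two prompts were joined.

```python
def combine_prompts(example, sep="\n"):
    """Join the system and user prompt into a single input string.

    Column names and the separator are assumptions, not taken from the card.
    """
    text = f'{example["System Prompt"]}{sep}{example["User Prompt"]}'
    return {"text": text}


def make_stage2_splits(dataset_dict, seed=42):
    """Apply the fallback rules described for Stage 2."""
    train = dataset_dict["train"]
    if "validation" in dataset_dict:
        # Use the predefined validation split when the dataset ships one.
        validation = dataset_dict["validation"]
    else:
        # Otherwise carve 10% out of the train split.
        split = train.train_test_split(test_size=0.1, seed=seed)
        train, validation = split["train"], split["test"]
    # Fall back to the validation set when no test split exists.
    test = dataset_dict["test"] if "test" in dataset_dict else validation
    return {"train": train, "validation": validation, "test": test}
```

With the `datasets` library, `combine_prompts` could be applied through `dataset.map(combine_prompts)`. When the test set falls back to the validation set, the reported test score comes from data that also drove early stopping.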

Training Hyperparameters

LoRA Configuration:

r: 16
lora_alpha: 32
lora_dropout: 0.1
target_modules: ["query_proj", "key_proj", "value_proj", "o_proj"]
bias: none
task_type: SEQ_CLS
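
Expressed with the PEFT library, the configuration above would look roughly like this. It is a sketch that restates the listed values, not the author's training script.

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,               # rank of the low-rank update matrices
    lora_alpha=32,      # with default settings the update is scaled by lora_alpha / r = 2
    lora_dropout=0.1,   # dropout applied on the LoRA branch during training
    # Names copied from the card; PEFT adapts only modules whose names match these.
    target_modules=["query_proj", "key_proj", "value_proj", "o_proj"],
    bias="none",        # bias terms stay frozen
    task_type=TaskType.SEQ_CLS,
)
```

Passing `lora_config` to `peft.get_peft_model(base_model, lora_config)` wraps the base classifier, and `print_trainable_parameters()` on the result shows how many parameters are actually trained.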

Training Arguments (both stages):

per_device_train_batch_size: 16
per_device_eval_batch_size: 32
learning_rate: 2e-4
num_train_epochs: 20 (with early stopping)
fp16: enabled (if CUDA available)
eval_strategy: epoch
save_strategy: epoch
load_best_model_at_end: True
metric_for_best_model: accuracy
early_stopping_patience: 3
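
The same settings can be written as a `transformers` configuration. This is a sketch under assumptions: `output_dir` is a hypothetical path not given in the card, and the `eval_strategy` keyword requires a recent `transformers` release (older releases call it `evaluation_strategy`).

```python
import torch
from transformers import EarlyStoppingCallback, TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/stage1",  # hypothetical path, not given in the card
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-4,
    num_train_epochs=20,          # upper bound; early stopping may end training sooner
    fp16=torch.cuda.is_available(),
    eval_strategy="epoch",        # evaluate once per epoch
    save_strategy="epoch",        # must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
)

# Stop when accuracy has not improved for 3 consecutive evaluations (epochs here).
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```

Both objects would be passed to `Trainer` via `args=training_args` and `callbacks=[early_stopping]`, together with a `compute_metrics` function that returns an `"accuracy"` key.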

Preprocessing

Tokenizer: DeBERTa v3 tokenizer from base model
max_length: 256
truncation: True
padding: max_length
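
A tokenization function using these settings could be sketched as below. The function name `tokenize_batch` and the `"text"` column name are assumptions; the tokenizer is passed in so the same function works with any Hugging Face tokenizer.

```python
def tokenize_batch(batch, tokenizer, text_column="text"):
    """Tokenize a batch with the preprocessing settings listed above."""
    return tokenizer(
        batch[text_column],
        truncation=True,        # cut inputs longer than max_length
        max_length=256,
        padding="max_length",   # pad every example to exactly 256 tokens
    )
```

With `datasets`, this would typically be applied as `dataset.map(lambda b: tokenize_batch(b, tokenizer), batched=True)`, where `tokenizer` is loaded from the base model with `AutoTokenizer.from_pretrained`.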

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load base model and tokenizer
base_model_name = "ProtectAI/deberta-v3-base-prompt-injection"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSequenceClassification.from_pretrained(base_model_name)

# Load LoRA adapter and switch to evaluation mode (disables dropout)
model = PeftModel.from_pretrained(model, "path/to/deberta-pi-lora-final-adapter")
model.eval()

# Inference (no gradients needed)
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()  # 0 = safe, 1 = injection

Evaluation

Metric: Accuracy
Evaluation Strategy: Epoch-based evaluation with early stopping
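Each stage used its validation set for early stopping and best-checkpoint selection, and was evaluated on its test set after training. In Stage 2 the test set falls back to the validation set when the dataset has no test split, so that score is not strictly held-out.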
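
Because checkpoints are selected on accuracy, the `Trainer` needs a metric function that returns an `"accuracy"` key. A minimal sketch is shown below; it is written in plain Python for illustration and is not taken from the author's training code.

```python
def compute_metrics(eval_pred):
    """Return accuracy from a (logits, labels) pair, in the shape Trainer expects."""
    logits, labels = eval_pred
    # Argmax over the class dimension; works on nested lists and NumPy arrays alike.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}
```

The `Trainer` prefixes the key during evaluation, so this value is reported as `eval_accuracy`.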

Framework versions

PEFT 0.17.1
