DeBERTa v3 Prompt Injection Detector - LoRA Fine-tuned
A LoRA adapter for prompt injection detection, fine-tuned on top of ProtectAI's DeBERTa v3 base model using a two-stage training approach.
Model Details
Model Description
This model is a LoRA (Low-Rank Adaptation) fine-tuned version of ProtectAI/deberta-v3-base-prompt-injection for binary sequence classification (safe vs. injection). It was trained in two sequential stages, each on a different prompt injection dataset, with the aim of improving generalization across data sources.
- Base Model: ProtectAI/deberta-v3-base-prompt-injection
- Model Type: Binary Sequence Classification (DeBERTa v3 + LoRA)
- Task: Prompt Injection Detection
- Language: English
- License: Inherits from base model
Training Details
Two-Stage Training Approach
Stage 1: Fine-tuned on xTRam1/safe-guard-prompt-injection
- Training set: 90% of original train split
- Validation set: 10% of original train split (for early stopping)
- Test set: Original held-out test split (evaluated after training)
- Early stopping with patience of 3 epochs
- Split seed: 42 for reproducibility
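The card does not say which function produced the 90/10 split. A common choice is the `datasets` library's `Dataset.train_test_split(test_size=0.1, seed=42)`. The dependency-free sketch below shows the same idea (seeded shuffle, then hold out 10%) under that assumption, using a hypothetical helper `split_train_val`. It will not reproduce the exact partition `datasets` produces, because it uses Python's `random` module rather than that library's RNG.

```python
import random


def split_train_val(examples, val_fraction=0.1, seed=42):
    """Hypothetical helper: seeded 90/10 train/validation split.

    Shuffles indices with a fixed seed so the same input always yields the
    same partition, then holds out `val_fraction` of the examples.
    """
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)          # deterministic for a given seed
    n_val = round(len(examples) * val_fraction)   # size of the validation split
    val = [examples[i] for i in indices[:n_val]]
    train = [examples[i] for i in indices[n_val:]]
    return train, val


train, val = split_train_val(list(range(100)))
print(len(train), len(val))  # 90 10
```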
Stage 2: Continued training on reshabhs/SPML_Chatbot_Prompt_Injection
- Input text: the System Prompt and User Prompt fields combined into a single sequence
- Validation set: the dataset's predefined validation split if it has one, otherwise 10% of the train split (the remaining 90% is used for training)
- Test set: the original test split if available, otherwise the validation set
- Same early stopping approach (patience=3)
- Split seed: 42 for reproducibility
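The Stage 2 input construction and split fallback rules can be sketched as follows. This is a minimal illustration, not the card's actual preprocessing code: the column names `"System Prompt"` and `"User Prompt"` are assumed to match the card's wording, and the order (system first) and the `"\n\n"` separator in the hypothetical `build_input_text` are guesses, since the card does not specify how the two fields were joined.

```python
import random


def build_input_text(example, sep="\n\n"):
    """Hypothetical: join system and user prompts into one classifier input.

    Column names, order, and separator are assumptions; empty fields are skipped.
    """
    system = (example.get("System Prompt") or "").strip()
    user = (example.get("User Prompt") or "").strip()
    return sep.join(part for part in (system, user) if part)


def resolve_stage2_splits(splits, val_fraction=0.1, seed=42):
    """Return (train, validation, test) following the Stage 2 fallback rules.

    `splits` maps split names ("train", optionally "validation"/"test") to lists.
    """
    train = list(splits["train"])
    if "validation" in splits:
        validation = list(splits["validation"])
    else:
        # No predefined validation split: carve a seeded 10% out of train.
        random.Random(seed).shuffle(train)
        n_val = round(len(train) * val_fraction)
        validation, train = train[:n_val], train[n_val:]
    # No test split: fall back to the validation set.
    test = list(splits["test"]) if "test" in splits else validation
    return train, validation, test
```

When the test split falls back to the validation split, the reported test score is computed on the same data that drove early stopping and checkpoint selection, so it is likely to be optimistic.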
Training Hyperparameters
LoRA Configuration:
- r: 16
- lora_alpha: 32
- lora_dropout: 0.1
- target_modules: ["query_proj", "key_proj", "value_proj", "o_proj"]
- bias: none
- task_type: SEQ_CLS
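The size of this adapter can be worked out by hand. LoRA adds a low-rank update B·A to each targeted linear layer, with A of shape (r, d_in) and B of shape (d_out, r), scaled by lora_alpha / r. The sketch below assumes deberta-v3-base's hidden size of 768 and 12 encoder layers. It also assumes that only the query, key and value projections are adapted: as far as we can tell, the DeBERTa v2/v3 implementation in `transformers` names the attention output projection `attention.output.dense`, not `o_proj`, so the `o_proj` entry likely matches no module. The counts exclude the classification head, which PEFT typically keeps trainable for SEQ_CLS tasks.

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    The update is B @ A with A of shape (r, d_in) and B of shape (d_out, r).
    """
    return r * d_in + d_out * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard (non-rsLoRA) scaling applied to the low-rank update: alpha / r."""
    return lora_alpha / r


HIDDEN = 768            # assumed hidden size of deberta-v3-base
NUM_LAYERS = 12         # assumed number of encoder layers in deberta-v3-base
R, ALPHA = 16, 32       # r and lora_alpha from the configuration above
ADAPTED_PER_LAYER = 3   # query_proj, key_proj, value_proj (assumes "o_proj" matches nothing)

per_matrix = lora_params_per_matrix(HIDDEN, HIDDEN, R)
total = per_matrix * ADAPTED_PER_LAYER * NUM_LAYERS

print(f"scaling factor: {lora_scaling(ALPHA, R)}")   # scaling factor: 2.0
print(f"per adapted matrix: {per_matrix:,}")         # per adapted matrix: 24,576
print(f"LoRA weights in q/k/v: {total:,}")           # LoRA weights in q/k/v: 884,736
```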
Training Arguments (both stages):
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 32
- learning_rate: 2e-4
- num_train_epochs: 20 (with early stopping)
- fp16: enabled (if CUDA available)
- eval_strategy: epoch
- save_strategy: epoch
- load_best_model_at_end: True
- metric_for_best_model: accuracy
- early_stopping_patience: 3 (an EarlyStoppingCallback argument rather than a TrainingArguments field)
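The interaction between `num_train_epochs: 20`, `early_stopping_patience: 3`, `metric_for_best_model: accuracy` and `load_best_model_at_end: True` can be modeled in a few lines. This is a simplified, dependency-free illustration written for this card, not the code of `transformers`' `EarlyStoppingCallback` (which the training setup presumably used); `simulate_early_stopping` is a hypothetical name.

```python
def simulate_early_stopping(val_accuracies, patience=3, max_epochs=20):
    """Return (epochs_run, best_epoch, best_accuracy).

    Evaluates once per epoch, counts consecutive epochs without a strict
    improvement over the best accuracy so far, stops when that count reaches
    `patience`, and reports the best epoch, i.e. the checkpoint that
    load_best_model_at_end=True would restore.
    """
    best_acc = None
    best_epoch = None
    bad_epochs = 0
    epochs_run = 0
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        epochs_run = epoch
        if best_acc is None or acc > best_acc:
            # Strict improvement: new best checkpoint, reset the patience counter.
            best_acc, best_epoch, bad_epochs = acc, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return epochs_run, best_epoch, best_acc


print(simulate_early_stopping([0.80, 0.85, 0.87, 0.86, 0.87, 0.85, 0.90]))  # (6, 3, 0.87)
```

In the example, epochs 4 to 6 fail to beat 0.87 (the tie at epoch 5 does not count as an improvement), so training stops after epoch 6 and the epoch 3 checkpoint is kept; the 0.90 that epoch 7 would have reached is never seen. As we understand it, this strict comparison matches the callback's default `early_stopping_threshold=0.0`.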
Preprocessing
- Tokenizer: DeBERTa v3 tokenizer from base model
- max_length: 256
- truncation: True
- padding: max_length
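What `truncation=True`, `padding: max_length` and `max_length: 256` do to a single input can be sketched without the real tokenizer. The special-token IDs below ([CLS]=1, [SEP]=2, [PAD]=0) are assumptions about the DeBERTa v3 vocabulary, and `pad_and_truncate` is a hypothetical helper that operates on already-tokenized content IDs.

```python
def pad_and_truncate(token_ids, max_length=256, cls_id=1, sep_id=2, pad_id=0):
    """Mimic truncation + padding="max_length" for one sequence.

    `token_ids` are content tokens without special tokens; the special-token
    IDs are assumptions. Returns (input_ids, attention_mask).
    """
    body = token_ids[: max_length - 2]           # keep room for [CLS] and [SEP]
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)        # 1 = real token
    n_pad = max_length - len(input_ids)
    input_ids += [pad_id] * n_pad                # pad on the right up to max_length
    attention_mask += [0] * n_pad                # 0 = padding, ignored by attention
    return input_ids, attention_mask


ids, mask = pad_and_truncate(list(range(10, 20)))
print(len(ids), sum(mask))  # 256 12
```

Because Stage 2 combines the system and user prompts into one sequence, a long system prompt can use up most of the 256-token budget. With right-side truncation (the tokenizer default), it is the end of the sequence that gets cut, which would be the user prompt if it is placed second.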
How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel

# Load the base model and tokenizer
base_model_name = "ProtectAI/deberta-v3-base-prompt-injection"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForSequenceClassification.from_pretrained(base_model_name)

# Attach the LoRA adapter (local path or Hub repo ID of the adapter)
model = PeftModel.from_pretrained(model, "path/to/deberta-pi-lora-final-adapter")
model.eval()  # disable dropout for inference

# Inference
text = "Your input text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()  # 0 = safe, 1 = injection
```
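The logits can also be turned into a probability instead of a hard label, which makes it possible to trade false positives against false negatives. The sketch below works on a plain Python list such as `outputs.logits[0].tolist()`; `injection_probability`, `classify` and its `threshold` parameter are hypothetical helpers, not part of this card or of `transformers`. Inside PyTorch, `torch.softmax(outputs.logits, dim=-1)` computes the same probabilities.

```python
import math

LABELS = {0: "safe", 1: "injection"}  # label mapping from the usage snippet above


def injection_probability(logits):
    """Softmax over two logits [safe, injection]; returns P(injection)."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def classify(logits, threshold=0.5):
    """Hypothetical helper: label is "injection" only if P(injection) > threshold."""
    p = injection_probability(logits)
    return (LABELS[1] if p > threshold else LABELS[0]), p


print(classify([-1.0, 3.0]))  # ('injection', 0.98...)
```

At the default threshold of 0.5 this gives the same decision as the `argmax` above, including exact ties, which resolve to `safe`; raising the threshold flags fewer inputs as injections.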
Evaluation
- Metric: Accuracy
- Evaluation Strategy: Epoch-based evaluation with early stopping
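- Each stage used its validation split for early stopping and was evaluated on its test split after training (in Stage 2, the test split falls back to the validation set when the dataset has no test split)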
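A `compute_metrics` function compatible with `metric_for_best_model: accuracy` might look like the sketch below. This is an assumption, since the actual training script is not included. It returns the key `accuracy`, which the Trainer logs as `eval_accuracy` and matches against `metric_for_best_model`. It uses only the Python standard library and accepts either nested lists or the NumPy arrays that the Trainer passes in.

```python
def compute_metrics(eval_pred):
    """Accuracy for a Trainer-style evaluation: eval_pred unpacks to (logits, labels)."""
    logits, labels = eval_pred
    # Argmax per row; max() returns the first maximal index on ties, like argmax.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == int(y)) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


print(compute_metrics(([[2.0, -1.0], [0.1, 0.9], [0.3, 0.2], [-0.5, 0.5]], [0, 1, 1, 1])))
# {'accuracy': 0.75}
```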
Framework versions
- PEFT 0.17.1
Model tree for Theworst1/deberta-v3-prompt-injection_1
- Base model: microsoft/deberta-v3-base (via ProtectAI/deberta-v3-base-prompt-injection)