Hate Speech & Offensive Message Classifier

A hate speech and offensive message classifier built by fine-tuning the RoBERTa transformer model (roberta-base) on the Davidson et al. (2017) Twitter dataset. On a held-out test set it achieves a 0.9774 F1-score for hate/offensive message detection and 96.23% overall accuracy, making it suitable for social media moderation, community platforms, and chat applications.

Key Features

🤖 Transformer-based Architecture: Fine-tuned from the pretrained roberta-base encoder
⚡ High Performance: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
🔧 Hyperparameter Optimization: Automated tuning using Optuna framework
⚖️ Class Imbalance Handling: Weighted cross-entropy loss so the minority Neither class (about 1 in 6 tweets) is not drowned out by the majority Hate/Offensive class
📊 Comprehensive Evaluation: Precision, Recall, F1-score, confusion matrix
🚀 Deployment-Ready: Model and tokenizer saved in Hugging Face format, loadable directly with from_pretrained

Model Performance

Final Results on Test Set:

Overall Accuracy: 96.23%
Weighted F1-Score: 0.9621
Offensive/Hate F1-Score: 0.9774 ✅ (Exceeds 0.90 acceptance threshold)
Offensive/Hate Precision: 97.49%
Offensive/Hate Recall: 98% (High hate/offensive message detection rate)
Neither Precision: 89.82%
Neither Recall: 87.52%

Generalizability

📊 Held-out Evaluation: All performance metrics are computed on a test set (15% of the data, 3,718 tweets) that was never used during training or hyperparameter tuning, so they estimate performance on unseen tweets drawn from the same dataset.

Dataset

Source: Hate Speech and Offensive Language Dataset (Davidson et al., 2017)

Dataset Statistics:

Total Tweets: 24,783
Hate Speech / Offensive: 20,620
Neutral (Neither): 4,163
Average Tweet Length: ~86 characters
Language: English

Dataset Split:

Training Set: 70% (17,348 tweets) – model training
Validation Set: 15% (3,717 tweets) – hyperparameter tuning
Test Set: 15% (3,718 tweets) – final evaluation on unseen data
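The split sizes above can be reproduced with a two-stage split. First hold out 30% of the data, then halve the hold-out into validation and test. This is a minimal stdlib sketch that rounds each held-out part up, which is how scikit-learn's `train_test_split` treats a fractional `test_size`. Several details are assumptions: the seed, the shuffling, and the absence of stratification by label. The card does not say how the split was actually drawn. With 24,783 tweets the sketch yields exactly 17,348 / 3,717 / 3,718.

```python
import math
import random


def three_way_split(items, holdout_frac=0.30, seed=42):
    """Split items into (train, val, test): hold out `holdout_frac`, then halve it.

    Sketch only: the seed and rounding rule are assumptions, and no
    stratification by label is performed.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_holdout = math.ceil(holdout_frac * len(items))  # round the held-out part up
    n_test = math.ceil(0.5 * n_holdout)               # test gets the larger half
    n_train = len(items) - n_holdout

    train_idx = indices[:n_train]
    val_idx = indices[n_train:len(items) - n_test]
    test_idx = indices[len(items) - n_test:]
    return ([items[i] for i in train_idx],
            [items[i] for i in val_idx],
            [items[i] for i in test_idx])


tweets = [f"tweet {i}" for i in range(24_783)]  # stand-in for the real dataset
train, val, test = three_way_split(tweets)
print(len(train), len(val), len(test))  # 17348 3717 3718
```

A stratified split, which keeps the Neither share equal across the three sets, would be a reasonable alternative for a dataset this imbalanced.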

Preprocessing Steps:

Label mapping: 0 = Neither, 1 = Hate/Offensive.
Text cleaning (see preprocess_text in the Usage section).
Train/validation/test split.
Tokenization with RoBERTa tokenizer.
Dynamic padding and truncation.
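Here is a sketch of the label-mapping step. It assumes the public Davidson et al. CSV (`labeled_data.csv` in the authors' repository), whose `class` column uses 0 = hate speech, 1 = offensive language, and 2 = neither. Under that assumption, collapsing to this card's binary scheme looks like this. The dict-based rows are hypothetical stand-ins for the real data. The per-class counts are the commonly reported ones for the 24,783-tweet file, and they sum to the 20,620 / 4,163 totals listed above.

```python
# Assumed coding of the original Davidson et al. `class` column.
DAVIDSON_CLASSES = {0: "hate speech", 1: "offensive language", 2: "neither"}

# Binary scheme used by this model: 0 = Neither, 1 = Hate/Offensive.
ID2LABEL = {0: "neither", 1: "hate/offensive"}


def to_binary_label(davidson_class: int) -> int:
    """Collapse hate speech (0) and offensive language (1) into 1; neither (2) -> 0."""
    if davidson_class not in DAVIDSON_CLASSES:
        raise ValueError(f"unexpected class id: {davidson_class}")
    return 0 if davidson_class == 2 else 1


# Hypothetical rows; the real data would come from the dataset CSV.
rows = [
    {"tweet": "example a", "class": 0},
    {"tweet": "example b", "class": 1},
    {"tweet": "example c", "class": 2},
]
labels = [to_binary_label(r["class"]) for r in rows]
print([ID2LABEL[y] for y in labels])  # ['hate/offensive', 'hate/offensive', 'neither']

# Commonly reported per-class counts for the 24,783-tweet CSV.
davidson_counts = {0: 1430, 1: 19190, 2: 4163}
binary_counts = {0: 0, 1: 0}
for cls, n in davidson_counts.items():
    binary_counts[to_binary_label(cls)] += n
print(binary_counts)  # {0: 4163, 1: 20620}
```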

Architecture & Methodology

Model Architecture

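Base Model: FacebookAI/roberta-base (Hugging Face Transformers)
Task: Binary sequence classification (2 labels: Neither, Hate/Offensive)
Fine-tuning: Sequence classification head with 2 outputs on top of the pretrained encoder
Tokenization: RoBERTa tokenizer with truncation to a fixed maximum sequence length (max_length=128 in the inference code below)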
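Dynamic padding means each batch is padded only to the length of its longest sequence, rather than to a fixed maximum. For short texts like tweets (about 86 characters on average), that saves a lot of wasted computation. Below is a minimal pure-Python sketch over already-tokenized IDs. It assumes RoBERTa's pad token ID of 1 and a 128-token cap, the value the inference code uses; whether training used the same cap is not stated. The token IDs in the example are toy values. In practice, Hugging Face's `DataCollatorWithPadding` performs this step.

```python
from typing import Dict, List

PAD_TOKEN_ID = 1   # RoBERTa's <pad> id (assumption: check tokenizer.pad_token_id)
MAX_LENGTH = 128   # cap used by the inference example; training value not stated


def pad_batch(batch: List[List[int]], pad_id: int = PAD_TOKEN_ID) -> Dict[str, List[List[int]]]:
    """Right-pad token-ID sequences to the longest sequence in this batch.

    Sequences are assumed to be already truncated by the tokenizer
    (truncation=True, max_length=MAX_LENGTH).
    """
    if any(len(ids) > MAX_LENGTH for ids in batch):
        raise ValueError("sequence longer than MAX_LENGTH; truncate first")
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 = real token, 0 = padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy IDs (0 = <s>, 2 = </s>); this batch pads to length 5, not 128.
batch = pad_batch([[0, 713, 2], [0, 713, 16, 10, 2]])
print(batch["input_ids"])       # [[0, 713, 2, 1, 1], [0, 713, 16, 10, 2]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```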

Training Strategy

Data Preprocessing: Tweet text cleaning and label encoding
Tokenization: Dynamic padding with optimal max length
Class Balancing: Weighted loss function to handle imbalanced dataset
Hyperparameter Optimization: Optuna-based automated tuning
Evaluation: Comprehensive metrics on held-out test set
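The card does not state how the class weights were chosen. One common choice is inverse-frequency ("balanced") weighting, w_c = N / (K · n_c). Applied to the full-dataset counts above, that gives roughly 2.98 for Neither and 0.60 for Hate/Offensive, so an error on a Neither tweet costs about five times more. The sketch below computes those weights and a weighted cross-entropy in plain Python. It normalizes by the sum of the target-class weights, the same reduction PyTorch's `nn.CrossEntropyLoss(weight=..., reduction="mean")` uses. Treat both the weighting scheme and the use of full-dataset counts (rather than training-set counts) as assumptions.

```python
import math
from typing import Dict, List, Sequence


def balanced_class_weights(counts: Dict[int, int]) -> Dict[int, float]:
    """Inverse-frequency weights w_c = N / (K * n_c) (assumed scheme)."""
    total, k = sum(counts.values()), len(counts)
    return {c: total / (k * n) for c, n in counts.items()}


def weighted_cross_entropy(logits: List[Sequence[float]], targets: List[int],
                           weights: Dict[int, float]) -> float:
    """Mean weighted cross-entropy, normalized by the sum of target-class weights."""
    num, den = 0.0, 0.0
    for row, y in zip(logits, targets):
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_z - row[y]          # -log softmax(row)[y]
        num += weights[y] * nll
        den += weights[y]
    return num / den


weights = balanced_class_weights({0: 4163, 1: 20620})  # 0 = Neither, 1 = Hate/Offensive
print({c: round(w, 3) for c, w in weights.items()})    # {0: 2.977, 1: 0.601}

loss = weighted_cross_entropy([[2.0, 0.0], [0.0, 2.0]], [0, 1], weights)
print(round(loss, 4))  # 0.1269
```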

Hyperparameter Optimization

Optimized with Optuna (15 trials) across ranges:

Dropout rates: Hidden dropout (0.1-0.3), Attention dropout (0.1-0.2)
Learning rate: 1e-5 to 5e-5 range
Weight decay: 0.0 to 0.1
Batch size: 8, 16, or 32 samples
Gradient accumulation steps: 1 to 4
Training epochs: 2 to 5 epochs
Warmup ratio: 0.05 to 0.1 for learning rate scheduling
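The search space above can be written down once and fed to Optuna's `trial.suggest_float`, `trial.suggest_int`, and `trial.suggest_categorical`. This sketch is not the author's tuning script. The parameter names follow Hugging Face's `RobertaConfig` and `TrainingArguments`, which is an assumption about how the values were wired in. It also treats epochs and accumulation steps as integers and samples the learning rate uniformly; the card does not say whether a log scale was used. The `in_search_space` helper is hypothetical and simply checks that a configuration lies inside the ranges.

```python
# Search space from the card; names are assumed RobertaConfig / TrainingArguments fields.
SEARCH_SPACE = {
    "hidden_dropout_prob": ("float", 0.1, 0.3),
    "attention_probs_dropout_prob": ("float", 0.1, 0.2),
    "learning_rate": ("float", 1e-5, 5e-5),
    "weight_decay": ("float", 0.0, 0.1),
    "per_device_train_batch_size": ("categorical", [8, 16, 32]),
    "gradient_accumulation_steps": ("int", 1, 4),
    "num_train_epochs": ("int", 2, 5),
    "warmup_ratio": ("float", 0.05, 0.1),
}


def suggest_hyperparameters(trial) -> dict:
    """Sample one configuration using an Optuna trial's suggest_* methods."""
    params = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "float":
            params[name] = trial.suggest_float(name, spec[1], spec[2])
        elif kind == "int":
            params[name] = trial.suggest_int(name, spec[1], spec[2])
        else:
            params[name] = trial.suggest_categorical(name, spec[1])
    return params


def in_search_space(params: dict) -> bool:
    """Hypothetical helper: True if every value lies inside SEARCH_SPACE."""
    for name, spec in SEARCH_SPACE.items():
        value = params[name]
        if spec[0] == "categorical":
            if value not in spec[1]:
                return False
        elif not spec[1] <= value <= spec[2]:
            return False
    return True
```

Inside an objective function you would call `suggest_hyperparameters(trial)`, train with those values, and return a validation score. Then `optuna.create_study(direction="maximize")` followed by `study.optimize(objective, n_trials=15)` runs a 15-trial search. Which validation metric was maximized is not stated in the card.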

Best Parameters Found:

Hidden Dropout: 0.13034059066330464
Attention Dropout: 0.1935379847495239
Learning Rate: 1.031409901695853e-05
Weight Decay: 0.03606621145317628
Batch Size: 16
Gradient Accumulation: 1
Epochs: 2
Warmup Ratio: 0.0718442228846798
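To make the best configuration concrete, the arithmetic below estimates the optimizer schedule it implies. It rests on three assumptions: single-device training on the 17,348-tweet training set, a kept final partial batch, and the rounding rule that recent versions of Hugging Face's `TrainingArguments` use for warmup (`ceil(total_steps * warmup_ratio)`). Under those assumptions the effective batch size is 16 and an epoch is 1,085 optimizer steps. Two epochs then come to 2,170 steps, of which about 156 are warmup.

```python
import math

TRAIN_SIZE = 17_348          # training tweets (from the dataset split)
BATCH_SIZE = 16              # best per-device batch size
GRAD_ACCUM = 1               # best gradient accumulation steps
EPOCHS = 2                   # best number of epochs
WARMUP_RATIO = 0.0718442228846798

# Assumption: one device, last partial batch kept (drop_last=False).
effective_batch = BATCH_SIZE * GRAD_ACCUM
batches_per_epoch = math.ceil(TRAIN_SIZE / BATCH_SIZE)
steps_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # warmup rounded up

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 16 1085 2170 156
```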

📊 Detailed Results

Confusion Matrix (test set, 3,718 tweets):

| | Predicted Neither | Predicted Offensive/Hate |
|---|---|---|
| Actual Neither | 547 | 78 |
| Actual Offensive/Hate | 62 | 3031 |

Performance Breakdown

True Positives (Hate/Offensive correctly identified): 3031
True Negatives (Neutral correctly identified): 547
False Positives (Neutral incorrectly flagged): 78
False Negatives (Hate/offensive missed): 62
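The headline metrics in Model Performance can be recomputed from these four counts. The snippet treats Hate/Offensive as the positive class to get its precision, recall, and F1. It then swaps roles to score Neither, and weights the two F1 scores by class support to get the weighted F1. The supports are 625 actual Neither and 3,093 actual Hate/Offensive test tweets. Rounded to four places, the results match the reported figures: 0.9623 accuracy, 0.9774 Hate/Offensive F1, 0.9749 / 0.8982 precision, 0.8752 Neither recall, and 0.9621 weighted F1.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and F1 for one class given its TP/FP/FN counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def summarize(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Binary-task metrics with Hate/Offensive as the positive class."""
    total = tn + fp + fn + tp
    p_pos, r_pos, f1_pos = precision_recall_f1(tp, fp, fn)
    # For Neither, its true positives are tn, its false positives are fn
    # (offensive predicted as neither), and its false negatives are fp.
    p_neg, r_neg, f1_neg = precision_recall_f1(tn, fn, fp)
    support_neg, support_pos = tn + fp, fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "hate_precision": p_pos, "hate_recall": r_pos, "hate_f1": f1_pos,
        "neither_precision": p_neg, "neither_recall": r_neg, "neither_f1": f1_neg,
        "weighted_f1": (support_neg * f1_neg + support_pos * f1_pos) / total,
    }


metrics = summarize(tn=547, fp=78, fn=62, tp=3031)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```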

Usage

import re
import html
import contractions
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch

# Load the trained model + tokenizer
model = RobertaForSequenceClassification.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
tokenizer = RobertaTokenizer.from_pretrained("AshiniR/hate-speech-and-offensive-message-classifier")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def preprocess_text(text: str) -> str:
    """
    Preprocess raw text for transformer-based models like RoBERTa.

    This function is tailored for toxicity, sentiment, and social media classification.
    It removes noise (URLs, mentions, HTML codes) but keeps important signals
    such as casing, punctuation, and emojis.

    Steps:
        1. Decode HTML entities (e.g., '&gt;' → '>')
        2. Replace URLs with placeholders ("")
        3. Replace mentions with placeholders ("")
        4. Remove '#' from hashtags but keep the word (e.g., "#love" → "love")
        5. Expand contractions (e.g., "you're" → "you are")
        6. Mildly normalize repeated characters (3+ → 2)
        7. Remove "RT" only if at start of tweet
        8. Normalize whitespace

    Args:
        text (str): Raw tweet text.

    Returns:
        str: Cleaned text suitable for RoBERTa tokenization.
    """
    if not isinstance(text, str):
        return ""

    # 1. Decode HTML entities
    text = html.unescape(text)

    # 2. Replace URLs with placeholder
    text = re.sub(r"(https?://\S+|www\.\S+)", "", text)

    # 3. Replace user mentions with placeholder
    text = re.sub(r"@\w+", "", text)

    # 4. Simplify hashtags
    text = re.sub(r"#(\w+)", r"\1", text)

    # 5. Expand contractions
    text = contractions.fix(text)

    # 6. Mild normalization of character elongations (3+ → 2)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)

    # 7. Remove RT only if it starts the tweet (For tweets)
    text = re.sub(
        r"^[\s\W]*rt\s*@?\w*:?[\s-]*",
        "",
        text,
        flags=re.IGNORECASE
    )

    # 8. Normalize whitespace
    text = re.sub(r"\s+", " ", text).strip()

    return text


def get_inference(text: str) -> list:
    """Returns prediction results in [{'label': str, 'score': float}, ...] format."""
    # Preprocess the text
    text = preprocess_text(text)

    # Tokenize input text
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        padding=False,
        max_length=128
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        probabilities = torch.softmax(outputs.logits, dim=-1)
    
    # Convert to label format
    labels = ["neither", "hate/offensive"]
    results = []
    for i, prob in enumerate(probabilities[0]):
        results.append({
            "label": labels[i],
            "score": prob.item()
        })
    
    return sorted(results, key=lambda x: x["score"], reverse=True)

# Example usage
text = "your example message"
predictions = get_inference(text)
print(f"Text: '{text}'")
print(f"Predictions: {predictions}")

Use Cases

This hate/offensive message classifier is suited to:

Messaging Platforms

Discord bot moderation (Primary use case)
SMS filtering systems
Chat application content filtering

Content Moderation

Social media platforms
Comment section filtering
User-generated content screening
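For moderation use cases like these, the ranked output of `get_inference` usually has to become a yes/no action. Here is a minimal sketch using a hypothetical `should_flag` helper with an illustrative threshold of 0.5. The card does not recommend a threshold. Raising it lowers false positives on neutral messages at the cost of missing more hate/offensive ones.

```python
from typing import Dict, List


def should_flag(predictions: List[Dict[str, object]], threshold: float = 0.5) -> bool:
    """Return True if the hate/offensive probability meets the threshold.

    `predictions` uses the [{'label': str, 'score': float}, ...] format returned
    by get_inference; the 0.5 default is illustrative, not a tuned value.
    """
    scores = {p["label"]: float(p["score"]) for p in predictions}
    return scores.get("hate/offensive", 0.0) >= threshold


example = [{"label": "hate/offensive", "score": 0.91}, {"label": "neither", "score": 0.09}]
print(should_flag(example))                  # True
print(should_flag(example, threshold=0.95))  # False
```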

Citation

If you use this model in your research or application, please cite:

@misc{AshiniR_Hate_Offensive_Message_Classifier_2025,
  author       = {Ashini Dhananjana},
  title        = {Hate/Offensive Message Classifier: RoBERTa-based Hate/Offensive Message Detection},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/AshiniR/hate-speech-and-offensive-message-classifier}},
}

Model Card Contact

AshiniR - Hugging Face Profile


Safetensors

Model size: 0.1B params
Tensor type: F32
