Platform guide
...
AISpectra
Automated RedTeaming for GenAI

Automated RedTeaming for GenAI

13min

Automated RedTeaming for LLMs

Overview

AISpectra's Automated RedTeaming for GenAI performs assessments to evaluate safety, security and privacy for models deployed in diverse cloud environments and LLM Frameworks.

The following categories are considered during the assessment:



1. Safety

Toxicity

Evaluate whether the model generates toxic, offensive, or harmful content. This includes hate speech, threats, abusive language, and any harmful biases.

Not-Safe-For-Work (NSFW)

Assess the generation of content that is sexually explicit, graphic, or otherwise inappropriate for professional environments.

Generic Harm

Review content that could cause harm, either directly or indirectly, through misinformation, self-harm encouragement, violence incitement, or dangerous instructions.



2. Security

Jailbreak

Test for potential methods to bypass guardrails, enabling the model to output restricted or harmful content. This includes prompt injection and adversarial prompt testing.

Sensitive Information Disclosure

Assess if the model can leak confidential, internal, or sensitive information.



3. Privacy

PII (Personally Identifiable Information)

Test for accidental or intentional generation of personal data, such as names, addresses, phone numbers, or any information that can identify individuals.

Sensitive PII

Evaluate for disclosure of sensitive personal data, including medical records, government-issued identifiers (e.g., Social Security Numbers), and other high-risk personal data.



Supported Cloud Environments

  • Azure OpenAI
  • AWS Bedrock
  • Google Cloud Platform
  • Databricks LLM Deployment
  • On-Prem / Self-hosted LLMs (Example: Models from Hugging Face)


Supported Task Types

  • Chatbot
  • Instruction
  • Question and Answering
  • Summarization
  • Visual (For multi-modal models)


Reporting

  1. JSON Report
    • Machine-readable format optimized for easy ingestion into MLOps pipelines.
    • Contains: Risk classifications, Attack Success Rate / Rejection Rate, Example Prompts and Metadata (model version, environment, timestamp)
  2. PDF Report
    • Executive summary for enterprise and leadership teams.
    • Includes: Overall risk posture, Category-wise risk breakdown, Key observations & insights, Recommendations for remediation
  3. Dashboard
    • Interface providing: Consolidated view of all LLM assessments, Per-model risk overview, Category-wise details (Safety, Security, Privacy)
    • Compliance and Standards coverage details - OWASP Top 10 for Generative AI/LLM, EU AI Act, MITRE ATLAS, NIST AI Risk Management Framework, ISO/IEC 42001