Automated RedTeaming for GenAI

Platform guide

...

14 min

automated redteaming for llms overview aispectra's automated redteaming for genai performs assessments to evaluate safety, security and privacy for models deployed in diverse cloud environments and llm frameworks user manual for complete usage instructions, please refer to the llmrt user manual the following categories are considered during the assessment 1\ safety toxicity evaluate whether the model generates toxic, offensive, or harmful content this includes hate speech, threats, abusive language, and any harmful biases not safe for work (nsfw) assess the generation of content that is sexually explicit, graphic, or otherwise inappropriate for professional environments generic harm review content that could cause harm, either directly or indirectly, through misinformation, self harm encouragement, violence incitement, or dangerous instructions 2\ security jailbreak test for potential methods to bypass guardrails, enabling the model to output restricted or harmful content this includes prompt injection and adversarial prompt testing sensitive information disclosure assess if the model can leak confidential, internal, or sensitive information 3\ privacy pii (personally identifiable information) test for accidental or intentional generation of personal data, such as names, addresses, phone numbers, or any information that can identify individuals sensitive pii evaluate for disclosure of sensitive personal data, including medical records, government issued identifiers (e g , social security numbers), and other high risk personal data supported cloud environments azure openai aws bedrock google cloud platform databricks llm deployment on prem / self hosted llms (example models from hugging face) supported task types chatbot instruction question and answering summarization visual (for multi modal models) reporting json report machine readable format optimized for easy ingestion into mlops pipelines contains risk classifications, attack success rate / rejection rate, example prompts and metadata (model version, environment, timestamp) pdf report executive summary for enterprise and leadership teams includes overall risk posture, category wise risk breakdown, key observations & insights, recommendations for remediation dashboard interface providing consolidated view of all llm assessments, per model risk overview, category wise details (safety, security, privacy) compliance and standards coverage details owasp top 10 for generative ai/llm, eu ai act, mitre atlas, nist ai risk management framework, iso/iec 42001

Model Security

Getting started