How ClawShield Works
Three-layer testing architecture with statistical scoring, confidence intervals, and a deterministic judging pipeline that produces reproducible results.
Three-Layer Testing Architecture
Conversation Testing
Active — 14 suites, 299 scenarios
ClawShield calls your agent endpoint directly with adversarial prompts and tests what the agent says. Coverage includes prompt injection, jailbreaking, data leakage, hallucination, and 10 more categories.
Sandbox Connector
Roadmap — Q3 2026
Lightweight sidecar proxy that intercepts agent tool calls. Reports tool invocations to ClawShield for analysis. Does NOT require sharing API keys or system prompts. Deploys as K8s sidecar, Docker Compose overlay, or Lambda layer.
Controlled Environment
Roadmap — Q4 2026
ClawShield provides mock databases (with canary PII), HTTP clients (SSRF honeypot), admin panels (privilege escalation trap), and secrets managers (canary keys). Tests what the agent does, not just what it says.
Statistical Scoring
Deterministic
Single pass — always consistent
Cost multiplier: 1.0x
Canary detection, regex matching, keyword checks, pattern analysis
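As a rough illustration of what a single-pass deterministic check could look like, the sketch below scans an agent response for seeded canary values and secret-shaped patterns. The canary strings, the regexes, and the function name `deterministic_findings` are all assumptions made for this example, not ClawShield's actual rule set.

```python
import re

# Hypothetical canary values seeded into a test environment (illustrative only).
CANARY_TOKENS = {"CANARY-SSN-000-12-3456", "sk-canary-7f3a9c"}

# Illustrative leak patterns; a real rule set would be broader.
LEAK_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-shaped string
    re.compile(r"\bsk-[A-Za-z0-9-]{8,}\b"),  # API-key-shaped string
]


def deterministic_findings(response: str) -> list[str]:
    """Return a sorted list of findings; the same input always gives the same output."""
    findings = set()
    for token in CANARY_TOKENS:
        if token in response:
            findings.add(f"canary:{token}")
    for pattern in LEAK_PATTERNS:
        if pattern.search(response):
            findings.add(f"pattern:{pattern.pattern}")
    return sorted(findings)
```

Because the check is pure string and regex matching with no model in the loop, repeated runs on the same response cannot disagree, which is what makes a single pass sufficient for this tier.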
Deterministic + Perceptual
1 agent call, 3 judge calls (majority vote)
Cost multiplier: 1.2x
Safety bypass detection, context understanding, nuanced refusals
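The majority vote over three judge calls can be sketched as follows. This is a minimal example under assumed verdict labels ("pass" / "fail") and a hypothetical `majority_verdict` helper; the actual judge output format is not specified here.

```python
from collections import Counter


def majority_verdict(verdicts: list[str]) -> str:
    """Return the label chosen by a strict majority of judges.

    Returns "inconclusive" when no label has more than half the votes
    (possible only if judges can emit more than two distinct labels).
    """
    if not verdicts:
        raise ValueError("at least one judge verdict is required")
    label, count = Counter(verdicts).most_common(1)[0]
    if count * 2 <= len(verdicts):
        return "inconclusive"
    return label


# One agent response, judged three times (labels are assumed for illustration).
print(majority_verdict(["pass", "fail", "pass"]))  # prints "pass"
```

Using an odd number of judges with binary labels guarantees a strict majority, so one noisy judge call cannot flip the result on its own.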
Deep Perceptual Analysis
3 full runs, confidence interval
Cost multiplier: 1.8x
Hallucination, bias, creative jailbreaks, multi-turn attacks
5-Dimension Scoring
Every scan produces a 5-dimension radar chart showing where your agent excels and where it needs improvement.
Security
Resistance to attacks (injection, jailbreaking, exfiltration)
Accuracy
Output correctness and consistency
Reasoning
Logic quality and decision-making
Tool Usage
Proper API/tool invocation
Operational Safety
Behavioral constraints and compliance
Confidence Intervals
ClawShield is the first platform to report scores with statistical confidence intervals. After three or more runs, we compute a 95% confidence interval using Student's t-distribution, which is appropriate for small samples.
An agent that passes 2 of 3 runs is LESS secure than one that passes 3 of 3. We report this as a "Consistency Index": inconsistency is itself a finding.
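To make the small-sample interval concrete, here is a minimal sketch of a 95% t-based confidence interval over per-run scores, using only the Python standard library. The function names, the example scores, and the use of a simple pass rate as a stand-in for the Consistency Index are assumptions for illustration; the exact Consistency Index formula is not given above.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def confidence_interval_95(scores: list[float]) -> tuple[float, float]:
    """Return (low, high) bounds of a 95% CI for the mean score."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 runs")
    if n - 1 not in T_CRIT_95:
        raise ValueError("this lookup table only covers up to 10 runs")
    mean = statistics.mean(scores)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(n)
    margin = T_CRIT_95[n - 1] * sem
    return mean - margin, mean + margin


def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of runs that passed (a hypothetical stand-in for a consistency measure)."""
    return sum(outcomes) / len(outcomes)


low, high = confidence_interval_95([80.0, 85.0, 90.0])
print(f"mean 85.0, 95% CI [{low:.2f}, {high:.2f}]")  # prints "mean 85.0, 95% CI [72.58, 97.42]"
```

With only three runs the critical value is 4.303 rather than the familiar 1.96, so the interval is wide. That width is the point: it shows how little three runs can establish about an agent whose scores vary.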