promptguard.schemas¶

Data Models¶

class promptguard.RiskScore(is_malicious, probability, risk_level, confidence, explanation, metadata=None)[source]¶

Bases: object

Result of a single-prompt security analysis.

is_malicious: bool¶: True when the model probability exceeds the configured threshold.

probability: float¶: Malicious probability in [0.0, 1.0] from the model.

risk_level: RiskLevel¶: Coarse-grained RiskLevel derived from probability.

confidence: float¶: Distance from the decision boundary, scaled to [0.0, 1.0].

explanation: str¶: Human-readable summary with evidence.

metadata: Dict[str, Any] | None = None¶: Optional per-analyser detail (sentiment, intent, keywords, attack_patterns).

to_dict()[source]¶

Serialise to a plain dictionary.

class promptguard.SanitizationResult(original, sanitized, was_modified, removed_patterns, strategy, confidence, risk_reduction)[source]¶

Bases: object

Outcome of a single prompt sanitisation operation.

original: str¶: The original (pre-sanitisation) prompt text.

sanitized: str¶: The cleaned prompt text.

was_modified: bool¶: True when sanitized differs from original.

removed_patterns: List[str]¶: Fragments of text that were matched and removed or replaced.

strategy: SanitizationStrategy¶: The SanitizationStrategy that was applied.

confidence: float¶: Estimated confidence that the sanitised prompt is safe ([0.0, 1.0]).

risk_reduction: float¶: Estimated reduction in risk ([0.0, 1.0]).

to_dict()[source]¶

Serialise to a plain dictionary.

class promptguard.SanitizeResponse(sanitization, original_analysis, sanitized_analysis, risk_before, risk_after, risk_reduction)[source]¶

Bases: object

Typed result returned by PromptGuard.sanitize().

sanitization: SanitizationResult¶: Detailed sanitisation outcome.

original_analysis: RiskScore¶: RiskScore for the original prompt.

sanitized_analysis: RiskScore | None¶: RiskScore for the sanitised prompt, or None when analyze_after was False.

risk_before: float¶: Malicious probability of the original prompt.

risk_after: float | None¶: Malicious probability after sanitisation, or None.

risk_reduction: float¶: Difference risk_before - risk_after (0.0 when unchanged).

Enumerations¶

class promptguard.RiskLevel(*values)[source]¶

Bases: str, Enum

Categorised risk level returned by the classifier.

LOW = 'low'¶

MEDIUM = 'medium'¶

HIGH = 'high'¶

class promptguard.Intent(*values)[source]¶

Bases: str, Enum

Detected intent of the analysed prompt.

QUESTION = 'question'¶

INSTRUCTION = 'instruction'¶

CONVERSATION = 'conversation'¶

JAILBREAK = 'jailbreak'¶

INJECTION = 'injection'¶

UNKNOWN = 'unknown'¶

class promptguard.Sentiment(*values)[source]¶

Bases: str, Enum

Detected sentiment of the analysed prompt.

POSITIVE = 'positive'¶

NEUTRAL = 'neutral'¶

NEGATIVE = 'negative'¶

class promptguard.SanitizationStrategy(*values)[source]¶

Bases: str, Enum

Strategy controlling how aggressively a prompt is sanitised.

CONSERVATIVE = 'conservative'¶

BALANCED = 'balanced'¶

MINIMAL = 'minimal'¶