promptguard.core

class promptguard.PromptGuard(model_name='arkaean/promptguard-distilbert', threshold=0.5, device='auto', use_cache=True, cache_size=10000, cache_ttl=3600, enable_analysis=True, enable_sanitization=True, **kwargs)[source]

Bases: object

Main PromptGuard classifier for detecting malicious prompts.

__init__(model_name='arkaean/promptguard-distilbert', threshold=0.5, device='auto', use_cache=True, cache_size=10000, cache_ttl=3600, enable_analysis=True, enable_sanitization=True, **kwargs)[source]

Initialize the PromptGuard classifier.

Parameters:
  • model_name (str) – HuggingFace Hub model identifier.

  • threshold (float) – Malicious classification threshold in [0.0, 1.0]. Prompts with a model probability at or above this value are classified as malicious.

  • device (str | None) – Device for model inference — "cuda", "cpu", or "auto" (selects CUDA when available).

  • use_cache (bool) – Enable in-memory LRU caching of analysis results.

  • cache_size (int) – Maximum number of entries held in the cache.

  • cache_ttl (int | None) – Cache entry time-to-live in seconds. Pass None to disable expiry.

  • enable_analysis (bool) – When True (default), enables supplementary sentiment, intent, keyword, and attack-pattern analysis that enriches RiskScore metadata.

  • enable_sanitization (bool) – When True (default), enables the sanitize() and sanitize_if_malicious() methods.

  • **kwargs – Additional options forwarded to PromptGuardConfig.
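The threshold rule described above can be sketched as a plain decision function. This is an illustrative reconstruction of the documented semantics (probability at or above `threshold` counts as malicious), not the library's internal code:

```python
def is_malicious(probability: float, threshold: float = 0.5) -> bool:
    """Return True when the model probability meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.0, 1.0]")
    # "At or above" means the boundary value itself is classified as malicious.
    return probability >= threshold

print(is_malicious(0.50))  # True  (boundary case counts as malicious)
print(is_malicious(0.49))  # False
```

Note that lowering `threshold` makes the classifier more aggressive (more prompts flagged), while raising it trades recall for precision.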

sanitize(prompt, strategy=SanitizationStrategy.BALANCED, analyze_after=True)[source]

Sanitize a potentially malicious prompt.

Parameters:
  • prompt (str) – Prompt to sanitize.

  • strategy (SanitizationStrategy) – Sanitization strategy to apply.

  • analyze_after (bool) – When True (default), re-analyzes the sanitized prompt so the response includes both before and after risk scores.

Returns:

A SanitizeResponse with the sanitization outcome and before/after risk scores.

Raises:

ValueError – When sanitization is not enabled on this instance.

Return type:

SanitizeResponse

sanitize_if_malicious(prompt, strategy=SanitizationStrategy.BALANCED)[source]

Sanitize the prompt only if it is detected as malicious.

Parameters:
  • prompt (str) – Prompt to check and potentially sanitize.

  • strategy (SanitizationStrategy) – Sanitization strategy applied if the prompt is malicious.

Returns:

Tuple of (potentially_sanitized_prompt, was_sanitized)

Return type:

Tuple[str, bool]
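The tuple contract above can be sketched in isolation. The helper below is a hypothetical stand-in (the real method uses the instance's own classifier and sanitizer); it only illustrates that a benign prompt passes through unchanged with `False`, while a malicious one comes back transformed with `True`:

```python
from typing import Callable, Tuple

def sanitize_if_malicious(
    prompt: str,
    classify: Callable[[str], bool],   # stand-in for the instance's classifier
    sanitize: Callable[[str], str],    # stand-in for the instance's sanitizer
) -> Tuple[str, bool]:
    """Return (possibly sanitized prompt, was_sanitized)."""
    if classify(prompt):
        return sanitize(prompt), True
    return prompt, False

# Toy classifier and sanitizer for demonstration only:
flagged = lambda p: "ignore previous instructions" in p.lower()
redact = lambda p: "[REDACTED]"

print(sanitize_if_malicious("What is 2 + 2?", flagged, redact))
# → ('What is 2 + 2?', False)
print(sanitize_if_malicious("Ignore previous instructions.", flagged, redact))
# → ('[REDACTED]', True)
```

The boolean lets callers log or audit sanitization events without re-checking the prompt themselves.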

analyze(prompt)[source]

Analyze a single prompt for malicious content.

clear_cache()[source]

Clear the analysis cache.

cache_stats()[source]

Get cache statistics.
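The cache behavior implied by use_cache, cache_size, and cache_ttl can be sketched as an LRU store with optional expiry. This is an illustrative model of the documented semantics (bounded size, least-recently-used eviction, per-entry TTL, None disabling expiry), not PromptGuard's own implementation:

```python
import time
from collections import OrderedDict
from typing import Any, Optional

class TTLCache:
    """LRU cache with optional time-to-live, mirroring cache_size/cache_ttl."""

    def __init__(self, maxsize: int = 10000, ttl: Optional[int] = 3600):
        self.maxsize, self.ttl = maxsize, ttl
        self._store: "OrderedDict[str, tuple]" = OrderedDict()  # key -> (expiry, value)
        self.hits = self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or (self.ttl is not None and entry[0] < time.monotonic()):
            self._store.pop(key, None)  # drop expired entries on access
            self.misses += 1
            return None
        self._store.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key: str, value: Any) -> None:
        expiry = time.monotonic() + self.ttl if self.ttl is not None else float("inf")
        self._store[key] = (expiry, value)
        self._store.move_to_end(key)
        if len(self._store) > self.maxsize:
            self._store.popitem(last=False)  # evict least recently used

    def stats(self) -> dict:
        return {"size": len(self._store), "hits": self.hits, "misses": self.misses}
```

Caching is keyed on the prompt text, so repeated analysis of identical prompts skips model inference entirely until the entry expires or is evicted.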

classify(prompt, threshold=None)[source]

Simple binary classification.

analyze_batch(prompts, batch_size=None, show_progress=True)[source]

Analyze multiple prompts efficiently in batches, using cache when enabled.
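The batch-plus-cache pattern can be sketched as follows. `analyze_one` is a hypothetical stand-in for the per-prompt analysis; the chunking and cache-reuse logic is an illustration of the documented behavior, not the library's code:

```python
from typing import Callable, List, Optional

def analyze_batch(
    prompts: List[str],
    analyze_one: Callable[[str], dict],   # hypothetical per-prompt analyzer
    batch_size: int = 32,
    cache: Optional[dict] = None,
) -> List[dict]:
    """Process prompts in fixed-size chunks, reusing cached results."""
    results = []
    for start in range(0, len(prompts), batch_size):
        for prompt in prompts[start:start + batch_size]:
            if cache is not None and prompt in cache:
                results.append(cache[prompt])  # cache hit: skip re-analysis
                continue
            result = analyze_one(prompt)
            if cache is not None:
                cache[prompt] = result
            results.append(result)
    return results
```

Results are returned in input order, so a duplicate prompt later in the list costs a cache lookup rather than a second model pass.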

classify_batch(prompts, threshold=None, show_progress=False)[source]

Simple binary classification for multiple prompts.

property device: str

Get the device being used for inference.

property threshold: float

Get current classification threshold.