Content Detection
CID222 exposes its detection engine directly through the Guardrails API, so you can scan any text for PII, secrets, toxicity, and prompt injection without sending it to an LLM. The same engine runs automatically on every chat request.
Detect Content
POST /api/v1/guardrails/detect
Runs the full detection pipeline — PII, secrets, toxicity, and jailbreak — over a block of text and returns the entities found, the decided action, and a masked copy of the text.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to analyze (1–50,000 characters) |
| check_type | string | No | prompt (default) or response |
| filter | string | No | Filter nickname to apply instead of the tenant default |
| hap_version | number | No | Toxicity model version, 1 or 2 (default 2) |
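The constraints in the table can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: `build_detect_payload` and `filter_name` are hypothetical names, and it assumes the 50,000-character limit counts Unicode code points the way Python's `len` does, which the API may not.

```python
from typing import Optional

MAX_TEXT_LENGTH = 50_000  # upper bound taken from the Request Body table


def build_detect_payload(
    text: str,
    check_type: str = "prompt",
    filter_name: Optional[str] = None,
    hap_version: int = 2,
) -> dict:
    """Validate the arguments locally and return the JSON body as a dict."""
    # Assumption: the server counts characters like len() does.
    if not 1 <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError("text must be 1 to 50,000 characters long")
    if check_type not in ("prompt", "response"):
        raise ValueError("check_type must be 'prompt' or 'response'")
    if hap_version not in (1, 2):
        raise ValueError("hap_version must be 1 or 2")

    payload = {"text": text, "check_type": check_type, "hap_version": hap_version}
    if filter_name is not None:
        # Send `filter` only when overriding the tenant default.
        payload["filter"] = filter_name
    return payload
```

Leaving `filter` out of the body when it is not set lets the tenant default apply, as the table describes.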
Example Request
curl -X POST https://api.cid222.ai/api/v1/guardrails/detect \
  -H "Authorization: Bearer cid_key_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "My name is John Smith and my email is john@example.com", "check_type": "prompt"}'
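The same call can be made from Python with only the standard library. This is a sketch under assumptions: `make_detect_request` and `detect` are hypothetical helper names, the 10-second timeout is arbitrary, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

DETECT_URL = "https://api.cid222.ai/api/v1/guardrails/detect"


def make_detect_request(api_key: str, text: str, check_type: str = "prompt") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"text": text, "check_type": check_type}).encode("utf-8")
    return urllib.request.Request(
        DETECT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def detect(api_key: str, text: str, check_type: str = "prompt", timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = make_detect_request(api_key, text, check_type)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the first function easy to unit-test without network access.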
Response
The response reports the overall action, a count, the list of detected entities, and a masked version of the input.
{
  "action": "mask",
  "detectionCount": 2,
  "maskedText": "My name is [PERSON_NAME] and my email is [EMAIL]",
  "detectedEntities": [
    {
      "type": "PERSON_NAME",
      "value": "[PERSON_NAME]",
      "action": "mask",
      "confidence": 0.95,
      "source": "pii_detector",
      "start": 11,
      "end": 21
    },
    {
      "type": "EMAIL",
      "value": "[EMAIL]",
      "action": "mask",
      "confidence": 0.99,
      "source": "pii_detector",
      "start": 38,
      "end": 54
    }
  ],
  "processingTimeMs": 142,
  "normalization": { "applied": false }
}
Raw values are never returned: the value field of each detected entity holds only the redacted marker (e.g. [EMAIL]).
Detected Entity Fields
| Field | Type | Description |
|---|---|---|
| type | string | Detection category (e.g. EMAIL, PERSON_NAME, jailbreak, violence) |
| value | string | Redacted marker (e.g. [EMAIL]); never the raw value |
| action | string | allow, flag, mask, or reject |
| confidence | number | Model confidence, 0–1 |
| source | string | pii_detector, hap_detector, jailbreak_detector, risk_guardian, code_rule_detector |
| start, end | number | Character offsets in the input (optional) |
| filterName | string | Security filter that matched (optional) |
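One way to consume these fields is to map each entity onto a small typed record. The sketch below is illustrative only: the `DetectedEntity` class and `parse_entities` function are hypothetical names, and the fields marked optional in the table default to `None` when absent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedEntity:
    type: str
    value: str                      # redacted marker, e.g. "[EMAIL]"
    action: str                     # allow, flag, mask, or reject
    confidence: float               # 0 to 1
    source: str
    start: Optional[int] = None     # optional character offset
    end: Optional[int] = None       # optional character offset
    filter_name: Optional[str] = None


def parse_entities(response: dict) -> List[DetectedEntity]:
    """Convert the detectedEntities array of a detect response into records."""
    entities = []
    for raw in response.get("detectedEntities", []):
        entities.append(
            DetectedEntity(
                type=raw["type"],
                value=raw["value"],
                action=raw["action"],
                confidence=float(raw["confidence"]),
                source=raw["source"],
                start=raw.get("start"),
                end=raw.get("end"),
                filter_name=raw.get("filterName"),
            )
        )
    return entities
```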
Detection Actions
| Action | Description |
|---|---|
| allow | No action: the content is clean or below threshold. |
| flag | Log the detection but allow the content through unchanged (for review and audit). |
| mask | Redact the entity from the text (e.g. replace with [EMAIL]) and continue. |
| reject | Block the content. In /detect the response action is reject; in chat the stream emits input_rejected or output_rejected. |
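A caller of /detect typically branches on the top-level action. The following sketch shows one plausible handling, assuming the caller wants the text it should pass downstream; `text_to_forward` and `ContentRejected` are hypothetical names, not part of the API.

```python
class ContentRejected(Exception):
    """Raised when the detect response carries action == 'reject'."""


def text_to_forward(original_text: str, response: dict) -> str:
    """Return the text that may continue downstream, based on the response action."""
    action = response["action"]
    if action == "reject":
        # Blocked content must not be forwarded at all.
        raise ContentRejected("content blocked by guardrails")
    if action == "mask":
        # Continue with the redacted copy returned by the API.
        return response["maskedText"]
    if action in ("allow", "flag"):
        # Both let the content through unchanged; flag is only logged.
        return original_text
    raise ValueError(f"unexpected action: {action!r}")
```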
Action priority
PII Entity Types
The ONNX NER model plus regex validators detect 21+ entity types across contact, identity, financial, location, and network data (Turkish formats included):
| Type | Description | Examples |
|---|---|---|
| EMAIL | Email addresses | user@domain.com |
| PHONE | Phone numbers (intl. + Turkish) | +1-555-123-4567 |
| PERSON_NAME | Person names | John Smith |
| SSN | US Social Security Number | 123-45-6789 |
| TC_KIMLIK | Turkish national ID | 12345678901 |
| PASSPORT | Passport numbers | U1234567 |
| CREDIT_CARD | Card numbers (Luhn-checked) | 4111 1111 1111 1111 |
| IBAN / BANK_ACCOUNT | Bank accounts | DE89 3704 0044 0532 0130 00 |
| LOCATION | Addresses, cities, regions | 123 Main St, New York |
| IP_ADDRESS | IPv4 / IPv6 | 192.168.1.1, 2001:db8::1 |
| MAC_ADDRESS / URL | Network identifiers | 00:1A:2B:3C:4D:5E |
| LICENSE_PLATE | Vehicle plates | 34 ABC 123 |
| MRN | Medical record number | MRN-12345678 |
| CRYPTO_ADDRESS | Crypto wallet addresses | bc1q… |
| ORGANIZATION / DATE_TIME | Orgs and dates | Acme Corp, 01/15/1990 |
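The CREDIT_CARD row mentions a Luhn check. As an illustration of what such a validator does, here is a minimal sketch of the standard Luhn algorithm; it is not CID222's implementation, and `luhn_valid` is a hypothetical name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    # Tolerate the usual separators found in card numbers.
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

The table's sample number, 4111 1111 1111 1111, passes this check, while changing its last digit makes it fail. A checksum of this kind helps a regex-based detector discard random 16-digit strings.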
Secret & Credential Detection (DLP)
A dedicated DLP layer flags leaked secrets and credentials in prompts and responses:
| Type | Description |
|---|---|
| API_KEY | Provider API keys (OpenAI, AWS, generic) |
| PASSWORD | Inline passwords / secret assignments |
| PRIVATE_KEY | RSA / OpenSSH private keys |
| CONNECTION_STRING | Postgres, MySQL, MongoDB, Redis URIs |
| OAUTH_TOKEN | JWT / Bearer tokens |
| WEBHOOK_SECRET | Stripe / generic webhook secrets |
| CVE_ID | CVE vulnerability identifiers |
Safety Categories
Beyond PII and secrets, the safety models classify harmful content (13 toxicity labels) and adversarial prompts (jailbreak, injection):
| Label | Source | Description |
|---|---|---|
| violence | HAP | Unlawful violence toward people or animals |
| hate | HAP | Hate speech targeting protected attributes |
| self_harm | HAP | Suicide, self-harm, eating disorders |
| sexual_content | HAP | Adult / explicit sexual content |
| sexual_crime | HAP | Non-consensual sexual acts, trafficking |
| child_exploitation | HAP | CSAM |
| crime | HAP | Non-violent crimes (financial, property, drug) |
| cyber_crimes | HAP | Hacking, malware, DDoS |
| weapons | HAP | WMD, illegal arms |
| dangerous_advice | HAP | Dangerous specialized advice |
| privacy | HAP | Tracking, doxxing, identity theft |
| intellectual_property | HAP | Piracy, plagiarism, counterfeiting |
| defamation | HAP | Reputation-damaging falsehoods |
| jailbreak | Attack Guard | DAN-style role-play, system-prompt extraction |
| injection | Attack Guard | Prompt-injection attempts |
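When summarizing results, it can help to group the `type` of each detected entity into the families this page describes. The sketch below hard-codes the type names from the tables on this page; the PII set is not exhaustive (the page says 21+ types), so unknown types fall into an "other" bucket. `family_of` and `count_by_family` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable

# Type names copied from the tables on this page; the PII list is partial.
PII_TYPES = {
    "EMAIL", "PHONE", "PERSON_NAME", "SSN", "TC_KIMLIK", "PASSPORT",
    "CREDIT_CARD", "IBAN", "BANK_ACCOUNT", "LOCATION", "IP_ADDRESS",
    "MAC_ADDRESS", "URL", "LICENSE_PLATE", "MRN", "CRYPTO_ADDRESS",
    "ORGANIZATION", "DATE_TIME",
}
SECRET_TYPES = {
    "API_KEY", "PASSWORD", "PRIVATE_KEY", "CONNECTION_STRING",
    "OAUTH_TOKEN", "WEBHOOK_SECRET", "CVE_ID",
}
TOXICITY_LABELS = {
    "violence", "hate", "self_harm", "sexual_content", "sexual_crime",
    "child_exploitation", "crime", "cyber_crimes", "weapons",
    "dangerous_advice", "privacy", "intellectual_property", "defamation",
}
ATTACK_LABELS = {"jailbreak", "injection"}


def family_of(entity_type: str) -> str:
    """Map a detection type to a coarse family name."""
    if entity_type in PII_TYPES:
        return "pii"
    if entity_type in SECRET_TYPES:
        return "secret"
    if entity_type in TOXICITY_LABELS:
        return "toxicity"
    if entity_type in ATTACK_LABELS:
        return "attack"
    return "other"


def count_by_family(entity_types: Iterable[str]) -> Counter:
    """Count detections per family."""
    return Counter(family_of(t) for t in entity_types)
```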
Confidence Scores
Each detection includes a confidence score between 0 and 1:
- > 0.9: High confidence, action applied automatically
- 0.7–0.9: Medium confidence, may require review
- < 0.7: Low confidence, typically flagged only
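The three bands above can be expressed as a small helper for dashboards or review queues. This is a sketch with a hypothetical function name; it treats exactly 0.9 and exactly 0.7 as medium, which follows from the strict inequalities in the list.

```python
def confidence_band(confidence: float) -> str:
    """Classify a confidence score into the high / medium / low bands."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > 0.9:
        return "high"
    if confidence >= 0.7:
        return "medium"
    return "low"
```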
Query Detection Logs
Admins can query stored detections and aggregate statistics:
GET /admin/detections
GET /admin/detections/stats
| Parameter | Type | Description |
|---|---|---|
| detection_type | string | Filter by detection category |
| action | string | Filter by action taken (flag, mask, reject) |
| tenant_id | string | Filter by tenant |
| start_date, end_date | string | ISO 8601 date range |
curl -X GET "https://api.cid222.ai/admin/detections?detection_type=EMAIL&action=mask" \
  -H "Authorization: Bearer YOUR_ADMIN_JWT"
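Query strings for this endpoint are easier to build programmatically than by hand, especially once ISO 8601 timestamps need percent-encoding. A minimal sketch using the standard library follows; `detections_query_url` is a hypothetical helper, and any date values passed to it are purely illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

ADMIN_DETECTIONS_URL = "https://api.cid222.ai/admin/detections"


def detections_query_url(
    detection_type: Optional[str] = None,
    action: Optional[str] = None,
    tenant_id: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> str:
    """Build the GET URL, including only the filters that were provided."""
    params = {
        "detection_type": detection_type,
        "action": action,
        "tenant_id": tenant_id,
        "start_date": start_date,
        "end_date": end_date,
    }
    present = {key: value for key, value in params.items() if value is not None}
    if not present:
        return ADMIN_DETECTIONS_URL
    # urlencode percent-encodes reserved characters such as ':' in timestamps.
    return ADMIN_DETECTIONS_URL + "?" + urlencode(present)
```

Calling `detections_query_url(detection_type="EMAIL", action="mask")` reproduces the URL used in the curl example.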