Content Detection

CID222 exposes its detection engine directly through the Guardrails API, so you can scan any text for PII, secrets, toxicity, and prompt injection without sending it to an LLM. The same engine runs automatically on every chat request.

Detect Content

POST /api/v1/guardrails/detect

Runs the full detection pipeline (PII, secrets, toxicity, and jailbreak or prompt-injection checks) over a block of text and returns the entities found, the decided action, and a masked copy of the text.

Request Body

Parameter	Type	Required	Description
`text`	string	Yes	Text to analyze (1–50,000 characters)
`check_type`	string	No	`prompt` (default) or `response`
`filter`	string	No	Filter nickname to apply instead of the tenant default
`hap_version`	number	No	Toxicity model version, 1 or 2 (default 2)

Example Request

cURL

curl -X POST https://api.cid222.ai/api/v1/guardrails/detect \
  -H "Authorization: Bearer cid_key_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My name is John Smith and my email is john@example.com",
    "check_type": "prompt"
  }'
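The same request can be prepared from Python. The sketch below uses only the standard library and builds the request without sending it. The helper name `build_detect_request` is hypothetical, and the length check assumes the 1 to 50,000 character limit is counted the same way as Python's `len()`.

```python
import json
import urllib.request

DETECT_URL = "https://api.cid222.ai/api/v1/guardrails/detect"
MAX_TEXT_LENGTH = 50_000  # upper bound from the Request Body table


def build_detect_request(text, api_key, check_type="prompt"):
    """Build (but do not send) a POST request for the detect endpoint.

    Hypothetical helper: only the URL, headers, and body fields come from
    the documentation above.
    """
    if not 1 <= len(text) <= MAX_TEXT_LENGTH:
        raise ValueError("text must be 1 to 50,000 characters long")
    if check_type not in ("prompt", "response"):
        raise ValueError("check_type must be 'prompt' or 'response'")

    body = json.dumps({"text": text, "check_type": check_type}).encode("utf-8")
    return urllib.request.Request(
        DETECT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_detect_request(
    "My name is John Smith and my email is john@example.com",
    api_key="cid_key_your_api_key",
)
# To send it, pass `request` to urllib.request.urlopen() and decode the JSON body.
```

Validating the length and `check_type` locally avoids a round trip for requests that the table above already rules out.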

Response

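The response reports the overall action (`action`), the number of detections (`detectionCount`), the list of detected entities (`detectedEntities`), and a masked copy of the input (`maskedText`). It also includes `processingTimeMs` and a `normalization` object.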

{
  "action": "mask",
  "detectionCount": 2,
  "maskedText": "My name is [PERSON_NAME] and my email is [EMAIL]",
  "detectedEntities": [
    {
      "type": "PERSON_NAME",
      "value": "[PERSON_NAME]",
      "action": "mask",
      "confidence": 0.95,
      "source": "pii_detector",
      "start": 11,
      "end": 21
    },
    {
      "type": "EMAIL",
      "value": "[EMAIL]",
      "action": "mask",
      "confidence": 0.99,
      "source": "pii_detector",
      "start": 38,
      "end": 54
    }
  ],
  "processingTimeMs": 142,
  "normalization": { "applied": false }
}

Raw values are never returned

Detected entities expose only a redacted marker in `value` (e.g. `[EMAIL]`). The original sensitive text is never echoed back in the response.

Detected Entity Fields

Field	Type	Description
`type`	string	Detection category (e.g. `EMAIL`, `PERSON_NAME`, `jailbreak`, `violence`)
`value`	string	Redacted marker (e.g. `[EMAIL]`) — never the raw value
`action`	string	`allow`, `flag`, `mask`, or `reject`
`confidence`	number	Model confidence, 0–1
`source`	string	`pii_detector`, `hap_detector`, `jailbreak_detector`, `risk_guardian`, `code_rule_detector`
`start`, `end`	number	Character offsets in the input (optional)
`filterName`	string	Security filter that matched (optional)
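Because `value` holds only the redacted marker, a caller that needs the original substring has to slice its own copy of the input with `start` and `end`. The sketch below assumes offsets are 0-based with an exclusive `end`, which is inferred from the `PERSON_NAME` entity in the sample response and not stated by the API. The helper name `raw_spans` is hypothetical.

```python
def raw_spans(text, entities):
    """Return (type, original_substring) pairs for entities that carry offsets.

    Assumes `start` is 0-based and `end` is exclusive. Entities without
    offsets are skipped, because both fields are optional.
    """
    spans = []
    for entity in entities:
        start, end = entity.get("start"), entity.get("end")
        if start is None or end is None:
            continue
        spans.append((entity["type"], text[start:end]))
    return spans


text = "My name is John Smith and my email is john@example.com"
entities = [
    {"type": "PERSON_NAME", "value": "[PERSON_NAME]", "start": 11, "end": 21},
    {"type": "jailbreak"},  # hypothetical entity returned without offsets
]
print(raw_spans(text, entities))  # [('PERSON_NAME', 'John Smith')]
```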

Detection Actions

Action	Description
`allow`	No action — the content is clean or below threshold.
`flag`	Log the detection but allow the content through unchanged (for review and audit).
`mask`	Redact the entity from the text (e.g. replace with `[EMAIL]`) and continue.
`reject`	Block the content. In `/detect` the response `action` is `reject`; in chat the stream emits `input_rejected` or `output_rejected`.

Action priority

When multiple entities match, the strongest action wins: reject > mask > flag > allow.
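The priority rule can be reproduced client-side, for example to recompute the overall action from a list of per-entity actions. This is a minimal sketch of the ordering stated above, not the engine's own code. The names `ACTION_PRIORITY` and `strongest_action` are hypothetical, and returning `allow` for an empty list is an assumption.

```python
# Higher number means stronger action: reject > mask > flag > allow.
ACTION_PRIORITY = {"allow": 0, "flag": 1, "mask": 2, "reject": 3}


def strongest_action(actions):
    """Return the strongest action in `actions`.

    An empty input yields "allow" (assumed: nothing detected means nothing
    to act on). An unknown action name raises KeyError.
    """
    return max(actions, key=ACTION_PRIORITY.__getitem__, default="allow")


print(strongest_action(["flag", "mask", "allow"]))  # mask
print(strongest_action(["mask", "reject"]))         # reject
```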

PII Entity Types

The ONNX NER model plus regex validators detect 21+ entity types across contact, identity, financial, location, and network data (Turkish formats included):

Type	Description	Examples
`EMAIL`	Email addresses	user@domain.com
`PHONE`	Phone numbers (intl. + Turkish)	+1-555-123-4567
`PERSON_NAME`	Person names	John Smith
`SSN`	US Social Security Number	123-45-6789
`TC_KIMLIK`	Turkish national ID	12345678901
`PASSPORT`	Passport numbers	U1234567
`CREDIT_CARD`	Card numbers (Luhn-checked)	4111 1111 1111 1111
`IBAN` / `BANK_ACCOUNT`	Bank accounts	DE89 3704 0044 0532 0130 00
`LOCATION`	Addresses, cities, regions	123 Main St, New York
`IP_ADDRESS`	IPv4 / IPv6	192.168.1.1, 2001:db8::1
`MAC_ADDRESS` / `URL`	Network identifiers	00:1A:2B:3C:4D:5E
`LICENSE_PLATE`	Vehicle plates	34 ABC 123
`MRN`	Medical record number	MRN-12345678
`CRYPTO_ADDRESS`	Crypto wallet addresses	bc1q…
`ORGANIZATION` / `DATE_TIME`	Orgs and dates	Acme Corp, 01/15/1990

Custom regex patterns can be added per filter for organization-specific identifiers.
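The Luhn check noted for `CREDIT_CARD` in the table above is a standard checksum that rules out most digit strings that merely look like card numbers. The sketch below is a generic implementation of that algorithm, shown for illustration. It is not taken from the CID222 validators, and the function name `luhn_valid` is hypothetical.

```python
ASCII_DIGITS = "0123456789"


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces, dashes, and other non-digit characters are ignored.
    """
    digits = [int(ch) for ch in number if ch in ASCII_DIGITS]
    if len(digits) < 2:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```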

Secret & Credential Detection (DLP)

A dedicated DLP layer flags leaked secrets and credentials in prompts and responses:

Type	Description
`API_KEY`	Provider API keys (OpenAI, AWS, generic)
`PASSWORD`	Inline passwords / secret assignments
`PRIVATE_KEY`	RSA / OpenSSH private keys
`CONNECTION_STRING`	Postgres, MySQL, MongoDB, Redis URIs
`OAUTH_TOKEN`	JWT / Bearer tokens
`WEBHOOK_SECRET`	Stripe / generic webhook secrets
`CVE_ID`	CVE vulnerability identifiers

Safety Categories

Beyond PII and secrets, the safety models classify harmful content (13 toxicity labels) and adversarial prompts (jailbreak, injection):

Label	Source	Description
`violence`	HAP	Unlawful violence toward people or animals
`hate`	HAP	Hate speech targeting protected attributes
`self_harm`	HAP	Suicide, self-harm, eating disorders
`sexual_content`	HAP	Adult / explicit sexual content
`sexual_crime`	HAP	Non-consensual sexual acts, trafficking
`child_exploitation`	HAP	CSAM
`crime`	HAP	Non-violent crimes (financial, property, drug)
`cyber_crimes`	HAP	Hacking, malware, DDoS
`weapons`	HAP	WMD, illegal arms
`dangerous_advice`	HAP	Dangerous specialized advice
`privacy`	HAP	Tracking, doxxing, identity theft
`intellectual_property`	HAP	Piracy, plagiarism, counterfeiting
`defamation`	HAP	Reputation-damaging falsehoods
`jailbreak`	Attack Guard	DAN-style role-play, system-prompt extraction
`injection`	Attack Guard	Prompt-injection attempts

Confidence Scores

Each detection includes a confidence score between 0 and 1:

> 0.9 — High confidence, action applied automatically
0.7 - 0.9 — Medium confidence, may require review
< 0.7 — Low confidence, typically flagged only

Confidence thresholds are configurable per filter. See Content Safety Pipeline for details.
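The bands above map onto a small helper for triaging detections in client code. This is a sketch under stated assumptions: the function name `confidence_band` and its `high` and `medium` parameters are hypothetical, the defaults mirror the bands listed above, and a score of exactly 0.9 is treated as medium because the high band is written as strictly greater than 0.9.

```python
def confidence_band(confidence, high=0.9, medium=0.7):
    """Classify a confidence score as "high", "medium", or "low".

    `high` and `medium` are illustrative parameters standing in for
    per-filter thresholds; the defaults follow the documented bands.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > high:
        return "high"
    if confidence >= medium:
        return "medium"
    return "low"


print(confidence_band(0.95))  # high
print(confidence_band(0.9))   # medium
print(confidence_band(0.5))   # low
```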

Query Detection Logs

Admins can query stored detections and aggregate statistics: GET /admin/detections, GET /admin/detections/stats

Both endpoints require admin permissions.

Parameter	Type	Description
`detection_type`	string	Filter by detection category
`action`	string	Filter by action taken (`flag`, `mask`, `reject`)
`tenant_id`	string	Filter by tenant
`start_date`, `end_date`	string	ISO 8601 date range

Query Detections (admin JWT)

curl -X GET "https://api.cid222.ai/admin/detections?detection_type=EMAIL&action=mask" \
  -H "Authorization: Bearer YOUR_ADMIN_JWT"
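When filters are built dynamically, it is safer to let a library encode the query string than to concatenate it by hand. The sketch below builds the same URL as the cURL example using the standard library. The helper name `detections_query_url` is hypothetical, and it accepts only the parameters listed in the table above.

```python
from urllib.parse import urlencode

ADMIN_DETECTIONS_URL = "https://api.cid222.ai/admin/detections"
ALLOWED_FILTERS = {"detection_type", "action", "tenant_id", "start_date", "end_date"}


def detections_query_url(**filters):
    """Return the detections URL with the given filters as a query string.

    Filters set to None are dropped; names outside the documented
    parameter table raise ValueError.
    """
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {name: value for name, value in filters.items() if value is not None}
    query = urlencode(params)
    return f"{ADMIN_DETECTIONS_URL}?{query}" if query else ADMIN_DETECTIONS_URL


print(detections_query_url(detection_type="EMAIL", action="mask"))
# https://api.cid222.ai/admin/detections?detection_type=EMAIL&action=mask
```

The resulting URL is sent with the same `Authorization: Bearer YOUR_ADMIN_JWT` header as in the cURL example.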