CID222 Documentation

Content Detection

CID222 exposes its detection engine directly through the Guardrails API, so you can scan any text for PII, secrets, toxicity, and prompt injection without sending it to an LLM. The same engine runs automatically on every chat request.

Detect Content

POST /api/v1/guardrails/detect

Runs the full detection pipeline — PII, secrets, toxicity, and jailbreak — over a block of text and returns the entities found, the decided action, and a masked copy of the text.

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| text | string | Yes | Text to analyze (1–50,000 characters) |
| check_type | string | No | prompt (default) or response |
| filter | string | No | Filter nickname to apply instead of the tenant default |
| hap_version | number | No | Toxicity model version, 1 or 2 (default 2) |

Example Request

cURL
curl -X POST https://api.cid222.ai/api/v1/guardrails/detect \
  -H "Authorization: Bearer cid_key_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My name is John Smith and my email is john@example.com",
    "check_type": "prompt"
  }'
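
The same request can be assembled in Python. The sketch below uses only the standard library and checks the documented parameter constraints locally before anything is sent. The function name build_detect_request and its filter_name argument are illustrative, not part of any CID222 SDK, and the length check assumes the service counts characters the way Python's len() does.

```python
import json
import urllib.request

DETECT_URL = "https://api.cid222.ai/api/v1/guardrails/detect"


def build_detect_request(api_key, text, check_type="prompt",
                         filter_name=None, hap_version=None):
    """Validate parameters locally and return an unsent urllib Request.

    Hypothetical helper: the limits mirror the Request Body table.
    """
    if not 1 <= len(text) <= 50_000:
        raise ValueError("text must be 1-50,000 characters")
    if check_type not in ("prompt", "response"):
        raise ValueError("check_type must be 'prompt' or 'response'")
    if hap_version is not None and hap_version not in (1, 2):
        raise ValueError("hap_version must be 1 or 2")

    payload = {"text": text, "check_type": check_type}
    # Optional fields are only sent when explicitly set,
    # so the server-side defaults apply otherwise.
    if filter_name is not None:
        payload["filter"] = filter_name
    if hap_version is not None:
        payload["hap_version"] = hap_version

    return urllib.request.Request(
        DETECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_detect_request(
    "cid_key_your_api_key",
    "My name is John Smith and my email is john@example.com",
)
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

Building the request separately from sending it keeps the validation testable without network access.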

Response

The response reports the overall action (action), the number of detections (detectionCount), the list of detected entities (detectedEntities), and a masked copy of the input (maskedText).

{
  "action": "mask",
  "detectionCount": 2,
  "maskedText": "My name is [PERSON_NAME] and my email is [EMAIL]",
  "detectedEntities": [
    {
      "type": "PERSON_NAME",
      "value": "[PERSON_NAME]",
      "action": "mask",
      "confidence": 0.95,
      "source": "pii_detector",
      "start": 11,
      "end": 21
    },
    {
      "type": "EMAIL",
      "value": "[EMAIL]",
      "action": "mask",
      "confidence": 0.99,
      "source": "pii_detector",
      "start": 38,
      "end": 54
    }
  ],
  "processingTimeMs": 142,
  "normalization": { "applied": false }
}

Raw values are never returned

Detected entities expose only a redacted marker in value (e.g. [EMAIL]). The original sensitive text is never echoed back in the response.

Detected Entity Fields

| Field | Type | Description |
| --- | --- | --- |
| type | string | Detection category (e.g. EMAIL, PERSON_NAME, jailbreak, violence) |
| value | string | Redacted marker (e.g. [EMAIL]), never the raw value |
| action | string | allow, flag, mask, or reject |
| confidence | number | Model confidence, 0–1 |
| source | string | pii_detector, hap_detector, jailbreak_detector, risk_guardian, code_rule_detector |
| start, end | number | Character offsets in the input (optional) |
| filterName | string | Security filter that matched (optional) |
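
As a minimal sketch, a client could map each entry of detectedEntities onto a small typed record, treating start, end, and filterName as optional as the table describes. The DetectedEntity class and parse_entity function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectedEntity:
    type: str
    value: str          # redacted marker such as "[EMAIL]", never the raw text
    action: str         # allow, flag, mask, or reject
    confidence: float   # 0-1
    source: str
    start: Optional[int] = None
    end: Optional[int] = None
    filter_name: Optional[str] = None


def parse_entity(raw: dict) -> DetectedEntity:
    """Convert one detectedEntities item into a DetectedEntity."""
    return DetectedEntity(
        type=raw["type"],
        value=raw["value"],
        action=raw["action"],
        confidence=raw["confidence"],
        source=raw["source"],
        # Optional fields may be absent, so fall back to None.
        start=raw.get("start"),
        end=raw.get("end"),
        filter_name=raw.get("filterName"),
    )


sample = {
    "type": "EMAIL",
    "value": "[EMAIL]",
    "action": "mask",
    "confidence": 0.99,
    "source": "pii_detector",
}
entity = parse_entity(sample)
```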

Detection Actions

| Action | Description |
| --- | --- |
| allow | No action: the content is clean or below threshold. |
| flag | Log the detection but allow the content through unchanged (for review and audit). |
| mask | Redact the entity from the text (e.g. replace with [EMAIL]) and continue. |
| reject | Block the content. In /detect the response action is reject; in chat the stream emits input_rejected or output_rejected. |
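
One way a caller might act on the top-level action of a /detect response is sketched below. The function text_to_forward and the ContentRejected exception are hypothetical; the behavior for each action follows the descriptions in the table above.

```python
class ContentRejected(Exception):
    """Raised when the detection response says the content must be blocked."""


def text_to_forward(original: str, response: dict) -> str:
    """Return the text that may be passed downstream, based on response['action']."""
    action = response["action"]
    if action in ("allow", "flag"):
        # flag is recorded for review, but the content passes through unchanged.
        return original
    if action == "mask":
        return response["maskedText"]
    if action == "reject":
        raise ContentRejected("content rejected by guardrails")
    raise ValueError(f"unknown action: {action!r}")


masked = text_to_forward(
    "my email is john@example.com",
    {"action": "mask", "maskedText": "my email is [EMAIL]"},
)
```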

Action priority

When multiple entities match, the strongest action wins: reject > mask > flag > allow.
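
The priority rule can be sketched as a ranking lookup. This is an illustration of the ordering stated above, not CID222's internal code; the names ACTION_PRIORITY and strongest_action are made up for the example.

```python
# Higher number = stronger action, following reject > mask > flag > allow.
ACTION_PRIORITY = {"allow": 0, "flag": 1, "mask": 2, "reject": 3}


def strongest_action(actions):
    """Return the strongest action in an iterable, or 'allow' if it is empty."""
    return max(actions, key=ACTION_PRIORITY.__getitem__, default="allow")


overall = strongest_action(["flag", "mask", "allow"])
```

Here overall is "mask", because mask outranks both flag and allow.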

PII Entity Types

The ONNX NER model plus regex validators detect 21+ entity types across contact, identity, financial, location, and network data (Turkish formats included):

| Type | Description | Examples |
| --- | --- | --- |
| EMAIL | Email addresses | user@domain.com |
| PHONE | Phone numbers (intl. + Turkish) | +1-555-123-4567 |
| PERSON_NAME | Person names | John Smith |
| SSN | US Social Security Number | 123-45-6789 |
| TC_KIMLIK | Turkish national ID | 12345678901 |
| PASSPORT | Passport numbers | U1234567 |
| CREDIT_CARD | Card numbers (Luhn-checked) | 4111 1111 1111 1111 |
| IBAN / BANK_ACCOUNT | Bank accounts | DE89 3704 0044 0532 0130 00 |
| LOCATION | Addresses, cities, regions | 123 Main St, New York |
| IP_ADDRESS | IPv4 / IPv6 | 192.168.1.1, 2001:db8::1 |
| MAC_ADDRESS / URL | Network identifiers | 00:1A:2B:3C:4D:5E |
| LICENSE_PLATE | Vehicle plates | 34 ABC 123 |
| MRN | Medical record number | MRN-12345678 |
| CRYPTO_ADDRESS | Crypto wallet addresses | bc1q… |
| ORGANIZATION / DATE_TIME | Orgs and dates | Acme Corp, 01/15/1990 |

Custom regex patterns can be added per filter for organization-specific identifiers.
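
The table notes that CREDIT_CARD matches are Luhn-checked. For readers unfamiliar with that checksum, here is a minimal sketch of the standard Luhn algorithm. It shows the general technique only; CID222's own validator may apply additional rules (length, issuer prefixes) that are not documented here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces, dashes, and any other non-digit characters are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


is_card = luhn_valid("4111 1111 1111 1111")
```

The example number from the table passes the check, while changing its last digit to 2 makes it fail.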

Secret & Credential Detection (DLP)

A dedicated DLP layer flags leaked secrets and credentials in prompts and responses:

| Type | Description |
| --- | --- |
| API_KEY | Provider API keys (OpenAI, AWS, generic) |
| PASSWORD | Inline passwords / secret assignments |
| PRIVATE_KEY | RSA / OpenSSH private keys |
| CONNECTION_STRING | Postgres, MySQL, MongoDB, Redis URIs |
| OAUTH_TOKEN | JWT / Bearer tokens |
| WEBHOOK_SECRET | Stripe / generic webhook secrets |
| CVE_ID | CVE vulnerability identifiers |

Safety Categories

Beyond PII and secrets, the safety models classify harmful content (13 toxicity labels) and adversarial prompts (jailbreak, injection):

| Label | Source | Description |
| --- | --- | --- |
| violence | HAP | Unlawful violence toward people or animals |
| hate | HAP | Hate speech targeting protected attributes |
| self_harm | HAP | Suicide, self-harm, eating disorders |
| sexual_content | HAP | Adult / explicit sexual content |
| sexual_crime | HAP | Non-consensual sexual acts, trafficking |
| child_exploitation | HAP | CSAM |
| crime | HAP | Non-violent crimes (financial, property, drug) |
| cyber_crimes | HAP | Hacking, malware, DDoS |
| weapons | HAP | WMD, illegal arms |
| dangerous_advice | HAP | Dangerous specialized advice |
| privacy | HAP | Tracking, doxxing, identity theft |
| intellectual_property | HAP | Piracy, plagiarism, counterfeiting |
| defamation | HAP | Reputation-damaging falsehoods |
| jailbreak | Attack Guard | DAN-style role-play, system-prompt extraction |
| injection | Attack Guard | Prompt-injection attempts |

Confidence Scores

Each detection includes a confidence score between 0 and 1:

  • > 0.9: High confidence, action applied automatically
  • 0.7–0.9: Medium confidence, may require review
  • < 0.7: Low confidence, typically flagged only

Confidence thresholds are configurable per filter. See Content Safety Pipeline for details.
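
For client-side triage, the bands above could be expressed as a small helper. This sketch hard-codes the 0.7 and 0.9 boundaries listed here and exposes them as arguments, since the actual thresholds are configurable per filter; the function name confidence_band is an assumption for illustration.

```python
def confidence_band(confidence: float, medium: float = 0.7, high: float = 0.9) -> str:
    """Classify a 0-1 confidence score as 'high', 'medium', or 'low'."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence > high:
        return "high"    # > 0.9
    if confidence >= medium:
        return "medium"  # 0.7 to 0.9 inclusive
    return "low"         # < 0.7


band = confidence_band(0.95)
```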

Query Detection Logs

Admins can query stored detections and aggregate statistics:

GET /admin/detections
GET /admin/detections/stats

These endpoints require admin permissions.

| Parameter | Type | Description |
| --- | --- | --- |
| detection_type | string | Filter by detection category |
| action | string | Filter by action taken (flag, mask, reject) |
| tenant_id | string | Filter by tenant |
| start_date, end_date | string | ISO 8601 date range |

Query Detections (admin JWT)
curl -X GET "https://api.cid222.ai/admin/detections?detection_type=EMAIL&action=mask" \
  -H "Authorization: Bearer YOUR_ADMIN_JWT"
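
When several filters are combined, building the query string programmatically avoids escaping mistakes. The sketch below uses the standard library's urlencode with the parameter names from the table; build_detections_url is a hypothetical helper, and the URL it returns still has to be requested with an admin JWT in the Authorization header.

```python
from urllib.parse import urlencode

ADMIN_DETECTIONS_URL = "https://api.cid222.ai/admin/detections"


def build_detections_url(detection_type=None, action=None, tenant_id=None,
                         start_date=None, end_date=None):
    """Return the detections URL with only the supplied filters in the query."""
    params = {
        "detection_type": detection_type,
        "action": action,
        "tenant_id": tenant_id,
        "start_date": start_date,  # ISO 8601 string
        "end_date": end_date,      # ISO 8601 string
    }
    # Drop unset filters, then percent-encode the remainder.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{ADMIN_DETECTIONS_URL}?{query}" if query else ADMIN_DETECTIONS_URL


url = build_detections_url(detection_type="EMAIL", action="mask")
```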