PII Detection

CID222 automatically detects and protects Personally Identifiable Information (PII) and Protected Health Information (PHI) across 15+ entity types with configurable handling policies.

Supported Entity Types

Personal Identifiers

Entity	Description	Detection Method
`PERSON`	Full names, including titles	NER
`EMAIL`	Email addresses	Regex + validation
`PHONE`	Phone numbers (international formats)	Regex + libphonenumber
`SSN`	Social Security Numbers (US)	Regex + format validation
`DATE_OF_BIRTH`	Birth dates	NER + context
`PASSPORT`	Passport numbers	Regex by country
`TC_KIMLIK`	Turkish national ID	Regex + checksum
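
As an illustration of what the "Regex + checksum" method for `TC_KIMLIK` can involve, here is a minimal sketch of the publicly documented check-digit rule for Turkish national ID numbers. The function name `is_valid_tc_kimlik` is hypothetical, and this is not necessarily how CID222 implements the check internally.

```python
def is_valid_tc_kimlik(value: str) -> bool:
    """Check the two trailing check digits of an 11-digit Turkish national ID."""
    # Shape check: exactly 11 ASCII digits, and the first digit is never 0.
    if len(value) != 11 or not (value.isascii() and value.isdigit()):
        return False
    if value[0] == "0":
        return False

    d = [int(c) for c in value]
    odd_sum = sum(d[0:9:2])   # digits at positions 1, 3, 5, 7, 9
    even_sum = sum(d[1:8:2])  # digits at positions 2, 4, 6, 8

    # The 10th digit is derived from the first nine.
    if (odd_sum * 7 - even_sum) % 10 != d[9]:
        return False
    # The 11th digit is the sum of the first ten, modulo 10.
    return sum(d[:10]) % 10 == d[10]
```

A checksum pass of this kind lets a detector discard most random 11-digit numbers that a bare regex would otherwise flag.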

Financial Data

Entity	Description	Detection Method
`CREDIT_CARD`	Credit/debit card numbers	Regex + Luhn
`IBAN`	International bank accounts	Regex + checksum
`BANK_ACCOUNT`	Domestic bank accounts	Regex + context
`TAX_ID`	Tax identification numbers	Regex by country
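
The checksums named for `CREDIT_CARD` and `IBAN` are standard algorithms, and a short sketch may make them concrete. The helper names `luhn_ok` and `iban_ok` are hypothetical, and the length limits are assumptions chosen for illustration; CID222's own validators may differ, for example by enforcing country-specific IBAN lengths.

```python
import re


def luhn_ok(number: str) -> bool:
    """Luhn check for a card number; spaces and hyphens are ignored."""
    s = re.sub(r"[ -]", "", number)
    if not (s.isascii() and s.isdigit()) or not 12 <= len(s) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(s)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """Mod-97 check for an IBAN; spaces are ignored."""
    s = iban.replace(" ", "").upper()
    if not (s.isascii() and s.isalnum()) or not 15 <= len(s) <= 34:
        return False
    # Move country code and check digits to the end, then map A..Z to 10..35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

Both checks are cheap to run after a regex match, which is why structured financial identifiers can be detected with far fewer false positives than free-text entities.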

Location Data

Entity	Description	Detection Method
`LOCATION`	Addresses, cities, regions	NER + patterns
`IP_ADDRESS`	IPv4 and IPv6 addresses	Regex
`MAC_ADDRESS` / `URL`	Network identifiers	Regex
`LICENSE_PLATE`	Vehicle plates (Turkish format)	Regex

Health Data (PHI)

Entity	Description	Detection Method
`MRN`	Medical record numbers	Regex + context

Secrets are detected too

Beyond classic PII, a DLP layer detects leaked secrets — API keys, passwords, private keys, connection strings, OAuth tokens. See Content Detection for the full catalog.

Detection Accuracy

CID222 combines deterministic regex validators with ONNX NER models. Accuracy depends on entity type and language; structured identifiers are near-deterministic, while free-text entities rely on the NER model.

Structured identifiers: emails, cards, IBANs, and IPs are matched by regex and then validated, using a checksum where the format defines one (Luhn for cards, mod-97 for IBANs), so they are highly precise.
Contextual entities: names, locations, and organizations come from the multilingual ONNX NER model and depend on surrounding context.
Confidence scoring: every detection carries a confidence score between 0 and 1; thresholds are configurable per filter to balance precision and recall.
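
To show how per-filter thresholds trade precision against recall, here is a minimal sketch. The `Detection` class, the `THRESHOLDS` mapping, and the specific values are assumptions made for illustration, not CID222's actual data model or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    confidence: float  # score between 0 and 1


# Hypothetical per-entity thresholds; higher values favor precision.
THRESHOLDS = {"PERSON": 0.85, "LOCATION": 0.70}
DEFAULT_THRESHOLD = 0.50


def apply_thresholds(detections, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Keep only detections at or above the threshold for their entity type."""
    return [
        d for d in detections
        if d.confidence >= thresholds.get(d.entity_type, default)
    ]


detections = [
    Detection("PERSON", "John Smith", 0.91),
    Detection("PERSON", "Main", 0.62),         # dropped: below 0.85
    Detection("EMAIL", "a@company.com", 0.99),  # kept: default threshold applies
]
kept = apply_thresholds(detections)
```

Raising the threshold for NER-based types such as `PERSON` removes weak matches at the cost of missing some real names; lowering it does the reverse.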

Masking Formats

Detected PII is replaced with type-specific placeholders:

Masking Examples

Original: "Contact john.smith@company.com or call 555-123-4567"
Masked:   "Contact [EMAIL] or call [PHONE]"

Original: "My SSN is 123-45-6789 and credit card is 4111-1111-1111-1111"
Masked:   "My SSN is [SSN] and credit card is [CREDIT_CARD]"

Original: "Send to John Smith at 123 Main St, New York"
Masked:   "Send to [PERSON] at [LOCATION]"
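
The replacement step itself can be sketched as follows, assuming the detector yields non-overlapping `(start, end, entity_type)` spans. The function names and span format are hypothetical; the document does not describe CID222's internal representation.

```python
def mask(text: str, spans) -> str:
    """Replace each (start, end, entity_type) span with a [TYPE] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity_type in sorted(spans, reverse=True):
        out = out[:start] + f"[{entity_type}]" + out[end:]
    return out


def span_of(text: str, needle: str, entity_type: str):
    """Helper for the example only: locate a known substring."""
    start = text.index(needle)
    return (start, start + len(needle), entity_type)


original = "Contact john.smith@company.com or call 555-123-4567"
spans = [
    span_of(original, "john.smith@company.com", "EMAIL"),
    span_of(original, "555-123-4567", "PHONE"),
]
masked = mask(original, spans)
# masked == "Contact [EMAIL] or call [PHONE]"
```

Processing spans from the end of the string backwards avoids recomputing offsets, since placeholders usually differ in length from the text they replace.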

Custom Patterns

Add custom regex patterns for organization-specific identifiers:

Custom Pattern Example

{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "mask",
  "confidence": 0.95
}

Test custom patterns in the dashboard's Filter Testing tool before deploying to production.
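
Before trying a pattern in the dashboard, it can help to check it locally. The sketch below applies the example configuration with Python's `re` module; the `find_custom` function and the output fields are hypothetical, and CID222's regex engine may not behave identically to Python's.

```python
import json
import re

CONFIG = json.loads("""
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "mask",
  "confidence": 0.95
}
""")


def find_custom(text: str, config: dict) -> list:
    """Return one record per match of the configured pattern."""
    pattern = re.compile(config["pattern"])
    return [
        {
            "entity_type": config["entity_type"],
            "text": m.group(),
            "start": m.start(),
            "end": m.end(),
            "confidence": config["confidence"],
        }
        for m in pattern.finditer(text)
    ]


# The unanchored pattern also matches inside longer tokens.
loose = find_custom("Ticket TEMP-123456 opened", CONFIG)

# A stricter variant with word boundaries (assumes the engine supports \b).
STRICT = {**CONFIG, "pattern": r"\bEMP-[0-9]{6}\b"}
strict = find_custom("Ticket TEMP-123456 opened", STRICT)
```

Here `loose` contains a match for `EMP-123456` inside `TEMP-123456`, while `strict` is empty, which is the kind of false positive worth catching before a pattern reaches production.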

Context Awareness

CID222's detection is context-aware to reduce false positives:

Number sequences: short numbers such as "Call 911" do not trigger phone detection.
Code contexts: IP addresses that appear in code comments may be allowed.
Example data: placeholder values such as "example@example.com" can be exempted.
Business context: company names are distinguished from person names.
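
Two of the rules above can be sketched as simple post-filters on raw detections. This is only an illustration under stated assumptions: the `keep` function, the 7-digit minimum, and the list of exempt domains are hypothetical and do not describe CID222's actual logic.

```python
import re

# Domains reserved for documentation and examples (RFC 2606).
RESERVED_EXAMPLE_DOMAINS = {"example.com", "example.org", "example.net"}

# Assumed minimum digit count for a plausible phone number.
MIN_PHONE_DIGITS = 7


def keep(entity_type: str, text: str) -> bool:
    """Return False for detections that context rules should suppress."""
    if entity_type == "PHONE":
        digit_count = len(re.sub(r"\D", "", text))
        return digit_count >= MIN_PHONE_DIGITS
    if entity_type == "EMAIL":
        domain = text.rsplit("@", 1)[-1].lower()
        return domain not in RESERVED_EXAMPLE_DOMAINS
    return True
```

With these rules, `keep("PHONE", "911")` and `keep("EMAIL", "example@example.com")` both return `False`, while an ordinary phone number or company email address is kept.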