PII Detection

CID222 automatically detects and protects Personally Identifiable Information (PII) and Protected Health Information (PHI) across 15+ entity types with configurable handling policies.

Supported Entity Types

Personal Identifiers

Entity | Description | Detection Method
PERSON | Full names, including titles | NER
EMAIL | Email addresses | Regex + validation
PHONE | Phone numbers (international formats) | Regex + libphonenumber
SSN | Social Security Numbers (US) | Regex + checksum
DATE_OF_BIRTH | Birth dates | NER + context
PASSPORT | Passport numbers | Regex by country
TC_KIMLIK | Turkish national ID | Regex + checksum
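
As an illustration of the "Regex + checksum" method, the following is a minimal sketch of a TC_KIMLIK check, assuming the publicly documented check-digit rule for Turkish national IDs. The regex and the function names are hypothetical and are not taken from CID222.

```python
import re

# Candidate pattern: 11 digits, first digit non-zero (illustrative, not CID222's regex).
TC_KIMLIK_RE = re.compile(r"\b[1-9]\d{10}\b")


def is_valid_tc_kimlik(candidate: str) -> bool:
    """Verify the two trailing check digits of an 11-digit Turkish national ID."""
    if not TC_KIMLIK_RE.fullmatch(candidate):
        return False
    d = [int(ch) for ch in candidate]
    odd_sum = sum(d[0:9:2])   # digits 1, 3, 5, 7, 9
    even_sum = sum(d[1:8:2])  # digits 2, 4, 6, 8
    tenth = (odd_sum * 7 - even_sum) % 10
    eleventh = sum(d[:10]) % 10
    return d[9] == tenth and d[10] == eleventh


def find_tc_kimlik(text: str) -> list[str]:
    """Return regex candidates that also pass the checksum."""
    return [m.group() for m in TC_KIMLIK_RE.finditer(text)
            if is_valid_tc_kimlik(m.group())]
```

The regex alone would accept any 11-digit number; the checksum step is what discards most random digit strings.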

Financial Data

Entity | Description | Detection Method
CREDIT_CARD | Credit/debit card numbers | Regex + Luhn
IBAN | International bank accounts | Regex + checksum
BANK_ACCOUNT | Domestic bank accounts | Regex + context
TAX_ID | Tax identification numbers | Regex by country
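
The checksum steps named in the table can be sketched as follows: the Luhn algorithm for card numbers and the ISO 13616 mod-97 check for IBANs. This is a minimal sketch under the assumption that candidates have already been found by a regex; the function names are illustrative and do not reflect CID222's internals.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    stripped = re.sub(r"[\s-]", "", number)
    if not re.fullmatch(r"[0-9]{13,19}", stripped):
        return False
    total = 0
    for i, ch in enumerate(reversed(stripped)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map A-Z to 10-35."""
    compact = re.sub(r"\s", "", iban).upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

For example, `luhn_valid("4111-1111-1111-1111")` accepts the test card number used in the masking examples, while changing its last digit makes the check fail.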

Location Data

Entity | Description | Detection Method
LOCATION | Addresses, cities, regions | NER + patterns
IP_ADDRESS | IPv4 and IPv6 addresses | Regex
MAC_ADDRESS / URL | Network identifiers | Regex
LICENSE_PLATE | Vehicle plates (Turkish format) | Regex
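
A regex-based IP_ADDRESS detector might look like the sketch below. It covers IPv4 only, and it adds a validation pass with Python's standard `ipaddress` module so that out-of-range octets are rejected. The pattern and the function name are assumptions for illustration, not CID222's actual rules.

```python
import ipaddress
import re

# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)")


def find_ipv4(text: str) -> list[str]:
    """Return substrings that look like IPv4 addresses and parse as valid ones."""
    hits = []
    for match in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # e.g. an octet above 255
        hits.append(match.group())
    return hits
```

The lookarounds keep version strings such as "1.2.3.4.5" from being reported as addresses.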

Health Data (PHI)

Entity | Description | Detection Method
MRN | Medical record numbers | Regex + context

Secrets are detected too

Beyond classic PII, a DLP layer detects leaked secrets such as API keys, passwords, private keys, connection strings, and OAuth tokens. See Content Detection for the full catalog.

Detection Accuracy

CID222 combines deterministic regex validators with ONNX NER models. Accuracy depends on entity type and language; structured identifiers are near-deterministic, while free-text entities rely on the NER model.

  • Structured identifiers: Emails, cards (Luhn), IBANs and IPs are validated by regex + checksums and are highly precise.
  • Contextual entities: Names, locations and organizations come from the multilingual ONNX NER model and depend on surrounding context.
  • Confidence scoring: Every detection carries a 0–1 confidence; thresholds are configurable per filter to balance precision and recall.
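
The effect of confidence thresholds can be sketched as a simple filter over detections. The `Detection` class, the threshold values, and the per-entity keying are all hypothetical; CID222 configures thresholds per filter, and its actual data model is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    confidence: float  # 0.0 to 1.0


# Hypothetical thresholds: stricter for NER-based entities, looser for validated ones.
THRESHOLDS = {"PERSON": 0.85, "LOCATION": 0.70, "EMAIL": 0.50}
DEFAULT_THRESHOLD = 0.60


def apply_thresholds(detections, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Keep only detections whose confidence meets the threshold for their type."""
    return [d for d in detections
            if d.confidence >= thresholds.get(d.entity_type, default)]
```

Raising a threshold trades recall for precision: fewer false positives are masked, at the cost of missing some low-confidence true matches.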

Masking Formats

Detected PII is replaced with type-specific placeholders:

Masking Examples
Original: "Contact john.smith@company.com or call 555-123-4567"
Masked: "Contact [EMAIL] or call [PHONE]"

Original: "My SSN is 123-45-6789 and credit card is 4111-1111-1111-1111"
Masked: "My SSN is [SSN] and credit card is [CREDIT_CARD]"

Original: "Send to John Smith at 123 Main St, New York"
Masked: "Send to [PERSON_NAME] at [LOCATION]"
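
The replacement step shown in these examples can be sketched as span-based substitution. The two regexes below are deliberately simple stand-ins for illustration; they are not CID222's detectors, and the function names are hypothetical.

```python
import re

# Illustrative detectors only; the real patterns are not shown in this document.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_spans(text: str) -> list[tuple[int, int, str]]:
    """Collect (start, end, entity_type) spans, sorted by position."""
    spans = []
    for entity_type, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return sorted(spans)


def mask(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each span with a [TYPE] placeholder, skipping overlapping spans."""
    pieces, cursor = [], 0
    for start, end, entity_type in spans:
        if start < cursor:
            continue  # overlaps a span that was already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{entity_type}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Applied to the first example above, `mask(text, find_spans(text))` yields `Contact [EMAIL] or call [PHONE]`.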

Custom Patterns

Add custom regex patterns for organization-specific identifiers:

Custom Pattern Example
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "mask",
  "confidence": 0.95
}

Test custom patterns in the dashboard's Filter Testing tool before deploying to production.
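
To get a feel for how such a definition behaves before trying it in the dashboard, the pattern can be exercised locally. The sketch below assumes the definition is applied as an ordinary regex search and that `"action": "mask"` means replacing each match with `[ENTITY_TYPE]`; the function `apply_custom_pattern` is hypothetical and not part of any CID222 SDK.

```python
import json
import re

CUSTOM_PATTERN = json.loads("""
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "mask",
  "confidence": 0.95
}
""")


def apply_custom_pattern(text: str, config: dict) -> str:
    """Mask every match of config['pattern']; other actions are left untouched."""
    if config["action"] != "mask":
        return text  # only the "mask" action appears in this document
    placeholder = f"[{config['entity_type']}]"
    # A function replacement avoids backslash processing in the placeholder.
    return re.sub(config["pattern"], lambda _match: placeholder, text)
```

Note that `EMP-[0-9]{6}` also matches the first six digits of a longer run such as `EMP-1234567`; wrapping the pattern in `\b` word boundaries is one way to tighten it if that matters for your identifiers.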

Context Awareness

CID222's detection is context-aware to reduce false positives:

  • Number sequences: "Call 911" does not trigger phone detection.
  • Code contexts: IP addresses in code comments may be allowed.
  • Example data: placeholder values such as "example@example.com" can be exempted.
  • Business context: company names are distinguished from person names.
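
Two of the rules above can be approximated with a post-detection suppression step. This is only a sketch of the idea: the allowlist contents, the short-code set, and the minimum digit count are assumptions chosen for illustration, and CID222's actual context logic is not described in this document.

```python
import re

# Hypothetical exemptions for well-known placeholder values.
EXAMPLE_ALLOWLIST = {"example@example.com"}
# Hypothetical emergency/short codes that should not count as phone numbers.
SHORT_CODES = {"911", "112"}
MIN_PHONE_DIGITS = 7  # assumption: shorter digit runs are not treated as phones


def should_suppress(entity_type: str, value: str) -> bool:
    """Return True when a raw detection should be dropped as a false positive."""
    if entity_type == "EMAIL" and value.lower() in EXAMPLE_ALLOWLIST:
        return True
    if entity_type == "PHONE":
        digits = re.sub(r"\D", "", value)
        if digits in SHORT_CODES or len(digits) < MIN_PHONE_DIGITS:
            return True
    return False
```

A detector pipeline would call `should_suppress` on each candidate before masking, so "Call 911" passes through unchanged while a full phone number is still masked.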