PII Detection
CID222 automatically detects and protects Personally Identifiable Information (PII) and Protected Health Information (PHI) across 15+ entity types with configurable handling policies.
Supported Entity Types
Personal Identifiers
| Entity | Description | Detection Method |
|---|---|---|
| PERSON | Full names, including titles | NER |
| EMAIL | Email addresses | Regex + validation |
| PHONE | Phone numbers (international formats) | Regex + libphonenumber |
| SSN | Social Security Numbers (US) | Regex + structural validation |
| DATE_OF_BIRTH | Birth dates | NER + context |
| PASSPORT | Passport numbers | Regex by country |
| TC_KIMLIK | Turkish national ID (TC Kimlik No) | Regex + checksum |
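The validation step behind these rows can be illustrated with a minimal Python sketch. The TC Kimlik check follows the publicly documented rule for Turkish national IDs: 11 digits, no leading zero, and two trailing check digits computed from the first nine and first ten digits. US SSNs carry no check digit, so the SSN function only applies the structural rules for numbers that are never issued (area 000, 666 or 900-999; group 00; serial 0000). Both function names are hypothetical; this is not CID222's validator.

```python
import re

# US SSN layout AAA-GG-SSSS, dashes optional.
SSN_RE = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")


def is_valid_tc_kimlik(value: str) -> bool:
    """Check a Turkish national ID (TC Kimlik No) against its two check digits."""
    if len(value) != 11 or not (value.isascii() and value.isdigit()) or value[0] == "0":
        return False
    d = [int(c) for c in value]
    odd_sum = d[0] + d[2] + d[4] + d[6] + d[8]   # 1st, 3rd, 5th, 7th and 9th digits
    even_sum = d[1] + d[3] + d[5] + d[7]         # 2nd, 4th, 6th and 8th digits
    if (odd_sum * 7 - even_sum) % 10 != d[9]:    # 10th digit
        return False
    return sum(d[:10]) % 10 == d[10]             # 11th digit


def is_plausible_ssn(value: str) -> bool:
    """Structural SSN check only: SSNs have no check digit."""
    m = SSN_RE.fullmatch(value)
    if not m:
        return False
    area, group, serial = (int(g) for g in m.groups())
    # Area 000, 666 and 900-999 are never issued; group 00 and serial 0000 are invalid.
    return 0 < area < 900 and area != 666 and group != 0 and serial != 0


print(is_valid_tc_kimlik("10000000146"))  # True
print(is_plausible_ssn("123-45-6789"))    # True
print(is_plausible_ssn("666-12-3456"))    # False
```

A structurally plausible SSN is not necessarily an issued one, which is why a regex match alone still benefits from surrounding context.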
Financial Data
| Entity | Description | Detection Method |
|---|---|---|
| CREDIT_CARD | Credit/debit card numbers | Regex + Luhn |
| IBAN | International bank accounts | Regex + checksum |
| BANK_ACCOUNT | Domestic bank accounts | Regex + context |
| TAX_ID | Tax identification numbers | Regex by country |
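For the financial rows, the checksums involved are the Luhn (mod 10) check named for card numbers and the standard ISO 13616 mod-97 check that every IBAN carries. The sketch below shows both in plain Python; `luhn_valid` and `iban_valid` are hypothetical names, and the length limits (13 to 19 digits for cards, 15 to 34 characters for IBANs) are general assumptions rather than CID222 settings.

```python
import re


def luhn_valid(card: str) -> bool:
    """Luhn (mod 10) check for a card number; spaces and dashes are ignored."""
    number = re.sub(r"[ -]", "", card)
    if not re.fullmatch(r"[0-9]{13,19}", number):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        n = int(ch)
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97 check for an IBAN; spaces and letter case are ignored."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", compact):
        return False
    # Move the country code and check digits to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111-1111-1111-1111"))          # True
print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True
```

A checksum rejects most random digit strings that happen to match the regex, which is what makes these identifiers near-deterministic.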
Location Data
| Entity | Description | Detection Method |
|---|---|---|
| LOCATION | Addresses, cities, regions | NER + patterns |
| IP_ADDRESS | IPv4 and IPv6 addresses | Regex |
| MAC_ADDRESS / URL | MAC addresses and URLs | Regex |
| LICENSE_PLATE | Vehicle plates (Turkish format) | Regex |
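One common way to detect IP addresses is a loose regex for candidates followed by strict parsing. This sketch uses Python's standard `ipaddress` module for the strict step; that pairing, the candidate pattern and `find_ip_addresses` are assumptions for illustration, not CID222's detector, which the table lists as regex-only.

```python
import ipaddress
import re
from typing import List

# Candidate pass: runs of hex digits, dots and colons (a deliberately loose, hypothetical pattern).
CANDIDATE_RE = re.compile(r"[0-9A-Fa-f:.]{2,}")


def find_ip_addresses(text: str) -> List[str]:
    """Find IPv4/IPv6 addresses by pairing a loose regex with strict parsing."""
    found = []
    for m in CANDIDATE_RE.finditer(text):
        token = m.group().rstrip(".")  # drop a sentence-ending period
        if "." not in token and ":" not in token:
            continue  # hex-only words such as "cafe" cannot be addresses
        try:
            ipaddress.ip_address(token)  # raises ValueError for non-addresses
        except ValueError:
            continue  # e.g. "999.1.1.1" or a clock time like "12:30"
        found.append(token)
    return found


print(find_ip_addresses("ping 10.0.0.1, then ::1 or 999.1.1.1 at 12:30."))
# ['10.0.0.1', '::1']
```

Keeping the regex loose and letting the parser decide avoids maintaining a full IPv6 grammar in a single pattern.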
Health Data (PHI)
| Entity | Description | Detection Method |
|---|---|---|
| MRN | Medical record numbers | Regex + context |
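The "Regex + context" method can be sketched as a pattern match that only counts when a keyword appears shortly before it. The MRN shape (6 to 10 digits), the keyword list and the 30-character window below are all hypothetical; real MRN formats vary by institution, and CID222's actual context rules are not described here.

```python
import re
from typing import List

# Hypothetical MRN shape: a standalone run of 6-10 digits.
MRN_RE = re.compile(r"\b[0-9]{6,10}\b")
# Hypothetical context keywords that must appear shortly before the number.
CONTEXT_RE = re.compile(r"\b(?:mrn|medical record|patient id)\b", re.IGNORECASE)
WINDOW = 30  # assumed number of characters of left context to inspect


def find_mrns(text: str) -> List[str]:
    """Return digit runs that are preceded by an MRN-style keyword."""
    hits = []
    for m in MRN_RE.finditer(text):
        left_context = text[max(0, m.start() - WINDOW):m.start()]
        if CONTEXT_RE.search(left_context):
            hits.append(m.group())
    return hits


print(find_mrns("MRN: 00482913"))            # ['00482913']
print(find_mrns("Invoice number 55512345"))  # []
```

The same digits are flagged in one sentence and ignored in the other; only the surrounding words differ.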
Secrets are detected too
Beyond classic PII, a data loss prevention (DLP) layer detects leaked secrets such as API keys, passwords, private keys, connection strings and OAuth tokens. See Content Detection for the full catalog.
Detection Accuracy
CID222 combines deterministic regex validators with ONNX NER models. Accuracy depends on entity type and language; structured identifiers are near-deterministic, while free-text entities rely on the NER model.
- Structured identifiers — Emails, cards, IBANs and IPs are matched by regex and, where the format defines a checksum (Luhn for cards, mod-97 for IBANs), verified against it, so they are highly precise.
- Contextual entities — Names, locations and organizations come from the multilingual ONNX NER model and depend on surrounding context.
- Confidence scoring — Every detection carries a 0–1 confidence; thresholds are configurable per filter to balance precision and recall.
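Per-filter thresholds can be pictured as a simple filter over scored detections. In this sketch, `Detection`, `THRESHOLDS` and `DEFAULT_THRESHOLD` are hypothetical names with made-up values; CID222's own threshold settings are configured per filter and their format is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Detection:
    entity_type: str
    text: str
    confidence: float  # 0-1 score attached to every detection


# Hypothetical per-filter thresholds; unlisted types fall back to DEFAULT_THRESHOLD.
THRESHOLDS: Dict[str, float] = {"PERSON": 0.85, "CREDIT_CARD": 0.5}
DEFAULT_THRESHOLD = 0.7


def apply_thresholds(
    detections: List[Detection],
    thresholds: Optional[Dict[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> List[Detection]:
    """Keep only detections at or above their entity type's threshold."""
    limits = THRESHOLDS if thresholds is None else thresholds
    return [d for d in detections if d.confidence >= limits.get(d.entity_type, default)]


hits = [
    Detection("PERSON", "Jordan", 0.80),         # below the PERSON threshold of 0.85
    Detection("PERSON", "John Smith", 0.92),
    Detection("LOCATION", "Paris", 0.75),        # uses the 0.7 default
]
print([d.text for d in apply_thresholds(hits)])  # ['John Smith', 'Paris']
```

Raising a threshold drops more low-confidence hits (higher precision, lower recall); lowering it does the opposite. A stricter threshold suits NER-based types like PERSON, while checksum-validated types can tolerate a lower one.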
Masking Formats
Detected PII is replaced with type-specific placeholders:
Masking Examples
```text
Original: "Contact john.smith@company.com or call 555-123-4567"
Masked:   "Contact [EMAIL] or call [PHONE]"

Original: "My SSN is 123-45-6789 and credit card is 4111-1111-1111-1111"
Masked:   "My SSN is [SSN] and credit card is [CREDIT_CARD]"

Original: "Send to John Smith at 123 Main St, New York"
Masked:   "Send to [PERSON_NAME] at [LOCATION]"
```
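Placeholder substitution itself is simple once spans are known. This minimal sketch assumes the detectors have already produced non-overlapping `(start, end, entity_type)` spans; the `mask` function is hypothetical and uses the entity type as the placeholder text, which matches most of the examples above but may not match CID222's exact placeholder names.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def mask(text: str, spans: List[Span]) -> str:
    """Replace each span with a [ENTITY_TYPE] placeholder; spans must not overlap."""
    # Work from the rightmost span so earlier offsets are not shifted by replacements.
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + f"[{entity_type}]" + text[end:]
    return text


text = "Contact john.smith@company.com or call 555-123-4567"
spans = []
for value, entity_type in [("john.smith@company.com", "EMAIL"), ("555-123-4567", "PHONE")]:
    start = text.index(value)
    spans.append((start, start + len(value), entity_type))

print(mask(text, spans))  # Contact [EMAIL] or call [PHONE]
```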
Custom Patterns
Add custom regex patterns for organization-specific identifiers:
Custom Pattern Example
```json
{
  "name": "Employee ID",
  "entity_type": "EMPLOYEE_ID",
  "pattern": "EMP-[0-9]{6}",
  "action": "mask",
  "confidence": 0.95
}
```
Test custom patterns in the dashboard's Filter Testing tool before deploying to production.
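A quick local sanity check of a pattern can catch mistakes early. The sketch below loads the Employee ID definition and runs it with Python's `re` module; `run_custom_pattern` is a hypothetical helper, and Python's regex dialect may differ from the engine CID222 uses, so the Filter Testing tool remains the authoritative check. The sample shows one pitfall: without word boundaries, `EMP-[0-9]{6}` also matches the first six digits of a longer ID.

```python
import json
import re
from typing import Dict, List

# The Employee ID definition, as a JSON string.
PATTERN_JSON = """
{"name": "Employee ID", "entity_type": "EMPLOYEE_ID",
 "pattern": "EMP-[0-9]{6}", "action": "mask", "confidence": 0.95}
"""


def run_custom_pattern(definition: Dict, text: str) -> List[Dict]:
    """Apply one pattern definition with Python's re module and report every hit."""
    regex = re.compile(definition["pattern"])
    return [
        {
            "entity_type": definition["entity_type"],
            "text": m.group(),
            "start": m.start(),
            "end": m.end(),
            "confidence": definition["confidence"],
        }
        for m in regex.finditer(text)
    ]


definition = json.loads(PATTERN_JSON)
sample = "Badge EMP-004217 was reissued as EMP-1234567"
for hit in run_custom_pattern(definition, sample):
    print(hit["text"])  # prints EMP-004217, then EMP-123456 (a partial match)
```

Assuming the detection engine supports `\b`, anchoring the pattern with word boundaries avoids the partial hit; inside the JSON definition the backslashes must be escaped, as in `"\\bEMP-[0-9]{6}\\b"`.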
Context Awareness
CID222's detection is context-aware to reduce false positives:
- Number sequences — "Call 911" won't trigger phone detection
- Code contexts — IP addresses inside code comments can be allowed through instead of masked
- Example data — "example@example.com" can be exempted
- Business context — Company names are distinguished from person names, so an organization is not flagged as PERSON
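Two of these exemptions can be approximated with simple rules. In this sketch the digit cutoff is an assumed value and both functions are hypothetical; CID222's actual context logic is not described here. The domain list uses the names RFC 2606 reserves for documentation (example.com, example.net, example.org and the .example top-level domain).

```python
import re

# Domains reserved for documentation by RFC 2606.
RESERVED_EXAMPLE_DOMAINS = {"example.com", "example.net", "example.org"}
# Assumed cutoff: digit runs shorter than this (e.g. "911") are not phone numbers.
MIN_PHONE_DIGITS = 7


def is_exempt_email(email: str) -> bool:
    """Exempt addresses on reserved documentation domains or the .example TLD."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in RESERVED_EXAMPLE_DOMAINS or domain.endswith(".example")


def looks_like_phone(candidate: str) -> bool:
    """Require enough digits before treating a number sequence as a phone number."""
    digits = re.sub(r"\D", "", candidate)
    return len(digits) >= MIN_PHONE_DIGITS


print(looks_like_phone("911"))                    # False
print(looks_like_phone("555-123-4567"))           # True
print(is_exempt_email("example@example.com"))     # True
print(is_exempt_email("john.smith@company.com"))  # False
```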