Document Analysis

Parse PDF, DOCX, TXT, and CSV documents, detect PII per page, and download a redacted copy.

Authentication

All document endpoints require a user JWT, sent in the `Authorization: Bearer YOUR_JWT` header.

Analyze a Document

POST /document-analysis/analyze

Extracts text (with optional OCR for scanned pages), scans it for PII and safety issues, and optionally produces a redacted copy.

| Parameter | Type | Required | Description |
|---|---|---|---|
| `documentBase64` | string | Yes | Base64-encoded document contents |
| `documentType` | string | Yes | `pdf`, `docx`, `txt`, or `csv` |
| `processingMode` | string | No | `text` (default) or `visual` |
| `redact` | boolean | No | Produce a redacted copy (default `true`) |
| `parseOptions` | object | No | OCR controls: `ocrScannedPages`, `ocrLanguage`, `ocrEngine` |

cURL

```bash
curl -X POST https://api.cid222.ai/document-analysis/analyze \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "documentBase64": "JVBERi0xLjQK...",
    "documentType": "pdf",
    "redact": true
  }'
```
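When calling the endpoint from code rather than cURL, the request body can be assembled from a file's raw bytes. The sketch below uses only the Python standard library; the helper name `build_analyze_payload` and the extension-to-`documentType` mapping are illustrative assumptions, not part of the CID222 API.

```python
import base64
import json
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to the documentType values in the parameter table.
DOCUMENT_TYPES = {".pdf": "pdf", ".docx": "docx", ".txt": "txt", ".csv": "csv"}


def build_analyze_payload(
    data: bytes,
    filename: str,
    redact: bool = True,
    processing_mode: str = "text",
    parse_options: Optional[dict] = None,
) -> str:
    """Return a JSON body for POST /document-analysis/analyze (hypothetical helper)."""
    doc_type = DOCUMENT_TYPES.get(Path(filename).suffix.lower())
    if doc_type is None:
        raise ValueError(f"unsupported document type for {filename!r}")
    body = {
        # The endpoint takes the raw file bytes as a Base64 string.
        "documentBase64": base64.b64encode(data).decode("ascii"),
        "documentType": doc_type,
        "processingMode": processing_mode,
        "redact": redact,
    }
    if parse_options is not None:
        # OCR controls are passed through unchanged; their value types are not documented here.
        body["parseOptions"] = parse_options
    return json.dumps(body)
```

For example, `build_analyze_payload(Path("report.pdf").read_bytes(), "report.pdf")` produces a body comparable to the cURL sample. The sample's `JVBERi0xLjQK` prefix is simply the Base64 encoding of a `%PDF-1.4` header line.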

Response

Returns an analysis ID, the overall action, the list of detections, per-page detection counts, the extracted and redacted text, and the processing time in milliseconds.

```json
{
  "analysisId": "a1b2c3d4-...",
  "action": "mask",
  "detections": [
    { "type": "EMAIL", "value": "[EMAIL]", "action": "mask", "confidence": 0.99 }
  ],
  "extractedText": "...",
  "redactedText": "...",
  "pageResults": [
    { "page": 1, "detections": 3 }
  ],
  "processingTimeMs": 2540
}
```
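A client will often look at `action`, `detections`, and `pageResults` before deciding whether to fetch the redacted file. The sketch below condenses the sample response into a small summary; the function name `summarize_analysis` and the summary keys are hypothetical, and it assumes `pageResults[].detections` is an integer count, as in the sample.

```python
import json
from collections import Counter

# The sample response from this section; "..." values are elided in the docs and kept as-is.
SAMPLE_RESPONSE = """
{
  "analysisId": "a1b2c3d4-...",
  "action": "mask",
  "detections": [
    { "type": "EMAIL", "value": "[EMAIL]", "action": "mask", "confidence": 0.99 }
  ],
  "extractedText": "...",
  "redactedText": "...",
  "pageResults": [
    { "page": 1, "detections": 3 }
  ],
  "processingTimeMs": 2540
}
"""


def summarize_analysis(response: dict) -> dict:
    """Condense an analyze response into a small summary (hypothetical helper)."""
    page_results = response.get("pageResults", [])
    return {
        "analysisId": response["analysisId"],
        "action": response["action"],
        # Number of returned detections per PII type.
        "byType": dict(Counter(d["type"] for d in response.get("detections", []))),
        # Pages whose per-page detection count is non-zero.
        "pagesWithDetections": [p["page"] for p in page_results if p.get("detections", 0) > 0],
        # Sum of the per-page counts reported in pageResults.
        "pageDetectionTotal": sum(p.get("detections", 0) for p in page_results),
    }


summary = summarize_analysis(json.loads(SAMPLE_RESPONSE))
print(summary["byType"], summary["pagesWithDetections"], summary["pageDetectionTotal"])
# prints: {'EMAIL': 1} [1] 3
```

The summary reports the `detections` list and the `pageResults` counts separately rather than merging them, since the sample does not state how the two relate.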

Supported Types

GET /document-analysis/supported-types

Returns the document types CID222 can parse and whether each supports OCR.
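The supported-types endpoint takes no body, only the JWT. A minimal standard-library sketch of calling it follows; the function names are hypothetical, and it assumes the endpoint returns JSON, whose exact schema is not documented in this section.

```python
import json
import urllib.request

BASE_URL = "https://api.cid222.ai"  # host taken from the cURL examples in this document


def supported_types_request(token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for the supported-types endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/document-analysis/supported-types",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def fetch_supported_types(token: str):
    """Send the request and decode the body, assuming it is JSON (schema not documented here)."""
    with urllib.request.urlopen(supported_types_request(token)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the header and URL logic checkable without network access.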

Download Redacted Document

GET /document-analysis/:id/download

Streams the redacted document produced by a previous analysis run with `redact` enabled (the default). Pass the `analysisId` from the analyze response as `:id`.

```bash
curl -X GET https://api.cid222.ai/document-analysis/a1b2c3d4-.../download \
  -H "Authorization: Bearer YOUR_JWT" \
  -o redacted.pdf
```
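Because the endpoint streams the file, a client can copy it to disk in chunks instead of loading it into memory, much like `curl -o` does. The sketch below is a standard-library equivalent of the cURL call; the function names and the 64 KiB chunk size are illustrative assumptions.

```python
import urllib.parse
import urllib.request
from typing import BinaryIO

BASE_URL = "https://api.cid222.ai"  # host taken from the cURL examples in this document


def download_request(analysis_id: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET for /document-analysis/:id/download (hypothetical helper)."""
    # Percent-encode the ID so it is safe to place in a URL path segment.
    quoted_id = urllib.parse.quote(analysis_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/document-analysis/{quoted_id}/download",
        headers={"Authorization": f"Bearer {token}"},
    )


def copy_stream(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes written."""
    total = 0
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        total += len(chunk)
    return total


def download_redacted(analysis_id: str, token: str, out_path: str) -> int:
    """Stream the redacted document to out_path; returns the byte count."""
    with urllib.request.urlopen(download_request(analysis_id, token)) as resp, \
            open(out_path, "wb") as out:
        return copy_stream(resp, out)
```

For example, `download_redacted(analysis_id, "YOUR_JWT", "redacted.pdf")` mirrors the cURL command, with `analysis_id` taken from the analyze response.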