Document Analysis

Parse PDF, DOCX, TXT, and CSV documents, detect PII per page, and download a redacted copy.

Authentication

Document endpoints require a user JWT (Authorization: Bearer YOUR_JWT).

Analyze a Document

POST /document-analysis/analyze

Extracts text (with optional OCR for scanned pages), scans it for PII and safety issues, and optionally produces a redacted copy.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| documentBase64 | string | Yes | Base64-encoded document |
| documentType | string | Yes | pdf, docx, txt, or csv |
| processingMode | string | No | text (default) or visual |
| redact | boolean | No | Produce a redacted copy (default true) |
| parseOptions | object | No | OCR controls: ocrScannedPages, ocrLanguage, ocrEngine |

cURL
curl -X POST https://api.cid222.ai/document-analysis/analyze \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "documentBase64": "JVBERi0xLjQK...",
    "documentType": "pdf",
    "redact": true
  }'
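
For callers working from a script rather than the shell, the same request can be assembled as in the Python sketch below. It uses only the standard library and builds the request without sending it. The helper name build_analyze_request and the constants API_BASE, ALLOWED_TYPES, and ALLOWED_MODES are illustrative assumptions, not part of the CID222 API; the field names and allowed values come from the parameter table above.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.cid222.ai"  # host taken from the cURL example
ALLOWED_TYPES = {"pdf", "docx", "txt", "csv"}  # documentType values from the table
ALLOWED_MODES = {"text", "visual"}  # processingMode values from the table


def build_analyze_request(data: bytes, document_type: str, jwt: str,
                          processing_mode: str = "text",
                          redact: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to /document-analysis/analyze.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if document_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported documentType: {document_type!r}")
    if processing_mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported processingMode: {processing_mode!r}")

    body = {
        # The API expects the raw file bytes as a Base64 string.
        "documentBase64": base64.b64encode(data).decode("ascii"),
        "documentType": document_type,
        "processingMode": processing_mode,
        "redact": redact,
    }
    return urllib.request.Request(
        f"{API_BASE}/document-analysis/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Validating documentType and processingMode locally is a convenience in this sketch, not a documented requirement.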

Response

Returns an analysis ID, the overall action, per-page detections, and the extracted and redacted text.

{
  "analysisId": "a1b2c3d4-...",
  "action": "mask",
  "detections": [
    { "type": "EMAIL", "value": "[EMAIL]", "action": "mask", "confidence": 0.99 }
  ],
  "extractedText": "...",
  "redactedText": "...",
  "pageResults": [
    { "page": 1, "detections": 3 }
  ],
  "processingTimeMs": 2540
}
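
As a rough illustration of consuming this response, the Python sketch below tallies detections by type and by page. It assumes the response has exactly the shape of the sample above; the function name summarize_analysis and the keys of its return value are hypothetical.

```python
import json
from collections import Counter


def summarize_analysis(response_text: str) -> dict:
    """Condense an analyze response into counts by type and by page.

    Hypothetical helper; assumes the field names shown in the sample response.
    """
    result = json.loads(response_text)

    # Count how many detections of each PII type were reported.
    by_type = Counter(d["type"] for d in result.get("detections", []))

    # Map each page number to its reported detection count.
    per_page = {p["page"]: p["detections"] for p in result.get("pageResults", [])}

    return {
        "analysisId": result["analysisId"],
        "action": result["action"],
        "detectionsByType": dict(by_type),
        "detectionsPerPage": per_page,
    }
```

Keep the analysisId from the summary: it identifies the analysis when downloading the redacted document later.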

Supported Types

GET /document-analysis/supported-types

Returns the document types CID222 can parse and whether each supports OCR.

Download Redacted Document

GET /document-analysis/:id/download

Streams the redacted document produced by a previous analysis, using its analysisId.

curl -X GET https://api.cid222.ai/document-analysis/a1b2c3d4-.../download \
  -H "Authorization: Bearer YOUR_JWT" \
  -o redacted.pdf
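
The download step might look like the following Python sketch, again using only the standard library. The names build_download_request, save_stream, and API_BASE are illustrative assumptions. Because the endpoint streams its response, the sketch copies the body to disk in chunks instead of reading it into memory at once.

```python
import shutil
import urllib.parse
import urllib.request

API_BASE = "https://api.cid222.ai"  # host taken from the cURL example


def build_download_request(analysis_id: str, jwt: str) -> urllib.request.Request:
    """Build (but do not send) a GET to /document-analysis/:id/download.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Percent-encode the ID so it cannot alter the URL path.
    safe_id = urllib.parse.quote(analysis_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/document-analysis/{safe_id}/download",
        headers={"Authorization": f"Bearer {jwt}"},
        method="GET",
    )


def save_stream(response, path: str, chunk_size: int = 64 * 1024) -> None:
    """Copy a file-like response body to disk in fixed-size chunks."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

In use, the object returned by urllib.request.urlopen(build_download_request(...)) would be passed to save_stream along with an output path such as redacted.pdf.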