Document Analysis
Parse PDF, DOCX, TXT, and CSV documents, detect PII per page, and download a redacted copy.
Authentication
Document endpoints require a user JWT (Authorization: Bearer YOUR_JWT).
Analyze a Document
POST /document-analysis/analyze
Extracts text (with optional OCR for scanned pages), scans it for PII and safety issues, and optionally produces a redacted copy.
| Parameter | Type | Required | Description |
|---|---|---|---|
| documentBase64 | string | Yes | Base64-encoded document |
| documentType | string | Yes | pdf, docx, txt, or csv |
| processingMode | string | No | text (default) or visual |
| redact | boolean | No | Produce a redacted copy (default true) |
| parseOptions | object | No | OCR controls: ocrScannedPages, ocrLanguage, ocrEngine |
cURL
curl -X POST https://api.cid222.ai/document-analysis/analyze \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{"documentBase64": "JVBERi0xLjQK...", "documentType": "pdf", "redact": true}'
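The same request can be assembled in Python. The following is a minimal sketch using only the standard library, not an official client: `build_analyze_request` and `SUPPORTED_TYPES` are hypothetical names introduced here, and `parseOptions` is passed through unchanged because the value types of its fields are not specified on this page.

```python
import base64
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.cid222.ai"  # base URL taken from the cURL example

# Document types listed in the parameter table.
SUPPORTED_TYPES = {"pdf", "docx", "txt", "csv"}


def build_analyze_request(
    document: bytes,
    document_type: str,
    jwt: str,
    redact: bool = True,
    parse_options: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /document-analysis/analyze request.

    Hypothetical helper for illustration; it is not part of any CID222 SDK.
    """
    if document_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported documentType: {document_type!r}")

    payload = {
        # The API expects the raw file bytes as a Base64 string.
        "documentBase64": base64.b64encode(document).decode("ascii"),
        "documentType": document_type,
        "redact": redact,
    }
    if parse_options is not None:
        # Assumption: forwarded as-is; field value types are not documented here.
        payload["parseOptions"] = parse_options

    return urllib.request.Request(
        f"{API_BASE}/document-analysis/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body. Keeping construction separate from sending makes the payload easy to inspect or log (minus the token) before any network call is made.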
Response
Returns an analysis ID, the overall action, the individual detections, per-page detection counts, the extracted and redacted text, and the processing time in milliseconds.
{
  "analysisId": "a1b2c3d4-...",
  "action": "mask",
  "detections": [
    { "type": "EMAIL", "value": "[EMAIL]", "action": "mask", "confidence": 0.99 }
  ],
  "extractedText": "...",
  "redactedText": "...",
  "pageResults": [
    { "page": 1, "detections": 3 }
  ],
  "processingTimeMs": 2540
}
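As an illustration of consuming this response, the sketch below tallies detections by type and collects the per-page counts. `summarize` is a hypothetical helper, and `SAMPLE` is the example body from above with its elided values left as they are; the field names are assumed to match that example exactly.

```python
import json
from collections import Counter

# Example response body from this page; elided values ("...") are kept as-is.
SAMPLE = """
{
  "analysisId": "a1b2c3d4-...",
  "action": "mask",
  "detections": [
    { "type": "EMAIL", "value": "[EMAIL]", "action": "mask", "confidence": 0.99 }
  ],
  "extractedText": "...",
  "redactedText": "...",
  "pageResults": [
    { "page": 1, "detections": 3 }
  ],
  "processingTimeMs": 2540
}
"""


def summarize(analysis: dict) -> dict:
    """Return the overall action, detection counts by type, and counts per page."""
    by_type = Counter(d["type"] for d in analysis.get("detections", []))
    per_page = {p["page"]: p["detections"] for p in analysis.get("pageResults", [])}
    return {
        "action": analysis.get("action"),
        "byType": dict(by_type),
        "perPage": per_page,
    }


summary = summarize(json.loads(SAMPLE))
```

Using `.get` with empty defaults keeps the helper tolerant of a response that has no detections at all, which is an assumption about how a clean document might be reported.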
Supported Types
GET /document-analysis/supported-types
Returns the document types CID222 can parse and whether each supports OCR.
Download Redacted Document
GET /document-analysis/:id/download
Streams the redacted document produced by a previous analysis, using its analysisId.
curl -X GET https://api.cid222.ai/document-analysis/a1b2c3d4-.../download \
  -H "Authorization: Bearer YOUR_JWT" \
  -o redacted.pdf
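A Python equivalent of the download step might look like the following sketch. It uses only the standard library; `build_download_request` and `save_redacted` are hypothetical helper names, and the `opener` parameter exists only so the function can be exercised without a network call.

```python
import shutil
import urllib.parse
import urllib.request

API_BASE = "https://api.cid222.ai"  # base URL taken from the cURL example


def build_download_request(analysis_id: str, jwt: str) -> urllib.request.Request:
    """Build a GET /document-analysis/:id/download request for an analysisId."""
    # Percent-encode the ID so it cannot alter the URL path.
    safe_id = urllib.parse.quote(analysis_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/document-analysis/{safe_id}/download",
        headers={"Authorization": f"Bearer {jwt}"},
        method="GET",
    )


def save_redacted(request: urllib.request.Request, out_path: str,
                  opener=urllib.request.urlopen) -> None:
    """Stream the response body to out_path without loading it all into memory."""
    with opener(request) as response, open(out_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

A call such as `save_redacted(build_download_request(analysis_id, jwt), "redacted.pdf")` would mirror the cURL command, where `analysis_id` is the `analysisId` returned by a previous analysis.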