Documents API
Document translation at scale. Upload documents, translate them with layout preservation, and download the results. Supports PDF, DOCX, XLSX, PPTX, and 50+ more formats.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/documents | Upload document |
| POST | /v1/documents/{id}/translate | Translate document |
| GET | /v1/documents/{id} | Get document status |
| GET | /v1/documents/{id}/download | Download result |
| DELETE | /v1/documents/{id} | Delete document |
POST /documents
Upload a document for translation.
Request
curl -X POST https://api.pauhu.ai/v1/documents \
  -H "Authorization: Bearer pk_..." \
  -F "file=@contract.pdf" \
  -F "target=fi" \
  -F "domain=12 Law"
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Document file (max 100 MB) |
| target | string | Yes | Target language code |
| source | string | No | Source language code (auto-detected if omitted) |
| domain | string | No | EuroVoc domain, e.g. "12 Law" |
| preserve_layout | boolean | No | Preserve formatting (default: true) |
| translate_headers | boolean | No | Translate headers and footers |
| glossary_id | string | No | ID of the glossary to apply |
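As a rough illustration of how the parameters above fit together, the following sketch assembles the text form fields for an upload and checks the file size before sending. The helper names `build_upload_fields` and `check_upload_size` are hypothetical, and two details are assumptions not stated here: that booleans are sent as the strings `true`/`false`, and that 100 MB means 100 × 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumption: the documented 100 MB limit is counted in binary megabytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def build_upload_fields(
    target: str,
    source: Optional[str] = None,
    domain: Optional[str] = None,
    preserve_layout: bool = True,
    translate_headers: Optional[bool] = None,
    glossary_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return the text form fields for an upload, omitting unset optional values."""
    if not target:
        raise ValueError("target is required")
    # Assumption: boolean form fields are serialized as "true" / "false".
    fields = {"target": target, "preserve_layout": str(preserve_layout).lower()}
    optional = {"source": source, "domain": domain, "glossary_id": glossary_id}
    fields.update({name: value for name, value in optional.items() if value is not None})
    if translate_headers is not None:
        fields["translate_headers"] = str(translate_headers).lower()
    return fields


def check_upload_size(path: str) -> int:
    """Return the file size in bytes, raising ValueError if it exceeds the limit."""
    size = Path(path).stat().st_size
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes, over the 100 MB limit")
    return size
```

The returned dictionary can be passed as the form data of a multipart request, with the file itself attached under the `file` field.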
Response
{
  "data": {
    "id": "doc_abc123",
    "filename": "contract.pdf",
    "status": "processing",
    "source_language": "en",
    "target_language": "fi",
    "page_count": 15,
    "word_count": 5000,
    "created_at": "2025-01-15T10:30:00Z",
    "estimated_completion": "2025-01-15T10:31:00Z"
  }
}
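The `created_at` and `estimated_completion` timestamps in the response can be used to decide how long to wait before the first status check. A minimal sketch using only the standard library follows; the helper names are hypothetical, and it assumes both fields are always ISO 8601 UTC timestamps in the form shown above.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2025-01-15T10:30:00Z."""
    # Older versions of fromisoformat() reject a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def estimated_seconds(document: dict) -> float:
    """Seconds between created_at and estimated_completion, never negative."""
    created = parse_timestamp(document["created_at"])
    done = parse_timestamp(document["estimated_completion"])
    return max((done - created).total_seconds(), 0.0)


response = {
    "data": {
        "created_at": "2025-01-15T10:30:00Z",
        "estimated_completion": "2025-01-15T10:31:00Z",
    }
}
print(estimated_seconds(response["data"]))  # 60.0
```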
Python SDK
from pauhu import Pauhu
client = Pauhu()
# Upload and translate
result = client.translate_document(
    file_path="contract.pdf",
    target="fi",
    domain="12 Law",
)
# Wait for completion
result.wait()
# Save translated document
result.save("contract_fi.pdf")
POST /documents/{id}/translate
Start translation for an uploaded document.
Request
curl -X POST https://api.pauhu.ai/v1/documents/doc_abc123/translate \
  -H "Authorization: Bearer pk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "target": "fi",
    "options": {
      "preserve_layout": true,
      "quality": "quality"
    }
  }'
Translation Options
| Option | Type | Description |
|---|---|---|
| preserve_layout | boolean | Keep original formatting |
| preserve_images | boolean | Keep images in place |
| translate_alt_text | boolean | Translate image alt text |
| translate_comments | boolean | Translate document comments |
| quality | string | One of fast, balanced, quality |
| page_range | string | Pages to translate, e.g., "1-10" or "1,3,5" |
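To make the `quality` and `page_range` options concrete, here is a small sketch that validates them on the client side and builds the JSON body for the translate request. Both helpers are hypothetical. Whether the API accepts mixed specifications such as "1-3,5" is an assumption; only the two forms in the table are documented.

```python
from typing import List, Optional

QUALITY_LEVELS = ("fast", "balanced", "quality")


def parse_page_range(spec: str) -> List[int]:
    """Expand a page_range string such as "1-10" or "1,3,5" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in page range: {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


def build_translate_body(
    target: str,
    quality: Optional[str] = None,
    page_range: Optional[str] = None,
) -> dict:
    """Build the JSON body for the translate request, validating options first."""
    options = {}
    if quality is not None:
        if quality not in QUALITY_LEVELS:
            raise ValueError(f"quality must be one of {QUALITY_LEVELS}")
        options["quality"] = quality
    if page_range is not None:
        parse_page_range(page_range)  # validate only; the string form is what gets sent
        options["page_range"] = page_range
    return {"target": target, "options": options}
```

Validating locally catches a malformed range before an upload slot or request is spent on it.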
GET /documents/{id}
Check document status.
Request
curl https://api.pauhu.ai/v1/documents/doc_abc123 \
  -H "Authorization: Bearer pk_..."
Response
{
  "data": {
    "id": "doc_abc123",
    "filename": "contract.pdf",
    "status": "completed",
    "source_language": "en",
    "target_language": "fi",
    "page_count": 15,
    "word_count": 5000,
    "progress": 100,
    "created_at": "2025-01-15T10:30:00Z",
    "completed_at": "2025-01-15T10:31:30Z",
    "download_url": "https://api.pauhu.ai/v1/documents/doc_abc123/download",
    "expires_at": "2025-01-22T10:31:30Z"
  }
}
Status Values
| Status | Description |
|---|---|
| pending | Uploaded, waiting for processing |
| processing | Translation in progress |
| completed | Ready for download |
| failed | Translation failed |
| expired | Download link expired |
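The status values above suggest a simple polling loop: keep checking while the document is `pending` or `processing`, and stop on `completed`, `failed`, or `expired`. The sketch below is one way to write it, not the SDK's implementation. `fetch_status` is a hypothetical callable that returns the `data` object from the status endpoint, and the default interval and timeout are arbitrary.

```python
import time
from typing import Callable

# Statuses after which further polling cannot change the outcome.
TERMINAL_STATUSES = {"completed", "failed", "expired"}


def wait_for_document(
    fetch_status: Callable[[], dict],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the document reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        document = fetch_status()
        if document["status"] in TERMINAL_STATUSES:
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document still {document['status']} after {timeout}s")
        sleep(poll_interval)


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"status": "pending"},
    {"status": "processing", "progress": 40},
    {"status": "completed", "progress": 100},
])
final = wait_for_document(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])  # completed
```

Passing the fetch function in keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.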
GET /documents/{id}/download
Download the translated document.
Request
curl https://api.pauhu.ai/v1/documents/doc_abc123/download \
  -H "Authorization: Bearer pk_..." \
  -o contract_fi.pdf
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| format | string | Output format (optional; defaults to the input format) |
| include_source | boolean | Include the source text in the output |
Response
Binary file download with headers:
Content-Type: application/pdf
Content-Disposition: attachment; filename="contract_fi.pdf"
Content-Length: 1048576
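When saving the download from code, the suggested filename can be read from the `Content-Disposition` header rather than hard-coded. A minimal sketch using Python's standard `email.message` parser follows; the helper name and the fallback filename are hypothetical.

```python
from email.message import Message


def filename_from_headers(headers: dict, default: str = "download.bin") -> str:
    """Extract the filename from a Content-Disposition header, if present."""
    message = Message()
    disposition = headers.get("Content-Disposition")
    if disposition:
        message["Content-Disposition"] = disposition
    # get_filename() returns the fallback when no filename parameter is found.
    return message.get_filename(default)


headers = {
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="contract_fi.pdf"',
    "Content-Length": "1048576",
}
print(filename_from_headers(headers))  # contract_fi.pdf
```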
DELETE /documents/{id}
Delete a document and its translations.
Request
curl -X DELETE https://api.pauhu.ai/v1/documents/doc_abc123 \
  -H "Authorization: Bearer pk_..."
Response
Supported Formats
Input Formats
| Category | Formats |
|---|---|
| Office | DOCX, DOC, XLSX, XLS, PPTX, PPT |
| PDF | PDF, PDF/A, PDF/UA |
| OpenDocument | ODT, ODS, ODP |
| Text | TXT, RTF, HTML, XML, Markdown |
| Publishing | IDML (InDesign), XLS |
| Subtitles | SRT, VTT, ASS, SSA |
| Data | JSON, YAML, CSV |
Output Formats
By default, the output format matches the input format. To override it, pass the format query parameter to the download endpoint.
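As an illustration of the override, the sketch below builds a download URL with the `format` and `include_source` query parameters. The helper is hypothetical, the value `docx` is only an assumed example since accepted values are not listed here, and sending booleans as `true`/`false` is likewise an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.pauhu.ai/v1"


def download_url(
    document_id: str,
    output_format: Optional[str] = None,
    include_source: Optional[bool] = None,
) -> str:
    """Return the download URL, adding query parameters only when they are set."""
    params = {}
    if output_format is not None:
        params["format"] = output_format
    if include_source is not None:
        # Assumption: booleans are passed as "true" / "false" in the query string.
        params["include_source"] = str(include_source).lower()
    url = f"{BASE_URL}/documents/{document_id}/download"
    return f"{url}?{urlencode(params)}" if params else url


print(download_url("doc_abc123", output_format="docx"))
# https://api.pauhu.ai/v1/documents/doc_abc123/download?format=docx
```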
Webhooks
Get notified when translation completes:
curl -X POST https://api.pauhu.ai/v1/documents \
  -H "Authorization: Bearer pk_..." \
  -F "file=@document.pdf" \
  -F "target=fi" \
  -F "webhook_url=https://yourapp.com/webhook" \
  -F "webhook_secret=your_secret"
Webhook payload:
{
  "event": "document.completed",
  "document_id": "doc_abc123",
  "status": "completed",
  "download_url": "https://api.pauhu.ai/v1/documents/doc_abc123/download",
  "timestamp": "2025-01-15T10:31:30Z"
}
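On the receiving side, a handler would typically decode the body and check that the expected fields are present before acting on it. The following sketch shows one way to do that; `parse_webhook` is a hypothetical helper, and it assumes `download_url` is only guaranteed for `document.completed` events. It does not verify the request against `webhook_secret`, because the signing scheme is not described here.

```python
import json

# Fields present in the example payload for every event (assumed to be common to all events).
COMMON_KEYS = {"event", "document_id", "status", "timestamp"}


def parse_webhook(body: bytes) -> dict:
    """Decode a webhook body and check for the fields shown in the example payload."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("webhook payload must be a JSON object")
    missing = COMMON_KEYS - payload.keys()
    if missing:
        raise ValueError(f"webhook payload missing keys: {sorted(missing)}")
    # Assumption: only completed documents carry a download link.
    if payload["event"] == "document.completed" and "download_url" not in payload:
        raise ValueError("completed event without download_url")
    return payload


raw = b"""{
  "event": "document.completed",
  "document_id": "doc_abc123",
  "status": "completed",
  "download_url": "https://api.pauhu.ai/v1/documents/doc_abc123/download",
  "timestamp": "2025-01-15T10:31:30Z"
}"""
event = parse_webhook(raw)
print(event["document_id"])  # doc_abc123
```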
Error Codes
| Code | HTTP Status | Description |
|---|---|---|
| unsupported_format | 400 | File format not supported |
| file_too_large | 400 | Exceeds 100 MB limit |
| document_not_found | 404 | Document ID not found |
| document_expired | 410 | Download link expired |
| processing_failed | 500 | Translation failed |
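A client can use the table above to decide which failures are worth retrying. The sketch below encodes the documented codes and applies a simple policy. The policy itself (retry only on 5xx) is an assumption, not something the API specifies, and `is_retryable` is a hypothetical helper.

```python
# Error codes and HTTP statuses as listed in the table above.
ERROR_STATUS = {
    "unsupported_format": 400,
    "file_too_large": 400,
    "document_not_found": 404,
    "document_expired": 410,
    "processing_failed": 500,
}


def is_retryable(code: str) -> bool:
    """Treat only server-side (5xx) errors as retryable; unknown codes are not retried."""
    status = ERROR_STATUS.get(code)
    return status is not None and status >= 500


print(is_retryable("processing_failed"))  # True
print(is_retryable("file_too_large"))     # False
```

Client errors such as `unsupported_format` will fail the same way on every attempt, so retrying them only adds load.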
Python SDK Examples
from pauhu import Pauhu
client = Pauhu()
# Simple document translation
result = client.translate_document(
    file_path="report.pdf",
    target="fi",
)
result.save("report_fi.pdf")
# Batch documents: start each translation without waiting for it to finish
for doc in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    result = client.translate_document(
        file_path=doc,
        target="fi",
        async_mode=True,  # Don't wait
    )
    print(f"Started: {result.id}")
# Check progress of all documents that are still processing
for doc in client.documents.list(status="processing"):
    print(f"{doc.filename}: {doc.progress}%")