
Documents API

Translate documents at scale: upload a document, translate it while preserving its layout, and download the result. Supported formats include PDF, DOCX, XLSX, PPTX, and 50+ more.


Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/documents | Upload a document |
| POST | /v1/documents/{id}/translate | Translate a document |
| GET | /v1/documents/{id} | Get document status |
| GET | /v1/documents/{id}/download | Download the result |
| DELETE | /v1/documents/{id} | Delete a document |

POST /documents

Upload a document for translation.

Request

curl -X POST https://api.pauhu.ai/v1/documents \
  -H "Authorization: Bearer pk_..." \
  -F "file=@contract.pdf" \
  -F "target=fi" \
  -F "domain=12 Law"

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Document file (max 100 MB) |
| target | string | Yes | Target language code |
| source | string | No | Source language code (auto-detected if omitted) |
| domain | string | No | EuroVoc domain, e.g. "12 Law" |
| preserve_layout | boolean | No | Preserve formatting (default: true) |
| translate_headers | boolean | No | Translate headers and footers |
| glossary_id | string | No | ID of a glossary to apply |
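The curl call above sends a multipart/form-data body. For clients without an HTTP helper library, the request can be assembled with the Python standard library as sketched below. This is an illustration, not the Pauhu SDK: the function `build_upload_request`, the encoding of booleans as "true"/"false", and the reading of "100 MB" as 100 × 1024 × 1024 bytes are all assumptions.

```python
import urllib.request
import uuid

API_BASE = "https://api.pauhu.ai/v1"  # base URL taken from the curl examples
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # "100 MB"; assumes binary megabytes


def build_upload_request(api_key, filename, file_bytes, fields,
                         max_bytes=MAX_UPLOAD_BYTES):
    """Build (but do not send) a multipart/form-data POST /v1/documents request."""
    if len(file_bytes) > max_bytes:
        raise ValueError(f"{filename} is {len(file_bytes)} bytes; limit is {max_bytes}")
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            # Assumption: booleans are sent as the strings "true"/"false".
            value = "true" if value else "false"
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The file itself is the last form part, sent as opaque bytes.
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{API_BASE}/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; the size check only catches oversized files early and does not replace the server-side limit.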

Response

{
  "data": {
    "id": "doc_abc123",
    "filename": "contract.pdf",
    "status": "processing",
    "source_language": "en",
    "target_language": "fi",
    "page_count": 15,
    "word_count": 5000,
    "created_at": "2025-01-15T10:30:00Z",
    "estimated_completion": "2025-01-15T10:31:00Z"
  }
}
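The upload response reports both `created_at` and `estimated_completion`, so a client can derive an initial polling delay from their difference. The sketch below assumes timestamps are always UTC with a trailing "Z", as in the example; `estimated_wait_seconds` is a hypothetical helper.

```python
import json
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as "2025-01-15T10:30:00Z"."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def estimated_wait_seconds(response_body):
    """Seconds between created_at and estimated_completion in an upload response."""
    data = json.loads(response_body)["data"]
    created = parse_timestamp(data["created_at"])
    eta = parse_timestamp(data["estimated_completion"])
    return (eta - created).total_seconds()
```

For the example response above this gives 60 seconds. The value is only an estimate, so it is better used as a first wait than as a deadline.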

Python SDK

from pauhu import Pauhu

client = Pauhu()

# Upload and translate
result = client.translate_document(
    file_path="contract.pdf",
    target="fi",
    domain="12 Law"
)

# Wait for completion
result.wait()

# Save translated document
result.save("contract_fi.pdf")

POST /documents/{id}/translate

Start translation for an uploaded document.

Request

curl -X POST https://api.pauhu.ai/v1/documents/doc_abc123/translate \
  -H "Authorization: Bearer pk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "target": "fi",
    "options": {
      "preserve_layout": true,
      "quality": "quality"
    }
  }'
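The same translate request can be built with the standard library. In this sketch, `build_translate_request` is a hypothetical helper; it rejects `quality` values other than the three listed in the options table before anything is sent, and puts every other keyword argument into the `options` object unchanged.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.pauhu.ai/v1"
QUALITY_LEVELS = {"fast", "balanced", "quality"}  # values from the options table


def build_translate_request(api_key, document_id, target, **options):
    """Build (but do not send) POST /v1/documents/{id}/translate with a JSON body."""
    quality = options.get("quality")
    if quality is not None and quality not in QUALITY_LEVELS:
        raise ValueError(f"quality must be one of {sorted(QUALITY_LEVELS)}")
    body = {"target": target}
    if options:
        body["options"] = options
    return urllib.request.Request(
        f"{API_BASE}/documents/{quote(document_id, safe='')}/translate",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Calling `build_translate_request("pk_...", "doc_abc123", "fi", preserve_layout=True, quality="quality")` produces the same JSON body as the curl example.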

Translation Options

| Option | Type | Description |
|---|---|---|
| preserve_layout | boolean | Keep the original formatting |
| preserve_images | boolean | Keep images in place |
| translate_alt_text | boolean | Translate image alt text |
| translate_comments | boolean | Translate document comments |
| quality | string | Quality level: fast, balanced, or quality |
| page_range | string | Pages to translate, e.g. "1-10" or "1,3,5" |
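A client can check a `page_range` value against the document's `page_count` before submitting it. The parser below is a hypothetical client-side helper, not part of the API. It accepts the two documented forms and assumes that combining them (for example "1-3,7") is also allowed, which the table does not state.

```python
def parse_page_range(spec, page_count=None):
    """Expand a page_range such as "1-10" or "1,3,5" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty item in page_range {spec!r}")
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part!r}")
        if page_count is not None and end > page_count:
            raise ValueError(f"page {end} exceeds page_count {page_count}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For the 15-page contract in the examples, `parse_page_range("1-3,7", page_count=15)` returns `[1, 2, 3, 7]`, while `"16"` is rejected before any request is made.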

GET /documents/{id}

Check document status.

Request

curl https://api.pauhu.ai/v1/documents/doc_abc123 \
  -H "Authorization: Bearer pk_..."

Response

{
  "data": {
    "id": "doc_abc123",
    "filename": "contract.pdf",
    "status": "completed",
    "source_language": "en",
    "target_language": "fi",
    "page_count": 15,
    "word_count": 5000,
    "progress": 100,
    "created_at": "2025-01-15T10:30:00Z",
    "completed_at": "2025-01-15T10:31:30Z",
    "download_url": "https://api.pauhu.ai/v1/documents/doc_abc123/download",
    "expires_at": "2025-01-22T10:31:30Z"
  }
}
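Because `download_url` is only useful until `expires_at`, a client may want to check both the status and the expiry before downloading. In the example above, `expires_at` is exactly seven days after `completed_at`. The sketch does not assume that this is a fixed policy; it reads the timestamp from each response. `download_url_if_valid` is a hypothetical helper, and `now` must be timezone-aware.

```python
import json
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as "2025-01-22T10:31:30Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def download_url_if_valid(response_body, now=None):
    """Return download_url if the document is completed and unexpired, else None."""
    data = json.loads(response_body)["data"]
    if data.get("status") != "completed":
        return None
    now = now or datetime.now(timezone.utc)  # must be timezone-aware
    if now >= parse_timestamp(data["expires_at"]):
        return None
    return data["download_url"]
```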

Status Values

| Status | Description |
|---|---|
| pending | Uploaded, waiting for processing |
| processing | Translation in progress |
| completed | Ready for download |
| failed | Translation failed |
| expired | Download link expired |
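Of these values, `pending` and `processing` are transitional, while `completed`, `failed`, and `expired` are final for a polling loop. The sketch below polls with exponential backoff until a final status appears or a timeout is reached. `wait_for_document` is a hypothetical helper, not the SDK's `result.wait()`. It takes a `fetch_status` callable (for example, one that performs GET /v1/documents/{id} and returns the `data` object), plus injectable `sleep` and `clock` functions so it can be tested without network access. The backoff values are arbitrary defaults.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "expired"}
KNOWN_STATUSES = {"pending", "processing"} | TERMINAL_STATUSES


def wait_for_document(fetch_status, interval=2.0, max_interval=30.0, timeout=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the document reaches a terminal status."""
    deadline = clock() + timeout
    delay = interval
    while True:
        data = fetch_status()
        status = data["status"]
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status {status!r}")
        if status in TERMINAL_STATUSES:
            return data
        if clock() + delay > deadline:
            raise TimeoutError(f"document still {status} after {timeout} s")
        sleep(delay)
        delay = min(delay * 2, max_interval)  # exponential backoff, capped
```

The caller still has to inspect the returned status, because `failed` and `expired` also end the loop.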

GET /documents/{id}/download

Download the translated document.

Request

curl https://api.pauhu.ai/v1/documents/doc_abc123/download \
  -H "Authorization: Bearer pk_..." \
  -o contract_fi.pdf

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| format | string | Output format (optional; defaults to the input format) |
| include_source | boolean | Include the source text in the output |
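The query parameters can be appended with `urllib.parse.urlencode`. The page does not specify how `include_source` should be encoded, so this sketch assumes the strings "true"/"false". `build_download_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.pauhu.ai/v1"


def build_download_url(document_id, output_format=None, include_source=None):
    """Build the GET /v1/documents/{id}/download URL with optional query parameters."""
    params = {}
    if output_format is not None:
        params["format"] = output_format
    if include_source is not None:
        # Assumption: booleans are encoded as "true"/"false".
        params["include_source"] = "true" if include_source else "false"
    url = f"{API_BASE}/documents/{quote(document_id, safe='')}/download"
    return f"{url}?{urlencode(params)}" if params else url
```

When both arguments are omitted, the URL has no query string, and the output format is the same as the input format.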

Response

The response body is the translated file, returned with headers such as:

Content-Type: application/pdf
Content-Disposition: attachment; filename="contract_fi.pdf"
Content-Length: 1048576
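When saving the download, a client can take the file name from `Content-Disposition` and compare the received size with `Content-Length`. The sketch uses the standard library's `email.message.Message.get_filename` to parse the header. It keeps only the final path component so that a malformed name cannot write outside the target folder. Both helper names are hypothetical.

```python
import os
from email.message import Message


def filename_from_headers(headers, fallback):
    """Return the Content-Disposition filename, stripped of any directory part."""
    disposition = headers.get("Content-Disposition")
    if not disposition:
        return fallback
    msg = Message()
    msg["Content-Disposition"] = disposition
    name = msg.get_filename()  # handles quoting, e.g. filename="contract_fi.pdf"
    # Keep only the last path component so the header cannot escape the target folder.
    return os.path.basename(name) if name else fallback


def check_content_length(headers, body):
    """Raise ValueError if the body size differs from Content-Length (when present)."""
    expected = headers.get("Content-Length")
    if expected is not None and int(expected) != len(body):
        raise ValueError(f"expected {expected} bytes, got {len(body)}")
```

With `urllib.request.urlopen`, `response.headers` supports the same `.get` lookups (case-insensitively), so it can be passed in directly.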

DELETE /documents/{id}

Delete a document and its translations.

Request

curl -X DELETE https://api.pauhu.ai/v1/documents/doc_abc123 \
  -H "Authorization: Bearer pk_..."

Response

{
  "data": {
    "id": "doc_abc123",
    "deleted": true
  }
}
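A DELETE request has no body, and the response echoes the `id` with `deleted: true`. The sketch below builds the request with the standard library and checks that the confirmation refers to the expected document. `build_delete_request` and `confirm_deleted` are hypothetical helpers.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.pauhu.ai/v1"


def build_delete_request(api_key, document_id):
    """Build (but do not send) DELETE /v1/documents/{id}."""
    return urllib.request.Request(
        f"{API_BASE}/documents/{quote(document_id, safe='')}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def confirm_deleted(response_body, document_id):
    """True only if the response confirms deletion of this exact document."""
    data = json.loads(response_body)["data"]
    return data.get("id") == document_id and data.get("deleted") is True
```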

Supported Formats

Input Formats

| Category | Formats |
|---|---|
| Office | DOCX, DOC, XLSX, XLS, PPTX, PPT |
| PDF | PDF, PDF/A, PDF/UA |
| OpenDocument | ODT, ODS, ODP |
| Text | TXT, RTF, HTML, XML, Markdown |
| Publishing | IDML (InDesign), XLS |
| Subtitles | SRT, VTT, ASS, SSA |
| Data | JSON, YAML, CSV |
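To reject obviously wrong files before uploading, a client could map file extensions to the categories in this table. The extension list below is an assumed mapping (for example, .htm, .md, and .yml as variants). XLS appears under both Office and Publishing in the table, but it is listed only once here. Because the service advertises 50+ formats, an unlisted extension is not proof that the upload will fail. `listed_category` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extension mapping for the formats named in the Input Formats table.
LISTED_EXTENSIONS = {
    "Office": {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt"},
    "PDF": {".pdf"},  # PDF/A and PDF/UA files also use .pdf
    "OpenDocument": {".odt", ".ods", ".odp"},
    "Text": {".txt", ".rtf", ".html", ".htm", ".xml", ".md", ".markdown"},
    "Publishing": {".idml"},
    "Subtitles": {".srt", ".vtt", ".ass", ".ssa"},
    "Data": {".json", ".yaml", ".yml", ".csv"},
}


def listed_category(path):
    """Return the table category for a file's extension, or None if it is not listed."""
    suffix = Path(path).suffix.lower()
    for category, extensions in LISTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None
```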

Output Formats

By default, the output format matches the input format. To override it, pass the format query parameter:

curl "https://api.pauhu.ai/v1/documents/doc_abc123/download?format=pdf" \
  -H "Authorization: Bearer pk_..." \
  -o contract_fi.pdf

Webhooks

To be notified when a translation completes, pass webhook_url and webhook_secret when uploading the document:

curl -X POST https://api.pauhu.ai/v1/documents \
  -H "Authorization: Bearer pk_..." \
  -F "file=@document.pdf" \
  -F "target=fi" \
  -F "webhook_url=https://yourapp.com/webhook" \
  -F "webhook_secret=your_secret"

Webhook payload:

{
  "event": "document.completed",
  "document_id": "doc_abc123",
  "status": "completed",
  "download_url": "https://api.pauhu.ai/v1/documents/doc_abc123/download",
  "timestamp": "2025-01-15T10:31:30Z"
}
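This page does not say how `webhook_secret` is used or which header carries a signature. A common pattern is a hex-encoded HMAC-SHA256 of the raw request body, keyed with the shared secret. The sketch below assumes that scheme, so confirm the actual header name and algorithm with Pauhu before relying on it. `verify_signature` and `handle_webhook` are hypothetical helpers. The handler returns the download URL only for the documented `document.completed` event.

```python
import hashlib
import hmac
import json


def verify_signature(secret, raw_body, received_signature):
    """Assumed scheme: hex HMAC-SHA256 of the raw body keyed with webhook_secret."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_signature)  # constant-time compare


def handle_webhook(secret, raw_body, received_signature):
    """Return (document_id, download_url or None) for a verified webhook call."""
    if not verify_signature(secret, raw_body, received_signature):
        raise PermissionError("webhook signature mismatch")
    payload = json.loads(raw_body)
    if payload.get("event") == "document.completed" and payload.get("status") == "completed":
        return payload["document_id"], payload["download_url"]
    return payload["document_id"], None
```

The signature is verified over the raw bytes before the JSON is parsed, because re-serializing parsed JSON may not reproduce the original byte sequence.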

Error Codes

| Code | HTTP status | Description |
|---|---|---|
| unsupported_format | 400 | File format not supported |
| file_too_large | 400 | Exceeds 100 MB limit |
| document_not_found | 404 | Document ID not found |
| document_expired | 410 | Download link expired |
| processing_failed | 500 | Translation failed |
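Clients often need to decide which errors are worth retrying. The page does not show the error response body, so this sketch takes the error code as a plain string. It looks the code up in a copy of the table above and applies a client-side heuristic that is not documented API behavior: only 5xx errors are treated as retryable. `ErrorInfo` and `classify_error` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ErrorInfo(NamedTuple):
    http_status: int
    description: str
    retryable: bool


# Codes, HTTP statuses and descriptions copied from the Error Codes table.
_ERRORS = {
    "unsupported_format": (400, "File format not supported"),
    "file_too_large": (400, "Exceeds 100 MB limit"),
    "document_not_found": (404, "Document ID not found"),
    "document_expired": (410, "Download link expired"),
    "processing_failed": (500, "Translation failed"),
}


def classify_error(code: str) -> Optional[ErrorInfo]:
    """Look up an error code; retryable is a heuristic (server-side 5xx errors only)."""
    entry = _ERRORS.get(code)
    if entry is None:
        return None
    status, description = entry
    return ErrorInfo(status, description, retryable=status >= 500)
```

Retrying a 4xx error such as `file_too_large` or `document_expired` would fail in the same way, so those are returned as non-retryable.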

Python SDK Examples

from pauhu import Pauhu

client = Pauhu()

# Simple document translation
result = client.translate_document(
    file_path="report.pdf",
    target="fi"
)
result.save("report_fi.pdf")

# Batch documents
for doc in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    result = client.translate_document(
        file_path=doc,
        target="fi",
        async_mode=True  # Return immediately instead of waiting for completion
    )
    print(f"Started: {result.id}")

# Report progress for documents still being processed
for doc in client.documents.list(status="processing"):
    print(f"{doc.filename}: {doc.progress}%")