
Automator Guide

You're a power user. In the 2025 survey, 52.1% of machine translation (MT) users translate at least weekly, and 31% translate more than once per week. You need batch processing, translation memory, and workflow integration.

This guide covers advanced features for high-volume translation work.


Your Profile

From the survey:

| Statistic | Your needs |
|---|---|
| 31% translate more than once a week | High-volume workflows |
| 59.9% use MT for written communication | Email, documents, reports |
| 56.7% use MT to read and understand content | Research, analysis |
| 42% use MT for publishing | External communications |
| 76.5% work on laptops | Desktop-first workflows |

You need more than one-off translations. You need a system.


Batch Processing

Translate Entire Folders

Instead of one document at a time:

from pauhu import Pauhu

client = Pauhu()

# Translate all documents in a folder
results = client.batch.translate(
    source_folder="/documents/incoming/",
    target_folder="/documents/translated/",
    target_language="fi",
    preserve_format=True
)

# Results
print(f"Translated: {results.success_count} documents")
print(f"Total words: {results.word_count}")
print(f"Time: {results.elapsed_time}")

CLI for Bulk Operations

# Translate all PDFs in a directory
pauhu translate ./input/*.pdf --target fi --output ./output/

# Watch a folder for new documents
pauhu watch ./inbox/ --target fi --output ./translated/

# Process with specific domain model
pauhu translate ./legal-docs/ --target fi --domain "12 Law"
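If you script the CLI from Python, passing arguments as a list (rather than one shell string) means a domain name that contains a space, such as "12 Law", needs no quoting. A minimal sketch that uses only the `pauhu translate` flags shown above (`--target`, `--output`, `--domain`); the helper names are hypothetical, and it assumes `pauhu` is on your `PATH`:

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_translate_command(
    source: Path, target: str, output: Path, domain: Optional[str] = None
) -> list[str]:
    """Build a `pauhu translate` argument list using only the flags documented above."""
    cmd = ["pauhu", "translate", str(source), "--target", target, "--output", str(output)]
    if domain is not None:
        # One list element per argument, so "12 Law" reaches the CLI intact
        cmd += ["--domain", domain]
    return cmd


def translate_folder(
    folder: Path, target: str, output: Path, domain: Optional[str] = None
) -> None:
    """Run the CLI once per PDF; check=True raises CalledProcessError on a non-zero exit."""
    for pdf in sorted(folder.glob("*.pdf")):
        subprocess.run(build_translate_command(pdf, target, output, domain), check=True)
```

For example, `translate_folder(Path("./legal-docs"), "fi", Path("./output"), domain="12 Law")` makes one CLI call per PDF, so a failure stops at the file that caused it.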

Supported Formats

| Format | Extensions | Features |
|---|---|---|
| Office | .docx, .xlsx, .pptx | Full formatting preserved |
| PDF | .pdf | OCR if needed, layout preserved |
| Text | .txt, .md, .html | Clean conversion |
| Data | .json, .xml, .csv | Structure preserved |
| Images | .png, .jpg, .tiff | OCR extraction |
| Scans | .pdf (scanned) | OCR + layout analysis |
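Before a batch run it can help to see which files fall under which row of the table. A minimal sketch that uses only the extensions listed; the mapping and the `classify` helper are illustrative, not part of the Pauhu API. An extension alone cannot distinguish a text PDF from a scanned one, so both land in the same group here:

```python
from collections import defaultdict
from pathlib import Path

# Extension -> format family, copied from the Supported Formats table.
# ".pdf" covers both text PDFs and scans; telling them apart needs the file contents.
FORMAT_FAMILIES = {
    ".docx": "Office", ".xlsx": "Office", ".pptx": "Office",
    ".pdf": "PDF",
    ".txt": "Text", ".md": "Text", ".html": "Text",
    ".json": "Data", ".xml": "Data", ".csv": "Data",
    ".png": "Images", ".jpg": "Images", ".tiff": "Images",
}


def classify(paths: list[Path]) -> dict[str, list[str]]:
    """Group file names by format family; unlisted extensions go under 'Unsupported'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        family = FORMAT_FAMILIES.get(path.suffix.lower(), "Unsupported")
        groups[family].append(path.name)
    return dict(groups)
```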

File Hubs (Unlimited Projects)

Organize your work. Each project gets its own:

  • Translation memory
  • Terminology glossaries
  • Quality settings
  • Team access controls

Create a Project

# Create a new file hub
project = client.projects.create(
    name="EU Regulation Q1 2025",
    domain="10 European Union",
    languages=["en", "fi", "sv"],
    glossary="./eu-terms.csv"
)

# Upload documents (regulatory_documents is your own list of file paths)
for doc in regulatory_documents:
    project.upload(doc)

# Translate entire project
results = project.translate(target="fi")

Project Structure

📁 EU Regulation Q1 2025
├── 📁 Source Documents
│   ├── regulation-2025-001.pdf
│   ├── directive-ai-act.docx
│   └── council-conclusions.pdf
├── 📁 Translations (Finnish)
│   ├── regulation-2025-001_fi.pdf
│   ├── directive-ai-act_fi.docx
│   └── council-conclusions_fi.pdf
├── 📁 Translation Memory
│   └── project-tm.tmx
├── 📁 Glossaries
│   └── eu-terms.csv
└── 📁 Quality Reports
    └── qa-report.html
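The Translations folder mirrors Source Documents with the target language appended to each file name (`directive-ai-act.docx` becomes `directive-ai-act_fi.docx`). A small sketch of that convention, inferred from the tree above, for finding sources that still lack a translation; both helpers are hypothetical:

```python
from pathlib import Path


def translated_name(source: Path, language: str, target_dir: Path) -> Path:
    """Map a source file to its translated path, e.g. report.pdf -> <target_dir>/report_fi.pdf."""
    return target_dir / f"{source.stem}_{language}{source.suffix}"


def missing_translations(source_dir: Path, target_dir: Path, language: str) -> list[str]:
    """List source files that have no translated counterpart in target_dir yet."""
    return sorted(
        src.name
        for src in source_dir.iterdir()
        if src.is_file() and not translated_name(src, language, target_dir).exists()
    )
```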

Translation Memory

Reuse previous translations. 100% matches are applied instantly at no extra charge.

How It Works

graph LR
    A[New Segment] --> B{In TM?}
    B -->|100% match| C[Use Previous]
    B -->|Fuzzy match| D[Suggest + Translate]
    B -->|No match| E[New Translation]
    C --> F[Instant]
    D --> G[Discounted]
    E --> H[Full Cost]
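The routing in the diagram can be sketched in a few lines. This is a minimal illustration, not Pauhu's matching engine: it uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity score (CAT tools use their own fuzzy-match formulas), and the 95% and 75% band edges are taken from the TM Statistics table below:

```python
from difflib import SequenceMatcher
from typing import Optional


def tm_lookup(segment: str, memory: dict[str, str]) -> tuple[str, Optional[str], float]:
    """Return (match_type, stored_translation, score) for the closest TM entry.

    difflib's ratio() stands in for a CAT tool's fuzzy-match score;
    the 95% and 75% band edges follow the TM Statistics table.
    """
    best_source: Optional[str] = None
    best_score = 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is None or best_score < 0.75:
        return "new", None, best_score                  # No match: full translation
    if best_score == 1.0:
        return "100% match", memory[best_source], 1.0   # Reuse the stored translation
    band = "95-99% fuzzy" if best_score >= 0.95 else "75-94% fuzzy"
    return band, memory[best_source], best_score        # Suggest, then translate
```

With a TM containing "The Council adopted the regulation.", the segment "The Council adopted the directive." scores about 0.84 and is routed as a 75-94% fuzzy match.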

TM Statistics

From typical enterprise usage:

| Match type | Share of segments | Cost |
|---|---|---|
| 100% match | 15-40% | Included (no extra charge) |
| 95-99% fuzzy | 10-20% | 50% of full rate |
| 75-94% fuzzy | 10-15% | 75% of full rate |
| New | 35-65% | Full rate |

Average savings: 25-45% on repeat content.
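The savings figure follows from weighting each match band by its cost. A worked sketch using a hypothetical segment mix chosen from inside the ranges in the table (30% exact, 15% high fuzzy, 12% low fuzzy, 43% new); the cost factors 0, 0.5, 0.75, and 1.0 mirror the Cost column, reading "Included" as no extra charge:

```python
# Cost factor per match band, relative to a full-cost new translation (from the table).
COST_FACTORS = {"100%": 0.0, "95-99%": 0.5, "75-94%": 0.75, "new": 1.0}


def effective_cost(shares: dict[str, float]) -> float:
    """Weighted cost as a fraction of translating every segment from scratch."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(COST_FACTORS[band] * share for band, share in shares.items())


# Hypothetical project mix; each share lies inside the table's typical range
mix = {"100%": 0.30, "95-99%": 0.15, "75-94%": 0.12, "new": 0.43}
cost = effective_cost(mix)
print(f"Effective cost: {cost:.1%}, savings: {1 - cost:.1%}")
# Effective cost: 59.5%, savings: 40.5%
```

A 40.5% saving sits inside the 25-45% range quoted above; a project with fewer exact matches moves toward the lower end.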

Import Existing TM

# Import from other CAT tools
project.tm.import_file(
    "./existing-tm.tmx",  # TMX format
    # Or: "./legacy-tm.xliff"  # XLIFF format
    # Or: "./sdl-tm.sdltm"  # SDL Trados
)

# Export for backup
project.tm.export("./backup-tm.tmx")
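TMX is an XML format, so you can inspect a file before importing it. A minimal standard-library sketch, assuming a TMX 1.4 file in which each `<tu>` holds `<tuv xml:lang="en">`-style variants with one `<seg>` each (older TMX versions use a plain `lang` attribute, which is accepted as a fallback). Inline markup inside `<seg>` is flattened to plain text, and `read_tmx` is an illustrative helper, not part of the Pauhu client:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(path: str, source: str, target: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs from a TMX file.

    Languages are compared by primary subtag, case-insensitively, so "en-GB"
    matches "en". Translation units missing either language are skipped.
    """
    src, tgt = source.lower(), target.lower()
    pairs = []
    for tu in ET.parse(path).getroot().iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").split("-")[0].lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs
```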

Auto-Recognition

Drop files in, and Pauhu handles the rest.

What Gets Detected

| Aspect | Detection | Accuracy |
|---|---|---|
| Source language | Automatic | 99%+ |
| Document domain | 21 EuroVoc domains | 95%+ |
| Document type | Contract, letter, form, etc. | 90%+ |
| Terminology | Domain-specific glossary | Applied automatically |

Domain Routing

# Upload any document; the domain is detected automatically
doc = project.upload("./unknown-document.pdf")

print(doc.detected_domain)      # "12 Law"
print(doc.detected_type)        # "Contract"
print(doc.detected_language)    # "en"
print(doc.terminology_applied)  # ["legal-fi", "contract-terms"]

Quality Assurance

Multi-Tier QA

Every translation goes through:

graph TB
    A[Translation] --> B[Terminology Check]
    B --> C[Consistency Check]
    C --> D[Style Check]
    D --> E[Quality Score]
    E --> F{Score > 85?}
    F -->|Yes| G[Auto-Approve]
    F -->|No| H[Human Review Queue]

QA Configuration

# Set project-level QA rules
project.qa.configure(
    terminology_strict=True,      # Enforce glossary terms
    consistency_threshold=0.9,    # Flag inconsistent translations
    style_guide="formal",         # Formal/informal/technical
    auto_approve_threshold=85,    # Minimum score for auto-approval
    forbidden_terms=["TBD", "TODO", "FIXME"]
)
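Two of these rules can be pictured in a few lines of code. This is a sketch under stated assumptions, not Pauhu's QA engine: it treats `forbidden_terms` as whole-word, case-sensitive matches; it reads `terminology_strict` as "when a glossary source term appears in the source segment, its approved target term must appear in the translation"; and it follows the "minimum score" wording of `auto_approve_threshold`. The glossary entry in the example is hypothetical:

```python
import re


def forbidden_hits(target: str, forbidden: list[str]) -> list[str]:
    """Return forbidden terms (e.g. "TBD") that occur as whole words in a translation."""
    return [t for t in forbidden if re.search(rf"\b{re.escape(t)}\b", target)]


def glossary_misses(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary source terms present in `source` whose approved target term is absent.

    Case-insensitive substring test only: it ignores inflection, so a production
    check for Finnish would need stem- or lemma-based matching.
    """
    src, tgt = source.lower(), target.lower()
    return [s for s, t in glossary.items() if s.lower() in src and t.lower() not in tgt]


def route(score: float, threshold: float = 85) -> str:
    """Treat auto_approve_threshold as a minimum: scores at or above it skip review."""
    return "auto-approve" if score >= threshold else "human review"
```

For example, `glossary_misses("Each member state shall report.", "Kunkin maan on raportoitava.", {"member state": "jäsenvaltio"})` flags `"member state"`, because the approved term is missing from the translation.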

QA Report

┌─────────────────────────────────────────────────────────┐
│                    QA REPORT                            │
│                 Project: EU Reg Q1 2025                 │
├─────────────────────────────────────────────────────────┤
│ Documents processed: 47                                 │
│ Total segments: 12,456                                  │
│ Average quality score: 92.3                             │
│                                                         │
│ ISSUES FOUND:                                           │
│ ├── Terminology: 23 (glossary mismatches)               │
│ ├── Consistency: 12 (same source, different target)     │
│ ├── Style: 8 (informal register in formal doc)          │
│ └── Numbers: 3 (possible format errors)                 │
│                                                         │
│ RECOMMENDATIONS:                                        │
│ • Review segments flagged for terminology               │
│ • Update glossary with 5 new terms detected             │
│ • Consider post-editing for 8 style issues              │
└─────────────────────────────────────────────────────────┘
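The Consistency line counts source segments that were translated in more than one way. A minimal sketch of that check over (source, target) pairs; collapsing whitespace before comparing is an assumption, and a real check may also normalize case or punctuation:

```python
from collections import defaultdict


def inconsistent_segments(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each source segment to its distinct targets, keeping only those with 2+ targets."""
    targets: dict[str, set[str]] = defaultdict(set)
    for source, target in pairs:
        # Collapse runs of whitespace so spacing differences are not reported
        targets[" ".join(source.split())].add(" ".join(target.split()))
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}
```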

API Integration

RESTful API

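import os

import requests

# Assumption: the API key is kept in an environment variable
# (the name PAUHU_API_KEY is illustrative)
API_KEY = os.environ["PAUHU_API_KEY"]

# Translate via API
response = requests.post(
    "https://api.pauhu.ai/v1/translate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "text": "EU directive on artificial intelligence",
        "source": "en",
        "target": "fi",
        "domain": "10 European Union"
    },
    timeout=30,
)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body

result = response.json()
print(result["translation"])  # "EU:n tekoälydirektiiviä"
print(result["quality_score"])  # 94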

Webhook Integration

# Configure webhook for batch completion
client.webhooks.create(
    url="https://your-system.com/pauhu-webhook",
    events=["batch.completed", "document.translated"],
    secret="your-webhook-secret"
)
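The `secret` lets your endpoint check that a request really came from Pauhu. The signing scheme is not described in this guide, so the sketch below assumes a common convention: an HMAC-SHA256 of the raw request body, delivered as a hex digest in a request header. Confirm the actual header name and format in the API reference before relying on it:

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Check an HMAC-SHA256 hex signature over the raw request body.

    ASSUMPTION: the scheme (HMAC-SHA256, hex digest) is illustrative; confirm
    the real one in the Pauhu API reference.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(expected, received_signature)
```

Compute the signature over the body bytes exactly as received, before any JSON parsing: re-serializing the payload can change whitespace and break the comparison.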

CMS/DAM Integration

System Integration Status
| System | Integration | Status |
|---|---|---|
| SharePoint | Native plugin | ✅ Available |
| Confluence | REST API | ✅ Available |
| WordPress | Plugin | ✅ Available |
| Contentful | Webhook | ✅ Available |
| Custom | API + Webhooks | ✅ Full support |

Workflow Automation

Watch Folders

# Auto-translate documents dropped in folder
pauhu watch ./inbox \
    --target fi \
    --output ./translated \
    --notify email:team@org.fi \
    --qa-threshold 85
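Conceptually, a watch folder is a loop that notices new files and hands each one off exactly once. A minimal polling sketch with the standard library, to show the idea behind `pauhu watch` rather than its implementation; the rule "size unchanged between two polls means the copy has finished" is an assumption:

```python
import time
from pathlib import Path
from typing import Callable


def poll_once(folder: Path, seen: set[Path], sizes: dict[Path, int]) -> list[Path]:
    """Return files that are new since the last poll and whose size has stopped changing."""
    ready = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path in seen:
            continue
        size = path.stat().st_size
        if sizes.get(path) == size:  # Same size as last poll: assume the copy finished
            seen.add(path)
            ready.append(path)
        sizes[path] = size
    return ready


def watch(folder: Path, handle: Callable[[Path], None], interval: float = 5.0) -> None:
    """Poll forever, calling `handle` once per completed new file."""
    seen: set[Path] = set()
    sizes: dict[Path, int] = {}
    while True:
        for path in poll_once(folder, seen, sizes):
            handle(path)
        time.sleep(interval)
```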

Scheduled Jobs

# pauhu-schedule.yaml
schedules:
  - name: "Daily EU News"
    source: "https://europa.eu/newsroom/rss"
    target: "fi"
    schedule: "0 6 * * *"  # 6 AM daily
    output: "./news/translated/"

  - name: "Weekly Reports"
    source: "./pending-reports/"
    target: "fi,sv"
    schedule: "0 8 * * 1"  # Monday 8 AM
    output: "./reports/translated/"
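The `schedule` values are standard five-field cron expressions (minute, hour, day of month, month, day of week, with Sunday as 0). A minimal matcher for the simple forms used above (`*`, single numbers, and comma lists), written only to show how "0 8 * * 1" reads. It ignores ranges, steps, and cron's special rule for combining day of month with day of week, and it is not how Pauhu evaluates schedules:

```python
from datetime import datetime


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field that is '*', a number, or a comma-separated list of numbers."""
    return field == "*" or value in {int(part) for part in field.split(",")}


def cron_matches(expression: str, when: datetime) -> bool:
    """True if `when` (to the minute) satisfies a simple five-field cron expression."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0, Monday=1
    return all([
        _field_matches(minute, when.minute),
        _field_matches(hour, when.hour),
        _field_matches(day, when.day),
        _field_matches(month, when.month),
        _field_matches(weekday, cron_weekday),
    ])
```

For example, `cron_matches("0 8 * * 1", datetime(2025, 1, 6, 8, 0))` is true because 6 January 2025 is a Monday.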

Integration with Your Tools

# Example: SharePoint integration
from pauhu.integrations import SharePointConnector

sp = SharePointConnector(
    site="https://yourorg.sharepoint.com/sites/documents",
    credentials=credentials  # Your SharePoint credential object (not shown here)
)

# Watch SharePoint folder, translate, upload back
sp.watch_folder(
    source_folder="/Incoming/",
    target_folder="/Translated/",
    target_language="fi"
)

Cost Management

Usage Dashboard

┌─────────────────────────────────────────────────────────┐
│                 USAGE THIS MONTH                        │
├─────────────────────────────────────────────────────────┤
│ Characters translated: 2,456,789                        │
│ Documents processed: 234                                │
│ TM savings: 892,345 characters (36%)                    │
│                                                         │
│ COST BREAKDOWN:                                         │
│ ├── New translations: €412.50                           │
│ ├── Fuzzy matches: €89.20                               │
│ ├── 100% matches: €0.00                                 │
│ └── Total: €501.70 (€0.20/1000 chars effective)         │
│                                                         │
│ Budget remaining: €498.30 / €1,000.00                   │
└─────────────────────────────────────────────────────────┘
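The dashboard's derived figures can be reproduced from its raw counts, which is a quick sanity check when reconciling costs. A short sketch using only the numbers shown above:

```python
characters = 2_456_789
tm_saved_characters = 892_345
costs_eur = {"new": 412.50, "fuzzy": 89.20, "exact": 0.00}
budget_eur = 1_000.00

total = sum(costs_eur.values())
print(f"TM share: {tm_saved_characters / characters:.0%}")             # TM share: 36%
print(f"Total: €{total:.2f}")                                          # Total: €501.70
print(f"Effective rate: €{total / characters * 1000:.2f}/1000 chars")  # Effective rate: €0.20/1000 chars
print(f"Budget remaining: €{budget_eur - total:.2f}")                  # Budget remaining: €498.30
```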

Cost Optimization Tips

  1. Build TM early: the first project costs more; subsequent projects cost less
  2. Use consistent terminology: this reduces fuzzy-match variations
  3. Batch similar documents: this gives better TM leverage
  4. Pre-translate with TM: 100% matches are included

Team Features

Role-Based Access

# Set up team with different access levels
project.team.add_member("translator@org.fi", role="translator")
project.team.add_member("reviewer@org.fi", role="reviewer")
project.team.add_member("admin@org.fi", role="admin")

Workflow Stages

graph LR
    A[Upload] --> B[Auto-Translate]
    B --> C[QA Check]
    C --> D{QA Pass?}
    D -->|Yes| E[Auto-Approve]
    D -->|No| F[Human Review]
    F --> G[Post-Edit]
    G --> H[Final Approval]
    E --> I[Published]
    H --> I
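The stages form a small state machine. Encoding the allowed transitions makes it easy to reject out-of-order moves in your own tooling. The stage names come from the diagram; the `advance` and `path_to_publish` helpers and the `qa_passed` flag are hypothetical:

```python
from typing import Optional

# Allowed transitions, copied from the Workflow Stages diagram.
TRANSITIONS = {
    "Upload": {"Auto-Translate"},
    "Auto-Translate": {"QA Check"},
    "QA Check": {"Auto-Approve", "Human Review"},
    "Human Review": {"Post-Edit"},
    "Post-Edit": {"Final Approval"},
    "Auto-Approve": {"Published"},
    "Final Approval": {"Published"},
    "Published": set(),
}


def advance(stage: str, qa_passed: Optional[bool] = None) -> str:
    """Return the next stage; the QA Check branch needs the QA result."""
    options = TRANSITIONS[stage]
    if not options:
        raise ValueError(f"{stage!r} is a final stage")
    if stage == "QA Check":
        if qa_passed is None:
            raise ValueError("QA Check needs qa_passed=True or False")
        return "Auto-Approve" if qa_passed else "Human Review"
    (next_stage,) = options  # Every other stage has exactly one successor
    return next_stage


def path_to_publish(qa_passed: bool) -> list[str]:
    """Walk from Upload to Published for a given QA outcome."""
    stages = ["Upload"]
    while stages[-1] != "Published":
        stages.append(advance(stages[-1], qa_passed))
    return stages
```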

Next Steps

  • API Reference: full API documentation for developers (API Docs)
  • Translation Memory: a deep dive into TM features (TM Guide)
  • Quality Assurance: configure QA rules and workflows (QA Guide)
  • Enterprise Deployment: on-premises, air-gapped, or hybrid (Deployment)


Data source: Koponen, Nurminen & Ilkiliç (2025). "Automaattisten käännössovellusten käyttö Suomen julkishallinnossa" [The use of automatic translation applications in Finnish public administration]. University of Eastern Finland. URN:ISBN:978-952-61-5851-8