Automator Guide
You're a power user. In the 2025 survey of machine translation (MT) use in Finnish public administration, 52.1% of MT users translate at least weekly, and 31% translate more than once per week. You need batch processing, translation memory, and workflow integration.
This guide covers advanced features for high-volume translation work.
Your Profile
From the survey:
| Statistic | Your Needs |
|---|---|
| 31% translate more than once/week | High-volume workflows |
| 59.9% use MT for written communication | Email, documents, reports |
| 56.7% use MT to read/understand content | Research, analysis |
| 42% use MT for publishing | External communications |
| 76.5% work on laptops | Desktop-first workflows |
You need more than one-off translations. You need a system.
Batch Processing
Translate Entire Folders
Translate a whole folder in one call instead of one document at a time:
from pauhu import Pauhu
client = Pauhu()
# Translate all documents in a folder
results = client.batch.translate(
    source_folder="/documents/incoming/",
    target_folder="/documents/translated/",
    target_language="fi",
    preserve_format=True
)
# Results
print(f"Translated: {results.success_count} documents")
print(f"Total words: {results.word_count}")
print(f"Time: {results.elapsed_time}")
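Behind a call like this, each source file needs a destination path. The project structure later in this guide names translations with a language suffix (`regulation-2025-001_fi.pdf`). Here is a minimal sketch of that mapping. It assumes the same suffix convention and a flat folder. `plan_outputs` is a hypothetical helper, not part of the Pauhu SDK:

```python
from pathlib import Path


def plan_outputs(source_folder: str, target_folder: str, target_language: str) -> dict[Path, Path]:
    """Map each file directly inside source_folder to its translated path.

    Hypothetical helper: assumes the "<stem>_<lang><suffix>" naming shown in
    the project structure and does not descend into subfolders.
    """
    source = Path(source_folder)
    target = Path(target_folder)
    return {
        path: target / f"{path.stem}_{target_language}{path.suffix}"
        for path in sorted(source.iterdir())
        if path.is_file()
    }
```

For example, with `target_language="fi"`, a file `report.docx` in `/documents/incoming/` would map to `/documents/translated/report_fi.docx`. Keeping the planning step separate makes it easy to dry-run a batch before anything is sent for translation.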
CLI for Bulk Operations
# Translate all PDFs in a directory
pauhu translate ./input/*.pdf --target fi --output ./output/
# Watch a folder for new documents
pauhu watch ./inbox/ --target fi --output ./translated/
# Process with specific domain model
pauhu translate ./legal-docs/ --target fi --domain "12 Law"
Supported Formats
| Format | Extensions | Features |
|---|---|---|
| Office | .docx, .xlsx, .pptx | Full formatting preserved |
| PDF | .pdf | OCR if needed, layout preserved |
| Text | .txt, .md, .html | Clean conversion |
| Data | .json, .xml, .csv | Structure preserved |
| Images | .png, .jpg, .tiff | OCR extraction |
| Scans | .pdf (scanned) | OCR + layout analysis |
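The table works as a lookup from file extension to a processing family. Here is a minimal sketch of that routing, assuming the extension is the only signal available. Telling a scanned PDF from a digital one requires inspecting the file, so both map to one `PDF` family here. `route_file` and `FORMAT_FAMILIES` are illustrative names, not Pauhu internals:

```python
from pathlib import Path

# Extension -> format family, transcribed from the Supported Formats table.
# Digital and scanned PDFs share ".pdf"; telling them apart needs content checks.
FORMAT_FAMILIES = {
    ".docx": "Office", ".xlsx": "Office", ".pptx": "Office",
    ".pdf": "PDF",
    ".txt": "Text", ".md": "Text", ".html": "Text",
    ".json": "Data", ".xml": "Data", ".csv": "Data",
    ".png": "Images", ".jpg": "Images", ".tiff": "Images",
}


def route_file(path: str) -> str:
    """Return the format family for a file, raising ValueError if unsupported."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_FAMILIES:
        raise ValueError(f"unsupported format: {suffix or '(no extension)'}")
    return FORMAT_FAMILIES[suffix]
```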
File Hubs (Unlimited Projects)
Organize your work. Each project gets its own:
- Translation memory
- Terminology glossaries
- Quality settings
- Team access controls
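Create a Project
# Create a new file hub
project = client.projects.create(
    name="EU Regulation Q1 2025",
    domain="10 European Union",
    languages=["en", "fi", "sv"],
    glossary="./eu-terms.csv"
)
# Upload documents (example paths; substitute your own files)
regulatory_documents = [
    "./regulation-2025-001.pdf",
    "./directive-ai-act.docx",
    "./council-conclusions.pdf",
]
for doc in regulatory_documents:
    project.upload(doc)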
# Translate entire project
results = project.translate(target="fi")
Project Structure
📁 EU Regulation Q1 2025
├── 📁 Source Documents
│   ├── regulation-2025-001.pdf
│   ├── directive-ai-act.docx
│   └── council-conclusions.pdf
├── 📁 Translations (Finnish)
│   ├── regulation-2025-001_fi.pdf
│   ├── directive-ai-act_fi.docx
│   └── council-conclusions_fi.pdf
├── 📁 Translation Memory
│   └── project-tm.tmx
├── 📁 Glossaries
│   └── eu-terms.csv
└── 📁 Quality Reports
    └── qa-report.html
Translation Memory
Reuse previous translations. 100% matches are returned instantly at no extra cost.
How It Works
graph LR
A[New Segment] --> B{In TM?}
B -->|100% match| C[Use Previous]
B -->|Fuzzy match| D[Suggest + Translate]
B -->|No match| E[New Translation]
C --> F[Instant]
D --> G[Discounted]
E --> H[Full Cost]
TM Statistics
From typical enterprise usage:
| Match Type | Share of segments | Cost |
|---|---|---|
| 100% match | 15-40% | Included |
| 95-99% fuzzy | 10-20% | 50% cost |
| 75-94% fuzzy | 10-15% | 75% cost |
| New | 35-65% | Full cost |
Average savings: 25-45% on repeat content.
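The match bands and cost levels can be turned into a rough estimate. Here is a minimal sketch with two assumptions. First, similarity is measured with Python's `difflib.SequenceMatcher.ratio()` as a stand-in: CAT tools compute fuzzy scores in their own ways, and Pauhu's scoring is not documented here. Second, every segment carries equal weight, whereas real billing counts characters or words:

```python
from difflib import SequenceMatcher

# (minimum similarity, band, cost factor vs. a new translation), from the
# TM Statistics table; bands are checked from the highest threshold down.
BANDS = [
    (1.00, "100% match", 0.00),
    (0.95, "95-99% fuzzy", 0.50),
    (0.75, "75-94% fuzzy", 0.75),
    (0.00, "New", 1.00),
]


def best_match(segment: str, memory: list[str]) -> float:
    """Highest similarity (0.0-1.0) between a segment and any TM source segment."""
    return max((SequenceMatcher(None, segment, src).ratio() for src in memory), default=0.0)


def band_for(similarity: float) -> tuple[str, float]:
    """Return the band name and cost factor for a similarity score."""
    for threshold, name, factor in BANDS:
        if similarity >= threshold:
            return name, factor
    return "New", 1.00  # unreachable for ratios in [0, 1]; kept for safety


def estimated_savings(segments: list[str], memory: list[str]) -> float:
    """Fraction of full cost saved, weighting every segment equally."""
    if not segments:
        return 0.0
    cost = sum(band_for(best_match(s, memory))[1] for s in segments)
    return 1.0 - cost / len(segments)
```

As a worked example, take an illustrative mix inside the table's ranges: 30% exact, 15% high fuzzy, 10% low fuzzy, 45% new. The cost factor is 0.30 × 0 + 0.15 × 0.50 + 0.10 × 0.75 + 0.45 × 1 = 0.60. That is a 40% saving, within the 25-45% range quoted above.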
Import Existing TM
# Import from other CAT tools
project.tm.import_file(
    "./existing-tm.tmx",  # TMX format
    # Or: "./legacy-tm.xliff"  # XLIFF format
    # Or: "./sdl-tm.sdltm"  # SDL Trados
)
# Export for backup
project.tm.export("./backup-tm.tmx")
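TMX is an XML exchange format. Each `<tu>` (translation unit) holds one `<tuv>` per language, identified by `xml:lang`, with the text in a `<seg>` element. Before importing a TM, it can help to check what it actually contains. Here is a minimal standard-library sketch, assuming TMX 1.4-style files. `read_tmx_pairs` is illustrative and flattens any inline markup inside `<seg>` to plain text:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang: str, tgt_lang: str) -> list[tuple[str, str]]:
    """Return (source, target) segment pairs from a TMX path or binary file object."""
    pairs = []
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # "fi-FI" and "fi" are both stored under "fi".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs
```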
Auto-Recognition
Drop in your files and Pauhu handles the rest.
What Gets Detected
| Aspect | Detection | Accuracy |
|---|---|---|
| Source language | Automatic | 99%+ |
| Document domain | 21 EuroVoc domains | 95%+ |
| Document type | Contract, letter, form, etc. | 90%+ |
| Terminology | Domain-specific glossary, applied automatically | n/a |
Domain Routing
# Upload any document; the domain is detected automatically
doc = project.upload("./unknown-document.pdf")
print(doc.detected_domain) # "12 Law"
print(doc.detected_type) # "Contract"
print(doc.detected_language) # "en"
print(doc.terminology_applied) # ["legal-fi", "contract-terms"]
Quality Assurance
Multi-Tier QA
Every translation goes through:
graph TB
A[Translation] --> B[Terminology Check]
B --> C[Consistency Check]
C --> D[Style Check]
D --> E[Quality Score]
E --> F{Score ≥ 85?}
F -->|Yes| G[Auto-Approve]
F -->|No| H[Human Review Queue]
QA Configuration
# Set project-level QA rules
project.qa.configure(
    terminology_strict=True,  # Enforce glossary terms
    consistency_threshold=0.9,  # Flag inconsistent translations
    style_guide="formal",  # Formal/informal/technical
    auto_approve_threshold=85,  # Minimum score for auto-approval
    forbidden_terms=["TBD", "TODO", "FIXME"]
)
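The approval gate can be sketched as a small routing function. This sketch sends a segment to human review in three cases:
- its score is below `auto_approve_threshold`, read as the minimum passing score per the comment above;
- the target contains a forbidden term;
- a glossary term in the source is not rendered with its required target term.

`qa_route` and the plain `dict` glossary are hypothetical:

```python
import re


def qa_route(
    source: str,
    target: str,
    score: float,
    glossary: dict[str, str],
    threshold: float = 85,
    forbidden_terms: tuple[str, ...] = ("TBD", "TODO", "FIXME"),
) -> tuple[str, list[str]]:
    """Return ("auto-approve" or "human-review", reasons) for one segment."""
    reasons = []
    # Score gate: the threshold is treated as the minimum passing score.
    if score < threshold:
        reasons.append(f"score {score} below {threshold}")
    # Forbidden terms: whole-word, case-sensitive match in the translation.
    for term in forbidden_terms:
        if re.search(rf"\b{re.escape(term)}\b", target):
            reasons.append(f"forbidden term {term!r}")
    # Terminology: if a glossary source term appears, its target term must too.
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            reasons.append(f"glossary: {src_term!r} should be {tgt_term!r}")
    decision = "human-review" if reasons else "auto-approve"
    return decision, reasons
```

The terminology check uses loose substring matching on purpose, so Finnish inflected and compound forms still count as a hit (`tekoälydirektiiviä` contains `tekoäly`). A stricter check might lemmatize both sides first.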
QA Report
┌─────────────────────────────────────────────────────────┐
│ QA REPORT                                               │
│ Project: EU Reg Q1 2025                                 │
├─────────────────────────────────────────────────────────┤
│ Documents processed: 47                                 │
│ Total segments: 12,456                                  │
│ Average quality score: 92.3                             │
│                                                         │
│ ISSUES FOUND:                                           │
│ ├── Terminology: 23 (glossary mismatches)               │
│ ├── Consistency: 12 (same source, different target)     │
│ ├── Style: 8 (informal register in formal doc)          │
│ └── Numbers: 3 (possible format errors)                 │
│                                                         │
│ RECOMMENDATIONS:                                        │
│ • Review segments flagged for terminology               │
│ • Update glossary with 5 new terms detected             │
│ • Consider post-editing for 8 style issues              │
└─────────────────────────────────────────────────────────┘
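The report's consistency issues ("same source, different target") can be found by grouping segments by source text and flagging any source with more than one distinct translation. Here is a minimal sketch, assuming segments arrive as `(source, target)` pairs and that case and whitespace differences in the source should be ignored. `find_inconsistencies` is illustrative:

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivially different sources group together."""
    return " ".join(text.split()).casefold()


def find_inconsistencies(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each normalized source to its distinct targets, keeping only conflicts."""
    targets: dict[str, set[str]] = defaultdict(set)
    for source, target in pairs:
        targets[normalize(source)].add(target.strip())
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}
```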
API Integration
RESTful API
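import os

import requests

# Read the key from the environment (the variable name is only an example)
API_KEY = os.environ["PAUHU_API_KEY"]

# Translate via API
response = requests.post(
    "https://api.pauhu.ai/v1/translate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "text": "EU directive on artificial intelligence",
        "source": "en",
        "target": "fi",
        "domain": "10 European Union"
    },
    timeout=30,
)
response.raise_for_status()  # Raise on HTTP 4xx/5xx instead of parsing an error body
result = response.json()
print(result["translation"])  # "EU:n tekoälydirektiiviä"
print(result["quality_score"])  # 94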
Webhook Integration
# Configure a webhook for batch and per-document events
client.webhooks.create(
    url="https://your-system.com/pauhu-webhook",
    events=["batch.completed", "document.translated"],
    secret="your-webhook-secret"
)
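The `secret` lets your endpoint confirm that a request really came from Pauhu. This guide does not document the signing scheme, so the sketch below assumes a common convention: the sender computes a hex HMAC-SHA256 of the raw request body, keyed with the shared secret, and sends it in a request header. Confirm the actual header name and algorithm in the API reference before relying on it. `verify_signature` is hypothetical:

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Check an (assumed) hex HMAC-SHA256 signature of the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing does not reveal how much matched.
    return hmac.compare_digest(expected.encode(), received_signature.strip().lower().encode())
```

Compute the HMAC over the raw bytes as received, before any JSON parsing. Re-serializing can change whitespace or key order and break the comparison.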
CMS/DAM Integration
| System | Integration | Status |
|---|---|---|
| SharePoint | Native plugin | ✅ Available |
| Confluence | REST API | ✅ Available |
| WordPress | Plugin | ✅ Available |
| Contentful | Webhook | ✅ Available |
| Custom | API + Webhooks | ✅ Full support |
Workflow Automation
Watch Folders
# Auto-translate documents dropped in folder
pauhu watch ./inbox \
--target fi \
--output ./translated \
--notify email:team@org.fi \
--qa-threshold 85
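A watch folder comes down to noticing files that were not there on the previous check. Here is a minimal polling sketch using the standard library, assuming a fixed scan interval is acceptable (`pauhu watch` may use operating-system file notifications instead). `new_files` and `watch` are hypothetical helpers:

```python
import time
from pathlib import Path
from typing import Callable


def new_files(folder: Path, seen: set[Path]) -> list[Path]:
    """Return files in folder that are not in seen, and record them as seen."""
    current = {p for p in folder.iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def watch(folder: str, handle: Callable[[Path], None], interval: float = 5.0) -> None:
    """Poll folder forever, calling handle(path) once per newly arrived file."""
    seen: set[Path] = set()
    root = Path(folder)
    while True:
        for path in new_files(root, seen):
            handle(path)
        time.sleep(interval)
```

This simple design has two caveats. Files already present at startup are treated as new. A file that is still being copied may be picked up before it is complete, so a real watcher would typically wait until the file size stops changing.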
Scheduled Jobs
# pauhu-schedule.yaml
schedules:
  - name: "Daily EU News"
    source: "https://europa.eu/newsroom/rss"
    target: "fi"
    schedule: "0 6 * * *"  # 6 AM daily
    output: "./news/translated/"
  - name: "Weekly Reports"
    source: "./pending-reports/"
    target: "fi,sv"
    schedule: "0 8 * * 1"  # Monday 8 AM
    output: "./reports/translated/"
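The `schedule` values use standard five-field cron syntax: minute, hour, day of month, month, day of week (0 = Sunday). Here is a minimal sketch of how an expression is matched against a time. It supports only `*` and comma-separated numbers, which covers both examples above; real cron also accepts ranges, steps, and names. `cron_matches` is illustrative:

```python
from datetime import datetime
from typing import Optional


def _values(field: str) -> Optional[set[int]]:
    """Parse '*' (any value, returned as None) or comma-separated integers."""
    if field == "*":
        return None
    return {int(part) for part in field.split(",")}


def _ok(allowed: Optional[set[int]], value: int) -> bool:
    return allowed is None or value in allowed


def cron_matches(expr: str, when: datetime) -> bool:
    """True if a five-field cron expression fires in the minute containing `when`."""
    minute, hour, dom, month, dow = (_values(f) for f in expr.split())
    if dow is not None:
        dow = {d % 7 for d in dow}  # cron accepts 7 as an alias for Sunday (0)
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    if dom is not None and dow is not None:
        # When both day fields are restricted, cron fires if either one matches.
        day_ok = when.day in dom or cron_weekday in dow
    else:
        day_ok = _ok(dom, when.day) and _ok(dow, cron_weekday)
    return _ok(minute, when.minute) and _ok(hour, when.hour) and _ok(month, when.month) and day_ok
```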
Integration with Your Tools
# Example: SharePoint integration
from pauhu.integrations import SharePointConnector
# `credentials`: your SharePoint authentication object, created beforehand
sp = SharePointConnector(
    site="https://yourorg.sharepoint.com/sites/documents",
    credentials=credentials
)
# Watch SharePoint folder, translate, upload back
sp.watch_folder(
    source_folder="/Incoming/",
    target_folder="/Translated/",
    target_language="fi"
)
Cost Management
Usage Dashboard
┌─────────────────────────────────────────────────────────┐
│ USAGE THIS MONTH                                        │
├─────────────────────────────────────────────────────────┤
│ Characters translated: 2,456,789                        │
│ Documents processed: 234                                │
│ TM savings: 892,345 characters (36%)                    │
│                                                         │
│ COST BREAKDOWN:                                         │
│ ├── New translations: €412.50                           │
│ ├── Fuzzy matches: €89.20                               │
│ ├── 100% matches: €0.00                                 │
│ └── Total: €501.70 (€0.20/1000 chars effective)         │
│                                                         │
│ Budget remaining: €498.30 / €1,000.00                   │
└─────────────────────────────────────────────────────────┘
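The dashboard figures are linked by simple arithmetic:
- the effective rate is total cost divided by thousands of characters translated;
- the TM share is saved characters over characters translated;
- the remaining budget is the budget minus total cost.

Here is a quick check using the numbers shown. `usage_summary` is an illustrative helper:

```python
def usage_summary(characters: int, tm_saved_chars: int, costs: dict[str, float], budget: float) -> dict[str, float]:
    """Derive the dashboard's summary figures from raw usage numbers."""
    total = round(sum(costs.values()), 2)
    return {
        "total_eur": total,
        "eur_per_1000_chars": round(total / characters * 1000, 2),
        "tm_savings_share": round(tm_saved_chars / characters, 2),
        "budget_remaining_eur": round(budget - total, 2),
    }


summary = usage_summary(
    characters=2_456_789,
    tm_saved_chars=892_345,
    costs={"new": 412.50, "fuzzy": 89.20, "exact": 0.00},
    budget=1_000.00,
)
print(summary)
# {'total_eur': 501.7, 'eur_per_1000_chars': 0.2, 'tm_savings_share': 0.36, 'budget_remaining_eur': 498.3}
```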
Cost Optimization Tips
- Build TM early: the first project costs more, and subsequent projects cost less
- Use consistent terminology: fewer source variations mean fewer fuzzy matches and more exact ones
- Batch similar documents: better TM leverage
- Pre-translate with TM: 100% matches are included at no extra cost
Team Features
Role-Based Access
# Set up team with different access levels
project.team.add_member("translator@org.fi", role="translator")
project.team.add_member("reviewer@org.fi", role="reviewer")
project.team.add_member("admin@org.fi", role="admin")
Workflow Stages
graph LR
A[Upload] --> B[Auto-Translate]
B --> C[QA Check]
C --> D{QA Pass?}
D -->|Yes| E[Auto-Approve]
D -->|No| F[Human Review]
F --> G[Post-Edit]
G --> H[Final Approval]
E --> I[Published]
H --> I
Next Steps
- API Reference: Full API documentation for developers.
- Translation Memory: Deep dive into TM features.
- Quality Assurance: Configure QA rules and workflows.
- Enterprise Deployment: On-premises, air-gapped, or hybrid.
Data source: Koponen, Nurminen & Ilkiliç (2025). "Automaattisten käännössovellusten käyttö Suomen julkishallinnossa" [Use of automatic translation applications in Finnish public administration]. University of Eastern Finland. URN:ISBN:978-952-61-5851-8