Term Base¶
Terminology consistency across all content. Pauhu's term base ensures that domain-specific terms are translated the same way across projects and languages, and that they stay the same over time.
What is a Term Base?¶
A term base (terminology database) stores approved translations for specific terms:
| Source Term (EN) | Target Term (FI) | Domain | Reliability |
|---|---|---|---|
| artificial intelligence | tekoäly | 36 Science | ⭐⭐⭐⭐ |
| GDPR | tietosuoja-asetus | 12 Law | ⭐⭐⭐⭐ |
| encryption | salaus | 32 IT | ⭐⭐⭐⭐ |
| quantum-safe | kvanttiturvallinen | 36 Science | ⭐⭐⭐ |
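One way to picture an entry is as a small record keyed by source term and domain. The sketch below is illustrative only: `TermEntry` and `lookup` are hypothetical names rather than part of the Pauhu client, and the star rating is modelled as an integer from 1 to 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """One approved translation (hypothetical structure, not the Pauhu schema)."""
    source: str
    target: str
    domain: str
    reliability: int  # number of stars in the table above, assumed 1-4


# Rows copied from the table above.
TERM_BASE = [
    TermEntry("artificial intelligence", "tekoäly", "36 Science", 4),
    TermEntry("GDPR", "tietosuoja-asetus", "12 Law", 4),
    TermEntry("encryption", "salaus", "32 IT", 4),
    TermEntry("quantum-safe", "kvanttiturvallinen", "36 Science", 3),
]


def lookup(term, domain=None):
    """Return entries whose source matches (case-insensitive), optionally by domain."""
    wanted = term.casefold()
    return [
        entry for entry in TERM_BASE
        if entry.source.casefold() == wanted
        and (domain is None or entry.domain == domain)
    ]
```

Filtering by domain matters because the same source term may carry different approved translations in different domains.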
Automatic Term Recognition¶
from pauhu import Pauhu
client = Pauhu()
# Translate with term base enforcement
result = client.translate(
    text="Our AI uses GDPR-compliant encryption.",
    source="en",
    target="fi",
    domain="12 Law",
    enforce_terms=True  # Enforce term base
)
print(result.text)
# "Tekoälymme käyttää tietosuoja-asetuksen mukaista salausta."
# Check which terms were recognized
print(result.terms_applied)
# [
# {"source": "AI", "target": "tekoäly", "confidence": 0.98},
# {"source": "GDPR", "target": "tietosuoja-asetus", "confidence": 1.0},
# {"source": "encryption", "target": "salaus", "confidence": 0.95}
# ]
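How recognition works internally is not documented here. As a rough mental model, it can be approximated by scanning the source text for known terms and preferring the longest match. The following is a minimal sketch under that assumption; `TERMS` and `recognize_terms` are hypothetical and do not call the Pauhu API.

```python
import re

# Hypothetical in-memory term base: source term -> approved Finnish target.
TERMS = {
    "artificial intelligence": "tekoäly",
    "AI": "tekoäly",
    "GDPR": "tietosuoja-asetus",
    "encryption": "salaus",
}


def recognize_terms(text, terms=TERMS):
    """Find term base entries in text, trying longer terms before shorter ones."""
    # Longest first, so multi-word terms win over any shorter alternative.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,  # also catches sentence-initial capitals
    )
    canonical = {t.casefold(): t for t in terms}
    hits = []
    for match in pattern.finditer(text):
        source = canonical[match.group(1).casefold()]
        hits.append(
            {"source": source, "target": terms[source], "start": match.start()}
        )
    return hits
```

Word-boundary matching keeps `AI` from firing inside other words. A production system would also need to handle inflection and tokenization, which this sketch ignores.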
Term Base Sources¶
1. IATE (Interactive Terminology for Europe)¶
8.4 million terms from EU institutions:
# IATE terms are automatically available
result = client.translate(
    text="The European Commission adopted a directive.",
    source="en",
    target="fi",
    domain="10 European Union"
)
# IATE terms automatically applied:
# "European Commission" → "Euroopan komissio" (⭐⭐⭐⭐)
# "directive" → "direktiivi" (⭐⭐⭐⭐)
2. EuroVoc (EU Multilingual Thesaurus)¶
7,000+ domain concepts across 21 EuroVoc domains:
| Domain | Terms | Languages |
|---|---|---|
| 04 Politics | 850+ | 24 |
| 10 European Union | 1,200+ | 24 |
| 12 Law | 1,500+ | 24 |
| 36 Science | 900+ | 24 |
3. Custom Term Bases¶
Upload your own terminology:
# Import organization-specific terms
project = client.projects.create(name="Legal Docs Q1 2025")
project.terms.import_csv(
    file_path="./organization-terms.csv",
    format="csv",
    columns=["source", "target", "domain", "reliability"]
)
CSV Format:
source,target,domain,reliability
data controller,rekisterinpitäjä,12 Law,4
data processor,henkilötietojen käsittelijä,12 Law,4
lawful basis,oikeusperuste,12 Law,4
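It can help to validate a CSV file locally before importing it. The sketch below uses only the Python standard library. It assumes the four columns shown above and a reliability range of 1 to 4, which is inferred from the star ratings, not from a documented constraint. `load_terms` is a hypothetical helper.

```python
import csv
import io

REQUIRED_COLUMNS = ["source", "target", "domain", "reliability"]


def load_terms(csv_text):
    """Parse term rows from CSV text, rejecting malformed input early."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        reliability = int(row["reliability"])
        if not 1 <= reliability <= 4:  # assumed range, see lead-in
            raise ValueError(f"line {line_no}: reliability must be 1-4")
        if not row["source"].strip() or not row["target"].strip():
            raise ValueError(f"line {line_no}: empty source or target")
        rows.append({**row, "reliability": reliability})
    return rows


SAMPLE = """source,target,domain,reliability
data controller,rekisterinpitäjä,12 Law,4
data processor,henkilötietojen käsittelijä,12 Law,4
lawful basis,oikeusperuste,12 Law,4
"""

rows = load_terms(SAMPLE)
```

When reading from disk instead of a string, open the file with `encoding="utf-8"` and `newline=""` so that non-ASCII target terms and quoted fields survive intact.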
Term Base Priority¶
When multiple term bases contain the same term:
Priority (highest to lowest):
1. Custom Project Term Base (your organization)
2. IATE (EU official) (⭐⭐⭐⭐ reliability)
3. EuroVoc (EU multilingual thesaurus)
4. Domain-specific glossaries (per-domain terms)
5. AI-generated suggestions (see AI Term Base)
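The priority order above amounts to a first-match lookup over an ordered list of sources. The sketch below illustrates that logic with plain dictionaries. The source names and sample data are made up for the example and do not reflect Pauhu's internal identifiers.

```python
# Sources in priority order, highest first, mirroring the list above.
PRIORITY = ["custom", "iate", "eurovoc", "domain_glossary", "ai_suggestion"]


def resolve(term, sources):
    """Return (source_name, translation) from the highest-priority source, or None."""
    for name in PRIORITY:
        translation = sources.get(name, {}).get(term)
        if translation is not None:
            return name, translation
    return None


# Illustrative data only: EuroVoc and an AI suggestion disagree on one term.
sources = {
    "eurovoc": {"artificial intelligence": "tekoäly"},
    "ai_suggestion": {
        "artificial intelligence": "keinoäly",
        "AGI": "yleinen tekoäly",
    },
}
```

Here `resolve("artificial intelligence", sources)` picks the EuroVoc entry because EuroVoc outranks AI-generated suggestions. `"AGI"` falls through to the suggestion because no higher source defines it.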
Multilingual Term Bases¶
Create multilingual term bases for EU projects:
# Define term in all 24 EU languages
project.terms.add(
    source_term="artificial intelligence",
    translations={
        "fi": "tekoäly",
        "sv": "artificiell intelligens",
        "de": "künstliche Intelligenz",
        "fr": "intelligence artificielle",
        "es": "inteligencia artificial",
        # ... 18 more languages (English is the source, leaving 23 targets)
    },
    domain="36 Science",
    reliability=4
)
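A quick coverage check can show which languages a multilingual entry still lacks. The sketch below assumes the 24 official EU languages, identified by their ISO 639-1 codes. `missing_languages` is a hypothetical helper, not a Pauhu API.

```python
# ISO 639-1 codes of the 24 official EU languages.
EU_LANGUAGES = frozenset({
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
})


def missing_languages(translations, source_lang="en"):
    """Return the sorted EU language codes that have no translation yet."""
    needed = EU_LANGUAGES - {source_lang}
    return sorted(needed - set(translations))


entry = {
    "fi": "tekoäly",
    "sv": "artificiell intelligens",
    "de": "künstliche Intelligenz",
    "fr": "intelligence artificielle",
    "es": "inteligencia artificial",
}
todo = missing_languages(entry)
```

Running a check like this before calling `project.terms.add` makes gaps visible early, so they are not discovered at translation time.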
Term Base Integration with AI¶
Bidirectional semantic flow:
graph LR
A[Term Base] -->|Enforce consistency| B[AI Translation]
B -->|Extract new terms| C[AI Term Base]
C -->|Suggest additions| A
A -->|Context| D[Translation Memory]
D -->|Historical usage| A
See AI Term Base for AI-powered terminology extraction.
Quality Enforcement¶
# Strict term base enforcement
result = client.translate(
    text="AI processes personal data under GDPR.",
    target="fi",
    domain="12 Law",
    enforce_terms="strict"  # Fail if terms not found
)
# Permissive (suggest but don't enforce)
result = client.translate(
    text="Emerging AI concepts like AGI.",
    target="fi",
    domain="36 Science",
    enforce_terms="suggest"  # Suggest matches, allow alternatives
)
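The difference between the two modes can be pictured as a post-translation check that either raises or merely reports. The sketch below is one possible interpretation, not Pauhu's implementation. `check_terms` and `MissingTermError` are hypothetical names, and the substring matching is deliberately naive.

```python
class MissingTermError(ValueError):
    """Raised in strict mode when a required target term is absent."""


def check_terms(source_text, translated_text, terms, mode="strict"):
    """Check that each term found in the source appears in the translation.

    Simplified: plain substring matching, so inflected target forms whose
    stem changes will be reported as missing.
    """
    source_cf = source_text.casefold()
    target_cf = translated_text.casefold()
    problems = [
        (source_term, target_term)
        for source_term, target_term in terms.items()
        if source_term.casefold() in source_cf
        and target_term.casefold() not in target_cf
    ]
    if problems and mode == "strict":
        raise MissingTermError(f"required terms not used: {problems}")
    return problems  # in "suggest" mode these are advisory only
```

For a heavily inflected language such as Finnish, a real check would need lemmatization. For example, the genitive `tietosuoja-asetuksen` does not contain the base form `tietosuoja-asetus` as a substring.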
Compliance¶
ISO 17100:2015¶
"Terminology resources shall be used and maintained throughout the translation process."
Pauhu's term base satisfies ISO 17100 requirements for terminology management.
GDPR Article 12¶
Standard terminology ensures:
- Consistent privacy notices across languages
- Legally accurate translations of data protection terms
- No ambiguity in user rights
Export and Backup¶
# Export term base (TBX format)
project.terms.export(
    file_path="./term-base-backup.tbx",
    format="tbx"  # TermBase eXchange (ISO 30042)
)
# Also supports:
# - CSV (simple tabular)
# - XLIFF (translation interchange)
# - TMX (translation memory exchange)
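For orientation, a TBX file is XML in which each concept entry holds one language section per language. The sketch below builds a simplified document in the older `martif`/`termEntry`/`langSet`/`tig` layout using the standard library. It omits the header that a schema-valid file requires, and it is not necessarily the exact dialect Pauhu exports. `to_tbx` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tbx(entries, source_lang="en"):
    """Serialize {source_term: {lang: translation}} to a simplified TBX string."""
    root = ET.Element("martif", {"type": "TBX", XML_LANG: source_lang})
    body = ET.SubElement(ET.SubElement(root, "text"), "body")
    for index, (source_term, translations) in enumerate(entries.items(), start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"c{index}"})
        # The source term goes first, followed by each translation.
        for lang, term in {source_lang: source_term, **translations}.items():
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            ET.SubElement(ET.SubElement(lang_set, "tig"), "term").text = term
    return ET.tostring(root, encoding="unicode")


tbx = to_tbx({"encryption": {"fi": "salaus"}})
```

Parsing an exported backup the same way, with `ET.parse`, is a cheap sanity check that the file is well-formed before archiving it.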
Getting Started¶
from pauhu import Pauhu
client = Pauhu()
# Create project with term base
project = client.projects.create(
    name="EU Regulation Translation",
    domain="10 European Union"
)
# IATE terms automatically available
result = project.translate(
    text="The regulation enters into force.",
    source="en",
    target="fi"
)
# Check applied terms
for term in result.terms_applied:
    print(f"{term.source} → {term.target} (⭐×{term.reliability})")
Further Reading¶
- AI Term Base - AI-powered terminology extraction
- Translation Memory - Context from historical translations
- AI Memory - How AI learns from term usage
- Quality Assurance - Terminology consistency checks