WHO SMART Guidelines — Multi-Service Translation Integration
Status: DRAFT v0.2 — 2026-03-05
Author: @litlfred
Repo: WorldHealthOrganization/smart-base
Audience: IG Administrators
Table of Contents
- Overview
- Concepts and Terminology
- Architecture
- DAK Configuration — dak.json
- DAK Logical Model Updates
- Translation Components — Dynamic Discovery
- Script Infrastructure (input/scripts/)
- GitHub Actions Workflows
- Secrets and Credentials
- Per-IG Onboarding Checklist
- Automated Bulk Registration
- Translation Lifecycle
- Translation Status Report
- Python Script Standards
- Requirements Summary
- Open Questions
- References
1. Overview
WHO SMART Guidelines are published as FHIR Implementation Guides (IGs). Each
DAK (Digital Adaptation Kit) IG is a separate GitHub repository under
WorldHealthOrganization/. This document specifies the complete, multi-service
translation integration between these IG repos and external translation
platforms (Weblate, Launchpad, Crowdin, and others that support the
.pot/.po Gettext format).
Goals
- One Weblate Project per DAK IG — isolated translation "space"
- Multiple translation services in the same workflow — Weblate, Launchpad,
  Crowdin, or any service supporting .pot/.po
- Fully automated via GitHub Actions — no manual Weblate web UI steps
  required for day-to-day operation
- Python-first business logic — all non-trivial logic in Python scripts;
  shell in workflows is wiring only
- dak.json as the single source of truth for language config; no
  hardcoded language lists anywhere
- Dynamic component discovery — components discovered by scanning repo for
  .pot files; no hardcoded component lists
- Security-hardened — secrets in Actions secrets only; inputs sanitised;
  no secret values ever in logs
2. Concepts and Terminology
| Term | Meaning |
|---|---|
| DAK IG | A FHIR Implementation Guide that is a WHO Digital Adaptation Kit; identified by presence of dak.json in the repo root |
| smart-base | Shared infrastructure IG; hosts all reusable scripts and reusable workflows |
| Translation service | An external platform that manages .pot/.po files: Weblate, Launchpad, Crowdin, or other |
| Translation component | One .pot file and its corresponding .po files; derived from the .pot file's path in the repo |
| .pot file | Gettext Portable Object Template — English source strings |
| .po file | Gettext Portable Object — translated strings for one language |
| Project slug | {github-org}-{repo-name}, e.g. worldhealthorganization-smart-hiv |
| Component slug | Derived from the .pot file path (see §6) |
| Source language | Always English (en); not downloaded from translation services |
| Target languages | Configurable per IG in dak.json#translations.languages; defaults to the 5 non-English UN official languages |
| UN 6 languages | ar, zh, fr, ru, es (plus source en) |
| Feature branch | translations/{service-name} — created per service when new translations arrive |
| Completeness report | input/pagecontent/translation-status.md — auto-generated; shows % complete by language × component |
3. Architecture
┌────────────────────────────────────────────────────────────────────┐
│ DAK IG Repos (e.g. smart-hiv, smart-immunizations, …) │
│ │
│ dak.json ← defines languages, translation services │
│ *.pot files ← generated by commit-pot.yml + extract_*.py │
│ translations/ ← .po files per service/lang, managed by Actions │
│ input/pagecontent/translation-status.md ← completeness report │
│ │
│ notify_smart_base.yml → fires repository_dispatch on dak.json Δ │
└──────────────────────────────┬─────────────────────────────────────┘
│ repository_dispatch
▼
┌────────────────────────────────────────────────────────────────────┐
│ smart-base (WorldHealthOrganization/smart-base) │
│ │
│ .github/workflows/ │
│ register_translation_project.yml ← create service projects │
│ pull_translations.yml ← pull .po from all services │
│ commit-pot.yml ← extract .pot, commit │
│ generate_translation_report.yml ← completeness report │
│ │
│ input/scripts/ │
│ translation_config.py ← dak.json reader; lang/component │
│ translation_security.py ← input sanitisation; secret guard │
│ register_translation_project.py ← register one IG (idempotent) │
│ register_all_dak_projects.py ← bulk discovery + register │
│ pull_weblate_translations.py ← Weblate service adapter │
│ pull_launchpad_translations.py ← Launchpad service adapter │
│ pull_crowdin_translations.py ← Crowdin service adapter │
│ pull_translations.py ← orchestrator for all services│
│ translation_report.py ← generate translation-status │
│ extract_script_strings.py ← NEW: extract .pot from py │
│ extract_translations.py ← existing diagram/.pot extrac │
│ inject_translations.py ← existing injector │
└──────────────────────────────┬─────────────────────────────────────┘
┌─────────────────┼──────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Weblate │ │ Launchpad │ │ Crowdin │
│ hosted.weblate│ │ launchpad.net│ │ crowdin.com │
│ .org │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
Key design principles:
- Python owns the logic. Workflow YAML only wires env vars to Python
  scripts. No business logic, string parsing, or conditionals in shell.
- dak.json is authoritative. Language lists, enabled services, and
  project metadata come exclusively from dak.json. Nothing is hardcoded.
- Secrets stay in GitHub Actions secrets. API tokens are never passed as
  workflow_dispatch plaintext inputs, never echoed, never interpolated into
  shell commands.
- Components are discovered, not declared. The script scans for *.pot
  files at runtime to determine the component list.
- One feature branch per service. translations/weblate,
  translations/launchpad, translations/crowdin — PRs opened automatically
  when new translations arrive.
- Idempotent everywhere. All registration and pull operations are safe to
  re-run.
4. DAK Configuration — dak.json
4.1 Updated dak.json schema (additions in bold)
{
"resourceType": "DAK",
"id": "smart.who.int.{ig-id}",
"name": "{CamelCaseName}",
"title": "{Human Readable Title}",
"version": "{semver}",
"status": "draft|active|retired",
"publicationUrl": "https://smart.who.int/{ig-id}",
"canonicalUrl": "https://smart.who.int/{ig-id}",
"previewUrl": "https://WorldHealthOrganization.github.io/{repo-name}",
"license": "CC-BY-SA-3.0-IGO",
"copyrightYear": "2023+",
"publisher": {
"name": "WHO",
"url": "http://who.int"
},
"translations": {
"sourceLanguage": "en",
"languages": [
{ "code": "ar", "name": "Arabic", "direction": "rtl",
"plural": "nplurals=6; plural=(n==0?0:n==1?1:n==2?2:n%100>=3&&n%100<=10?3:n%100>=11&&n%100<=99?4:5);" },
{ "code": "zh", "name": "Chinese (Simplified)", "direction": "ltr",
"plural": "nplurals=1; plural=0;" },
{ "code": "fr", "name": "French", "direction": "ltr",
"plural": "nplurals=2; plural=(n>1);" },
{ "code": "ru", "name": "Russian", "direction": "ltr",
"plural": "nplurals=3; plural=(n%10==1&&n%100!=11?0:n%10>=2&&n%10<=4&&(n%100<10||n%100>=20)?1:2);" },
{ "code": "es", "name": "Spanish", "direction": "ltr",
"plural": "nplurals=2; plural=(n!=1);" }
],
"services": {
"weblate": {
"enabled": true,
"url": "https://hosted.weblate.org"
},
"launchpad": {
"enabled": false
},
"crowdin": {
"enabled": false
}
}
}
}
4.2 Field descriptions
| Field | Type | Required | Description |
|---|---|---|---|
| translations.sourceLanguage | string | Yes | BCP-47 code of source language; always "en" |
| translations.languages | array | Yes | Target languages; replaces all hardcoded language lists |
| translations.languages[].code | string | Yes | BCP-47 / ISO 639-1 code |
| translations.languages[].name | string | Yes | Human-readable name |
| translations.languages[].direction | string | Yes | ltr or rtl |
| translations.languages[].plural | string | Yes | Gettext plural form expression |
| translations.services | object | Yes | Keyed by service name; each has enabled flag and service-specific config |
| translations.services.weblate.url | string | No | Weblate base URL; default https://hosted.weblate.org |
4.3 smart-base instance (dak.json)
The smart-base dak.json MUST be updated to include the full
translations block with all 5 non-English UN official languages and
weblate.enabled = true. This serves as the canonical reference instance.
{
"resourceType": "DAK",
"id": "smart.who.int.base",
"name": "Base",
"title": "SMART Base",
"version": "0.2.0",
"status": "draft",
"publicationUrl": "https://smart.who.int/base",
"canonicalUrl": "https://smart.who.int/base",
"previewUrl": "https://WorldHealthOrganization.github.io/smart-base",
"license": "CC-BY-SA-3.0-IGO",
"copyrightYear": "2023+",
"publisher": { "name": "WHO", "url": "http://who.int" },
"translations": {
"sourceLanguage": "en",
"languages": [
{ "code": "ar", "name": "Arabic", "direction": "rtl",
"plural": "nplurals=6; plural=(n==0?0:n==1?1:n==2?2:n%100>=3&&n%100<=10?3:n%100>=11&&n%100<=99?4:5);" },
{ "code": "zh", "name": "Chinese (Simplified)", "direction": "ltr",
"plural": "nplurals=1; plural=0;" },
{ "code": "fr", "name": "French", "direction": "ltr",
"plural": "nplurals=2; plural=(n>1);" },
{ "code": "ru", "name": "Russian", "direction": "ltr",
"plural": "nplurals=3; plural=(n%10==1&&n%100!=11?0:n%10>=2&&n%10<=4&&(n%100<10||n%100>=20)?1:2);" },
{ "code": "es", "name": "Spanish", "direction": "ltr",
"plural": "nplurals=2; plural=(n!=1);" }
],
"services": {
"weblate": { "enabled": true, "url": "https://hosted.weblate.org" },
"launchpad": { "enabled": false },
"crowdin": { "enabled": false }
}
}
}
5. DAK Logical Model Updates
The DAK Logical Model (StructureDefinition/DAK in smart-base) MUST be
extended to formally capture the translations element. The following elements
are to be added to the FSH definition:
DAK.translations 0..1 BackboneElement "Translation configuration"
DAK.translations.sourceLanguage 1..1 code "Source language BCP-47 code"
DAK.translations.languages 0..* BackboneElement "Target language entries"
DAK.translations.languages.code 1..1 code "BCP-47 / ISO 639-1 language code"
DAK.translations.languages.name 1..1 string "Human-readable language name"
DAK.translations.languages.direction 1..1 code "Text direction: ltr | rtl"
DAK.translations.languages.plural 0..1 string "Gettext plural form expression"
DAK.translations.services 0..1 BackboneElement "Enabled translation services"
DAK.translations.services.weblate 0..1 BackboneElement "Weblate configuration"
DAK.translations.services.weblate.enabled 1..1 boolean "Is Weblate enabled?"
DAK.translations.services.weblate.url 0..1 url "Weblate base URL"
DAK.translations.services.launchpad 0..1 BackboneElement "Launchpad configuration"
DAK.translations.services.launchpad.enabled 1..1 boolean "Is Launchpad enabled?"
DAK.translations.services.crowdin 0..1 BackboneElement "Crowdin configuration"
DAK.translations.services.crowdin.enabled 1..1 boolean "Is Crowdin enabled?"
Requirement LM-001: The DAK FSH Logical Model in smart-base MUST include
these elements as normative backbone elements.
Requirement LM-002: A JSON Schema for dak.json MUST be provided alongside
the FSH LM for programmatic validation in scripts.
6. Translation Components — Dynamic Discovery
6.1 Discovery algorithm
All scripts MUST derive the component list at runtime by scanning the repo for
*.pot files rather than using any hardcoded list. The canonical function is
in input/scripts/translation_config.py:
def discover_components(repo_root: Path) -> List[TranslationComponent]:
"""
Scan repo_root for all *.pot files and derive component definitions.
Returns components sorted by pot_path for deterministic ordering.
"""
6.2 Component slug derivation
Given a .pot file path relative to repo root, the component slug is derived
as follows:
| .pot path | Component slug |
|---|---|
| input/fsh/translations/base.pot | fsh-base |
| input/images-source/translations/diagrams.pot | images-source-diagrams |
| input/images/translations/images.pot | images-images |
| input/archimate/translations/models.pot | archimate-models |
| input/diagrams/translations/diagrams.pot | diagrams-diagrams |
| input/scripts/translations/scripts.pot | scripts-scripts (new) |
Algorithm: Take the path segments between input/ and /translations/,
plus the .pot stem, joined with -. Lowercase, non-alphanumeric → -.
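The derivation rule above can be sketched as follows. This is a minimal illustration, not the actual translation_config.py code: the function name component_slug is hypothetical, and collapsing a run of non-alphanumeric characters into a single hyphen is an assumption the rule does not spell out.

```python
import re
from pathlib import PurePosixPath


def component_slug(pot_path: str) -> str:
    """Derive a component slug from a .pot path relative to the repo root."""
    path = PurePosixPath(pot_path)
    parts = path.parts
    # Expect input/<one or more segments>/translations/<stem>.pot
    if len(parts) < 4 or parts[0] != "input" or parts[-2] != "translations":
        raise ValueError(f"unexpected .pot location: {pot_path}")
    # Segments between input/ and /translations/, plus the .pot stem.
    segments = list(parts[1:-2]) + [path.stem]
    slug = "-".join(segments).lower()
    # Assumption: each run of non-alphanumeric characters becomes one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
```

Applied to the paths in the table, this yields the listed slugs, for example fsh-base for input/fsh/translations/base.pot.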
6.3 .po file location convention
For a .pot at {dir}/translations/{stem}.pot, .po files live at:
{dir}/translations/{lang_code}.po
Example: input/fsh/translations/ar.po, input/fsh/translations/fr.po, etc.
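A minimal sketch of how the scan and the .po location convention might fit together is shown below. The helper names discover_pot_files and po_path_for are hypothetical, and the real discover_components() returns TranslationComponent objects rather than plain path strings.

```python
from pathlib import Path, PurePosixPath
from typing import List


def discover_pot_files(repo_root: Path) -> List[str]:
    """Return repo-relative POSIX paths of all *.pot files, sorted for determinism."""
    root = Path(repo_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.pot"))


def po_path_for(pot_path: str, lang_code: str) -> str:
    """Map {dir}/translations/{stem}.pot to {dir}/translations/{lang_code}.po."""
    return (PurePosixPath(pot_path).parent / f"{lang_code}.po").as_posix()
```

For example, po_path_for("input/fsh/translations/base.pot", "ar") gives input/fsh/translations/ar.po, matching the example above.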
7. Script Infrastructure (input/scripts/)
All scripts MUST follow the standards in §14. Below is the complete script inventory.
7.1 translation_config.py — Configuration reader (NEW)
Purpose: Single authoritative module for reading dak.json, discovering
components, and providing config to all other scripts. Eliminates all hardcoded
language or component lists.
Key functions:
def load_dak_config(repo_root: Path) -> DakConfig:
    """Load and validate dak.json. Raises DakConfigError on missing/invalid."""
def get_languages(config: DakConfig) -> List[LanguageEntry]:
    """Return target language list from dak.json#translations.languages."""
def get_enabled_services(config: DakConfig) -> Dict[str, ServiceConfig]:
    """Return dict of enabled translation services and their config."""
def discover_components(repo_root: Path) -> List[TranslationComponent]:
    """Scan repo for *.pot files and return component definitions."""
def get_project_slug(github_org: str, repo_name: str) -> str:
    """Derive Weblate project slug: f'{github_org}-{repo_name}'.lower()"""
Requirement CFG-001: Every script that needs language codes or component
paths MUST import from translation_config.py. Scripts MUST NOT contain a
hardcoded language list or hardcoded component path.
Requirement CFG-002: load_dak_config() MUST validate required fields and
raise a descriptive DakConfigError (not a generic exception) for missing or
malformed fields.
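One way CFG-002 could be met is sketched below. It is an illustration under stated assumptions: the helper validate_translations is hypothetical, and the sketch returns the translations block as a plain dict instead of the DakConfig type named above, whose shape this document does not define.

```python
import json
from pathlib import Path


class DakConfigError(Exception):
    """Raised when dak.json is missing or its translations block is malformed."""


REQUIRED_LANGUAGE_KEYS = ("code", "name", "direction", "plural")


def validate_translations(dak: object) -> dict:
    """Check the required fields from section 4.2 and return the translations block."""
    if not isinstance(dak, dict) or not isinstance(dak.get("translations"), dict):
        raise DakConfigError("dak.json: missing 'translations' object")
    translations = dak["translations"]
    if translations.get("sourceLanguage") != "en":
        raise DakConfigError("dak.json: translations.sourceLanguage must be 'en'")
    languages = translations.get("languages")
    if not isinstance(languages, list):
        raise DakConfigError("dak.json: translations.languages must be an array")
    for index, entry in enumerate(languages):
        for key in REQUIRED_LANGUAGE_KEYS:
            if not isinstance(entry, dict) or not isinstance(entry.get(key), str):
                raise DakConfigError(
                    f"dak.json: translations.languages[{index}].{key} must be a string")
        if entry["direction"] not in ("ltr", "rtl"):
            raise DakConfigError(
                f"dak.json: translations.languages[{index}].direction must be ltr or rtl")
    if not isinstance(translations.get("services"), dict):
        raise DakConfigError("dak.json: translations.services must be an object")
    return translations


def load_dak_config(repo_root: Path) -> dict:
    """Read dak.json from the repo root and validate it."""
    path = Path(repo_root) / "dak.json"
    try:
        dak = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        raise DakConfigError(f"{path}: file not found") from exc
    except json.JSONDecodeError as exc:
        raise DakConfigError(f"{path}: invalid JSON ({exc.msg})") from exc
    return validate_translations(dak)
```

Each error message names the offending field path, so a failed workflow run points at the line of dak.json that needs fixing.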
7.2 translation_security.py — Security utilities (NEW)
Purpose: Centralised input sanitisation and secret protection. Imported by
all scripts that handle external inputs (API tokens, slugs, URLs, language
codes).
Key functions:
def sanitize_slug(value: str, field_name: str) -> str:
    """Allow only [a-z0-9-_]. Raise ValueError on invalid input."""
def sanitize_url(value: str, field_name: str, allowed_schemes=("https",)) -> str:
    """Validate URL scheme and structure. Raise ValueError on invalid."""
def sanitize_lang_code(value: str) -> str:
    """Validate BCP-47 language code format. Raise ValueError on invalid."""
def redact_for_log(value: str, visible_chars: int = 4) -> str:
    """Return first N chars + '***' for safe log output of partial values."""
def assert_no_secret_in_env(env_var: str) -> None:
    """Raise RuntimeError if the named env var was passed as a workflow input
    (detectable by checking GITHUB_EVENT_INPUTS_ prefix). Guards against
    accidentally wiring secrets as plaintext inputs."""
Requirement SEC-001: All values received from environment variables that
originated from workflow_dispatch inputs MUST be sanitised before use.
Requirement SEC-002: API tokens MUST NEVER be logged, echoed, or included
in exception messages. Use redact_for_log() when a partial value must appear
in diagnostics.
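A minimal sketch of the sanitisation and redaction helpers, following the signatures listed above, might look like this. The regular expressions are assumptions: in particular, the language-code pattern is a simplified approximation of BCP-47, not a full validator. Error messages deliberately omit the rejected value so that a mis-wired secret cannot leak through an exception.

```python
import re
from urllib.parse import urlparse

_SLUG_RE = re.compile(r"[a-z0-9_-]+")
# Simplified BCP-47 shape: primary subtag plus optional subtags (assumption).
_LANG_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def sanitize_slug(value: str, field_name: str) -> str:
    """Allow only [a-z0-9-_]; the rejected value is never echoed back."""
    if not _SLUG_RE.fullmatch(value):
        raise ValueError(f"{field_name}: only [a-z0-9-_] is allowed")
    return value


def sanitize_url(value: str, field_name: str, allowed_schemes=("https",)) -> str:
    """Require an allowed scheme and a host component."""
    parsed = urlparse(value)
    if parsed.scheme not in allowed_schemes or not parsed.hostname:
        raise ValueError(f"{field_name}: URL must use {allowed_schemes} and name a host")
    return value


def sanitize_lang_code(value: str) -> str:
    """Accept codes such as 'ar', 'zh' or 'zh-Hans'."""
    if not _LANG_RE.fullmatch(value):
        raise ValueError("language code is not a plausible BCP-47 tag")
    return value


def redact_for_log(value: str, visible_chars: int = 4) -> str:
    """Return the first N characters followed by '***'."""
    return value[:visible_chars] + "***"
```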
Requirement SEC-003: assert_no_secret_in_env() MUST be called at startup
of any script that handles API tokens, to guard against misconfigured workflows
that accidentally pass tokens as inputs.
Requirement SEC-004: All HTTP requests to translation service APIs MUST
set a connection timeout (default 60 s) and a max-response-size guard (10 MiB).
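The size guard in SEC-004 could be implemented by reading the response in chunks and aborting once the limit is passed. The sketch below assumes a file-like response object; the names read_capped, MAX_RESPONSE_BYTES and TIMEOUT_SECONDS are hypothetical.

```python
MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # 10 MiB, per SEC-004
TIMEOUT_SECONDS = 60                   # default timeout, per SEC-004


def read_capped(stream, limit: int = MAX_RESPONSE_BYTES, chunk_size: int = 65536) -> bytes:
    """Read a file-like response fully, failing once more than `limit` bytes arrive."""
    chunks = []
    received = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return b"".join(chunks)
        received += len(chunk)
        if received > limit:
            raise ValueError(f"response exceeds {limit} bytes")
        chunks.append(chunk)
```

With the standard library, the timeout would be passed at connection time, for example urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS), and the returned response object handed to read_capped().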
7.3 register_translation_project.py — Per-IG project registration (NEW)
Purpose: Idempotently create or verify the project and all dynamically
discovered components for one IG repo, on every enabled translation service.
Usage:
# env: WEBLATE_API_TOKEN, CROWDIN_API_TOKEN, LAUNCHPAD_API_TOKEN (as applicable)
python register_translation_project.py --repo-name smart-hiv [--repo-root /path]
Requirement REG-001: Registration MUST be idempotent.
Requirement REG-002: Missing dak.json → warning + exit 0.
Requirement REG-003: Project slug = {github_org}-{repo_name} (lowercase).
Requirement REG-004: Component list MUST come from discover_components().
Requirement REG-005: Service tokens MUST come from environment variables only.
7.4 register_all_dak_projects.py — Bulk registration (NEW)
Purpose: Discover all repos in the GitHub org with a dak.json, then call
register_translation_project for each.
Usage:
python register_all_dak_projects.py [--dry-run] [--org WorldHealthOrganization]
# env: WEBLATE_API_TOKEN, GITHUB_TOKEN
Discovery uses GitHub Code Search API:
GET /search/code?q=filename:dak.json+org:{org}
7.5 pull_translations.py — Multi-service pull orchestrator (NEW)
Purpose: For each enabled service in dak.json, call the appropriate
service adapter script, collect updated .po files, and write them to the repo.
This is the single entry point called by the workflow; it never contains
service-specific logic.
Usage:
python pull_translations.py [--service weblate|launchpad|crowdin|all]
[--component SLUG] [--language CODE]
# env: per-service tokens as required
7.6 pull_weblate_translations.py — Weblate service adapter (EXISTING, refactor)
- Remove hardcoded language list → use translation_config.get_languages()
- Remove hardcoded component map → use translation_config.discover_components()
- Remove hardcoded project slug default → derive from GITHUB_REPOSITORY env var
- Add security hardening from translation_security
7.7 pull_launchpad_translations.py — Launchpad service adapter (NEW)
Purpose: Fetch .po files from Launchpad Translations API.
Launchpad uses .pot/.po directly and exposes a REST API.
Key config in dak.json:
"launchpad": {
"enabled": true,
"project": "smart-hiv"
}
Requirement LP-001: Launchpad API token MUST be read from
LAUNCHPAD_API_TOKEN environment variable.
7.8 pull_crowdin_translations.py — Crowdin service adapter (NEW)
Purpose: Fetch .po files from Crowdin v2 API.
Key config in dak.json:
"crowdin": {
"enabled": true,
"projectId": "12345"
}
Requirement CR-001: Crowdin API token MUST be read from
CROWDIN_API_TOKEN environment variable.
Requirement CR-002: The adapter MUST convert Crowdin's native format to
standard .po if Crowdin does not natively export Gettext.
7.9 translation_report.py — Completeness report generator (NEW)
Purpose: Scan all .po files in the repo and generate
input/pagecontent/translation-status.md.
See §13 for full report format requirements.
Usage:
python translation_report.py [--repo-root .] [--output input/pagecontent/translation-status.md]
7.10 extract_script_strings.py — Extract .pot from Python scripts (NEW)
Purpose: Extract all translatable strings from Python scripts in
input/scripts/ and produce a .pot file at
input/scripts/translations/scripts.pot.
Mechanism: Use GNU xgettext (via subprocess) or the
babel.messages.extract API to scan *.py files for _(), gettext(),
ngettext() call patterns.
Requirement SCR-001: All user-facing strings in Python scripts (log
messages visible in GitHub Actions output, report labels, error messages)
MUST be wrapped in _() for extractability.
Requirement SCR-002: extract_script_strings.py MUST be integrated into
the commit-pot.yml pipeline so scripts.pot is committed alongside other
.pot files.
Requirement SCR-003: The scripts-scripts component MUST be auto-registered
in Weblate (and other services) via discover_components() like any other
component.
Requirement SCR-004: Translated strings from
input/scripts/translations/{lang}.po MUST be loaded at runtime by scripts
using Python's gettext module, with en as the fallback locale.
# Standard pattern for all scripts (import from translation_config):
from translation_config import setup_gettext
_ = setup_gettext(__file__)  # loads locale from translations/ sibling dir
8. GitHub Actions Workflows
All workflows are in WorldHealthOrganization/smart-base/.github/workflows/.
Shell code in workflows is strictly limited to:
- Setting environment variables
- Calling a Python script
- Checking exit code
No business logic, string manipulation, or conditionals based on string content
MAY appear in workflow YAML run: blocks.
8.1 register_translation_project.yml (NEW)
Purpose: Register one or all DAK IG repos with all enabled translation services.
Triggers:
| Trigger | Mode |
|---|---|
| workflow_dispatch | mode=single: register one named repo; mode=all: bulk register |
| repository_dispatch type=dak-ig-registered | Auto-register when downstream IG pushes dak.json |
Inputs (workflow_dispatch):
| Input | Type | Default | Secret? | Description |
|---|---|---|---|---|
| mode | choice | single | No | single or all |
| repo_name | string | (blank) | No | Target repo name (mode=single) |
| weblate_url | string | https://hosted.weblate.org | No | Non-secret: base URL only |
| dry_run | boolean | false | No | List repos without registering (mode=all) |
⚠️ WEBLATE_API_TOKEN is NEVER an input. It is always a secret.
Required secrets:
| Secret | Minimum permission | Used by |
|---|---|---|
| WEBLATE_API_TOKEN | project-admin | register_translation_project.py |
| GITHUB_TOKEN | read | GitHub Search API (mode=all) |
| CROWDIN_API_TOKEN | project-admin | Crowdin adapter (if enabled) |
| LAUNCHPAD_API_TOKEN | project-admin | Launchpad adapter (if enabled) |
8.2 pull_translations.yml (EXISTING — major refactor)
Purpose: Pull .po files from all enabled translation services, create or
update translations/{service} feature branch, and open a PR if none exists.
Triggers:
| Trigger | Description |
|---|---|
| workflow_dispatch | Manual: choose service, component, language |
| schedule | Nightly at 02:00 UTC (enabled by default; can be disabled per-repo) |
Inputs (workflow_dispatch):
| Input | Type | Default | Secret? | Description |
|---|---|---|---|---|
| service | choice | all | No | all, weblate, launchpad, crowdin |
| component | string | (all) | No | Restrict to one component slug |
| language | string | (all) | No | Restrict to one language code |
| weblate_url | string | https://hosted.weblate.org | No | Override Weblate URL |
⚠️ No token inputs. All tokens are secrets.
Feature branch and PR behaviour:
- Script downloads updated .po files to a temp area
- Workflow checks out (or creates) branch translations/{service}
- .po files are committed to that branch
- If no open PR from translations/{service} → main exists, one is
  created with gh pr create (via GITHUB_TOKEN)
- If a PR already exists, the new commit is pushed to the existing branch;
  the PR is updated with a comment summarising the changes
Requirement PULL-001: Gate on dak.json presence; skip silently if absent.
Requirement PULL-002: Languages and components MUST come from translation_config.
Requirement PULL-003: Project slug MUST be auto-derived: {GITHUB_REPOSITORY_OWNER}-{GITHUB_REPOSITORY##*/}.
Requirement PULL-004: Each service MUST write to its own feature branch
translations/{service-name}.
Requirement PULL-005: A PR MUST be created automatically if no open PR
exists for the feature branch.
Requirement PULL-006: If no .po changes are detected after pull, no
commit, no branch, no PR is created.
Requirement PULL-007: The workflow MUST support a nightly schedule.
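For PULL-003, the project slug can be derived in Python from the GITHUB_REPOSITORY variable that GitHub Actions sets to owner/repo. The function name project_slug_from_env below is hypothetical; the sketch also lowercases the result, following the project slug definition in §2.

```python
import os
from typing import Mapping


def project_slug_from_env(environ: Mapping[str, str] = os.environ) -> str:
    """Derive '{owner}-{repo}' in lowercase from GITHUB_REPOSITORY ('owner/repo')."""
    repository = environ.get("GITHUB_REPOSITORY", "")
    owner, separator, name = repository.partition("/")
    if not separator or not owner or not name or "/" in name:
        raise ValueError("GITHUB_REPOSITORY must have the form 'owner/repo'")
    return f"{owner}-{name}".lower()
```

Doing the derivation in the script rather than in a shell parameter expansion keeps the workflow run: block free of string manipulation.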
8.3 commit-pot.yml (EXISTING — minor additions)
Additions:
- After existing extraction steps, call extract_script_strings.py to
  produce input/scripts/translations/scripts.pot
- Commit that file alongside all other .pot files
Requirement POT-001: commit-pot.yml MUST invoke extract_script_strings.py.
Requirement POT-002: Empty .pot files (no translatable strings) MUST still
be committed so service components always have a valid template.
8.4 generate_translation_report.yml (NEW)
Purpose: Generate input/pagecontent/translation-status.md as a pre-publication
step; integrated into the main CI/GH Pages build.
Triggers:
- Called from the main IG build workflow before gh-pages publish step
- workflow_dispatch for standalone regeneration
Script called: python input/scripts/translation_report.py
8.5 notify_smart_base.yml — Downstream IG trigger (NEW, in each IG repo)
Purpose: When dak.json is pushed to main in a downstream IG, fire a
repository_dispatch event to smart-base to trigger project registration.
Triggers: push to main, path dak.json
Required secrets:
| Secret | Description |
|---|---|
| SMARTBASE_DISPATCH_TOKEN | GitHub PAT with repo scope on smart-base |
Requirement NOTIFY-001: MUST be included in smart-dak-empty template.
9. Secrets and Credentials
9.1 Complete secrets inventory
| Secret name | Scope | Min permission | Required by |
|---|---|---|---|
| WEBLATE_API_TOKEN | smart-base only | project-admin | register_translation_project.yml |
| WEBLATE_API_TOKEN | each DAK IG repo | read | pull_translations.yml |
| CROWDIN_API_TOKEN | each DAK IG repo (if used) | project-admin for reg; read for pull | registration + pull |
| LAUNCHPAD_API_TOKEN | each DAK IG repo (if used) | project-admin for reg; read for pull | registration + pull |
| SMARTBASE_DISPATCH_TOKEN | each downstream IG repo | repo scope on smart-base | notify_smart_base.yml |
| GITHUB_TOKEN | auto-provided | varies | all workflows |
Recommendation:
WEBLATE_API_TOKEN (read-only), CROWDIN_API_TOKEN
(read-only), and SMARTBASE_DISPATCH_TOKEN SHOULD be stored as
organisation-level secrets in WorldHealthOrganization to eliminate
per-repo setup. The project-admin tokens remain in smart-base only.
9.2 Token acquisition
Weblate:
- Log in at hosted.weblate.org
- Account → Settings → API access (/accounts/profile/#api)
- Generate token; note project-admin vs. read-only scopes
Crowdin:
- Log in at crowdin.com
- Account Settings → API → Personal Access Tokens
- Generate token with appropriate project scope
Launchpad:
- Log in at launchpad.net
- https://launchpad.net/+apitokens → Create new token
GitHub PAT (SMARTBASE_DISPATCH_TOKEN):
Settings → Developer settings → Personal access tokens → Fine-grained tokens
Repository: WorldHealthOrganization/smart-base
Permissions: Contents (read and write), Actions (write)
9.3 Adding a secret
Via GitHub UI:
Repo → Settings → Secrets and variables → Actions → New repository secret
Via CLI:
gh secret set WEBLATE_API_TOKEN --repo WorldHealthOrganization/smart-hiv
Via org-level (org admin required):
gh secret set WEBLATE_API_TOKEN --org WorldHealthOrganization \
  --repos "smart-hiv,smart-immunizations,smart-anc"
9.4 Security prohibitions
The following are STRICTLY PROHIBITED and enforced by translation_security.py:
- Passing any API token as a workflow_dispatch input
- Logging any token value (even partial) except via redact_for_log()
- Storing any token in dak.json, weblate.yaml, or any committed file
- Interpolating any ${{ secrets.* }} value directly into a shell run: command
  (use env: block → Python arg)
- Using set -x in any workflow step that handles secrets
10. Per-IG Onboarding Checklist
Prerequisites
- Repo exists under WorldHealthOrganization/
- dak.json present at repo root with translations block populated
- At least one .pot file committed (or commit-pot.yml run to generate)
One-time setup steps
Step 1 — Add dak.json with translations config
Include the translations block as shown in §4.1. Set enabled services.
Step 2 — Add secrets to the IG repo
# Required for all repos:
gh secret set WEBLATE_API_TOKEN --repo WorldHealthOrganization/{repo-name}
gh secret set SMARTBASE_DISPATCH_TOKEN --repo WorldHealthOrganization/{repo-name}
# Only if Crowdin enabled in dak.json:
gh secret set CROWDIN_API_TOKEN --repo WorldHealthOrganization/{repo-name}
# Only if Launchpad enabled in dak.json:
gh secret set LAUNCHPAD_API_TOKEN --repo WorldHealthOrganization/{repo-name}
Step 3 — Add notify_smart_base.yml to the repo
Copy from smart-dak-empty. Already present in any repo created from template.
Step 4 — Seed .pot files
Actions → Commit POT Files → Run workflow (on main)
Step 5 — Register in translation services
Either automatic (push dak.json → notify_smart_base.yml fires) or manual:
Actions (in smart-base) → Register Translation Projects
→ mode=single, repo_name={repo-name}
Step 6 — Verify
- Visit https://hosted.weblate.org/projects/worldhealthorganization-{repo-name}/
- Confirm all discovered components are present
- Confirm source strings are visible for each component
Step 7 — Enable nightly pull (optional)
Uncomment the schedule: block in pull_translations.yml in the IG repo, or
rely on org-level orchestration from smart-base.
11. Automated Bulk Registration
For one-time catch-up of all existing DAK IG repos:
smart-base → Actions → Register Translation Projects
→ mode=all, dry_run=true
Review the discovered repo list in the logs, then re-run with dry_run=false.
Discovery mechanism:
GET /search/code?q=filename:dak.json+org:{org}&per_page=100
This finds every repo containing a file named dak.json, regardless of content.
The script then fetches each dak.json and validates the translations block
before attempting registration.
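Because the filename: qualifier matches dak.json in any directory, the search results need filtering down to repos that have the file at the root, which is what identifies a DAK IG. The sketch below works on an already-parsed Code Search response, relying on its items[].path and items[].repository.full_name fields; the function name repos_with_root_dak_json is hypothetical.

```python
from typing import List


def repos_with_root_dak_json(search_payload: dict) -> List[str]:
    """Return sorted, de-duplicated 'owner/repo' names whose dak.json is at the repo root."""
    repos = set()
    for item in search_payload.get("items", []):
        # Keep only hits where the file sits at the repository root.
        if item.get("path") == "dak.json":
            repos.add(item["repository"]["full_name"])
    return sorted(repos)
```

Sorting the result keeps dry-run output stable between runs, which makes the reviewed list easy to compare with the list used for the real registration.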
12. Translation Lifecycle
1. AUTHOR pushes content change to main
│
▼
2. commit-pot.yml runs (via ci.yml)
• IG Publisher extracts FHIR resource strings → .pot
• extract_translations.py extracts diagram strings → .pot
• extract_script_strings.py extracts Python script strings → .pot
• All .pot files committed to main
│
▼
3. Translation services detect .pot changes
(Weblate: via webhook or polling; others: via API push or scheduled)
• Translators work in service UI
• Translations approved / reviewed per service workflow
│
▼
4. pull_translations.yml runs (nightly or workflow_dispatch)
• For each enabled service:
- pull_translations.py calls service adapter
- Service adapter fetches approved .po files
- .po files written to translations/{service} branch
- PR created/updated: translations/{service} → main
│
▼
5. IG Admin reviews and merges translation PR
│
▼
6. On merge to main:
• generate_translation_report.yml runs
• translation_report.py generates translation-status.md
• inject_translations.py injects .po into diagram sources
• Full IG build runs → multilingual pages published
13. Translation Status Report
13.1 Output file
input/pagecontent/translation-status.md
This file is:
- Auto-generated by translation_report.py; never manually edited
- Committed to main as part of the pre-publication build step
- Published as an IG page (Translation Status) in the built site
- Regenerated on every main branch build (so always current)
13.2 Report structure
# Translation Status
Generated: {ISO 8601 datetime UTC}
Source language: English (`en`)
Target languages: Arabic (`ar`), Chinese (`zh`), French (`fr`),
Russian (`ru`), Spanish (`es`)
## Summary
| Component | ar | zh | fr | ru | es |
|-----------|----|----|----|----|-----|
| fsh-base | 45% | 72% | 100% | 38% | 89% |
| images-source-diagrams | 0% | 12% | 56% | 0% | 34% |
| scripts-scripts | 10% | 10% | 20% | 5% | 18% |
| **Total** | **32%** | **51%** | **76%** | **24%** | **62%** |
## Component Detail
### fsh-base
Source: `input/fsh/translations/base.pot` |
[View source strings](../fsh/translations/base.pot)
<details>
<summary>Arabic (ar) — 45% complete (45/100 strings)</summary>
| msgid (English) | ar | Source context |
|-----------------|----|----------------|
| "Patient name" | "اسم المريض" ✅ | [ANCContact.fsh#L12](../fsh/ANCContact.fsh#L12) |
| "Visit date" | _(untranslated)_ ❌ | [ANCContact.fsh#L18](../fsh/ANCContact.fsh#L18) |
| … | … | … |
</details>
<details>
<summary>French (fr) — 100% complete (100/100 strings) ✅</summary>
…
</details>
13.3 Report requirements
Requirement RPT-001: The report MUST show % complete per language per
component as a summary table.
Requirement RPT-002: Each component section MUST be expandable (HTML
<details>/<summary>) to show individual strings with their translations.
Requirement RPT-003: Each string MUST include a source context link
pointing to the file and line number where the string originates (from the
.pot file's #: file:line comment).
Requirement RPT-004: The report MUST indicate untranslated strings (❌) and
translated strings (✅) visually.
Requirement RPT-005: The report MUST be generated in the pre-publication
build step, not in a separate manual step.
Requirement RPT-006: The report MUST use languages from dak.json, not
any hardcoded list.
Requirement RPT-007: An overall completeness percentage per language MUST
appear in the summary table footer row.
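The summary table described by RPT-001 and RPT-007 could be assembled roughly as follows. This is a minimal sketch, not the behaviour of translation_report.py: the `counts` input shape and the function names are hypothetical, and it assumes the footer percentage is weighted by string count across components rather than being an average of the per-component percentages.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: component -> language -> (translated, total)
Counts = Dict[str, Dict[str, Tuple[int, int]]]


def percent(translated: int, total: int) -> int:
    """Whole-number percentage; a component with no strings counts as 0%."""
    return round(100 * translated / total) if total else 0


def summary_table(counts: Counts, languages: List[str]) -> str:
    """Build the markdown summary table, one column per language."""
    rows = [
        "| Component | " + " | ".join(languages) + " |",
        "|---|" + "---|" * len(languages),
    ]
    # Running (translated, total) sums per language for the footer row.
    totals = {lang: [0, 0] for lang in languages}
    for component in sorted(counts):
        cells = []
        for lang in languages:
            done, total = counts[component].get(lang, (0, 0))
            totals[lang][0] += done
            totals[lang][1] += total
            cells.append(f"{percent(done, total)}%")
        rows.append(f"| {component} | " + " | ".join(cells) + " |")
    # Assumption: the footer is weighted by string count, not averaged.
    footer = [f"**{percent(*totals[lang])}%**" for lang in languages]
    rows.append("| **Total** | " + " | ".join(footer) + " |")
    return "\n".join(rows)
```

The `languages` argument would come from dak.json (RPT-006), so the function itself carries no language list.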
14. Python Script Standards
All scripts in input/scripts/ MUST comply with the following standards.
14.1 Business logic in Python
Requirement PY-001: All business logic MUST reside in Python scripts.
Workflow YAML run: blocks MUST contain only:
run: |
  python input/scripts/some_script.py --arg "$ENV_VAR_FROM_ENV_BLOCK"
String manipulation, conditionals on content, and API calls MUST NOT appear
in shell.
14.2 Input handling and sanitisation
Requirement PY-002: All values received from environment variables MUST be
read and sanitised through translation_security.py before use.
Requirement PY-003: API tokens MUST be read from environment variables;
MUST NOT be accepted as function arguments, constructor parameters, or CLI
positional arguments (use os.environ.get() only).
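One way to satisfy PY-003 is to pass around only the *name* of the environment variable and read the value at the point of use. The sketch below is illustrative: `read_token` and `MissingTokenError` are hypothetical names, not part of the existing scripts.

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when a required token variable is absent or empty."""


def read_token(var_name: str) -> str:
    """Return the token stored in the named environment variable.

    Only the variable name is an argument; the token value itself is
    never taken from argv or from a caller-supplied parameter.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        # Name the variable, never its value, in the error message.
        raise MissingTokenError(f"environment variable {var_name} is not set")
    return token
```

A caller would use `read_token("SECRET_TOKEN")` immediately before building the API request, so the value never appears in a function signature or in a command line.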
Requirement PY-004: Environment variable → CLI arg wiring in workflows
MUST use the env: block pattern:
env:
  INPUT_FOO: ${{ inputs.foo }}                   # non-secret input
  SECRET_TOKEN: ${{ secrets.WEBLATE_API_TOKEN }} # secret
run: python script.py --foo "$INPUT_FOO"
# Script reads SECRET_TOKEN directly from os.environ
14.3 Logging and observability
Requirement PY-005: Scripts MUST use Python's logging module with
structured log levels (DEBUG / INFO / WARNING / ERROR). print() MUST NOT
be used for operational output.
Requirement PY-006: Log output MUST NOT contain any secret value.
Token values in logs MUST be passed through redact_for_log().
Requirement PY-007: Scripts MUST emit a summary at completion: how many
items processed, how many succeeded, how many failed.
14.4 Error handling
Requirement PY-008: Scripts MUST exit with status 0 on success, 1 on
partial or full failure, and 2 on bad arguments.
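The exit-code convention in PY-008 lines up with argparse, which already exits with status 2 when argument parsing fails. A minimal sketch, in which the `--items` flag and the "starts with bad" failure rule are placeholders invented for illustration:

```python
import argparse
from typing import List, Optional

EXIT_OK = 0
EXIT_FAILURE = 1  # partial or full failure
EXIT_USAGE = 2    # bad arguments; argparse uses this status by default


def run(argv: Optional[List[str]] = None) -> int:
    """Parse arguments, do the work, and return the process exit status."""
    parser = argparse.ArgumentParser(description="Exit-code sketch")
    parser.add_argument("--items", nargs="+", required=True)
    args = parser.parse_args(argv)  # raises SystemExit(2) on bad arguments

    # Placeholder work: treat any item starting with "bad" as a failure.
    failed = [item for item in args.items if item.startswith("bad")]
    return EXIT_FAILURE if failed else EXIT_OK

# A script entry point would call: sys.exit(run())
```

Returning the status from `run()` rather than calling `sys.exit()` deep inside the logic keeps the function testable.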
Requirement PY-009: Scripts MUST handle API rate limits gracefully by
reading Retry-After / X-RateLimit-Reset headers and sleeping accordingly,
up to a configurable max wait.
Requirement PY-010: Network requests MUST use a requests Session with
timeout=(connect_timeout, read_timeout) set explicitly.
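The wait computation behind PY-009 can be isolated in a pure function, which makes it easy to test without network access. The sketch below rests on two assumptions: `Retry-After` carries either delta-seconds or an HTTP-date, and `X-RateLimit-Reset` carries a Unix epoch timestamp (services differ on this, so it should be checked per API). The function name and the default cap are hypothetical.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def wait_seconds(headers: Mapping[str, str], max_wait: float = 300.0,
                 now: Optional[float] = None) -> float:
    """Return how long to sleep before retrying, capped at max_wait."""
    now = time.time() if now is None else now
    wait = 0.0
    retry_after = headers.get("Retry-After")
    reset = headers.get("X-RateLimit-Reset")
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            wait = float(value)  # delta-seconds form
        else:
            try:
                # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
                wait = parsedate_to_datetime(value).timestamp() - now
            except (TypeError, ValueError):
                wait = 0.0  # unparseable header: do not wait
    elif reset is not None:
        try:
            wait = float(reset) - now  # assumed to be epoch seconds
        except ValueError:
            wait = 0.0
    # Never return a negative wait, and never exceed the configured cap.
    return min(max(wait, 0.0), max_wait)
```

For PY-010, note that `requests` accepts the `timeout=(connect_timeout, read_timeout)` tuple per call; a `Session` has no session-wide timeout setting, so each `session.get(...)` or `session.post(...)` needs the argument passed explicitly.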
14.5 Documentation
Requirement PY-011: Every script MUST have a module docstring describing:
- Purpose
- Usage (CLI invocation)
- Environment variables consumed
- Exit codes
Requirement PY-012: A companion file input/scripts/TRANSLATION-ADMIN.md
MUST document:
- All scripts in the translation pipeline and their relationships
- Complete list of environment variables and secrets consumed
- Step-by-step runbooks for common operations
- Troubleshooting guide for common failures
15. Requirements Summary
Configuration
| ID | Requirement |
|---|---|
| CFG-001 | All scripts MUST read language list from dak.json#translations.languages |
| CFG-002 | All scripts MUST discover components by scanning for *.pot files |
| CFG-003 | Project slug MUST be {github_org}-{repo_name} (lowercase) |
| CFG-004 | load_dak_config() MUST validate and raise DakConfigError on bad config |
| CFG-005 | dak.json schema MUST be updated and a JSON Schema provided for validation |
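CFG-002 and CFG-003 can be illustrated with two small helpers. This is a sketch only: `project_slug` and `find_pot_files` are hypothetical names (not the discover_components() function referenced elsewhere in this specification), and the rule that maps a .pot path to a component name is deliberately left out because it is not defined in this section.

```python
from pathlib import Path
from typing import List


def project_slug(github_repository: str) -> str:
    """Turn 'Org/repo' (the GITHUB_REPOSITORY format) into 'org-repo'."""
    org, sep, repo = github_repository.partition("/")
    if not (org and sep and repo) or "/" in repo:
        raise ValueError("expected a value of the form 'owner/repo'")
    return f"{org}-{repo}".lower()


def find_pot_files(repo_root: Path) -> List[Path]:
    """Return every *.pot file under repo_root, sorted, relative to the root."""
    return sorted(p.relative_to(repo_root) for p in repo_root.rglob("*.pot"))
```

For example, `project_slug("WorldHealthOrganization/smart-base")` yields `worldhealthorganization-smart-base`.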
Logical Model
| ID | Requirement |
|---|---|
| LM-001 | DAK FSH Logical Model MUST include translations backbone element |
| LM-002 | A JSON Schema for dak.json MUST be provided |
| LM-003 | smart-base dak.json MUST be updated with the 6 UN language entries |
Security
| ID | Requirement |
|---|---|
| SEC-001 | All workflow inputs MUST be sanitised before use in Python |
| SEC-002 | Tokens MUST NEVER appear in logs; use redact_for_log() |
| SEC-003 | assert_no_secret_in_env() MUST be called at script startup |
| SEC-004 | All HTTP requests MUST have timeouts and response size guards |
| SEC-005 | Tokens MUST be secrets, never plaintext workflow_dispatch inputs |
| SEC-006 | set -x MUST NOT appear in any workflow step handling secrets |
Registration
| ID | Requirement |
|---|---|
| REG-001 | Registration MUST be idempotent |
| REG-002 | Missing dak.json → warning + exit 0 |
| REG-003 | Components MUST come from discover_components() |
| REG-004 | Tokens MUST be environment secrets only |
| REG-005 | Bulk registration MUST support --dry-run |
| REG-006 | repository_dispatch trigger MUST be supported |
Pull / Feature Branch
| ID | Requirement |
|---|---|
| PULL-001 | Gate on dak.json existence |
| PULL-002 | Languages from translation_config only |
| PULL-003 | Project slug auto-derived from GITHUB_REPOSITORY |
| PULL-004 | Each service uses its own translations/{service} branch |
| PULL-005 | PR auto-created if none exists; updated if one exists |
| PULL-006 | No changes = no commit, no PR |
| PULL-007 | Nightly schedule supported |
| PULL-008 | Multiple services supported in single workflow run |
Report
| ID | Requirement |
|---|---|
| RPT-001 | Summary table: % complete per language per component |
| RPT-002 | Expandable per-component string detail |
| RPT-003 | Source context links per string |
| RPT-004 | ✅/❌ visual indicators |
| RPT-005 | Generated in pre-publication build step |
| RPT-006 | Languages from dak.json |
| RPT-007 | Overall % per language in summary footer |
Python Standards
| ID | Requirement |
|---|---|
| PY-001 | Business logic in Python; shell is wiring only |
| PY-002 | Inputs sanitised via translation_security.py |
| PY-003 | Tokens from os.environ only |
| PY-004 | env: block for secret/input wiring |
| PY-005 | Use logging module; no print() for operational output |
| PY-006 | No secrets in log output |
| PY-007 | Summary log on completion |
| PY-008 | Exit codes 0/1/2 |
| PY-009 | Rate-limit handling with Retry-After |
| PY-010 | Explicit request timeouts |
| PY-011 | Module docstrings with usage, env vars, exit codes |
| PY-012 | TRANSLATION-ADMIN.md for admin runbooks |
Script Component
| ID | Requirement |
|---|---|
| SCR-001 | User-facing strings in Python scripts MUST use _() |
| SCR-002 | extract_script_strings.py MUST be in commit-pot.yml pipeline |
| SCR-003 | scripts-scripts component MUST be auto-registered via discovery |
| SCR-004 | Translated strings MUST be loaded at runtime via gettext |
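SCR-001 and SCR-004 can be met with the standard library's gettext module. In the sketch below the domain name and catalog directory are assumptions made for illustration; the stdlib loader reads compiled .mo catalogs from `<localedir>/<language>/LC_MESSAGES/<domain>.mo`, so .po files would need compiling first.

```python
import gettext
from pathlib import Path

# Hypothetical domain name and catalog directory.
DOMAIN = "scripts"
LOCALE_DIR = Path("input/scripts/translations/locale")


def load_translator(language: str):
    """Return a gettext function for `language`.

    With fallback=True a missing catalog yields the untranslated
    (English) msgid instead of raising an error.
    """
    catalog = gettext.translation(
        DOMAIN, localedir=str(LOCALE_DIR), languages=[language], fallback=True
    )
    return catalog.gettext


_ = load_translator("fr")
message = _("Registration complete")
```

Binding the result to `_` keeps call sites in the `_("...")` form that string-extraction tools look for.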
Template (smart-dak-empty)
| ID | Requirement |
|---|---|
| TMPL-001 | MUST include notify_smart_base.yml |
| TMPL-002 | MUST include pull_translations.yml (or workflow_call) |
| TMPL-003 | MUST include skeleton dak.json with translations block |
| TMPL-004 | MUST include weblate.yaml (informational) |
| TMPL-005 | MUST include placeholder .pot directories |
16. Open Questions
- PR vs. direct commit: The current spec uses PRs for translation pull-back
  (one PR per service). Is there a preference for auto-merge if completeness
  exceeds a threshold (e.g. 80%)? Or always require human merge?
- Organisation-level secrets: Should WEBLATE_API_TOKEN (read),
  CROWDIN_API_TOKEN (read), and SMARTBASE_DISPATCH_TOKEN be org-level
  secrets? Requires org admin approval.
- Launchpad authentication: Launchpad uses OAuth 1.0. The token handling
  differs from Weblate/Crowdin. Does WHO have an existing Launchpad account
  and OAuth consumer registered?
- Crowdin format: Crowdin natively supports .po. Does the WHO Crowdin
  plan include API access, or only web UI?
- Narrative markdown as a translation component: Should
  input/pagecontent/*.md be extracted to a .pot and translated? This
  would add a 6th component (pagecontent-narratives). High value but more
  complex injection.
- Weblate self-hosting: Is hosted.weblate.org permanent or will WHO
  self-host? Token management and URL config will change.
- Machine translation pre-fill: weblate.yaml has LibreTranslate. Should
  Crowdin/Launchpad also have MT pre-fill configured? Which MT services are
  WHO-approved?
- zh variant: Should zh be zh_Hans (Simplified) or zh_Hant
  (Traditional)? Currently zh is assumed Simplified. WHO house style?
17. References
DRAFT v0.2 — for review and iteration. Edit this file and open a PR with corrections,
decisions on open questions, or additional requirements. Do not implement
without sign-off on §16.