translation service integration #281

WHO SMART Guidelines — Multi-Service Translation Integration

Status: DRAFT v0.2 — 2026-03-05
Author: @litlfred
Repo: WorldHealthOrganization/smart-base
Audience: IG Administrators


Table of Contents

  1. Overview
  2. Concepts and Terminology
  3. Architecture
  4. DAK Configuration — dak.json
  5. DAK Logical Model Updates
  6. Translation Components — Dynamic Discovery
  7. Script Infrastructure (input/scripts/)
  8. GitHub Actions Workflows
  9. Secrets and Credentials
  10. Per-IG Onboarding Checklist
  11. Automated Bulk Registration
  12. Translation Lifecycle
  13. Translation Status Report
  14. Python Script Standards
  15. Requirements Summary
  16. Open Questions
  17. References

1. Overview

WHO SMART Guidelines are published as FHIR Implementation Guides (IGs). Each
DAK (Digital Adaptation Kit) IG is a separate GitHub repository under
WorldHealthOrganization/. This document specifies the complete, multi-service
translation integration between these IG repos and external translation
platforms (Weblate, Launchpad, Crowdin, and others that support the
.pot/.po Gettext format).

Goals

  • One Weblate Project per DAK IG — isolated translation "space"
  • Multiple translation services in the same workflow — Weblate, Launchpad,
    Crowdin, or any service supporting .pot/.po
  • Fully automated via GitHub Actions — no manual Weblate web UI steps
    required for day-to-day operation
  • Python-first business logic — all non-trivial logic in Python scripts;
    shell in workflows is wiring only
  • dak.json as the single source of truth for language config; no
    hardcoded language lists anywhere
  • Dynamic component discovery — components discovered by scanning repo for
    .pot files; no hardcoded component lists
  • Security-hardened — secrets in Actions secrets only; inputs sanitised;
    no secret values ever in logs

2. Concepts and Terminology

| Term | Meaning |
|------|---------|
| DAK IG | A FHIR Implementation Guide that is a WHO Digital Adaptation Kit; identified by the presence of dak.json in the repo root |
| smart-base | Shared infrastructure IG; hosts all reusable scripts and reusable workflows |
| Translation service | An external platform that manages .pot/.po files: Weblate, Launchpad, Crowdin, or other |
| Translation component | One .pot file and its corresponding .po files; derived from the .pot file's path in the repo |
| .pot file | Gettext Portable Object Template; English source strings |
| .po file | Gettext Portable Object; translated strings for one language |
| Project slug | {github-org}-{repo-name}, e.g. worldhealthorganization-smart-hiv |
| Component slug | Derived from the .pot file path (see §6) |
| Source language | Always English (en); not downloaded from translation services |
| Target languages | Configurable per IG in dak.json#translations.languages; defaults to the five non-English UN official languages |
| UN 6 languages | ar, zh, fr, ru, es as targets, plus en as the source |
| Feature branch | translations/{service-name}; created per service when new translations arrive |
| Completeness report | input/pagecontent/translation-status.md; auto-generated; shows % complete by language × component |

3. Architecture

┌────────────────────────────────────────────────────────────────────┐
│  DAK IG Repos  (e.g. smart-hiv, smart-immunizations, …)            │
│                                                                    │
│  dak.json         ← defines languages, translation services        │
│  *.pot files      ← generated by commit-pot.yml + extract_*.py     │
│  translations/    ← .po files per service/lang, managed by Actions │
│  input/pagecontent/translation-status.md  ← completeness report   │
│                                                                    │
│  notify_smart_base.yml  → fires repository_dispatch on dak.json Δ  │
└──────────────────────────────┬─────────────────────────────────────┘
                               │ repository_dispatch
                               ▼
┌────────────────────────────────────────────────────────────────────┐
│  smart-base  (WorldHealthOrganization/smart-base)                  │
│                                                                    │
│  .github/workflows/                                                │
│    register_translation_project.yml  ← create service projects     │
│    pull_translations.yml             ← pull .po from all services  │
│    commit-pot.yml                    ← extract .pot, commit        │
│    generate_translation_report.yml   ← completeness report         │
│                                                                    │
│  input/scripts/                                                    │
│    translation_config.py      ← dak.json reader; lang/component    │
│    translation_security.py    ← input sanitisation; secret guard   │
│    register_translation_project.py  ← register one IG (idempotent) │
│    register_all_dak_projects.py     ← bulk discovery + register    │
│    pull_weblate_translations.py     ← Weblate service adapter      │
│    pull_launchpad_translations.py   ← Launchpad service adapter    │
│    pull_crowdin_translations.py     ← Crowdin service adapter      │
│    pull_translations.py             ← orchestrator for all services│
│    translation_report.py            ← generate translation-status  │
│    extract_script_strings.py        ← NEW: extract .pot from py    │
│    extract_translations.py          ← existing diagram/.pot extrac │
│    inject_translations.py           ← existing injector            │
└──────────────────────────────┬─────────────────────────────────────┘
             ┌─────────────────┼──────────────────┐
             ▼                 ▼                  ▼
   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐
   │   Weblate    │   │  Launchpad   │   │   Crowdin    │
   │ hosted.weblate│  │ launchpad.net│   │  crowdin.com │
   │   .org       │   │              │   │              │
   └──────────────┘   └──────────────┘   └──────────────┘

Key design principles:

  1. Python owns the logic. Workflow YAML only wires env vars to Python
    scripts. No business logic, string parsing, or conditionals in shell.
  2. dak.json is authoritative. Language lists, enabled services, and
    project metadata come exclusively from dak.json. Nothing is hardcoded.
  3. Secrets stay in GitHub Actions secrets. API tokens are never passed as
    workflow_dispatch plaintext inputs, never echoed, never interpolated into
    shell commands.
  4. Components are discovered, not declared. The script scans for *.pot
    files at runtime to determine the component list.
  5. One feature branch per service. translations/weblate,
    translations/launchpad, translations/crowdin — PRs opened automatically
    when new translations arrive.
  6. Idempotent everywhere. All registration and pull operations are safe to
    re-run.

4. DAK Configuration — dak.json

4.1 Updated dak.json schema (additions in bold)

{
  "resourceType": "DAK",
  "id": "smart.who.int.{ig-id}",
  "name": "{CamelCaseName}",
  "title": "{Human Readable Title}",
  "version": "{semver}",
  "status": "draft|active|retired",
  "publicationUrl": "https://smart.who.int/{ig-id}",
  "canonicalUrl":   "https://smart.who.int/{ig-id}",
  "previewUrl":     "https://WorldHealthOrganization.github.io/{repo-name}",
  "license":        "CC-BY-SA-3.0-IGO",
  "copyrightYear":  "2023+",
  "publisher": {
    "name": "WHO",
    "url":  "http://who.int"
  },

  "translations": {
    "sourceLanguage": "en",
    "languages": [
      { "code": "ar", "name": "Arabic",            "direction": "rtl",
        "plural": "nplurals=6; plural=(n==0?0:n==1?1:n==2?2:n%100>=3&&n%100<=10?3:n%100>=11&&n%100<=99?4:5);" },
      { "code": "zh", "name": "Chinese (Simplified)", "direction": "ltr",
        "plural": "nplurals=1; plural=0;" },
      { "code": "fr", "name": "French",             "direction": "ltr",
        "plural": "nplurals=2; plural=(n>1);" },
      { "code": "ru", "name": "Russian",            "direction": "ltr",
        "plural": "nplurals=3; plural=(n%10==1&&n%100!=11?0:n%10>=2&&n%10<=4&&(n%100<10||n%100>=20)?1:2);" },
      { "code": "es", "name": "Spanish",            "direction": "ltr",
        "plural": "nplurals=2; plural=(n!=1);" }
    ],
    "services": {
      "weblate": {
        "enabled": true,
        "url":     "https://hosted.weblate.org"
      },
      "launchpad": {
        "enabled": false
      },
      "crowdin": {
        "enabled": false
      }
    }
  }
}

4.2 Field descriptions

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| translations.sourceLanguage | string | Yes | BCP-47 code of source language; always "en" |
| translations.languages | array | Yes | Target languages; replaces all hardcoded language lists |
| translations.languages[].code | string | Yes | BCP-47 / ISO 639-1 code |
| translations.languages[].name | string | Yes | Human-readable name |
| translations.languages[].direction | string | Yes | ltr or rtl |
| translations.languages[].plural | string | Yes | Gettext plural form expression |
| translations.services | object | Yes | Keyed by service name; each has enabled flag and service-specific config |
| translations.services.weblate.url | string | No | Weblate base URL; default https://hosted.weblate.org |

4.3 smart-base instance (dak.json)

The smart-base dak.json MUST be updated to include the full
translations block with all 5 non-English UN official languages and
weblate.enabled = true. This serves as the canonical reference instance.

{
  "resourceType": "DAK",
  "id": "smart.who.int.base",
  "name": "Base",
  "title": "SMART Base",
  "version": "0.2.0",
  "status": "draft",
  "publicationUrl": "https://smart.who.int/base",
  "canonicalUrl":   "https://smart.who.int/base",
  "previewUrl":     "https://WorldHealthOrganization.github.io/smart-base",
  "license": "CC-BY-SA-3.0-IGO",
  "copyrightYear": "2023+",
  "publisher": { "name": "WHO", "url": "http://who.int" },
  "translations": {
    "sourceLanguage": "en",
    "languages": [
      { "code": "ar", "name": "Arabic",              "direction": "rtl",
        "plural": "nplurals=6; plural=(n==0?0:n==1?1:n==2?2:n%100>=3&&n%100<=10?3:n%100>=11&&n%100<=99?4:5);" },
      { "code": "zh", "name": "Chinese (Simplified)", "direction": "ltr",
        "plural": "nplurals=1; plural=0;" },
      { "code": "fr", "name": "French",               "direction": "ltr",
        "plural": "nplurals=2; plural=(n>1);" },
      { "code": "ru", "name": "Russian",              "direction": "ltr",
        "plural": "nplurals=3; plural=(n%10==1&&n%100!=11?0:n%10>=2&&n%10<=4&&(n%100<10||n%100>=20)?1:2);" },
      { "code": "es", "name": "Spanish",              "direction": "ltr",
        "plural": "nplurals=2; plural=(n!=1);" }
    ],
    "services": {
      "weblate": { "enabled": true, "url": "https://hosted.weblate.org" },
      "launchpad": { "enabled": false },
      "crowdin":   { "enabled": false }
    }
  }
}

5. DAK Logical Model Updates

The DAK Logical Model (StructureDefinition/DAK in smart-base) MUST be
extended to formally capture the translations element. The following elements
are to be added to the FSH definition:

DAK.translations                   0..1   BackboneElement  "Translation configuration"
DAK.translations.sourceLanguage    1..1   code             "Source language BCP-47 code"
DAK.translations.languages         0..*   BackboneElement  "Target language entries"
DAK.translations.languages.code    1..1   code             "BCP-47 / ISO 639-1 language code"
DAK.translations.languages.name    1..1   string           "Human-readable language name"
DAK.translations.languages.direction 1..1 code             "Text direction: ltr | rtl"
DAK.translations.languages.plural  0..1   string           "Gettext plural form expression"
DAK.translations.services          0..1   BackboneElement  "Enabled translation services"
DAK.translations.services.weblate  0..1   BackboneElement  "Weblate configuration"
DAK.translations.services.weblate.enabled 1..1 boolean    "Is Weblate enabled?"
DAK.translations.services.weblate.url     0..1 url        "Weblate base URL"
DAK.translations.services.launchpad 0..1  BackboneElement "Launchpad configuration"
DAK.translations.services.launchpad.enabled 1..1 boolean  "Is Launchpad enabled?"
DAK.translations.services.crowdin  0..1   BackboneElement "Crowdin configuration"
DAK.translations.services.crowdin.enabled 1..1 boolean    "Is Crowdin enabled?"

Requirement LM-001: The DAK FSH Logical Model in smart-base MUST include
these elements as normative backbone elements.

Requirement LM-002: A JSON Schema for dak.json MUST be provided alongside
the FSH LM for programmatic validation in scripts.


6. Translation Components — Dynamic Discovery

6.1 Discovery algorithm

All scripts MUST derive the component list at runtime by scanning the repo for
*.pot files rather than using any hardcoded list. The canonical function is
in input/scripts/translation_config.py:

def discover_components(repo_root: Path) -> List[TranslationComponent]:
    """
    Scan repo_root for all *.pot files and derive component definitions.
    Returns components sorted by pot_path for deterministic ordering.
    """

6.2 Component slug derivation

Given a .pot file path relative to repo root, the component slug is derived
as follows:

| .pot path | Component slug |
|-----------|----------------|
| input/fsh/translations/base.pot | fsh-base |
| input/images-source/translations/diagrams.pot | images-source-diagrams |
| input/images/translations/images.pot | images-images |
| input/archimate/translations/models.pot | archimate-models |
| input/diagrams/translations/diagrams.pot | diagrams-diagrams |
| input/scripts/translations/scripts.pot | scripts-scripts (new) |

Algorithm: Take the path segments between input/ and /translations/,
plus the .pot stem, and join them with -. Lowercase the result and replace
each non-alphanumeric character with -.
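The derivation rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in translation_config.py: the function name component_slug is hypothetical, and collapsing runs of non-alphanumeric characters into a single - is an assumption the spec does not state.

```python
import re
from pathlib import PurePosixPath


def component_slug(pot_path: str) -> str:
    """Derive a component slug from a .pot path relative to the repo root."""
    path = PurePosixPath(pot_path)
    parts = path.parts
    if not parts or parts[0] != "input" or "translations" not in parts:
        raise ValueError(f"unexpected .pot location: {pot_path}")
    # Segments strictly between "input" and "translations", plus the file stem.
    between = parts[1:parts.index("translations")]
    raw = "-".join([*between, path.stem])
    # Assumption: runs of non-alphanumeric characters collapse to one "-".
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
```

For the paths in the table, component_slug("input/images-source/translations/diagrams.pot") yields images-source-diagrams.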

6.3 .po file location convention

For a .pot at {dir}/translations/{stem}.pot, .po files live at:

{dir}/translations/{lang_code}.po

Example: input/fsh/translations/ar.po, input/fsh/translations/fr.po, etc.
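A minimal sketch of runtime discovery together with the .po location convention, assuming a simplified TranslationComponent that carries only the template path. The real class in translation_config.py may hold more fields, and the po_path helper is a hypothetical name.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class TranslationComponent:
    """Hypothetical minimal shape; the real class may carry more fields."""
    pot_path: Path  # relative to the repo root

    def po_path(self, lang_code: str) -> Path:
        # .po files sit beside the template: {dir}/translations/{lang_code}.po
        return self.pot_path.parent / f"{lang_code}.po"


def discover_components(repo_root: Path) -> List[TranslationComponent]:
    """Find every *.pot file under repo_root, sorted by relative path."""
    rel_paths = sorted(
        p.relative_to(repo_root) for p in repo_root.rglob("*.pot") if p.is_file()
    )
    return [TranslationComponent(pot_path=p) for p in rel_paths]
```

Sorting by relative path keeps the component order deterministic across runs, which matches the docstring in §6.1.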


7. Script Infrastructure (input/scripts/)

All scripts MUST follow the standards in §14. Below is the complete script inventory.

7.1 translation_config.py — Configuration reader (NEW)

Purpose: Single authoritative module for reading dak.json, discovering
components, and providing config to all other scripts. Eliminates all hardcoded
language or component lists.

Key functions:

def load_dak_config(repo_root: Path) -> DakConfig:
    """Load and validate dak.json. Raises DakConfigError on missing/invalid."""

def get_languages(config: DakConfig) -> List[LanguageEntry]:
    """Return target language list from dak.json#translations.languages."""

def get_enabled_services(config: DakConfig) -> Dict[str, ServiceConfig]:
    """Return dict of enabled translation services and their config."""

def discover_components(repo_root: Path) -> List[TranslationComponent]:
    """Scan repo for *.pot files and return component definitions."""

def get_project_slug(github_org: str, repo_name: str) -> str:
    """Derive Weblate project slug: f'{github_org}-{repo_name}'.lower()"""

Requirement CFG-001: Every script that needs language codes or component
paths MUST import from translation_config.py. No script MAY contain a
hardcoded language list or hardcoded component path.

Requirement CFG-002: load_dak_config() MUST validate required fields and
raise a descriptive DakConfigError (not a generic exception) for missing or
malformed fields.
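As an illustration of CFG-002, the sketch below validates the required fields from §4.2 and reports each problem through DakConfigError. It returns the parsed dict rather than a DakConfig object, which is a simplification made for this example only.

```python
import json
from pathlib import Path
from typing import Any, Dict


class DakConfigError(Exception):
    """Raised when dak.json is missing or its translations block is malformed."""


def load_dak_config(repo_root: Path) -> Dict[str, Any]:
    """Load dak.json and check the required translations fields from section 4.2."""
    path = repo_root / "dak.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        raise DakConfigError(f"{path} not found") from None
    except json.JSONDecodeError as exc:
        raise DakConfigError(f"{path} is not valid JSON: {exc}") from None

    translations = data.get("translations") if isinstance(data, dict) else None
    if not isinstance(translations, dict):
        raise DakConfigError("dak.json: 'translations' must be an object")
    for key in ("sourceLanguage", "languages", "services"):
        if key not in translations:
            raise DakConfigError(f"dak.json: translations.{key} is required")
    if not isinstance(translations["languages"], list):
        raise DakConfigError("dak.json: translations.languages must be an array")
    for i, lang in enumerate(translations["languages"]):
        where = f"dak.json: translations.languages[{i}]"
        if not isinstance(lang, dict):
            raise DakConfigError(f"{where} must be an object")
        for key in ("code", "name", "direction", "plural"):
            if key not in lang:
                raise DakConfigError(f"{where}.{key} is required")
        if lang["direction"] not in ("ltr", "rtl"):
            raise DakConfigError(f"{where}.direction must be ltr or rtl")
    return data
```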


7.2 translation_security.py — Security utilities (NEW)

Purpose: Centralised input sanitisation and secret protection. Imported by
all scripts that handle external inputs (API tokens, slugs, URLs, language
codes).

Key functions:

def sanitize_slug(value: str, field_name: str) -> str:
    """Allow only [a-z0-9_-]. Raise ValueError on invalid input."""

def sanitize_url(value: str, field_name: str, allowed_schemes=("https",)) -> str:
    """Validate URL scheme and structure. Raise ValueError on invalid."""

def sanitize_lang_code(value: str) -> str:
    """Validate BCP-47 language code format. Raise ValueError on invalid."""

def redact_for_log(value: str, visible_chars: int = 4) -> str:
    """Return first N chars + '***' for safe log output of partial values."""

def assert_no_secret_in_env(env_var: str) -> None:
    """Raise RuntimeError if the named env var was passed as a workflow input
    (detectable by checking GITHUB_EVENT_INPUTS_ prefix). Guards against
    accidentally wiring secrets as plaintext inputs."""
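One possible shape for the sanitisers, shown as a sketch rather than the actual module. The language-code pattern is a deliberately simplified approximation of BCP-47, and the error messages omit the rejected value on the assumption that it could be sensitive.

```python
import re
from urllib.parse import urlsplit

_SLUG_RE = re.compile(r"[a-z0-9_-]+")
# Simplified shape check (primary subtag plus optional subtags), not full BCP-47.
_LANG_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def sanitize_slug(value: str, field_name: str) -> str:
    if not _SLUG_RE.fullmatch(value):
        raise ValueError(f"{field_name}: only a-z, 0-9, '-' and '_' are allowed")
    return value


def sanitize_url(value: str, field_name: str, allowed_schemes=("https",)) -> str:
    parts = urlsplit(value)
    if parts.scheme not in allowed_schemes or not parts.hostname:
        raise ValueError(f"{field_name}: expected a URL with scheme in {allowed_schemes}")
    return value


def sanitize_lang_code(value: str) -> str:
    if not _LANG_RE.fullmatch(value):
        raise ValueError("language code does not have a recognised BCP-47 shape")
    return value


def redact_for_log(value: str, visible_chars: int = 4) -> str:
    # Only the first few characters are ever shown; the rest is masked.
    return value[:visible_chars] + "***"
```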

Requirement SEC-001: All values received from environment variables that
originated from workflow_dispatch inputs MUST be sanitised before use.

Requirement SEC-002: API tokens MUST NEVER be logged, echoed, or included
in exception messages. Use redact_for_log() when a partial value must appear
in diagnostics.

Requirement SEC-003: assert_no_secret_in_env() MUST be called at startup
of any script that handles API tokens, to guard against misconfigured workflows
that accidentally pass tokens as inputs.

Requirement SEC-004: All HTTP requests to translation service APIs MUST
set a connection timeout (default 60 s) and a max-response-size guard (10 MiB).
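A small sketch of the response-size guard in SEC-004, assuming the body is read from a file-like object. The names read_capped, MAX_RESPONSE_BYTES, and DEFAULT_TIMEOUT_S are hypothetical.

```python
from typing import BinaryIO

MAX_RESPONSE_BYTES = 10 * 1024 * 1024  # 10 MiB, per SEC-004
DEFAULT_TIMEOUT_S = 60                 # connection timeout, per SEC-004


def read_capped(stream: BinaryIO, limit: int = MAX_RESPONSE_BYTES) -> bytes:
    """Read a response body, refusing anything larger than `limit` bytes."""
    body = stream.read(limit + 1)  # one extra byte reveals an oversized body
    if len(body) > limit:
        raise ValueError(f"response exceeds {limit} bytes")
    return body
```

With the standard library this could be used as read_capped(resp) inside a with urllib.request.urlopen(request, timeout=DEFAULT_TIMEOUT_S) as resp: block, so that both limits apply to every API call.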


7.3 register_translation_project.py — Per-IG project registration (NEW)

Purpose: Idempotently create or verify the project and all dynamically
discovered components for one IG repo, on every enabled translation service.

Usage:

# env: WEBLATE_API_TOKEN, CROWDIN_API_TOKEN, LAUNCHPAD_API_TOKEN (as applicable)
python register_translation_project.py --repo-name smart-hiv [--repo-root /path]

Requirement REG-001: Registration MUST be idempotent.
Requirement REG-002: Missing dak.json → warning + exit 0.
Requirement REG-003: Project slug = {github_org}-{repo_name} (lowercase).
Requirement REG-004: Component list MUST come from discover_components().
Requirement REG-005: Service tokens MUST come from environment variables only.
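Idempotent registration (REG-001) usually comes down to comparing what is wanted with what already exists and creating only the difference. The sketch below shows that pattern with the service calls injected as callables; ensure_components, list_existing, and create are hypothetical names, not part of any service API.

```python
from typing import Callable, Iterable, List


def ensure_components(
    wanted: Iterable[str],
    list_existing: Callable[[], Iterable[str]],
    create: Callable[[str], None],
) -> List[str]:
    """Create only the components that the service does not already have."""
    existing = set(list_existing())
    created = []
    for slug in sorted(set(wanted) - existing):
        create(slug)
        created.append(slug)
    return created
```

Running it a second time with the same inputs creates nothing, which is the property the requirement asks for.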


7.4 register_all_dak_projects.py — Bulk registration (NEW)

Purpose: Discover all repos in the GitHub org with a dak.json, then call
register_translation_project for each.

Usage:

python register_all_dak_projects.py [--dry-run] [--org WorldHealthOrganization]
# env: WEBLATE_API_TOKEN, GITHUB_TOKEN

Discovery uses GitHub Code Search API:
GET /search/code?q=filename:dak.json+org:{org}
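A sketch of how the search request and its result could be handled, assuming the standard Code Search response shape in which each item carries a repository.full_name field. The helper names are hypothetical, and pagination and authentication are left out.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def code_search_url(org: str, page: int = 1, per_page: int = 100) -> str:
    """Build the Code Search URL for files named dak.json within one org."""
    query = f"filename:dak.json org:{org}"
    params = urlencode({"q": query, "per_page": per_page, "page": page})
    return f"{GITHUB_API}/search/code?{params}"


def repos_from_results(payload: Dict[str, Any]) -> List[str]:
    """Return the unique repositories named in a Code Search response."""
    names = {item["repository"]["full_name"] for item in payload.get("items", [])}
    return sorted(names)
```

De-duplicating matters because a repo with more than one file named dak.json appears once per match.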


7.5 pull_translations.py — Multi-service pull orchestrator (NEW)

Purpose: For each enabled service in dak.json, call the appropriate
service adapter script, collect updated .po files, and write them to the repo.
This is the single entry point called by the workflow; it never contains
service-specific logic.

Usage:

python pull_translations.py [--service weblate|launchpad|crowdin|all]
    [--component SLUG] [--language CODE]
# env: per-service tokens as required
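The orchestrator can stay free of service-specific logic by dispatching through a registry of adapters. The following is a minimal sketch under that assumption; the Adapter signature and the pull_all name are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical adapter contract: takes the service's dak.json config,
# returns the list of .po paths it wrote.
Adapter = Callable[[dict], List[str]]


def pull_all(
    services: Dict[str, dict],
    adapters: Dict[str, Adapter],
    only: str = "all",
) -> Dict[str, List[str]]:
    """Call the adapter for each enabled service, optionally restricted to one."""
    results: Dict[str, List[str]] = {}
    for name, cfg in sorted(services.items()):
        if not cfg.get("enabled", False):
            continue
        if only != "all" and name != only:
            continue
        if name not in adapters:
            raise KeyError(f"no adapter registered for service '{name}'")
        results[name] = adapters[name](cfg)
    return results
```

Adding a new service then means registering one more adapter, with no change to the orchestrator itself.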

7.6 pull_weblate_translations.py — Weblate service adapter (EXISTING, refactor)

  • Remove hardcoded language list → use translation_config.get_languages()
  • Remove hardcoded component map → use translation_config.discover_components()
  • Remove hardcoded project slug default → derive from GITHUB_REPOSITORY env var
  • Add security hardening from translation_security

7.7 pull_launchpad_translations.py — Launchpad service adapter (NEW)

Purpose: Fetch .po files from Launchpad Translations API.
Launchpad uses .pot/.po directly and exposes a REST API.

Key config in dak.json:

"launchpad": {
  "enabled": true,
  "project": "smart-hiv"
}

Requirement LP-001: Launchpad API token MUST be read from
LAUNCHPAD_API_TOKEN environment variable.


7.8 pull_crowdin_translations.py — Crowdin service adapter (NEW)

Purpose: Fetch .po files from Crowdin v2 API.

Key config in dak.json:

"crowdin": {
  "enabled": true,
  "projectId": "12345"
}

Requirement CR-001: Crowdin API token MUST be read from
CROWDIN_API_TOKEN environment variable.
Requirement CR-002: The adapter MUST convert Crowdin's native format to
standard .po if Crowdin does not natively export Gettext.


7.9 translation_report.py — Completeness report generator (NEW)

Purpose: Scan all .po files in the repo and generate
input/pagecontent/translation-status.md.

See §13 for full report format requirements.

Usage:

python translation_report.py [--repo-root .] [--output input/pagecontent/translation-status.md]

7.10 extract_script_strings.py — Extract .pot from Python scripts (NEW)

Purpose: Extract all translatable strings from Python scripts in
input/scripts/ and produce a .pot file at
input/scripts/translations/scripts.pot.

Mechanism: Use GNU xgettext (via subprocess) or the
babel.messages.extract API to scan *.py files for _(), gettext(), and
ngettext() call patterns.
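To illustrate what the extraction step looks for, here is a sketch that finds the same call patterns with the standard library ast module. This is a stand-in technique for illustration, not xgettext or Babel; it records only the first literal argument, so ngettext plural forms are ignored.

```python
import ast
from typing import List, Tuple

_FUNCS = {"_", "gettext", "ngettext"}


def extract_msgids(source: str) -> List[Tuple[int, str]]:
    """Return (line number, msgid) for each literal first argument of a gettext-style call."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id not in _FUNCS or not node.args:
            continue
        first = node.args[0]
        # Only string literals are extractable; variables and f-strings are skipped.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            found.append((node.lineno, first.value))
    return sorted(found)
```

The line numbers correspond to the #: file:line comments that a .pot file carries for each entry.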

Requirement SCR-001: All user-facing strings in Python scripts (log
messages visible in GitHub Actions output, report labels, error messages)
MUST be wrapped in _() for extractability.

Requirement SCR-002: extract_script_strings.py MUST be integrated into
the commit-pot.yml pipeline so scripts.pot is committed alongside other
.pot files.

Requirement SCR-003: The scripts-scripts component MUST be auto-registered
in Weblate (and other services) via discover_components() like any other component.

Requirement SCR-004: Translated strings from
input/scripts/translations/{lang}.po MUST be loaded at runtime by scripts
using Python's gettext module, with en as the fallback locale.

# Standard pattern for all scripts (import from translation_config):
from translation_config import setup_gettext
_ = setup_gettext(__file__)   # loads locale from translations/ sibling dir
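A possible sketch of setup_gettext, with several assumptions stated up front. Python's gettext module reads compiled .mo catalogs from {localedir}/{lang}/LC_MESSAGES/{domain}.mo, so this sketch assumes a compile step (for example msgfmt) has produced them from the .po files. The scripts domain and the DAK_SCRIPT_LANG variable are hypothetical names.

```python
import gettext
import os
from pathlib import Path
from typing import Callable


def setup_gettext(script_file: str, domain: str = "scripts") -> Callable[[str], str]:
    """Return a translation function for the translations/ dir beside script_file."""
    localedir = Path(script_file).resolve().parent / "translations"
    lang = os.environ.get("DAK_SCRIPT_LANG", "en")  # hypothetical variable name
    # fallback=True returns the untranslated English msgid when no catalog is found.
    catalog = gettext.translation(
        domain, localedir=str(localedir), languages=[lang], fallback=True
    )
    return catalog.gettext
```

Because of the fallback, a script keeps working in English when no catalog exists for the requested language.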

8. GitHub Actions Workflows

All workflows are in WorldHealthOrganization/smart-base/.github/workflows/.
Shell code in workflows is strictly limited to:

  • Setting environment variables
  • Calling a Python script
  • Checking exit code

No business logic, string manipulation, or conditionals based on string content
MAY appear in workflow YAML run: blocks.

8.1 register_translation_project.yml (NEW)

Purpose: Register one or all DAK IG repos with all enabled translation services.

Triggers:

| Trigger | Mode |
|---------|------|
| workflow_dispatch | mode=single: register one named repo; mode=all: bulk register |
| repository_dispatch, type=dak-ig-registered | Auto-register when downstream IG pushes dak.json |

Inputs (workflow_dispatch):

| Input | Type | Default | Secret? | Description |
|-------|------|---------|---------|-------------|
| mode | choice | single | No | single or all |
| repo_name | string | (blank) | No | Target repo name (mode=single) |
| weblate_url | string | https://hosted.weblate.org | No | Non-secret: base URL only |
| dry_run | boolean | false | No | List repos without registering (mode=all) |

⚠️ WEBLATE_API_TOKEN is NEVER an input. It is always a secret.

Required secrets:

| Secret | Minimum permission | Used by |
|--------|--------------------|---------|
| WEBLATE_API_TOKEN | project-admin | register_translation_project.py |
| GITHUB_TOKEN | read | GitHub Search API (mode=all) |
| CROWDIN_API_TOKEN | project-admin | Crowdin adapter (if enabled) |
| LAUNCHPAD_API_TOKEN | project-admin | Launchpad adapter (if enabled) |

8.2 pull_translations.yml (EXISTING — major refactor)

Purpose: Pull .po files from all enabled translation services, create or
update translations/{service} feature branch, and open a PR if none exists.

Triggers:

| Trigger | Description |
|---------|-------------|
| workflow_dispatch | Manual: choose service, component, language |
| schedule | Nightly at 02:00 UTC (enabled by default; can be disabled per-repo) |

Inputs (workflow_dispatch):

| Input | Type | Default | Secret? | Description |
|-------|------|---------|---------|-------------|
| service | choice | all | No | all, weblate, launchpad, crowdin |
| component | string | (all) | No | Restrict to one component slug |
| language | string | (all) | No | Restrict to one language code |
| weblate_url | string | https://hosted.weblate.org | No | Override Weblate URL |

⚠️ No token inputs. All tokens are secrets.

Feature branch and PR behaviour:

  1. Script downloads updated .po files to a temp area
  2. Workflow checks out (or creates) branch translations/{service}
  3. .po files are committed to that branch
  4. If no open PR from translations/{service} → main exists, one is
    created with gh pr create (via GITHUB_TOKEN)
  5. If a PR already exists, the new commit is pushed to the existing branch;
    the PR is updated with a comment summarising the changes

Requirement PULL-001: Gate on dak.json presence; skip silently if absent.
Requirement PULL-002: Languages and components MUST come from translation_config.
Requirement PULL-003: Project slug MUST be auto-derived: {GITHUB_REPOSITORY_OWNER}-{GITHUB_REPOSITORY##*/}.
Requirement PULL-004: Each service MUST write to its own feature branch
translations/{service-name}.
Requirement PULL-005: A PR MUST be created automatically if no open PR
exists for the feature branch.
Requirement PULL-006: If no .po changes are detected after pull, no
commit, no branch, no PR is created.
Requirement PULL-007: The workflow MUST support a nightly schedule.
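The branch and PR rules above reduce to a small decision that can live in Python rather than in workflow shell. This sketch returns a plan as plain strings for illustration; plan_pull_actions is a hypothetical helper and does not run git or gh itself.

```python
from typing import List


def plan_pull_actions(
    service: str, changed_po_files: List[str], open_pr_exists: bool
) -> List[str]:
    """Describe the steps to take after a pull, following PULL-004 to PULL-006."""
    if not changed_po_files:
        return []  # PULL-006: nothing changed, so no commit, branch, or PR
    branch = f"translations/{service}"  # PULL-004: one branch per service
    actions = [
        f"commit {len(changed_po_files)} file(s) to {branch}",
        f"push {branch}",
    ]
    # PULL-005: open a PR only when none is open for this branch.
    actions.append(
        "comment on existing PR" if open_pr_exists else f"open PR {branch} -> main"
    )
    return actions
```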


8.3 commit-pot.yml (EXISTING — minor additions)

Additions:

  • After existing extraction steps, call extract_script_strings.py to
    produce input/scripts/translations/scripts.pot
  • Commit that file alongside all other .pot files

Requirement POT-001: commit-pot.yml MUST invoke extract_script_strings.py.
Requirement POT-002: Empty .pot files (no translatable strings) MUST still
be committed so service components always have a valid template.


8.4 generate_translation_report.yml (NEW)

Purpose: Generate input/pagecontent/translation-status.md as a pre-publication
step; integrated into the main CI/GH Pages build.

Triggers:

  • Called from the main IG build workflow before gh-pages publish step
  • workflow_dispatch for standalone regeneration

Script called: python input/scripts/translation_report.py


8.5 notify_smart_base.yml — Downstream IG trigger (NEW, in each IG repo)

Purpose: When dak.json is pushed to main in a downstream IG, fire a
repository_dispatch event to smart-base to trigger project registration.

Triggers: push to main, path dak.json

Required secrets:

| Secret | Description |
|--------|-------------|
| SMARTBASE_DISPATCH_TOKEN | GitHub PAT with repo scope on smart-base |

Requirement NOTIFY-001: MUST be included in smart-dak-empty template.


9. Secrets and Credentials

9.1 Complete secrets inventory

| Secret name | Scope | Min permission | Required by |
|-------------|-------|----------------|-------------|
| WEBLATE_API_TOKEN | smart-base only | project-admin | register_translation_project.yml |
| WEBLATE_API_TOKEN | each DAK IG repo | read | pull_translations.yml |
| CROWDIN_API_TOKEN | each DAK IG repo (if used) | project-admin for reg; read for pull | registration + pull |
| LAUNCHPAD_API_TOKEN | each DAK IG repo (if used) | project-admin for reg; read for pull | registration + pull |
| SMARTBASE_DISPATCH_TOKEN | each downstream IG repo | repo scope on smart-base | notify_smart_base.yml |
| GITHUB_TOKEN | auto-provided | varies | all workflows |

Recommendation: WEBLATE_API_TOKEN (read-only), CROWDIN_API_TOKEN
(read-only), and SMARTBASE_DISPATCH_TOKEN SHOULD be stored as
organisation-level secrets in WorldHealthOrganization to eliminate
per-repo setup. The project-admin tokens remain in smart-base only.

9.2 Token acquisition

Weblate:

  1. Log in at hosted.weblate.org
  2. Account → Settings → API access (/accounts/profile/#api)
  3. Generate token; note project-admin vs. read-only scopes

Crowdin:

  1. Log in at crowdin.com
  2. Account Settings → API → Personal Access Tokens
  3. Generate token with appropriate project scope

Launchpad:

  1. Log in at launchpad.net
  2. https://launchpad.net/+apitokens → Create new token

GitHub PAT (SMARTBASE_DISPATCH_TOKEN):

Settings → Developer settings → Personal access tokens → Fine-grained tokens
Repository: WorldHealthOrganization/smart-base
Permissions: Contents (read and write); the repository_dispatch endpoint requires write access to Contents

9.3 Adding a secret

Via GitHub UI:

Repo → Settings → Secrets and variables → Actions → New repository secret

Via CLI:

gh secret set WEBLATE_API_TOKEN --repo WorldHealthOrganization/smart-hiv

Via org-level (org admin required):

gh secret set WEBLATE_API_TOKEN --org WorldHealthOrganization \
  --repos "smart-hiv,smart-immunizations,smart-anc"

9.4 Security prohibitions

The following are STRICTLY PROHIBITED and enforced by translation_security.py:

  • Passing any API token as a workflow_dispatch input
  • Logging any token value (even partial) except via redact_for_log()
  • Storing any token in dak.json, weblate.yaml, or any committed file
  • Interpolating any ${{ secrets.* }} value directly into a shell run: command
    (use env: block → Python arg)
  • Using set -x in any workflow step that handles secrets
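On the Python side, the prohibitions above imply that a token is read from the environment exactly once and never echoed back. A minimal sketch, with require_token and MissingTokenError as hypothetical names:

```python
import os


class MissingTokenError(RuntimeError):
    """Raised when a required token environment variable is unset or empty."""


def require_token(env_var: str) -> str:
    """Read a token from the environment; the error names the variable, never a value."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise MissingTokenError(f"{env_var} is not set; add it as an Actions secret")
    return token
```

In a workflow, the secret would reach this function through an env: block on the step, so its value never appears in a run: command line.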

10. Per-IG Onboarding Checklist

Prerequisites

  • Repo exists under WorldHealthOrganization/
  • dak.json present at repo root with translations block populated
  • At least one .pot file committed (or commit-pot.yml run to generate)

One-time setup steps

Step 1 — Add dak.json with translations config

Include the translations block as shown in §4.1. Set enabled services.

Step 2 — Add secrets to the IG repo

# Required for all repos:
gh secret set WEBLATE_API_TOKEN --repo WorldHealthOrganization/{repo-name}
gh secret set SMARTBASE_DISPATCH_TOKEN --repo WorldHealthOrganization/{repo-name}

# Only if Crowdin enabled in dak.json:
gh secret set CROWDIN_API_TOKEN --repo WorldHealthOrganization/{repo-name}

# Only if Launchpad enabled in dak.json:
gh secret set LAUNCHPAD_API_TOKEN --repo WorldHealthOrganization/{repo-name}

Step 3 — Add notify_smart_base.yml to the repo

Copy from smart-dak-empty. Already present in any repo created from template.

Step 4 — Seed .pot files

Actions → Commit POT Files → Run workflow (on main)

Step 5 — Register in translation services

Either automatic (push dak.json → notify_smart_base.yml fires) or manual:

Actions (in smart-base) → Register Translation Projects
→ mode=single, repo_name={repo-name}

Step 6 — Verify

  • Visit https://hosted.weblate.org/projects/worldhealthorganization-{repo-name}/
  • Confirm all discovered components are present
  • Confirm source strings are visible for each component

Step 7 — Enable nightly pull (optional)

Uncomment the schedule: block in pull_translations.yml in the IG repo, or
rely on org-level orchestration from smart-base.


11. Automated Bulk Registration

For one-time catch-up of all existing DAK IG repos:

smart-base → Actions → Register Translation Projects
→ mode=all, dry_run=true

Review the discovered repo list in the logs, then re-run with dry_run=false.

Discovery mechanism:

GET /search/code?q=filename:dak.json+org:{org}&per_page=100

This finds every repo containing a file named dak.json, regardless of content.
The script then fetches each dak.json and validates the translations block
before attempting registration.


12. Translation Lifecycle

1. AUTHOR pushes content change to main
        │
        ▼
2. commit-pot.yml runs (via ci.yml)
   • IG Publisher extracts FHIR resource strings → .pot
   • extract_translations.py extracts diagram strings → .pot
   • extract_script_strings.py extracts Python script strings → .pot
   • All .pot files committed to main
        │
        ▼
3. Translation services detect .pot changes
   (Weblate: via webhook or polling; others: via API push or scheduled)
   • Translators work in service UI
   • Translations approved / reviewed per service workflow
        │
        ▼
4. pull_translations.yml runs (nightly or workflow_dispatch)
   • For each enabled service:
     - pull_translations.py calls service adapter
     - Service adapter fetches approved .po files
     - .po files written to translations/{service} branch
     - PR created/updated: translations/{service} → main
        │
        ▼
5. IG Admin reviews and merges translation PR
        │
        ▼
6. On merge to main:
   • generate_translation_report.yml runs
   • translation_report.py generates translation-status.md
   • inject_translations.py injects .po into diagram sources
   • Full IG build runs → multilingual pages published

13. Translation Status Report

13.1 Output file

input/pagecontent/translation-status.md

This file is:

  • Auto-generated by translation_report.py; never manually edited
  • Committed to main as part of the pre-publication build step
  • Published as an IG page (Translation Status) in the built site
  • Regenerated on every main branch build (so always current)

13.2 Report structure

# Translation Status

Generated: {ISO 8601 datetime UTC}  
Source language: English (`en`)  
Target languages: Arabic (`ar`), Chinese (`zh`), French (`fr`),
                  Russian (`ru`), Spanish (`es`)

## Summary

| Component | ar | zh | fr | ru | es |
|-----------|----|----|----|----|-----|
| fsh-base | 45% | 72% | 100% | 38% | 89% |
| images-source-diagrams | 0% | 12% | 56% | 0% | 34% |
| scripts-scripts | 10% | 10% | 20% | 5% | 18% |
| **Total** | **32%** | **51%** | **76%** | **24%** | **62%** |

## Component Detail

### fsh-base

Source: `input/fsh/translations/base.pot` | 
[View source strings](../fsh/translations/base.pot)

<details>
<summary>Arabic (ar) — 45% complete (45/100 strings)</summary>

| msgid (English) | ar | Source context |
|-----------------|----|----------------|
| "Patient name" | "اسم المريض" ✅ | [ANCContact.fsh#L12](../fsh/ANCContact.fsh#L12) |
| "Visit date"   | _(untranslated)_ ❌ | [ANCContact.fsh#L18](../fsh/ANCContact.fsh#L18) |
||||

</details>

<details>
<summary>French (fr) — 100% complete (100/100 strings) ✅</summary>
…
</details>

13.3 Report requirements

Requirement RPT-001: The report MUST show % complete per language per
component as a summary table.

Requirement RPT-002: Each component section MUST be expandable (HTML
<details>/<summary>) to show individual strings with their translations.

Requirement RPT-003: Each string MUST include a source context link
pointing to the file and line number where the string originates (from the
.pot file's #: file:line comment).
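
As a rough illustration of RPT-003, the sketch below collects the #: file:line references attached to each msgid in a .pot file. It is deliberately minimal and makes several assumptions: it handles only single-line msgids, does not decode escape sequences, and the function names and the link layout produced by markdown_link() are hypothetical. A real implementation would more likely use a full PO parser.

```python
import re

# A reference token looks like "path/to/file.fsh:12".
REFERENCE_RE = re.compile(r"^(?P<path>.+):(?P<line>\d+)$")


def parse_pot_references(pot_text: str) -> list[tuple[str, list[tuple[str, int]]]]:
    """Return (msgid, [(file, line), ...]) pairs for single-line msgids."""
    entries = []
    refs: list[tuple[str, int]] = []
    for raw in pot_text.splitlines():
        line = raw.strip()
        if line.startswith("#:"):
            # One comment line may carry several space-separated references.
            for token in line[2:].split():
                match = REFERENCE_RE.match(token)
                if match:
                    refs.append((match["path"], int(match["line"])))
        elif line.startswith("msgid "):
            text = line[len("msgid "):].strip()
            if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
                text = text[1:-1]
            if text:  # the header entry has an empty msgid and is skipped
                entries.append((text, refs))
            refs = []
    return entries


def markdown_link(path: str, line: int) -> str:
    """Render a source-context link (label and target layout are assumptions)."""
    name = path.rsplit("/", 1)[-1]
    return f"[{name}#L{line}]({path}#L{line})"
```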

Requirement RPT-004: The report MUST indicate untranslated strings (❌) and
translated strings (✅) visually.

Requirement RPT-005: The report MUST be generated in the pre-publication
build step, not in a separate manual step.

Requirement RPT-006: The report MUST use languages from dak.json, not
any hardcoded list.

Requirement RPT-007: An overall completeness percentage per language MUST
appear in the summary table footer row.
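
RPT-001, RPT-006 and RPT-007 together describe a summary table whose columns come from configuration and whose footer is an overall figure per language. One possible way to render it is sketched below. The input shape ({component: {language: (translated, total)}}), the function names, and the choice to weight the footer by string count and round down are assumptions of this sketch, not decisions stated in this spec.

```python
def percent(translated: int, total: int) -> int:
    """Whole-number percentage, rounded down; 0 when there are no strings."""
    return (100 * translated) // total if total else 0


def render_summary(counts: dict, languages: list[str]) -> str:
    """Render the summary table with a string-weighted Total footer row.

    `languages` is expected to come from dak.json rather than a literal list.
    """
    rows = [
        "| Component | " + " | ".join(languages) + " |",
        "|" + "---|" * (len(languages) + 1),
    ]
    totals = {lang: [0, 0] for lang in languages}
    for component in sorted(counts):
        cells = []
        for lang in languages:
            done, total = counts[component].get(lang, (0, 0))
            totals[lang][0] += done
            totals[lang][1] += total
            cells.append(f"{percent(done, total)}%")
        rows.append(f"| {component} | " + " | ".join(cells) + " |")
    footer = [f"**{percent(*totals[lang])}%**" for lang in languages]
    rows.append("| **Total** | " + " | ".join(footer) + " |")
    return "\n".join(rows)
```

Weighting by string count means a large component influences the footer more than a small one, so the footer is generally not the plain average of the column above it.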


14. Python Script Standards

All scripts in input/scripts/ MUST comply with the following standards.

14.1 Business logic in Python

Requirement PY-001: All business logic MUST reside in Python scripts.
Workflow YAML run: blocks MUST contain only script invocations of the form:

run: |
  python input/scripts/some_script.py --arg "$ENV_VAR_FROM_ENV_BLOCK"

String manipulation, conditionals on content, and API calls MUST NOT appear
in shell.

14.2 Input handling and sanitisation

Requirement PY-002: All values received from environment variables MUST be
read and sanitised through translation_security.py before use.

Requirement PY-003: API tokens MUST be read from environment variables
(use os.environ.get() only); they MUST NOT be accepted as function
arguments, constructor parameters, or CLI positional arguments.

Requirement PY-004: Environment variable → CLI arg wiring in workflows
MUST use the env: block pattern:

env:
  INPUT_FOO: ${{ inputs.foo }}   # non-secret input
  SECRET_TOKEN: ${{ secrets.WEBLATE_API_TOKEN }}  # secret
run: python script.py --foo "$INPUT_FOO"
# Script reads SECRET_TOKEN directly from os.environ
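
On the Python side, the pattern above might look like the following sketch: the non-secret input arrives as a CLI option, while the token is read only from the environment. The names read_token, MissingSecretError and parse_args are hypothetical, and the sketch does not show the sanitisation through translation_security.py that PY-002 requires, because that module's API is not described in this section.

```python
import argparse
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def read_token(env_name: str, environ=None) -> str:
    """Read a token from the environment; never from CLI arguments."""
    environ = os.environ if environ is None else environ
    token = environ.get(env_name, "").strip()
    if not token:
        # The message names the variable but never echoes a value.
        raise MissingSecretError(f"environment variable {env_name} is not set")
    return token


def parse_args(argv=None) -> argparse.Namespace:
    """Parse non-secret inputs only; there is intentionally no --token option."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline script")
    parser.add_argument("--foo", required=True, help="non-secret workflow input")
    return parser.parse_args(argv)
```

The optional environ parameter exists only so the function can be exercised in tests without mutating the real process environment.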

14.3 Logging and observability

Requirement PY-005: Scripts MUST use Python's logging module with
structured log levels (DEBUG / INFO / WARNING / ERROR). print() MUST NOT
be used for operational output.

Requirement PY-006: Log output MUST NOT contain any secret value.
Token values in logs MUST be passed through redact_for_log().

Requirement PY-007: Scripts MUST emit a summary at completion: how many
items processed, how many succeeded, how many failed.
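
A minimal sketch of PY-005 to PY-007 follows. The real redact_for_log() lives in the project's own security helpers and its signature is not given in this section, so the version here is a hypothetical stand-in that masks known secret values inside a message. The logger name and log_summary() are likewise assumptions.

```python
import logging

logger = logging.getLogger("translation_pipeline")  # hypothetical logger name


def redact_for_log(text: str, secrets: list[str]) -> str:
    """Stand-in redactor: replace every known secret value with '***'."""
    for secret in secrets:
        if secret:  # an empty string would otherwise match everywhere
            text = text.replace(secret, "***")
    return text


def log_summary(processed: int, succeeded: int, failed: int) -> str:
    """Emit the completion summary; ERROR level if anything failed."""
    message = (
        f"Summary: processed={processed} "
        f"succeeded={succeeded} failed={failed}"
    )
    logger.log(logging.ERROR if failed else logging.INFO, message)
    return message
```

Returning the message from log_summary() is a convenience for testing; callers in a pipeline script would normally ignore the return value.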

14.4 Error handling

Requirement PY-008: Scripts MUST exit with status 0 on success, 1 on
partial or full failure, and 2 on bad arguments.
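
The exit-code convention can be sketched as below. It relies on the fact that argparse already terminates with status 2 when arguments are invalid, so only the 0 and 1 cases need explicit handling. The --items option and the "starts with bad" failure rule are placeholders invented for this illustration.

```python
import argparse

EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_BAD_ARGS = 2  # argparse exits with this status on invalid arguments


def exit_code_for(failed: int) -> int:
    """Any failure, partial or full, maps to exit status 1."""
    return EXIT_FAILURE if failed else EXIT_OK


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Hypothetical pipeline script")
    parser.add_argument("--items", nargs="+", required=True)
    args = parser.parse_args(argv)  # raises SystemExit(2) on bad arguments
    # Placeholder work: treat items starting with "bad" as failures.
    failed = sum(1 for item in args.items if item.startswith("bad"))
    return exit_code_for(failed)
```

A script would typically end with sys.exit(main()) under an if __name__ == "__main__": guard so that the returned value becomes the process exit status.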

Requirement PY-009: Scripts MUST handle API rate limits gracefully by
reading Retry-After / X-RateLimit-Reset headers and sleeping accordingly,
up to a configurable max wait.

Requirement PY-010: Network requests MUST use a requests Session and MUST
pass timeout=(connect_timeout, read_timeout) explicitly on every call,
since a requests Session has no session-wide timeout setting.

14.5 Documentation

Requirement PY-011: Every script MUST have a module docstring describing:

  • Purpose
  • Usage (CLI invocation)
  • Environment variables consumed
  • Exit codes

Requirement PY-012: A companion file input/scripts/TRANSLATION-ADMIN.md
MUST document:

  • All scripts in the translation pipeline and their relationships
  • Complete list of environment variables and secrets consumed
  • Step-by-step runbooks for common operations
  • Troubleshooting guide for common failures

15. Requirements Summary

Configuration

ID Requirement
CFG-001 All scripts MUST read language list from dak.json#translations.languages
CFG-002 All scripts MUST discover components by scanning for *.pot files
CFG-003 Project slug MUST be {github_org}-{repo_name} (lowercase)
CFG-004 load_dak_config() MUST validate and raise DakConfigError on bad config
CFG-005 dak.json schema MUST be updated and a JSON Schema provided for validation

Logical Model

ID Requirement
LM-001 DAK FSH Logical Model MUST include translations backbone element
LM-002 A JSON Schema for dak.json MUST be provided
LM-003 smart-base dak.json MUST be updated with the 6 UN language entries

Security

ID Requirement
SEC-001 All workflow inputs MUST be sanitised before use in Python
SEC-002 Tokens MUST NEVER appear in logs; use redact_for_log()
SEC-003 assert_no_secret_in_env() MUST be called at script startup
SEC-004 All HTTP requests MUST have timeouts and response size guards
SEC-005 Tokens MUST be secrets, never plaintext workflow_dispatch inputs
SEC-006 set -x MUST NOT appear in any workflow step handling secrets

Registration

ID Requirement
REG-001 Registration MUST be idempotent
REG-002 Missing dak.json → warning + exit 0
REG-003 Components MUST come from discover_components()
REG-004 Tokens MUST be environment secrets only
REG-005 Bulk registration MUST support --dry-run
REG-006 repository_dispatch trigger MUST be supported

Pull / Feature Branch

ID Requirement
PULL-001 Gate on dak.json existence
PULL-002 Languages from translation_config only
PULL-003 Project slug auto-derived from GITHUB_REPOSITORY
PULL-004 Each service uses its own translations/{service} branch
PULL-005 PR auto-created if none exists; updated if one exists
PULL-006 No changes = no commit, no PR
PULL-007 Nightly schedule supported
PULL-008 Multiple services supported in single workflow run

Report

ID Requirement
RPT-001 Summary table: % complete per language per component
RPT-002 Expandable per-component string detail
RPT-003 Source context links per string
RPT-004 ✅/❌ visual indicators
RPT-005 Generated in pre-publication build step
RPT-006 Languages from dak.json
RPT-007 Overall % per language in summary footer

Python Standards

ID Requirement
PY-001 Business logic in Python; shell is wiring only
PY-002 Inputs sanitised via translation_security.py
PY-003 Tokens from os.environ only
PY-004 env: block for secret/input wiring
PY-005 Use logging module; no print() for operational output
PY-006 No secrets in log output
PY-007 Summary log on completion
PY-008 Exit codes 0/1/2
PY-009 Rate-limit handling with Retry-After
PY-010 Explicit request timeouts
PY-011 Module docstrings with usage, env vars, exit codes
PY-012 TRANSLATION-ADMIN.md for admin runbooks

Script Component

ID Requirement
SCR-001 User-facing strings in Python scripts MUST use _()
SCR-002 extract_script_strings.py MUST be in commit-pot.yml pipeline
SCR-003 scripts-scripts component MUST be auto-registered via discovery
SCR-004 Translated strings MUST be loaded at runtime via gettext

Template (smart-dak-empty)

ID Requirement
TMPL-001 MUST include notify_smart_base.yml
TMPL-002 MUST include pull_translations.yml (or workflow_call)
TMPL-003 MUST include skeleton dak.json with translations block
TMPL-004 MUST include weblate.yaml (informational)
TMPL-005 MUST include placeholder .pot directories

16. Open Questions

  1. PR vs. direct commit: The current spec uses PRs for translation pull-back
    (one PR per service). Should a PR auto-merge once completeness exceeds a
    threshold (e.g. 80%), or should it always require a human merge?

  2. Organisation-level secrets: Should WEBLATE_API_TOKEN (read),
    CROWDIN_API_TOKEN (read), and SMARTBASE_DISPATCH_TOKEN be org-level
    secrets? Requires org admin approval.

  3. Launchpad authentication: Launchpad uses OAuth 1.0. The token handling
    differs from Weblate/Crowdin. Does WHO have an existing Launchpad account
    and OAuth consumer registered?

  4. Crowdin format: Crowdin natively supports .po. Does the WHO Crowdin
    plan include API access, or only web UI?

  5. Narrative markdown as a translation component: Should
    input/pagecontent/*.md be extracted to a .pot and translated? This
    would add a 6th component (pagecontent-narratives). High value but more
    complex injection.

  6. Weblate self-hosting: Is hosted.weblate.org permanent or will WHO
    self-host? Token management and URL config would change if WHO self-hosts.

  7. Machine translation pre-fill: weblate.yaml has LibreTranslate. Should
    Crowdin/Launchpad also have MT pre-fill configured? Which MT services are
    WHO-approved?

  8. zh variant: Should zh be zh_Hans (Simplified) or zh_Hant
    (Traditional)? Currently zh is assumed Simplified. WHO house style?


17. References

Resource URL
Weblate REST API https://docs.weblate.org/en/latest/api.html
Weblate hosted https://hosted.weblate.org
Crowdin API v2 https://developer.crowdin.com/api/v2/
Launchpad Translations API https://launchpad.net/+apidoc/1.0.html#translation_import_queue
smart-base https://github.com/WorldHealthOrganization/smart-base
smart-dak-empty https://github.com/WorldHealthOrganization/smart-dak-empty
pull_translations.yml https://github.com/WorldHealthOrganization/smart-base/blob/main/.github/workflows/pull_translations.yml
commit-pot.yml https://github.com/WorldHealthOrganization/smart-base/blob/main/.github/workflows/commit-pot.yml
Weblate API token https://hosted.weblate.org/accounts/profile/#api
GitHub encrypted secrets https://docs.github.com/en/actions/security-guides/encrypted-secrets
GitHub org-level secrets https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-secrets-for-an-organization
Python gettext https://docs.python.org/3/library/gettext.html
Babel message extraction https://babel.pocoo.org/en/latest/api/messages/extract.html
GNU gettext plural forms https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
BCP-47 language tags https://www.rfc-editor.org/rfc/rfc5646

DRAFT v0.2 — for review and iteration. Edit this file and open a PR with corrections,
decisions on open questions, or additional requirements. Do not implement
without sign-off on §16.
