The Chromium Projects

i18n for extensions

Introduction

Developers tend to hard-code messages, either in their native language or, more often, in English. We need to provide a message replacement system that is simple enough to discourage that behavior.

There are two approaches we can use: the widely known Firefox ENTITY replacement, and the one Google Gadgets use.

Google Gadgets approach

Google Gadgets are based on an XML spec, which carries some metadata and the HTML/JS of a gadget.

Each spec lists its supported locales and links to the locale files. Any item in the gadget spec can be replaced (URLs, messages...). Substitution is done in the container code (iGoogle, Orkut...), where we control the whole process (the fallback order, for example).

Message catalogs are in XML format (to better support our translation pipeline).

The public API, a sample gadget spec, and message bundles can be found at http://code.google.com/intl/sr-RS/apis/gadgets/docs/i18n.html.

Firefox approach

Firefox uses XUL files (an XML format) for its extensions. The XML parser automatically replaces DTD ENTITYs within an XML document, given the DTD file(s). For details, see how to localize Firefox extensions.

The problem with this approach is that we don't actually have XML/XHTML files tied to an extension. Also, we may want to implement a more flexible fallback algorithm.

Proposed solution

We use HTML/JS to develop extensions and to keep metadata about them (the manifest), whereas Firefox uses XUL files.

We should use a modified Google Gadgets approach, since gadgets are also HTML/JS entities.

See details below.

Locale fallback

Only some locales will have all of the messages translated, or resources generated. Some locales may be completely missing. In both cases Chrome should gracefully fall back to what's available.

To do that, we need to order locales in a tree-like structure based on their locale identifiers.
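The tree ordering can be read as "strip the most specific part of the identifier, then try again". The sketch below is only an illustration of that idea: `fallbackChain` is a hypothetical helper (not an existing Chrome API), it assumes identifiers are separated by "_" as in this document, and it assumes the extension's default locale is tried last.

```javascript
// Hypothetical helper: returns the locales to try, most specific first.
// Assumption: identifiers use "_" separators (en_US, es_419, ...).
function fallbackChain(locale, defaultLocale) {
  const chain = [];
  const parts = locale.split("_");
  // Walk up the tree: "en_US" -> "en".
  while (parts.length > 0) {
    chain.push(parts.join("_"));
    parts.pop();
  }
  // Assumption: the default locale is the last resort, added only once.
  if (defaultLocale && !chain.includes(defaultLocale)) {
    chain.push(defaultLocale);
  }
  return chain;
}

console.log(fallbackChain("es_419", "en")); // [ 'es_419', 'es', 'en' ]
```

A flat array is enough here because each locale has a single parent; a real implementation might precompute the whole tree once.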

Supported locales

We support a larger set of locales than the Chrome UI. The current list (as of 35300) is: am, ar, bg, bn, ca, cs, da, de, el, en, en_GB, en_US, es, es_419, et, fi, fil, fr, gu, he, hi, hr, hu, id, it, ja, kn, ko, lt, lv, ml, mr, nb, nl, or, pl, pt, pt_BR, pt_PT, ro, ru, sk, sl, sr, sv, sw, ta, te, th, tr, uk, vi, zh, zh_CN, zh_TW

Replacement policy

To avoid hard-coding strings, developers should use message placeholders in code and static files.

Message concatenation is usually a bad idea and should be avoided, but it is possible with MSG_msg_1 + MSG_msg_2.

Message container

Message placeholders and message bodies have a simple key-value structure, which can be implemented as:

Proposed JSON format:

{
  "name": {
    "message": "message text - short sentence or even a paragraph with optional placeholder(s)",
    "description": "Description of a message that should give context to a translator",
    "placeholders": {
      "ph_1": {
        "content": "Actual string that's placed within a message.",
        "example": "Example shown to a translator."
      },
      ...
    },
    ...
  }
}

Example:

{
  "hello": {
    "message": "Hello $YOUR_NAME$",
    "description": "Peer greeting",
    "placeholders": {
      "your_name": {
        "content": "$1",
        "example": "Cira"
      }
    }
  },
  "bye": {
    "message": "Bye from $CHROME$ to $YOUR_NAME$",
    "description": "Going away greeting",
    "placeholders": {
      "chrome": {
        "content": "Chrome"
      },
      "your_name": {
        "content": "$1",
        "example": "Cira"
      }
    }
  }
}
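To make the placeholder mechanics concrete, here is a minimal sketch of how a message entry in this format could be expanded. `formatMessage` is a hypothetical helper, not an existing API. It assumes that $NAME$ in the message is matched case-insensitively against the placeholder keys (the example pairs $YOUR_NAME$ with "your_name"), and that "$1", "$2", ... in a placeholder's content refer to positional arguments supplied by the caller.

```javascript
// Hypothetical helper: expands one catalog entry into its final text.
function formatMessage(entry, args = []) {
  // Index placeholder contents by lower-cased name (assumed matching rule).
  const contents = {};
  for (const [name, ph] of Object.entries(entry.placeholders || {})) {
    contents[name.toLowerCase()] = ph.content;
  }
  // Step 1: replace each $NAME$ with the placeholder's content.
  return entry.message.replace(/\$(\w+)\$/g, (match, name) => {
    const content = contents[name.toLowerCase()];
    if (content === undefined) return match; // unknown placeholder: keep as-is
    // Step 2: replace $1, $2, ... inside the content with caller arguments.
    return content.replace(/\$(\d+)/g, (ref, index) => {
      const value = args[Number(index) - 1];
      return value === undefined ? ref : String(value);
    });
  });
}

const bye = {
  message: "Bye from $CHROME$ to $YOUR_NAME$",
  placeholders: {
    chrome: { content: "Chrome" },
    your_name: { content: "$1", example: "Cira" }
  }
};
console.log(formatMessage(bye, ["Cira"])); // Bye from Chrome to Cira
```

Unknown placeholders and missing arguments are left untouched in this sketch so that problems stay visible in the output; reporting them as errors at packing time would be an equally reasonable choice.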

Message format

There are a couple of possible forms a message can take:

Conflict resolution

The same message ID should exist only once per catalog. If duplicates are detected when packing the extension, we should ask the developer to remove them.

Plural form

Dealing with plural forms is hard. Each language has different rules and special cases. To avoid that complexity, we are going to use a plural-neutral form.

Instead of saying "11 files were moved" we could say "Files moved: 11".

This is a valid solution in most cases.

Chrome API

Chrome will automatically replace all message placeholders when loading static files (HTML, JS, manifest...), based on the current browser UI language.

Scripts may want to use messages from different locales, or to fetch resources and replace message placeholders in them dynamically.

For that we may need:

Structure on disk

There would be a _locales subdirectory under the main extension directory.

It would contain N subdirectories named after locale identifiers (sr, en_US, en, en_GB, ...).

top_extension_directory/_locales/locale_identifier/

Each locale_identifier subdirectory can contain only one messages.json file.

The extension manifest has an optional "default_locale": "language_country" field that points to the default language. Some edge cases:

The default locale is used as the final fallback if a message can't be found for the current locale.
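The layout and the default-locale rule can be sketched together as a lookup that walks an ordered list of locales. This is only an illustration: `catalogPath` and `findMessage` are hypothetical names, the catalogs are assumed to be already parsed into memory, and the sample strings are made up.

```javascript
// Hypothetical helper: where a locale's catalog lives on disk.
function catalogPath(extensionRoot, locale) {
  return `${extensionRoot}/_locales/${locale}/messages.json`;
}

// Hypothetical helper: returns the first catalog entry found for `name`.
// `chain` lists locales from most to least specific, ending with the
// extension's default locale.
function findMessage(catalogs, chain, name) {
  for (const locale of chain) {
    const catalog = catalogs[locale];
    if (catalog && Object.prototype.hasOwnProperty.call(catalog, name)) {
      return { locale, entry: catalog[name] };
    }
  }
  return null; // not translated anywhere, not even in the default locale
}

// Sample data: "sr" has no "bye" message, so the default locale "en" is used.
const catalogs = {
  sr: { hello: { message: "Zdravo" } },
  en: { hello: { message: "Hello" }, bye: { message: "Bye" } }
};
console.log(catalogPath("my_extension", "sr")); // my_extension/_locales/sr/messages.json
console.log(findMessage(catalogs, ["sr", "en"], "bye").locale); // en
```

Returning the locale along with the entry lets the caller tell a translated message from a fallback one, which could be useful for diagnostics.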

Use cases

Manifest

The manifest file contains metadata about the extension in JSON format.

When loading the manifest file, Chrome should replace all MSG_msg_name identifiers with messages from the catalog and then process the final object.
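A minimal sketch of that load step is shown below. `localizeManifest` is a hypothetical name, and the sketch assumes that MSG_ identifiers only appear inside JSON string values and that `messages` is the already-selected catalog for the current locale.

```javascript
// Hypothetical helper: substitutes MSG_name identifiers in the raw manifest
// text, then parses the result into the final manifest object.
function localizeManifest(manifestText, messages) {
  const replaced = manifestText.replace(/MSG_(\w+)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(messages, name)) {
      return match; // no such message: leave the identifier visible
    }
    // Escape the text so quotes or backslashes cannot break the JSON.
    // JSON.stringify adds surrounding quotes, which slice() removes.
    return JSON.stringify(messages[name].message).slice(1, -1);
  });
  return JSON.parse(replaced);
}

const manifest = localizeManifest(
  '{ "name": "MSG_ext_name", "version": "1.0" }',
  { ext_name: { message: 'My "localized" extension' } }
);
console.log(manifest.name); // My "localized" extension
```

Replacing in the text before parsing mirrors the order described above; the escaping step matters because a translated message may itself contain quotes.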

HTML in general

This covers the new tab page and possibly some other static content. We currently use the google2 template system (?), which is overkill for a couple of pages.

We could deliver message catalogs for each locale as part of the installation package, and use message placeholders in the new tab source.

External resources

All absolute URLs (like href, src...) should point to MSG_some_url, and each locale could provide a separate implementation (image, script...).

When loading extension files (HTML, JS), Chrome would replace every MSG_some_url with the actual, locale-specific URL.

Local Resources

Local resources, like <img src="foo/bar.png">, should be auto-resolved to _locales/current_locale/foo/bar.png or, if that resource is missing, to a fallback location.
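One possible reading of this rule is sketched below. `resolveLocalResource` is a hypothetical helper, and several details are assumptions rather than decisions made in this document: localized copies live under the _locales directory described in "Structure on disk", the fallback location is found by trying each locale in the fallback order, and the original unlocalized path is the last resort. The `exists` callback stands in for whatever file check Chrome would really perform.

```javascript
// Hypothetical helper: picks the most specific localized copy of a resource.
// `chain` lists locales to try, most specific first.
// `exists` is a caller-supplied check for whether a path is in the package.
function resolveLocalResource(path, chain, exists) {
  for (const locale of chain) {
    const candidate = `_locales/${locale}/${path}`;
    if (exists(candidate)) return candidate;
  }
  return path; // assumption: fall back to the unlocalized resource
}

// Sample package contents (made up for the illustration).
const files = new Set(["_locales/en/foo/bar.png", "foo/bar.png"]);
const resolved = resolveLocalResource(
  "foo/bar.png",
  ["en_US", "en"],
  (candidate) => files.has(candidate)
);
console.log(resolved); // _locales/en/foo/bar.png
```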