PHP Hello l10n
This is a short tutorial on how to internationalize and localize PHP code. For simplicity, let’s use our beloved “Hello world”. Here’s the original script:
<?php echo "Hello, world!"; ?>
Its output is pretty straightforward:
Hello, world!
Your aim is to internationalize this little script so that your visitors or clients can enjoy your website in their native language. PHP offers three main ways to do so:
- PHP arrays
- PHP define() statements
- Gettext
In this first approach, you maintain one associative array per language that maps keys (or source strings) to localized strings, e.g. in locale/en.php. To display a string, index.php selects the array for the current language and looks up the localized text by its key:
<?php
// locale/en.php
$LANG = array(
    "hello_world" => "Hello, world!",
);
?>
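<?php
$locale = 'en';
// Only accept locales we ship (blocks arbitrary includes)
if (isset($_GET['lang']) && in_array($_GET['lang'], array('en', 'hi'), true)) $locale = $_GET['lang'];
include('locale/' . $locale . '.php');
echo $LANG['hello_world'];
?>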
You can now set the locale by assigning a language code to the ‘lang’ GET parameter when visiting your website e.g. http://l10n.hello.world.org/?lang=en.
As you may have noticed, the default locale is ‘en’, so you don’t need to set the ‘lang’ parameter explicitly to get the English version. It would be great, though, if you could support a Hindi version too, wouldn’t it? All it takes is a locale/hi.php file:
<?php
// locale/hi.php
$LANG = array(
    "hello_world" => "नमस्ते, दुनिया!",
);
?>
Guess what the output of http://l10n.hello.world.org/?lang=hi will be:
नमस्ते, दुनिया!
Internationalizing your website using the define() method is pretty straightforward, too. You just need to edit the locale files (locale/en.php and locale/hi.php) and index.php as shown below:
<?php
// locale/en.php
define("hello_world", "Hello, world!");
?>
<?php
// locale/hi.php
define("hello_world", "नमस्ते, दुनिया!");
?>
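<?php
$locale = 'en';
// Only accept locales we ship (blocks arbitrary includes)
if (isset($_GET['lang']) && in_array($_GET['lang'], array('en', 'hi'), true)) $locale = $_GET['lang'];
include('locale/' . $locale . '.php');
echo hello_world;
?>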
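One gotcha with the define() approach: constants cannot be redefined, and since PHP 8.0 reading an undefined constant throws an Error instead of falling back to the constant's name. So a string missing from one locale file breaks the page. A minimal sketch of a lookup helper that uses defined() and constant() to fall back to the key instead (the name t() is hypothetical, not part of PHP):

```php
<?php
// Hypothetical helper: look up a translation constant by name and fall back
// to the key itself when the current locale file does not define it.
function t(string $key): string
{
    return defined($key) ? (string) constant($key) : $key;
}

// Usage in index.php, after including the locale file:
// echo t('hello_world');
```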
Internationalization with PHP arrays and define statements is simple and straightforward, yet both methods share a major downside: as your website grows, the locale files get harder and harder to update. There is no built-in way to tell which strings were added or whether every string is present in all the language files.
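With arrays, spotting such gaps means writing your own checks. For instance, a minimal sketch using array_diff_key() to list the keys a translation is missing compared to the English array (missing_keys() is a hypothetical helper, and the "goodbye" entry is a made-up extra string for illustration):

```php
<?php
// Hypothetical helper: list the keys present in the reference (e.g. English)
// array but missing from a translated array.
function missing_keys(array $reference, array $translation): array
{
    // array_diff_key() keeps entries of $reference whose keys are absent in $translation
    return array_keys(array_diff_key($reference, $translation));
}

$en = array("hello_world" => "Hello, world!", "goodbye" => "Goodbye!");
$hi = array("hello_world" => "नमस्ते, दुनिया!");

echo implode(', ', missing_keys($en, $hi)), PHP_EOL; // goodbye
```

Note that this only catches missing keys; keys that were removed from the English file but linger in a translation would need the reverse comparison.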
Gettext is one of the most popular internationalization and localization systems. It works as nicely with PHP as it does with many other programming languages such as C, C++ and Python. With gettext, keeping the locale files in sync with changes in the code base is extremely easy.
Let’s internationalize your website once more, using gettext this time.
First, edit index.php as shown below and mark the strings to be localized by wrapping them in _(), an alias of gettext():
<?php
$locale = 'en';
// Only accept locales we ship
if (isset($_GET['lang']) && in_array($_GET['lang'], array('en', 'hi'), true)) $locale = $_GET['lang'];
putenv("LANGUAGE=" . $locale);
setlocale(LC_ALL, $locale);

$domain = 'messages';
bindtextdomain($domain, "./locale");
textdomain($domain);
// Mark up text for localization
echo _('Hello, world!');
?>
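Note that _() and the other gettext functions only exist when PHP's gettext extension is enabled; without it, the script above stops with a fatal error. A minimal sketch of a graceful fallback, assuming it is acceptable to show the untranslated source strings in that case:

```php
<?php
$domain = 'messages';

if (extension_loaded('gettext')) {
    // Normal path: point gettext at ./locale and select the domain.
    bindtextdomain($domain, './locale');
    textdomain($domain);
} else {
    // Fallback: a stand-in _() that returns the source string unchanged.
    function _(string $message): string
    {
        return $message;
    }
}

echo _('Hello, world!'), PHP_EOL;
```

The locale selection and setlocale() call from index.php are left out here to keep the sketch focused on the fallback.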
Gettext expects a locale directory that holds the translated strings, with one subdirectory per language:
locale/
    en/
        LC_MESSAGES/
            messages.po
            messages.mo
    hi/
        LC_MESSAGES/
            messages.po
            messages.mo
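With bindtextdomain($domain, "./locale"), gettext looks for the compiled catalog at ./locale/<locale>/LC_MESSAGES/<domain>.mo, and when that file is missing it silently returns the original string. A small sketch that builds that path and reports which locales still lack a compiled file, which can help when a translation does not show up (mo_path() and locales_missing_mo() are hypothetical helpers):

```php
<?php
// Hypothetical helper: the path where gettext expects the compiled catalog
// for one locale, following the <dir>/<locale>/LC_MESSAGES/<domain>.mo layout.
function mo_path(string $dir, string $locale, string $domain = 'messages'): string
{
    return $dir . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
}

// Hypothetical helper: return the locales whose .mo file does not exist yet.
function locales_missing_mo(string $dir, array $locales, string $domain = 'messages'): array
{
    return array_values(array_filter(
        $locales,
        fn (string $locale): bool => !is_file(mo_path($dir, $locale, $domain))
    ));
}

// Usage, e.g. before deploying:
// print_r(locales_missing_mo('./locale', ['en', 'hi']));
```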
You can extract the marked-up strings from the code as follows:
$ xgettext -n *.php -o messages.pot
This generates a POT (template) file named messages.pot:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-05-06 23:32+0530\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: index.php:12
msgid "Hello, world!"
msgstr ""
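Each entry pairs a msgid (the source string) with a msgstr (its translation), and the msgstr stays empty until someone translates it. GNU gettext's msgattrib --untranslated lists such entries for you; purely to illustrate the PO structure, here is a minimal sketch of a simplified version of that check in PHP (untranslated_msgids() is a hypothetical helper that ignores plural forms and msgctxt):

```php
<?php
// Hypothetical helper: return the msgids whose translation (msgstr) is empty.
// Simplified: handles msgid/msgstr and their "..." continuation lines only;
// plural forms (msgid_plural, msgstr[n]) and msgctxt are ignored.
function untranslated_msgids(string $po): array
{
    $missing = [];
    $msgid = null;   // msgid of the entry being read
    $msgstr = null;  // msgstr of the entry being read
    $field = null;   // which field a continuation line belongs to

    // Record the finished entry; the header entry (msgid "") is skipped.
    $flush = function () use (&$missing, &$msgid, &$msgstr): void {
        if ($msgid !== null && $msgid !== '' && $msgstr === '') {
            $missing[] = $msgid;
        }
        $msgid = null;
        $msgstr = null;
    };

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            $flush();
            $msgid = stripcslashes($m[1]);
            $field = 'msgid';
        } elseif (preg_match('/^msgstr "(.*)"$/', $line, $m)) {
            $msgstr = stripcslashes($m[1]);
            $field = 'msgstr';
        } elseif ($field !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            // Continuation line: append to the field it belongs to.
            if ($field === 'msgid') {
                $msgid .= stripcslashes($m[1]);
            } else {
                $msgstr .= stripcslashes($m[1]);
            }
        } else {
            $field = null; // comments, blank lines, unsupported keywords
        }
    }
    $flush();
    return $missing;
}
```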
At the bare minimum, you need to specify the charset in the messages.po files to compile them successfully. Set it to “UTF-8”, then generate the translation files from messages.pot as follows:
$ msginit -l en -o locale/en/LC_MESSAGES/messages.po -i messages.pot
$ msginit -l hi -o locale/hi/LC_MESSAGES/messages.po -i messages.pot
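For Hindi, the msgstr of the single entry starts out empty. Filling it in with the string used in the earlier examples, the entry in locale/hi/LC_MESSAGES/messages.po would read roughly:

```po
#: index.php:12
msgid "Hello, world!"
msgstr "नमस्ते, दुनिया!"
```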
The PO file in the source language, i.e., English (“en”), does not need to be translated, so in this case you only translate the PO file for Hindi (“hi”). After the translation is done, compile the PO files with msgfmt to generate the messages.mo files that your website uses to show the localized text.
$ msgfmt -o locale/en/LC_MESSAGES/messages.mo locale/en/LC_MESSAGES/messages.po
$ msgfmt -o locale/hi/LC_MESSAGES/messages.mo locale/hi/LC_MESSAGES/messages.po
As expected, when we visit http://l10n.hello.world.org/?lang=hi we see:
नमस्ते, दुनिया!
Many people believe that localizing with gettext is slower than using PHP arrays or define statements. However, the gettext library caches the loaded translation catalogs in the long-running server process (for example, Apache with mod_php), so the difference in speed is not that big. In the end, it comes down to personal taste.
You can find out more details on using gettext with PHP here.
That was a simple application with a single piece of text translated into a single language. Keep in mind, though, that the framework you use to build your website very likely provides one of the localization mechanisms described above. The real problem arises when the number of strings grows and you have to provide translated content in many more languages. At that point, it gets really hard to
- maintain the locale files by hand,
- hand them over to translators,
- get them back from each translator, and
Localization shouldn’t be that hard and Transifex has helped lots of project maintainers see their work getting easily localized and being accepted by a much wider user base. So, what are you waiting for? =)