By Antonis Garnelis

Transifex @ EuroPython

Greetings from EuroPython 2012, Florence, Italy!

We will participate in the poster sessions tomorrow, Tuesday, and on Thursday.

Come and talk to us!

New payment system and UI polishing

Our UX guy did it again, designing the most minimalistic and beautiful start page we’ve ever had and polishing our service, one bit at a time. No need for screenshots here; you can check out his effort by yourself and maybe show him your love afterwards. =)

In other news, if you haven’t heard of Stripe, you really should. Online payments have never been more convenient for both clients and developers, and we couldn’t be happier to announce that Transifex now handles transactions through Stripe:

Add credit card on plan upgrade
Payments made easy
Confirm and complete your payment
Upgrade your plan as easy as it gets

More yet to come...

Upcoming changes in Transifex plans

In a few days we’re launching a new major update of Transifex which we feel pretty excited about. This update brings some changes to the Transifex plans and the way billing is handled: easier signups, trials and upgrades.

Here’s the nitty gritty:

  • Free Trial period: We’ll now be offering a free 15-day trial period to new users. The trial will be on our Premium plan, allowing the translation of a whopping 150,000 words, unlimited projects, translators and resources, and full access to all features such as Team & TM sharing and Pro file formats.
  • Solo plan at $19: Our Solo plan will now be offered at its standard price of $19/month. All existing Solo users will receive a notification to either enter their credit cards or switch to the Free plan.
  • Brand-new payment engine: Transifex will now be able to handle credit cards directly through the awesomeness that is called Stripe. Existing paying customers will be requested to enter their credit card to migrate their plan to the new engine. (Note that we do not store the CC information itself on our databases for increased security).

Let us know what you think in the comments section, or by email at support@transifex.com. Detailed information will be announced on the release date. Stay tuned!

PHP Hello l10n

This is a small tutorial on how to internationalize some PHP code and localize it. For simplicity, let’s consider our beloved “Hello world”. Here’s the original script:

index.php:

<?php
echo "Hello, world!";
?>

Its output is pretty straightforward:

Hello, world!

Your aim is to internationalize this little script so that your visitors/clients can enjoy your website in their native language. PHP offers 3 main ways to do so:

  1. PHP Array
  2. PHP DEFINE statements
  3. Gettext

PHP Array

In this first scenario, you need to maintain an associative array per language which will map keys or source strings to localized strings. To display those strings, all you need to do is to select the proper array and get the localized text by using the appropriate key:

locale/en.php:

    <?php
    $LANG = array(
        "hello_world" => "Hello, world!",
    );
    ?>

index.php:

    <?php
    $locale = 'en';

    if (isset($_GET['lang']))
        $locale = $_GET['lang'];
    include('locale/'. $locale . '.php');

    echo $LANG['hello_world'];
    ?>

You can now set the locale by assigning a language code to the ‘lang’ GET parameter when visiting your website e.g. http://l10n.hello.world.org/?lang=en.

As you may have noticed, the default locale is ‘en’, so you don’t need to set the ‘lang’ parameter explicitly to get the English version. It would be great though if you could support a Hindi version too, wouldn’t it?

locale/hi.php:

<?php
$LANG = array(
    "hello_world" => "नमस्ते, दुनिया!",
);
?>

Guess what the output of http://l10n.hello.world.org/?lang=hi will be:

नमस्ते, दुनिया!

PHP Define

Internationalizing your website using the define() method is pretty straightforward, too. You just need to edit the locale files and index.php as shown below:

locale/en.php:

    <?php
    define("hello_world", "Hello, world!");
    ?>

locale/hi.php:

    <?php
    define("hello_world", "नमस्ते, दुनिया!");
    ?>

index.php:

    <?php
    $locale = 'en';

    if (isset($_GET['lang']))
        $locale = $_GET['lang'];
    include('locale/'. $locale . '.php');

    echo hello_world;
    ?>

Gettext

Internationalization of PHP with arrays and define statements is pretty simple and straightforward, yet those methods share a major downside: as your website grows, it gets harder and harder to update the locale files. There’s no built-in way to know which strings were added and whether the strings are present in all the language files.
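To make that downside concrete, here is a minimal sketch of the kind of audit you would have to write and run yourself with the array or define approach. It is in Python rather than PHP, and the `catalogs` dictionary and the `find_missing_keys` helper are hypothetical names used only for illustration; they stand in for the contents of locale/en.php and locale/hi.php after a new string has been added to the source language only.

```python
def find_missing_keys(catalogs, source_locale="en"):
    """Return {locale: keys present in the source locale but missing here}."""
    source_keys = set(catalogs[source_locale])
    report = {}
    for locale, strings in catalogs.items():
        if locale == source_locale:
            continue
        missing = sorted(source_keys - set(strings))
        if missing:
            report[locale] = missing
    return report


# Hypothetical catalogs: "goodbye" was added to English but not to Hindi.
catalogs = {
    "en": {"hello_world": "Hello, world!", "goodbye": "Goodbye!"},
    "hi": {"hello_world": "नमस्ते, दुनिया!"},
}

print(find_missing_keys(catalogs))
```

A check like this only reports missing keys. It cannot tell you that a source string changed and its translations are now stale, which is part of what a dedicated system handles for you.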

Gettext is one of the most popular internationalization and localization systems. It works very nicely with PHP as it does with a bunch of other programming languages like C, C++, Python, etc. With gettext, syncing the locale files with changes in the code base is extremely easy.

Let’s internationalize your website once more, using gettext this time.

First, you need to edit index.php as shown below and mark strings to be localized by enclosing them inside _() or gettext().

index.php:

<?php
$locale = 'en';

if (isset($_GET['lang']))
    $locale = $_GET['lang'];

putenv("LANGUAGE=".$locale);
setlocale(LC_ALL, $locale);

$domain = 'messages';
bindtextdomain($domain, "./locale");
textdomain($domain);

//Mark up text for localization
echo _('Hello, world!');
?>

Gettext expects a locale directory where all the translated strings will be kept.

locale/
    en/
        LC_MESSAGES/
            messages.po
            messages.mo
    hi/
        LC_MESSAGES/
            messages.po
            messages.mo

You can extract marked up strings from code in the following way:

$ xgettext -n *.php -o messages.pot

This generates a POT file named messages.pot:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2012-05-06 23:32+0530\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: index.php:12
msgid "Hello, world!"
msgstr ""

At the bare minimum, you need to specify the charset in the messages.po files to compile them successfully. Set it to “UTF-8”, then generate translation files from messages.pot as follows:

msginit -l en -o locale/en/LC_MESSAGES/messages.po -i messages.pot
msginit -l hi -o locale/hi/LC_MESSAGES/messages.po -i messages.pot

The PO file in the source language, i.e., English (“en”), does not need to be translated. In this case, you translate the PO file for Hindi (“hi”) only. After the translation is done, the PO files must be compiled using msgfmt to generate the messages.mo files, which are used to show the localized text on your website. Pass -o so that each compiled file ends up next to its PO file; without it, msgfmt writes messages.mo to the current directory.

$ msgfmt locale/en/LC_MESSAGES/messages.po -o locale/en/LC_MESSAGES/messages.mo
$ msgfmt locale/hi/LC_MESSAGES/messages.po -o locale/hi/LC_MESSAGES/messages.mo

As expected, when we visit http://l10n.hello.world.org/?lang=hi we see:

नमस्ते, दुनिया!
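Since gettext catalogs are not tied to PHP, the same locale/ directory can be read from other languages too. As a rough sketch, this is how Python's standard gettext module could load the compiled messages.mo files; the `load_translator` helper is a hypothetical name, and the snippet assumes it runs from the directory that contains locale/.

```python
import gettext


def load_translator(locale, domain="messages", localedir="locale"):
    """Load <localedir>/<locale>/LC_MESSAGES/<domain>.mo if it exists."""
    # fallback=True returns a NullTranslations object, which passes
    # source strings through unchanged when no compiled catalog is found.
    return gettext.translation(domain, localedir=localedir,
                               languages=[locale], fallback=True)


_ = load_translator("hi").gettext
print(_("Hello, world!"))
```

With the Hindi catalog compiled and in place this should print the translated greeting; without it, the source string is printed as-is.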

Many people have the opinion that using Gettext for localization is slow compared to localization using PHP arrays and PHP define statements. But, since Apache caches the localization data, the difference in speed is not that big. It finally comes down to a matter of personal taste.

You can find out more details on using gettext with PHP here.

Localization gotchas

That was a simple application with a single piece of text translated to a single language. Keep in mind though that there is an extremely high probability the framework you use to build your website provides one of the mentioned localization mechanisms. The real problem arises when the number of strings grows and you have to provide translated content in a larger number of languages. Then, it gets really hard to

  • maintain the locale files by hand,
  • hand them over to translators,
  • get them back from each translator, and
  • deploy.

Localization shouldn’t be that hard and Transifex has helped lots of project maintainers see their work getting easily localized and being accepted by a much wider user base. So, what are you waiting for? =)

Faster system tests in Django

There are countless posts out there evangelizing the importance of testing in
the development process. This is not one of those posts. Just to make sure we
are all on the same page though, as a team we strongly believe you should first
write your tests, then (re)write the actual code again and again, until all tests
pass and finally enjoy a (more) peaceful night. If you don’t do so, you’d better have
Jack Sparrow’s improvisation skills and love caffeine.

Now, time to get technical. Here’s how we managed to speed up our test suite by a factor of up to 3.

System tests

We are not talking about “Unit test vs System test” here. Unit tests are fast, granular and localized. They should be used to test as much code as possible. However, they are not a replacement for system tests or integration tests and vice versa. We need system tests to ensure that the separate units fit together nicely to make the entire application work. Since system tests tend to be slower, their count should be very low compared to unit tests. A reasonable ratio between unit and system tests would be 9:1.

I feel we are being too harsh on system tests, aren’t we? Wouldn’t it be wonderful if you could make system tests faster? The faster the better. Let’s see how we did it in Transifex.

Test setup in Transifex

  • Simple test cases subclass from transifex.txcommon.tests.base.BaseTestCase, a subclass of django.test.TestCase and other helper classes.
  • BaseTestCase is responsible for loading fixtures and setting up test data like sample projects, resources, permissions, user, clients, etc.
  • A test case contains only related test methods.
  • Fixture based.
  • Very few test cases are instances of TransactionTestCase; most of them are subclasses of TestCase.
  • Most tests subclass from transifex.txcommon.tests.base.BaseTestCase (a subclass of TestCase) to load fixtures and set up the initial data (like users, projects, resources, teams, etc.) needed by most tests in Transifex.

The way Django runs instances of TestCase

  • Load fixtures (if any) for each test method
  • Setup url map, test outbox and test client
  • Set up initial data for test method in setUp() method.
  • Run test method
  • Roll back changes made in the database if the database (like PostgreSQL) supports rollback; otherwise truncate tables (in the case of MySQL-like databases).
  • Reset url map, fixtures, test outbox and test client

Causes of concern

  1. Setting up initial test data for each test method of a test case can add a lot of overhead if there’s a lot of initialization done in the setUp method of the test case (as is the case with our test cases subclassed from BaseTestCase).
  2. That overhead gets even worse if there are fixtures included in the test case. Django loads them for each test method. Loading fixtures has a considerable overhead and makes the test suite a lot less maintainable. Small changes in a model will break fixture importing.
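The per-method cost is easy to see even without Django. The following is a small sketch using plain unittest, where `CountingTestCase` and `run_and_count` are made-up names and the counter stands in for expensive fixture loading and data creation:

```python
import io
import unittest


class CountingTestCase(unittest.TestCase):
    setup_calls = 0  # class-level counter shared by all test methods

    def setUp(self):
        # Stand-in for expensive fixture loading and data creation
        type(self).setup_calls += 1

    def test_one(self):
        pass

    def test_two(self):
        pass

    def test_three(self):
        pass


def run_and_count():
    """Run the test case and report how many times setUp() was called."""
    CountingTestCase.setup_calls = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountingTestCase)
    unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return CountingTestCase.setup_calls


print(run_and_count())
```

setUp() runs once for each of the three test methods, so whatever it costs is paid three times for this one test case.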

You may be thinking: “Why the hell do I need to set up a lot of data for each test? I can just set up the data I need.”

Yes, you are correct about that: concern [1] adds a lot of latency when done the usual way. But there are other things to consider too. A shared setup lets a developer spend less time setting up the world while writing a test. It is overkill to set up the world for each test case separately, and it also leads to redundant setup code. As for fixtures, we plan to get rid of them in due course.

It seems like it’s a trade-off between the ease of writing tests and test speed. Well, we are kind of greedy in these cases and want to have both 😀

All we needed was to find a way to do away with the latency of setting up the world for the BaseTestCase.

What did we need?

  • Load fixtures once during a run of the entire test suite
  • Set up initial test data once per test case (subclass of BaseTestCase or TestCase)
  • Initial test data setup should write to the database as little as possible

Solution

  1. Load fixtures in the test runner to ensure that this process runs once for the entire test suite run.

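    from django.core import management
    from django.db import connections
    from django.test.simple import DjangoTestSuiteRunner

    # Assumed to be the same fixture list that _pre_setup() uses below
    fixtures = ["sample_users", "sample_site",
                "sample_languages", "sample_data"]

    class TxTestSuiteRunner(DjangoTestSuiteRunner):
        def setup_databases(self, **kwargs):
            return_val = super(TxTestSuiteRunner, self).setup_databases(
                **kwargs)
            databases = connections
            for db in databases:
                management.call_command('loaddata', *fixtures,
                    **{'verbosity': 0, 'database': db})
            return return_val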

  2. Initialize test data in the setUpClass method of BaseTestCase. Data set up in setUpClass persists throughout the run of the entire test case. Unless it is really required, data initialization in the setUp() method of a test case can be skipped. For a simple TestCase, Django rolls back all changes made within a test method anyway.
  3. The setup code uses the Model.objects.get_or_create() method to fetch or initialize data, which minimizes database writes.
  4. Rolling back transactions or truncating tables resets the data before running a test method. But how to reset the variables initialized in setUpClass method? Well, in setUp() method, we copy the class wide variables using copy.copy() to some temporary variables. The test method works with these temporary variables. This leaves the original class wide variables intact.

    from copy import copy

    class BaseTestCase(Languages, NoticeTypes, Translations, TestCase):
        @classmethod
        def setUpClass(cls):
            # super() already binds the class, so pass no arguments
            super(BaseTestCase, cls).setUpClass()
            # Only showing a code snippet...

            # Create teams
            cls._team = Team.objects.get_or_create(language=cls._language,
                project=cls._project, creator=cls._user['maintainer'])[0]
            cls._team_private = Team.objects.get_or_create(
                language=cls._language, project=cls._project_private,
                creator=cls._user['maintainer'])[0]

            # ...

        def setUp(self):
            # super() already binds the instance, so pass no arguments
            super(BaseTestCase, self).setUp()
            # Only copy test case wide variables
            # to temporary ones to work with in a
            # test method.

            # Only showing a code snippet...

            # Test methods operate on self.team instead of self._team
            # and similarly for other variables too
            self.team = copy(self._team)
            self.team_private = copy(self._team_private)

            # ...
    
  5. Don’t set url map, fixtures in _pre_setup() or reset url map, fixtures in _post_teardown method. This needs a bit of tweaking in the _pre_setup() and _post_teardown() methods inherited from django.test.TestCase

    # Import paths assumed for the Django 1.3/1.4 releases of that time
    from django.core import mail
    from django.core.management import call_command
    from django.db import connections, transaction, DEFAULT_DB_ALIAS
    from django.test.testcases import (connections_support_transactions,
        disable_transaction_methods, restore_transaction_methods)

    class BaseTestCase(Languages, NoticeTypes, Translations, TestCase):
        # Only showing a code snippet...

        def _pre_setup(self):
            if not connections_support_transactions():
                # Truncate tables and load initial data
                # in case the database does not support
                # transactions. Hence, no optimization
                # in such cases.
                fixtures = ["sample_users", "sample_site",
                            "sample_languages", "sample_data"]
                if getattr(self, 'multi_db', False):
                    databases = connections
                else:
                    databases = [DEFAULT_DB_ALIAS]
                for db in databases:
                    call_command('flush', verbosity=0, interactive=False,
                                 database=db)
                    call_command('loaddata', *fixtures, **{'verbosity': 0,
                                 'database': db})

            else:
                # Optimization achieved if the database
                # supports transactions
                if getattr(self, 'multi_db', False):
                    databases = connections
                else:
                    databases = [DEFAULT_DB_ALIAS]

                for db in databases:
                    transaction.enter_transaction_management(using=db)
                    transaction.managed(True, using=db)
                disable_transaction_methods()
            mail.outbox = []

        def _post_teardown(self):
            if connections_support_transactions():
                # If the test case has a multi_db=True flag, tear down all
                # databases. Otherwise, just tear down the default one.
                if getattr(self, 'multi_db', False):
                    databases = connections
                else:
                    databases = [DEFAULT_DB_ALIAS]

                restore_transaction_methods()
                for db in databases:
                    transaction.rollback(using=db)
                    transaction.leave_transaction_management(using=db)
            for connection in connections.all():
                connection.close()
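
The setUpClass-plus-copy idea from steps 2 and 4 can be tried in isolation with plain unittest, no Django required. This is only a sketch: the `Team` class here is a hypothetical stand-in for a model instance, and `SharedDataTestCase` and `run_shared` are illustrative names, not part of the Transifex code base.

```python
import copy
import io
import unittest


class Team(object):
    """Hypothetical stand-in for a model instance."""
    def __init__(self, name):
        self.name = name


class SharedDataTestCase(unittest.TestCase):
    created = 0  # how many times the expensive setup ran

    @classmethod
    def setUpClass(cls):
        # Runs once for the whole test case class
        cls._team = Team("translators")
        cls.created += 1

    def setUp(self):
        # Each test method works on its own copy of the shared object
        self.team = copy.copy(self._team)

    def test_rename(self):
        self.team.name = "renamed"
        self.assertEqual(self._team.name, "translators")

    def test_original_intact(self):
        self.assertEqual(self.team.name, "translators")


def run_shared():
    """Run the test case; return (all tests passed, setup count)."""
    SharedDataTestCase.created = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedDataTestCase)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    return result.wasSuccessful(), SharedDataTestCase.created


print(run_shared())
```

Note that copy.copy() makes a shallow copy: rebinding an attribute on the copy leaves the original alone, but mutating a nested object (a list attribute, say) would still be visible through both.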
    

Results

The results were quite satisfying. With the custom test runner and the new test suite, tests got around 2-3 times faster. The new test suite’s speedup factor is proportional to the number of test methods in a test case when compared to its older counterpart. The new test suite, although not yet perfect, is working quite well. As kbairak said here:

holy shit! @rtnpro ‘s modifications make @transifex ‘s test-suite run like a hamster on coffee !!!

The Hub and Child project types

Depending on the type of your project, you can use Transifex in many ways to get the best workflow. Very often companies have many products that are handled under a single umbrella, what we call a ‘Translation Hub’ on Transifex. The main components of a hub are usually the human resources and the release process, and child projects re-use these elements from the parent project.

Let’s take the Fedora Project for example. The Fedora project on Transifex is a hub that hosts the community’s resources, such as the people involved in the translation and the release process.

Hub projects structure

The maintainers of the child projects, like Anaconda and Firstboot, have full control of their projects and can update their translation resources as needed. The people working on the translations really belong to the hub. Ideally these resources could follow the hub’s release cycle and get shipped under specific release versions (F16, F17, devel). This gives more control over what needs to be translated in each period of time.

So, basically a hub on Transifex is a project that holds the logistics of access control, usually behind structured language teams, and makes it available to child projects. Now, the question is:

How can I actually set this on Transifex?

I would say it’s dead simple. If you maintain a project on Transifex, you have probably already seen that your project can be one of 3 types:

  • Typical: A typical standalone project. It has its own access control rules and is not linked to any other project.
  • Hub: A project set as a Hub will aggregate information from other projects. The language table will include the translations of all its child projects.
  • Child: Projects which re-use the translation teams of a hub project.

Just a couple of check boxes! Straightforward, right? Here is some more information that can help:

  • You can only outsource access to a Hub; outsourcing access to a Typical project is not allowed. Kinda obvious, but worth mentioning.
  • A Hub can’t outsource its access to another hub.
  • Outsourcing team control to a Hub needs to be approved by one of the Hub maintainers, unless both are maintained by the same user.
  • Hubs can have their own sub-domains like https://fedora.transifex.com and https://opentranslators.transifex.com. Get in touch with us if you want to set one.
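
To make the first three rules above concrete, here is a minimal sketch of how they could be expressed as a check. It is not how Transifex implements them: the `Project` class, the `check_outsource` function and the reading of "maintained by the same user" as "the two projects share at least one maintainer" are all assumptions made for illustration.

```python
TYPICAL, HUB, CHILD = "typical", "hub", "child"


class Project(object):
    """Hypothetical project record: a name, a type and a set of maintainers."""
    def __init__(self, name, kind, maintainers):
        self.name = name
        self.kind = kind
        self.maintainers = set(maintainers)


def check_outsource(project, target):
    """Return (allowed, needs_approval) for outsourcing access to target."""
    if target.kind != HUB:
        return (False, False)  # access can only be outsourced to a Hub
    if project.kind == HUB:
        return (False, False)  # a Hub cannot outsource to another Hub
    # Assumption: no approval is needed when a maintainer is shared
    shared = project.maintainers & target.maintainers
    return (True, not shared)


fedora = Project("fedora", HUB, ["bob"])
anaconda = Project("anaconda", TYPICAL, ["alice"])
print(check_outsource(anaconda, fedora))
```

Here the request is allowed but needs approval from a hub maintainer, because the two projects have no maintainer in common.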

Vrachnis is a Transifexian

Ilias ‘vrachil’ Vrachnis joins the Transifex team as a Systems and Security Engineer. A long-term sysadmin monkey, Ilias was managing his university’s most critical servers prior to joining our team. Ilias will be working on making sure Transifex’s availability is top-notch and will be responsible for our dev team support services such as build servers and continuous integration systems.

You can follow Ilias on Twitter and Google+.

PS: Yup, we also love GitHub’s introductions of new team members. 😉

The life cycle of a translation resource

One of the core features Transifex provides is handling files with translatable content (resources) in various localization formats, like XML or PO files.

Part of that functionality is to be able to import such files to its internal storage and export them, whenever the user requests them, either to ship them with his software or to translate the file to another language with his local computer. Although both operations might use some customized code to handle certain formats (especially for importing resources), there are specific steps that are followed in each case.

Importing a file

Whenever you upload a file with strings in it, Transifex will try to parse it, extract the necessary information and then store that information in the database.

Parsing

Since each format is different, there are specialized parsers for each
one. In some cases, Transifex uses a third-party parser, like
polib for PO files. In other
cases, we have developed custom parsers.

Extracting the information

The main responsibility of a parser is to extract the necessary
information from the imported file.

In case the file is the source file (that is, it is the file with
the strings in the source language), we are interested in three
things:

  • The keys for the translatable strings (like the msgid entries in
    a PO file). The keys are used to uniquely match the strings in the
    source language with those in translations. We also generate a
    unique hash for each key as an identifier.
  • The translatable strings in the source language, if there are any
    (like the msgstr entries in PO file). These are the actual strings
    of the source language.
  • The template of the file. The template is a skeleton of the source
    file: it is mostly the same, except that the translatable
    strings have been replaced with the hashes of the corresponding
    keys, acting as placeholders. This is necessary for the export
    operation.

In case the file is a translation of the resource in a language, we
are only interested in the translations (this means that any changes
in the file are ignored).
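
As a rough illustration of the extraction step for a source file, the sketch below parses a toy "key = value" format and produces the three pieces described above: keys, source strings and a template. The file format, the `parse_source` name, the use of MD5 and the `_tr` suffix on the placeholders are all assumptions made for this example, not a description of the actual Transifex parsers.

```python
import hashlib


def parse_source(content):
    """Parse a toy 'key = value' file into (entities, template)."""
    entities = {}
    template_lines = []
    for line in content.splitlines():
        # Comments and lines without a key are copied to the template as-is
        if "=" not in line or line.lstrip().startswith("#"):
            template_lines.append(line)
            continue
        key, value = line.split("=", 1)
        # Assumed placeholder scheme: MD5 of the key plus a "_tr" suffix
        digest = hashlib.md5(key.strip().encode("utf-8")).hexdigest()
        key_hash = digest + "_tr"
        entities[key_hash] = {"key": key.strip(), "string": value.strip()}
        # In the template, the string is replaced by its placeholder
        template_lines.append("%s= %s" % (key, key_hash))
    return entities, "\n".join(template_lines)


source = "# greeting file\nhello_world = Hello, world!"
entities, template = parse_source(source)
print(template)
```

The template keeps the comment and the key untouched, while the translatable string is swapped for its placeholder hash.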

Storing

As soon as we have the necessary information from the previous step,
we store it in the database as source entities, translations and
templates.

Exporting a file

Whenever a user asks to download a translation file in a particular
language, the file has to be exported from the database.

The procedure is quite standard for all formats. After fetching the template and the
translation strings in the requested language, we do a
search-&-replace in the template, replacing the hashes in it with the
actual strings that correspond to each hash.
Next, any format-specific operations are performed (like adding the
translator copyrights in PO files) and the result is delivered to the
user.
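
The export step can be sketched in a few lines as well. Again, this is an illustration under assumptions: `export_translation` and its `post_process` hook are hypothetical names, and the placeholder in the sample template is an arbitrary value rather than a real Transifex hash.

```python
def export_translation(template, translations, post_process=None):
    """Fill a template by replacing each placeholder hash with its string."""
    content = template
    for key_hash, string in translations.items():
        content = content.replace(key_hash, string)
    # Hook for format-specific work, e.g. adding translator copyrights
    if post_process is not None:
        content = post_process(content)
    return content


# Arbitrary placeholder value, for illustration only
template = "# greeting file\nhello_world = 5eb63bbbe01eeed093cb22bb8f5acdc3_tr"
translations = {"5eb63bbbe01eeed093cb22bb8f5acdc3_tr": "नमस्ते, दुनिया!"}

print(export_translation(template, translations))
```

Because the template preserves everything except the translatable strings, the exported file keeps the layout and comments of the original source file.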

You can find more details for the storage engine of Transifex in the
docs.

Switching to Gravatar

One of the personalization features Transifex offers is the support for avatars; each user is able to associate a small picture with his account, which makes it easier for other users to identify him.

Currently, there are two ways that avatars are supported: by uploading your own image when editing your profile or by using Gravatar.

However, on Thursday, April 5th, we will drop support for user-uploaded avatars and switch completely to Gravatar.

Our goal behind this decision is to make things as simple and easy as possible. We think Gravatar is a very good service; it does one thing and does it well: providing a web-friendly, globally recognized avatar for you across all the websites you visit. So, we feel that there is no point in serving custom avatars for our users anymore.

Setting a Gravatar

If you do not already have one, here is how you can set your Gravatar:

  • Go to the Gravatar signup form at https://en.gravatar.com/site/signup and enter the e-mail address you use on Transifex.
  • If you have registered in the past, a red box will tell you so. Otherwise, continue with your registration.
  • After you have activated your account, you will be able to upload an image from your computer or a URL.

If you do not want to get into that (and that is totally cool with us), Gravatar will render an Identicon for you as a fallback (this is what happens right now as well). An Identicon is a geometric pattern generated from a hash of your e-mail address, a kind of digital fingerprint.
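
For the curious, here is a short sketch of how a Gravatar image URL with the Identicon fallback can be built from an e-mail address. The `gravatar_url` helper is a hypothetical name, not the code Transifex uses. The hashing scheme (MD5 of the trimmed, lowercased address) and the `s` and `d` query parameters follow Gravatar's public image request format.

```python
import hashlib


def gravatar_url(email, size=80, default="identicon"):
    """Build a Gravatar image URL for the given e-mail address."""
    # Gravatar hashes the trimmed, lowercased address with MD5
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    # s = image size in pixels, d = fallback when no image is registered
    return "https://www.gravatar.com/avatar/%s?s=%d&d=%s" % (
        digest, size, default)


print(gravatar_url("user@example.com"))
```

Since the address is normalized before hashing, " User@Example.com " and "user@example.com" map to the same avatar.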