6. How to translate documentation and resources?

The problem we had forgotten: what about all the other text formats?

Solution 1.- Convert to PO files

Docbook XML to PO

We all know that documentation is written in many different formats. We may agree that proprietary formats are unacceptable, and we know that binary formats are inconvenient. But which format should we use if we can choose?

Nowadays the documentation of KDE, Gnome, TLDP and Sun is written in Docbook XML. Why?

Advantages of Docbook:

  • Separation of content and the presentation layer.

    The previous sentence was marked up like this:

       
       <para>
         Separation of <phrase
         role="example-content">content</phrase> and the
         <phrase role="example-presentation">presentation
           layer</phrase>. 
       </para>
       
    		  

    The cascading style sheet (CSS) then determined how the examples were displayed:

       
       span.example-content {
         font: 20pt/24pt "Comic Sans MS", penguinattack, fantasy;
         color: blue;
       }
    
       span.example-presentation {
         color: red;
         font-weight: bold;
         font-variant: small-caps;
       }
       
    		  

  • High quality semantic mark-up

  • The validity and correctness of the document can be checked

  • Mature tools to transform to (x)html and pdf

  • Metainformation: encoding, author… (which can be mapped to Dublin Core)

So we have decided to use Docbook for our documentation, but do we have tools to translate it? Yes: there are several converters from Docbook XML to PO format:

  1. KDE's poxml

  2. Danilo Segan's xml2po (part of gnome-doc-utils)
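
To make the idea of "Docbook XML to PO" concrete, here is a minimal Python sketch of the extraction step. It is not how poxml or xml2po are implemented; the sample document, the function names (`inner_xml`, `po_escape`, `docbook_to_pot`) and the choice to treat only `title` and `para` as translatable units are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modelled on the mark-up shown earlier.
DOCBOOK = """<article>
  <title>Advantages of Docbook</title>
  <para>Separation of <phrase role="example-content">content</phrase> and the
    <phrase role="example-presentation">presentation layer</phrase>.</para>
  <para>High quality semantic mark-up</para>
</article>"""


def inner_xml(element):
    """Return the element's content, inline mark-up included, whitespace collapsed."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring() serializes the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return " ".join("".join(parts).split())


def po_escape(text):
    """Escape backslashes and double quotes for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def docbook_to_pot(xml_source, tags=("title", "para")):
    """Emit one msgid/msgstr pair per translatable element (assumed tag list)."""
    root = ET.fromstring(xml_source)
    entries = []
    seen = set()
    for element in root.iter():
        if element.tag in tags:
            msgid = inner_xml(element)
            if msgid and msgid not in seen:  # PO catalogs hold each msgid once
                seen.add(msgid)
                entries.append('msgid "%s"\nmsgstr ""\n' % po_escape(msgid))
    return "\n".join(entries)


print(docbook_to_pot(DOCBOOK))
```

In this sketch the inline `<phrase>` tags end up inside the msgid as ordinary text, so the translator has to copy them by hand into the msgstr.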

General converters

But what about the other document formats? And what about resources?

  1. OmegaT

    Formats supported: OpenOffice, HTML and plain text (translations are stored as TMX)

    [Note] 

    Of course, OmegaT is not a converter but a full-fledged translation editor, and it is becoming quite mature.

  2. Denis Barbier and Martin Quinson's po4a (PO for Anything; there is a Debian package). The project's goal is to ease translation (and, more interestingly, the maintenance of translations) by using gettext tools in areas where they were not expected, such as documentation.

    Formats supported: conversion to and from man pages (nroff), pod, sgml, xml, docbook, dia, latex, html…

    An example of usage may help. For Freeduc-doc (LaTeX files) I wrote a shell script, tex2po, which included:

       # $1 is the master LaTeX file; $file is presumably set earlier in the script (not shown here)
       TEXINPUTS=/usr/share/texmf/tex/latex/base \
       po4a-gettextize -f latex \
       -m "$1" -M iso-8859-1 \
       -p "$file.pot" \
       -o exclude_include=content-es:application-es
    		

  3. Translate Toolkit

    Converters included: oo2po, moz2po, csv2po, ts2po, txt2po, html2po, xliff2po, tiki2po
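
All three tools share the same cycle: extract translatable units from the source, keep the translations that still match, and rebuild the target document. The following Python sketch shows that cycle for plain text only. It does not reproduce the behaviour of OmegaT, po4a or the Translate Toolkit; the function names and the paragraph-per-unit segmentation are assumptions chosen for brevity.

```python
def split_paragraphs(text):
    """Split plain text on blank lines and collapse whitespace inside each paragraph."""
    return [" ".join(block.split()) for block in text.split("\n\n") if block.strip()]


def extract_catalog(text, old_catalog=None):
    """Build a msgid -> msgstr mapping, reusing translations that still match."""
    old_catalog = old_catalog or {}
    return {p: old_catalog.get(p, "") for p in split_paragraphs(text)}


def translate_text(text, catalog):
    """Rebuild the document; untranslated paragraphs fall back to the original."""
    return "\n\n".join(catalog.get(p) or p for p in split_paragraphs(text))


# Hypothetical data: one paragraph was translated earlier, one is new.
old = {"Hello world.": "Hola mundo."}
updated_source = "Hello world.\n\nThis paragraph\nis new."

catalog = extract_catalog(updated_source, old)
print(translate_text(updated_source, catalog))
```

The maintenance benefit is visible in `extract_catalog`: when the source changes, only the new or modified paragraphs come back with an empty translation.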

[Warning] Problems?

  • All the problems of the PO format itself

  • We had the mark-up, and the conversion ignores it