We tend to make the same mistakes again and again, and educational distributions are no exception.
Copyleft © Juan Rafael Fernández García July 2007.
Multilingualization is not a simple process. Five aspects have to be taken into account:
I. Introduction
II. Locales
III. Fonts
IV. Input (keyboard and Input Methods)
V. Voices
What's the state of m17n in free software? Let's see.
The Reason Why (I).
Normally distributions respond
The Reason Why (II): Spotting the Problem
The needs of an educational distribution are different:
The Free Software Solution
In the world of free software, the three tasks of i18n (internationalization), l10n (localization) and m17n (multilingualization) rely on the gettext technology.
How does a sysadmin cope with them? By means of locales.
(An analysis of /var/lib/dpkg/info/locales.postinst)
The Configuration Mess
Locales can be set system-wide (at /etc/default/locale, or /etc/environment, or /etc/profile, or /etc/bash.bashrc...) ...
... or at user level (in ~/.bash_profile, but also in ~/.xsession, anywhere in ~/.xsession.d/, or in the configuration files of Gnome, KDE, XFCE...) ...
... or at application level: LC_ALL=it_IT.UTF-8 date
The cause is historical; the outcome, a high chance of problems.
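With so many places to set a locale, it helps to recall how a program finally resolves them. POSIX gives LC_ALL priority over the individual LC_* variables, which in turn override LANG; that is why the one-off LC_ALL=it_IT.UTF-8 date wins over whatever the files set. The sketch below models only that rule; the function name is hypothetical, "C" stands in for the implementation-defined default, and GNU's LANGUAGE variable (which affects message catalogs only) is ignored.

```python
import os


def effective_locale(category, env=None):
    """Locale a program would use for `category` (e.g. "LC_TIME").

    Precedence, as specified by POSIX: LC_ALL, then the category's own
    variable, then LANG. Unset or empty variables are skipped; "C" stands
    in for the implementation-defined default.
    """
    env = os.environ if env is None else env
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


# LANG and LC_TIME as a system-wide file might set them, plus LC_ALL
# from the command line, as in: LC_ALL=it_IT.UTF-8 date
env = {"LANG": "es_ES.UTF-8", "LC_TIME": "fr_FR.UTF-8",
       "LC_ALL": "it_IT.UTF-8"}
print(effective_locale("LC_TIME", env))   # it_IT.UTF-8
```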
Sort of, only sort of.
The third question is the easiest one, and all modern distributions got it right: UTF-8.
It hardly needs stressing that UTF-8 locales have solved many of the problems that legacy locales created.
The main advantage is that all characters can be displayed, independently of the rules (e.g. dictionary order or currency sign) that apply to a single language and/or country.
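A quick way to see the difference is to ask whether one string mixing several languages fits a given encoding. This is a minimal sketch with a hypothetical helper name; the sample text is made up for illustration.

```python
def encodable(text, encoding):
    """True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Spanish, the euro sign, polytonic Greek and Chinese in one string.
sample = "sábado · 30 € · ἄνθρωπος · 阿"
for enc in ("utf-8", "iso-8859-1", "iso-8859-15"):
    print(f"{enc:12} {encodable(sample, enc)}")
```

Only UTF-8 accepts the whole string; ISO-8859-15 adds the euro sign to Latin-1 but still has no Greek or Chinese.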
So what's the problem?
How is a new language added?
In recent Ubuntu releases, privileged users can use a graphical tool provided by language-selector. Behind it there's a whole system of dependencies: language-pack-xx pulls in language-pack-xx-base and recommends language-support-xx, which in turn depends on the mozilla, openoffice.org and dictionary packages.
Debian used to have language tasks; now a whole lot of packages have to be searched for and added by hand.
What if the distro is fixed on a CD or administered remotely? Simple: you lose.
Usage of pre-utf8 locales as defaults
/etc/locale.alias = /usr/share/locale/locale.alias is outdated
Copyright (C) 1996-2001,2003
deutsch         de_DE.ISO-8859-1
spanish         es_ES.ISO-8859-1
...

$ LC_ALL=spanish date
s�b jun 30 22:07:41 CEST 2007
$ LC_ALL=es_ES.UTF-8 date
sáb jun 30 22:07:55 CEST 2007
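The broken "s�b" can be reproduced byte by byte. The alias "spanish" points to an ISO-8859-1 locale, so date writes "á" as a single byte, which a UTF-8 terminal cannot decode. A minimal sketch using only Python's built-in codecs:

```python
# What "LC_ALL=spanish date" produced: the alias points to the
# ISO-8859-1 locale, so "á" is written as the single byte 0xE1.
latin1_bytes = "sáb".encode("iso-8859-1")
print(latin1_bytes)                      # b's\xe1b'

# A UTF-8 terminal cannot decode 0xE1 followed by "b" and shows U+FFFD.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown)                             # s�b

# The UTF-8 locale writes "á" as two bytes that decode cleanly.
utf8_bytes = "sáb".encode("utf-8")
print(utf8_bytes, utf8_bytes.decode("utf-8"))  # b's\xc3\xa1b' sáb
```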
Usage of pre-utf8 locales as defaults (cont.)
localeconf's main.py
$Progeny: main.py,v 1.73 2002/06/11
self.language_values = [
    "en_US ISO-8859-1",
    "en_GB ISO-8859-1",
    "en_CA ISO-8859-1",
    "fr_FR@euro ISO-8859-15",
    "fr_CA ISO-8859-1",
    "de_DE@euro ISO-8859-15",
    "es_ES@euro ISO-8859-15",
    "es_MX ISO-8859-1"
]
eurosupport (how to set latin9)
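Hardcoded lists like the one in localeconf could be modernized mechanically. The sketch below (hypothetical helper, not part of localeconf) turns each "<locale> <charset>" entry into the "<name>.UTF-8 UTF-8" form used in Debian's /etc/locale.gen. It assumes, as on current glibc, that a euro-zone country's plain UTF-8 locale already uses the euro, so the @euro modifier is dropped while other modifiers are kept.

```python
def to_utf8_entry(entry):
    """Convert a legacy "<locale> <charset>" entry to its UTF-8 form.

    "fr_FR@euro ISO-8859-15" -> "fr_FR.UTF-8 UTF-8". The @euro modifier
    is dropped (assumption: the UTF-8 locale already provides the euro);
    other modifiers such as @valencia are preserved.
    """
    name, _charset = entry.split()
    base, sep, modifier = name.partition("@")
    base = base.split(".", 1)[0]          # drop any legacy codeset
    result = base + ".UTF-8"
    if sep and modifier != "euro":
        result += "@" + modifier
    return result + " UTF-8"


language_values = [
    "en_US ISO-8859-1", "en_GB ISO-8859-1", "en_CA ISO-8859-1",
    "fr_FR@euro ISO-8859-15", "fr_CA ISO-8859-1",
    "de_DE@euro ISO-8859-15", "es_ES@euro ISO-8859-15", "es_MX ISO-8859-1",
]
for entry in language_values:
    print(to_utf8_entry(entry))
```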
Hardcoded Locales
Hidden Hardcoding
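Case One. A user has set a locale in ~/.bashrc and also has a ~/.xsession file (or something in ~/.xsession.d). X session scripts run in a non-interactive shell, which does not read ~/.bashrc, so the locale is lost unless the session file sources it: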
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
(check your /etc/skel)
Hidden Hardcoding (cont.)
Case Two. Gnome vs. KDE locale selection
In gdm, choose the KDE session and the Italian language: it won't work.
Why? Because KDE ignores the locale gdm passes to it and uses the one stored in its own configuration:
~/.kde/share/config/kdeglobals
[Locale]
Country=es
Language=es
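To spot this hidden hardcoding on a machine, one can read the [Locale] group of kdeglobals. This is a sketch that treats the file as INI with Python's configparser; the helper name is hypothetical, and real KDE files may contain constructs configparser does not understand.

```python
import configparser
from pathlib import Path


def kde_locale(kdeglobals_text):
    """Return (Country, Language) from the [Locale] group of a kdeglobals
    file; each value is None when it is not set."""
    parser = configparser.ConfigParser(interpolation=None, strict=False,
                                       allow_no_value=True)
    parser.read_string(kdeglobals_text)
    if not parser.has_section("Locale"):
        return (None, None)
    section = parser["Locale"]
    return (section.get("Country"), section.get("Language"))


if __name__ == "__main__":
    path = Path.home() / ".kde/share/config/kdeglobals"
    if path.exists():
        text = path.read_text(encoding="utf-8", errors="replace")
        print(kde_locale(text))
```

A non-empty Language here is a hint that KDE will not follow the language chosen in gdm.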
Localepurge, Our Most Dangerous Enemy
Localepurge was created to save disk space and is widely used: it deletes the translation files of every locale not listed as one to keep.
dpkg-reconfigure localepurge creates /etc/locale.nopurge
Even if we later reconfigure the list of languages to keep, how do we recover the files already deleted, short of reinstalling every package that shipped them?
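Since recovery is the hard part, it can pay to look before purging. The sketch below is a hypothetical dry run, not localepurge's own logic: it lists the directories under /usr/share/locale and reports which ones a keep-list would not protect, assuming exact name matching.

```python
from pathlib import Path


def installed_locales(root="/usr/share/locale"):
    """Names of the per-locale directories under `root`."""
    path = Path(root)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.is_dir())


def would_purge(installed, keep):
    """Locale directories that a purge restricted to `keep` would delete.

    Simplification: a directory survives only if its exact name is listed
    in `keep` (for instance, the entries of /etc/locale.nopurge).
    """
    keep = set(keep)
    return [name for name in installed if name not in keep]


if __name__ == "__main__":
    # Hypothetical keep-list for a Spanish/English school installation.
    doomed = would_purge(installed_locales(), ["en", "es", "es_ES"])
    print(f"{len(doomed)} locale directories would be removed")
```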
A UTF-8 locale requires
How can we tell? By using pangrams.
LC_ALL=fr_FR.UTF-8 gnome-font-viewer Isabella.ttf
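A pangram uses every letter of an alphabet, so rendering one shows at a glance whether a font lacks a glyph. It is worth checking that a sample really covers the language. "Portez ce vieux whisky au juge blond qui fume" is a well-known French pangram (not taken from these slides); it covers the 26 basic letters but none of the accented ones. The extra letters "éèçà" below are an illustrative, incomplete choice, and the helper name is hypothetical.

```python
import string


def missing_letters(text, alphabet=string.ascii_lowercase):
    """Letters of `alphabet` that do not occur in `text` (case-insensitive).
    An empty result means the text is a pangram for that alphabet."""
    present = set(text.lower())
    return sorted(set(alphabet) - present)


sample = "Portez ce vieux whisky au juge blond qui fume"
print(missing_letters(sample))                          # []
print(missing_letters(sample, string.ascii_lowercase + "éèçà"))
```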
TrueType and OpenType fonts can hold at most 65,535 glyphs, because the glyph count is a 16-bit number: roughly the size of the BMP. The UCS has over 1.1 million code points, but only the first 65,536 (Plane 0, the Basic Multilingual Plane or BMP) have entered into common use.
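The arithmetic behind those figures: Unicode has 17 planes of 65,536 code points each, and a character's plane is its code point divided by 65,536. A small sketch (helper names are illustrative):

```python
import sys

PLANE_SIZE = 0x10000          # 65,536 code points per plane
NUM_PLANES = 17               # planes 0 to 16


def plane(ch):
    """Unicode plane of a character (0 = Basic Multilingual Plane)."""
    return ord(ch) // PLANE_SIZE


def in_bmp(text):
    """True if every character of `text` lies in Plane 0."""
    return all(plane(c) == 0 for c in text)


print(NUM_PLANES * PLANE_SIZE)                         # 1114112
print(sys.maxunicode == NUM_PLANES * PLANE_SIZE - 1)   # True
print(plane("ø"), plane("阿"), plane("\U0001D11E"))     # 0 0 1
```

U+1D11E (MUSICAL SYMBOL G CLEF) is one of the rarer characters outside the BMP.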
Again, sort of. But are they in your distro? And, secondly, did anyone remember to install them?
Free generic Unicode fonts available in Debian Lenny: (see http://en.wikipedia.org/wiki/Unicode_typefaces)
Font (Debian package) | Characters | Glyphs
unifont (bitmap) | 33580 | 33583
linux-libertine | 1982 | 1985
ttf-bitstream-vera | |
ttf-dejavu | 3525 | 3611
ttf-gentium | 1469 | 1699
ttf-freefont | 3914 | 5257
ttf-georgewilliams (Monospace, Caslon, Caliban, Cupola) | 3684 (Caslon) | 3686 (Caslon)
ttf-junicode | 2235 | 2256
ttf-sil-charis (see note) | 1958 | 3084
ttf-sil-doulos | 1958 | 3083

Note on ttf-sil-charis: a comprehensive inventory of glyphs needed for almost any Roman- or Cyrillic-based writing system, whether used for phonetic or orthographic needs. In addition, there is provision for other characters and symbols useful to linguists.
Configuring the keyboard
You can edit the layout files in /usr/share/X11/xkb/symbols/xx to your advantage:
key {[ a, A, ae, AE ]};
key {[ o, O, oslash, Oslash ]};
gives us «a A æ Æ o O ø Ø»
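Why four symbols per key give those eight characters: xkb reads the list as shift levels, plain, Shift, third level, Shift plus third level. The sketch assumes AltGr is the third-level key, as on most European layouts; the keysym-to-character table is a hand-written, hypothetical subset, not X11's real keysym database.

```python
import unicodedata

# xkb shift levels, assuming AltGr selects level 3.
LEVELS = {
    (False, False): 0,  # level 1: plain key
    (True, False): 1,   # level 2: Shift
    (False, True): 2,   # level 3: AltGr
    (True, True): 3,    # level 4: Shift + AltGr
}

# Hypothetical subset mapping keysym names to Unicode character names.
KEYSYM_NAMES = {
    "ae": "LATIN SMALL LETTER AE",
    "AE": "LATIN CAPITAL LETTER AE",
    "oslash": "LATIN SMALL LETTER O WITH STROKE",
    "Oslash": "LATIN CAPITAL LETTER O WITH STROKE",
}


def keysym_to_char(keysym):
    """Single-letter keysyms stand for themselves; others are looked up."""
    if len(keysym) == 1:
        return keysym
    return unicodedata.lookup(KEYSYM_NAMES[keysym])


def typed(symbols, shift=False, altgr=False):
    """Character produced by a key with the given modifiers."""
    return keysym_to_char(symbols[LEVELS[(shift, altgr)]])


a_key = ["a", "A", "ae", "AE"]
o_key = ["o", "O", "oslash", "Oslash"]
line = " ".join(typed(k, s, g) for k in (a_key, o_key)
                for g in (False, True) for s in (False, True))
print(line)  # a A æ Æ o O ø Ø
```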
Or switch to another keyboard layout: Desktop - Preferences - Keyboard - Griego politónico (Polytonic Greek)
ὸ ὰνθροπος αᾳᾶὰά
e.g. XIM: AltGr + keys
æßðđŋ@ł€¶ŧ«»¢“”nµ
Hold Ctrl+Shift and type "u" followed by the character's hexadecimal code point
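The hex number to type is simply the character's Unicode code point. A tiny sketch for looking it up in either direction (helper names are illustrative):

```python
def from_hex(code):
    """Character for a hexadecimal code point, e.g. "20ac" -> "€"."""
    return chr(int(code, 16))


def to_hex(ch):
    """Hexadecimal code point to type for a character."""
    return format(ord(ch), "04x")


print(from_hex("20ac"), to_hex("ñ"), to_hex("阿"))   # € 00f1 963f
```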
e.g. SCIM: Ctrl + Space
阿ㄙㄉㄑㄖㄊ
OK, now we know how reading and writing can be implemented. But are they the only ways we interface with language? Don't we speak, don't we listen to people talking?
How does free software fare with multilingual voice synthesis? (Voice recognition lags far behind.)
Again, it's not far off, but it isn't there yet.
The Tasks Ahead
We need
Some examples
We are going to listen to examples of synthesis created with the versions of festival and espeak available in Debian Testing, using free voices.
Espeak.
espeak -vde -f voice_tests/aleman_utf8.txt -w voice_tests/aleman.wav; \
aplay voice_tests/aleman.wav
espeak -vfr -f voice_tests/frances_utf8.txt -w voice_tests/frances.wav; \
aplay voice_tests/frances.wav
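With more languages, the same espeak/aplay pair can be driven from a list. This is a sketch using Python's subprocess; it only uses the espeak flags shown above (-v, -f, -w), and the function names are hypothetical. By default it prints the commands instead of running them.

```python
import shlex
import subprocess

# Voice, input text and output WAV, as in the examples above.
SAMPLES = [
    ("de", "voice_tests/aleman_utf8.txt", "voice_tests/aleman.wav"),
    ("fr", "voice_tests/frances_utf8.txt", "voice_tests/frances.wav"),
]


def espeak_command(voice, text_file, wav_file):
    """argv for: espeak -v<voice> -f <text_file> -w <wav_file>"""
    return ["espeak", "-v" + voice, "-f", text_file, "-w", wav_file]


def synthesize_and_play(samples):
    """Run espeak, then aplay, for each sample; stop on the first error."""
    for voice, text_file, wav_file in samples:
        subprocess.run(espeak_command(voice, text_file, wav_file), check=True)
        subprocess.run(["aplay", wav_file], check=True)


if __name__ == "__main__":
    # Dry run: show the commands; call synthesize_and_play(SAMPLES) to run.
    for voice, text_file, wav_file in SAMPLES:
        print(shlex.join(espeak_command(voice, text_file, wav_file)))
```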
Voice Synthesis with Festival
festival --language italian --tts voice_tests/italiano_latin1.txt
festival --language english --tts voice_tests/ingles.txt
festival --language spanish --tts voice_tests/espanol_latin1.txt
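The festival test files carry a _latin1 suffix while the espeak ones are _utf8, which suggests festival was fed ISO-8859-1 text here. If the texts are kept in UTF-8, a converter along these lines could produce the Latin-1 copies. It is a sketch with hypothetical helper names, and it deliberately fails on characters such as € that Latin-1 lacks instead of silently dropping them.

```python
from pathlib import Path


def to_latin1_bytes(text):
    """Encode text as ISO-8859-1; raise ValueError naming the first
    character that Latin-1 cannot represent."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError as err:
        bad = text[err.start]
        raise ValueError(f"U+{ord(bad):04X} {bad!r} is not in Latin-1") from err


def convert_for_festival(src, dst):
    """Re-encode a UTF-8 text file into a Latin-1 copy for festival."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_bytes(to_latin1_bytes(text))
```

Usage, with hypothetical file names: convert_for_festival("voice_tests/espanol_utf8.txt", "voice_tests/espanol_latin1.txt").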
A festival example session
$ festival
festival> (voice.list)
festival> (voice_Indisys_MP_es_pa_diphone)
festival> (intro-spanish)
festival> (SayText "hola, amigos")
festival> (tts "voice_tests/espanol_latin1.txt" nil)
festival> (quit)