
Cygwin locale for Russian/Russia

On my Windows box, the locale command outputs the following:

LANG=ru_RU
LC_CTYPE="ru_RU"
LC_NUMERIC="ru_RU"
LC_TIME="ru_RU"
LC_COLLATE="ru_RU"
LC_MONETARY="ru_RU"
LC_MESSAGES="ru_RU"
LC_ALL=

This is perfectly fine, except that "no charset" in the locale output means "ISO charset", which for Russian/Russia is ISO-8859-5, an encoding that was never used in practice (historically, DOS used CP866, Windows used the CP1251 ANSI codepage, and various Unices stuck to KOI8-R before the rise of Unicode).

The locale charmap command agrees: it also reports ISO-8859-5.

A short C example also confirms that ISO-8859-5 is used:

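#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void) {
    /* Adopt the locale from the environment (LANG, LC_*). */
    const char *locale = setlocale(LC_ALL, "");
    /* Ask which character set that locale implies. */
    const char *codeset = nl_langinfo(CODESET);
    /* setlocale() returns NULL on failure, so guard the %s argument. */
    printf("locale: %s\n", locale ? locale : "(null)");
    printf("codeset: %s\n", codeset);

    return 0;
}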

outputs

locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
codeset: ISO-8859-5

The Cygwin docs state that

Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.

which is plainly wrong in this case (the Windows ANSI codepage for Russian is CP1251!). Surprisingly, for the Belarusian locale be_BY (Belarusian is an East Slavic language very close to Russian), the default charset is indeed CP1251, which agrees with both the documentation and common sense.

Is this a bug in Cygwin, or am I missing something here?

