On my Windows box, the locale command outputs the following:
LANG=ru_RU
LC_CTYPE="ru_RU"
LC_NUMERIC="ru_RU"
LC_TIME="ru_RU"
LC_COLLATE="ru_RU"
LC_MONETARY="ru_RU"
LC_MESSAGES="ru_RU"
LC_ALL=
This is perfectly fine, except that "no charset" in the locale output means "ISO charset", which is ISO-8859-5 for Russian/Russia and has never been used in practice (historically, DOS used CP866, Windows used the CP1251 ANSI codepage, and various Unices stuck to KOI8-R before the rise of the Unicode era).
The above is consistent with the locale charmap output, which is again ISO-8859-5.
A short C example also confirms that ISO-8859-5 is used:
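#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void) {
    /* Adopt the locale from the environment (LANG, LC_*). */
    const char *locale = setlocale(LC_ALL, "");
    /* Ask libc which character set that locale implies. */
    const char *codeset = nl_langinfo(CODESET);
    printf("locale: %s\n", locale ? locale : "(null)");
    printf("codeset: %s\n", codeset);
    return 0;
}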
This outputs:
locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
codeset: ISO-8859-5
The Cygwin docs state that
Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.
which is plain wrong (the Windows ANSI codepage is CP1251!). Surprisingly, for the Belarusian locale be_BY (Belarusian being an Eastern Slavic language very close to Russian), the default charset is indeed CP1251, which is in accordance with both the documentation and common sense.
Is this a bug in Cygwin, or am I missing something here?