NAME
NLS — Native Language Support Overview
DESCRIPTION
Native Language Support (NLS) provides commands for a single worldwide operating
  system base. An internationalized system has no built-in assumptions or
  dependencies on language-specific or cultural-specific conventions such as:
- Character classifications
- Character comparison rules
- Character collation order
- Numeric and monetary formatting
- Date and time formatting
- Message-text language
- Character sets
All information pertaining to cultural conventions and language is obtained at
  program run time.
“Internationalization” (often abbreviated “i18n”) refers
  to the process by which system software is developed to support multiple
  cultural-specific and language-specific conventions. This is a generalization
  process by which the system is freed from relying only on English strings or
  other English-specific conventions. “Localization” (often
  abbreviated “l10n”) refers to the process by which the user
  environment is customized to handle its input and output appropriately for
  specific language and cultural conventions. This is a specialization process,
  by which generic methods already implemented in an internationalized system
  are used in specific ways. The formal description of cultural conventions for
  some country, together with all associated translations targeted to the native
  language, is called the “locale”.
NetBSD provides extensive support to programmers and
  system developers to enable internationalized software to be developed.
  NetBSD also supplies a large variety of locales for system localization.
All locale information is accessible to programs at run time so that data is
  processed and displayed correctly for specific cultural conventions and
  language.
A locale is divided into categories. A category is a group of language-specific
  and culture-specific conventions as outlined in the list above. ISO C and
  POSIX together specify the following six standard categories supported by
  NetBSD (LC_MESSAGES comes from POSIX rather than ISO C):
- LC_COLLATE: string-collation order information
- LC_CTYPE: character classification, case conversion, and other character
  attributes
- LC_MESSAGES: the format for affirmative and negative responses
- LC_MONETARY: rules and symbols for formatting monetary numeric information
- LC_NUMERIC: rules and symbols for formatting nonmonetary numeric information
- LC_TIME: rules and symbols for formatting time and date information
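As a rough illustration of how these categories map onto the setlocale(3) interface, the following C sketch looks up a category by name and queries the locale currently in effect for it. The table and the functions category_by_name() and current_locale_of() are hypothetical helpers written for this example; they are not part of any NetBSD interface.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table: category name -> setlocale(3) constant. */
static const struct {
    const char *name;
    int value;
} categories[] = {
    { "LC_COLLATE",  LC_COLLATE },
    { "LC_CTYPE",    LC_CTYPE },
    { "LC_MESSAGES", LC_MESSAGES },   /* defined by POSIX, not ISO C */
    { "LC_MONETARY", LC_MONETARY },
    { "LC_NUMERIC",  LC_NUMERIC },
    { "LC_TIME",     LC_TIME },
};

/* Store the constant for a category name in *out; return 0, or -1 if unknown. */
int
category_by_name(const char *name, int *out)
{
    size_t i;

    for (i = 0; i < sizeof(categories) / sizeof(categories[0]); i++) {
        if (strcmp(categories[i].name, name) == 0) {
            *out = categories[i].value;
            return 0;
        }
    }
    return -1;
}

/* Query, without changing it, the locale name in effect for a category. */
const char *
current_locale_of(const char *name)
{
    int category;

    if (category_by_name(name, &category) == -1)
        return NULL;
    /* A NULL locale argument makes setlocale(3) a pure query. */
    return setlocale(category, NULL);
}
```

A program that has not yet called setlocale(3) with a non-NULL locale runs in the C locale, so current_locale_of("LC_TIME") would be expected to return "C" at that point.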
Localization of the system is achieved by setting appropriate values in
  environment variables to identify which locale should be used. The environment
  variables have the same names as their respective locale categories.
  Additionally, the LANG, LC_ALL, and NLSPATH environment variables are used.
  The NLSPATH environment variable specifies a colon-separated list of
  directory names where the message catalog files of the NLS database are
  located. The LC_ALL and LANG environment variables also determine the
  current locale.
The values of the locale environment variables are strings of the form:
	language[_territory][.codeset][@modifier]
 
Valid values for the language field come from the ISO 639 standard, which defines
  two-character codes for many languages. Some common language codes are:
    | Language Name | Code | Language Family | 
  
    | ABKHAZIAN | AB | IBERO-CAUCASIAN | 
  
    | AFAN (OROMO) | OM | HAMITIC | 
  
    | AFAR | AA | HAMITIC | 
  
    | AFRIKAANS | AF | GERMANIC | 
  
    | ALBANIAN | SQ | INDO-EUROPEAN (OTHER) | 
  
    | AMHARIC | AM | SEMITIC | 
  
    | ARABIC | AR | SEMITIC | 
  
    | ARMENIAN | HY | INDO-EUROPEAN (OTHER) | 
  
    | ASSAMESE | AS | INDIAN | 
  
    | AYMARA | AY | AMERINDIAN | 
  
    | AZERBAIJANI | AZ | TURKIC/ALTAIC | 
  
    | BASHKIR | BA | TURKIC/ALTAIC | 
  
    | BASQUE | EU | BASQUE | 
  
    | BENGALI | BN | INDIAN | 
  
    | BHUTANI | DZ | ASIAN | 
  
    | BIHARI | BH | INDIAN | 
  
    | BISLAMA | BI |  | 
  
    | BRETON | BR | CELTIC | 
  
    | BULGARIAN | BG | SLAVIC | 
  
    | BURMESE | MY | ASIAN | 
  
    | BYELORUSSIAN | BE | SLAVIC | 
  
    | CAMBODIAN | KM | ASIAN | 
  
    | CATALAN | CA | ROMANCE | 
  
    | CHINESE | ZH | ASIAN | 
  
    | CORSICAN | CO | ROMANCE | 
  
    | CROATIAN | HR | SLAVIC | 
  
    | CZECH | CS | SLAVIC | 
  
    | DANISH | DA | GERMANIC | 
  
    | DUTCH | NL | GERMANIC | 
  
    | ENGLISH | EN | GERMANIC | 
  
    | ESPERANTO | EO | INTERNATIONAL AUX. | 
  
    | ESTONIAN | ET | FINNO-UGRIC | 
  
    | FAROESE | FO | GERMANIC | 
  
    | FIJI | FJ | OCEANIC/INDONESIAN | 
  
    | FINNISH | FI | FINNO-UGRIC | 
  
    | FRENCH | FR | ROMANCE | 
  
    | FRISIAN | FY | GERMANIC | 
  
    | GALICIAN | GL | ROMANCE | 
  
    | GEORGIAN | KA | IBERO-CAUCASIAN | 
  
    | GERMAN | DE | GERMANIC | 
  
    | GREEK | EL | LATIN/GREEK | 
  
    | GREENLANDIC | KL | ESKIMO | 
  
    | GUARANI | GN | AMERINDIAN | 
  
    | GUJARATI | GU | INDIAN | 
  
    | HAUSA | HA | NEGRO-AFRICAN | 
  
    | HEBREW | HE | SEMITIC | 
  
    | HINDI | HI | INDIAN | 
  
    | HUNGARIAN | HU | FINNO-UGRIC | 
  
    | ICELANDIC | IS | GERMANIC | 
  
    | INDONESIAN | ID | OCEANIC/INDONESIAN | 
  
    | INTERLINGUA | IA | INTERNATIONAL AUX. | 
  
    | INTERLINGUE | IE | INTERNATIONAL AUX. | 
  
    | INUKTITUT | IU |  | 
  
    | INUPIAK | IK | ESKIMO | 
  
    | IRISH | GA | CELTIC | 
  
    | ITALIAN | IT | ROMANCE | 
  
    | JAPANESE | JA | ASIAN | 
  
    | JAVANESE | JV | OCEANIC/INDONESIAN | 
  
    | KANNADA | KN | DRAVIDIAN | 
  
    | KASHMIRI | KS | INDIAN | 
  
    | KAZAKH | KK | TURKIC/ALTAIC | 
  
    | KINYARWANDA | RW | NEGRO-AFRICAN | 
  
    | KIRGHIZ | KY | TURKIC/ALTAIC | 
  
    | KIRUNDI | RN | NEGRO-AFRICAN | 
  
    | KOREAN | KO | ASIAN | 
  
    | KURDISH | KU | IRANIAN | 
  
    | LAOTHIAN | LO | ASIAN | 
  
    | LATIN | LA | LATIN/GREEK | 
  
    | LATVIAN | LV | BALTIC | 
  
    | LINGALA | LN | NEGRO-AFRICAN | 
  
    | LITHUANIAN | LT | BALTIC | 
  
    | MACEDONIAN | MK | SLAVIC | 
  
    | MALAGASY | MG | OCEANIC/INDONESIAN | 
  
    | MALAY | MS | OCEANIC/INDONESIAN | 
  
    | MALAYALAM | ML | DRAVIDIAN | 
  
    | MALTESE | MT | SEMITIC | 
  
    | MAORI | MI | OCEANIC/INDONESIAN | 
  
    | MARATHI | MR | INDIAN | 
  
    | MOLDAVIAN | MO | ROMANCE | 
  
    | MONGOLIAN | MN |  | 
  
    | NAURU | NA |  | 
  
    | NEPALI | NE | INDIAN | 
  
    | NORWEGIAN | NO | GERMANIC | 
  
    | OCCITAN | OC | ROMANCE | 
  
    | ORIYA | OR | INDIAN | 
  
    | PASHTO | PS | IRANIAN | 
  
    | PERSIAN (FARSI) | FA | IRANIAN | 
  
    | POLISH | PL | SLAVIC | 
  
    | PORTUGUESE | PT | ROMANCE | 
  
    | PUNJABI | PA | INDIAN | 
  
    | QUECHUA | QU | AMERINDIAN | 
  
    | RHAETO-ROMANCE | RM | ROMANCE | 
  
    | ROMANIAN | RO | ROMANCE | 
  
    | RUSSIAN | RU | SLAVIC | 
  
    | SAMOAN | SM | OCEANIC/INDONESIAN | 
  
    | SANGHO | SG | NEGRO-AFRICAN | 
  
    | SANSKRIT | SA | INDIAN | 
  
    | SCOTS GAELIC | GD | CELTIC | 
  
    | SERBIAN | SR | SLAVIC | 
  
    | SERBO-CROATIAN | SH | SLAVIC | 
  
    | SESOTHO | ST | NEGRO-AFRICAN | 
  
    | SETSWANA | TN | NEGRO-AFRICAN | 
  
    | SHONA | SN | NEGRO-AFRICAN | 
  
    | SINDHI | SD | INDIAN | 
  
    | SINGHALESE | SI | INDIAN | 
  
    | SISWATI | SS | NEGRO-AFRICAN | 
  
    | SLOVAK | SK | SLAVIC | 
  
    | SLOVENIAN | SL | SLAVIC | 
  
    | SOMALI | SO | HAMITIC | 
  
    | SPANISH | ES | ROMANCE | 
  
    | SUNDANESE | SU | OCEANIC/INDONESIAN | 
  
    | SWAHILI | SW | NEGRO-AFRICAN | 
  
    | SWEDISH | SV | GERMANIC | 
  
    | TAGALOG | TL | OCEANIC/INDONESIAN | 
  
    | TAJIK | TG | IRANIAN | 
  
    | TAMIL | TA | DRAVIDIAN | 
  
    | TATAR | TT | TURKIC/ALTAIC | 
  
    | TELUGU | TE | DRAVIDIAN | 
  
    | THAI | TH | ASIAN | 
  
    | TIBETAN | BO | ASIAN | 
  
    | TIGRINYA | TI | SEMITIC | 
  
    | TONGA | TO | OCEANIC/INDONESIAN | 
  
    | TSONGA | TS | NEGRO-AFRICAN | 
  
    | TURKISH | TR | TURKIC/ALTAIC | 
  
    | TURKMEN | TK | TURKIC/ALTAIC | 
  
    | TWI | TW | NEGRO-AFRICAN | 
  
    | UIGUR | UG |  | 
  
    | UKRAINIAN | UK | SLAVIC | 
  
    | URDU | UR | INDIAN | 
  
    | UZBEK | UZ | TURKIC/ALTAIC | 
  
    | VIETNAMESE | VI | ASIAN | 
  
    | VOLAPUK | VO | INTERNATIONAL AUX. | 
  
    | WELSH | CY | CELTIC | 
  
    | WOLOF | WO | NEGRO-AFRICAN | 
  
    | XHOSA | XH | NEGRO-AFRICAN | 
  
    | YIDDISH | YI | GERMANIC | 
  
    | YORUBA | YO | NEGRO-AFRICAN | 
  
    | ZHUANG | ZA |  | 
  
    | ZULU | ZU | NEGRO-AFRICAN | 
For example, the locale for the Danish language spoken in Denmark using the ISO
  8859-1 character set is da_DK.ISO8859-1. The da stands for the Danish language
  and the DK stands for Denmark. The short form of da_DK is sufficient to
  indicate this locale.
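The locale name format described above can be taken apart mechanically. The following C sketch splits a name such as da_DK.ISO8859-1 into its four fields. The struct locale_name type, the field sizes, and the function parse_locale_name() are assumptions made for this example; the system's own locale code is not claimed to work this way.

```c
#include <string.h>

/* Hypothetical helper type: the four fields of a locale name. */
struct locale_name {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy at most dstsize-1 bytes of src[0..len) into dst and terminate it. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Missing optional fields are left as empty strings.
 * Returns 0 on success, -1 if the language field is empty.
 */
int
parse_locale_name(const char *name, struct locale_name *out)
{
    size_t n;

    memset(out, 0, sizeof(*out));

    /* The language runs up to the first separator, if any. */
    n = strcspn(name, "_.@");
    if (n == 0)
        return -1;
    copy_field(out->language, sizeof(out->language), name, n);
    name += n;

    if (*name == '_') {
        name++;
        n = strcspn(name, ".@");
        copy_field(out->territory, sizeof(out->territory), name, n);
        name += n;
    }
    if (*name == '.') {
        name++;
        n = strcspn(name, "@");
        copy_field(out->codeset, sizeof(out->codeset), name, n);
        name += n;
    }
    if (*name == '@') {
        name++;
        copy_field(out->modifier, sizeof(out->modifier), name, strlen(name));
    }
    return 0;
}
```

For da_DK.ISO8859-1 this yields the language da, the territory DK, the codeset ISO8859-1, and an empty modifier. For the short form da_DK the codeset is simply left empty.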
The environment variable settings are queried by their priority level in the
  following manner:
- If the LC_ALL environment variable is set, all six categories use the locale
  it specifies.
- If the LC_ALL environment variable is not set, each individual category uses
  the locale specified by its corresponding environment variable.
- If the LC_ALL environment variable is not set, and a value for a particular
  LC_* environment variable is not set, the value of the LANG environment
  variable specifies the default locale for all categories. Only the LANG
  environment variable should be set in /etc/profile, since this makes it
  easiest for the user to override the system default using the individual
  LC_* variables.
- If the LC_ALL environment variable is not set, a value for a particular LC_*
  environment variable is not set, and the value of the LANG environment
  variable is not set, the locale for that specific category defaults to the
  C locale. The C or POSIX locale assumes the ASCII character set and defines
  information for the six categories.
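The priority order above can be sketched in C as a plain environment lookup. This is only an illustration of the rules: in a real program, calling setlocale(3) with an empty locale string performs this resolution itself. The function effective_locale() and the helper env_is_set() are hypothetical names, and treating an empty value as unset follows POSIX and is an assumption beyond what this page states.

```c
#include <stdlib.h>

/* Treat a variable as set only if it exists and is non-empty (assumption). */
static int
env_is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/*
 * Return the locale name that applies to one category, given the
 * name of its environment variable (for example "LC_TIME").
 */
const char *
effective_locale(const char *category_name)
{
    const char *value;

    /* 1. LC_ALL overrides everything. */
    value = getenv("LC_ALL");
    if (env_is_set(value))
        return value;

    /* 2. Otherwise the category's own variable is used. */
    value = getenv(category_name);
    if (env_is_set(value))
        return value;

    /* 3. Otherwise LANG supplies the default. */
    value = getenv("LANG");
    if (env_is_set(value))
        return value;

    /* 4. With nothing set, the C locale applies. */
    return "C";
}
```

With LANG set to da_DK and LC_TIME set to fr_FR, effective_locale("LC_TIME") returns fr_FR while effective_locale("LC_NUMERIC") returns da_DK. Setting LC_ALL changes both results.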
Character Sets
A character is any symbol used for the organization, control, or representation
  of data. A group of such symbols used to describe a particular language makes
  up a character set. It is the encoding values in a character set that provide
  the interface between the system and its input and output devices.
The following character sets are supported in NetBSD:
- ASCII: The American Standard Code for Information Interchange (ASCII)
  standard specifies 128 Roman characters and control codes, encoded in a
  7-bit character encoding scheme.
- ISO 8859 family: Industry-standard character sets specified by the ISO/IEC
  8859 standard. The standard is divided into 15 numbered parts, with each
  part specifying broad script similarities. Examples include Western
  European, Central European, Arabic, Cyrillic, Hebrew, Greek, and Turkish.
  The character sets use an 8-bit character encoding scheme which is
  compatible with the ASCII character set.
- Unicode: The Unicode character set is the full set of known abstract
  characters of all real-world scripts. It can be used in environments where
  multiple scripts must be processed simultaneously. Unicode is compatible
  with ISO 8859-1 (Western European) and ASCII. Many character encoding
  schemes are available for Unicode, including UTF-8, UTF-16 and UTF-32.
  These encoding schemes are multi-byte encodings. The UTF-8 encoding scheme
  uses an 8-bit, variable-width encoding that is compatible with ASCII. The
  UTF-16 encoding scheme uses a 16-bit, variable-width encoding. The UTF-32
  encoding scheme uses a 32-bit, fixed-width encoding.
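To make the variable-width nature of UTF-8 concrete, the following C sketch encodes a single Unicode code point into one to four bytes. The function name utf8_encode() is hypothetical; programs would normally rely on the multi-byte conversion functions of the C library rather than on a hand-written encoder.

```c
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if the
 * value is a surrogate or lies beyond U+10FFFF.
 */
size_t
utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* ASCII range: one byte, identical to the ASCII code. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;       /* surrogates are not valid code points */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The first branch is what makes UTF-8 compatible with ASCII: every code point below U+0080 is stored as the same single byte it has in ASCII.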
Font Sets
A font set contains the glyphs to be displayed on the screen for a corresponding
  character in a character set. A display must support a suitable font to
  display a character set. If suitable fonts are available to the X server, then
  X clients can include support for different character sets. xterm(1)
  includes support for Unicode with UTF-8 encoding. xfd(1) is useful for
  displaying all the characters in an X font.
The NetBSD wscons(4) console provides support for loading fonts using the
  wsfontload(8) utility. Currently, only fonts for the ISO8859-1 family of
  character sets are supported.
Internationalization for Programmers
To facilitate translations of messages into various languages and to make the
  translated messages available to the program based on a user's locale, it is
  necessary to keep messages separate from the programs and provide them in the
  form of message catalogs that a program can access at run time.
Access to locale information is provided through the setlocale(3) and
  nl_langinfo(3) interfaces. See their respective man pages for further
  information.
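A minimal sketch of these two interfaces in C follows. It selects a locale with setlocale(3) and then reads a few items with nl_langinfo(3). The function describe_locale() is a hypothetical helper; CODESET, RADIXCHAR, and D_FMT are langinfo items defined by POSIX, and the values printed depend on the locales installed on the system.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Switch every category to the named locale and print a few of its
 * properties. An empty name ("") takes the locale from the environment.
 * Returns 0 on success, -1 if the locale is not available.
 */
int
describe_locale(const char *name, FILE *fp)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;

    fprintf(fp, "codeset:     %s\n", nl_langinfo(CODESET));
    fprintf(fp, "radix char:  %s\n", nl_langinfo(RADIXCHAR));
    fprintf(fp, "date format: %s\n", nl_langinfo(D_FMT));
    return 0;
}
```

A typical program would call describe_locale("", stdout) once at startup, so that the user's LC_* and LANG settings take effect.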
Message source files containing application messages are created by the
  programmer and converted to message catalogs. These catalogs are used by the
  application to retrieve and display messages, as needed.
NetBSD supports two message catalog interfaces: the X/Open catgets(3)
  interface and the Uniforum gettext(3) interface. The catgets(3) interface
  has the advantage that it belongs to a standard which is well supported.
  Unfortunately the interface is complicated to use and maintenance of the
  catalogs is difficult. The implementation also doesn't support different
  character sets. The gettext(3) interface has not been standardized yet;
  however, it is supported by an increasing number of systems. It also
  provides many additional tools which make programming and catalog
  maintenance much easier.
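As an illustration of the catgets(3) interface, the following C sketch opens a catalog and retrieves a message, falling back to a built-in English string when the catalog or the message is missing. The catalog name "example", the set and message numbers, and the functions lookup_message() and greet() are all hypothetical. The comment shows what a matching message source file for gencat(1) might look like.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * A message source file for gencat(1) matching the numbers below
 * could look like this (shown here only as a comment):
 *
 *     $set 1
 *     1 Hello, world
 *     2 Goodbye
 */
#define MSG_SET     1   /* hypothetical set number */
#define MSG_HELLO   1   /* hypothetical message numbers */
#define MSG_GOODBYE 2

/* Fetch a message; catgets(3) returns the fallback if the lookup fails. */
const char *
lookup_message(nl_catd catd, int msg_id, const char *fallback)
{
    return catgets(catd, MSG_SET, msg_id, fallback);
}

/* Open the hypothetical catalog "example", print a greeting, and close it. */
void
greet(FILE *fp)
{
    /* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES. */
    nl_catd catd = catopen("example", NL_CAT_LOCALE);

    fprintf(fp, "%s\n", lookup_message(catd, MSG_HELLO, "Hello, world"));

    if (catd != (nl_catd)-1)
        catclose(catd);
}
```

Because catgets(3) hands back the default string on any failure, the program keeps working in English when no translated catalog can be found through NLSPATH.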
Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode, or may
  contain state (i.e., adjacent characters are dependent). ISO C specifies a set
  of functions using 'wide characters' which can handle multi-byte encodings
  properly. The behaviour of these functions is affected by the LC_CTYPE
  category of the current locale.
A wide character is specified in ISO C as being a fixed number of bits wide and
  is stateless. There are two types for wide characters: wchar_t and wint_t.
  wchar_t is a type which can contain one wide character and behaves as the
  'char' type does for a single-byte character. wint_t can contain one wide
  character or WEOF (wide EOF).
There are functions that operate on wchar_t, and substitute for functions
  operating on 'char'. See wmemchr(3) and towlower(3) for details. There are
  some additional functions that operate on wchar_t. See wctype(3) and
  wctrans(3) for details.
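The following C sketch shows two of these wide-character counterparts in use: towlower(3) in place of tolower(3), and iswalpha(3) in place of isalpha(3). The function names wcs_lower() and wcs_count_alpha() are hypothetical helpers for this example. Which characters count as alphabetic depends on the LC_CTYPE category of the current locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Convert a wide string to lower case in place using towlower(3). */
void
wcs_lower(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towlower((wint_t)*s);
}

/* Count the alphabetic wide characters, as classified by LC_CTYPE. */
size_t
wcs_count_alpha(const wchar_t *s)
{
    size_t n = 0;

    for (; *s != L'\0'; s++) {
        if (iswalpha((wint_t)*s))
            n++;
    }
    return n;
}
```

Since each wchar_t holds one complete character, the loops can advance one element at a time without tracking any shift state.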
Wide characters should be used for all I/O processing which may rely on
  locale-specific strings. The two primary issues requiring special use of wide
  characters are:
- All I/O is performed using multibyte characters. Input data is converted
  into wide characters immediately after reading and data for output is
  converted from wide characters to multi-byte encoding immediately before
  writing. Conversion is controlled by the mbstowcs(3), mbsrtowcs(3),
  wcstombs(3), wcsrtombs(3), mblen(3), mbrlen(3), and mbsinit(3) functions.
- Wide characters are used directly for I/O, using getwchar(3), fgetwc(3),
  getwc(3), ungetwc(3), fgetws(3), putwchar(3), fputwc(3), putwc(3), and
  fputws(3). They are also used for formatted I/O functions for wide
  characters such as fwscanf(3), wscanf(3), swscanf(3), fwprintf(3),
  wprintf(3), swprintf(3), vfwprintf(3), vwprintf(3), and vswprintf(3), and
  with the wide-character conversion specifiers %lc, %C, %ls, and %S for the
  conventional formatted I/O functions.
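The first approach, converting input to wide characters right after reading it, can be sketched in C with mbsrtowcs(3). The function mb_to_wide() is a hypothetical helper: it measures the string in a first pass, allocates a buffer, and converts in a second pass. The result depends on the LC_CTYPE category, so a program would normally call setlocale(3) before using it.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert a multi-byte string to a newly allocated wide string.
 * Returns NULL on an invalid sequence or allocation failure;
 * the caller frees the result with free(3).
 */
wchar_t *
mb_to_wide(const char *s)
{
    mbstate_t state;
    const char *p;
    wchar_t *w;
    size_t n;

    /* First pass: a NULL destination only counts the wide characters. */
    memset(&state, 0, sizeof(state));
    p = s;
    n = mbsrtowcs(NULL, &p, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    w = malloc((n + 1) * sizeof(*w));
    if (w == NULL)
        return NULL;

    /* Second pass: convert from the initial shift state again. */
    memset(&state, 0, sizeof(state));
    p = s;
    if (mbsrtowcs(w, &p, n + 1, &state) == (size_t)-1) {
        free(w);
        return NULL;
    }
    return w;
}
```

The reverse conversion before output follows the same two-pass pattern with wcsrtombs(3).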
SEE ALSO
gencat(1), xfd(1), xterm(1), catgets(3), gettext(3), nl_langinfo(3),
  setlocale(3), wsfontload(8)
BUGS
This man page is incomplete.