Supported Languages and Systems

The goal of Romanization.NET is to provide a simple, extensive way to romanize widely used languages as accurately as possible.

Below is a list of all supported languages and systems, with explanations of caveats and limitations where necessary. Languages are listed in ascending alphabetical order.

Chinese

Hànyǔ Pīnyīn

The Hànyǔ Pīnyīn system is considered a Readings System, and supports all Hànzì characters in the Unihan database.

The reading types to use can be specified; by default, all of them are used.

The order in which readings are returned is as follows:

Hànyǔ Pīnyīn
Hànyǔ Pínlǜ - Hànyǔ Pīnyīn as it appeared in Xiàndài Hànyǔ Pínlǜ Cídiǎn
XHC - Hànyǔ Pīnyīn as it appeared in Xiàndài Hànyǔ Cídiǎn
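The Unihan database keeps these readings in separate fields. The sketch below assumes the three sources listed above correspond to the Unihan fields kHanyuPinyin, kHanyuPinlu and kXHC1983, and shows how readings could be gathered in that fixed order, using all reading types unless a subset is requested. The function names `parse_field` and `get_readings` are hypothetical (not the library's API), and the sample field values are illustrative.

```python
# Hypothetical sketch: collect Mandarin readings for one character from three
# Unihan fields, always in the order listed above.
READING_TYPES = ("kHanyuPinyin", "kHanyuPinlu", "kXHC1983")


def parse_field(field: str, value: str) -> list[str]:
    """Extract bare Pinyin readings from one Unihan field value."""
    readings = []
    for entry in value.split():
        if field == "kHanyuPinlu":
            # Format: reading(frequency), e.g. "de(75596)"
            readings.append(entry.split("(", 1)[0])
        else:
            # Format: location[,location...]:reading[,reading...]
            _, _, rhs = entry.partition(":")
            readings.extend(r for r in rhs.split(",") if r)
    return readings


def get_readings(fields: dict[str, str],
                 types: tuple[str, ...] = READING_TYPES) -> list[tuple[str, str]]:
    """Return (reading type, reading) pairs; every type is used by default."""
    result = []
    for field in READING_TYPES:  # output order is fixed, whatever order `types` has
        if field in types and field in fields:
            result.extend((field, r) for r in parse_field(field, fields[field]))
    return result


# Illustrative values in the Unihan field formats.
sample = {
    "kXHC1983": "0001.010:yī",
    "kHanyuPinyin": "10001.010:yī",
    "kHanyuPinlu": "yī(32747)",
}
print(get_readings(sample))
print(get_readings(sample, types=("kXHC1983",)))  # [('kXHC1983', 'yī')]
```

Restricting `types` only filters the output. It never reorders it, which keeps results comparable across calls.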

Japanese

Modified Hepburn

This system is a revised version of the romanization system first published by James Curtis Hepburn, and is the one in most widespread use in Japan.

It only supports Kana (Hiragana and Katakana), not Kanji. See below for Kanji support.

It supports the syllabic n (ん), long consonants (marked with the sokuon, っ), and long vowels, but only where they are marked with the chōonpu (ー).
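To make these three features concrete, here is a minimal Python sketch. It is not the library's implementation and covers only a small, illustrative subset of kana. It folds Katakana to Hiragana, doubles the consonant after っ, writes ん as n' before a vowel or y, and turns ー into a macron on the preceding vowel. `romanize_kana` and its tables are hypothetical names.

```python
# Minimal sketch of Modified Hepburn for kana; SYLLABLES is an illustrative
# subset only (hypothetical, not the library's data).
SYLLABLES = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "た": "ta", "ち": "chi", "つ": "tsu", "て": "te", "と": "to",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
    "は": "ha", "ひ": "hi", "ふ": "fu", "へ": "he", "ほ": "ho",
    "ま": "ma", "み": "mi", "む": "mu", "め": "me", "も": "mo",
    "や": "ya", "ゆ": "yu", "よ": "yo",
    "ら": "ra", "り": "ri", "る": "ru", "れ": "re", "ろ": "ro",
    "わ": "wa", "が": "ga", "ば": "ba", "ぱ": "pa",
    "きょ": "kyo", "しゃ": "sha", "ちゃ": "cha",
}
MACRON = {"a": "ā", "i": "ī", "u": "ū", "e": "ē", "o": "ō"}
CONSONANTS = set("bcdfghjkmnprstwyz")
VOWELS_AND_Y = {"a", "i", "u", "e", "o", "y"}


def to_hiragana(text: str) -> str:
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana equivalents.
    return "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c
                   for c in text)


def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in SYLLABLES:  # digraphs such as きょ first
            tokens.append(SYLLABLES[pair])
            i += 2
        else:
            # っ, ん, ー and unknown characters pass through unchanged.
            tokens.append(SYLLABLES.get(text[i], text[i]))
            i += 1
    return tokens


def romanize_kana(text: str) -> str:
    tokens = tokenize(to_hiragana(text))
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "っ":
            # Sokuon: double the next consonant (っち is written "tch").
            if nxt.startswith("ch"):
                out.append("t")
            elif nxt[:1] in CONSONANTS:
                out.append(nxt[0])
        elif tok == "ん":
            # Syllabic n: apostrophe before a vowel or y to avoid ambiguity.
            out.append("n'" if nxt[:1] in VOWELS_AND_Y else "n")
        elif tok == "ー":
            # Chōonpu: lengthen the previous vowel with a macron.
            if out and out[-1][-1:] in MACRON:
                out[-1] = out[-1][:-1] + MACRON[out[-1][-1]]
        else:
            out.append(tok)
    return "".join(out)


print(romanize_kana("ラーメン"))  # rāmen
print(romanize_kana("まっちゃ"))  # matcha
print(romanize_kana("せんぱい"))  # senpai
print(romanize_kana("きんよう"))  # kin'you
```

Note that きんよう comes out as kin'you and not kin'yō. Like the behaviour described under Limitations, this sketch never merges pairs of vowels.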

Limitations

In the Modified Hepburn system, certain pairs of consecutive vowels in the romanized result are combined into a single long vowel, often indicated with a macron (aa => ā, for example).

The problem is that, according to the system's specification, whether two vowels are combined depends on whether they belong to different morphemes, and this is not something the program can know. For example, みずうみ ("lake") is made up of mizu + umi, so it is written mizuumi, not mizūmi. Some vowel combinations do not have this requirement and could be done safely. However, to keep the output consistent, no vowel combination is done at all.
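If morpheme boundaries were known, the combination step could look like the following sketch. It assumes the caller passes the romanized word already split into morphemes, which is exactly the information the library lacks. The set of merged pairs is a simplified assumption for illustration, and `combine_long_vowels` is a hypothetical name.

```python
import re

# Simplified, assumed set of vowel pairs to merge; "ii" and "ei" are left as-is.
LONG_VOWELS = {"aa": "ā", "uu": "ū", "ee": "ē", "oo": "ō", "ou": "ō"}
PAIR = re.compile("|".join(LONG_VOWELS))


def combine_long_vowels(morphemes: list[str]) -> str:
    """Merge vowel pairs inside each romanized morpheme, never across boundaries."""
    return "".join(PAIR.sub(lambda match: LONG_VOWELS[match.group(0)], part)
                   for part in morphemes)


print(combine_long_vowels(["tou", "kyou"]))  # tōkyō
print(combine_long_vowels(["mizu", "umi"]))  # mizuumi
print(combine_long_vowels(["mizuumi"]))      # mizūmi (wrong without the boundary)
```

The last call shows what goes wrong when the boundary is unknown. Running the same rule on the unsegmented word wrongly merges the two u's.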

Kanji (Kun & On) Readings

Kanji are effectively Japan's Hànzì: they share many of the same considerations, and even many of the same symbols.

Kana are syllabaries: each character is one syllable, and therefore maps neatly to a distinct sound. Kanji, by contrast, are symbols in their own right, and a single Kanji can represent a variable number of syllables. To complicate things further, each Kanji can have multiple readings, in both Kun'yomi and On'yomi.

This is why the library treats this system as a Readings System: for each character, you can get every known reading from the Unihan database.

The two reading types supported are:

Kun'yomi - often referred to as just Kun - the native reading
On'yomi - often referred to as just On - the Sino-Japanese reading
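This sketch shows what a Readings System lookup for Kanji could look like. It assumes readings come from Unihan's kJapaneseKun and kJapaneseOn fields, which hold space-separated, upper-case romanized readings. `KanjiReading` and `get_kanji_readings` are hypothetical names, and the sample values for 山 ("mountain") are illustrative.

```python
from enum import Enum


class KanjiReading(Enum):
    KUN = "kJapaneseKun"  # Kun'yomi, the native reading
    ON = "kJapaneseOn"    # On'yomi, the Sino-Japanese reading


ALL_READINGS = (KanjiReading.KUN, KanjiReading.ON)


def get_kanji_readings(fields: dict[str, str],
                       types: tuple[KanjiReading, ...] = ALL_READINGS
                       ) -> dict[KanjiReading, list[str]]:
    """Return every reading of each requested type, lower-cased."""
    return {t: fields.get(t.value, "").lower().split() for t in types}


# Illustrative Unihan-style field values for 山.
yama = {"kJapaneseKun": "YAMA", "kJapaneseOn": "SAN SEN"}
readings = get_kanji_readings(yama)
print(readings[KanjiReading.KUN])  # ['yama']
print(readings[KanjiReading.ON])   # ['san', 'sen']
```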

Additional Notes

Because Kanji often appear alongside supplementary Kana, the system also has a small convenience function that romanizes both Kanji and Kana, using the system of your choice for Kana.
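One way such a convenience function could work is to split the text into runs of Kanji and non-Kanji characters and pass each run to a different romanizer. This sketch assumes that Kanji are exactly the characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF). `romanize_mixed` is a hypothetical name, and the two romanizers are stubbed callables used for illustration only.

```python
from itertools import groupby
from typing import Callable


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs block only; extension blocks and 々 are not covered.
    return "\u4e00" <= ch <= "\u9fff"


def romanize_mixed(text: str,
                   romanize_kanji: Callable[[str], str],
                   romanize_kana: Callable[[str], str]) -> str:
    """Romanize runs of Kanji and runs of everything else with separate functions."""
    parts = []
    for kanji_run, chars in groupby(text, key=is_kanji):
        run = "".join(chars)
        parts.append(romanize_kanji(run) if kanji_run else romanize_kana(run))
    return "".join(parts)


# Stub romanizers for the example only: a fixed reading per Kanji, and a
# lookup in place of a real Kana system.
KANJI_STUB = {"食": "ta"}
KANA_STUB = {"べる": "beru"}
print(romanize_mixed("食べる",
                     lambda run: "".join(KANJI_STUB[c] for c in run),
                     lambda run: KANA_STUB[run]))  # taberu
```

Picking the right reading for each Kanji in context is the hard part, and the stub sidesteps it.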

Korean

Revised Romanization of Korean

Revised Romanization of Korean is the most commonly used system, and does not use accents or macrons.

The system has a few provisions for certain kinds of content, which change the romanization somewhat:

In given names, certain special pairs of Jamo are not combined
Whether aspiration is reflected in the romanization depends on whether the word is a noun
Hyphenating syllables can be helpful, and occasionally disambiguates words that would otherwise share a romanization (ga-eul vs. gae-ul)

The library's implementation of this system supports all of these provisions as options that can be supplied to the function.
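The hyphenation option is easiest to see at the syllable level. The sketch below decomposes precomposed Hangul syllables into Jamo using the standard Unicode arithmetic, where a syllable is 0xAC00 + (initial * 21 + medial) * 28 + final. It then romanizes each syllable on its own. It deliberately ignores every sound-change rule and handles only the seven basic final consonants, so it is not a full Revised Romanization implementation. `romanize_korean` and its `hyphenate` parameter are hypothetical.

```python
# Revised Romanization values for the 19 initial and 21 medial Jamo, in Unicode order.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
            "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Only the seven basic final consonants (ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅇ), keyed by final index;
# index 0 means "no final consonant".
FINALS = {0: "", 1: "k", 4: "n", 7: "t", 8: "l", 16: "m", 17: "p", 21: "ng"}


def decompose(syllable: str) -> tuple[int, int, int]:
    """Split a precomposed Hangul syllable into (initial, medial, final) indices."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return index // (21 * 28), (index % (21 * 28)) // 28, index % 28


def romanize_korean(word: str, hyphenate: bool = False) -> str:
    """Romanize syllable by syllable, ignoring all sound-change rules."""
    parts = []
    for syllable in word:
        initial, medial, final = decompose(syllable)
        if final not in FINALS:
            raise ValueError(f"final consonant not covered by this sketch: {syllable!r}")
        parts.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
    return ("-" if hyphenate else "").join(parts)


print(romanize_korean("가을"), romanize_korean("개울"))  # gaeul gaeul
print(romanize_korean("가을", hyphenate=True))  # ga-eul
print(romanize_korean("개울", hyphenate=True))  # gae-ul
```

Without hyphens, 가을 ("autumn") and 개울 ("stream") collapse to the same string. With them, the syllable boundary survives.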

Hanja => Hangeul Readings

Hanja, like Kanji, came from China and share their symbols with Hànzì. Because some Hanja have multiple possible readings, this is also considered a Readings System.

As with the other Hànzì-related characters, the supported Hanja are all from the Unihan database.

Only one reading type is supported: the equivalent Hangeul pronunciation of each Hanja character.

Additional Notes

Because the goal of this package is, as the name suggests, romanization, the implementation also includes a function for first converting the Hanja to Hangeul, then romanizing the Hangeul using the system of your choice.
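The two-step pipeline can be sketched as follows. The reading table is a tiny hypothetical stand-in for the Unihan data, and the sketch always takes the first listed reading, so choosing the right reading in context is out of scope. The Hangeul romanizer is passed in as a callable so that any system can be plugged in. All names are hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for the Unihan Hanja readings (Hangeul syllables).
HANJA_READINGS = {"韓": ["한"], "國": ["국"], "樂": ["락", "악", "요"]}


def hanja_to_hangeul(text: str, readings: dict[str, list[str]]) -> str:
    # Take the first listed reading; characters without an entry pass through.
    return "".join(readings.get(ch, [ch])[0] for ch in text)


def romanize_hanja(text: str, readings: dict[str, list[str]],
                   romanize_hangeul: Callable[[str], str]) -> str:
    # Step 1: Hanja -> Hangeul; step 2: romanize with any Hangeul system.
    return romanize_hangeul(hanja_to_hangeul(text, readings))


def stub_romanizer(hangeul: str) -> str:
    # Stands in for a real Hangeul romanization system in this example.
    return {"한국": "hanguk"}[hangeul]


print(hanja_to_hangeul("韓國", HANJA_READINGS))                # 한국
print(romanize_hanja("韓國", HANJA_READINGS, stub_romanizer))  # hanguk
```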

Russian

At the time of writing, Russian has no single international standard for romanization (transliteration). Instead, different systems are used by different groups for different purposes. As a result, many systems are implemented here, all with very similar transliterations.

BGN/PCGN

Developed jointly by the United States Board on Geographic Names (BGN) and the Permanent Committee on Geographical Names for British Official Use (PCGN), this system is designed to be easy for anglophones to pronounce.

Because of this, it is likely a solid choice for romanizing text specifically for English speakers (a US, Canadian, or UK audience).
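As a rough illustration, the following sketch transliterates lower-case Russian letter by letter using BGN/PCGN values. It also applies one of the system's English-friendly rules: е and ё gain a leading y at the start of a word and after a vowel, й, ъ or ь. It omits capitalization and the optional middle dot that BGN/PCGN uses to separate sequences such as тс (t·s) from ц (ts). `romanize_bgn_pcgn` is a hypothetical name.

```python
# Letter-by-letter BGN/PCGN values for lower-case Russian (sketch only).
BGN_PCGN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "\u00eb",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "\u201d", "ы": "y", "ь": "\u2019", "э": "e", "ю": "yu", "я": "ya",
}
# After these letters (or at the start of a word) е and ё are written ye / yë.
Y_TRIGGERS = set("аеёиоуыэюяйъь")


def romanize_bgn_pcgn(text: str) -> str:
    out = []
    prev = ""
    for ch in text.lower():
        latin = BGN_PCGN.get(ch, ch)  # non-Russian characters pass through
        if ch in ("е", "ё") and (not prev.isalpha() or prev in Y_TRIGGERS):
            latin = "y" + latin
        out.append(latin)
        prev = ch
    return "".join(out)


print(romanize_bgn_pcgn("Елена"))   # yelena
print(romanize_bgn_pcgn("хрущёв"))  # khrushchëv
```

The leading y makes the iotated sound explicit for English readers, which is the kind of choice that makes this system easy for anglophones to pronounce.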

GOST 7.79-2000 System A / ISO 9

GOST 7.79-2000(A) focuses on mapping one Cyrillic character to one Latin character, potentially with diacritics.

ISO 9:1995 is the ISO's current standard for transliterating Cyrillic characters into Latin characters, and is based on ISO/R 9:1968.

Because the two systems are functionally identical, this library combines them into one under the name GOST 7.79-2000 System A. This keeps the naming consistent with the other GOST systems included, since it would be strange to offer GOST 7.79-2000 System B while listing System A under a different name.
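System A / ISO 9 maps every Russian letter to a distinct single Latin character, so the mapping can be inverted exactly. This sketch covers only the lower-case Russian alphabet (other Cyrillic letters and capitals are omitted) and writes the diacritic characters as Unicode escapes. The table and function names are hypothetical.

```python
# GOST 7.79-2000 System A / ISO 9 values for the lower-case Russian alphabet.
GOST_A = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "\u00eb",
    "ж": "\u017e", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "\u010d", "ш": "\u0161", "щ": "\u015d",
    "ъ": "\u02ba", "ы": "y", "ь": "\u02b9", "э": "\u00e8", "ю": "\u00fb", "я": "\u00e2",
}
# One-to-one mapping, so inverting the table gives an exact reverse transliteration.
GOST_A_REVERSE = {latin: cyr for cyr, latin in GOST_A.items()}


def transliterate(text: str, table: dict[str, str]) -> str:
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


latin = transliterate("съёмка", GOST_A)
print(latin)                                  # sʺëmka
print(transliterate(latin, GOST_A_REVERSE))   # съёмка
```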

GOST 7.79-2000 System B

In contrast to System A, GOST 7.79-2000(B) maps each Cyrillic character to one or more Latin characters (e.g. щ -> shh), without the use of diacritics.
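The difference between the two systems is easiest to see on letters that need diacritics in System A. The sketch uses a deliberately partial table, with letters treated the same by both systems plus a few that differ. It also omits System B's context-dependent rules, such as the one for ц. All names are hypothetical.

```python
# Letters transliterated identically by both systems (partial list).
SHARED = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "з": "z",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
}
# System A: exactly one Latin character, with diacritics where needed.
SYSTEM_A = {**SHARED, "ж": "\u017e", "ч": "\u010d", "ш": "\u0161",
            "щ": "\u015d", "ю": "\u00fb", "я": "\u00e2"}
# System B: no diacritics, so some letters need several Latin characters.
SYSTEM_B = {**SHARED, "ж": "zh", "ч": "ch", "ш": "sh",
            "щ": "shh", "ю": "yu", "я": "ya"}


def transliterate(text: str, table: dict[str, str]) -> str:
    # Letters outside the (partial) table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in text)


for word in ("щука", "чаща", "юла"):
    print(word, transliterate(word, SYSTEM_A), transliterate(word, SYSTEM_B))
```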

GOST 16876-71 Table 1 (UNGEGN)

GOST 16876-71(1) focuses on mapping one Cyrillic character to one Latin character, potentially with diacritics.

It was recommended by the United Nations Group of Experts on Geographical Names (UNGEGN) in 1987.

GOST 16876-71 was last updated in 1980, and in 2002 the Russian Federation abandoned it in favour of GOST 7.79-2000.

The system was put into effect by the Russian government in 2013 for all citizen passports.

General Road Signs

This is the system generally used to romanize road signs and similar signage.

This originally followed GOST 10807-78 (tables 17, 18), but now follows GOST R 52290-2004 (tables Г.4, Г.5).
