Module:Wp/nod/Translit2
English Version
Templates and Intended Direct Use
The prefix Wp/nod used in the Incubator has been omitted.
Recommended invocation | Function | Purpose | Testcases |
---|---|---|---|
{{xlit2}} | trpage | Transliterates a page from the Tai Tham script to the Thai script, mapping consonants etymologically. | {{xlit2/testcases/control}} |
{{xlit3}} | trphage | Transliterates a page from the Tai Tham script to the Thai script, mapping consonants phonetically. It is very much in development. | {{xlit2/testcases/control}} |
{{ᨩᩨ᩵ᨲ᩠ᩅᩫ}} | lettername | Returns name of letter (consonant or independent vowel) as convenient for naming Unicode codepoint. See {{ᨩᩨ᩵1A60}} for combining marks. | {{ᨩᩨ᩵ᨲ᩠ᩅᩫ/testcases}} |
{{ᨿᩣ᩠ᨠ|word}} | hardword | Special handling for when transliteration rules fail. | |
{{#invoke:Translit2|tr|string}} | tr | Transliterates a string from the Tai Tham script to the Thai script, mapping consonants etymologically. | {{xlit2/testcases}} |
{{#invoke:Translit2|tr|string|true}} | tr | Transliterates a string from the Tai Tham script to the Thai script, mapping consonants phonetically. It is very much in development. | {{xlit3/testcases}} |
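Both `tr` rows rely on one function that accepts either a frame object (from `#invoke`) or plain arguments (from another module). The following is a minimal sketch of that dispatch, runnable in plain Lua outside MediaWiki. `transliterate` and `yesno` are hypothetical stand-ins for illustration, not the module's real helpers, and the fake frame is just a table with an `args` field.

```lua
-- Hypothetical stand-in for the real conversion; it only tags its input.
local function transliterate(text, phonetic)
  return (phonetic and "P:" or "E:") .. text
end

-- Minimal yes/no parser (assumption: the real Yesno module accepts more spellings).
local function yesno(v)
  return v == true or v == "true" or v == "yes" or v == "1"
end

local function tr(text, phonetic)
  if type(text) == "table" then -- called via #invoke: unpack the frame
    phonetic = text.args[2]
    text = text.args[1]
  end
  return transliterate(text, yesno(phonetic))
end

print(tr("abc"))                        -- direct call, etymological mode
print(tr({ args = { "abc", "true" } })) -- frame-style call, phonetic mode
```

The same function body therefore serves templates and other modules, at the cost of a type check on the first argument.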
Algorithms of function tr
Word Boundaries
The design assumes that word boundaries will frequently not be indicated. However, it is assumed that marking a word boundary is preferred to invoking the template {{ᨿᩣ᩠ᨠ}}.
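To illustrate why a marked word boundary helps, here is a rough sketch of a hard-word lookup that only fires on maximal runs of Tai Tham characters (U+1A20 to U+1A7F). It is plain Lua working on UTF-8 bytes rather than `mw.ustring`, and the table entry is a hypothetical placeholder, not a real entry from the module's data.

```lua
-- True if the UTF-8 character ch lies in U+1A20..U+1A7F (Tai Tham).
local function is_tai_tham(ch)
  local b1, b2, b3 = ch:byte(1, 3)
  if b1 ~= 0xE1 or not b3 then return false end
  return (b2 == 0xA8 and b3 >= 0xA0) or b2 == 0xA9
end

-- Replace each maximal run of Tai Tham characters that is a key of `hard`.
local function replace_hard_words(text, hard)
  local out, run = {}, {}
  local function flush()
    if #run > 0 then
      local word = table.concat(run)
      out[#out + 1] = hard[word] or word
      run = {}
    end
  end
  -- Iterate over UTF-8 characters (assumes well-formed input).
  for ch in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
    if is_tai_tham(ch) then
      run[#run + 1] = ch
    else
      flush()
      out[#out + 1] = ch
    end
  end
  flush()
  return table.concat(out)
end

local word = "\225\168\160\225\169\163" -- U+1A20 U+1A63 (ᨠᩣ)
local hard = { [word] = "[hard word]" } -- hypothetical entry
print(replace_hard_words("x " .. word .. " y", hard)) -- delimited: replaced
print(replace_hard_words(word .. word, hard))         -- no boundary: untouched
```

Without a boundary between the two copies, the run is a different string and the lookup misses, which is the situation the template is meant to cover.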
Dependent Vowels
The simplest analysis for transliteration is to treat final glottal stops as part of the vowel. Northern Thai therefore has 12 vowel qualities, which can be short or long, and occur in an open or a closed syllable. Three of these are diphthongs, which, unlike the system supported by the Standard Thai orthography, can also occur in short, closed syllables. There are three vowel-consonant combinations which may or must have special symbols; these appear to map straightforwardly to their Standard Thai equivalents. It appears that simply equating ᩂ and ᩄ with ฤ and ฦ works well enough.
The use of ᨠᩢ and ᨠᩡ to mark final /k/ also needs to be handled under the heading of vowels.
The transliteration process treats the vowels of ᨠᩣ and ᨣᩤ identically.
This table links to the chief area of discussion of the transliteration of each vowel.
Sound quality | Short closed | Short open | Long closed | Long open | Other |
---|---|---|---|---|---|
/a/ | ᨠᩢ อั Yes | ᨠᩡ อะ Yes | ᨠᩣ อา Yes | ᨠᩣ อา Yes | |
/i/ | ᨠᩥ อิ Yes | ᨠᩥ อิ Yes | ᨠᩦ อี Yes | ᨠᩦ อี Yes | |
/ɯ/ | ᨠᩧ อึ Yes | ᨠᩧ อึ Yes | ᨠᩨ อื | ᨠᩨ อือ | |
/u/ | ᨠᩩ อุ Yes | ᨠᩩ อุ Yes | ᨠᩪ อู Yes | ᨠᩪ อู Yes | |
/e/ | ᨠᩮᩢ เอ็ Yes | ᨠᩮᩡ เอะ | ᨠᩮ เอ Yes | ᨠᩮ เอ Yes | |
/ɛ/ | ᨠᩯᩢ แอ็ Yes | ᨠᩯᩡ แอะ | ᨠᩯ แอ Yes | ᨠᩯ แอ Yes | |
/o/ | ᨠᩫ อ Yes | ᨠᩰᩡ โอะ | ᨠᩰᩫ โอ Yes | ᨠᩰ โอ Yes | ᨠᩮᩣ โอ Yes |
/ɔ/ | ᨠᩬᩢ อ็อ | ᨠᩰᩬᩡ เอาะ Yes | ᨠᩬ ออ Yes | ᨠᩬᩴ ออ Yes | |
/ɤ/ | ᨠᩮᩥᩢ เอิ-็ Yes | ᨠᩮᩬᩥᩡ เออะ | ᨠᩮᩥ เอิ | ᨠᩮᩬᩥ เออ | |
/ia/ | ᨠ᩠ᨿᩢ เอีย็ | ᨠ᩠ᨿᩮᩡ เอียะ | ᨠ᩠ᨿ เอีย | ᨠ᩠ᨿᩮ เอีย | |
/ɯa/ | ᨠᩮᩬᩥᩢ เอือ็ Yes | ᨠᩮᩬᩥᩋᩡ เอือะ | ᨠᩮᩬᩥ เอือ | ᨠᩮᩬᩥᩋ เอือ | |
/ua/ | ᨠ᩠ᩅᩢ อ็ว No | ᨠ᩠ᩅᩫᩡ อํวะ | ᨠ᩠ᩅ อว | ᨠ᩠ᩅᩫ อัว |
ᨠᩬ is handled differently with and without a tone mark.
Mai kak in its various forms:
Vowel | Sample Words | ||
---|---|---|---|
/a/ | ᩁᩢ รัก | ||
/aː/ | ᨾᩢᩣ มาก Yes | ||
/ua/ | ᨻ᩠ᩅ᩶ᩡ พวก | ||
/uː/ | ᩃᩪᩢ ลูก | ||
/ɔː/ | ᨯᩬᩢ ดอก | ᨾᩬᩡ ดอก | ᨯᩬᩢᩡ ดอก |
Other symbols:
Sound | /ai/ | /ai/ | /au/ | /au/ | /au/ | /am/ | /rɯ/ | /lɯ/ |
---|---|---|---|---|---|---|---|---|
Spelling | ᨠᩱ ไอ | ᨠᩲ ใอ | ᨠᩮᩢᩣ เอา Yes | ᨠᩳ เอา | ᨠᩪᩦ เอา | ᨠᩣᩴ Yes | ᩂ | ᩄ |
Preposed Vowels
The two parts of the short preposed vowels of open syllables (ᨠᩮᩡ ᨠᩯᩡ ᨠᩰᩡ) are treated independently. Any tone mark can be handled independently.
Explicitly short preposed closed vowels (ᨠᩮᩢ᩠ᨠ ᨠᩯᩢ᩠ᨠ) occur rarely if ever with tone marks, and the interaction can be ignored. They can therefore be treated as long open preposed vowel plus a mark converted to maitaikhu (อ็) in place. There appears to be a convention that implicitly short preposed closed vowels are not marked as short in transliteration.
The two parts of the closed vowel ᨠᩰᩫ᩠ᨠ are treated independently.
The compound vowel symbol ᨠᩮᩣ rarely if ever incorporates a tone mark; it can therefore be converted to ᨠᩰ.
The preposed vowels (ᨠᩮ ᨠᩯ ᨠᩰ ᨠᩱ ᨠᩲ) are then converted to the corresponding Thai vowels (เอ แอ โอ ไอ ใอ). Ideally, one would just swap them with every preceding medial consonant and combination of sakot + consonant. One problem that has not been addressed is that the Thai orthographic syllable boundary within a consonant cluster occurs later. For example, the common final element -ᨵᨾ᩠ᨾᩮᩣ in monks’ names transliterates to -ธัมโม using vernacular rules, or -ธมฺโม for academic Pali. The extreme solutions are to interchange with just one consonant and to interchange with the whole cluster. The latter rule works much better for normal text, and is what is currently implemented.
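The whole-cluster interchange can be sketched as follows. This is a simplified plain-Lua model, not the module's actual `gsub` sequence: it assumes the text has already been converted to Thai consonants, that cluster members are still joined by SAKOT (U+1A60), and that a `↶` marker sits immediately before each preposed vowel.

```lua
local SAKOT  = "\225\169\160" -- U+1A60
local MARKER = "\226\134\182" -- U+21B6 (↶), marks a vowel still to be moved

-- Split a UTF-8 string into characters (assumes well-formed input).
local function chars(s)
  local t = {}
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do t[#t + 1] = ch end
  return t
end

local function is_thai_consonant(ch) -- U+0E01..U+0E2E
  local b1, b2, b3 = ch:byte(1, 3)
  return b1 == 0xE0 and b2 == 0xB8 and b3 >= 0x81 and b3 <= 0xAE
end

local function move_preposed(s)
  local t = chars(s)
  local i = 1
  while i <= #t do
    if t[i] == MARKER and t[i + 1] then
      local vowel = t[i + 1]
      local j = i -- insertion point for the vowel
      if j > 1 and is_thai_consonant(t[j - 1]) then
        j = j - 1
        -- Step back across the whole cluster: consonant, SAKOT, consonant ...
        while j > 2 and t[j - 1] == SAKOT and is_thai_consonant(t[j - 2]) do
          j = j - 2
        end
      end
      table.remove(t, i + 1) -- drop the vowel
      table.remove(t, i)     -- drop the marker
      table.insert(t, j, vowel)
    end
    i = i + 1
  end
  return (table.concat(t):gsub(SAKOT, "")) -- sakot has no Thai counterpart
end

print(move_preposed("ห" .. SAKOT .. "ง" .. MARKER .. "แ"))
```

Interchanging with only one consonant would correspond to skipping the inner `while` loop.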
Unambiguous Compound Vowels
Any combination of combining marks containing ᨠᩬ and a tone mark is treated as a compound vowel; these two marks must be swapped round in transliteration.
The following vowels are handled by a context-free substitution of two characters: ᨠᩯᩢ ᨠᩮᩣ ᨠᩰᩫ ᨠᩣᩴ ᨠᩢᩣ
The following vowels are handled by substituting for a maximal sequence of vowel signs and tone marks excluding SIGN A (ᨠᩡ); the tone marks are not shown in the list: ᨠᩮᩢ ᨠᩰᩬᩡ ᨠᩮᩥᩢ ᨠᩮᩬᩥᩢ ᨠᩮᩢᩣ ᨠᩬᩴ ᨠᩬ (only with tone mark)
ᨠᩡ is excluded from the sequence with the aim of dealing with open short vowels as one deals with long (closed) vowels. The maitaikhu resulting from some of these vowel symbols will not fit above the consonant in some of these syllables; it is placed on the next Thai character, which should be a consonant.
Tricky encoding is used for ᨠᩰᩬᩡ; the substring ᨠᩰᩬ is converted to เกา and the ᨠᩡ is converted independently.
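The context-free two-character substitution can be sketched in plain Lua as a table lookup on adjacent character pairs. The three entries are taken from the module's `kam` table; the `chars` helper and the scanning loop are illustrative assumptions, since the module itself uses a `mw.ustring.gsub` call with a character-class pattern.

```lua
-- Remove the mark bearers ᨠ and ก, which are only there for readability.
local function sc(s)
  return (s:gsub("ᨠ", ""):gsub("ก", ""))
end

-- Split a UTF-8 string into characters (assumes well-formed input).
local function chars(s)
  local t = {}
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do t[#t + 1] = ch end
  return t
end

local two_char = {
  [sc("ᨠᩮᩣ")] = sc("ᨠᩰ"),
  [sc("ᨠᩰᩫ")] = sc("ᨠᩰ"),
  [sc("ᨠᩣᩴ")] = sc("กำ"),
}

local function substitute_pairs(s)
  local t, out, i = chars(s), {}, 1
  while i <= #t do
    local repl = t[i + 1] and two_char[t[i] .. t[i + 1]]
    if repl then
      out[#out + 1] = repl
      i = i + 2 -- both characters consumed
    else
      out[#out + 1] = t[i]
      i = i + 1
    end
  end
  return table.concat(out)
end

print(substitute_pairs("ᨠᩮᩣ")) -- the compound symbol collapses to a single vowel sign
```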
Unambiguous Explicit Simple Vowels
Once all compound vowels have been dealt with, most of the simple vowels shown by a single combining mark can be dealt with by simple substitution to a combining mark or nothing. If the order of the encoding proposals is followed, then with one exception, treated as a compound vowel, the order of vowel sign and tone mark will be the same in both scripts. This handles the vowels ᨠᩡ ᨠᩢ ᨠᩣ ᨠᩥ ᨠᩦ ᨠᩧ ᨠᩩ ᨠᩪ ᨠᩫ ᨠᩬ ᨠᩮᩥ ᨠᩳ.
Ambiguous Vowels
The 'medial' vowels, those represented by subscript ᨿ and ᩅ, are generally ambiguous. See the section on medial vowels for more information. Certain other vowels are also ambiguous when considered in isolation:
Although ᨠᩨ is unambiguous in the context of Northern Thai, it transliterates to อื in closed syllables but to อือ in open syllables.
The combination ᨠᩮᩬᩥ is ambiguous. In closed syllables, it represents /ɯːa/ and is transliterated as เอือ, but in open syllables it represents /ɜː/ and is transliterated as เออ unless it is part of the longer sequences ᨠᩮᩬᩥᩋ and ᨠᩮᩬᩥᩋᩡ which are the long and short syllables corresponding to /ɯːa/ and transliterate as เอือ and เอือะ. Because of word pairs such as ᩋᨶᩣᨳ and ᨶᩣᨳ, the ambiguity cannot always be resolved.
The combination ᨠᩬᩢ is also ambiguous. In an apparently open syllable, mai sat is actually mai kak, as in ᨯᩬᩢ = ᨯᩬᨠ which transliterates as ดอก. In a clearly closed syllable, ᨠᩬᩢ transliterates as อ็อ.
The most complicated logic distinguishing open and closed syllables is as follows:
- Split text into vowel sequence, tones plus sakot, and the next two characters.
- If the next two characters are ᩋᩡ and it needs special handling, apply the special handling and exit
- else if the next two characters are consonant and vowel, tone or sakot (should include medials!) then the syllable is open
- else if the next character is ᩋ and it needs special handling, apply the special handling and exit
- else if the next character is a consonant, the syllable is closed
- otherwise, the syllable is open.
This works on the principle that the syllable should be in a native Thai word, and therefore will not end in two consonants, albeit one silent, and that the final consonant will not be stacked with the initial syllable of the next. Therefore, stacked consonants following the vowel will start a new syllable.
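The decision list above can be sketched as a small classifier over the two characters that follow the vowel. This is a simplified model working on code points: it assumes the special handling is always wanted, and it treats every combining mark from U+1A60 to U+1A7C as "vowel, tone or sakot", whereas the module uses a narrower explicit list.

```lua
local A_LETTER, SIGN_A = 0x1A4B, 0x1A61 -- ᩋ and sign A (ᩡ)

-- ᨠ..ᩌ plus ᩔ, matching the module's `cons` range.
local function is_consonant(cp)
  return cp ~= nil and ((cp >= 0x1A20 and cp <= 0x1A4C) or cp == 0x1A54)
end

-- Assumption: approximate "vowel, tone or sakot" by the range U+1A60..U+1A7C.
local function is_mark(cp)
  return cp ~= nil and cp >= 0x1A60 and cp <= 0x1A7C
end

-- next1, next2: code points of the two characters after the vowel (may be nil).
local function classify(next1, next2)
  if next1 == A_LETTER and next2 == SIGN_A then
    return "special" -- ᩋᩡ: the caller applies the special handling
  elseif is_consonant(next1) and is_mark(next2) then
    return "open"    -- the consonant starts the next syllable
  elseif next1 == A_LETTER then
    return "special"
  elseif is_consonant(next1) then
    return "closed"
  else
    return "open"
  end
end

print(classify(0x1A20, 0x1A63)) -- consonant + vowel sign
print(classify(0x1A20, nil))    -- bare final consonant
```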
The processing for ᨠᩮᩬᩥ identifies ᨠᩮᩬᩥᩡ as containing it in an open syllable.
This principle breaks down with some loanword spellings, such as ᨣᩬᩢᨷ᩠ᨷᩦ᩶ 'copy'. For now, the simplest solution for ᨠᩬᩢ is to treat spellings such as ᨯᩬᩢ as anomalous.
Medial Vowels
Ambiguities
The sequence ᨠ᩠ᨿᩢ is ambiguous. ᨿ is part of the vowel symbol in: ᨻᩕ᩠ᨿᩢᨠ ᨽ᩠ᨿᩢᨠ but is merely an onset consonant in: ᨡ᩠ᨿᩢ᩶ᩁ ᨺᩪ᩶ᩉ᩠ᨿᩢ᩶ᩁ ᩈ᩠ᨿᩢᨾ᩠ᨽᩪ ᩉ᩠ᨿᩢᨦ ᩉ᩠ᨿᩢᩁ ᩉ᩠ᨿᩢᨷ ᩋᩉ᩠ᨿᩢᨦ
ᨻ᩠ᨿᩢᨷᨯᩯ᩠ᨯ and ᨻ᩠ᨿᩢᨻ look contentious – the MFL transliterates the first syllable as พยับ and พยัพ but records the pronunciation as เปี๊บ.
The sequence is currently treated as a compound vowel, but based on the statistics, ᨻᩕ᩠ᨿᩢᨠ and ᨽ᩠ᨿᩢᨠ (which are the same word) should be listed as exceptions.
The sequence ᨠ᩠ᨿ is also ambiguous between เอีย in closed syllables (e.g. > เรียน) and combinations with implicit vowels such as อยั and อยะ in ᨻ᩠ᨿᨬ᩠ᨩᨶ > พยัญชนะ and ᩋᩣᨶᨱ᩠ᨿ > อานัณยะ.
The sequence ᨠ᩠ᩅᩢ is likewise ambiguous. ᩅ is part of the vowel symbol in ᩉᩖ᩠ᩅᩢᨠ, but is merely an onset consonant in ᨠᩕᩉ᩠ᩅᩢᨯ ᨠ᩠ᩅᩢᨠ ᨠ᩠ᩅᩢᨦ ᨠ᩠ᩅᩢᨯ ᨡ᩠ᩅᩢᨠ ᨡ᩠ᩅᩢᩁ ᨡ᩠ᩅᩢ᩶ᩁ ᨣᩕᩉ᩠ᩅᩢᨯ ᨣᩖᩢ᩵ᨦ᩻ᨧ᩠ᩅᩢᨦ᩻ ᨣ᩠ᩅᩢᨠ ᨣ᩠ᩅᩢᨠ᩻ᨩᩦ᩶᩻ ᨣ᩠ᩅᩢᨯ ᨣ᩠ᩅᩢ᩶ᩁ ᨣᩢ᩠᩵ᨦ᩻ᨧ᩠ᩅᩢᨦ᩻ ᨤ᩠ᩅᩢᨠ ᨤ᩠ᩅᩢᨯ ᨤ᩠ᩅᩢᩁ ᨤ᩠ᩅᩢ᩵ᩁ ᨧ᩠ᩅᩢᨦ ᨧᩢ᩠ᨦᩉ᩠ᩅᩢᨯ ᨩ᩠ᩅᩢᨠ ᨲᩕᩉ᩠ᩅᩢᩁ ᨲᩱ᩶ᩉ᩠ᩅᩢᩁ ᨴ᩠ᩅᩢᨠ ᩅᩯ᩠ᨯ᩻ᩉ᩠ᩅᩢᩁ᩻ ᩈ᩠ᩅᩢᨠ ᩈ᩠ᩅᩢᩁ᩠ᨣ᩺ ᩈ᩠ᩅᩢᩔᨯᩦ ᩈ᩠ᩅᩢᩈᨯᩦ ᩉ᩠ᩅᩢᨠ ᩉ᩠ᩅᩢᨦ ᩉ᩠ᩅᩢᨯ ᩉ᩠ᩅᩢᩁ ᩉ᩠ᩅᩢ᩵ᩁ.
ᩉᩖ᩠ᩅᩢᨠ is therefore treated as an exception.
The sequence ᨠ᩠ᩅ is also ambiguous between the long vowel in a closed syllable and word-final กวะ as in ᩋᩢᩆ᩠ᩅ. The context easily disambiguates.
Implementation
The medial vowels are converted by finding the maximal sequence of Tai Tham vowels other than SIGN A, tone marks and vowel killers following consonant + subscript WA or YA. The consonant is required so as to exclude WA and YA acting as final consonants. This handles the compound vowel symbols ᨠ᩠ᩅ ᨠ᩠ᩅᩫ ᨠ᩠ᨿ ᨠ᩠ᨿᩢ ᨠ᩠ᨿᩮ. Combinations with tone marks are not shown. The combination ᨠ᩠ᩅ without any tone marks is not handled explicitly, for the vowel and the interpretation as a consonant cluster are homographs in both Tai Tham and Thai. ᨠ᩠ᨿᩮᩡ is processed as a combination of the vowels ᨠ᩠ᨿᩮ and ᨠᩡ, and ᨠ᩠ᩅᩫᩡ is processed as a combination of ᨠ᩠ᩅᩫ and ᨠᩡ.
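Finding the maximal sequence can be sketched on an array of code points. This is a simplified model: the range U+1A62 to U+1A7C stands in for "vowels other than SIGN A, tone marks and vowel killers", which is an approximation of the module's explicit character list.

```lua
local SAKOT, YA, WA = 0x1A60, 0x1A3F, 0x1A45

local function is_consonant(cp) -- ᨠ..ᩌ plus ᩔ
  return cp ~= nil and ((cp >= 0x1A20 and cp <= 0x1A4C) or cp == 0x1A54)
end

-- Assumption: U+1A62..U+1A7C approximates the permitted trailing marks.
-- SIGN A (U+1A61) falls outside the range and so ends the sequence.
local function is_trailing_mark(cp)
  return cp ~= nil and cp >= 0x1A62 and cp <= 0x1A7C
end

-- If cps[i] is a SAKOT that begins a medial-vowel sequence, return the index
-- of the last code point of that sequence; otherwise return nil.
local function medial_span(cps, i)
  if cps[i] ~= SAKOT then return nil end
  if cps[i + 1] ~= YA and cps[i + 1] ~= WA then return nil end
  if not is_consonant(cps[i - 1]) then return nil end -- needs a base consonant
  local j = i + 2
  while is_trailing_mark(cps[j]) do j = j + 1 end
  return j - 1
end

-- ᨠ + sakot + ᩅ + sign O + sign A: the span stops before sign A.
print(medial_span({ 0x1A20, 0x1A60, 0x1A45, 0x1A6B, 0x1A61 }, 2))
```

The substring found this way would then be looked up in a table such as `kia`, with the original text kept when there is no entry.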
Implicit Vowels
Practically, there are three cases where the implicit vowel becomes explicit upon transcribing:
- Final vowel in Pali/Sanskrit Words
- Open syllables in native words and Special Cases
- Closed syllables in Pali/Sanskrit words.
Final Vowel in Pali/Sanskrit Words
This requires the detection of the end of words; the final vowel is not written after the first element of a compound. Detection relies on the occurrence of a non-word character.
Consonant-sakot-consonant at the end of the word implies that the word is of Pali or Sanskrit origin, and has a final implicit vowel.
Mai kaa + consonant at the end of a word implies a final implicit vowel: if there were no final implicit vowel, the word would be written mai kaa + sakot + consonant.
Consonant + consonant at the end of a word implies a Pali/Sanskrit word, and so a final vowel, with one significant exception. The second consonant might be preceded by a medial vowel, so ᩅ and ᨿ are excluded from being the first consonant, unless they are not preceded by sakot.
In a final phonetic vowel plus consonant, the consonant will be converted to a subscript form unless there is very little space for it. That yields the condition: single-storey consonant + Pali vowel other than a vowel below + consonant, except when that single-storey consonant is itself subscript.
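The first two conditions can be sketched as a check on the code points of a single word whose end has already been detected. This covers only consonant-sakot-consonant and mai kaa + consonant; the remaining conditions and their exceptions are deliberately left out, so it is an illustration rather than the module's full rule set.

```lua
local SAKOT, SIGN_AA, SIGN_TALL_AA = 0x1A60, 0x1A63, 0x1A64

local function is_consonant(cp) -- ᨠ..ᩌ plus ᩔ
  return cp ~= nil and ((cp >= 0x1A20 and cp <= 0x1A4C) or cp == 0x1A54)
end

-- cps: code points of one word. Returns true if a final ะ should be supplied.
local function has_final_implicit_vowel(cps)
  local n = #cps
  if n >= 3 and is_consonant(cps[n]) and cps[n - 1] == SAKOT
      and is_consonant(cps[n - 2]) then
    return true -- consonant-sakot-consonant at the end of the word
  end
  if n >= 2 and is_consonant(cps[n])
      and (cps[n - 1] == SIGN_AA or cps[n - 1] == SIGN_TALL_AA) then
    return true -- mai kaa + consonant at the end of the word
  end
  return false
end

print(has_final_implicit_vowel({ 0x1A35, 0x1A3E, 0x1A60, 0x1A3E })) -- ᨵᨾ᩠ᨾ
print(has_final_implicit_vowel({ 0x1A20, 0x1A63 }))                 -- ᨠᩣ
```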
Open syllables in native words and Special Cases
In general, in the MFL, native open syllables with the implicit vowel in Northern Thai have it made explicit, but it is not made explicit in words of Pali/Sanskrit origin. However, it is made explicit if it is immediately preceded by ra hong even if the word is of Pali/Sanskrit origin. This yields the fairly reliable rule:
Closed syllables in Pali/Sanskrit words
In principle, the rule is simple: implicit vowels are transcribed explicitly, and the syllables are closed by explicit clusters. Unfortunately, if for example ᨩ᩠ᨿᨦᩉ᩠ᨾᩲ᩵ were written without a word break, it would then be transliterated as เชียงัใหม่. The rule is therefore restricted to clusters that cannot occur at the starts of Tai Tham words. A whitelist of clusters is maintained.
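The whitelist test can be sketched as a set lookup keyed by the cluster string. The following plain-Lua illustration builds its strings from code points; the one-entry whitelist and the helper names `u` and `insert_mai_sat` are assumptions for the example, loosely modelled on the module's `satwl` function.

```lua
-- UTF-8 encoder, sufficient for U+0800..U+FFFF (all that is needed here).
local function u(cp)
  return string.char(0xE0 + math.floor(cp / 0x1000),
                     0x80 + math.floor(cp / 0x40) % 0x40,
                     0x80 + cp % 0x40)
end

local SAKOT, MAI_SAT = u(0x1A60), u(0x1A62)
local THA, MA, HA, NGA = u(0x1A35), u(0x1A3E), u(0x1A49), u(0x1A26)

-- Tiny subset of the whitelist: only the cluster ᨾ᩠ᨾ.
local white_list = { [MA .. SAKOT .. MA] = true }

-- prev: the consonant before the cluster; cluster: consonant + sakot + consonant.
local function insert_mai_sat(prev, cluster)
  if white_list[cluster] then
    return prev .. MAI_SAT .. cluster -- make the implicit vowel explicit
  end
  return prev .. cluster
end

-- ᨵ before ᨾ᩠ᨾ gains a mai sat; ᨦ before ᩉ᩠ᨾ is left alone.
print(insert_mai_sat(THA, MA .. SAKOT .. MA) == THA .. MAI_SAT .. MA .. SAKOT .. MA)
print(insert_mai_sat(NGA, HA .. SAKOT .. MA) == NGA .. HA .. SAKOT .. MA)
```

Because ᩉ᩠ᨾ can start a Tai Tham word, it stays off the list, which avoids the stray vowel in the unbroken spelling discussed above.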
Consonants
The general processing of consonants is mostly straightforward. There are a few special cases:
ᨷᩕ is mapped to ปร. ᨻᩛ ᨾᩛ ᨭᩛ ᨱᩛ are mapped to พพ มพ ฏฐ ณฐ, and the process is generalised to all obstruent and nasal labial and retroflex base consonants.
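The generalised handling of ᩛ can be sketched on code points: after a retroflex base (U+1A2D to U+1A31) it becomes sakot + ᨮ, and after a labial base (U+1A37 to U+1A3E) it becomes sakot + ᨻ. The ranges mirror the two `gsub` calls in the module; the array-based form is an assumption made so that the example runs in plain Lua.

```lua
local HAANG, SAKOT = 0x1A5B, 0x1A60
local HIGH_RATHA, LOW_PA = 0x1A2E, 0x1A3B -- ᨮ and ᨻ

local function expand_haang(cps)
  local out = {}
  for i, cp in ipairs(cps) do
    local prev = cps[i - 1]
    if cp == HAANG and prev and prev >= 0x1A2D and prev <= 0x1A31 then
      out[#out + 1] = SAKOT      -- retroflex base: subscript ᨮ
      out[#out + 1] = HIGH_RATHA
    elseif cp == HAANG and prev and prev >= 0x1A37 and prev <= 0x1A3E then
      out[#out + 1] = SAKOT      -- labial base: subscript ᨻ
      out[#out + 1] = LOW_PA
    else
      out[#out + 1] = cp         -- anything else is copied unchanged
    end
  end
  return out
end

local result = expand_haang({ 0x1A3B, HAANG }) -- ᨻᩛ
print(#result) -- three code points: base, sakot, subscript consonant
```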
There is one glaring omission – the mapping of final, non-subscript ᩁ to น.
Tones
local export = {}
local gsub = mw.ustring.gsub
local u = mw.ustring.char
local match = mw.ustring.match
local find = mw.ustring.find
local f_yesno -- For module Yesno.
local PAGENAME = mw.title.getCurrentTitle().prefixedText
local function sc(s) return gsub(s, "[ᨠก]", ""); end; -- Remove mark bearers, which are added for readability.
local data = mw.loadData('Module:Wp/nod/Translit_data')
local disruptor = data.disruptor
local tt = data.tt2
local sakot = sc("ᨠ᩠")
local kia = {
[sc("ᨠ᩠ᩅ᩵")] = sc("ก่ว"),
[sc("ᨠ᩠ᩅ᩶")] = sc("ก้ว"),
[sc("ᨠ᩠ᩅᩫ")] = sc("กัว"),
[sc("ᨠ᩠ᩅᩫ᩵")] = sc("กั่ว"),
[sc("ᨠ᩠ᩅᩫ᩶")] = sc("กั้ว"),
[sc("ᨠ᩠ᨿ")] = sc("↶เกีย"),
[sc("ᨠ᩠ᨿ᩵")] = sc("↶เกี่ย"),
[sc("ᨠ᩠ᨿ᩶")] = sc("↶เกี้ย"),
[sc("ᨠ᩠ᨿᩢ")] = sc("↶เกีย็"),
[sc("ᨠ᩠ᨿᩢ᩵")] = sc("↶เกี่ย็"),
[sc("ᨠ᩠ᨿᩢ᩶")] = sc("↶เกี้ย็"),
[sc("ᨠ᩠ᨿᩮ")] = sc("↶เกีย"),
[sc("ᨠ᩠ᨿᩮ᩵")] = sc("↶เกี่ย"),
[sc("ᨠ᩠ᨿᩮ᩶")] = sc("↶เกี้ย"),
-- Hybrid forms
[sc("ᨠ᩠ᩅ้")] = sc("ก้ว"),
[sc("ᨠ᩠ᩅ๊")] = sc("ก๊ว"),
[sc("ᨠ᩠ᩅ๋")] = sc("ก๋ว"),
[sc("ᨠ᩠ᩅᩫ้")] = sc("กั้ว"),
[sc("ᨠ᩠ᩅᩫ๊")] = sc("กั๊ว"),
[sc("ᨠ᩠ᩅᩫ๋")] = sc("กั๋ว"),
[sc("ᨠ᩠ᨿ้")] = sc("↶เกี้ย"),
[sc("ᨠ᩠ᨿ๊")] = sc("↶เกี๊ย"),
[sc("ᨠ᩠ᨿ๋")] = sc("↶เกี๋ย"),
[sc("ᨠ᩠ᨿᩢ้")] = sc("↶เกี้ย็"),
[sc("ᨠ᩠ᨿᩢ๊")] = sc("↶เกี๊ย็"),
[sc("ᨠ᩠ᨿᩢ๋")] = sc("↶เกีย็๋"),
[sc("ᨠ᩠ᨿᩮ้")] = sc("↶เกี้ย"),
[sc("ᨠ᩠ᨿᩮ๊")] = sc("↶เกี๊ย"),
[sc("ᨠ᩠ᨿᩮ๋")] = sc("↶เกี๋ย"),
-- [sc("")] = sc(""),
}
local function pkia(m1, m2)
  local r2 = kia[m2]
  if r2 then
    return m1..r2
  else
    return m1..m2
  end
end
local tone=sc("ᨠ᩵᩶ก่ก้ก๊ก๋")
local pv=sc("ᨠᩰᩬᩢᨠᩱᩩᩥᩴᨠᩲᩪᩨᨠᩮᩧ᩵ᩤᨠᩯᩨᩣᨠᩫ")
local pvt = pv..tone
local pvtk = pvt..sc("ᨠ᩺ᨠ᩼")
local vt = pv..sc("ᨠᩢᩡกะ")..tone
local vts = vt..sakot
local cons_not_wy = "ᨠ-ᨾᩀᩁᩃᩆ-ᩌᩔ" -- Remove ᨿᩂᩄᩅ
local cons = "ᨠ-ᩌᩔ"
local pure_cons = "ᨠ-ᩁᩃᩅ-ᩌᩔ"
local cons_squat = "ᨠ-ᨭᨯ-ᩉᩋᩓᩔ" -- Omit ᨮᩊᩌ
local medial=sc("ᨠᩕᩖᨠᩛ")
local Mw = u(0x10fffe) -- Non-characters
local My = u(0x10ffff)
local Td = u(0xefffe)
-- 2-character transformations.
local kam = {
[sc("ᨠᩯᩢ")] = sc("↶แก็"),
[sc("ᨠᩮᩣ")] = sc("ᨠᩰ"), -- Need to manipulate later as single vowel.
[sc("ᨠᩮᩤ")] = sc("ᨠᩰ"),
[sc("ᨠᩰᩫ")] = sc("ᨠᩰ"),
[sc("ᨠᩣᩴ")] = sc("กำ"),
[sc("ᨠᩤᩴ")] = sc("กำ"),
["ᨷᩕ"] = "ᨸᩕ", -- Straight to Thai would mess up ᨷᩕᩰ
["ᩃᩖ"] = "ᩃ᩠ᩃ", -- Simpler to only generate vowels before clusters with sakot
[sc("ᨠᩢᩣ")] = sc("กา").."ก"
}
-- Context-independent sequences starting with mai ke or mai ko:
local keoext = {
[sc("ᨠᩯᩢ")] = sc("↶แก็"),
[sc("ᨠᩮᩢ")] = sc("↶เก็"), -- Interferes with ᨠᩮᩢ᩵ᩣ
[sc("ᨠᩮᩢ᩵")] = sc("↶เก่ก็↷"), -- Attestation?
[sc("ᨠᩮᩢ᩶")] = sc("↶เก้ก็↷"), -- Attestation?
[sc("ᨠᩰᩬ")] = sc("↶เกา"), -- For ᨠᩰᩬ᩶
[sc("ᨠᩰᩬ᩵")] = sc("↶เก่า"),
[sc("ᨠᩰᩬ᩶")] = sc("↶เก้า"),
[sc("ᨠᩮᩬᩥᩡ")] = sc("↶เกอะ"),
[sc("ᨠᩮᩬᩥ᩵ᩡ")] = sc("↶เก่อะ"),
[sc("ᨠᩮᩬᩥ᩶ᩡ")] = sc("↶เก้อะ"),
[sc("ᨠᩮᩥᩢ")] = sc("↶เกิก็↷"),
[sc("ᨠᩮᩥᩢ᩵")] = sc("↶เกิ่ก็↷"),
[sc("ᨠᩮᩥᩢ᩶")] = sc("↶เกิ้ก็↷"),
[sc("ᨠᩮᩬᩥᩢ")] = sc("↶เกือ็"), -- There is no ᨠᩮᩬᩥᩢ=เกือก
[sc("ᨠᩮᩬᩥᩢ᩵")] = sc("↶เกื่อ็"), -- Unattested
[sc("ᨠᩮᩬᩥᩢ᩶")] = sc("↶เกื้อ็"), -- Unattested
[sc("ᨠᩮᩢᩣ")] = sc("↶เกา"),
[sc("ᨠᩮᩢᩤ")] = sc("↶เกา"),
[sc("ᨠᩮᩢ᩵ᩣ")] = sc("↶เก่า"),
[sc("ᨠᩮᩢ᩵ᩤ")] = sc("↶เก่า"),
[sc("ᨠᩮᩢ᩶ᩣ")] = sc("↶เก้า"),
[sc("ᨠᩮᩢ᩶ᩤ")] = sc("↶เก้า"),
[sc("ᨠᩬᩴ")] = sc("กอ"),
[sc("ᨠᩬᩴ᩵")] = sc("ก᩵อ"),
[sc("ᨠᩬᩴ᩶")] = sc("ก᩶อ"),
[sc("ᨠᩬ᩵")] = sc("ก᩵อ"),
[sc("ᨠᩬ᩶")] = sc("ก᩶อ"),
}
local keoext_phonetic = {
[sc("ᨠᩯᩢ")] = sc("↶แก็"),
[sc("ᨠᩮᩢ")] = sc("↶เก็"), -- Interferes with ᨠᩮᩢ᩵ᩣ
[sc("ᨠᩮᩢ᩵")] = sc("↶เก่ก"), -- Attestation?
[sc("ᨠᩮᩢ᩶")] = sc("↶เก้ก"), -- Attestation?
[sc("ᨠᩰᩬ")] = sc("↶เกา"), -- For ᨠᩰᩬ᩶
[sc("ᨠᩰᩬ᩵")] = sc("↶เก่า"),
[sc("ᨠᩰᩬ᩶")] = sc("↶เก้า"),
[sc("ᨠᩮᩬᩥᩡ")] = sc("↶เกอะ"),
[sc("ᨠᩮᩬᩥ᩵ᩡ")] = sc("↶เก่อะ"),
[sc("ᨠᩮᩬᩥ᩶ᩡ")] = sc("↶เก้อะ"),
[sc("ᨠᩮᩥᩢ")] = sc("↶เกิก"),
[sc("ᨠᩮᩥᩢ᩵")] = sc("↶เกิ่ก"),
[sc("ᨠᩮᩥᩢ᩶")] = sc("↶เกิ้ก"),
[sc("ᨠᩮᩬᩥᩢ")] = sc("↶เกือ"), -- There is no ᨠᩮᩬᩥᩢ=เกือก
[sc("ᨠᩮᩬᩥᩢ᩵")] = sc("↶เกื่อ"), -- Unattested
[sc("ᨠᩮᩬᩥᩢ᩶")] = sc("↶เกื้อ"), -- Unattested
[sc("ᨠᩮᩢᩣ")] = sc("↶เกา"),
[sc("ᨠᩮᩢᩤ")] = sc("↶เกา"),
[sc("ᨠᩮᩢ᩵ᩣ")] = sc("↶เก่า"),
[sc("ᨠᩮᩢ᩵ᩤ")] = sc("↶เก่า"),
[sc("ᨠᩮᩢ᩶ᩣ")] = sc("↶เก้า"),
[sc("ᨠᩮᩢ᩶ᩤ")] = sc("↶เก้า"),
[sc("ᨠᩬᩴ")] = sc("กอ"),
[sc("ᨠᩬᩴ᩵")] = sc("ก᩵อ"),
[sc("ᨠᩬᩴ᩶")] = sc("ก᩶อ"),
[sc("ᨠᩬ᩵")] = sc("ก᩵อ"),
[sc("ᨠᩬ᩶")] = sc("ก᩶อ"),
-- Hybrid forms, for phonetic transliteration
-- [sc("ᨠᩮᩢ้")] = sc("↶เก้ก็↷"),
-- [sc("ᨠᩮᩢ๊")] = sc("↶เก๊ก็↷"),
-- [sc("ᨠᩮᩢ๋")] = sc("↶เก๋ก็↷"),
-- [sc("ᨠᩯᩢ้")] = sc("↶แก้ก็↷"),
-- [sc("ᨠᩯᩢ๊")] = sc("↶แก๊ก็↷"),
-- [sc("ᨠᩯᩢ๋")] = sc("↶แก๋ก็↷"),
[sc("ᨠᩮᩢ้")] = sc("↶เก๊ก"), -- Dirty cheat
[sc("ᨠᩮᩢ๊")] = sc("↶เก๊ก"),
[sc("ᨠᩮᩢ๋")] = sc("↶เก๋ก"),
[sc("ᨠᩯᩢ้")] = sc("↶แก๊ก"), -- Dirty cheat
[sc("ᨠᩯᩢ๊")] = sc("↶แก๊ก"),
[sc("ᨠᩯᩢ๋")] = sc("↶แก๋ก"),
[sc("ᨠᩰᩬ้")] = sc("↶เก้า"),
[sc("ᨠᩰᩬ๊")] = sc("↶เก๊า"),
[sc("ᨠᩰᩬ๋")] = sc("↶เก๋า"),
[sc("ᨠᩮᩬᩥ้ᩡ")] = sc("↶เก้อะ"),
[sc("ᨠᩮᩬᩥ๊ᩡ")] = sc("↶เก๊อะ"),
[sc("ᨠᩮᩬᩥ๋ᩡ")] = sc("↶เก๋อะ"),
-- [sc("ᨠᩮᩥᩢ้")] = sc("↶เกิ้ก็↷"),
-- [sc("ᨠᩮᩥᩢ๊")] = sc("↶เกิ๊ก็↷"),
-- [sc("ᨠᩮᩥᩢ๋")] = sc("↶เกิ๋ก็↷"),
[sc("ᨠᩮᩥᩢ้")] = sc("↶เกิ๊ก"), -- Dirty cheat
[sc("ᨠᩮᩥᩢ๊")] = sc("↶เกิ๊ก"),
[sc("ᨠᩮᩥᩢ๋")] = sc("↶เกิ๋ก"),
-- There is no ᨠᩮᩬᩥᩢ=เกือก
[sc("ᨠᩮᩢ้ᩣ")] = sc("↶เก้า"),
[sc("ᨠᩮᩢ้ᩤ")] = sc("↶เก้า"),
[sc("ᨠᩮᩢ๊ᩣ")] = sc("↶เก๊า"),
[sc("ᨠᩮᩢ๊ᩤ")] = sc("↶เก๊า"),
[sc("ᨠᩮᩢ๋ᩣ")] = sc("↶เก๋า"),
[sc("ᨠᩮᩢ๋ᩤ")] = sc("↶เก๋า"),
[sc("ᨠᩬᩴ้")] = sc("ก้อ"),
[sc("ᨠᩬᩴ๊")] = sc("ก๊อ"),
[sc("ᨠᩬᩴ๋")] = sc("ก๋อ"),
[sc("ᨠᩬ้")] = sc("ก้อ"),
[sc("ᨠᩬ๊")] = sc("ก๊อ"),
[sc("ᨠᩬ๋")] = sc("ก๋อ"),
}
local function pkuue(m1, m2, m3)
  if mw.ustring.match(m3, "["..cons.."]["..vts.."]")
    or not mw.ustring.match(m3, "^["..cons.."]") then -- Mai kuue is in open syllable
    local repl = mw.ustring.match(m2, "^["..tone.."]")
    if repl then
      return sc("กื")..repl.."อ"..m3 -- Tone substituted at end
    else
      return sc("กือ")..m3
    end
  else
    return m1..m2..m3 -- Use final character by character substitution
  end
end
local function pkuea(m1, m2, m3)
  local is_maikoe -- Open syllable, Thai เกอ
  local repl = mw.ustring.match(m2, "^["..tone.."]")
  if not repl then repl = ""; end
  if "ᩋᩡ" == m3 then
    return sc("↶เกื")..repl.."อะ" -- Tone substituted at end
  elseif mw.ustring.match(m3, "["..cons.."]["..vts.."]") then
    is_maikoe = true
  elseif mw.ustring.match(m3, "^["..cons.."]") then
    if "ᩋ" == mw.ustring.sub(m3, 1, 1) then
      return sc("↶เกื")..repl..m3 -- Tone and consonant substituted at end
    else
      is_maikoe = false -- Closed syllable
    end
  else
    is_maikoe = true
  end
  if is_maikoe then
    return sc("↶เก")..repl.."อ"..m3 -- Tone substituted at end
  else
    return sc("↶เกื")..repl.."อ"..m3 -- Tone substituted at end
  end
end
-- We need a white list of 2-consonant clusters that may occur within words and belong to different phonetic
-- syllables.
-- Clusters that start Northern Thai words are a problem - they need a word boundary hint
-- to handle properly. They could also be a problem within compound words. For now, they
-- require explicit handling.
local white_list = { -- Is there a better idiom for a quickly checked list?
["ᨠ᩠ᨠ"]=1, ["ᨠ᩠ᨡ"]=1, ["ᨣ᩠ᨣ"]=1, ["ᨣ᩠ᨥ"]=1, ["ᨦ᩠ᨠ"]=1, ["ᨦ᩠ᨡ"]=1, ["ᨦ᩠ᨣ"]=1, ["ᨦ᩠ᨥ"]=1, ["ᨦ᩠ᩈ"]=1,
["ᨧ᩠ᨧ"]=1, ["ᨧ᩠ᨨ"]=1, ["ᨩ᩠ᨩ"]=1, ["ᨩ᩠ᨫ"]=1, ["ᨬ᩠ᨧ"]=1, ["ᨬ᩠ᨨ"]=1, ["ᨬ᩠ᨩ"]=1, ["ᨬ᩠ᨫ"]=1, ["ᨬ᩠ᨬ"]=1,
-- When this is applied, ᩋᩘᨩ will have been converted to ᩋᨦ᩠ᨩ.
["ᨦ᩠ᨧ"]=1, ["ᨦ᩠ᨨ"]=1, ["ᨦ᩠ᨩ"]=1, ["ᨦ᩠ᨫ"]=1, ["ᨦ᩠ᩈ"]=1,
["ᨧ᩠ᩈ"]=1, ["ᨬ᩠ᨪ"]=1, ["ᨪ᩠ᨫ"]=1, ["ᨱ᩠ᨬ"]=1, -- Weird, but seen or perceived
-- When this is applied, ᨭᩛ and ᨱᩛ will have been converted to ᨭ᩠ᨮ and ᨱ᩠ᨮ.
-- ᨭᩛ/ᨭ᩠ᨮ may have to be black-listed, for ᨭᩛ is often used for ᨮ.
["ᨭ᩠ᨭ"]=1, ["ᨭ᩠ᨮ"]=1, ["ᨯ᩠ᨯ"]=1, ["ᨯ᩠ᨰ"]=1, ["ᨱ᩠ᨭ"]=1, ["ᨱ᩠ᨮ"]=1, ["ᨱ᩠ᨯ"]=1, ["ᨱ᩠ᨰ"]=1, ["ᨱ᩠ᨱ"]=1,
["ᨲ᩠ᨲ"]=1, ["ᨲ᩠ᨳ"]=1, ["ᨴ᩠ᨴ"]=1, ["ᨴ᩠ᨵ"]=1, ["ᨶ᩠ᨯ"]=1, ["ᨶ᩠ᨲ"]=1, ["ᨶ᩠ᨳ"]=1, ["ᨶ᩠ᨴ"]=1, ["ᨶ᩠ᨵ"]=1, ["ᨶ᩠ᨶ"]=1,
-- When this is applied, ᨻᩛ and ᨾᩛ will have been converted to ᨻ᩠ᨻ and ᨾ᩠ᨻ.
["ᨷ᩠ᨷ"]=1, ["ᨷ᩠ᨹ"]=1, ["ᨻ᩠ᨻ"]=1, ["ᨻ᩠ᨽ"]=1, ["ᨾ᩠ᨷ"]=1, ["ᨾ᩠ᨹ"]=1, ["ᨾ᩠ᨻ"]=1, ["ᨾ᩠ᨽ"]=1, ["ᨾ᩠ᨾ"]=1,
["ᨸ᩠ᨸ"]=1, ["ᨸ᩠ᨹ"]=1, ["ᨾ"]=1, ["ᨷ᩠ᨸ"]=1, ["ᨸ᩠ᨷ"]=1, -- Lao & mixed
["ᨠ᩠ᩇ"]=1, ["ᨦ᩠ᩆ"]=1, ["ᨦ᩠ᩇ"]=1, -- Sanskrit
-- When this is applied, ᩔ will have been converted to ᩈ᩠ᩈ.
["ᨿ᩠ᩉ"]=1, ["ᩃ᩠ᩉ"]=1, ["ᩅ᩠ᩉ"]=1, ["ᨿ᩠ᨿ"]=1, ["ᩃ᩠ᩃ"]=1, ["ᩈ᩠ᩈ"]=1, ["ᨱ᩠ᩉ"]=1, ["ᨬ᩠ᩉ"]=1, ["ᩊ᩠ᩉ"]=1,
-- [""]=1, [""]=1, [""]=1, [""]=1, [""]=1, [""]=1, [""]=1, [""]=1, [""]=1,
}
local function satwl(m1, m2)
  if white_list[m2] then
    if m1 == "ᨿ" then -- prevent interpretation as mai kia.
      return disruptor.."ᨿ"..sc("ᨠᩢ")..m2
    else
      return m1..sc("ᨠᩢ")..m2
    end
  else
    return m1..m2
  end
end
local esf = {
[sc("ᨠᩘᨠ")] = sc("ᨦ᩠ᨠ"), [sc("ᨠᩜ")] = sc("ᨠ᩠ᨾ"), [sc("ᨠᩞ")] = sc("ᨠ᩠ᩈ"), ["ᩔ"] ="ᩈ᩠ᩈ" ,
}
local function pkra(m1, m2)
  if "ᨻᩕ" == m1 and "ᩉ" == m2 then
    return m1..m2 -- e.g. ᨻᩕᩉᩫ᩠ᨾ and ᨻᩕᩉᩢ᩠ᩈ
  else
    return m1..sc("ᨠᩡ")..m2
  end
end
local function pword(m1, m2)
  -- Conceptually this is just
  -- return m1..export.hardword(m2, "ᨻ")
  -- but that was too slow.
  local entry = data.hard[m2]
  return m1..(entry and entry[1] or m2)
end
local function pword3(m1, m2)
  -- Conceptually this is just
  -- return m1..export.hardword(m2, "ᩈ")
  -- but that would be too slow.
  local entry = data.hard[m2]
  return m1..(entry and (entry[2] or entry[1]) or m2)
end
local dbg3 = ''
local function padd_tone(m1, m2, m3, m4, m5)
  dbg3 = dbg3..m1..'s'..m2..'s'..m3..'s'..m4..'s'..m5..' => '
  local new_tone = Td -- To allow next syllable to be processed at next call.
  -- local new_word = {m1, m2, m3, new_tone, m4, m5}
  local class = data.class[m1]
  local Cres = 'ᨦᨬᨱᨶᨾᨿᩁᩃᩅᩊ'
  local has_coda = false
  local branch = 0
  local sb2c = sc("^[ᨠᩩᩥᨠᩧᩢᨠᩫ]$")
  local sb2o = sc("^[ᨠᩩᩥᨠᩧᩢ]$")
  if class == 'R' then
    local sb15 = sc("^[ᨠᩡᨠ᩺ᨠ᩼]")
    if find(m5, '^'..u(0x1A60)) then
      if find(m5, u(0x1A60)..'['..Cres..']') then
        new_tone = sc("ก๋")
      end
      has_coda = true
      branch = 11
    elseif find(m5, "["..cons.."]["..vts.."]") then -- Sakot over-restrictive
      has_coda = false
      branch = 12
    elseif find(m5, "^["..cons.."]") then
      has_coda = true
      branch = 13
      if find(m5, "^["..Cres.."]") then
        new_tone = sc("ก๋")
        branch = 14
      end
    elseif find(m5, sb15) then
      has_coda = true
      branch = 15
    elseif find(m5, "^ำ") then
      has_coda = true -- Built in!
      new_tone = sc("ก๋")
      branch = 16
    end
    if not has_coda then
      if #m4 > 0 then
        new_tone = sc("ก๋")
        branch = 1
      elseif find(m3, sb2o) then
        branch = 2
      elseif #m2 == 0 and #m3 == 0 then
        branch = 3
      elseif #m2 == 3 and #m3 == 0 then -- count UTF-8 code units.
        branch = 4
      else
        new_tone = sc("ก๋")
        branch = 5
      end
    end
  elseif class == 'F' then
    local sb25 = sc("^[ᨠ᩺ᨠ᩼]")
    local dead = nil
    local short = nil
    if find(m5, '^'..u(0x1A60)) then
      dead = not find(m5, u(0x1A60)..'['..Cres..']')
      has_coda = true
      branch = 21
      if #m2 == 0 and #m3 == 0 and #m4 == 0 then
        dead = false -- not a dead syllable
        has_coda = true;
        branch = 27
      end
    elseif find(m5, "["..cons.."]["..vts.."]") then -- Sakot over-restrictive
      has_coda = false
      branch = 22
    elseif find(m5, "^["..cons.."]") then
      if #m2 == 0 and #m3==0 and #m4==0 then
        has_coda = true
        dead = false -- Not a dead syllable
        branch = 28
      else
        has_coda = true
        branch = 23
        dead = not find(m5, "^["..Cres.."]")
      end
    elseif find(m5, "^ᩡ") then
      has_coda = true
      dead = true
      short = true
      branch = 24
    elseif find(m5, sb25) then
      has_coda = true
      dead = false -- Not a dead syllable
      branch = 25
    elseif find(m5, "^ำ") then
      has_coda = true -- Built in!
      dead = false
      branch = 26
    end
    if not has_coda then
      if #m4 > 0 then
        dead = false
        branch = 31
      elseif find(m3, sb2o) then
        dead = true
        short = true
        branch = 32
      else
        dead = false -- Not a dead syllable
        branch = 33
      end
    end
    if dead then
      -- Next fails for compound vowels, but gets corrected by a dirty cheat in keoext_phonetic.
      if short == nil then short = find(m3, sb2c) end
      if short then
        new_tone = sc("ก๊")
      else
        new_tone = sc("ก้")
      end
    end
  end
  -- dbg3 = dbg3..m1..m2..m3..new_tone..m4..m5..branch..' '
  return
    m1..m2..m3..new_tone..m4..m5
  -- table.concat(new_word) -- Slower!
end
function export.tr(text, phonetic)
local tStart = os.clock()
if type(text) == "table" then -- called directly from a template
phonetic = text.args[2]
text = text.args[1]
end
f_yesno = f_yesno or require('Module:Wp/nod/Yesno')
phonetic = f_yesno(phonetic)
text = "cont"..text.."exts" -- Supply ASCII context
local wordchar = u(0x1A20).."-"..u(0x1A7F)..u(0x0E01).."-"..u(0x0E3A)..u(0x0E40).."-"..u(0x0E4E).."↶↷"
local lwordchar = u(0x1A20).."-"..u(0x1A7F)
-- Expect text to have been subjected to Unicode normalisation.
-- Process known 'hard' words.
if phonetic then
text = gsub(text, "([^"..lwordchar.."])(["..lwordchar.."]*)", pword3)
else
text = gsub(text, "([^"..lwordchar.."])(["..lwordchar.."]*)", pword)
end
-- Undo the curse of Davis
text = gsub(text, "("..sakot..")(["..tone.."]+)", "%2%1")
-- Drop syllable-internal mai sam
text = gsub(text, "("..u(0x1A7B)..")(["..vt..medial..sakot.."])", "%2")
-- Deal with haang "ᩛ"
text = gsub(text, "([ᨭ-ᨱ])ᩛ", "%1᩠ᨮ")
text = gsub(text, "([ᨷ-ᨾ])ᩛ", "%1᩠ᨻ")
-- Simplify letters that implicitly stack etc.
text = gsub(text, sc("[ᩔᩜᩘᨠᩞ]"), esf) -- Need U+1A60 in replacements
-- 2-character transformations.
if phonetic then
text = gsub(text, sc("[ᩃᨷᩮᩤᨠᩣᨠᩰᩢ][ᨠᩖᩢᩣᨠᩫᩤᨠᩕᩴ]"), kam)
else
text = gsub(text, sc("[ᩃᨷᩮᩤᨠᩯᩣᨠᩰᩢ][ᨠᩖᩢᩣᨠᩫᩤᨠᩕᩴ]"), kam)
end
-- Manifest mai sat as an implicit vowel.
text = gsub(text, "(["..cons..medial..sc("])([")..cons..sc("]ᨠ᩠[")..cons.."])", satwl)
-- Generate final implicit vowel after cluster. Depends on end of word indication
text = gsub(text, "([^"..sakot.."])(["..cons..sc("]ᨠ᩠[")..cons..
"])([^"..wordchar.."])", "%1%2ะ%3")
text = gsub(text, "([^"..sakot.."])(["..cons.."]["..medial..
"])([^"..wordchar.."])", "%1%2ะ%3")
-- Generate final vowel in other cases.
text = gsub(text, "(["..sc("ᨠᩰᨠᩣᨣᩤ][")..cons.."])([^"..wordchar.."])", "%1ะ%2")
text = gsub(text, "(["..cons_not_wy.."]["..pure_cons.."])([^"..wordchar.."])", "%1ะ%2")
text = gsub(text, "([^"..sakot.."][ᨿᩅ]["..cons.."])([^"..wordchar.."])", "%1ะ%2")
text = gsub(text, "([^"..sakot.."]["..cons_squat..sc("][ᨠᩥᨠᩦᨠᩮᨠᩰ][")..cons..
"])([^"..wordchar.."])", "%1ะ%2")
if phonetic then -- Insert/modify tones
text = gsub(text, sc("ᨠ᩠[ᩅᨿ]"), {[sc("ᨠ᩠ᩅ")] = Mw, [sc("ᨠ᩠ᨿ")] = My})
local med3 = sc("ᨠᩕᨠᩖ")..Mw..My -- Declared local to avoid leaking globals
local Fc = "ᨣᨩᨴᨻ" -- Consonants that transliterate from low to mid
local Rc = "ᨠᨧᨭᨲᨸ" -- Consonants that change from high to mid.
text = gsub(text, "["..Fc..sc("]ᨠᩕ"),
{["ᨣᩕ"] = "คᩕ", ["ᨩᩕ"] = "ชᩕ", ["ᨴᩕ"] = "ทᩕ", ["ᨻᩕ"] = "พᩕ"})
local ctone = {[sc("ᨠ᩵")] = sc("ก้"), [sc("ᨠ᩶")] = sc("ก๊")}
local function ptone(m1, m2) return m1..ctone[m2] end
local Vo = sc("ᩮᩰᩯᩱᩲᨠᩩᩢᨠᩪᩥᨠᩬᩦᨠᩧᨠᩨᨠᩫᨠᩴ")
text = gsub(text, "(["..Fc.."]["..med3.."]*["..Vo.."]*)("..sc("[ᨠ᩵ᨠ᩶])"), ptone)
local Vf = sc("ᩤᩣ")
text = gsub(text, "(["..Fc..Rc.."])(["..med3.."]*)(["..Vo.."]*)([ᩤᩣ]?)"..
"([^"..tone..Vo..Vf..Td.."].)", padd_tone)
text = gsub(text, "(["..Fc..Rc.."])(["..med3.."]*)(["..Vo.."]*)([ᩤᩣ]?)"..
"([^"..tone..Vo..Vf..Td.."].)", padd_tone)
text = gsub(text, "["..Mw..My..Td.."]",
{[Mw] = sc("ᨠ᩠ᩅ"), [My] = sc("ᨠ᩠ᨿ"), [Td] = ""})
end
-- And turn most final ᩁ into ᨶ.
text = gsub(text, "([^"..sc("ᨣᩬᨣ᩠ (").."])ᩁ([^"..wordchar.."])", "%1ᨶ%2")
-- Deal with ᨠ᩠ᨿᩮ and ᨠ᩠ᩅᩫ derivatives.
text = gsub(text, "(["..cons..medial.."])("..sakot.."[ᨿᩅ]["..pvtk.."]*)", pkia)
-- Does not handle Tai Khün ᨠ᩠ᨿ᩺ and ᨠ᩠ᩅ᩺!
-- text = gsub(text, "(["..cons..medial.."])("..sakot.."[ᨿᩅ]["..pvt.."]*)%f[^"..sakot.."]", pkia)
--long transformations
if phonetic then
text = gsub(text, sc("[ᨠᩮᨠᩰᩬᨠᩯ][")..pvt.."]*", keoext_phonetic)
else
text = gsub(text, sc("[ᨠᩮᨠᩰᩬᨠᩯ][")..pvt.."]*", keoext)
end
text = gsub(text, sc("(.ᨠᩕ)([")..cons.."])", pkra)
-- Context-dependent transliterations
-- ᨠᩨ ᨠᩮᩬᩥ (including ᨷᩮᩬᩥ᩵ᩋᩡ)
text = gsub(text, sc("(ᨠᩨ)(["..tone.."]?ᨠ᩠?)(.?.?)"), pkuue)
text = gsub(text, sc("(ᨠᩨ)(["..tone.."]?ᨠ᩠?)(.?.?)"), pkuue) -- Run again for words like ᨾᩨᨳᩨ
text = gsub(text, sc("(ᨠᩮᩬᩥ)(["..tone.."]?ᨠ᩠?)(.?.?)"), pkuea)
text = gsub(text, sc("(ᨠᩮᩬᩥ)(["..tone.."]?ᨠ᩠?)(.?.?)"), pkuea) -- Run again for same reason
if phonetic then
text = gsub(text, ".", data.tt3)
else
text = gsub(text, ".", tt)
end
text = gsub(text, sc("([ᨠ᩠")..u(0x200D).."][ก-ฮ]̱?)↶([เแโไใ])", "↶%2%1")
text = gsub(text, sc("(ᨠ᩠").."[ก-ฮ]̱?)↶([เแโไใ])", "↶%2%1") -- Words like ᩉ᩠ᨦ᩠ᩅᩯ᩶᩻ yielding แหงว้ ๆ
text = gsub(text, "([ก-ฮ]̱?)↶([เแโใไ])", "%2%1")
text = gsub(text, sakot, "")
text = gsub(text, "("..sc("ก็")..")↷([ก-ฮ]̱?)", "%2%1")
return
-- '\n '..(os.clock()-tStart)..' ᩅᩥᨶᩣᨴᩦ:\n'..
string.sub(text, 5, -5) --..dbg3 -- discard added ASCII context
end
local debug = "+DEBUG="
local function plink(m1, m2, m3)
  m2 = gsub(m2, "^:", " :")
  if m3 == "|" then
    return m1.."{{Wp/nod/ᩀ᩵ᩣᨳᩬᨯ-ᨠ|"..m2.."}}|"
  else
    return m1.."{{Wp/nod/ᩀ᩵ᩣᨳᩬᨯ-ᨠ|"..m2.."}}|"..m2..m3
  end
end
local trans_mode
local function recur(m1, m2, m3)
  debug = debug..";"..m1..","..m2..","..m3
  if mw.ustring.find(m1, "^.%{%{%u%u") then -- Might be magic!
    debug = debug..",MAGIC "
    return m1..m2..m3
  elseif "|" == m2 then
    if mw.ustring.find(m3, "^ᩅᨳ=") then -- Don't override
      debug = debug..",NOOVERRIDE "
      return m1..m2..m3
    end
  end
  debug = debug..",DONE "
  return m1.."|ᩅᨳ="..trans_mode..m2..m3
end
local function pparam(m1, m2, m3)
  if mw.ustring.find(m2, "^|") or m2 == "" then
    return trans_mode
  else
    return m1..m2..m3
  end
end
function export.trpage(page, phonetic, text)
  if type(page) == "table" then -- called directly from a template
    page = page.args[1] or PAGENAME
    phonetic = false
    text = nil
  else
    f_yesno = f_yesno or require('Module:Wp/nod/Yesno')
    phonetic = f_yesno(phonetic)
  end
  if not text then
    -- Guard against an invalid title or a missing page, either of which yields nil.
    local title = mw.title.new(page)
    text = title and title:getContent() or ""
  end
  -- Remove comments
  text = gsub(text, "%<%!%-%-.-%-%-%>", "")
  -- Remove invocations of whole page transliteration templates
  text = gsub(text, "%{%{Wp/nod/translit%}%}", "")
  text = gsub(text, "%{%{Wp/nod/xlit[23]%}%}", "")
  -- Remove text that is not to be repeated.
  text = gsub(text, "%{%{Wp/nod/ᩀ᩵ᩣᨪ᩶ᩣᩴ%}%}.-%{%{Wp/nod/ᨿᩬᨾᨪ᩶ᩣᩴ%}%}", "")
  -- Insert transliteration parameter in plausible looking templates. More work may be
  -- necessary to make this robust. Only allow templates whose names begin with a letter,
  -- and disallow names beginning with two capitals. (Latter test is deferred to recur().)
  local debug_extra_parameter = false
  if debug_extra_parameter then debug = debug..text end
  if phonetic then
    trans_mode = 'ᩈ'
  else
    trans_mode = 'ᨠ'
  end
  text = gsub(text, "([^%{]%{%{%a[^%{]-)([%}%|])(.-%})", recur)
  text = gsub(text, "(%{%{%{ᩅᨳ)([^}]*)(%}%}%})", pparam)
  if debug_extra_parameter then debug = debug.." Expanded to:"..text end
  -- Hide Wiki-specific markup
  -- text = gsub(text, "<ref.-</ref>", "")
  -- text = gsub(text, "<references/?>", "")
  -- Calculation of frame filched from https://commons.wikimedia.org/wiki/Module:NationAndOccupation/sandbox
  local frame=mw.getCurrentFrame()
  text = frame:preprocess(text)
  -- Protect links
  text = gsub(text, "(%[%[)([^%]|]+)([%]|])", plink)
  text = frame:preprocess(text) -- Expand Template:ᩀ᩵ᩣᨳᩬᨯ-ᨠ -- TO DO: Move task to plink.
  debug = ''
  text = export.tr(text, phonetic)
  return text -- ..debug
end
function export.lettername(letter, way)
  if type(letter) == "table" then -- called directly from a template
    way = letter.args["ᩅᨳ"]
    letter = letter.args[1]
  end
  local odd = data.oddname[letter]
  local prefix = nil
  if mw.ustring.find("ᨭᨮᨰᨱ", letter) then
    prefix = 'ระ'
  else
    prefix = ''
  end
  if odd then
    return odd
  elseif 'ᩈ' == way -- Planned for transcription; at present also taken for transliteration
    or 'ᨠ' == way
  then
    local class = data.class[letter]
    local newlet = data.tt3[letter]
    local tonerule = {["L"]=sc("ก"), ["F"]=sc("ก๊"), ["M"]=sc("ก๋"),
      ["R"]=sc("ก๋"), ["H"]=sc("ก๋"), }
    local tone = tonerule[class]
    if class and tone and newlet then
      return prefix..newlet..tone.."ะ"
    end
  elseif 'ᨠ' == way then -- Unreachable while the branch above also accepts 'ᨠ'
    local class = data.class[letter]
    local newlet = data.tt2[letter]
    if class and newlet then
      return prefix..newlet.."ะ"
    end
  else
    return letter
  end
  return "{{Wp/nod/huge|{{Wp/nod/font color|red|white|{{Wp/nod/ᩀ᩵ᩣᨳᩬᨯ|"..letter..
    "}} ᨷ᩵ᨸᩮ᩠ᨶᩋᨠ᩠ᨡᩁ}}}}"
end
function export.hardword(word, way)
  local advice = nil
  if type(word) == "table" then -- called directly from a template
    local frame = word
    word = frame.args[1]
    way = frame.args["ᩅᨳ"]
    advice = frame.args[way] or frame.args["ᨠ"]
  end
  if word == nil then
    return "" --.."x1"
  end -- An error message might be appropriate
  if way == nil then
    return word --.."x2"
  end -- Lazy, pointless invocation.
  if advice then
    return advice --.."x3"
  end
  local wordin = word
  word = gsub(word, u(0x200B), "") -- Strip ZWSP
  local entry = data.hard[word] -- Declared local to avoid leaking a global
  if not entry then
    return wordin --.."x4"
  end -- Nothing useful achieved
  if way == "ᨠ" then -- Transliteration of Northern Thai
    -- This is the fallback assumption
  elseif way == "ᩈ" then -- Transcription of Northern Thai (i.e. Siamese sound values)
    if entry[2] then
      return entry[2] --.."x5"
    end
  end
  if entry[1] then
    return entry[1] --.."x6"
  else
    return wordin --.."x7" -- Nothing useful achieved. Consider logging.
  end
end
function export.trphage(page, text)
  if type(page) == "table" then -- called directly from a template
    page = page.args[1] or PAGENAME
  end
  return export.trpage(page, true, text)
end
return export