Modul:msplitter

From Wikikamus bahasa Indonesia (Indonesian Wiktionary), the free dictionary

--[===[

MODULE "MSPLITTER"

"eo.wiktionary.org/wiki/Modulo:msplitter" <!--2024-Oct-09-->
"id.wiktionary.org/wiki/Modul:msplitter"

Purpose: submodule for "mlawc"

Utilo: submodulo por "mlawc"

Manfaat: submodul untuk "mlawc"

Syfte: submodul foer "mlawc"

Used by templates / Uzata far sxablonoj /
Digunakan oleh templat / Anvaent av mallar:
* none (this module cannot be called from a template)

Required submodules / Bezonataj submoduloj /
Submodul yang diperlukan / Behoevda submoduler:
* none

Incoming: * single table with following content (everything must be
            prevalidated by the caller):
            *  0 (boo) -- desirability of compound cat:s
            *  1 (str) -- pagename AKA input lemma (may NOT be empty)
            *  2 (num) -- split strategy (0...5 or 7)
            *  3 (tab) -- fragments from "%"-syntax (assisted split)
            *  4 (tab) -- fragments from "#"-syntax (assisted split)
            *  5 (tab) -- fragments for manual split
            *  6 (tab) -- fragments from extra parameter
            *  7 (boo) -- true if extra parameter was used
            *  8 (tab) -- lng stuff with double-letter indexes
            *  9 (boo) -- NR word class
            * 10 (boo) -- KA word class
            * 15 (boo) -- detrc

- desirability of compound cat:s -- index 0 (we split even if
                                             false but no cat:s then)
- lemma (may NOT be empty) -- index 1
- split control parameter -- index 2 3 4 5
- extra parameter -- index 6 7
- language stuff (code and some variants of language name) -- index 8
- word class (reduced to 2 questions) -- index 9 10
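
Illustration only, not part of the module: a minimal sketch of how a caller
might assemble the incoming table described above. All concrete values, the
name "tabincoming" and the keys "LK" / "LN" inside index 8 are assumptions
made for this example; the real caller "mlawc" may fill the table differently.

```lua
-- hypothetical incoming table (everything prevalidated by the caller)
local tabincoming = {}
tabincoming[0]  = true       -- compound categories are desired
tabincoming[1]  = 'abelujo'  -- pagename AKA input lemma, must NOT be empty
tabincoming[2]  = 7          -- split strategy #S7 (no split)
tabincoming[3]  = {}         -- fragments from "%"-syntax (empty here)
tabincoming[4]  = {}         -- fragments from "#"-syntax (empty here)
tabincoming[5]  = {}         -- fragments for manual split (empty here)
tabincoming[6]  = {}         -- fragments from extra parameter (empty here)
tabincoming[7]  = false      -- extra parameter was not used
tabincoming[8]  = { LK = 'eo', LN = 'Esperanto' } -- assumed double-letter keys
tabincoming[9]  = false      -- NR word class
tabincoming[10] = false      -- KA word class
tabincoming[15] = true       -- detrc (presumably the debug trace switch)
```

Indexes 11 to 14 are not described above and are therefore left unset in
this sketch.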

Returned: * single table with following content:
            *  0...17 (str) category names
            * 20...37 (nil or boo) main page flags
            * 50      (str) output lemma wikitext or "//" on error
            * 51      (str) debug "qstrtrace"
            * 52      (num) status code (ZERO is OK)

The split strategies available are:
- #S0 automatic multiword split
- #S1 assisted split
- #S2 manual split
- #S3 simple root split
- #S4 simple bare root
- #S5 large letter split
- #S6 reserved
- #S7 no split (splitter still may be called and extra parameter is processed)

List of 6+1+1+1 selectable morpheme types:

C  circumfix           cirkumfikso
I  infix               infikso (EO: -o- -et- -il- ...)
M  standalone root     memstara radiko (EO: tri dek post ...)
N  nonstandalone root  nememstara radiko (EO: fer voj ...)
P  prefix              prefikso
U  suffix              sufikso (postfikso, finajxo, EO: -a -j -n)
-------
W  word                vorto
-------
L  same as "N" but changes linking behavior (only in F210)
-------
X  only after "&" in the extra parameter (caller converts it for us)

These mortyp:s can be used in two places:
- in the split control parameter with manual split, before the colon ":"
- in the extra parameter, either after "&" or in fragments before ":" or "!";
  "L" is prohibited there (thus C I M N P U W are left, plus maybe X)
See "spec-splitter-en.txt" for syntax details.

We put only the letter symbol into the category name (except for the type
word) as it otherwise would become unreasonably long. It must contain
3 pieces of information:
- language (consider "-an" in SV and ID)
- mortyp (consider "-an" and "an-" and "an" in SV)
- the morpheme / affix / word itself

Three things can be deactivated (semi-hardcoded configuration in the source
code of "mlawc"): only the compound categories, or the splitter as a whole
(the raw lemma is then shown without link), or showing the lemma altogether.
In both latter cases the splitter is inactive and this module is not called
at all.

The automatic splitter ("numsplyt" = 0 and "lfhsplitaa") is fully
automatic, and the 2 tables at index 3 and 4 must be empty then.
No error can occur here, but the split can fail in the sense that no split
boundaries can be applied, and the output is then identical to the input.

The assisted splitter ("numsplyt" = 1 and "lfhsplitaa") is
controlled by 2 prevalidated tables.
* The first table contains up to 16 values indexed by integers 0 to 15:
  the string "1" means block, type "nil" means do not block (the
  default). Other values should not occur and are treated as do not
  block, like "nil".
* The second table contains up to 16 values indexed by integers 0 to 15, value:
  * type string:
    * "N" or "I" or "A" (as described in "spec-splitter-en.txt")
    * colon ":" followed by the link target (length 1...40 octet:s NOT
      checked anymore here)
    Beginning char other than "N" or "I" or "A" or ":" should not
    occur and evaluates to do nothing unusual like "nil" does.
  * type "nil" means do nothing unusual (the default)
No error can occur in the assisted splitter, but the split can fail in the
sense that no split boundaries can be applied, and the output is then
identical to the input.

The manual splitter ("numsplyt" = 2 and "lfhsplitmn") is controlled by one
prevalidated table. The pagename does not even enter the split process,
only a bool revealing whether it contains at least one space does.
* Table contains 1 to 16 strings indexed by integers 0 to 15,
  one string for every fragment. The 5 legal types are:
  * F000 : no brackets, no colon, no slash (visible text no link)
  * F200 : 2 brackets, no colon, no slash (combo target visible text)
  * F201 : 2 brackets, no colon, 1 slash (target / visible text)
  * F210 : 2 brackets, 1 colon, no slash (mortyp : combo target visible text)
  * F211 : 2 brackets, 1 colon, 1 slash (mortyp : target / visible text)
No error can occur in the manual splitter, and no failure due to lack of
boundaries either, since the "sum check" is part of the prevalidation.
Note that we use slashes and single square brackets "+[I:bug/BUG]"
instead of wikisyntax "[[bug|BUG]]". Beware that "[bug|BUG]" would NOT work.
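
Illustration only, not part of the module: the names F000 ... F211 read as
"F" followed by the number of brackets, colons and slashes in the fragment.
The sketch below derives such a name by plain counting. The helper name
"classifyfragment" and the table "tablegal" are assumptions made for this
example, and further details of the fragment syntax (such as a leading "+")
are defined in "spec-splitter-en.txt" and are not modelled here.

```lua
-- hypothetical helper: "F" .. bracket count .. colon count .. slash count
local function classifyfragment (strfrag)
  local _, numopen  = string.gsub (strfrag, '%[', '') -- count "["
  local _, numclose = string.gsub (strfrag, '%]', '') -- count "]"
  local _, numcolon = string.gsub (strfrag, ':', '')  -- count ":"
  local _, numslash = string.gsub (strfrag, '/', '')  -- count "/"
  return 'F' .. (numopen + numclose) .. numcolon .. numslash
end

-- the 5 legal types listed above
local tablegal = { F000 = true, F200 = true, F201 = true, F210 = true, F211 = true }

local strtype = classifyfragment ('[I:bug/BUG]') -- 2 brackets, 1 colon, 1 slash
print (strtype, tablegal[strtype] == true)
```

A fragment such as "[bug|BUG]" yields "F200" by counting alone, so counting
can only name the type; it cannot replace the prevalidation by the caller.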

]===]

local exporttable = {}

------------------------------------------------------------------------

---- CONSTANTS [O] ----

------------------------------------------------------------------------

  -- uncommentable EO vs ID constant table (categories)

  -- syntax of insertion and discarding magic string:
  -- "@" followed by 2 uppercase letters and 2 hex numbers
  -- otherwise the hit is not processed, but copied as-is instead
  -- 2 letters select the insertable item from table supplied by the caller
  -- 2 hex numbers control discarding left and right (0...15 char:s)

  -- empty item is legal and results in discarding if some number is non-ZERO

  -- if uppercasing or other adjustment is needed then the caller must take
  -- care of it in the form of 2 or more separate items provided in the table

  -- insertable items defined:
  -- constant:
  -- * LK lng code (unknown "??" legal but take care elsewhere)
  -- * LN lng name (unknown legal, for example "dana" or "Ido")
  -- * LU lng name uppercased (unknown legal, for example "Dana" or "Ido")
  -- * LO lng name not own (empty or nil if own)
  -- * LV lng name uppercased not own (empty or nil if own)
  -- * LY lng name long (for example "bahasa Swedia")
  -- * LZ lng name long not own (empty or nil if own)
  -- * SC script code (for example "T", "S", "P" for ZH, "C" "L" for SH)
  -- variable (we can have 2 word classes):
  -- * WC word class name (for example "substantivo")
  -- * WU word class name uppercased (for example "Substantivo")
  -- * MT mortyp code (for example "C")
  -- * FR fragment (for example "peN-...-an" or "abelujo")

  -- see "lfiultiminsert" and "tablngdbl"; use spaces here and avoid "_"
  -- note the malicious false friendship between EO:frazo and ID:frasa

  local contabktaoj = {}
      -- contabktaoj[3] = 'Vortgrupo -@LK00- enhavanta (@FR00) @SC10'             -- EO only if ("boocatdesir" is true) can be many
        contabktaoj[3] = 'Frasa @LZ10 mengandung kata @FR00 @SC10'             -- ID only if ("boocatdesir" is true) can be many
      -- contabktaoj[4] = 'Frazo -@LK00- enhavanta vorton (@FR00) @SC10'          -- EO only if ("boocatdesir" is true) can be many
        contabktaoj[4] = 'Kalimat @LK00 mengandung kata (@FR00) @SC10'         -- ID only if ("boocatdesir" is true) can be many
      -- contabktaoj[5] = 'Vorto -@LK00- enhavanta morfemon @MT00 (@FR00) @SC10'  -- EO only if ("boocatdesir" is true) can be many
        contabktaoj[5] = 'Kata @LK00 mengandung morfem @MT00 (@FR00) @SC10'    -- ID only if ("boocatdesir" is true) can be many

------------------------------------------------------------------------

---- SPECIAL STUFF OUTSIDE MAIN [B] ----

------------------------------------------------------------------------

  -- SPECIAL VAR:S

local qboodetrc = true
local qstrtrace = '<br>' -- for main & sub:s, debug report sent to caller
local qtabktaoj = {}     -- module-wide table for compound categories and other returned values [0]...[52]

------------------------------------------------------------------------

---- DEBUG FUNCTIONS [D] ----

------------------------------------------------------------------------

-- Local function LFDTRACEMSG

-- Enhance upvalue "qstrtrace" with fixed text.

-- for variables the other sub "lfdshowvar" is preferable but in exceptional
-- cases it can be justified to send text with values of variables to this sub

-- no size limit

-- upvalue "qstrtrace" must NOT be type "nil" on entry (is inited to "<br>")

-- uses upvalue "qboodetrc"

local function lfdtracemsg (strshortline)
  if (qboodetrc and (type(strshortline)=='string')) then
    qstrtrace = qstrtrace .. strshortline .. '.<br>' -- dot added !!!
  end--if
end--function lfdtracemsg

------------------------------------------------------------------------

---- MATH FUNCTIONS [E] ----

------------------------------------------------------------------------

local function mathdiv (xdividens, xdivisero)
  local resultdiv = 0 -- Lua 5.1 (used by Scribunto) has no integer DIV operator :-(
  resultdiv = math.floor (xdividens / xdivisero)
  return resultdiv
end--function mathdiv

local function mathmod (xdividendo, xdivisoro)
  local resultmod = 0 -- MOD operator is "%", but a bitwise AND operator is lacking too
  resultmod = xdividendo % xdivisoro
  return resultmod
end--function mathmod

------------------------------------------------------------------------

-- Local function MATHBITTEST

-- Find out whether single bit selected by ZERO-based index is "1" / "true".

-- Result has type "boolean".

-- Depends on functions :
-- [E] mathdiv mathmod

local function mathbittest (numincoming, numbitindex)
  local boores = false
  while true do
    if ((numbitindex==0) or (numincoming==0)) then
      break -- we have either reached our bit or run out of bits
    end--if
    numincoming = mathdiv(numincoming,2) -- shift right
    numbitindex = numbitindex - 1 -- count down to ZERO
  end--while
  boores = (mathmod(numincoming,2)==1) -- pick bit
  return boores
end--function mathbittest
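
--[==[

Illustration only, not part of the module: the shift-and-pick loop above can
also be written as a single arithmetic expression. The name "bittestsketch"
is an assumption made for this example; it is meant to give the same results
as "mathbittest" for non-negative integers.

```lua
-- hypothetical one-line equivalent: shift right by division, then pick bit 0
local function bittestsketch (numvalue, numbitindex)
  return ((math.floor (numvalue / (2 ^ numbitindex)) % 2) == 1)
end

print (bittestsketch (224, 5)) -- 224 is %11100000, bit 5 is set
print (bittestsketch (192, 5)) -- 192 is %11000000, bit 5 is clear
```

The loop in "mathbittest" stops early when no bits are left, whereas this
sketch always performs one division and one power.

]==]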

------------------------------------------------------------------------

---- LOW LEVEL STRING FUNCTIONS [G] ----

------------------------------------------------------------------------

-- Local function LFGPOKESTRING

-- Replace single octet in a string.

-- Input  : * strinpokeout -- empty legal
--          * numpokepoz   -- ZERO-based, out of range legal
--          * numpokeval   -- new value

-- This is inefficient by design of Lua (strings are immutable). The caller
-- is responsible for minimizing the number of invocations, in particular for
-- not calling this if the new value is equal to the existing one.

local function lfgpokestring (strinpokeout, numpokepoz, numpokeval)
  local numpokelen = 0
  numpokelen = string.len(strinpokeout)
  if ((numpokelen==1) and (numpokepoz==0)) then
    strinpokeout = string.char(numpokeval) -- totally replace
  end--if
  if (numpokelen>=2) then
    if (numpokepoz==0) then
      strinpokeout = string.char(numpokeval) .. string.sub (strinpokeout,2,numpokelen)
    end--if
    if ((numpokepoz>0) and (numpokepoz<(numpokelen-1))) then
      strinpokeout = string.sub (strinpokeout,1,numpokepoz) .. string.char(numpokeval) .. string.sub (strinpokeout,(numpokepoz+2),numpokelen)
    end--if
    if (numpokepoz==(numpokelen-1)) then
      strinpokeout = string.sub (strinpokeout,1,(numpokelen-1)) .. string.char(numpokeval)
    end--if
  end--if (numpokelen>=2) then
  return strinpokeout
end--function lfgpokestring
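
--[==[

Illustration only, not part of the module: a compact sketch with the same
intended behavior as "lfgpokestring" (ZERO-based position, string returned
unchanged if the position is out of range). The name "pokeoctetsketch" is an
assumption made for this example.

```lua
-- hypothetical compact variant: one range check, then three-piece rebuild
local function pokeoctetsketch (strin, numpoz, numval)
  local numlen = string.len (strin)
  if ((numpoz < 0) or (numpoz >= numlen)) then
    return strin -- out of range (includes empty string): unchanged
  end
  -- string.sub is ONE-based: keep "numpoz" octets, replace one, keep the rest
  return string.sub (strin, 1, numpoz) .. string.char (numval) .. string.sub (strin, numpoz + 2)
end

print (pokeoctetsketch ('abc', 1, 88)) -- replaces the middle octet by "X"
```

Every call builds a new string, which is why the caller should skip the call
when nothing would change.

]==]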

------------------------------------------------------------------------

local function lfgtestuc (numkode)
  local booupperc = false
  booupperc = ((numkode>=65) and (numkode<=90))
  return booupperc
end--function lfgtestuc

local function lfgtestlc (numcode)
  local boolowerc = false
  boolowerc = ((numcode>=97) and (numcode<=122))
  return boolowerc
end--function lfgtestlc

------------------------------------------------------------------------

-- Local function LFGTESTPUNCTURE

-- Test whether char is an ASCII punctuation sign, return type "boolean".

-- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
-- dash "-" and apo "'" do NOT count as punctuation
-- here we do NOT include SPACE in the list

local function lfgtestpuncture (numcorde)
  local boopunk = false
  boopunk = ((numcorde==33) or (numcorde==44) or (numcorde==46) or (numcorde==59) or (numcorde==63))
  return boopunk
end--function lfgtestpuncture

------------------------------------------------------------------------

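-- Local function LFIADDTHEDASH

-- Add a dash "-" to the left and/or right of an affix, unless the affix
-- already begins or ends with one. Empty string is returned unchanged.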

local function lfiaddthedash (strafikso, booaddleft, booaddright)
  local numdashlength = 0
  local numbuggar = 0
  numdashlength = string.len (strafikso)
  if (numdashlength~=0) then
    numbuggar = string.byte (strafikso,1,1)
    if (numbuggar==45) then
      booaddleft = false -- avoid "--"...
    end--if
    numbuggar = string.byte (strafikso,numdashlength,numdashlength)
    if (numbuggar==45) then
      booaddright = false -- avoid ..."--"
    end--if
    if (booaddleft) then
      strafikso = "-" .. strafikso
    end--if
    if (booaddright) then
      strafikso = strafikso .. "-"
    end--if
  end--if
  return strafikso
end--function lfiaddthedash

------------------------------------------------------------------------

-- Local function LFIDEBRACKET

-- Separate bracketed part of a string and return the inner or outer
-- part. On failure the string is returned complete and unchanged.
-- There must be exactly ONE "(" and exactly ONE ")" in correct order.

-- Input  : * strde31br, boooutside
--          * numxminlencz -- minimal length of inner part, must be >= 1 !!!

-- Note that for length of hit ZERO ie "()" we have "begg" + 1 = "endd"
-- and for length of hit ONE ie "(x)" we have "begg" + 2 = "endd".

-- Example: "crap (NO)" -> len = 9
--           123456789
-- "begg" = 6 and "endd" = 9
-- Expected result: "NO" or "crap " (note the trailing space)

-- Example: "(XX) YES" -> len = 8
--           12345678
-- "begg" = 1 and "endd" = 4
-- Expected result: "XX" or " YES" (note the leading space)

local function lfidebracket (strde31br, boooutside, numxminlencz)

  local numindoux = 1 -- ONE-based
  local numdlong = 0
  local num31wesel = 0
  local numbegg = 0 -- ONE-based, ZERO invalid
  local numendd = 0 -- ONE-based, ZERO invalid

  numdlong = string.len (strde31br)
  while true do
    if (numindoux>numdlong) then
      break -- end of string (ONE-based) -- success only if both "numbegg" and "numendd" are non-ZERO
    end--if
    num31wesel = string.byte(strde31br,numindoux,numindoux)
    if (num31wesel==40) then -- "("
      if (numbegg==0) then
        numbegg = numindoux -- pos of "("
      else
        numbegg = 0
        break -- damn: more than 1 "(" present
      end--if
    end--if
    if (num31wesel==41) then -- ")"
      if ((numendd==0) and (numbegg~=0) and ((numbegg+numxminlencz)<numindoux)) then
        numendd = numindoux -- pos of ")"
      else
        numendd = 0
        break -- damn: more than 1 ")" present or ")" precedes "("
      end--if
    end--if
    numindoux = numindoux + 1
  end--while

  if ((numbegg~=0) and (numendd~=0)) then
    if (boooutside) then
      strde31br = string.sub(strde31br,1,(numbegg-1)) .. string.sub(strde31br,(numendd+1),numdlong)
    else
      strde31br = string.sub(strde31br,(numbegg+1),(numendd-1)) -- separate substring
    end--if
  end--if

  return strde31br -- same string variable

end--function lfidebracket
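
--[==[

Illustration only, not part of the module: for the common case of a minimal
inner length of 1, the same separation can be sketched with a single Lua
pattern. The name "debracketsketch" is an assumption made for this example;
the parameter "numxminlencz" of the real function is not modelled.

```lua
-- hypothetical pattern-based variant: exactly ONE "(" and ONE ")" in correct
-- order, inner part at least 1 octet long, otherwise the string is unchanged
local function debracketsketch (strin, boooutside)
  local strleft, strinner, strright = string.match (strin, '^([^()]*)%(([^()]+)%)([^()]*)$')
  if (strleft == nil) then
    return strin -- failure: return complete and unchanged
  end
  if (boooutside) then
    return strleft .. strright
  end
  return strinner
end

print (debracketsketch ('crap (NO)', false)) -- inner part "NO"
print (debracketsketch ('(XX) YES', true))   -- outer part " YES" with leading space
```

]==]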

------------------------------------------------------------------------

-- Local function LFIREMOVE2BRA

-- Remove all round brackets "(" and ")" from a string, keep everything else.

local function lfiremove2bra (strinmedparenteser)
  local stroututanparenteser = ''
  local numindozux = 1 -- ONE-based
  local numparepanjang = 0
  local numparechar = 0
  numparepanjang = string.len (strinmedparenteser)
  while true do
    if (numindozux>numparepanjang) then
      break
    end--if
    numparechar = string.byte(strinmedparenteser,numindozux,numindozux)
    if ((numparechar~=40) and (numparechar~=41)) then
      stroututanparenteser = stroututanparenteser .. string.char(numparechar)
    end--if
    numindozux = numindozux + 1
  end--while
  return stroututanparenteser
end--function lfiremove2bra

------------------------------------------------------------------------

---- NUMBER CONVERSION FUNCTIONS [N] ----

------------------------------------------------------------------------

-- Local function LFNONEHEXTOINT

-- Convert single quasi-digit (ASCII HEX "0"..."9" "A"..."F")
-- to integer (0...15, 255 invalid).

-- Only uppercase accepted.

local function lfnonehextoint (numdigit)
  local numresult = 255
  if ((numdigit>=48) and (numdigit<=57)) then -- "0"..."9"
    numresult = numdigit-48
  end--if
  if ((numdigit>=65) and (numdigit<=70)) then -- "A"..."F"
    numresult = numdigit-55
  end--if
  return numresult
end--function lfnonehextoint

------------------------------------------------------------------------

---- UTF8 FUNCTIONS [U] ----

------------------------------------------------------------------------

-- Local function LFULNUTF8CHAR

-- Evaluate length of a single UTF8 char in octet:s.

-- Input  : * numbgoctet  -- beginning octet of a UTF8 char

-- Output : * numlen1234x -- unit octet, number 1...4, or ZERO if invalid

-- Does NOT thoroughly check the validity, looks at ONE octet only.

local function lfulnutf8char (numbgoctet)
  local numlen1234x = 0
    if (numbgoctet<128) then
      numlen1234x = 1 -- $00...$7F -- ANSI/ASCII
    end--if
    if ((numbgoctet>=194) and (numbgoctet<=223)) then
      numlen1234x = 2 -- $C2 to $DF
    end--if
    if ((numbgoctet>=224) and (numbgoctet<=239)) then
      numlen1234x = 3 -- $E0 to $EF
    end--if
    if ((numbgoctet>=240) and (numbgoctet<=244)) then
      numlen1234x = 4 -- $F0 to $F4
    end--if
  return numlen1234x
end--function lfulnutf8char
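
--[==[

Illustration only, not part of the module: the length evaluated from the
beginning octet is what allows walking through a string char by char. The
names "utf8lensketch" and "countutf8chars" are assumptions made for this
example; the first one repeats the ranges used by "lfulnutf8char".

```lua
-- hypothetical copy of the range test: length 1...4, or ZERO if invalid
local function utf8lensketch (numoctet)
  if (numoctet < 128) then return 1 end
  if ((numoctet >= 194) and (numoctet <= 223)) then return 2 end
  if ((numoctet >= 224) and (numoctet <= 239)) then return 3 end
  if ((numoctet >= 240) and (numoctet <= 244)) then return 4 end
  return 0
end

-- count UTF8 char:s, return "nil" on an invalid beginning octet
local function countutf8chars (strin)
  local numpos = 1 -- ONE-based
  local numcount = 0
  local numlen = string.len (strin)
  while (numpos <= numlen) do
    local numstep = utf8lensketch (string.byte (strin, numpos))
    if (numstep == 0) then
      return nil
    end
    numcount = numcount + 1
    numpos = numpos + numstep -- jump to beginning octet of next char
  end
  return numcount
end

print (countutf8chars ('\196\137u')) -- EO "cxu": 3 octet:s but 2 char:s
```

Like the original, this looks at ONE octet per char only, so a char
truncated at the end of the string is still counted.

]==]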

------------------------------------------------------------------------

-- Local function LFUCASEGENE

-- Adjust (generous) case of a single letter (from ASCII + limited extra
-- set from UTF8 with some common ranges) or longer string. (this is GENE)

-- Input  : * strinco7cs : single unicode letter (1 or 2 octet:s) or
--                         longer string
--          * booup7cas  : for desired output uppercase "true" and for
--                         lowercase "false"
--          * boodo7all  : "true" to adjust all letters, "false"
--                         only beginning letter

-- Output : * strinco7cs

-- Depends on functions : (this is GENE)
-- [U] lfulnutf8char
-- [G] lfgpokestring lfgtestuc lfgtestlc
-- [E] mathdiv mathmod mathbittest

-- This process never changes the length of a string in octet:s. Empty string
-- on input is legal and results in an empty string returned. When case is
-- adjusted, a 1-octet or 2-octet letter is replaced by another letter of same
-- length. Unknown valid char:s (1-octet ... 4-octet) are copied. Broken UTF8
-- stream results in remaining part of the output string (from 1 char to
-- complete length of the incoming string) filled by "Z".

-- * lowercase is usually above uppercase, but not always, letters can be
--   only misaligned (UC even vs UC odd), and rarely even swapped (French "Y")
-- * case delta can be 1 or $20 or $50 or other
-- * case pair distance can span $40-boundary or even $0100-boundary
-- * in the ASCII range lowercase is $20 above uppercase, b5 reveals
--   the case (1 is lower)
-- * the same is valid in $C3-block
-- * this is NOT valid in $C4-$C5-block, lowercase is usually 1 above
--   uppercase, but nothing reveals the case reliably

-- ## $C2-block $0080 $C2,$80 ... $00BF $C2,$BF no letters (OTOH NBSP mm)

-- ## $C3-block $00C0 $C3,$80 ... $00FF $C3,$BF (SV mm) delta $20 UC-LC-UC-LC
-- upper $00C0 $C3,$80 ... $00DF $C3,$9F
-- lower $00E0 $C3,$A0 ... $00FF $C3,$BF
-- AA AE EE NN OE UE mm
-- $D7 $DF $F7 excluded (not letters)
-- $FF excluded (here LC, UC is $0178)

-- ## $C4-$C5-block $0100 $C4,$80 ... $017F $C5,$BF (EO mm)
-- delta 1 and UC even, but messy with many exceptions
-- EO $0108 ... $016D case delta 1
-- for example SX upper $015C $C5,$9C -- lower $015D $C5,$9D
-- $0138 $0149 $017F excluded (not letters)
-- $0178 excluded (here UC, LC is $FF)
-- $0100 ... $0137 UC even
-- $0139 ... $0148 misaligned (UC odd) note that case delta is NOT reversed
-- $014A ... $0177 UC even again
-- $0179 ... $017E misaligned (UC odd) note that case delta is NOT reversed

-- ## $CC-$CF-block $0300 $CC,$80 ... $03FF $CF,$BF (EL mm) delta $20
-- EL $0370 ... $03FF (officially)
-- strict EL base range $0391 ... $03C9 case delta $20
-- $0391 $CE,$91 ... $03AB $CE,$AB upper
-- $03B1 $CE,$B1 ... $03CB $CF,$8B lower
-- for example "omega" upper $03A9 $CE,$A9 -- lower $03C9 $CF,$89

-- ## $D0-$D3-block $0400 $D0,$80 ... $04FF $D3,$BF (RU mm)
-- * delta $20 $50 1
-- * strict RU base range $0410 ... $044F case delta $20 but there
--   is 1 extra char outside !!!
--   * $0410 $D0,$90 ... $042F $D0,$AF upper
--   * $0430 $D0,$B0 ... $044F $D1,$8F lower
--   * for example "CCCP-gamma" upper $0413 $D0,$93 -- lower $0433 $D0,$B3
-- * extra base char and exception is special "E" with horizontal doubledot
--   case delta $50 (upper $0401 $D0,$81 -- lower $0451 $D1,$91)
-- * same applies for ranges $0400 $D0,$80 ... $040F $D0,$8F upper
--   and $0450 $D1,$90 ... $045F $D1,$9F lower
-- * range $0460 $D1,$A0 ... $04FF $D3,$BF (ancient RU, UK, RUE, ...) case
--   delta 1 and UC usually even, but messy with many exceptions $048x
--   $04Cx (combining decorations and misaligned)

-- Variables "numdel7abs" and "numdel7ta" must be at least 16-bit to avoid
-- misevaluation or wrong wrapping when fitting into the range 128...191,
-- even if no deltas exceeding +-127 are supported (there are very few pairs
-- of char:s exceeding this). Also both can be declared unsigned since only
-- addition and subtraction are performed on them.

-- We peek max 2 values per iteration, and change the string in-place, doing
-- so strictly only if there indeed is a change. This is important for LUA
-- where the in-place write access must be emulated by means of a less
-- efficient function.

local function lfucasegene (strinco7cs, booup7cas, boodo7all)

  local numlong7den = 0 -- actual length of input string
  local numokt7index = 0
  local numlong7bor = 0 -- expected length of single char

  local numdel7abs = 0 -- at least 16-bit, absolute posi delta
  local numdel7ta = 0 -- quasi-signed at least 16-bit, can be negative

  local numdel7car = 0 -- quasi-signed 8-bit, can be negative

  local numcha7r = 0 -- UINT8 beginning char
  local numcha7s = 0 -- UINT8 later char (BIG ENDIAN, lower value here above)
  local numcxa7rel = 0 -- UINT8 code relative to beginning of block $00...$FF

  local boowan7tlowr = false
  local boois7uppr = false
  local boois7lowr = false

  local boomy7bit0x = false -- single relevant bits picked -- b0
  local boomy7bit5x = false -- single relevant bits picked -- b5
  local boopen7din = false -- only fake loop

  local boodo7adj = true -- preASSume innocence -- continue changing
  local boobotch7d = false -- preASSume innocence -- NOT yet botched

  local booc3block = false -- $C3 only $00C0...$00FF SV mm delta 32
  local booc4c5blk = false -- $C4 $C5  $0100...$017F EO mm delta 1
  local boocccfblk = false -- $CC $CF  $0300...$03FF EL mm delta 32
  local bood0d3blk = false -- $D0 $D3  $0400...$04FF RU mm delta 32 80 1

  booup7cas = not (not booup7cas)
  boowan7tlowr = (not booup7cas)
  numlong7den = string.len (strinco7cs)

  while true do -- genuine loop over incoming string (this is GENE)

    if (numokt7index>=numlong7den) then
      break -- done complete string
    end--if
    if ((not boodo7all) and (numokt7index~=0)) then -- loop can skip index ONE
      boodo7adj = false
    end--if
    boois7uppr  = false -- preASSume on every iteration
    boois7lowr  = false -- preASSume on every iteration
    numdel7ta   = 0 -- preASSume on every iteration
    numlong7bor = 1 -- preASSume on every iteration

    while true do -- fake loop (this is GENE)

      numcha7r = string.byte (strinco7cs,(numokt7index+1),(numokt7index+1))
      if (boobotch7d) then
        numdel7ta = 90 - numcha7r -- "Z" -- delta must be non-ZERO to write
        break -- fill with "Z" char:s
      end--if
      if (not boodo7adj) then
        break -- copy octet after octet
      end--if
      numlong7bor = lfulnutf8char(numcha7r)
      if ((numlong7bor==0) or ((numokt7index+numlong7bor)>numlong7den)) then
        numlong7bor = 1 -- reassign to ONE !!!
        numdel7ta = 90 - numcha7r -- "Z" -- delta must be non-ZERO to write
        boobotch7d = true
        break -- truncated char or broken stream
      end--if
      if (numlong7bor>=3) then
        break -- copy UTF8 char, no chance for adjustment
      end--if

      if (numlong7bor==1) then
        boois7uppr = lfgtestuc(numcha7r)
        boois7lowr = lfgtestlc(numcha7r)
        if (boois7uppr and boowan7tlowr) then
          numdel7ta = 32 -- ASCII UPPER->lower
        end--if
        if (boois7lowr and booup7cas) then
          numdel7ta = -32 -- ASCII lower->UPPER
        end--if
        break -- success with ASCII and one char almost done
      end--if

      booc3block = (numcha7r==195) -- case delta is 32
      booc4c5blk = ((numcha7r==196) or (numcha7r==197)) -- case delta is 1
      boocccfblk = ((numcha7r>=204) and (numcha7r<=207)) -- case delta is 32
      bood0d3blk = ((numcha7r>=208) and (numcha7r<=211)) -- case delta is 32 80 1

      numcha7s = string.byte (strinco7cs,(numokt7index+2),(numokt7index+2)) -- only $80 to $BF
      numcxa7rel = (mathmod(numcha7r,4)*64) + (numcha7s-128) -- 4 times 64
      boomy7bit0x = ((mathmod(numcxa7rel,2))==1)
      boomy7bit5x = mathbittest(numcxa7rel,5)

    if (booc3block) then
      boopen7din = true -- pending flag
      if ((numcxa7rel==215) or (numcxa7rel==223) or (numcxa7rel==247)) then
        boopen7din = false -- not a letter, we are done
      end--if
      if (numcxa7rel==255) then
        boopen7din = false -- special LC silly "Y" with horizontal doubledot
        if (booup7cas) then
          numdel7ta = 121 -- lower->UPPER (distant and reversed order)
        end--if
      end--if
      if (boopen7din) then
        boois7lowr = boomy7bit5x -- mostly regular block, look at b5
        boois7uppr = not boois7lowr
        if (boois7uppr and boowan7tlowr) then
          numdel7ta = 32 -- UPPER->lower
        end--if
        if (boois7lowr and booup7cas) then
          numdel7ta = -32 -- lower->UPPER
        end--if
      end--if (boopen7din) then
      break -- to join mark
    end--if (booc3block) then

    if (booc4c5blk) then
      boopen7din = true -- pending flag
      if ((numcxa7rel==56) or (numcxa7rel==73) or (numcxa7rel==127)) then
        boopen7din = false -- not a letter, we are done
      end--if
      if (numcxa7rel==120) then
        boopen7din = false -- special UC silly "Y" with horizontal doubledot
        if (boowan7tlowr) then
          numdel7ta = -121 -- UPPER->lower (distant and reversed order)
        end--if
      end--if
      if (boopen7din) then
        if (((numcxa7rel>=57) and (numcxa7rel<=73)) or (numcxa7rel>=121)) then
          boois7lowr = not boomy7bit0x -- UC odd (misaligned)
        else
          boois7lowr = boomy7bit0x -- UC even (ordinary align)
        end--if
        boois7uppr = not boois7lowr
        if (boois7uppr and boowan7tlowr) then
          numdel7ta = 1 -- UPPER->lower
        end--if
        if (boois7lowr and booup7cas) then
          numdel7ta = -1 -- lower->UPPER
        end--if
      end--if (boopen7din) then
      break -- to join mark
    end--if (booc4c5blk) then

    if (boocccfblk) then
      boois7uppr = ((numcxa7rel>=145) and (numcxa7rel<=171))
      boois7lowr = ((numcxa7rel>=177) and (numcxa7rel<=203))
      if (boois7uppr and boowan7tlowr) then
        numdel7ta = 32 -- UPPER->lower
      end--if
      if (boois7lowr and booup7cas) then
        numdel7ta = -32 -- lower->UPPER
      end--if
      break -- to join mark
    end--if (boocccfblk) then

    if (bood0d3blk) then
      if (numcxa7rel<=95) then -- messy layout but no exceptions
        boois7lowr = (numcxa7rel>=48) -- delta $20 or $50
        boois7uppr = not boois7lowr
        numdel7abs = 32 -- $20
        if ((numcxa7rel<=15) or (numcxa7rel>=80)) then
          numdel7abs = 80 -- $50
        end--if
      end--if
      if ((numcxa7rel>=96) and (numcxa7rel<=129)) then -- no exceptions here
        boois7lowr = boomy7bit0x -- UC even (ordinary align)
        boois7uppr = not boois7lowr
        numdel7abs = 1
      end--if
      if (numcxa7rel>=138) then -- some misaligns here  !!!FIXME!!!
        boois7lowr = boomy7bit0x -- UC even (ordinary align)
        boois7uppr = not boois7lowr
        numdel7abs = 1
      end--if
      if (boois7uppr and boowan7tlowr) then
        numdel7ta = numdel7abs -- UPPER->lower
      end--if
      if (boois7lowr and booup7cas) then
        numdel7ta = -numdel7abs -- lower->UPPER
      end--if
      break -- to join mark
    end--if (bood0d3blk) then

      break -- finally to join mark -- unknown non-ASCII char is a fact :-(
    end--while -- fake loop -- join mark (this is GENE)

    if ((numlong7bor==1) and (numdel7ta~=0)) then -- no risk of carry here
      strinco7cs = lfgpokestring (strinco7cs,numokt7index,(numcha7r+numdel7ta))
    end--if
    if ((numlong7bor==2) and (numdel7ta~=0)) then -- carry into the beginning octet is possible here
      numdel7car = 0
      while true do -- inner genuine loop
        if ((numcha7s+numdel7ta)<192) then
          break
        end--if
        numdel7ta = numdel7ta - 64 -- get it down into range 128...191
        numdel7car = numdel7car + 1 -- BIG ENDIAN 6 bits with carry
      end--while
      while true do -- inner genuine loop
        if ((numcha7s+numdel7ta)>127) then
          break
        end--if
        numdel7ta = numdel7ta + 64 -- get it up into range 128...191
        numdel7car = numdel7car - 1 -- BIG ENDIAN 6 bits with carry
      end--while
      if (numdel7car~=0) then -- in-place change only if needed
        strinco7cs = lfgpokestring (strinco7cs,numokt7index,(numcha7r+numdel7car))
      end--if
      if (numdel7ta~=0) then -- in-place change only if needed
        strinco7cs = lfgpokestring (strinco7cs,(numokt7index+1),(numcha7s+numdel7ta))
      end--if
    end--if
    numokt7index = numokt7index + numlong7bor -- advance in incoming string

  end--while -- genuine loop over incoming string (this is GENE)

  return strinco7cs

end--function lfucasegene
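
--[==[

Illustration only, not part of the module: the two inner loops above bring
the later octet back into the range 128...191 and move the carry into the
beginning octet. The sketch below isolates this arithmetic for one 2-octet
char. The name "shift2octetsketch" is an assumption made for this example,
and no validity check of the input is performed.

```lua
-- hypothetical helper: add a codepoint delta to a 2-octet UTF8 char
local function shift2octetsketch (strchar, numdelta)
  local numlead  = string.byte (strchar, 1)
  local numtrail = string.byte (strchar, 2) + numdelta
  while (numtrail > 191) do
    numtrail = numtrail - 64 -- get it down into range 128...191
    numlead  = numlead + 1   -- BIG ENDIAN 6 bits with carry
  end
  while (numtrail < 128) do
    numtrail = numtrail + 64 -- get it up into range 128...191
    numlead  = numlead - 1   -- BIG ENDIAN 6 bits with carry
  end
  return string.char (numlead, numtrail)
end

-- $00FF ($C3,$BF) plus 121 gives $0178 ($C5,$B8), carry of 2
print (shift2octetsketch ('\195\191', 121) == '\197\184')
-- $044F ($D1,$8F) minus 32 gives $042F ($D0,$AF), carry of -1
print (shift2octetsketch ('\209\143', -32) == '\208\175')
```

]==]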

------------------------------------------------------------------------

---- HIGH LEVEL STRING FUNCTIONS [I] ----

------------------------------------------------------------------------

-- Local function LFIULTIMINSERT

-- Insert selected substitute strings into request string at given positions
-- with optional discarding if the substitute string is empty. Discarding
-- is protected from access out of range by clamping the distances.

-- Input  : * strrekvest -- request string containing placeholders
--                          (syntax see below)
--          * tabsubstut -- table of substitute strings keyed by two-letter
--                          codes; a non-string value is safe and acts like
--                          an empty string, but type "nil" or the empty
--                          string "" is preferred

-- Output : * strhazil

-- Syntax of the placeholder:
-- * "@" followed by 2 uppercase letters and 2 hex digits, otherwise
--   the hit is not processed, but copied as-is instead
--   * the 2 letters select the substitute from the table supplied by the caller
--   * the 2 hex digits control discarding left and right (0...15 char:s each)

-- An empty item in "tabsubstut" is legal and results in discarding if any of
-- the control numbers is non-ZERO. Left discarding is performed on
-- "strhazil" whereas right discarding acts on "strrekvest" via "numdatainx".

-- If uppercasing or other adjustment is needed, then the caller must
-- take care of it by providing several separate substitute strings with
-- separate names in the table.

-- Depends on functions :
-- [G] lfgtestnum lfgtestuc
-- [N] lfnonehextoint
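
-- Worked example (a sketch, not taken from the callers; the table content
-- is hypothetical, "lfgtestuc" is assumed to accept ASCII "A"..."Z" only and
-- "lfnonehextoint" is assumed to map one hex digit to 0...15 and anything
-- else to 255, as the test in the code suggests):
--   local tabdemo = {["LK"]="eo"}
--   lfiultiminsert ("Kapvorto (@LK00)",tabdemo) -- "Kapvorto (eo)"
--   lfiultiminsert ("Kapvorto (@XX21)",tabdemo) -- "Kapvorto"
--   lfiultiminsert ("a@lk00b",tabdemo)          -- "a@lk00b" (copied as-is)
-- In the second call "XX" is missing from the table, thus 2 char:s " (" are
-- discarded on the left and 1 char ")" on the right.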

local function lfiultiminsert (strrekvest,tabsubstut)

  local varduahuruf = 0
  local strhazil = ''
  local numdatalen = 0
  local numdatainx = 0 -- src index
  local numdataoct = 0 -- maybe @
  local numdataodt = 0 -- UC
  local numdataoet = 0 -- UC
  local numammlef = 0 -- hex and discard left
  local numammrig = 0 -- hex and discard right
  local boogotplejs = false

  numdatalen = string.len(strrekvest)
  numdatainx = 1 -- ONE-based

  while true do -- genuine loop, "numdatainx" is the counter
    if (numdatainx>numdatalen) then -- beware of risk of overflow below
      break -- done (ZERO iterations possible)
    end--if
    boogotplejs = false
    numdataoct = string.byte(strrekvest,numdatainx,numdatainx)
    numdatainx = numdatainx + 1
    while true do -- fake loop
      if ((numdataoct~=64) or ((numdatainx+3)>numdatalen)) then
        break -- no hit here
      end--if
      numdataodt = string.byte(strrekvest, numdatainx   , numdatainx   )
      numdataoet = string.byte(strrekvest,(numdatainx+1),(numdatainx+1))
      if ((not lfgtestuc(numdataodt)) or (not lfgtestuc(numdataoet))) then
        break -- no hit here
      end--if
      numammlef = string.byte(strrekvest,(numdatainx+2),(numdatainx+2))
      numammrig = string.byte(strrekvest,(numdatainx+3),(numdatainx+3))
      numammlef = lfnonehextoint (numammlef)
      numammrig = lfnonehextoint (numammrig)
      boogotplejs = ((numammlef~=255) and (numammrig~=255))
      break
    end--while -- fake loop -- join mark
    if (boogotplejs) then
      numdatainx = numdatainx + 4 -- consumed 5 char:s, cannot overflow here
      varduahuruf = string.char (numdataodt,numdataoet)
      varduahuruf = tabsubstut[varduahuruf] -- risk of type "nil"
      if (type(varduahuruf)~='string') then
        varduahuruf = '' -- type "nil" or invalid type gives empty string
      end--if
      if (varduahuruf=='') then
        numdataoct = string.len(strhazil) - numammlef -- this can underflow
        if (numdataoct<=0) then
          strhazil = ''
        else
          strhazil = string.sub(strhazil,1,numdataoct) -- discard left
        end--if
        numdatainx = numdatainx + numammrig -- discard right this can overflow
      else
        strhazil = strhazil .. varduahuruf -- insert / expand
      end--if
    else
      strhazil = strhazil .. string.char(numdataoct) -- copy char as-is
    end--if (boogotplejs) else
  end--while

  return strhazil

end--function lfiultiminsert

------------------------------------------------------------------------

-- Local function LFIFINDITEMS

-- Search in string primarily intended for LFIULTIMINSERT.

-- Input  : * long string where to search (for example "Kapvorto (@LK00)")
--          * string with an even number of char:s listing the two-letter
--            codes to search for (for example "WCWU")

-- Output : * boolean ("true" if any code was found, "false" for our example)
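
-- For instance (sketch; the second request string is made up for
-- illustration, only "@" plus 2 letters is tested, not the hex digits):
--   lfifinditems ("Kapvorto (@LK00)","WCWU")    -- false
--   lfifinditems ("Vorto @WC00 (@LK00)","WCWU") -- true, "@WC" found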

local function lfifinditems (strwhere, strandevenwhat)

  local strcxztvaa = ''
  local numcxzlen = 0
  local numcxzind = 1 -- ONE-based step TWO
  local boofoundthecrap = false

  numcxzlen = string.len(strandevenwhat)
  while true do
    if (numcxzind>=numcxzlen) then
      break -- not found
    end--if
    strcxztvaa = '@' .. string.sub(strandevenwhat,numcxzind,(numcxzind+1))
    boofoundthecrap = (string.find(strwhere,strcxztvaa,1,true)~=nil)
    if (boofoundthecrap) then
      break -- found any of them, done
    end--if
    numcxzind = numcxzind + 2
  end--while
  return boofoundthecrap

end--function lfifinditems

------------------------------------------------------------------------

---- HIGH LEVEL FUNCTIONS [H0] ----

------------------------------------------------------------------------

-- Local function LFILEFTRIGHT

-- Brew wikilink from 2 elements.
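
-- Results for the 3 possible cases (sketch with made-up arguments):
--   lfileftright ("suno","Suno") -- "[[suno|Suno]]"
--   lfileftright ("suno","suno") -- "[[suno]]"
--   lfileftright ("suno","")     -- "[[suno]]"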

local function lfileftright (strbigleft, strbigright)
  local strwikilink = ''
  if  ((strbigright=='') or (strbigleft==strbigright)) then
    strwikilink = strbigleft -- save bloat
  else
    strwikilink = strbigleft .. '|' .. strbigright -- here genuine wall needed
  end--if
  strwikilink = '[[' .. strwikilink .. ']]' -- always link
  return strwikilink
end--function lfileftright

------------------------------------------------------------------------

-- Local function LFFILLKATON

-- Add one string and maybe one bool to global "qtabktaoj" provided the
-- string is nonempty and not yet in and there is some space left.

-- This function has exclusive write access to "qtabktaoj". Do NOT write
-- to it in any other way except during declaration + initialization.

-- We allow max 16 cat:s from auto split or split control parameter and
-- max 4 cat:s from extra parameter but there is a sum limit of 18.
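
-- Sketch of the resulting layout, assuming "qtabktaoj" starts empty
-- (the category names are hypothetical):
--   lffillkaton ("Cat A",true)  -- qtabktaoj[0]="Cat A" qtabktaoj[20]=true
--   lffillkaton ("Cat B",false) -- qtabktaoj[1]="Cat B" qtabktaoj[21] is "nil"
--   lffillkaton ("Cat A",false) -- no change, already in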

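local function lffillkaton (stritem, boomain)
  local numsrchindex = 0
  local varpeek = 0
  if ((type(stritem)~='string') or (stritem=='')) then
    numsrchindex = 18 -- empty or invalid item is never stored
  end--if
  while true do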
    if (numsrchindex==18) then
      break -- no free slot left
    end--if
    varpeek = qtabktaoj[numsrchindex]
    if (varpeek==stritem) then
      numsrchindex = 18
      break -- already in
    end--if
    if (varpeek==nil) then
      break -- found free slot
    end--if
    numsrchindex = numsrchindex + 1
  end--while
  if (numsrchindex~=18) then
    qtabktaoj[numsrchindex] = stritem
    if (boomain) then
      qtabktaoj[numsrchindex+20] = true
    end--if
  end--if
end--function lffillkaton

------------------------------------------------------------------------

-- Local function LFHGET345NONIL

-- we read from global "contabktaoj" index 3...5

-- "nummortyyp" mortyp "W" has code 87 and gives index 3 or 4
-- "nummortyyp" mortyp other has code < 87 (ZERO is safe) and gives index 5
-- "boofraazo" can be assigned to "false" if not needed (index 5)
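
-- Index selection in short (sketch; a non-string entry gives ""):
--   lfhget345nonil (87,false) -- contabktaoj[3] (vortgrupo)
--   lfhget345nonil (87,true)  -- contabktaoj[4] (frazo)
--   lfhget345nonil (77,false) -- contabktaoj[5] (vorto), same for ZERO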

local function lfhget345nonil (nummortyyp, boofraazo)

  local strctlstring = ''
  local numpiinx = 0 -- temp 3...5

  if (nummortyyp==87) then
    numpiinx = 3 -- vortgrupo contains "W"
    if (boofraazo) then
      numpiinx = 4 -- kalimat contains "W"
    end--if
  else
    numpiinx = 5 -- word can contain C I M N P U but obviously not "W"
  end--if
  strctlstring = contabktaoj[numpiinx] -- pick main data string risk for "nil"
  if (type(strctlstring)~='string') then
    strctlstring = '' -- fool-proof
  end--if

  return strctlstring -- can be empty but NOT type "nil"

end--function lfhget345nonil

------------------------------------------------------------------------

---- HIGH LEVEL FUNCTIONS [H5] ----

------------------------------------------------------------------------

-- Local function LFHSPLITAA

-- Perform the automatic multiword split or assisted split controlled
-- by 2 prevalidated tables.

-- Note that the split can sort of fail and return same string, most notably
-- if no split boundaries exist, or some do exist but all are blocked.

-- Counting of the boundaries is tricky. We DO count the suppressed ones but
-- do NOT count multiple consecutive non-letters more than once. Thus the
-- boundaries are between words only and at begin and end, there CANNOT
-- be empty content between 2 boundaries. We usually have 2 faked empty
-- boundaries at begin and end, but they can also be real and count then.

-- For example "AND YES, we !,definit-ely,! can." contains 5 words (that can
-- become 5 output fragments numbered 0...4) and 5 input boundaries
-- (numbered 0...4). In the text "?va?" there are 2 boundaries, at begin
-- and end.
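
-- Breakdown of the first example, assuming that only space and the 5
-- punctuation char:s "! , . ; ?" act as boundary char:s (dash "-" does not):
--   fragment 0 "AND"          boundary 0 " "
--   fragment 1 "YES"          boundary 1 ", "
--   fragment 2 "we"           boundary 2 " !,"
--   fragment 3 "definit-ely"  boundary 3 ",! "
--   fragment 4 "can"          boundary 4 "."
-- There is no counted boundary before "AND" because the text begins with
-- a letter. In "?va?" the leading "?" is boundary 0 and the trailing "?"
-- is boundary 1.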

-- We need sub "lfiultiminsert" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "boomorfium" must be
-- false unless lng in "tabkoudo" is valid and known.

-- Names of the categories are built from "contabktaoj" index 3 (vortgrupo)
-- or 4 (frazo) but here not 5 (vorto, useful for manual split). Categories
-- are brewed only if "boomorfium" is true, the split does not fail, and the
-- individual fragment is not blocked. For example "va" will neither link nor
-- categorize but "va?" will do both. The "#"..."N"-syntax blocks both linking
-- and morpheme categorization (if the latter is enabled otherwise). If
-- linking is blocked for another reason (most notably only 1 fragment generated
-- after the split attempt) then categorization is suppressed as well.

-- Input  : * "strlemmain"   -- input text (pagename)
--          * "tabblokr"     -- index 0...15 holes permitted, from "%"
--          * "tablinker"    -- index 0...15 holes permitted, from "#"
--          * "boomorfium"   -- "true" if compound cat:s are desired
--          * "bookalimat"   -- "true" if word class "KA" was specified
--          * "tabkoudo"     -- lng stuff ("??" legal but needs "boomorfium")
-- Output : * "stromong"     -- wikitext to be sent to screen
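
-- Expected results for a few inputs (sketch; control tables mostly empty,
-- cat:s disabled, "lfgtestpuncture" assumed to accept "?"):
--   lfhsplitaa ("va",{},{},false,false,{})             -- "va"
--   lfhsplitaa ("va?",{},{},false,false,{})            -- "[[va]]?"
--   lfhsplitaa ("big dog",{},{},false,false,{})        -- "[[big]] [[dog]]"
--   lfhsplitaa ("big dog",{[0]="1"},{},false,false,{}) -- "big dog"
--   lfhsplitaa ("big dog",{},{[1]="N"},false,false,{}) -- "[[big]] dog"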

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- Depends on functions :
-- [I] lffillkaton lfiultiminsert
-- [U] lfulnutf8char lfucasegene (generous)
-- [G] lfgpokestring lfgtestuc lfgtestlc lfgtestpuncture
-- [E] mathdiv mathmod mathbittest

-- This sub depends on "HIGH LEVEL FUNCTIONS"\"lfhget345nonil" and
-- "HIGH LEVEL FUNCTIONS"\"lfileftright".

local function lfhsplitaa (strlemmain, tabblokr, tablinker, boomorfium, bookalimat, tabkoudo)

  local varrisktabl = 0 -- can be type "nil"
  local strfragment = ''
  local strfragdext = '' -- right part with visible text (wall not included)
  local stromong = '' -- final result
  local strkattcty = ''
  local strkatoon = '' -- for "lffillkaton"
  local numloonginp = 0 -- length of input
  local numinxed = 0 -- ZERO-based index of input char:s
  local numboundrinp = 0 -- counter of detected boundaries include suppressed
  local numoutfrag = 0 -- counter of produced fragments
  local numotcot = 0
  local numotcet = 0
  local numotcuu = 0 -- control code from "tablinker" (ZERO is "nil" ie none)
  local boohavechar = false
  local booqboueof = false -- combo status: boundary char or end of string
  local booprevqbe = false -- previous combo status
  local boosuppress = false -- suppress split but still do count the boundary
  local boodolnkkat = false -- do link and maybe categorize the fragment

  numloonginp = string.len(strlemmain)

  while true do
    if (numinxed==numloonginp) then
      boohavechar = false
      booqboueof = true -- copied whole string and end of fragment
      boosuppress = false -- last chance, we must output accumulated fragment
    else
      boohavechar = true -- can be part of word or boundary !!!
      numotcot = string.byte (strlemmain,(numinxed+1),(numinxed+1))
      numinxed = numinxed + 1 -- ZERO-based
      booqboueof = ((numotcot==32) or lfgtestpuncture(numotcot))
      boosuppress = (tabblokr[numboundrinp]=="1")
    end--if
    if (booprevqbe and (booqboueof==false)) then
      numboundrinp = numboundrinp + 1 -- count even suppressed boundaries
    end--if
    booprevqbe = booqboueof -- assign previous status for next round
    if (booqboueof and (not boosuppress) and (strfragment~='')) then
      strfragdext = strfragment -- visible text right of the wall "|"
      boodolnkkat = false -- preassume no link no cat
      if ((stromong~='') or boohavechar) then -- avoid selflink to page
        varrisktabl = tablinker[numoutfrag] -- can be type "nil"
        numotcuu = 0
        if (type(varrisktabl)=='string') then
          numotcuu = string.byte (varrisktabl,1,1)
        end--if
        if (numotcuu==73) then -- "I" lowercase
          strfragment = lfucasegene (strfragment,false,false)
        end--if
        if (numotcuu==65) then -- "A" uppercase
          strfragment = lfucasegene (strfragment,true,false)
        end--if
        if (numotcuu==58) then -- ":" explicit replace
          strfragment = string.sub (varrisktabl,2,string.len(varrisktabl))
        end--if
        boodolnkkat = (numotcuu~=78) -- "boodolnkkat" needed below 2 times
      end--if ((stromong~='') or boohavechar) then
      if (boodolnkkat) then
        stromong = stromong .. lfileftright (strfragment,strfragdext) -- wlink
      else
        stromong = stromong .. strfragment -- add raw fragment no link
      end--if
      if (boomorfium and boodolnkkat) then
        strkattcty = lfhget345nonil (87,bookalimat) -- always "W" thus 5 imposs
        numotcet = string.len(strkattcty) -- this is automatic or assisted
        if (numotcet>=2) then
          tabkoudo["WC"] = nil -- no stupid word class here
          tabkoudo["WU"] = nil -- no stupid word class here
          tabkoudo["MT"] = nil -- a word does not have any morpheme type
          tabkoudo["FR"] = strfragment
          strkatoon = lfiultiminsert (strkattcty,tabkoudo)
          lffillkaton (strkatoon,false) -- NOT main page -- "qtabktaoj"
        end--if (numotcet>=2) then
      end--if (boomorfium and boodolnkkat) then
      strfragment = ''
      numoutfrag = numoutfrag + 1 -- count fragments "lffillkaton" separately
    end--if (booqboueof and (not boosuppress) and (strfragment~='')) then
    if (boohavechar) then
      if (booqboueof and (not boosuppress)) then
        stromong = stromong .. string.char(numotcot) -- add non-linkable char
      else
        strfragment = strfragment .. string.char(numotcot) -- add chr to fragm
      end--if
    else
      break -- done all
    end--if
  end--while

  return stromong

end--function lfhsplitaa

------------------------------------------------------------------------

-- Local function LFHSPLITMN

-- Perform the manual split controlled by one prevalidated table. Actually
-- the table contains the presplit complete lemma and the pagename is not
-- needed at all. Max 16 fragments can come in, type "F000" does count. We
-- rely on all details being prevalidated (number of fragments, plusses and
-- rectangular brackets, colons and slashes, only valid uppercase letters
-- before colon, legal use of "L:", ...).

-- We need sub "lfiultiminsert" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "boomorkat" must be
-- false unless lng in "tabkuodo" is valid and known.

-- Names of the categories are built from "contabktaoj" index 3 (vortgrupo)
-- or 4 (frazo) or 5 (vorto).

-- The source string uses slashes "/" as field separator but the destination
-- string uses walls "|".

-- Omitting deleted characters and dash adding are performed only for
-- fragment type "F210" ie only one field after ":" and no slash "/".
-- Also "L" is permitted for fragment type "F210" only but this is
-- prevalidated. Note that in the early prevalidation step the debracketing
-- for the "sum check" is NOT limited to fragment type "F210".

-- We have to maintain 2 separate fragment counters. For example valid syntax
-- "[M:kung]+a+[M:doeme]" gives 3 input fragments in "tabmnfragoj", but only
-- 2 output fragments in "qtabktaoj", and we want them to have indexes 0
-- and 1, not 0 and 2. The out counter is not explicit, it is the content
-- of "qtabktaoj" processed in "lffillkaton".

-- There is a problem with the wikisyntax, for example "[[no]]pe" will act as
-- "[[no|nope]]" ie the visible link text will continue beyond the bracket
-- and cover the "pe", whereas "[[no]]??" does not trigger such behavior. To
-- prevent this from happening we must add something invisible, and we use
-- "<i></i>".

-- here we DO introduce wikilinks with double brackets and walls
-- here we DO expand "+" to " + " (between fragments)
-- here we DO add dashes to some affixes (fragment type "F210")
-- here we do NOT carry out the "sum check" (done in the prevalidation)

-- Input  : * "tabmnfragoj"    -- prevalidated presplit table "+[I:bug/BUG]"
--          * "boomorkat"      -- "true" if compound cat:s are desired
--          * "bookalymat"     -- "true" if word class "KA" was specified
--          * "tabkuodo"       -- lng stuff ("??" legal but needs "boomorkat")
-- Output : * "strumung"       -- wikitext to be sent to screen
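
-- Expected results (sketch; cat:s disabled, and "lfiremove2bra" assumed to
-- return a string without brackets unchanged):
--   local tabdemo = {[0]="[M:kung]",[1]="+a",[2]="+[M:doeme]"}
--   lfhsplitmn (tabdemo,false,false,{})
--     -- "[[kung]]<i></i> + a + [[doeme]]<i></i>"
--   lfhsplitmn ({[0]="[I:bug/BUG]"},false,false,{})
--     -- "[[bug|BUG]]<i></i>" (slash present, thus no dash adding)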

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- Depends on functions :
-- [H] lfhget345nonil
-- [I] lffillkaton lfiultiminsert
-- [G] lfgtestuc

-- This sub depends on "STRING FUNCTIONS"\"lfidebracket" and
-- "STRING FUNCTIONS"\"lfiremove2bra" and
-- "STRING FUNCTIONS"\"lfiaddthedash" and
-- "HIGH LEVEL FUNCTIONS"\"lfifinditems" and
-- "HIGH LEVEL FUNCTIONS"\"lfileftright".

local function lfhsplitmn (tabmnfragoj, boomorkat, bookalymat, tabkuodo)

  local varrysktabl = 0 -- from in table can be type "nil"
  local strumung = '' -- final result with links
  local strwalzleft = ''
  local strwallrght = ''
  local strwallcatg = '' -- same as "strwalzleft" unless "L"-trick is used
  local strkattctx = ''
  local strkatton = '' -- for "lffillkaton"
  local numinnfrog = 0 -- counter in "tabmnfragoj" type "F000" does count
  local numlenfrago = 0 -- ONE-based last valid index
  local numivnxed = 0 -- ONE-based index of char:s inside fragment
  local numcuaar = 0
  local numcuabr = 0 -- +1
  local numcuacr = 0 -- +2
  local numcom1of79z = 0 -- 0 | 67 C 73 I 76 L 77 M 78 N 80 P 85 U | 87 W
  local booeldtrick = false -- true for the "L"-trick giving type "N"
  local booright = false -- false left | true right
  local boohavecolon = false
  local boo210magic = false -- enhance and strip then
  local booneedmor = false

  while true do -- outer loop counts fragments in table
    booeldtrick = false -- separate verdict for every fragment
    boohavecolon = false -- separate verdict for every fragment
    boo210magic = false -- separate verdict for every fragment
    numcom1of79z = 0 -- default none, separate verdict for every fragment
    varrysktabl = tabmnfragoj [numinnfrog] -- can be type "nil" !!!
    numinnfrog = numinnfrog + 1
    if (type(varrysktabl)~='string') then
      break -- give up on "nil"
    end--if
    numlenfrago = string.len (varrysktabl) -- cannot be empty
    numivnxed = 1 -- ONE-based
    numcuaar = string.byte (varrysktabl,1,1)
    if (numcuaar==43) then
      numivnxed = 2 -- ONE-based skip the "+" even for type "F000" far below
      strumung = strumung .. ' + ' -- add the spaces here
      numcuaar = string.byte (varrysktabl,2,2) -- pick new char cannot be "+"
    end--if
    if (numcuaar==91) then -- bracketed []-fragment processed char-by-char
      numivnxed = numivnxed + 1 -- now at least 2
      strwalzleft = ''
      strwallrght = ''
      booright = false
      numcuabr = 0
      numcuacr = 0 -- minimal fe "[M:x]" 5 char:s 1...5 or 2...6
        if ((numlenfrago-numivnxed)>=3) then
          numcuabr = string.byte (varrysktabl,numivnxed,numivnxed)
          numcuacr = string.byte (varrysktabl,(numivnxed+1),(numivnxed+1))
        end--if
        if ((numcuacr==58) and lfgtestuc(numcuabr)) then
          numcom1of79z = numcuabr -- "numcuabr" is prevalidated ;-)
          numivnxed = numivnxed + 2 -- eat it away too
          boohavecolon = true -- fragment type "F210" or "F211"
          if (numcom1of79z==76) then
            booeldtrick = true -- fe "fer(o)" -> link "fero" and categ "fer"
            numcom1of79z = 78 -- "L" -> "N"
          end--if
        end--if
        while true do -- inner loop counts char:s in a bracketed []-fragment
          if (numivnxed==numlenfrago) then
            break -- skip trailing ']' guaranteed to exist
          end--if
          numcuaar = string.byte (varrysktabl,numivnxed,numivnxed)
          if (booright) then
            strwallrght = strwallrght .. string.char (numcuaar) -- wall NOT po
          else
            if (numcuaar==47) then
              booright = true -- source separating slash "/"
            else
              strwalzleft = strwalzleft .. string.char (numcuaar)
            end--if
          end--if
          numivnxed = numivnxed + 1
        end--while
        if (strwallrght=='') then
          strwallrght = strwalzleft -- type "F200" or "F210"
          boo210magic = boohavecolon -- magic qualifies only if type is F210
        end--if
        if (boo210magic) then -- try enhance left fe "il" -> "-il-"
          if (numcom1of79z==80) then
            strwalzleft = lfiaddthedash (strwalzleft,false,true) -- P
          end--if
          if (numcom1of79z==85) then
            strwalzleft = lfiaddthedash (strwalzleft,true,false) -- U
          end--if
          if (numcom1of79z==73) then
            strwalzleft = lfiaddthedash (strwalzleft,true,true) -- I
          end--if
        end--if
        strwallcatg = strwalzleft -- seize after enhancing before stripping
        if (boo210magic) then -- always strip but in various ways
          strwalzleft = lfiremove2bra (strwalzleft) -- link "kac(o)" -> "kaco"
          if (booeldtrick) then -- "L" -> "N"
            strwallcatg = lfidebracket (strwallcatg,true,1) -- for the category
          else
            strwallcatg = lfiremove2bra (strwallcatg) -- for the category
          end--if
        end--if
      strumung = strumung .. lfileftright (strwalzleft,strwallrght) .. '<i></i>' -- always link
      if (boomorkat and (numcom1of79z~=0)) then
        strkattctx = lfhget345nonil (numcom1of79z,bookalymat) -- 3 or 4 or 5
        numcuaar = string.len(strkattctx) -- this is the manual split
        if (numcuaar>=2) then
          booneedmor = lfifinditems(strkattctx,"MT") -- need it ??
          tabkuodo["WC"] = nil -- no stupid word class here
          tabkuodo["WU"] = nil -- no stupid word class here
          if (booneedmor) then
            tabkuodo["MT"] = string.char(numcom1of79z) -- morpheme type
          else
            tabkuodo["MT"] = nil -- no morpheme type here
          end--if
          tabkuodo["FR"] = strwallcatg -- fragment or word
          strkatton = lfiultiminsert (strkattctx,tabkuodo)
          lffillkaton (strkatton,false) -- NOT main page -- "qtabktaoj"
        end--if (numcuaar>=2) then
      end--if (boomorkat and (numcom1of79z~=0)) then
    else
      strumung = strumung .. string.sub (varrysktabl,numivnxed,numlenfrago) -- copy type F000 as-is
    end--if (numcuaar==91) else
  end--while

  return strumung

end--function lfhsplitmn

------------------------------------------------------------------------

-- Local function LFHSPLITSI

-- Perform the simple root split (3, "$S") or simple bare
-- root (4, "$B") strategy. Pagename is needed.

-- $S simple root split    suno  -> sun        + [-o/o] kat "N!sun" + "U:-o"
--                         Suno  -> [suno/Sun] + [-o/o] kat "N!sun" + "U:-o"
-- $B simple bare root     sun   -> sun                 kat "M!sun"
--                         Sun   -> [sun/Sun]           kat "M!sun"
-- $B simple bare root NR  #     -> #                   kat "N!#"
--                                ("#" represents a Chinese letter)

-- Note that for $S the mortyp is always "N" (nonstandalone) whereas
-- for $B it can be either "M" (standalone) or "N".

-- We need sub "lfiultiminsert" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "bookomdez" must be
-- false unless lng in "tablngbah" is valid and known.

-- Names of the categories are built from "contabktaoj" index 5 (vorto).

-- Input  : * "strhalaman"     -- input lemma ie pagename
--          * "numkodsplit"    -- 3 or 4 for $S or $B
--          * "bookomdez"      -- "true" if compound cat:s are desired at all
--          * "boonitro"       -- "true" if word class is NR
--          * "tablngbah"      -- lng stuff ("??" legal but needs "bookomdez")
-- Output : * "strymyng"       -- wikitext to be sent to screen
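
-- Resulting wikitext for the cases listed above (sketch; cat:s disabled and
-- "lfucasegene" assumed to turn "Suno" into "suno"):
--   lfhsplitsi ("suno",3,false,false,{}) -- "sun + [[-o|o]]"
--   lfhsplitsi ("Suno",3,false,false,{}) -- "[[suno|Sun]] + [[-o|o]]"
--   lfhsplitsi ("sun",4,false,false,{})  -- "sun"
--   lfhsplitsi ("Sun",4,false,false,{})  -- "[[sun|Sun]]"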

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s and maybe index [20]...[35] with main page status.
-- In fact only one index (probably [20]) can receive the "true" here.

-- Depends on functions :
-- [H] lffillkaton lfiultiminsert lfhget345nonil
-- [U] lfulnutf8char lfucasegene (generous)
-- [G] lfgpokestring lfgtestuc lfgtestlc
-- [E] mathdiv mathmod mathbittest

local function lfhsplitsi (strhalaman, numkodsplit, bookomdez, boonitro, tablngbah)

  local strtakkctx = '' -- contabktaoj[5] index 5 is hardcoded
  local strymyng = '' -- screen
  local strlover = '' -- brewed from "strhalaman" : "Suno" -> "suno"
  local strnolast = '' -- brewed from "strhalaman" : "Suno" -> "Sun"
  local strnolaslow = '' -- brewed from "strlover" : "Suno" -> "sun"
  local strkatroot = ''
  local strcatoton = ''
  local nummortyp = 0 -- 77 "M" or 78 "N" only
  local numdewsx = 0
  local numlasst = 0 -- last char of lemma or ZERO if not separated
  local numcauar = 0
  local numcaubr = 0
  local booindeedlow = false

  numdewsx = string.len (strhalaman)
  strlover = lfucasegene(strhalaman,false,false)
  booindeedlow = (strlover==strhalaman)
  numlasst = 0 -- needed far below
  nummortyp = 77 -- "M"
  if (boonitro or (numkodsplit==3)) then
    nummortyp = 78 -- "N"
  end--if
  if (numkodsplit==3) then
    strnolast = string.sub (strhalaman,1,(numdewsx-1)) -- cut off last octet
    strnolaslow = string.sub (strlover,1,(numdewsx-1)) -- cut off & lowercase
    numlasst = string.byte (strhalaman,numdewsx,numdewsx) -- last octet (ASCII ending assumed), needed far below
    if (booindeedlow) then
      strymyng = strnolast -- as-is lowercase
    else
      strymyng = '[[' .. strlover .. '|' .. strnolast .. ']]' -- link
    end--if
    strymyng = strymyng .. ' + [[-' .. string.char(numlasst) .. '|' .. string.char(numlasst) .. ']]'
    strkatroot = strnolaslow -- $S
  end--if
  if (numkodsplit==4) then
    if (booindeedlow) then
      strymyng = strhalaman -- as-is lowercase
    else
      strymyng = '[[' .. strlover .. '|' .. strhalaman .. ']]' -- link
    end--if
    strkatroot = strlover -- $B
  end--if

  if (bookomdez) then
    strtakkctx = lfhget345nonil (0,false) -- pick main data string 5 hardcoded
    numcauar = string.len(strtakkctx) -- simple "strtakkctx" can be used twice
    if (numcauar>=2) then
      tablngbah["WC"] = nil -- no stupid word class here
      tablngbah["WU"] = nil -- no stupid word class here
      tablngbah["MT"] = string.char(nummortyp)
      tablngbah["FR"] = strkatroot
      strcatoton = lfiultiminsert (strtakkctx,tablngbah)
      lffillkaton (strcatoton,true) -- YES main page -- "qtabktaoj"
      if (numlasst~=0) then
        tablngbah["MT"] = 'U' -- last letter is suffix "U"
        tablngbah["FR"] = '-' .. string.char(numlasst)
        strcatoton = lfiultiminsert (strtakkctx,tablngbah)
        lffillkaton (strcatoton,false) -- NOT main page -- "qtabktaoj"
      end--if
    end--if (numcauar>=2) then
  end--if (bookomdez) then

  return strymyng

end--function lfhsplitsi

------------------------------------------------------------------------

-- Local function LFSPLITLALE

-- Perform the large letter split (5, "$H").

-- The lemma is split into single letters. This is most useful for but
-- not restricted to Chinese ones. Note that for this split the mortyp is
-- always "M" (standalone). Use manual split for other cases.

-- We need sub "lfiultiminsert" (2 para) and table "contabktaoj"
-- controlling the structure of the cat name. "bookomdoz" must be
-- false unless lng in "tablngbaih" is valid and known.

-- Names of the categories are built from "contabktaoj" index 5 (vorto).

-- Input  : * "strhilaman"     -- input lemma ie pagename
--          * "bookomdoz"      -- "true" if compound cat:s are desired at all
--          * "tablngbaih"     -- lng stuff ("??" legal but needs "bookomdoz")
-- Output : * "strygyng"       -- wikitext to be sent to screen
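
-- For instance (sketch; cat:s disabled, "lfulnutf8char" assumed to return 1
-- for an ASCII octet and 2...4 for the lead octet of a longer char):
--   lfsplitlale ("ab",false,{}) -- "[[a]] + [[b]]"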

-- This function fills global "qtabktaoj" index [0]...[15] with names of
-- morpheme cat:s (index [20]...[35] main page status not used here).

-- Depends on functions :
-- [H] lfhget345nonil
-- [I] lfiultiminsert
-- [U] lfulnutf8char

-- This sub depends on "HIGH LEVEL FUNCTIONS"\"lffillkaton".

local function lfsplitlale (strhilaman, bookomdoz, tablngbaih)

  local strtookctj = '' -- contabktaoj[5] index 5 is hardcoded
  local strygyng = '' -- screen
  local strbeexess = ''
  local strcatatan = ''
  local numinwwlen = 0
  local numwwindex = 1 -- ONE-based
  local numwwchar = 0
  local numwwlen = 0

  numinwwlen = string.len(strhilaman)

  while true do -- genuine loop, counter is "numwwindex" step 1...4
    if (numwwindex>numinwwlen) then
      break -- done (risk of overflow)
    end--if
    numwwchar = string.byte (strhilaman,numwwindex,numwwindex)
    numwwlen = lfulnutf8char (numwwchar)
    if (numwwlen==0) then
      strygyng = strhilaman -- invalid UTF-8 lead octet, return lemma unsplit
      break -- some compound cat:s may already have been added :-(
    end--if
    strbeexess = string.sub (strhilaman,numwwindex,(numwwindex+numwwlen-1))
    if (strygyng~='') then
      strygyng = strygyng .. ' + '
    end--if
    strygyng = strygyng .. '[[' .. strbeexess .. ']]'
    if (bookomdoz) then
      strtookctj = lfhget345nonil (0,false) -- pick main data string 5 hardco
      numwwchar = string.len(strtookctj) -- this is large letter split
      if (numwwchar>=2) then
        tablngbaih['WC'] = nil -- no stupid word class here
        tablngbaih['WU'] = nil -- no stupid word class here
        tablngbaih['MT'] = 'M'
        tablngbaih['FR'] = strbeexess
        strcatatan = lfiultiminsert (strtookctj,tablngbaih)
        lffillkaton (strcatatan,false) -- NOT main page -- "qtabktaoj"
      end--if (numwwchar>=2) then
    end--if (bookomdoz) then
    numwwindex = numwwindex + numwwlen -- step 1...4 risk of overflow
  end--while

  return strygyng

end--function lfsplitlale

------------------------------------------------------------------------

---- VARIABLES [R] ----

------------------------------------------------------------------------

function exporttable.ek (arxframent)

  -- general unknown type

  local vartamp = 0     -- variable without type

  -- special type "args" AKA "arx"

  local arxspecial = 0  -- from module

  -- general tab in from caller ("qtabktaoj" is elsewhere)

  local tabbluck        = {}  -- from "%"-syntax assi
  local tablynx         = {}  -- from "#"-syntax assi
  local tabmnfrags      = {}  -- for manual split
  local tabextfriig     = {}  -- from extra parameter
  local tablngdbl       = {}  -- double-letter indexes

  -- general str ("qstrtrace" is elsewhere)

  local strkaatctl = ''  -- picked from "contabktaoj" via "lfhget345nonil"
  local strlemmain = ''  -- lemma in
  local strlemmaut = ''  -- bold lemma (maybe split) out
  local strutmp    = ''  -- temp

  -- general num

  local numsplyt   = 0 -- split strategy (0 auto 1 assisted 2 manu 3 srs 4 sbr 5 lale 7 none)

  local numerr    = 0
  local numtamp   = 0
  local numoct    = 0
  local numodt    = 0
  local numlindex = 0

  -- general boo from caller

  local boocatdesir = false
  local booexteval  = false  -- true if we got the extra parameter
  local boohavnyrr  = false  -- true if we got "NR"
  local boohavkall  = false  -- true if we got "KA"

------------------------------------------------------------------------

---- MAIN [Z] ----

------------------------------------------------------------------------

  ---- ASSIGN AND BOAST ----

  lfdtracemsg ('This is "msplitter" submodule') -- unconditional

  ---- GET THE ARX ----

  arxspecial = arxframent.args
  while true do -- fake loop
    if (type(arxspecial)~='table') then

      lfdtracemsg ('Overall bad data type') -- "qstrtrace"

      numerr = 2 -- #E02
      break
    end--if
    boocatdesir = arxspecial[ 0]
    strlemmain  = arxspecial[ 1]
    numsplyt    = arxspecial[ 2]
    if ((type(boocatdesir)~='boolean') or (type(strlemmain)~='string') or (type(numsplyt)~='number')) then

      lfdtracemsg ('Index 0...2 bad data type') -- "qstrtrace"

      numerr = 3 -- #E03
      break
    end--if
    tabbluck    = arxspecial[ 3]
    tablynx     = arxspecial[ 4]
    tabmnfrags  = arxspecial[ 5]
    tabextfriig = arxspecial[ 6]
    booexteval  = arxspecial[ 7] -- boolean between tables !!!
    tablngdbl   = arxspecial[ 8]
    boohavnyrr  = arxspecial[ 9] -- NR
    boohavkall  = arxspecial[10] -- KA
    boodetrc    = (arxspecial[15]==true)
    if ((type(booexteval)~='boolean') or (type(tablngdbl)~='table') or (type(boohavnyrr)~='boolean') or (type(boohavkall)~='boolean')) then

      lfdtracemsg ('Index 7...10 bad data type') -- "qstrtrace"

      numerr = 4 -- #E04
    end--if
    break
  end--while -- fake loop

  if (numerr==0) then
    lfdtracemsg ('Incoming table OK')
  end--if

  ---- SPLIT THE LEMMA IF NEEDED ----

  -- process from "strlemmain" (already guaranteed to be
  -- non-empty) to "strlemmaut" (actually NOT for manual split)

  -- "numsplyt" : 0 auto 1 assisted 2 manu 3 srs 4 sbr 5 lale 7 none

  -- we skip the split and copy only if:
  -- * "numsplyt" is 7 (#S7 no split)

  -- punctuation (5 char:s: ! , . ; ?) hex dec: 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
  -- dash "-" and apo "'" do NOT count as punctuation (for auto and assisted)

  -- we depend on "boocatdesir" (they can switch off some cat:s)
  -- we depend on "boohavkall" (switches between "vortgrupo" and "frazo")

  -- "qtabktaoj" is very global
  -- 0...17 cat names without "Category:" prefix, unused "nil"
  -- 20...37 "true" if main page, otherwise "nil"
  -- "lfhsplitaa" and "lfhsplitmn" and "lfhsplitsi" and "lfsplitlale" will fill
  -- it (via "lffillkaton") and below more content comes from extra parameter

  if (numerr==0) then
    if (numsplyt<2) then -- ZERO or ONE -> auto or assisted #S0 #S1
      strlemmaut = lfhsplitaa (strlemmain, tabbluck, tablynx, boocatdesir, boohavkall, tablngdbl)
    end--if
    if (numsplyt==2) then -- 2 -> manu #S2
      strlemmaut = lfhsplitmn (tabmnfrags, boocatdesir, boohavkall, tablngdbl)
    end--if
    if ((numsplyt==3) or (numsplyt==4)) then -- 3 4 -> simple #S3 #S4
      strlemmaut = lfhsplitsi (strlemmain, numsplyt, boocatdesir, boohavnyrr, tablngdbl)
    end--if
    if (numsplyt==5) then -- 5 -> lale #S5
      strlemmaut = lfsplitlale (strlemmain, boocatdesir, tablngdbl)
    end--if
    if (numsplyt==7) then -- 7 -> no split #S7
      strlemmaut = strlemmain -- no split, "strlemmaut" needed for visible part
    end--if
  end--if

  ---- BREW UP TO 4 EXTRA CATEGORIES ----

  -- from extra parameter sent to us in "tabextfriig" and "booexteval"

  -- with "booexteval" true the prevalidated morphemes are in
  -- "tabextfriig" incl prefix, e.g. "C:" or "M!", the caller
  -- converts possible "&"-syntax to 1 or 2 fragments

  -- with "booexteval" false the extra parameter was empty and
  -- we do nothing here
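
  -- worked example (hypothetical fragment "M!abc", for illustration only):
  --   string.byte ("M!abc",1,1) -> 77 ("M") goes to "numoct"
  --   string.byte ("M!abc",2,2) -> 33 ("!") goes to "numodt", main page
  --   string.sub  ("M!abc",3,5) -> "abc" goes to "strutmp"
  -- with ":" (58) instead of "!" the cat is filled as NOT main page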

  if ((numerr==0) and boocatdesir and booexteval) then
    numlindex = 0
    while true do
      vartamp = tabextfriig[numlindex] -- risk of type "nil"
      if (type(vartamp)=='string') then
        numoct = string.byte(vartamp,1,1) -- C I M N P U W
        numodt = string.byte(vartamp,2,2) -- ":" 58 or "!" 33
        numtamp = string.len (vartamp)
        strutmp = string.sub (vartamp,3,numtamp) -- prevalidated morpheme string
        strkaatctl = lfhget345nonil (numoct,boohavkall) -- pick main data str
        numtamp = string.len(strkaatctl) -- this is main brewing 4 extra cat:s
        if (numtamp>=2) then
          local bootimp = lfifinditems(strkaatctl,"MT") -- need it ?? -- "local" avoids leaking a global
          tablngdbl["WC"] = nil -- no stupid word class here
          tablngdbl["WU"] = nil -- no stupid word class here
          if (bootimp) then
            tablngdbl["MT"] = string.char(numoct) -- morpheme type
          else
            tablngdbl["MT"] = nil -- no morpheme type here
          end--if
          tablngdbl["FR"] = strutmp
          strutmp = lfiultiminsert (strkaatctl,tablngdbl)
          lffillkaton (strutmp,(numodt==33)) -- MAYBE main page -- "qtabktaoj"
        end--if (numtamp>=2) then
      else
        break -- abort at type "nil"
      end--if (type(vartamp)=='string') else
      numlindex = numlindex + 1
    end--while
  end--if

  ---- PREPARE RETURN ----

  if (numerr~=0) then
    strlemmaut = "//" -- still use qtabktaoj [52] to check status
  end--if
  qtabktaoj [50] = strlemmaut -- unconditionally
  qtabktaoj [51] = qstrtrace -- unconditionally, cannot be empty
  qtabktaoj [52] = numerr -- unconditionally

  ---- RETURN THE RESULT TABLE ----

  return qtabktaoj

end--function

  ---- RETURN THE JUNK LUA TABLE ----

return exporttable