Decode Arabic to Arabic Presentation Form
JavaScript (ES6), 231 ... 148 146 bytes
I/O format: list of code points
a=>a.map(t=(c,i)=>c%32?(g=A=>n^c%49?g([x=n%32<26,k*x,x&=454642>>n++*25%54%26,x].map(x=>x&&++k)):A[t=t>1^2*(a[i+1]>1569)]||A[t^=2])(n=k=0)+65151:c)
Try it online!
a =>                            // a[] = input array
a.map(t = (c, i) =>             // for each code point c at position i in a[]:
  c % 32 ?                      //   if c is not 0x0640:
    ( g = A =>                  //     g is a recursive function taking a list A[]
      n ^ c % 49 ?              //       if we haven't reached the correct character:
        g(                      //         do a recursive call:
          [                     //           build a list describing the existence of:
            x = n % 32 < 26,    //           - isolated: if not in range 0x063B-0x0640
            k * x,              //           - final: same as isolated, except for the
                                //             first entry which is undefined
            x &=                //           - initial: we use a lookup bit-mask
              454642 >> n++     //             with the following hash formula:
              * 25 % 54 % 26,   //             ((n * 25) mod 54) mod 26
            x                   //           - medial: same as initial
          ].map(x => x && ++k)  //           turn it into a list of indices
        )                       //         end of recursive call
      :                         //       else:
        A[                      //         output the relevant entry from A[]:
          t =                   //           update t:
            t > 1 ^ 2 * (       //             (previous was initial or medial) XOR
              a[i + 1] > 1569   //             2 * (this is not the last letter
            )                   //             and the next letter is not 0x0621)
        ] ||                    //         if A[t] is not defined:
        A[t ^= 2]               //           use A[t ^ 2] instead
    )(n = k = 0)                //     initial call to g with n = k = 0
    + 65151                     //     add 0xFE7F
  :                             //   else:
    c                           //     output c unchanged
)                               // end of map()
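For readers who want to follow the logic without the golfing tricks, here is a rough ungolfed sketch of the same approach. The names `hasInitial`, `formsFor` and `shape` are mine and do not appear in the answer; the bit mask, the hash formula and the state encoding (0 = isolated, 1 = final, 2 = initial, 3 = medial) are taken from the commented code above. It assumes the input only contains code points in the range 0x0621-0x064A.

```javascript
// n = 0 stands for U+0621, n = 41 for U+064A (n = 26..31 is the gap 0x063B-0x0640).
// Same mask and hash as in the golfed code.
const hasInitial = n =>
  n % 32 < 26 && ((454642 >> (n * 25 % 54 % 26)) & 1) === 1;

// Rebuild [isolated, final, initial, medial] for one code point by walking
// through all earlier letters and counting the forms they consume.
function formsFor(cp) {
  const target = cp % 49 - 1;          // index n of this letter
  let k = 0;                           // running offset from 0xFE7F
  let forms = [];
  for (let n = 0; n <= target; n++) {
    const exists = n % 32 < 26;
    const flags = [exists, exists && n > 0, hasInitial(n), hasInitial(n)];
    forms = flags.map(f => (f ? 65151 + ++k : undefined));
  }
  return forms;
}

function shape(codePoints) {
  let t = 0;                           // form chosen for the previous letter
  return codePoints.map((cp, i) => {
    if (cp === 0x0640) return cp;      // tatweel is passed through, t is kept
    const forms = formsFor(cp);
    const joinsPrev = t > 1 ? 1 : 0;   // previous letter was initial or medial
    const joinsNext = codePoints[i + 1] > 0x0621 ? 2 : 0; // false at the end
    t = joinsPrev | joinsNext;
    if (forms[t] === undefined) t ^= 2; // fall back to isolated or final
    return forms[t];
  });
}
```

For example, `shape([0x0628, 0x0627, 0x0628])` picks the initial form for the first letter, the final form for the alef (which has no medial form), and the isolated form for the last letter.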
05AB1E, 110 107 bytes
1ƵFLŽ$§R©+ε•7Ö›±ćY¨Γ∊%•ƵBвNåi1‚1ª]˜4ô®¸šIŽ6d-D₂›6*-èRćURviy2èDiyнsë0ëyθDΘIN>èŽ6dQ~i\y1è1ë0]Xs_i¦}н)IŽ6ηDŠkǝ
-3 bytes due to looser I/O rules.
I/O as a list of code-point integers.
Try it online or verify all test cases.
Explanation:
1 # Push a 1 (which we'll use later on)
We start by creating the lookup table:
ƵF # Push compressed integer 116
L # Pop and push a list in the range [1,116]
Ž$§ # Push compressed integer 25156
R # Reverse it to 65152
© # Store this value in variable `®` (without popping)
+ # Add it to each value in the list, to make the range [65153,65268]
ε # Map each value to:
•7Ö›±ćY¨Γ∊%• # Push compressed integer 35731296019285847629599
ƵB # Push compressed integer 112
в # Convert the larger integer to base-112 as list:
# [1,3,5,7,13,19,41,43,45,47,109,111]
Nåi # If the map-index is in this list:
1‚1ª # Pair it with a 1, and add a second 1 to this list
]˜ # Close the map and if-statement, and flatten the list of values
4ô # Split it into parts of size 4
®¸š # Prepend [65152] (from variable `®`) in front of this list
Which will now hold the following list:
[[65152],[65153,65154,1,1],[65155,65156,1,1],[65157,65158,1,1],[65159,65160,1,1],[65161,65162,65163,65164],[65165,65166,1,1],[65167,65168,65169,65170],[65171,65172,1,1],[65173,65174,65175,65176],[65177,65178,65179,65180],[65181,65182,65183,65184],[65185,65186,65187,65188],[65189,65190,65191,65192],[65193,65194,1,1],[65195,65196,1,1],[65197,65198,1,1],[65199,65200,1,1],[65201,65202,65203,65204],[65205,65206,65207,65208],[65209,65210,65211,65212],[65213,65214,65215,65216],[65217,65218,65219,65220],[65221,65222,65223,65224],[65225,65226,65227,65228],[65229,65230,65231,65232],[65233,65234,65235,65236],[65237,65238,65239,65240],[65241,65242,65243,65244],[65245,65246,65247,65248],[65249,65250,65251,65252],[65253,65254,65255,65256],[65257,65258,65259,65260],[65261,65262,1,1],[65263,65264,1,1],[65265,65266,65267,65268]]
Which corresponds to the following table:
ISO 8859-6   Isolated      Final         Initial       Medial
1569[0621]   65152[FE80]
1570[0622]   65153[FE81]   65154[FE82]   1[n/a]        1[n/a]
1571[0623]   65155[FE83]   65156[FE84]   1[n/a]        1[n/a]
1572[0624]   65157[FE85]   65158[FE86]   1[n/a]        1[n/a]
1573[0625]   65159[FE87]   65160[FE88]   1[n/a]        1[n/a]
1574[0626]   65161[FE89]   65162[FE8A]   65163[FE8B]   65164[FE8C]
1575[0627]   65165[FE8D]   65166[FE8E]   1[n/a]        1[n/a]
1576[0628]   65167[FE8F]   65168[FE90]   65169[FE91]   65170[FE92]
1577[0629]   65171[FE93]   65172[FE94]   1[n/a]        1[n/a]
1578[062A]   65173[FE95]   65174[FE96]   65175[FE97]   65176[FE98]
1579[062B]   65177[FE99]   65178[FE9A]   65179[FE9B]   65180[FE9C]
1580[062C]   65181[FE9D]   65182[FE9E]   65183[FE9F]   65184[FEA0]
1581[062D]   65185[FEA1]   65186[FEA2]   65187[FEA3]   65188[FEA4]
1582[062E]   65189[FEA5]   65190[FEA6]   65191[FEA7]   65192[FEA8]
1583[062F]   65193[FEA9]   65194[FEAA]   1[n/a]        1[n/a]
1584[0630]   65195[FEAB]   65196[FEAC]   1[n/a]        1[n/a]
1585[0631]   65197[FEAD]   65198[FEAE]   1[n/a]        1[n/a]
1586[0632]   65199[FEAF]   65200[FEB0]   1[n/a]        1[n/a]
1587[0633]   65201[FEB1]   65202[FEB2]   65203[FEB3]   65204[FEB4]
1588[0634]   65205[FEB5]   65206[FEB6]   65207[FEB7]   65208[FEB8]
1589[0635]   65209[FEB9]   65210[FEBA]   65211[FEBB]   65212[FEBC]
1590[0636]   65213[FEBD]   65214[FEBE]   65215[FEBF]   65216[FEC0]
1591[0637]   65217[FEC1]   65218[FEC2]   65219[FEC3]   65220[FEC4]
1592[0638]   65221[FEC5]   65222[FEC6]   65223[FEC7]   65224[FEC8]
1593[0639]   65225[FEC9]   65226[FECA]   65227[FECB]   65228[FECC]
1594[063A]   65229[FECD]   65230[FECE]   65231[FECF]   65232[FED0]
1601[0641]   65233[FED1]   65234[FED2]   65235[FED3]   65236[FED4]
1602[0642]   65237[FED5]   65238[FED6]   65239[FED7]   65240[FED8]
1603[0643]   65241[FED9]   65242[FEDA]   65243[FEDB]   65244[FEDC]
1604[0644]   65245[FEDD]   65246[FEDE]   65247[FEDF]   65248[FEE0]
1605[0645]   65249[FEE1]   65250[FEE2]   65251[FEE3]   65252[FEE4]
1606[0646]   65253[FEE5]   65254[FEE6]   65255[FEE7]   65256[FEE8]
1607[0647]   65257[FEE9]   65258[FEEA]   65259[FEEB]   65260[FEEC]
1608[0648]   65261[FEED]   65262[FEEE]   1[n/a]        1[n/a]
1609[0649]   65263[FEEF]   65264[FEF0]   1[n/a]        1[n/a]
1610[064A]   65265[FEF1]   65266[FEF2]   65267[FEF3]   65268[FEF4]
not in the list:
1600[0640]   1600[0640]
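As a cross-check of the table construction described above, the same steps can be sketched in JavaScript. This is only an illustration: `toBase` and `buildQuartets` are hypothetical helper names, and the sketch assumes BigInt support. It converts the large integer to base 112, appends two placeholder 1s after every 0-based position found in that list, and splits the result into groups of four.

```javascript
// Digits of a non-negative BigInt in the given base, most significant first.
function toBase(value, base) {
  const digits = [];
  for (let v = value; v > 0n; v /= base) digits.unshift(Number(v % base));
  return digits;
}

// 0-based positions in the range 65153..65268 that are followed by two 1s.
const twoFormPositions = toBase(35731296019285847629599n, 112n);

function buildQuartets() {
  const flat = [];
  for (let n = 0; n < 116; n++) {
    flat.push(65153 + n);                            // 65152 + (n + 1)
    if (twoFormPositions.includes(n)) flat.push(1, 1); // no initial/medial form
  }
  const quartets = [[65152]];                        // U+0621 only has one form
  for (let i = 0; i < flat.length; i += 4) quartets.push(flat.slice(i, i + 4));
  return quartets;
}
```

Calling `buildQuartets()` should reproduce the 36-entry list shown above, starting with `[65152]` and `[65153, 65154, 1, 1]`.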
After that, we'll use the input to get a quartet from this list:
I # Push the input-list of codepoint integers
Ž6d # Push compressed integer 1569
- # And subtract it from each integer in the list
D # Duplicate this list
₂› # Check which are larger than 26 (1 if truthy; 0 if falsey)
6* # Multiply that by 6 (6 if truthy; 0 if falsey)
- # And subtract that from the values in the list as well
è # Then index it into our list of quartets
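The index arithmetic of this step can be sketched as follows. The function name `quartetIndex` is hypothetical. The subtraction of 6 skips the gap between 0x063A and 0x0641; note that 1600 itself lands on index 25 here, which is only a placeholder that gets overwritten with 1600 at the very end.

```javascript
// Map a code point to its position in the list of quartets.
function quartetIndex(cp) {
  const offset = cp - 1569;               // 0 for U+0621
  return offset > 26 ? offset - 6 : offset; // close the 0x063B-0x0640 gap
}
```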
And now we apply the rules specified in the challenge description on these quartets of [isolated, final, initial, medial].
R # Reverse the list
ć # Extract the head; pop and push remainder-list and head separated
U # Pop the head, and store it in variable `X`
# (`X` now holds the last letter)
R # Reverse the list back
v # Loop `y` over each remaining quartet:
i # If the top of the stack is 1 (the previous letter took an isolated or final form):
# (which is always truthy for the first letter, since we pushed a 1 at the start)
y2è # Get the initial form of `y`
D # Duplicate it
i # If it's 1, thus no initial form is available:
yн # Push the isolated form of `y` instead
s # And swap so the 1 is at the top for the next iteration
ë # Else, thus an initial form is available:
0 # Push a 0 for the next iteration
ë # Else (the previous letter took an initial or medial form instead):
yθ # Get the medial form of `y`
DΘ # Duplicate it, and 05AB1E-truthify it (1 if 1; else 0)
I # Push the input-list of codepoints again
N> # Use the loop-index+1
è # To index into this list for the next letter codepoint
Ž6d # Push compressed integer 1569
Q # And check if this next codepoint is equal to it (1 if truthy; 0 if falsey)
~i # If either of the two checks is truthy:
\ # Discard the duplicated medial form that's on the stack
y1è # Get the final form of `y`
1 # And push a 1 for the next iteration
ë # Else (it's a medial, and the next character is NOT `ء` (U+0621)):
0 # Push a 0 for the next iteration
] # Close the loop and all inner if-else statements
X # Push the last letter from variable `X`
s_i # If we ended with a 0 (thus an initial or medial form)
¦ # Remove the first item from `X`
}н # After the if-statement: pop and push the first item of `X`
# (so the isolated form if we ended with an isolated or final form,
# or the final form if we ended with an initial or medial form)
) # Wrap all values on the stack into a list
I # Push the input-list of codepoints again
Ž6η # Push compressed integer 1600
DŠ # Duplicate it, and triple swap (input,1600,1600 to 1600,input,1600)
k # Get the index of 1600 inside the codepoints list (-1 if not present)
ǝ # And insert the 1600 at that index in our created list
# (which doesn't work on negative indices)
# (after which this list of codepoint integers is output implicitly)
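The last two steps (choosing the form of the final letter and putting 1600 back) can be sketched like this. `finishWord` and its parameter names are hypothetical; `flag` plays the role of the 0/1 value left on the stack by the loop (1 = previous letter was isolated or final, 0 = initial or medial). As in the explanation above, only a single index of 1600 is looked up.

```javascript
function finishWord(shapedSoFar, lastQuartet, flag, input) {
  // Flag 0: drop the isolated form, so the head is the final form.
  const candidates = flag === 0 ? lastQuartet.slice(1) : lastQuartet;
  const out = [...shapedSoFar, candidates[0]];
  // Restore 1600 at the position it had in the input, if it occurs at all.
  const i = input.indexOf(1600);
  if (i >= 0) out[i] = 1600;
  return out;
}
```

For instance, with the quartet `[65249, 65250, 65251, 65252]` as the last letter and `flag` equal to 0, the final form 65250 is appended.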
See this 05AB1E tip of mine (sections How to compress large integers? and How to compress integer lists?) to understand why ƵF is 116; Ž$§ is 25156; •7Ö›±ćY¨Γ∊%• is 35731296019285847629599; ƵB is 112; •7Ö›±ćY¨Γ∊%•ƵBв is [1,3,5,7,13,19,41,43,45,47,109,111]; Ž6d is 1569; and Ž6η is 1600.