Cross-Alphabetic Characters

JavaScript (ES6), 179 bytes (previously 197)

Returns an array of 3 ratios in [0..1].

s=>[...s].map(_=>(x='b;C6cC6%c>b^[<$]_3--_c_acC-----$+aKHbKK[`H`H]'[(p=s[a='charCodeAt'](l++)%202%116%89)>>1][a]()-36,x/=p&1||8,L+=x/4&1,G+=x/2&1,C+=x&1),l=L=G=C=0)&&[L/l,G/l,C/l]

Try it online!

How?

We use the (rather inefficient) hash function % 202 % 116 % 89 to transform each character code into an index in [0..88]. The corresponding lookup table consists of 3-bit entries where bit #2 = Latin, bit #1 = Greek and bit #0 = Cyrillic. Using decimal digits, this gives:

76273722773722017732767267300071731711117377737577371111111111000775474476474767744474447

We append an extra 1 to get an even number of entries and encode this bit stream with printable ASCII characters in the range [36..99] ($ to c), with 6 bits of payload data per character (two 3-bit entries, first entry in the high bits).

This leads to the following string:

b;C6cC6%c>b^[<$]_3--_c_acC-----$+aKHbKK[`H`H]
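The packing step can be reproduced with a short, ungolfed sketch. It is not part of the answer: the names DIGITS, PACKED and pack are made up for illustration, and the offset 36 is inferred from the -36 in the golfed code.

```javascript
// Lookup table as 3-bit digits (bit 2 = Latin, bit 1 = Greek, bit 0 = Cyrillic).
const DIGITS =
  '76273722773722017732767267300071731711117377737577371111111111000775474476474767744474447';
// String used by the golfed answer.
const PACKED = 'b;C6cC6%c>b^[<$]_3--_c_acC-----$+aKHbKK[`H`H]';

// Pack two 3-bit digits per character: the first digit goes in the high bits,
// the second in the low bits, and 36 is added to stay in printable ASCII.
function pack(digits) {
  let out = '';
  for (let i = 0; i < digits.length; i += 2) {
    out += String.fromCharCode(36 + 8 * Number(digits[i]) + Number(digits[i + 1]));
  }
  return out;
}

// Append the extra 1 so that the number of entries is even, then compare.
console.log(pack(DIGITS + '1') === PACKED);
```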

The offset was chosen to avoid characters such as \ that would have required escaping.
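For readers who prefer an ungolfed view, here is a minimal sketch of the decoding side. The names TABLE, scriptBits and scriptRatios are hypothetical; the hash, the offset and the string are the ones from the answer, and the input is assumed to contain only characters the answer is meant to handle.

```javascript
const TABLE = 'b;C6cC6%c>b^[<$]_3--_c_acC-----$+aKHbKK[`H`H]';

// Returns the 3-bit entry for one character:
// bit 2 = Latin, bit 1 = Greek, bit 0 = Cyrillic.
function scriptBits(ch) {
  const p = ch.charCodeAt(0) % 202 % 116 % 89;   // hash into [0..88]
  const sixBits = TABLE.charCodeAt(p >> 1) - 36; // each character holds two entries
  return p & 1 ? sixBits & 7 : sixBits >> 3;     // odd index: low half, even index: high half
}

// Returns [Latin, Greek, Cyrillic] ratios in [0..1].
function scriptRatios(s) {
  const chars = [...s];
  const counts = [0, 0, 0];
  for (const ch of chars) {
    const bits = scriptBits(ch);
    counts[0] += (bits >> 2) & 1;
    counts[1] += (bits >> 1) & 1;
    counts[2] += bits & 1;
  }
  return counts.map(c => c / chars.length);
}

// Cyrillic Es (U+0421) followed by Ya (U+042F)
console.log(scriptRatios('\u0421\u042F'));
```

The golfed version gets the same effect by dividing x by 8 for even indices and relying on the bitwise & to truncate.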

Jelly, 56 bytes

A hash may well be shorter.

O:⁹:2;ON©œị“ŒḂI4ƥƒⱮıtɱN¦“¤COṙṚ¹`“ÑṂḄẈɼ]ġÐ’b4¤+4Bṙ®Ḣµ€S÷L

A monadic link returning a list of ratio amounts in the order English, Greek, Russian.

Try it online!
...or see a fully formatted output (including the implied rounding to one decimal place)

How?

We wish to have code which translates each possible character to a triple of ones and zeros representing whether it belongs to each of the alphabets (much like the table in the question, where C is 1 0 1). Once that is done we can sum across these and divide by the length to yield the ratios (between zero and one inclusive); this is just S÷L (seen at the right of the code).

For any given character we know that if the ordinal is less than 256 it counts as English, if it is greater than 1024 it counts as Russian, and if it is in between it counts as Greek. As such, taking the ordinal, integer dividing by 256 and then integer dividing the result by two yields 0 for the space and Latin characters (count as English), 1 for Hellenic (count as Greek), and 2 for Cyrillic (count as Russian). This is just O:⁹:2 in Jelly (seen at the left of the code).
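The same classification can be sketched in JavaScript. The helper name scriptBlock is made up for illustration; the two integer divisions mirror O:⁹:2.

```javascript
// Mirrors O:⁹:2: take the ordinal, integer-divide by 256, then by 2.
const scriptBlock = ch => Math.floor(Math.floor(ch.charCodeAt(0) / 256) / 2);

// space, Z, Alpha (U+0391), Omega (U+03A9), Yo (U+0401), Ya (U+042F)
const samples = [' ', 'Z', '\u0391', '\u03A9', '\u0401', '\u042F'];
console.log(samples.map(scriptBlock)); // [ 0, 0, 1, 1, 2, 2 ]
```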

If we rotate the triples of bits such that the natural alphabet bit* is the most significant then we can encode the lower two bits (as values between zero and three inclusive) in a look-up table with three rows and then rotate right by the numbers found above.

When we do this there are two things worthy of note: (1) Jelly has a rotate-left-by atom, not a rotate-right-by one; (2) the Hellenic row of the look-up table would start with a zero (since Ξ is only Greek), thwarting a simple base-4 encoding (since leading zeros are not encodable). To alleviate (1) we can rotate left by the negated value, and to alleviate (2) we can encode our rows in reverse and index into them with the negative amount. This way we can negate both the row and column index with a single byte (N), so our row and column indices may be calculated with O:⁹:2;ON.

Note that Jelly now has a multi-dimensional indexing atom, œị.

The table is formed from three large numbers which, once converted to base four, give the lower bits required for Cyrillic, Greek and Latin(+Space) respectively. They are of minimal length such that modular indexing by the negated ordinal values is possible - 47, 25, and 30 respectively (the .s are at unused indexes):

1: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 3 3 2 3 1 3 3 3 1 3 0 0 0 0 3 0 1 3 0 3 0 0 0 0 0 0
   . . . . . . . . Ё Я Ю Э Ь Ы Ъ Щ Ш Ч Ц Х Ф У Т С Р П О Н М Л К Й И З Ж Е Д Г В Б А . . . . . .

2: 3 2 3 1 0 3 1 3 0 2 3 3 0 0 3 2 3 3 0 0 3 2 3 0 1
   Μ Λ Κ Ι Θ Η Ζ Ε Δ Γ Β Α Ω Ψ Χ Φ Υ Τ Σ . Ρ Π Ο Ξ Ν

3: 3 3 0 0 0 3 0 0 0 3 3 2 3 0 3 0 2 3 0 0 3 0 1 3 3 0 0 3 0 2
   Y X W V U T S R Q P O N M L K J I H G F E D C B A . .   . Z
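The base-4 expansion (the b4 step) can be sketched with BigInt. This assumes the three integers quoted in the commented code are exact; the names ENCODED and base4Rows are hypothetical.

```javascript
// The three integers from the commented code, as BigInt literals.
const ENCODED = [
  4951760157204492290900832256n,
  1043285073970097n,
  1081712651052809266n,
];

// Convert each integer to its base-4 digit string, like Jelly's b4.
const base4Rows = ENCODED.map(n => n.toString(4));

// Row lengths: Cyrillic, Greek, Latin + space.
console.log(base4Rows.map(row => row.length));
```

Because a leading zero digit cannot survive this conversion, each row has to start with a non-zero digit, which is why the rows are stored in reverse.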

As an example consider the character Φ at Unicode point U+03A6 (which should yield [0,1,1]). It has an ordinal value of 3×16²+10×16+6 = 934. O:⁹:2 gives 934//256//2 = 1, identifying it as part of the Hellenic block. The ;O concatenates the ordinal, giving us [1,934], and the N then negates both values, giving us [-1,-934]. Since Jelly indexing is both 1-based and modular and there are three rows, the -1 references the second of the three rows (row 2 in the above code-block). Since the middle row has a length of 25, the -934 references the (-934%25 =) 16th entry in that row, which is 2. The code then adds four (the most significant bit), giving us 6, which converted to binary is [1,1,0]. The code then rotates this left by each of [-1,-934] and takes the head (i.e. the rotation left by -1, a rotation right by 1), yielding [0,1,1] as required.

* English for space since it's grouped with the Latin characters
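The whole look-up can be emulated in JavaScript as a rough sketch. The names JELLY_ROWS, jellyAt and alphabetTriple are made up for illustration; the digit rows are copied from the table above, and the indexing helper imitates Jelly's 1-based modular indexing.

```javascript
// Base-4 rows from the table above: Cyrillic (47 digits), Greek (25), Latin + space (30).
const JELLY_ROWS = [
  '1' + '0'.repeat(18) + '3133231333130000301303000000',
  '3231031302330032330032301',
  '330003000332303023003013300302',
];

// Jelly-style indexing: 1-based and modular, so index 0 wraps to the last element.
const jellyAt = (list, i) => {
  const n = list.length;
  return list[(((i - 1) % n) + n) % n];
};

// Returns [Latin, Greek, Cyrillic] flags for one character.
function alphabetTriple(ch) {
  const ord = ch.charCodeAt(0);
  const block = Math.floor(Math.floor(ord / 256) / 2); // 0 Latin, 1 Greek, 2 Cyrillic
  const low = Number(jellyAt(jellyAt(JELLY_ROWS, -block), -ord)); // the two low bits
  const bits = [1, (low >> 1) & 1, low & 1]; // binary digits of low + 4
  for (let k = 0; k < block; k++) bits.unshift(bits.pop()); // rotate right by block
  return bits;
}

console.log(alphabetTriple('\u03A6')); // Phi: [ 0, 1, 1 ]
```

Rotating right by the block number is equivalent to the answer's rotate-left by the negated value.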

Commented code

O:⁹:2;ON©œị“...“...“...’b4¤+4Bṙ®Ḣµ€S÷L - Link: list of characters        e.g.: "СЯ"
                                 µ€    - for €ach character:                С       Я
O                                      -   cast to ordinal               1057    1071
  ⁹                                    -   literal 256
 :                                     -   integer division                 4       4
   :2                                  -   integer divide by 2              2       2
      O                                -   cast to ordinal               1057    1071
     ;                                 -   concatenate                  [2,1057] [2,1071]
       N                               -   negate                     [-2,-1057] [-2,-1071]
        ©                              -   copy to register for later
                          ¤            -   nilad followed by link(s) as a nilad:
           “...“...“...’               -     list of integers encoded in base 250 = [4951760157204492290900832256, 1043285073970097, 1081712651052809266]
                        b4             -     convert to base 4                    = [[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,3,3,2,3,1,3,3,3,1,3,0,0,0,0,3,0,1,3,0,3,0,0,0,0,0,0],[3,2,3,1,0,3,1,3,0,2,3,3,0,0,3,2,3,3,0,0,3,2,3,0,1],[3,3,0,0,0,3,0,0,0,3,3,2,3,0,3,0,2,3,0,0,3,0,1,3,3,0,0,3,0,2]]
         œị                            -   index into                       2       0                   ^--[-2,-1071]   [-2,-1057]--^
                           +4          -   add four                         6       4
                             B         -   convert to binary             [1,1,0] [1,0,0]
                               ®       -   recall from register       [-2,-1057] [-2,-1071]
                              ṙ        -   rotate left         [[1,0,1],[0,1,1]] [[0,0,1],[1,0,0]]
                                Ḣ      -   head                          [1,0,1] [0,0,1]
                                   S   - sum                                 [1,0,2]
                                     L - length                                 2
                                    ÷  - divide                            [0.5,0,1]
                                       -   i.e.: 50.0% Latin, 0% Greek, 100% Russian

Ruby, 165 bytes

->s{(0..2).map{|x|s.chars.map{|c|o=c.ord;(o<33?7:"ĝ]ē¯]÷W59WUė½ñĝĕ×ßoĝėÏė55#{?!*15}"[o-[913,1040,65][y=o>>7<=>7]].ord+226>>3*-~y)[x]*1.0}.sum/s.size}}

Try it online!

Edit: Significantly golfed the code, and most importantly, squeezed 3 translation sequences into one UTF-8 string. The original longer code is kept below for better readability and explanation of the logic.

Ruby, 211 bytes

->s{(0..2).map{|x|s.chars.map{|x|o=x.ord;o<33?7:o<91?"77517117317173771117111773"[o-65]:o<938?"7762737237673276702776722"[o-913]:"74764744444767776757767#{?4*15}"[o-1040]}.inject(0.0){|y,z|y+=z.to_i[x]}/s.size}}

Try it online!

May not be the most efficient approach, but does the job. Uses a translation table for each alphabet, with each character's occurrence in the different scripts encoded by the bits of its digit (bit 0 = Latin, bit 1 = Greek, bit 2 = Russian). The output is an array of ratios in [0..1] in the same order.
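The bit extraction can be illustrated with a reduced JavaScript sketch. It only handles the space and uppercase Latin letters, using the Latin table from the 211-byte version; the Greek and Cyrillic tables are left out for brevity, and the names LATIN_FLAGS, latinFlags and bitRatios are hypothetical.

```javascript
// Latin table from the 211-byte version: one digit per letter A..Z.
const LATIN_FLAGS = '77517117317173771117111773';

// Flags for a space or an uppercase Latin letter (other input is out of scope here).
function latinFlags(ch) {
  const o = ch.charCodeAt(0);
  return o < 33 ? 7 : Number(LATIN_FLAGS[o - 65]);
}

// For each bit (0 = Latin, 1 = Greek, 2 = Russian), average that bit over the input,
// which is what z.to_i[x] followed by the division by s.size does in the Ruby code.
function bitRatios(s) {
  const flags = [...s].map(latinFlags);
  return [0, 1, 2].map(
    bit => flags.reduce((sum, f) => sum + ((f >> bit) & 1), 0) / flags.length
  );
}

console.log(bitRatios('CD')); // [ 1, 0, 0.5 ]
```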

To fix the outlier Ё case I extended the Russian-only block of 4s from 10 positions at the end of the alphabet to 15. This way, Ё gets picked correctly with a negative index (and we are not required to handle the lowercase letters that correspond to these extra indices).
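The negative-index trick can be sketched as follows. JavaScript has no Ruby-style negative string indexing with [], so the helper rubyIndex imitates it; the '?' characters are stand-ins for the literal part of the Cyrillic table (23 characters in the 211-byte version), since only the padding matters here. All names are made up for illustration.

```javascript
// Ruby-style string indexing: a negative index counts back from the end.
const rubyIndex = (str, i) => str[i < 0 ? i + str.length : i];

// Shape of the Cyrillic table: a literal prefix followed by fifteen 4s.
const cyrillicShape = '?'.repeat(23) + '4'.repeat(15);

// Yo is U+0401 = 1025, so its offset from 1040 is negative.
const yoOffset = 0x401 - 1040;

console.log(yoOffset, rubyIndex(cyrillicShape, yoOffset)); // -15 4
```

With only ten trailing 4s, the offset -15 would land inside the literal prefix rather than in the padding.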
