Count the bytes of a program

Shell + coreutils, 6

This answer only works if the locale uses UTF-8, since wc -m counts characters according to the current locale's encoding.

wc -mc

Test output:

$ printf '%s' "(~R∊R∘.×R)/R←1↓ιR" | ./ 
     17      27
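For comparison, here is a rough Python sketch of what wc -mc reports under a UTF-8 locale. It is an approximation that assumes the input is valid UTF-8; real wc -m follows the current locale and treats invalid byte sequences in its own way. The function name char_and_byte_counts is hypothetical.

```python
from typing import Tuple


def char_and_byte_counts(data: bytes) -> Tuple[int, int]:
    """Return (characters, bytes) for UTF-8 input, roughly what `wc -mc`
    prints in a UTF-8 locale.

    Assumes `data` is valid UTF-8; decoding raises UnicodeDecodeError otherwise.
    """
    return len(data.decode("utf-8")), len(data)


# The test string from the example; printf '%s' adds no trailing newline.
chars, nbytes = char_and_byte_counts("(~R∊R∘.×R)/R←1↓ιR".encode("utf-8"))
print(chars, nbytes)  # 17 27
```

To mirror the pipe, pass sys.stdin.buffer.read() instead of the encoded literal.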

If the output format is strictly enforced (exactly one space separating the two integers), we can do this instead:

Shell + coreutils, 12

echo`wc -mc`

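Thanks to @immibis for suggesting removing the space after echo. It took me a while to figure out why that works: the output of wc -mc starts with padding whitespace, so the shell expands this to echo, whitespace, n, whitespace, m. Space and tab are both in the default $IFS, so that whitespace is a perfectly legal token separator in the resulting command, and echo then prints its arguments separated by single spaces.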
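The splitting step can be modelled in Python. This sketch covers only default-IFS field splitting (space, tab and newline; leading and trailing IFS whitespace dropped, runs collapsed into one separator) and ignores quoting, globbing and the fact that a real shell only splits the expanded part of the word. The padded string is an assumed example of wc -mc output, not captured from a real run, and the helper name split_default_ifs is hypothetical.

```python
import re
from typing import List

# Default IFS whitespace: space, tab and newline.
IFS_WHITESPACE = re.compile(r"[ \t\n]+")


def split_default_ifs(word: str) -> List[str]:
    """Split an unquoted expansion into fields the way a POSIX shell does with
    the default IFS: runs of IFS whitespace separate fields, and leading or
    trailing IFS whitespace produces no empty fields."""
    return [field for field in IFS_WHITESPACE.split(word) if field]


# Assumed wc -mc output (padding widths vary between implementations);
# command substitution has already removed the trailing newline.
wc_output = "     17      27"
fields = split_default_ifs("echo" + wc_output)
print(fields)  # ['echo', '17', '27']
# echo then joins its arguments with single spaces.
print(" ".join(fields[1:]))  # 17 27
```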

GolfScript, 12 bytes (previously 14)

.,p{64/2^},,

Try it online on Web GolfScript.


GolfScript doesn't have a clue what Unicode is; all strings (input, output, internal) are composed of bytes. While that can be pretty annoying, it's perfect for this challenge.

UTF-8 encodes ASCII and non-ASCII characters differently:

  • All code points below 128 are encoded as 0xxxxxxx.

  • All other code points are encoded as 11xxxxxx 10xxxxxx ... 10xxxxxx.

This means that the encoding of each Unicode character contains exactly one byte that is not of the form 10xxxxxx: either a single 0xxxxxxx byte, or a single 11xxxxxx byte followed by one or more 10xxxxxx bytes (at most three under RFC 3629).
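To see the claim in action, the Python sketch below labels each byte of a UTF-8 string by its top bits, using the bit patterns from the list above, and checks that every character contributes exactly one non-continuation byte. The function name byte_kinds is illustrative, and it classifies by bit pattern only without validating the UTF-8.

```python
from typing import List


def byte_kinds(data: bytes) -> List[str]:
    """Label each byte as 'ascii' (0xxxxxxx), 'lead' (11xxxxxx) or
    'cont' (10xxxxxx). Classifies by bit pattern only; no validation."""
    kinds = []
    for b in data:
        if (b & 0x80) == 0:
            kinds.append("ascii")
        elif (b & 0xC0) == 0xC0:
            kinds.append("lead")
        else:  # (b & 0xC0) == 0x80
            kinds.append("cont")
    return kinds


text = "a×←"  # 'a' (1 byte), '×' (2 bytes), '←' (3 bytes)
kinds = byte_kinds(text.encode("utf-8"))
print(kinds)  # ['ascii', 'lead', 'cont', 'lead', 'cont', 'cont']
# Every character contributes exactly one byte that is not a continuation byte.
assert sum(k != "cont" for k in kinds) == len(text)
```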

By dividing all bytes of the input by 64, we turn 0xxxxxxx into 0 or 1, 11xxxxxx into 3, and 10xxxxxx into 2. All that's left is to count the bytes whose quotient is not 2.
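The mapping can be checked exhaustively. This Python sketch (the function name is illustrative) groups all 256 byte values by their UTF-8 bit pattern and collects the integer quotients by 64, showing that 2 occurs only for 10xxxxxx continuation bytes.

```python
from typing import Dict, Set


def quotients_by_pattern() -> Dict[str, Set[int]]:
    """Group all 256 byte values by UTF-8 bit pattern and collect b // 64."""
    groups: Dict[str, Set[int]] = {"0xxxxxxx": set(), "10xxxxxx": set(), "11xxxxxx": set()}
    for b in range(256):
        q = b // 64  # integer division keeps only the top two bits (same as b >> 6)
        if b < 0x80:
            groups["0xxxxxxx"].add(q)
        elif b < 0xC0:
            groups["10xxxxxx"].add(q)
        else:
            groups["11xxxxxx"].add(q)
    return groups


print(quotients_by_pattern())
# {'0xxxxxxx': {0, 1}, '10xxxxxx': {2}, '11xxxxxx': {3}}
```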


                (implicit) Read all input and push it on the stack.
.               Push a copy of the input.
 ,              Compute its length (in bytes).
  p             Print the length.
   {     },     Filter; for each byte in the original input:
    64/           Divide the byte by 64.
       2^         XOR the quotient with 2.
                If the result is non-zero, keep the byte.
           ,    Count the kept bytes.
                (implicit) Print the integer on the stack.
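For readers who don't read GolfScript, here is a rough Python transliteration, one step per line of the breakdown above. It is a sketch, not the answer itself: it works on raw bytes, uses // for GolfScript's integer / and ^ for XOR, and returns both counts in the order the program prints them (the byte count via p first, then the character count as the implicit output). The name golfscript_counts is hypothetical.

```python
from typing import Tuple


def golfscript_counts(data: bytes) -> Tuple[int, int]:
    """Mirror the GolfScript program step by step; returns (bytes, characters)."""
    # .,        copy the input and take its length in bytes
    byte_count = len(data)
    # {64/2^},  keep the bytes whose (byte // 64) ^ 2 is non-zero,
    #           i.e. drop the 10xxxxxx continuation bytes
    kept = [b for b in data if (b // 64) ^ 2]
    # ,         count the kept bytes
    char_count = len(kept)
    return byte_count, char_count


nbytes, nchars = golfscript_counts("(~R∊R∘.×R)/R←1↓ιR".encode("utf-8"))
print(nbytes)  # 27, printed first by p
print(nchars)  # 17, the final implicit output
```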

Python, 40 bytes (previously 42)

lambda i:[len(i),len(i.encode('utf-8'))]

Thanks to Alex A. for the two bytes off.

Straightforward: given a string i, it returns a list containing the length of i in characters, then the length of its UTF-8 encoding in bytes. Note that to pass multiline input as a string literal, the argument should be surrounded by triple quotes: '''.

EDIT: It didn't work for multiline input, so I just made it a function instead.
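Because the submission is an anonymous lambda, it has to be bound to a name before use, and multiline input is easiest to pass through a stream rather than a literal. A minimal usage sketch, assuming Python 3 (where str.encode returns bytes) and UTF-8 input; the wrapper run is hypothetical and not part of the 40-byte answer.

```python
import io
from typing import BinaryIO, List

# The 40-byte submission, bound to a name so it can be called.
f = lambda i: [len(i), len(i.encode('utf-8'))]


def run(stream: BinaryIO) -> List[int]:
    """Decode a binary stream explicitly as UTF-8, so the result does not
    depend on the locale's default encoding, then apply f."""
    return f(stream.read().decode("utf-8"))


# An in-memory stand-in for stdin; pass sys.stdin.buffer to read real input.
print(*run(io.BytesIO("line one\nιR\n".encode("utf-8"))))  # 12 13
```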

Some test cases (separated by blank lines):

f("Hello, World!")
13 13

friends = ['john', 'pat', 'gary', 'michael']
for i, name in enumerate(friends):
    print "iteration {iteration} is {name}".format(iteration=i, name=name)
156 156

17 27