Count the bytes of a program
Shell + coreutils, 6
This answer becomes invalid if an encoding other than UTF-8 is used.
wc -mc
Test output:
$ printf '%s' "(~R∊R∘.×R)/R←1↓ιR" | ./count.sh
17 27
$
If the output format is strictly enforced (exactly one space separating the two integers), then we can do this:
Shell + coreutils, 12
echo`wc -mc`
Thanks to @immibis for suggesting to remove the space after the echo. It took me a while to figure that out - the shell will expand this to echo<tab>n<tab>m, and tabs by default are in $IFS, so are perfectly legal token separators in the resulting command.
GolfScript, 14 12 bytes
.,p{64/2^},,
Try it online on Web GolfScript.
Idea
GolfScript doesn't have a clue what Unicode is; all strings (input, output, internal) are composed of bytes. While that can be pretty annoying, it's perfect for this challenge.
UTF-8 encodes ASCII and non-ASCII characters differently:
All code points below 128 are encoded as 0xxxxxxx.
All other code points are encoded as 11xxxxxx 10xxxxxx ... 10xxxxxx.
This means that the encoding of each Unicode character contains either a single 0xxxxxxx byte or a single 11xxxxxx byte (and 0 to 5 10xxxxxx bytes).
By dividing all bytes of the input by 64, we turn 0xxxxxxx into 0 or 1, 11xxxxxx into 3, and 10xxxxxx into 2. All that's left is to count the bytes whose quotient is not 2.
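The same idea can be sketched outside GolfScript. The following Python 3 snippet is only an illustration under stated assumptions: the function name count_chars_and_bytes is hypothetical, the input is assumed to be valid UTF-8, and the result is returned as (characters, bytes) rather than printed in the GolfScript program's order.

```python
def count_chars_and_bytes(data: bytes) -> tuple:
    """Return (characters, bytes) for UTF-8 encoded input (hypothetical helper)."""
    # b // 64 is 0 or 1 for 0xxxxxxx, 3 for 11xxxxxx, and 2 for 10xxxxxx.
    # Every character has exactly one byte whose quotient is not 2.
    chars = sum(1 for b in data if b // 64 != 2)
    return chars, len(data)


sample = "(~R∊R∘.×R)/R←1↓ιR".encode("utf-8")
print(count_chars_and_bytes(sample))  # (17, 27)
```

Iterating over a bytes object in Python 3 yields integers, so no explicit conversion is needed before the integer division.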
Code
(implicit) Read all input and push it on the stack.
.          Push a copy of the input.
,          Compute its length (in bytes).
p          Print the length.
{     },   Filter; for each byte in the original input:
 64/         Divide the byte by 64.
    2^       XOR the quotient with 2.
             If the result is non-zero, keep the byte.
,          Count the kept bytes.
(implicit) Print the integer on the stack.
Python, 42 40 bytes
lambda i:[len(i),len(i.encode('utf-8'))]
Thanks to Alex A. for the two bytes off.
Straightforward, does what it says. With argument i, returns the length of i, then the length of i in UTF-8. Note that in order to accept multiline input, the function argument should be surrounded by triple quotes: '''.
EDIT: It didn't work for multiline input, so I just made it a function instead.
Some test cases (separated by blank newlines):
f("Hello, World!")
13 13
f('''
friends = ['john', 'pat', 'gary', 'michael']
for i, name in enumerate(friends):
    print "iteration {iteration} is {name}".format(iteration=i, name=name)
''')
156 156
f("(~R∊R∘.×R)/R←1↓ιR")
17 27
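For anyone who wants to run the test cases, here is an ungolfed sketch of the same function. It assumes Python 3 (where str is a sequence of code points) and assumes a four-space indent on the print line of the multiline case, which is what the reported count of 156 implies. The variable name multiline is introduced only for this illustration.

```python
def f(text: str) -> list:
    """Return [character count, UTF-8 byte count] for text."""
    return [len(text), len(text.encode("utf-8"))]


# Assumed four-space indent before `print`; the embedded code is only data here.
multiline = '''
friends = ['john', 'pat', 'gary', 'michael']
for i, name in enumerate(friends):
    print "iteration {iteration} is {name}".format(iteration=i, name=name)
'''

print(f("Hello, World!"))      # [13, 13]
print(f(multiline))            # [156, 156]
print(f("(~R∊R∘.×R)/R←1↓ιR"))  # [17, 27]
```

The two counts differ only in the last case, where six of the characters take two or three bytes each in UTF-8.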