Empty strings at the beginning and end of split
From the fine manual:
split(pattern=$;, [limit]) → anArray
[...]
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
So trailing "null fields" are suppressed because the documentation says they are. If you want the trailing empty string, ask for it:
'abc'.split(/c/, -1) # [ 'ab', '' ]
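As a quick sketch of the three cases the documentation describes (the sample string and the variable names here are only illustrative, not from the manual):

```ruby
s = 'a,b,,'

# Limit omitted: trailing empty fields are dropped.
default_fields = s.split(',')

# Positive limit: at most that many fields; the remainder is left unsplit.
limited_fields = s.split(',', 2)

# Negative limit: no cap on the field count, trailing empty fields are kept.
unlimited_fields = s.split(',', -1)

p default_fields    # ["a", "b"]
p limited_fields    # ["a", "b,,"]
p unlimited_fields  # ["a", "b", "", ""]
```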
Why would it behave that way? Probably because it matches Perl's split behavior:
If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
and we see that using a negative limit, again, gives us the trailing empty string:
$ perl -e 'print join(",", split(/c/, "abc")), "\n"'
ab
$ perl -e 'print join(",", split(/c/, "abc", -1)), "\n"'
ab,
Why copy Perl's behavior? Ask Matz.
After reading AWK's specification, following mu is too short's answer, I came to feel that the original intention of split in AWK was to extract substrings that correspond to fields, each of which is terminated by a punctuation mark like "," or ".", and the separator was considered something like an "end of field character". The intention was not to split a string symmetrically into the left and right sides of each separator position, but to terminate a substring on the left side of each separator position. Under this conception, it makes sense to always have some string (even if it is empty) on the left of the separator, but not necessarily on the right. This behavior may have been inherited by Ruby via Perl.
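A small Ruby sketch of how this "terminator" reading lines up with the observed behavior (the input string and variable names are made up for illustration):

```ruby
# Each ',' is read as ending the field to its left.
record = ',a,b,'

# Three terminators, so three fields: '' (before the first comma), 'a', 'b'.
# Nothing is terminated after the last comma, so no trailing field appears.
terminated_fields = record.split(',')

# A negative limit switches to the symmetric view and keeps the
# empty string to the right of the last comma as well.
symmetric_fields = record.split(',', -1)

p terminated_fields  # ["", "a", "b"]
p symmetric_fields   # ["", "a", "b", ""]
```

Note that the leading empty string survives in both cases; only the trailing one depends on the limit.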