Are blanks, spaces, and tabs part of a string?
Since your tag indicates "Regular expression", I assume you are referring to the POSIX character classes `[:blank:]` and `[:space:]`.
This overview table shows that `[:blank:]` is a subset of `[:space:]`:
- `[:space:]` contains everything usually designated as "whitespace characters", i.e. "space" (the character `\x20`, generated when pressing the space bar), horizontal tab, newline, vertical tab, form feed, and carriage return (in the POSIX locale).
- `[:blank:]` contains only those characters which produce "empty space" within the same line, i.e. "space" and horizontal tab (`\t`).(*)
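As a rough illustration, here is a Python sketch of the two classes. Python's `re` module does not support POSIX bracket expressions such as `[[:blank:]]`, so the sketch spells out the POSIX-locale members by hand; the names `POSIX_BLANK`, `POSIX_SPACE`, and `classify` are made up for this example.

```python
import re

# POSIX-locale members of each class, written out explicitly because
# Python's re module does not understand [[:blank:]] or [[:space:]].
POSIX_BLANK = " \t"            # space, horizontal tab
POSIX_SPACE = " \t\n\v\f\r"    # the blank chars plus newline, VT, FF, CR

BLANK_RE = re.compile(r"[ \t]")
SPACE_RE = re.compile(r"[ \t\n\v\f\r]")


def classify(ch: str) -> str:
    """Say which of the two classes a single character belongs to."""
    if BLANK_RE.fullmatch(ch):
        return "blank"       # in [:blank:], and therefore also in [:space:]
    if SPACE_RE.fullmatch(ch):
        return "space-only"  # in [:space:] but not in [:blank:]
    return "other"


if __name__ == "__main__":
    print(set(POSIX_BLANK) <= set(POSIX_SPACE))  # True: blank is a subset
    for ch in POSIX_SPACE + "x":
        print(repr(ch), classify(ch))
```

In ASCII, the hand-written `SPACE_RE` matches the same characters as Python's `\s` with the `re.ASCII` flag.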
And yes, in the context of computer input, all these are characters and should therefore also be thought of as characters when designing a regular expression.
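To make the point concrete, here is a small Python sketch (the sample string and variable names are invented for illustration) showing that blanks sit at real positions in a string and that a regex matches them like any other character:

```python
import re

line = "alpha \t beta\tgamma"

# Each blank occupies its own index in the string, like any other character.
blank_positions = [m.start() for m in re.finditer(r"[ \t]", line)]

# A regex therefore treats runs of blanks as ordinary repeated characters.
fields = re.split(r"[ \t]+", line)

if __name__ == "__main__":
    print(blank_positions)  # [5, 6, 7, 12]
    print(fields)           # ['alpha', 'beta', 'gamma']
    print(len(line))        # 18
```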
Update: Here is a similar discussion.
(*) Note: as pointed out by Stéphane Chazelas, there are BSD-based implementations where `[:blank:]` can also contain vertical tab and form feed.
There is no such thing as a "blank" in this context. All you have are characters, and some of those characters don't print anything visible in normal text. Everything is still expressed in terms of characters, though. There are quite a few non-printing characters in ASCII; you can find a full list here: https://web.itu.edu.tr/sgunduz/courses/mikroisl/ascii.html. The ones you are likely to encounter in text files are the various whitespace characters:
- Space: `\x20` (the character produced by the space bar)
- Tab: `\t`
- Newline: `\n`
- Carriage return: `\r`
And, less commonly, some other control characters:
- Bell: `\a`
- Backspace: `\b`
- Vertical tab: `\v`
- Form feed: `\f`
You also have NUL (`\0`), which is non-printing but doesn't normally appear in text files, as well as the escape (`\e` or `^[`) and Control-Z (`^Z`) characters, which, again, are not really found in text files.
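The following Python sketch prints the ASCII code point of each character listed above and uses `repr()` to make non-printing characters visible. The dictionary and the `reveal` helper are hypothetical names; note that Python has no `\e` escape, so escape is written as `\x1b`.

```python
# ASCII code points of the non-printing characters discussed above.
CONTROL_CHARS = {
    "space": " ",
    "tab": "\t",
    "newline": "\n",
    "carriage return": "\r",
    "bell": "\a",
    "backspace": "\b",
    "vertical tab": "\v",
    "form feed": "\f",
    "NUL": "\0",
    "escape": "\x1b",     # Python has no \e escape, so use the hex form
    "Control-Z": "\x1a",
}


def reveal(text: str) -> str:
    """Make non-printing characters visible using Python's repr() escapes."""
    return repr(text)[1:-1]  # strip the surrounding quotes


if __name__ == "__main__":
    for name, ch in CONTROL_CHARS.items():
        print(f"{name:16} {ord(ch):3}  0x{ord(ch):02X}")
    print(reveal("col1\tcol2\r\n"))  # col1\tcol2\r\n
```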
Relevant links
- https://en.wikipedia.org/wiki/Control_character
- https://www.asciitable.com/
So, a "blank" can be a space, a tab, or another whitespace character. If you are working with Unicode rather than ASCII, there are many more, such as the no-break space (U+00A0) and the em space (U+2003). But no matter what you have, they will be characters: when you see whitespace in text, the computer sees some character. A "blank" is never the absence of a character; it is always the presence of a non-printing character.
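A short Python sketch of the Unicode case, assuming Python 3's default Unicode-aware `str` patterns (the `SAMPLES` table and `describe` helper are hypothetical). It checks a few Unicode characters against `\s`, against `\s` restricted with `re.ASCII`, and against `str.isspace()`. The zero width space is included as a contrast: Python does not classify it as whitespace, yet it still counts as one character.

```python
import re

SAMPLES = {
    "space": " ",
    "no-break space (U+00A0)": "\u00a0",
    "em space (U+2003)": "\u2003",
    "zero width space (U+200B)": "\u200b",
}

UNICODE_WS = re.compile(r"\s")            # str patterns are Unicode-aware by default
ASCII_WS = re.compile(r"\s", re.ASCII)    # restricts \s to [ \t\n\r\f\v]


def describe(ch: str) -> tuple:
    """Return (matches Unicode \\s, matches ASCII \\s, str.isspace())."""
    return (
        UNICODE_WS.fullmatch(ch) is not None,
        ASCII_WS.fullmatch(ch) is not None,
        ch.isspace(),
    )


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        # Wrapping the character between "a" and "b" always gives length 3:
        # even an invisible character is still a character.
        print(f"{name:28} len={len('a' + ch + 'b')} {describe(ch)}")
```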