use of colon symbol in regular expression
I've decided to go you one better and explain the entire regex:
^                           # anchor to start of line
(                           # start grouping
  (                         # start grouping
    [\w]+                   # at least one of 0-9a-zA-Z_
    :                       # a literal colon
  )                         # end grouping
  ?                         # this grouping is optional
  \/\/                      # two literal slashes
)                           # end grouping
?                           # this grouping is optional
(
  (
    [\d\w]                  # exactly one of 0-9a-zA-Z_
                            # having \d is redundant
    |                       # alternation
    %                       # literal % sign
    [a-fA-f\d]{2,2}         # exactly 2 hexadecimal digits
                            # should probably be A-F
                            # using {2} would have sufficed
  )+                        # at least one of these groups
  (                         # start grouping
    :                       # literal colon
    (
      [\d\w]
      |
      %
      [a-fA-f\d]{2,2}
    )+
  )?                        # same grouping, but it is optional
                            # and there can be only one
  @                         # literal @ sign
)?                          # this group is optional
(
  [\d\w]                    # same as [\w], explained above
  [-\d\w]{0,253}            # includes a dash (-) as a valid character
                            # between 0 and 253 of these characters
  [\d\w]                    # end with \w. They want at most 255
                            # total and - cannot be at the start
                            # or end
  \.                        # literal period
)+                          # at least one of these groups
[\w]{2,4}                   # two to four \w characters
(
  :                         # literal colon
  [\d]+                     # at least one digit
)?
(
  \/                        # literal slash
  (
    [-+_~.\d\w]             # one of these characters
    |                       # *or*
    %                       # % with two hex digit combo
    [a-fA-f\d]{2,2}
  )*                        # zero or more of these groups
)*                          # zero or more of these groups
(
  \?                        # literal question mark
  (
    &amp;?                  # literal &amp or &amp; (semicolon optional)
    (
      [-+_~.\d\w]
      |
      %
      [a-fA-f\d]{2,2}
    )
    =?                      # optional literal =
  )*                        # zero or more of this group
)?                          # this group is optional
(
  #                         # literal #
  (
    [-+_~.\d\w]
    |
    %
    [a-fA-f\d]{2,2}
  )*
)?
$                           # anchor to end of line
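To show why the comments say the hex class "should probably be A-F", here is a minimal Python sketch. Python's re module is only my choice for illustration (the question does not say which engine is in use), and the names as_written and intended are made up for this example.

```python
import re

# The percent-escape piece exactly as written in the regex, and the
# presumably intended version (an assumption based on the comments above).
as_written = re.compile(r"%[a-fA-f\d]{2,2}")
intended = re.compile(r"%[a-fA-F\d]{2}")

# A-f spans ASCII 65..102, which covers A-Z, a-f and the punctuation
# characters [ \ ] ^ _ ` that sit between the two alphabets.
for sample in ["%2F", "%2f", "%ZZ", "%__"]:
    print(sample,
          bool(as_written.fullmatch(sample)),
          bool(intended.fullmatch(sample)))
```

The first two samples are accepted by both patterns; the last two are accepted only by the class as written, which is almost certainly not what the author wanted.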
It's important to understand what the metacharacters/sequences are. Some sequences are not meta when used in certain contexts (especially a character class). I've cataloged them for you:
meta with no context
^ -- zero width start of line
() -- grouping/capture
? -- zero or one of the preceding sequence
+ -- one or more of the preceding sequence
* -- zero or more of the preceding sequence
[] -- character class
\w -- alphanumeric characters and _. Opposite of \W
| -- alternation
{} -- length assertion (repetition count for the preceding sequence)
$ -- zero width end of line
This excludes :, @, and % from having any special/meta meaning in the raw context.
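A quick way to convince yourself of this, sketched in Python (my choice of engine, not something the question specifies). The sample string is made up, and the re.escape behaviour shown assumes Python 3.7 or later, where only potentially special characters get escaped.

```python
import re

# None of : @ % needs a backslash outside a character class.
pattern = re.compile(r"user:pw@host%20")
literal_match = pattern.fullmatch("user:pw@host%20") is not None

# Assumption: Python 3.7+, where re.escape leaves non-special characters alone.
escaped = {ch: re.escape(ch) for ch in ":@%+?"}

print(literal_match)
print(escaped)
```

The colon, @ and % come back from re.escape unchanged, while + and ? come back with a backslash in front of them.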
meta inside character class
] ends the character class.
- creates a range of characters unless it is at the start or the end of the character class or escaped with a backslash.
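The dash rule is easy to check with a small experiment. This is a sketch in Python (an assumption on my part; other engines behave the same way for these simple cases), with a made-up test string.

```python
import re

text = "abc-"

in_range = re.findall(r"[a-c]", text)        # dash between a and c: range a..c
dash_first = re.findall(r"[-ac]", text)      # dash at the start: literal dash
dash_last = re.findall(r"[ac-]", text)       # dash at the end: literal dash
dash_escaped = re.findall(r"[a\-c]", text)   # escaped dash: literal dash

print(in_range)      # ['a', 'b', 'c']
print(dash_first)    # ['a', 'c', '-']
print(dash_last)     # ['a', 'c', '-']
print(dash_escaped)  # ['a', 'c', '-']
```

This is why the class [-+_~.\d\w] in the regex puts the dash first: there it can only mean a literal dash.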
grouping assertions
A (? combination starts a grouping assertion. For example, (?: means group but do not capture. This means that in the regex /(?:a)/, it will match the string "a", but a is not captured for use in replacement or match groups as it would be from /(a)/.
(? can also be used for lookahead/lookbehind assertions with (?=, (?!, (?<=, and (?<!.
What (? followed by any other sequence means depends on the engine: it is either another extension (named groups, inline flags, comments) or a syntax error. It is not a literal ?.
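The four lookaround forms are easier to tell apart when you see which positions they accept. Here is a minimal sketch in Python (my choice for illustration; the sample strings and variable names are invented for this example).

```python
import re

text = "foobar foobaz"

# (?=...) lookahead: "foo" only where "bar" follows.
ahead = [m.start() for m in re.finditer(r"foo(?=bar)", text)]
# (?!...) negative lookahead: "foo" only where "bar" does not follow.
not_ahead = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
# (?<=...) lookbehind: "ba" only where "foo" comes immediately before.
behind = [m.start() for m in re.finditer(r"(?<=foo)ba", text)]
# (?<!...) negative lookbehind: "ba" only where "foo" does not come before.
not_behind = [m.start() for m in re.finditer(r"(?<!foo)ba", "foobar xbaz")]

# Lookarounds are zero width: the asserted text is not part of the match.
matched_text = re.search(r"foo(?=bar)", text).group(0)

print(ahead, not_ahead, behind, not_behind, matched_text)
# [0] [7] [3, 10] [8] foo
```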
There is no special use for the colon in your case: (([\w]+:)?\/\/)? will match http://, https://, ftp://, ...
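Here is that prefix on its own, tried against a few inputs. This is a sketch in Python (the engine is my assumption), the example.com URLs are placeholders, and scheme is just a name I picked.

```python
import re

# The scheme prefix from the regex, unchanged. The colon is an ordinary character.
scheme = re.compile(r"^(([\w]+:)?\/\/)?")

for url in ["http://example.com", "https://example.com",
            "ftp://example.com", "//example.com", "example.com"]:
    m = scheme.match(url)
    # group(0) is the whole matched prefix, group(2) is the inner "name:" group.
    print(repr(url), "->", repr(m.group(0)), "inner group:", repr(m.group(2)))
```

Note that because both groups are optional, the prefix also matches a bare // and even the empty string at the start of example.com.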
There is one special use for the colon: a group starting with (?: is non-capturing, so it won't appear in the results.
Example, with "foobarbaz" as input:
/foo((bar)(baz))/ => { [1] => 'barbaz', [2] => 'bar', [3] => 'baz' }
/foo(?:(bar)(baz))/ => { [1] => 'bar', [2] => 'baz' }
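The same comparison can be reproduced in Python (my choice of engine; the output notation above looks like it came from a different language). The variable names are illustrative only.

```python
import re

capturing = re.match(r"foo((bar)(baz))", "foobarbaz")
non_capturing = re.match(r"foo(?:(bar)(baz))", "foobarbaz")

# groups() lists the captured groups in order, starting from group 1.
print(capturing.groups())      # ('barbaz', 'bar', 'baz')
print(non_capturing.groups())  # ('bar', 'baz')
```

The outer group still groups its contents in the second pattern; it just no longer takes up a group number.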
A colon : is simply a colon. It means nothing by itself, except as part of special constructs such as clustering without capturing (also known as a non-capturing group):
(?:pattern)
It is also part of the POSIX character class syntax, in engines that support it, for example:
[[:upper:]]
However, in your case the colon is just a colon.
Special characters used in your regex:
In the character class [-+_~.\d\w]:
- means -
+ means +
_ means _
~ means ~
. means .
\d means any digit
\w means any word character
These symbols have these meanings because they are used inside a character class []. Outside a character class, + and . have special meanings.
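A small sketch of that difference, in Python (an assumed engine; the test strings and names are made up for illustration):

```python
import re

# Outside a class: . matches any character (except newline), + is a quantifier.
dot_outside = re.fullmatch(r"a.c", "abc") is not None
plus_outside = re.fullmatch(r"a+", "aaa") is not None

# Inside a class: both are ordinary characters.
dot_inside_other = re.fullmatch(r"a[.]c", "abc") is not None
dot_inside_literal = re.fullmatch(r"a[.]c", "a.c") is not None
plus_inside_other = re.fullmatch(r"a[+]", "aa") is not None
plus_inside_literal = re.fullmatch(r"a[+]", "a+") is not None

print(dot_outside, plus_outside)             # True True
print(dot_inside_other, dot_inside_literal)  # False True
print(plus_inside_other, plus_inside_literal)  # False True
```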
Other elements:
=? means an = that can occur 0 or 1 times; in other words, an optional =.
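For example, in Python (my assumption for the engine), using a simplified stand-in for the query-string piece. The pattern below is not the one from the question: I put a + after the character class so that a whole key can be tested at once.

```python
import re

# Simplified, hypothetical pattern: one or more allowed characters, then an optional =.
param = re.compile(r"[-+_~.\d\w]+=?")

results = {s: param.fullmatch(s) is not None for s in ["key=", "key", "key=="]}
print(results)  # {'key=': True, 'key': True, 'key==': False}
```

The = may be present or absent, but =? never allows a second one.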