Is XSS possible when < is not escaped, but also removed if followed by a character?
No, in an HTML context you cannot inject new tags without allowing letters after the opening bracket. Still, this filtering technique is unnecessarily risky.
The HTML parser of your web browser parses code as a state machine. To understand what your options are, have a look at the HTML syntax specification and the possible state transitions.
Your injection point happens to be in the data state (which is the "default" state, outside of any tags):
8.2.4.1 Data state
Consume the next input character:
U+0026 AMPERSAND (&)
Switch to the character reference in data state.
"<" (U+003C)
Switch to the tag open state.
U+0000 NULL
Parse error. Emit the current input character as a character token.
EOF
Emit an end-of-file token.
Anything else
Emit the current input character as a character token.
For XSS, the only interesting continuation is to open a tag with < and switch to the tag open state:
8.2.4.8 Tag open state
Consume the next input character:
"!" (U+0021)
Switch to the markup declaration open state.
"/" (U+002F)
Switch to the end tag open state.
Uppercase ASCII letter
Create a new start tag token, set its tag name to the lowercase version of the current input character (add 0x0020 to the character's code point), then switch to the tag name state. (Don't emit the token yet; further details will be filled in before it is emitted.)
Lowercase ASCII letter
Create a new start tag token, set its tag name to the current input character, then switch to the tag name state. (Don't emit the token yet; further details will be filled in before it is emitted.)
"?" (U+003F)
Parse error. Switch to the bogus comment state.
Anything else
Parse error. Switch to the data state. Emit a U+003C LESS-THAN SIGN character token. Reconsume the current input character.
Here your options are a-z, A-Z, !, / and ?.
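The transitions quoted above can be restated as a small lookup. This is only a sketch of the tag open state as excerpted here, not a full tokenizer; the function name `tag_open_transition` and the returned state labels are made up for illustration.

```python
import string


def tag_open_transition(ch: str) -> str:
    """Return the state entered after '<' is followed by the character ch.

    Mirrors only the "tag open state" rules quoted above.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch == "!":
        return "markup declaration open"
    if ch == "/":
        return "end tag open"
    if ch in string.ascii_letters:
        return "tag name"
    if ch == "?":
        return "bogus comment"
    # Anything else: parse error, '<' is emitted as text, back to data state.
    return "data"


if __name__ == "__main__":
    for ch in ["s", "S", "!", "/", "?", " ", "\t", "\n", "1", "<"]:
        print(repr(ch), "->", tag_open_transition(ch))
```

Only a letter leads to the tag name state; spaces, tabs, newlines, digits and a second < all fall through to the data state.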
You have not been clear about whether special characters are blacklisted, too. But even if they are not, you're out of luck:
- From <! you only get to a comment (<!--), a doctype declaration (<!DOCTYPE) or a CDATA section (<![CDATA). None of these creates an element, so they are useless for XSS. (For instance, you wouldn't be able to attach an event handler to a comment.)
- <? might be interesting in XML, but is treated as a comment in HTML.
- </ will only let you close tags.
You may have noticed that the specification also doesn't tolerate any padding characters, such as spaces, tabs, newlines or control characters.
If you want to dig a little deeper and verify the implementation, you can always look into the source code. For example, this extract is part of the HTML5 tokenizer used in Mozilla Firefox. As you can see, the tag open state adheres closely to the specification:
    case NS_HTML5TOKENIZER_TAG_OPEN: {
      for (; ; ) {
        if (++pos == endPos) {
          NS_HTML5_BREAK(stateloop);
        }
        c = checkChar(buf, pos);
        if (c >= 'A' && c <= 'Z') {
          endTag = false;
          clearStrBufAndAppend((char16_t) (c + 0x20));
          state = P::transition(mViewSource, NS_HTML5TOKENIZER_TAG_NAME, reconsume, pos);
          NS_HTML5_BREAK(tagopenloop);
        } else if (c >= 'a' && c <= 'z') {
          endTag = false;
          clearStrBufAndAppend(c);
          state = P::transition(mViewSource, NS_HTML5TOKENIZER_TAG_NAME, reconsume, pos);
          NS_HTML5_BREAK(tagopenloop);
        }
        switch(c) {
          case '!': {
            state = P::transition(mViewSource, NS_HTML5TOKENIZER_MARKUP_DECLARATION_OPEN, reconsume, pos);
            NS_HTML5_CONTINUE(stateloop);
          }
          case '/': {
            state = P::transition(mViewSource, NS_HTML5TOKENIZER_CLOSE_TAG_OPEN, reconsume, pos);
            NS_HTML5_CONTINUE(stateloop);
          }
          case '\?': {
            if (viewingXmlSource) {
              state = P::transition(mViewSource, NS_HTML5TOKENIZER_PROCESSING_INSTRUCTION, reconsume, pos);
              NS_HTML5_CONTINUE(stateloop);
            }
            if (P::reportErrors) {
              errProcessingInstruction();
            }
            clearStrBufAndAppend(c);
            state = P::transition(mViewSource, NS_HTML5TOKENIZER_BOGUS_COMMENT, reconsume, pos);
            NS_HTML5_CONTINUE(stateloop);
          }
          case '>': {
            if (P::reportErrors) {
              errLtGt();
            }
            tokenHandler->characters(nsHtml5Tokenizer::LT_GT, 0, 2);
            cstart = pos + 1;
            state = P::transition(mViewSource, NS_HTML5TOKENIZER_DATA, reconsume, pos);
            NS_HTML5_CONTINUE(stateloop);
          }
          default: {
            if (P::reportErrors) {
              errBadCharAfterLt(c);
            }
            tokenHandler->characters(nsHtml5Tokenizer::LT_GT, 0, 1);
            cstart = pos;
            reconsume = true;
            state = P::transition(mViewSource, NS_HTML5TOKENIZER_DATA, reconsume, pos);
            NS_HTML5_CONTINUE(stateloop);
          }
        }
      }
      tagopenloop_end: ;
    }
So, the XSS filter you describe seems to be safe per the HTML specification. It's very thin ice, though. You never know whether some vendor will come up with a quirky implementation that could still enable an exploit. (Microsoft, I'm looking at you!)
The correct XSS protection is therefore to simply escape the brackets.
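As a minimal sketch of what escaping looks like in practice, assuming a Python backend (the question does not say which language is used), the standard library's `html.escape` replaces the brackets with entities instead of removing them. The function name `render_comment` and the surrounding `<p>` markup are hypothetical.

```python
import html


def render_comment(user_input: str) -> str:
    """Embed untrusted text in an HTML text context.

    html.escape turns &, <, > (and quotes, with quote=True) into entities,
    so the browser never leaves the data state because of user input.
    """
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


if __name__ == "__main__":
    print(render_comment("<script>alert(1)</script>"))
    print(render_comment("1 < 2 && 3 > 2"))
```

Unlike the removal filter, this keeps the user's text intact when displayed and does not depend on how a particular browser handles unusual characters after <.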
Very unlikely. The only vector I can see is a character encoding bug whereby the backend language (e.g. PHP) is configured to use a different character encoding than UTF-8, and it isn't encoding-aware when outputting the string.
That said, I'd say it's almost certainly not possible to get XSS in this instance. You could, however, get XSS in any location where you're already inside a tag (e.g. echo into an attribute).
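The attribute case can be illustrated with a short sketch. It assumes a Python backend and a hypothetical `<input>` template; the payload contains no < at all, so a filter that only looks at opening brackets never sees anything to remove.

```python
import html


def render_unsafe(value: str) -> str:
    """Echo untrusted input into an attribute without escaping (vulnerable)."""
    return '<input value="' + value + '">'


def render_escaped(value: str) -> str:
    """Same template, but quotes and brackets are escaped first."""
    return '<input value="' + html.escape(value, quote=True) + '">'


# Hypothetical payload: closes the attribute and adds an event handler.
payload = '" onfocus="alert(1)" autofocus="'

if __name__ == "__main__":
    print(render_unsafe(payload))
    print(render_escaped(payload))
```

In the unescaped output the injected `onfocus` becomes a real attribute of the element, while in the escaped output the whole payload stays inside the `value` attribute.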