How can sanitation that escapes single quotes be defeated by SQL injection in SQL Server?
There are a few cases where this escape function will fail. The most obvious is when a single quote isn't used:
string table= "\"" + table.Replace("'", "''") + "\""
string var= "`" + var.Replace("'", "''") + "`"
string index= " " + index.Replace("'", "''") + " "
string query = "select * from `"+table+"` where name=\""+var+"\" or id="+index
In this case, you can "break out" using a double-quote, a back-tick. In the last case there is nothing to "break out" of, so you can just write 1 union select password from users--
or whatever sql payload the attacker desires.
The next condition where this escape function will fail is if a sub-string is taken after the string is escaped (and yes I have found vulnerabilities like this in the wild):
string userPassword= userPassword.Replace("'", "''")
string userName= userInput.Replace("'", "''")
userName = substr(userName,0,10)
string query = "select * from users where name='"+userName+"' and password='"+userPassword+"'";
In this case a username of abcdefgji'
will be turned into abcdefgji''
by the escape function and then turned back into abcdefgji'
by taking the sub-string. This can be exploited by setting the password value to any sql statement, in this case or 1=1--
would be interpreted as sql and the username would be interpreted as abcdefgji'' and password=
. The resulting query is as follows:
select * from users where name='abcdefgji'' and password=' or 1=1--
T-SQL and other advanced sql injection techniques where already mentioned. Advanced SQL Injection In SQL Server Applications is a great paper and you should read it if you haven't already.
The final issue is unicode attacks. This class of vulnerabilities arises because the escape function is not aware of multi-byte encoding, and this can be used by an attacker to "consume" the escape character. Prepending an "N" to the string will not help, as this doesn't affect the value of multi-byte chars later in the string. However, this type of attack is very uncommon because the database must be configured to accept GBK unicode strings (and I'm not sure that MS-SQL can do this).
Second-Order code injection is still possible, this attack pattern is created by trusting attacker-controlled data sources. Escaping is used to represent control characters as their character literal. If the developer forgets to escape a value obtained from a select
and then uses this value in another query then bam the attacker will have a character literal single quote at their disposal.
Test everything, trust nothing.
With some additional stipulations, your approach above is not vulnerable to SQL injection. The main vector of attack to consider is SQL Smuggling. SQL Smuggling occurs when similiar unicode characters are translated in an unexpected fashion (e.g. ` changing to ' ). There are several locations where an application stack could be vulnerable to SQL Smuggling.
Does the Programming language handle unicode strings appropriately? If the language isn't unicode aware, it may mis-identify a byte in a unicode character as a single quote and escape it.
Does the client database library (e.g. ODBC, etc) handle unicode strings appropriately? System.Data.SqlClient in the .Net framework does, but how about old libraries from the windows 95 era? Third party ODBC libraries actually do exist. What happens if the ODBC driver doesn't support unicode in the query string?
Does the DB handle the input correctly? Modern versions of SQL are immune assuming you're using N'', but what about SQL 6.5? SQL 7.0? I'm not aware of any particular vulnerabilities, however this wasn't on the radar for developers in the 1990's.
Buffer overflows? Another concern is that the quoted string is longer than the original string. In which version of Sql Server was the 2GB limit for input introduced? Before that what was the limit? On older versions of SQL, what happened when a query exceeded the limit? Do any limits exist on the length of a query from the standpoint of the network library? Or on the length of the string in the programming language?
Are there any language settings that affect the comparison used in the Replace() function? .Net always does a binary comparison for the Replace() function. Will that always be the case? What happens if a future version of .NET supports overriding that behavior at the app.config level? What if we used a regexp instead of Replace() to insert a single quote? Does the computer's locale settings affect this comparison? If a change in behavior did occur, it might not be vulnerable to sql injection, however, it may have inadvertently edited the string by changing a uni-code character that looked like a single quote into a single quote before it ever reached the DB.
So, assuming you're using the System.String.Replace() function in C# on the current version of .Net with the built-in SqlClient library against a current (2005-2012) version of SQL server, then your approach is not vulnerable. As you start changing things, then no promises can be made. The parameterized query approach is the correct approach for efficiency, for performance, and (in some cases) for security.
WARNING The above comments are not an endorsement of this technique. There are several other very good reasons why this the wrong approach to generating SQL. However, detailing them is outside the scope for this question.
DO NOT USE THIS TECHNIQUE FOR NEW DEVELOPMENT.
DO NOT USE THIS TECHNIQUE FOR NEW DEVELOPMENT.
DO NOT USE THIS TECHNIQUE FOR NEW DEVELOPMENT.
Using query parameters is better, easier, and faster than escaping quotes.
Re your comment, I see that you acknowledged parameterization, but it deserves emphasis. Why would you want to use escaping when you could parameterize?
In Advanced SQL Injection In SQL Server Applications, search for the word "replace" in the text, and from that point on read some examples where developers inadvertently allowed SQL injection attacks even after escaping user input.
There is an edge case where escaping quotes with \
results in a vulnerability, because the \
becomes half of a valid multi-byte character in some character sets. But this is not applicable to your case since \
isn't the escaping character.
As others have pointed out, you may also be adding dynamic content to your SQL for something other than a string literal or date literal. Table or column identifiers are delimited by "
in SQL, or [
]
in Microsoft/Sybase. SQL keywords of course don't have any delimiters. For these cases, I recommend whitelisting the values to interpolate.
Bottom line is that escaping is an effective defense, if you can ensure that you do it consistently. That's the risk: that one of the team of developers on your application could omit a step and do some string interpolation unsafely.
Of course, the same is true of other methods, like parameterization. They're only effective if you do them consistently. But I find it's easier and quicker to use parameters, than to figure out the right type of escaping. Developers are more likely to use a method that is convenient and doesn't slow them down.