Is it more secure to overwrite the value char[] in a String?

By fiddling with the internal contents of String instances, you incur the risk of severely breaking your application.

The first reason is that String instances are supposed to be immutable, which means that instances may be reused; when you modify "your" string you may actually modify other strings that are conceptually distinct but happen to have the same contents. This kind of reuse can also happen internally, if String instances really refer to an underlying char[] with a pair of indices that delimit a chunk within that array. See this page for some more details. Generally speaking, code that uses String instances relies on their immutability, and breaking that invariant can lead to far-reaching disagreeable consequences.
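To make the reuse concrete, here is a minimal sketch that needs no reflection at all. The class name and the "hunter2" value are made up for illustration; the behaviour shown (equal string literals resolving to one shared instance, and intern() returning that same instance) is what the Java Language Specification requires.

```java
// Sketch only: SharedLiteralDemo and the "hunter2" value are hypothetical.
final class SharedLiteralDemo {

    // Two variables that are conceptually unrelated but hold equal literals.
    static boolean literalsShareInstance() {
        String dbPassword = "hunter2";
        String unrelatedLabel = "hunter2";
        // Identity comparison on purpose: we ask whether both names
        // point at the very same object, not whether the contents match.
        return dbPassword == unrelatedLabel;
    }

    // A string built at runtime is a separate object, but intern()
    // hands back the shared instance used by the literal.
    static boolean internReturnsSharedInstance() {
        String built = new StringBuilder("hun").append("ter2").toString();
        return built != "hunter2" && built.intern() == "hunter2";
    }

    public static void main(String[] args) {
        System.out.println("literals share instance: " + literalsShareInstance());
        System.out.println("intern() returns shared: " + internReturnsSharedInstance());
    }
}
```

Since both names refer to a single object, overwriting the storage behind one of them would change the other as well, along with every other use of that literal in the running program.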

A second reason is that the internal contents of String instances are not documented and may change. In fact, they already did so several times. If we consider only the Sun/Oracle JVM (already a bold move, since there are other JVMs out there, e.g. the one from IBM), then Java 6 versions (from update 21 onwards) can use compressed strings, meaning that the char[] is automatically converted to a byte[] if the characters happen to all be in the 0..255 range (i.e. all characters are really part of Latin-1). The "compressed strings" were designed to get top marks in some benchmarks, but were later dropped (Java 7 does not have them). However, this is sufficient to show that the internal storage format may change without prior notice. And they did it again in Java 7 update 6.
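You can see the moving target for yourself with a small probe. This is only a sketch: it assumes that the JVM at hand keeps the characters in a private field named "value", which is an implementation detail and not a documented contract, so the probe falls back to "unknown" when no such field exists. The class and method names are invented for this example. It only looks up the declared type of the field and does not try to read or modify it.

```java
import java.lang.reflect.Field;

// Sketch only: StringLayoutProbe is a hypothetical name, and the field
// name "value" is an assumption about the JVM's private String layout.
final class StringLayoutProbe {

    // Returns the declared type of String's backing field as text,
    // or "unknown" if this JVM has no field with that name.
    static String describeValueField() {
        try {
            Field backing = String.class.getDeclaredField("value");
            return backing.getType().getSimpleName();
        } catch (NoSuchFieldException | SecurityException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("String backing field type = " + describeValueField());
    }
}
```

Code that assumes one particular answer here is code that breaks when the JVM is swapped or updated.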

Thus, using an alternate JVM, or simply updating your JVM to a later version (as is highly recommended when there are security holes to fix), can totally break your code, possibly silently, meaning that you get data corruption instead of a clean exception that merely kills your application. This is undesirable, so don't do it. You cannot reliably muck with how String instances are internally organized. As a side note, accessing private fields is also not really a viable option for Java applets (you cannot do it with an unsigned applet, for instance).

A third reason, and perhaps the most compelling of the three, is that overwriting sensitive values in memory does not (reliably) work in Java. To know why, you have to understand how garbage collection algorithms work (this article is a very nice introduction to the basics). From the programmer's point of view, things are simple: an object is allocated, sits there in RAM, and when the application code ceases to reference it, the GC reclaims the memory. Internally, though, things may differ. In particular, the most efficient GC algorithms tend to move objects in memory, i.e. really copy them from place to place. This is invisible to your code, because the GC adjusts references: since Java is strongly typed, you cannot notice that the internal representation of a pointer changed (you cannot cast a reference to an integer, for instance). This kind of copying allows for faster GC operation and better locality (with regard to caches). However, it implies that several copies of your precious data may survive elsewhere in RAM, completely out of your reach. Even if you could reliably overwrite your String contents, this would only impact the current storage area for that instance, leaving ghost copies of it untouched.

(In the Sun/Oracle JVM, garbage collectors that internally copy objects appeared around Java 1.3. This can be seen in their design for library code; old code used char[] for passwords, so as to prevent automatic reuse as may happen with String, and promote manual overwriting; newer code uses String because the library designers understood that this overwriting would not be reliable anyway.)
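For reference, the old char[] idiom looks roughly like the sketch below. The class and method names are invented for this example, and Arrays.equals is used only for brevity. The fill does zero the array you hold, but, as explained above, it does nothing about copies that the GC, the I/O layers or the kernel may have left elsewhere.

```java
import java.util.Arrays;

// Sketch only: CharArrayWipeDemo and verifyThenWipe are hypothetical names.
final class CharArrayWipeDemo {

    // Compares the supplied password with the expected one, then zeroes
    // the caller's array whatever the outcome of the comparison.
    static boolean verifyThenWipe(char[] supplied, char[] expected) {
        try {
            return Arrays.equals(supplied, expected);
        } finally {
            // Overwrites this array only; earlier copies are out of reach.
            Arrays.fill(supplied, '\0');
        }
    }

    public static void main(String[] args) {
        char[] typed = {'s', '3', 'c', 'r', 'e', 't'};
        char[] stored = {'s', '3', 'c', 'r', 'e', 't'};
        System.out.println("match: " + verifyThenWipe(typed, stored));
        System.out.println("wiped: " + (typed[0] == '\0'));
    }
}
```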


Does this mean that Java is inherently insecure? No, because the importance of overwriting sensitive data in memory is greatly exaggerated. The idea that Thou Shallt Overwrite Passwords And Keys is one of those inherited dogmas: something that was relevant in a specific case long ago, but is now applied and enforced by lots of people who receive it as Divine Wisdom and fail to understand what it is really about. Overwriting memory is a nice thing to do for application code that runs on compromised systems, when attackers are not very competent: the scenario is an average home owner with a PC full of malware. The malware has full control of the machine, but, being a simple automated piece of code, it does not really exploit that control; the malware simply scans the RAM for sequences of characters that look like, say, credit card information. So we are talking of doomed client systems that manage to survive only because the attackers prefer it that way, and the data scavenging may be (potentially) mitigated with prompt overwriting of sensitive data only because the human attackers that control the malware simply don't have time to do a proper job of extracting the interesting bits, and instead have to rely on the dumbest of brutal full-memory scans.

None of this applies to a server application, or to client code that handles secrets with an actual non-negligible value. If a malicious attacker is in a position to scan the RAM for sensitive data, and that data is worth 1 or 2 minutes of explicit attention from the human attacker, then no amount of overwriting will save you. Thus, in many contexts where security matters, overwriting of passwords and keys is just wasted effort that gives a feeling of security but does not actually improve things (though it may be convenient to awe auditors).

Compounding the issue is the fact that when your sensitive data appears in your Java code, it has already gone through various layers that are out of your reach. For instance, if you read the password from a file, then copies of it are retained in RAM used as cache by the kernel, and possibly one or two bounce buffers maintained by Java as intermediaries between the native world and the abstraction that Java offers. If the password was received from the network over SSL, then the password again went through the internal buffering of the SSL library, which you cannot control. If we are talking about a client application and the password was merely typed by the user, then any malware that can scan memory also runs a keylogger and got the password even before it reached your code.

Therefore, as a summary: no, using reflection to overwrite your password in memory does NOT really improve security. It makes your code much more liable to break (even upon a simple minor update of the JVM), but offers no actual tangible gain in security. So don't do it.


Note: we talked about Java here, but all of the above equally applies to most other programming languages and frameworks, including .NET (C#), PHP, Ruby, Node.js, Python, Go... If you really want to keep track of sensitive data, then you have to use a language that is close enough to the bare metal (assembly, C, Forth) and follow it throughout the system, including base libraries, the kernel, and the device drivers. If you simply concentrate on the application code, then you are guaranteed to miss the point.