What does "random" mean in the context of password creation?
"Random" means: "that which the attacker does not know".
The important point to understand is that attack costs are always averages: they don't make sense for a single data point. An attacker may always get lucky and find the right password on his first try; this is merely improbable.
If you generate passwords as sequences of purely random characters, then you may obtain "HelloWorld"; but usually you won't, and, crucially, the attacker won't be able to guess with non-negligible probability that your password consists of two concatenated English words because, on average, it does not.
One way to say it is that password entropy is not a property of the password, but of the process that generated the password. It does not describe the contents of a single password, but the average contents of passwords, taken over sufficiently many experiments.
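To make the "property of the process" point concrete, here is a minimal Python sketch that computes the entropy of two generators that each pick uniformly among their possible outputs: 10 random letters versus two words from a word list. The 10,000-word list is a hypothetical figure chosen only for illustration. Assuming both "Hello" and "World" are in that list, both processes can output HelloWorld, yet their entropy differs by about 30 bits, because entropy measures the process and not any single output.

```python
import math

# Two hypothetical password generators; the sizes are illustrative assumptions.
LETTERS = 52            # a-z and A-Z
LENGTH = 10             # characters per password
WORDLIST_SIZE = 10_000  # hypothetical word list for a "two words" generator


def uniform_entropy_bits(possibilities: int) -> float:
    """Entropy, in bits, of a process that picks uniformly among `possibilities` outcomes."""
    return math.log2(possibilities)


random_chars_bits = uniform_entropy_bits(LETTERS ** LENGTH)
two_words_bits = uniform_entropy_bits(WORDLIST_SIZE ** 2)

print(f"10 random letters: {random_chars_bits:.1f} bits")
print(f"two words from a {WORDLIST_SIZE}-word list: {two_words_bits:.1f} bits")
```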
Averages are still the important notion because the attacker, like everybody else, thinks in terms of economics (although he, like most other people, is not completely aware of it). The attacker won't bother attacking your password if his chances of breaking it are lower than his chances of winning millions of dollars at the lottery. Even if he may always "get lucky", the lottery is much less effort, and 50 million dollars are a lot more rewarding than access to your Facebook account.
“Random” means that all possibilities in the search space (passwords of up to N characters chosen from a set S) have the same probability (up to a small tolerance). The intent is that the adversary (the guy who wants to break the encryption by guessing the password) has no better strategy than to try all possible passwords. With a random password, the adversary has to try half the passwords in the search space to get a 50% chance of guessing right.
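The "half the search space for a 50% chance" claim can be checked with a small simulation. This is a sketch on a toy search space (all 3-letter lowercase strings, an assumption chosen so it runs instantly). Because the secret is uniform, no enumeration order beats any other, so plain lexicographic order stands in for "try all possible passwords".

```python
import secrets

# Toy search space (an assumption, kept small so it runs instantly):
# all 3-letter lowercase strings, 26**3 = 17,576 possibilities.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LENGTH = 3
SPACE_SIZE = len(ALPHABET) ** LENGTH


def random_password() -> str:
    """Every character is drawn uniformly with a CSPRNG, so all passwords are equally likely."""
    return "".join(secrets.choice(ALPHABET) for _ in range(LENGTH))


def guess_index(password: str) -> int:
    """Position of `password` in the attacker's enumeration order (lexicographic order)."""
    index = 0
    for ch in password:
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index


def success_rate(budget: int, trials: int = 5_000) -> float:
    """Fraction of random passwords the attacker finds within `budget` guesses."""
    found = sum(guess_index(random_password()) < budget for _ in range(trials))
    return found / trials


half = SPACE_SIZE // 2
print(f"{half} guesses out of {SPACE_SIZE}: success rate ~ {success_rate(half):.2f}")
```

With a uniform secret the success rate after k guesses is simply k divided by the size of the space, so the printed rate hovers around 0.50.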
Let's say you're generating a 10-character password where each character is a lowercase or uppercase letter. That's $52^{10} \approx 1.45 \cdot 10^{17}$ possibilities, i.e. over a hundred million billion. The probability of generating gkwwpBnePU as a password is thus one in a hundred million billion and some change. The probability of generating HelloWorld is exactly the same, so there is no advantage to you in picking one over the other: the two choices are equally strong.
Sure, the attacker could guess HelloWorld. But they have an equal chance of guessing gkwwpBnePU.
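A generator matching this description can be sketched in a few lines of Python using the standard `secrets` module (a cryptographically secure source) and `string.ascii_letters` (the 52 letters a-z and A-Z). Every 10-letter string, HelloWorld and gkwwpBnePU included, comes out with the same probability. This is a minimal illustration, not a vetted password policy.

```python
import secrets
import string

ALPHABET = string.ascii_letters  # 52 characters: a-z and A-Z
LENGTH = 10


def generate_password(length: int = LENGTH) -> str:
    """Draw each character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


space_size = len(ALPHABET) ** LENGTH
probability_each = 1 / space_size  # identical for "HelloWorld" and "gkwwpBnePU"

print(generate_password())
print(f"{space_size:.3e} possibilities; each has probability {probability_each:.3e}")
```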
If you know that the attacker is working off a dictionary, then you may want to avoid words in this dictionary. However, this is only useful if the dictionary represents a significant fraction of your password space. If that's the case, your password space isn't large enough.
Let's say the attacker's dictionary consists of a million words and he'll try two words together. That's pretty large already: $10^{12}$ cracking attempts will require a small cluster of computers to carry out in a reasonable time. There's less than one chance in 100,000 that your randomly-generated password is in that search space. You gain a tiny advantage in avoiding this search space, but there's a cost. First, you're adding complexity (and so adding a risk of bugs, e.g. accidentally eliminating more of the search space than you intended). Second, you don't really know what the adversary will do. Maybe one adversary uses this particular dictionary, but another adversary doesn't (and even the first one will change their strategy once they find out what your password generation policy is). For any adversary who doesn't use this particular dictionary, you're helping them by restricting your password space. So this is counterproductive.
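The arithmetic in this paragraph can be checked with a few lines of Python. The million-word dictionary is the paragraph's own hypothetical; treating all $10^{12}$ two-word candidates as valid 10-letter passwords is a pessimistic simplification (many of them would have the wrong length and fall outside the space entirely).

```python
import math

SPACE = 52 ** 10                          # 10-letter passwords over a-z, A-Z
DICTIONARY_WORDS = 10 ** 6                # the attacker's hypothetical million-word dictionary
TWO_WORD_GUESSES = DICTIONARY_WORDS ** 2  # 10**12 two-word candidates

# Upper bound on the chance that a uniformly generated password is one of the
# attacker's candidates (pessimistically assumes every candidate is in the space).
hit_fraction = TWO_WORD_GUESSES / SPACE

# Entropy left after rejecting every such candidate from the generator.
full_bits = math.log2(SPACE)
filtered_bits = math.log2(SPACE - TWO_WORD_GUESSES)

print(f"chance of landing in the dictionary space: about 1 in {1 / hit_fraction:,.0f}")
print(f"entropy: {full_bits:.6f} bits -> {filtered_bits:.6f} bits")
```

The filtered generator loses on the order of $10^{-5}$ bits, which is why the benefit of excluding the dictionary is so small compared to the added complexity.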
Choosing a password from an obscure language would be even worse. No matter how obscure the language is, if you have a dictionary for it, then you can assume that your adversary has one. Restricting the choice to dictionary words would immensely reduce the search space and turn brute force from infeasible to easy.
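A rough comparison shows the scale of the loss. The sketch below assumes a hypothetical 500,000-word dictionary and a hypothetical attacker speed of $10^9$ guesses per second; neither figure comes from the answer, and real numbers vary widely with the hash function and hardware.

```python
import math

RANDOM_SPACE = 52 ** 10        # 10 random letters over a-z, A-Z
DICTIONARY_SIZE = 500_000      # hypothetical dictionary size for some obscure language
GUESSES_PER_SECOND = 10 ** 9   # hypothetical attacker speed, for illustration only

for label, size in [("random letters", RANDOM_SPACE), ("one dictionary word", DICTIONARY_SIZE)]:
    seconds = size / GUESSES_PER_SECOND  # worst case: the attacker exhausts the whole space
    print(f"{label}: {math.log2(size):.1f} bits, exhausted in {seconds:.3g} s")
```

Under these assumptions the random password takes years to exhaust, while the dictionary word falls in well under a second.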
Even if I generate a sequence of characters using the best "randomizer" ever, the chances that I get HelloWorld and the chances that I get, e.g., gkwwpBnePU are in my understanding exactly the same, so does "random" in this context mean "as distant as possible from any real word"? But if yes, doesn't this make the password not-so-random after all?
Yes. This is a known problem with excluding so-called "weak keys" in cryptography. By excluding certain classes of weak keys, the remaining key-space has been reduced. From time to time, key selection algorithms have popped up that accidentally excluded almost all keys, leaving a very small search space for attackers.
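A toy example shows how an innocent-looking exclusion rule can throw away most of a key space. Both the rule ("reject any key with a repeated symbol") and the tiny key space (5 symbols over a 6-letter alphabet) are hypothetical and not taken from any real key selection algorithm.

```python
from itertools import product

# Toy key space (assumption): 5-character keys over a 6-symbol alphabet.
ALPHABET = "abcdef"
LENGTH = 5


def looks_weak(key: str) -> bool:
    """Hypothetical 'weak key' rule: reject any key that repeats a symbol."""
    return len(set(key)) < len(key)


all_keys = ["".join(k) for k in product(ALPHABET, repeat=LENGTH)]
kept = [k for k in all_keys if not looks_weak(k)]

print(f"kept {len(kept)} of {len(all_keys)} keys ({len(kept) / len(all_keys):.1%})")
```

Only 720 of the 7,776 keys survive (about 9%), so the "strengthened" generator is easier to brute-force than the unfiltered one.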
The reasoning you describe is a typical precursor to this type of error: If certain keys are "weak", then surely the complete opposite of those keys would be "strong", right? However, if you try to find a key that is "the exact opposite" of a common phrase like Hello World, it will be guessable to an attacker that applies your "exact opposite" mapping function to common phrases.
There is a huge difference between avoiding weak keys, and choosing only keys that are maximally distant from a weak key according to some distance metric (the latter is a serious mistake stemming from a misunderstanding about threat models and probabilities; don't do that).
So, avoid weak keys like Hello World, but not to the extent of narrowing the key choice down to a search space that is as small as the set of "weak" keys.
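One way to avoid weak keys without shrinking the space much is rejection sampling: draw uniformly, and redraw only when the result lands on a small list of known-weak values. The sketch below uses Python's `secrets` module; the blocklist contents and the `max_attempts` guard are hypothetical choices for illustration. Because only a handful of values are rejected, the output stays uniform over everything else and loses a negligible amount of entropy, unlike a generator that only accepts keys "far" from weak ones.

```python
import secrets
import string

ALPHABET = string.ascii_letters
LENGTH = 10
# Small, hypothetical blocklist of known-weak passwords. It is a negligible
# fraction of the 52**10 space, so rejecting it costs almost no entropy.
BLOCKLIST = {"HelloWorld", "helloworld", "HELLOWORLD"}


def generate_avoiding_weak(max_attempts: int = 100) -> str:
    """Draw uniformly, and redraw only if the result is on the blocklist.

    Rejection sampling keeps the output uniform over all non-blocklisted passwords."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(LENGTH))
        if candidate not in BLOCKLIST:
            return candidate
    # Practically unreachable with a tiny blocklist; signals a broken setup.
    raise RuntimeError("every draw was rejected; check the blocklist and alphabet")


print(generate_avoiding_weak())
```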