Isn't "Dave's protocol" good if only the database, and not the code, is leaked?
Yes, but...
To be clear, we're talking about a database-only compromise, where an attacker has access to the database but not to the application source code. In that case the attacker will get the password hashes but, without knowing Dave's custom algorithm, will be unable to crack them and recover the original passwords. So in the case of a database-only breach, yes, Dave's password algorithm will protect passwords better than plain MD5 or SHA1 would have.
However, that's only one possible avenue for a system to leak. There is one key fact that trashes the "math" that makes Dave's homebrew algorithm seem reasonable:
Half of all breaches start internally (sources 1 2 3).
That is a very sobering fact once you let it sink in. Of the breaches caused by employees, half are accidental and half are intentional. Dave's algorithm can help if all you are worried about is a database-only leak. If that is all you are worried about, though, then the threat model in your head is wrong.
To pick just one example, developers by definition have access to the application source code. Therefore, if a developer gains read-only access to the production database, they have everything they need to crack the passwords easily. Dave's custom algorithm is now useless, because it relies on old and easy-to-crack hashes.
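To make the developer scenario concrete, here is a minimal Python sketch of what "everything they need" looks like. Dave's actual algorithm is not reproduced in this answer, so `dave_hash` below is a hypothetical stand-in that chains fast, unsalted MD5 and SHA1 calls; the usernames, passwords, and wordlist are made up. Once the code is known, cracking is just hashing guesses and looking them up:

```python
from __future__ import annotations

import hashlib


def dave_hash(password: str) -> str:
    # HYPOTHETICAL homebrew scheme: MD5, then SHA1 over the MD5 hex digest
    # plus the reversed password. Clever-looking, but fast and unsalted.
    inner = hashlib.md5(password.encode("utf-8")).hexdigest()
    return hashlib.sha1((inner + password[::-1]).encode("utf-8")).hexdigest()


def crack(leaked: dict[str, str], wordlist: list[str]) -> dict[str, str]:
    """Map each username to its password for every leaked hash found via the wordlist."""
    # No salt, so one pass over the wordlist builds a table that covers every user at once.
    table = {dave_hash(word): word for word in wordlist}
    return {user: table[h] for user, h in leaked.items() if h in table}


if __name__ == "__main__":
    # Simulated database dump obtained by someone who also has the source code.
    dump = {
        "alice": dave_hash("sunshine"),
        "bob": dave_hash("letmein"),
        "carol": dave_hash("x9!kQ#2v"),
    }
    wordlist = ["123456", "password", "letmein", "sunshine", "qwerty"]
    print(crack(dump, wordlist))  # alice and bob recovered; carol not in the wordlist
```

A real attacker would swap the toy wordlist for one with millions of entries; the loop stays the same and runs at the full speed of MD5 and SHA1.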
However, if Dave had used a modern password hashing algorithm with both a salt and a pepper, a developer who gained access to a database-only dump would have nothing useful at all.
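A rough Python sketch of that setup follows. It uses `hashlib.scrypt` from the standard library as the modern password hash, since bcrypt and Argon2 need third-party packages; the `PASSWORD_PEPPER` environment variable and the cost parameters are assumptions chosen for illustration, not recommendations. The pepper is mixed in with HMAC and never stored in the database, so, as long as it is long and random, a database-only dump holds salts and hashes that cannot be tested against guesses:

```python
from __future__ import annotations

import hashlib
import hmac
import os

# ASSUMPTION: the pepper lives outside the database (environment variable,
# secrets manager, HSM). The variable name is hypothetical, and the fallback
# exists only so this demo runs; a real deployment should refuse to start without it.
PEPPER = os.environ.get("PASSWORD_PEPPER", "demo-pepper-do-not-use").encode("utf-8")

# Illustrative scrypt cost parameters: n=2**14, r=8 needs roughly 16 MiB per hash.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str, salt: bytes | None = None,
                  pepper: bytes = PEPPER) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these two values go into the database."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every stored password
    # Key the password with the pepper so the digest depends on a secret
    # that a database-only dump does not contain.
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    digest = hashlib.scrypt(peppered, salt=salt, n=SCRYPT_N, r=SCRYPT_R,
                            p=SCRYPT_P, dklen=32)
    return salt, digest


def verify_password(password: str, salt: bytes, stored: bytes,
                    pepper: bytes = PEPPER) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt, pepper)
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    salt_a, hash_a = hash_password("sunshine")
    salt_b, hash_b = hash_password("sunshine")
    print(hash_a != hash_b)                                   # same password, different salts
    print(verify_password("sunshine", salt_a, hash_a))        # right password, right pepper
    print(verify_password("sunshine", salt_a, hash_a, b"x"))  # right password, wrong pepper
```

Because every user gets a random salt, two users with the same password end up with different hashes, and each guess costs a full scrypt run instead of one fast hash.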
That is just one example, but the overall point is simple: plenty of real-world data leaks happen where proper hashing would have prevented actual damage and Dave's algorithm would not.
In Summary
It's all about defense in depth. It's easy to create a security measure that can protect against one particular kind of attack (Dave's algorithm is a slight improvement over MD5 for protecting against database-only leaks). However, that doesn't make a system secure. Many real-world breaches are quite complicated, taking advantage of weaknesses at multiple points in a system in order to finally do some real damage. Any security measure that starts with the assumption "This is the only attack vector I have to worry about" (which is what Dave did) is going to get things dangerously wrong.
This doesn't answer the question about Dave's protocol specifically, but I wanted to address the more general question, for the Daves around the world who are writing their own hashes. There are a few things, Daves, that you need to realize:
- You are not a cryptographer. That's not a slight against you; I'm not one, either. But even if you were a cryptographer, you'd have to be the best in the entire world to be certain that your algorithm had no flaws that could compromise security, because even the experts mess up a lot. Among other things, potential flaws in hashes include:
- Accidental reversibility. Maybe you didn't mean it, but you put too much information into the "hash", and now it can be trivially reversed, even without brute-force. For an example of a "complex" algorithm which is nevertheless pretty easy to reverse, look at linear congruential generators.
- Not enough complexity on CPUs, GPUs, ASICs, etc. Getting this right is surprisingly hard; there's a reason there are only, like, three libraries for password hashing, and they're all based on the same ideas. Unless you're intimately familiar with how GPUs and ASICs work, you're most likely going to build something that runs much faster on GPUs than on CPUs, instantly negating any other protections you have.
- Too much complexity where you're actually running it, combined with the last point. It's very easy to point to your performance testing and say, "Look, it takes me 30 seconds to do 30 hashes, that's great!" Except you're, again, not a cryptographer or GPU dev, so you don't realize that your complex additions and multiplications can be replicated quite easily on GPUs. An attacker can then crack 30 million hashes in 30 seconds, all the while DoSing your service by trying to log in more than once a second.
- Insufficient uniformity. A theoretically perfect password hash function's output is indistinguishable from a true random number generator's, when fed varying input. In practice, we can't quite get there, but we can get incredibly close. Your algorithm might not. And no, "looking" random does not mean it's actually close enough; if you're inexperienced enough to be writing your own secret crypto for "better" security, you're inexperienced enough not to know how to spot true randomness.
- Even if you build your algorithm entirely out of good, solid crypto primitives, you can still put them together wrong.
- You are not a cybersecurity programmer. There's probably a better term for that, but the point is, you haven't specialized in writing code that correctly implements algorithms, even ones of your own design. Here is a very brief list of possible attacks that could be visible from the database alone:
- Frequency analysis
- Known-plaintext
- Data remanence
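The "accidental reversibility" point can be made concrete with the linear congruential generator mentioned above. The toy "hash" below is hypothetical and belongs to nobody's real scheme: it runs a password of up to 4 bytes through 1,000 LCG steps using the well-known Numerical Recipes constants. The output looks scrambled, yet because the multiplier is odd it has an inverse modulo 2^32, so every step can be undone exactly, with no brute force at all:

```python
# HYPOTHETICAL toy "hash": 1,000 steps of x -> (A*x + C) mod M.
A, C, M = 1664525, 1013904223, 2**32  # Numerical Recipes LCG constants
ROUNDS = 1000


def toy_hash(password: bytes) -> int:
    if len(password) > 4:
        raise ValueError("toy example: password must fit in 32 bits")
    x = int.from_bytes(password, "big")
    for _ in range(ROUNDS):
        x = (A * x + C) % M
    return x


def reverse_toy_hash(h: int, length: int) -> bytes:
    # A is odd, hence coprime to 2**32, so it has a modular inverse (Python 3.8+).
    a_inv = pow(A, -1, M)
    x = h
    for _ in range(ROUNDS):
        x = (a_inv * (x - C)) % M  # undo one step exactly
    # length only restores any leading zero bytes of the original password
    return x.to_bytes(length, "big")


if __name__ == "__main__":
    h = toy_hash(b"pin7")
    print(h)                        # looks random
    print(reverse_toy_hash(h, 4))   # b'pin7'
```

Piling on more rounds does not help: each round is a bijection with a known inverse, so the whole chain is too.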
And all that is just thinking exclusively about offline attacks on databases, with the thinking done by a college student who isn't even majoring in cybersecurity. I guarantee I've missed quite a few things. I've completely skipped over all the other attack vectors, such as MITM attacks, malicious clients, etc. I've also omitted every error that could happen even if you used an off-the-shelf product; consider it an exercise for the reader to figure out how even good crypto can be used wrong. Finally, I've entirely omitted the common class of errors where the developer uses encryption where they should be using hashing, which I see occasionally.
So, in sum, Dave, whenever you think you've got the best idea for a secret, internal hash to use for your production code and it isn't to use a standard, off-the-shelf, public, thoroughly tested product, remember this:
You don't.
Just use bcrypt. (Or Argon2)
(As a side note, if you're just building an algorithm for fun and/or self-education, feel free to ignore all of this. Building your own algorithm to protect passwords in production is dangerous because you'll build a weak algorithm that offers little to no protection. Building your own algorithm to see if you can break it is an excellent way to pass the time, stimulate your mind, and maybe even learn some crypto.)
In the case of a breach of the database and not the source code, Dave might have made things better compared to plain SHA1. But...
The source code is likely to be leaked too, as Conor Mancone explains.
The homebrew might screw up the hash, making it even less safe than plain SHA1. God knows how Dave's strange contraption interacts with the internals of the hashing algorithm. If nothing else, Dave has created a little maintenance hell for those coming after him, and big messes are never good for security.
It gives a false sense of security. Had Dave not been so proud of his brilliant solution, he might have taken the time to read up on how to do password hashing properly. From the question, it is clear that Dave thinks what he has done is better than, say, bcrypt. It is not.
The little extra protection given by the homebrew algorithm could have been achieved with a pepper instead. That is better than a homebrew algorithm in every possible way.
So yes, a homebrew might be better than SHA1 under some very specific circumstances. Whether it is better on average is an open question, but the answer doesn't really matter. The point is that it is terrible compared to a real password hashing algorithm, and that is exactly what the home brewing stopped Dave from implementing.
Long story short, Dave fucked this up.