Secure Implementation of Password Database
The current general best practices for authentication are in the NIST SP 800-63-3 Digital Identity Guidelines, especially SP 800-63B, Authentication and Lifecycle Management.
These NIST standards are an easy read for developers, and besides telling you what to do, they also explain why you want to do certain things. (If you want more details than the NIST standards give, we're happy to help.)
That said, let's look over your current authentication system and address your concerns:
The user is prompted for username and password on registration page
The registration page?
This is probably mistyped and you meant a login page, but if there is a registration page that anyone on your network can access, then you need to explicitly define a separate authorization system. For example, anyone can register for an account on any online store, but this account doesn't give you access to the part of their site that lets admins change the prices on their products.
The server hashes the password using SHA256 to create hashes of equal-length
This step doesn't add any security or improve system performance in any meaningful way, especially since you're using bcrypt, which only uses the first 72 bytes of the password anyway.
It is unlikely, but this step could reduce entropy. The reduction in entropy is insignificant in the grand scheme of things, and requires users to already be using long and randomly generated passwords, but since this is an extra step that doesn't improve security, I suggest leaving it out.
A random salt is generated using a reasonable work-factor.
I'll admit that I was very confused seeing this step at first. If this were about car maintenance, it would make as much sense to me as "Added synthetic oil with a reasonable octane rating." However, in the comments it was revealed that the specific bcrypt library being used is the bcrypt library hosted by PyPI.
I can't find the source code (and am allergic to languages that use whitespace to delineate scope), but based on the documentation, it appears that the library's function call to generate the bcrypt's parameters is named bcrypt.gensalt(workfactor), and this method itself takes a work factor as its parameter... Extending the car maintenance metaphor, it would be as if there were a function named vehicle.refuel(viscosity), or vehicle.changeoil(octane).
A salt, in authentication and cryptography jargon, is a random value that is added to a plaintext, to make it infeasible to pre-calculate the output of a cryptographic hash function or key derivation function. The salt itself is not a secret; its only strength is that it's not known beforehand.
A salt's value is measured in how long it is, also called its entropy. (Entropy in cryptography is a nuanced topic, that depends on more than just length, but for any reasonable library that creates a salt, the longer the salt, the more entropy it has.)
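To make the role of the salt concrete, here is a minimal sketch. It uses PBKDF2 from Python's standard library as a stand-in for bcrypt (an assumption made only so the example runs without extra packages); the function name derive_key, the sample password, and the iteration count are all illustrative.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch a password with PBKDF2-HMAC-SHA256 (stdlib stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# Two users pick the same password, but each gets a fresh 16-byte random salt.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

key_alice = derive_key("correct horse", salt_alice)
key_bob = derive_key("correct horse", salt_bob)

# Different salts give unrelated outputs, so one precomputed table
# cannot cover both users.
print(key_alice != key_bob)

# The salt is stored in the clear; re-deriving with it reproduces the key.
print(derive_key("correct horse", salt_alice) == key_alice)
```

Nothing here depends on the salt being hidden. It only has to be unpredictable before the record is created.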
A work-factor, on the other hand, is jargon that is specific to key-stretching algorithms, such as this bcrypt KDF. The work factor defines how much work the CPU (or GPU, FPGA, or ASIC when you're the attacker) needs to do, and its main feature is that it takes longer to calculate the output, using steps that can't be bypassed, guessed, or ignored. This may seem counterintuitive; you want your application to run as fast as possible, right?
Well, the general risk is that your database of passwords will be leaked. Top retailers leak passwords all the time, and you don't have the millions of dollars to spend securing applications like top retailers do, so assume that your database of passwords might eventually be leaked, too.
The tradeoff here is, when logging in, your user will have to wait an extra second. When an attacker is cracking your passwords, they can only guess one password per second, per CPU that they put to the task. (Motivated attackers can take some shortcuts, but even 10 guesses per second is MUCH better than billions of guesses per second if you're just using single round SHA256 instead of a key-stretching algorithm.)
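A quick back-of-the-envelope calculation shows the size of that gap. The rates below are the rough figures from the paragraph above (one guess per second, ten per second with shortcuts, about a billion per second for single-round SHA256), and the keyspace of 8 lowercase letters is an assumption picked purely for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seconds_to_exhaust(keyspace: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate password once."""
    return keyspace / guesses_per_second

# Assumption: passwords made of exactly 8 lowercase letters.
keyspace = 26 ** 8

# Assumed rates: ~1e9/s for single-round SHA-256 on one device,
# 1/s for a stretched KDF, 10/s if the attacker finds shortcuts.
fast_hash = seconds_to_exhaust(keyspace, 1e9)
stretched = seconds_to_exhaust(keyspace, 1)
stretched_shortcut = seconds_to_exhaust(keyspace, 10)

print(f"single-round SHA-256: {fast_hash / 60:.1f} minutes")
print(f"1 guess/s:            {stretched / SECONDS_PER_YEAR:.0f} years")
print(f"10 guesses/s:         {stretched_shortcut / SECONDS_PER_YEAR:.0f} years")
```

Real attackers guess likely passwords first and run many devices in parallel, so treat these as orders of magnitude, not predictions.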
So, the salt and the work factor are both important parameters that have nothing to do with each other. I'm sorry that your first exposure to the concept was through a very poorly named function.
The hashed password is hashed again, this time with the salt. (The salt generation and hash uses python's bcrypt)
Minor nitpick, but this is because I'm being thorough: The results of bcrypt aren't a hash, even though many people call it a hash, and everyone here will know what you're talking about if you continue to call it a hash. The results are a stretched password or a derived key.
The username, work-factor, salt, and hash are stored in the password database.
If the result looks something like this, then you already have the work factor, salt, and derived key all together:
$2a$08$0SN/h83Gt1jZMR6924.Kd.HaK3MyTDt/W8FCjUOtbY3Pmres5rsma
The 2a is the algorithm (bcrypt with unicode support).
The 08 is the work factor.
The next 22 characters, 0SN/h83Gt1jZMR6924.Kd., are the salt.
And the rest is the derived key.
If you're getting something different from your bcrypt library, find a different library.
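If you want to see the pieces for yourself, the string can be pulled apart with plain string handling. This is only a sketch for inspection: the names BcryptRecord and parse_bcrypt are made up for this example, and you shouldn't need to do this in production since the library handles it.

```python
from typing import NamedTuple

class BcryptRecord(NamedTuple):
    version: str
    work_factor: int
    salt: str
    derived_key: str

def parse_bcrypt(record: str) -> BcryptRecord:
    """Split a bcrypt string into its four fields."""
    # Layout: $<version>$<cost>$<22-char salt><31-char derived key>
    empty, version, cost, payload = record.split("$")
    if empty != "" or len(payload) != 53:
        raise ValueError("not a bcrypt record")
    return BcryptRecord(version, int(cost), payload[:22], payload[22:])

record = parse_bcrypt("$2a$08$0SN/h83Gt1jZMR6924.Kd.HaK3MyTDt/W8FCjUOtbY3Pmres5rsma")
print(record)
```

The work factor is a base-2 exponent, so 08 means 2**8 = 256 rounds of the expensive key setup, and each increment doubles the cost.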
When a user attempts to log in, the entered password is hashed in the same way and compared to the record in the database. (The salt used for comparison is fetched from the password database using the entered username as the key.)
Great. However, the library that you're using is already doing the heavy lifting for you, so use your library to its fullest. The bcrypt.checkpw function will pull out the work factor, salt, and derived key for you. It will then run its KDF and compare the results.
Using this library function means that, if you decide to change the default work factor in the future, you don't have to have separate code to handle the older derived keys, as bcrypt.checkpw will be able to figure out the desired work factor on its own.
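The reason this works is that the stored record describes itself. Here is a sketch of the same idea using only the standard library. It is not bcrypt and not how the bcrypt library is implemented: PBKDF2 stands in for the KDF, and the record layout and the names hash_password and check_password are assumptions for this example.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int) -> str:
    """Return 'iterations$salt_hex$key_hex', keeping every parameter with the key."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${key.hex()}"

def check_password(password: str, stored: str) -> bool:
    """Re-derive with the parameters found in the record, then compare."""
    iterations, salt_hex, key_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(key_hex))

# One record made under an older, lower work factor and one under a newer one.
old_record = hash_password("hunter2", iterations=50_000)
new_record = hash_password("hunter2", iterations=200_000)

print(check_password("hunter2", old_record), check_password("hunter2", new_record))
print(check_password("wrong", new_record))
```

The same check_password call verifies both records, because the work factor travels with each record instead of living in the application's configuration.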
If the hashes match, a cookie is given to the user which grants them access to the restricted web pages.
This sounds great on the face of it... but make sure that the cookie itself doesn't contain the authorization. It should contain a randomly generated string (session ID) and when accessing the site, the server should look at that session ID and check its own resources to see if that session ID is properly authorized. Clean up the session IDs after a while, too.
If your cookie contains "is_authorized=true" instead of a session ID where the server double checks its own resources, then people can just make their own authorization cookies and will never need to authenticate.
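As a rough sketch of the session ID approach, assuming an in-memory dict as the server-side store and a 30-minute lifetime (both are placeholders, as are the names create_session and lookup_session):

```python
import secrets
import time
from typing import Dict, Optional, Tuple

SESSION_TTL_SECONDS = 30 * 60  # assumed lifetime; pick what suits the application

# Server-side store: session ID -> (username, expiry time).
# A dict stands in for whatever store the real server uses.
sessions: Dict[str, Tuple[str, float]] = {}

def create_session(username: str) -> str:
    """Call after a successful password check; returns the cookie value."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes from the OS CSPRNG
    sessions[session_id] = (username, time.time() + SESSION_TTL_SECONDS)
    return session_id

def lookup_session(session_id: str) -> Optional[str]:
    """Return the username for a live session, or None if unknown or expired."""
    entry = sessions.get(session_id)
    if entry is None:
        return None
    username, expires_at = entry
    if time.time() >= expires_at:
        del sessions[session_id]  # clean up stale sessions
        return None
    return username

sid = create_session("alice")
print(lookup_session(sid))
print(lookup_session("is_authorized=true"))
```

A forged cookie value is just an unknown key in the store, so it grants nothing. All authorization decisions stay on the server.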
Concerns:
Saving the hash, work-factor, and salt in the same database seems like bad practice.
The salt and work factor are not sensitive in any way, but are necessary to validate a password. At best, any attempt to hide the salt will result in security-through-obscurity, which isn't any sort of security at all. You might as well store the work factor and salt right alongside the derived key, to save yourself the headache, because splitting them up isn't going to prevent an attacker from getting to them.
Does this method of hashing (reasonably) ensure that no one can determine a users password, including myself?
As far as your responsibility as a web developer? Yes.
And thank you for asking if it reasonably ensures that no one can determine a password. They can still be guessed in offline attacks, but using a key stretching algorithm such as bcrypt will greatly hinder attackers, even in the worst-case scenario of an offline attack against a leaked database.
The only thing you can do better is to add 2FA and require that your users use password managers with truly random passwords of 24+ characters that are unique for every site.
Note that password resets should not depend only on this second factor. As per the NIST guidelines, a reset should depend on the same identifying factors used in setting up the account in the first place (and make as much noise as possible on as many user communication channels as your application knows about) and utilize at least one additional factor, if available.
For most web applications, this means an email to the account used during account registration (NOT text messages to a cell phone, or security questions by themselves, though these can be additional factors after the email is sent). For an internal business application, this means the user has to make a call to the sysops team/helpdesk (or, more realistically, they need to stop by the developer's desk).
Everything currently uses http. Is https necessary if all access is within the local network?
You have to trust your local network to some extent. However, it never hurts to add HTTPS. If you're concerned at all, add HTTPS. (The fact that you asked this question means that, yes, you are concerned. So yes, you should add HTTPS.)
Adding HTTPS is a good idea for non-security reasons as well, as web servers (and browsers) will use HTTP/2, if available, only over HTTPS connections. This can speed up your site noticeably. It doesn't apply in your case, as you're using a Python-script based web server, which is unlikely to have HTTP/2 capabilities, but it's a good habit to get in for reasons beyond "just" security. (Security is more than enough justification in my opinion, but there are managers who need additional convincing.)
Is using a cookie a good way to validate that a user has logged in successfully?
If that cookie contains a session ID rather than authentication information, that session ID is random, and the server verifies the session ID, then yes. This is standard practice for the vast majority of websites with any sort of user identification.
The salt isn't supposed to be secret. Its purpose is to be different for every password, so the hashed database can't be attacked with rainbow tables. The other parameters you mention aren't supposed to be secret either. So you can store all this information in the same database, which is how most applications do it. The database table that contains the hashes usually also contains the salts and all the other information about the hashing algorithm.
If you use a good method for hashing (use the appropriate library, don't try to implement a hashing algorithm yourself!), no one will be able to recover the passwords without brute-forcing them. That doesn't mean it will be impossible though! Weak passwords will remain weak passwords, so 12345 and admin will still be easy to guess of course.
HTTPS is important everywhere, every time, for everybody. Attackers aren't only out there, far away, in foreign countries. An attacker might be your colleague, an ex-colleague, a visitor, somebody who breaks in, or even an infected router or any other device. So get rid of HTTP whenever you can. Hopefully one day HTTP will become extinct.
Cookies are OK for session management, but there are several details you need to be careful about. For example, use HTTP-only cookies, make sure the session IDs contained in the cookies are truly random and impossible to guess, decide if you need to set an expiration time (should the user be logged out after some time of inactivity?), etc.
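For illustration, here is one way those cookie details could look with Python's standard http.cookies module. The cookie name session_id, the 30-minute Max-Age, and the helper name build_session_cookie are assumptions. The Secure and SameSite attributes are extras beyond the points listed above, and Secure only makes sense once the site is served over HTTPS.

```python
from http.cookies import SimpleCookie
import secrets

def build_session_cookie(session_id: str, max_age_seconds: int = 1800) -> str:
    """Return a Set-Cookie header value for a session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    morsel = cookie["session_id"]
    morsel["httponly"] = True            # not readable from page JavaScript
    morsel["secure"] = True              # only sent over HTTPS
    morsel["samesite"] = "Lax"           # withheld on most cross-site requests
    morsel["max-age"] = max_age_seconds  # browser drops it after this many seconds
    morsel["path"] = "/"
    return morsel.OutputString()

# Random, unguessable session ID from the OS CSPRNG.
header_value = build_session_cookie(secrets.token_urlsafe(32))
print("Set-Cookie:", header_value)
```

The browser-side expiry is only a convenience. The server should still track and enforce its own expiry for each session ID.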
I am not sure where you intend to place this login screen, but I assume that you will rewrite part of the servers to add a login page. I do not really understand which "server" written in Python you are referring to.
Anyway, do not reinvent how to store the passwords. There are methods for that and since you mention Python, a good starting point may be the Django documentation on that subject.
This will also take care of how to keep the session (through a cookie, for instance).
As a general overview about password storage, see this 2013 answer by Thomas Pornin. You may consider using Argon2 as the hash algorithm.
Finally, you may want to read OWASP's position on the matter, which mentions that (emphasis mine)
As with most areas of cryptography, there are a many different factors that need to be considered, but happily, the majority of modern languages and frameworks provide built-in functionality to help store passwords, which handles much of the complexity.