When encrypting cloud-stored data, is it acceptable to use the user's account password?
If you do the encryption and decryption on the server side, there is always a chance for an administrator to decrypt the data without the user's knowledge: the system can be modified so that when a user legitimately decrypts his or her data, the decryption key is captured and stored for later use by other interested parties, for instance in the service of warrants.
That said, the scheme you describe is generally along the lines of how systems like this are in fact built, and in normal use it is reasonably secure. There are a couple of caveats, or clarifications, however.
You would not use the password directly as a key. You would use a password-based key derivation function (PBKDF) such as bcrypt, scrypt, or PBKDF2 to turn the password into a strong key, using as high a work factor as makes sense for your application.
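For concreteness, here's a minimal sketch of that derivation using Python's standard-library `hashlib.pbkdf2_hmac`; the iteration count, salt size, and function name are illustrative assumptions, not a recommendation for your particular setup:

```python
# Minimal sketch: derive a key-encryption key (KEK) from a password with PBKDF2.
# The iteration count and salt size are illustrative; tune the work factor
# (or swap in scrypt/bcrypt/Argon2) to suit your hardware and latency budget.
import hashlib
import os

def derive_kek(password: str, salt: bytes | None = None,
               iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Return (kek, salt). Store the salt; never store the password or the KEK."""
    if salt is None:
        salt = os.urandom(16)  # fresh per-user random salt
    kek = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                              salt, iterations, dklen=32)
    return kek, salt
```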
You would generally not use this key to encrypt the data directly. You would generate a strong random key on the server that will be the actual data encryption key (DEK). The key derived from the password will then be used as a key-encryption-key (KEK) to encrypt the DEK. This way, if a user decides to change their password, you generate a new KEK, and you only have to re-encrypt the DEK, and not all of the data.
With this system, you're not actually required to store a password hash at all. When a user logs in, you merely need to derive the KEK, decrypt the DEK, and determine if it can correctly decrypt data for the user. If it can, the password is correct. If not, it isn't, and you can fail the authentication attempt. This may or may not be desirable depending on the application.
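A rough sketch of that DEK/KEK pattern, assuming the `cryptography` package's AES-GCM primitive and the `derive_kek()` helper from the sketch above (all names are illustrative). Because the mode is authenticated, a failed unwrap of the DEK doubles as the wrong-password check:

```python
# Sketch of the envelope (DEK/KEK) pattern. Assumes the `cryptography` package
# and the derive_kek() helper from the previous sketch; names are illustrative.
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def create_account_keys(password: str) -> dict:
    """Generate a random DEK and persist it only in KEK-wrapped form."""
    kek, salt = derive_kek(password)
    dek = AESGCM.generate_key(bit_length=256)     # the real data-encryption key
    nonce = os.urandom(12)
    wrapped_dek = AESGCM(kek).encrypt(nonce, dek, None)
    # Store salt, nonce and wrapped_dek; discard the KEK and the password.
    return {"salt": salt, "nonce": nonce, "wrapped_dek": wrapped_dek}

def unwrap_dek(password: str, record: dict) -> bytes | None:
    """Recover the DEK; an authentication failure means a wrong password."""
    kek, _ = derive_kek(password, salt=record["salt"])
    try:
        return AESGCM(kek).decrypt(record["nonce"], record["wrapped_dek"], None)
    except InvalidTag:
        return None   # wrong password (or tampered record): fail the login
```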
Using the password (or a key derived from it) directly to encrypt the data has (at least) two big flaws:
- users can't easily change their passwords
- if the data leaks, it's protected by a likely weak password
Better is to use a full-strength, randomly generated encryption key and encrypt that key with the user's password. Users can change their password easily (you only have to decrypt/re-encrypt one small thing), and if the encrypted data gets into the hands of attackers, it's protected by 128 or 256 bits of entropy.
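To make the first point concrete, here's how a password change looks in the sketch above: only the wrapped key is re-encrypted, never the bulk data. This assumes the illustrative `derive_kek()`/`unwrap_dek()` helpers from the earlier sketches:

```python
# Sketch: changing the password only re-wraps the DEK; the user's data is untouched.
# Builds on the derive_kek()/unwrap_dek() sketches above; all names are illustrative.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def change_password(old_password: str, new_password: str, record: dict) -> dict:
    dek = unwrap_dek(old_password, record)
    if dek is None:
        raise ValueError("old password incorrect")
    new_kek, new_salt = derive_kek(new_password)   # fresh salt, fresh KEK
    new_nonce = os.urandom(12)
    return {"salt": new_salt,
            "nonce": new_nonce,
            "wrapped_dek": AESGCM(new_kek).encrypt(new_nonce, dek, None)}
```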
You're right to be paying careful attention to client login/authentication/personal data systems, but your initial post only considers users' private encrypted data in terms of encryption techniques, not in terms of how it's then accessed and stored. I'd highlight especially the reliance on approaches likely to have a single point of failure in the pathway, or a general hope that your software/systems will be "well enough set up". Long term that's not a very safe assumption, although it's very commonly made "by default". Assume this instead:
- Almost any widely used software will regularly have new vulnerabilities discovered. Your chosen software/DBs/OSes/web-facing services too.
- Many businesses will be targeted. Yours too.
So "layering" security matters here as much as anywhere. That means don't make the encryption become a "bottleneck" able to take your users' data security with it. Don't trust just a one-layer system (of encryption or any other kind) to keep it secure. Work with the mental view that at some point your DB and the encrypt/decrypt pathway or certificates will turn vulnerable at the same time, allowing attackers or insiders to read stored private user data if there is no "further hurdle" standing in the way, and face that threat directly as well.
So here is a further mix of security measures worth designing in, on top of everything else. I've described them in terms of authentication/login for ease, but the same principles hold for all the client data you store, if it's sensitive:
Split the encrypted login data for each user account into two or more parts and separate them. For example, put the encrypted passwords or user data on one SQL DB server, but keep the individualised random salts needed to decrypt them on a second, independent SQL DB server. Or, once encrypted, store the individualised per-account decryption keys in two halves on two separate DB servers.
Basically it forces an attacker to break into and exfiltrate from two servers, not one, and to leave two audit trails, not one.
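One cheap way to do the two-halves split is an XOR split, sketched below; neither half on its own reveals anything about the key, and a scheme like Shamir secret sharing would generalise this to k-of-n shares if you want redundancy:

```python
# Sketch of a two-server key split: each share goes to a different DB server,
# and both are needed to reconstruct the per-account key.
import os

def split_key(key: bytes) -> tuple[bytes, bytes]:
    share_a = os.urandom(len(key))                        # store on DB server A
    share_b = bytes(x ^ y for x, y in zip(key, share_a))  # store on DB server B
    return share_a, share_b

def recombine(share_a: bytes, share_b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(share_a, share_b))
```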
Make sure that a person gaining access to one doesn't trivially also gain access to the other: use different DB engines/OS flavours to mitigate shared RDBMS/OS vulnerabilities, and perhaps different login credentials or sites if it's viable, to make it harder for a single individual to gain physical/logical access to both without flagging an alert.
The future "person gaining access" could as easily be an insider as an outsider, or pivot using other accessible devices on your networks or that you and your staff own (staff working at home for example), so make it hard for insiders to copy/skim/replicate the DB other than in controlled scripted ways.
Then look at access control for those DB servers. The public-facing web service servers, probably on the same LAN as most staff, are higher risk; the authentication/client RDBMS servers only need maintenance access by a very few staff and are dedicated to just storing the client data, so any abuse or exception stands out much more sharply against the lower "background chatter" in the logs. So the servers holding sensitive data might be secured by watching for unusual access patterns, and by sitting on better-hardened or specialist-hosted separate networks run by a reputable business (not yours, in case it's an insider). They might also be responsive to the outside only in a very limited way: for example, restricting the IPs they respond to, to your own public-facing web service servers plus a dedicated console used only for maintaining the client-login authentication DB server, or accepting only very limited, strictly formatted plaintext requests that are translated into DB calls locally (to prevent SQL injection or other common API abuses).
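As a sketch of that last point, a deliberately narrow front-end might accept exactly one strictly formatted request and translate it into a parameterised query locally; the request format, table, and column names here are hypothetical:

```python
# Sketch: accept only one rigidly formatted plaintext request and translate it
# into a parameterised DB call locally. Anything off-pattern is rejected.
# The request format, table and column names are hypothetical.
import re
import sqlite3

REQUEST = re.compile(r"^GET_WRAPPED_DEK ([0-9a-f]{32})$")   # the only accepted verb

def handle_request(line: str, db: sqlite3.Connection) -> bytes | None:
    m = REQUEST.match(line.strip())
    if not m:
        return None                      # reject (and log) anything else
    row = db.execute(
        "SELECT wrapped_dek FROM account_keys WHERE user_id = ?",  # parameterised
        (m.group(1),),
    ).fetchone()
    return row[0] if row else None
```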
Then look at the rest of the pathway in the same way. Where can an attacker sit in the wider system that lets them grab data or enhance their access in meaningful ways (or gain access to data that becomes meaningful once combined with other data they obtain)? Who actually has to be able to sit there, and can the inherent scope for harm be reasonably limited?
For example, very few people need access to a login/authentication server, or private keys/certs. Physical and logical access may only be needed by a handful of people/devices as well.
(If you can't do "end to end" encryption fully, perhaps you can still "black box" that side of things to an extent. For example, the servers doing the actual encrypt/decrypt are accessible only one way, the DB servers holding encryption keys/data-at-rest are accessible only another way, and each of these has very limited, well-defined access needs for maintenance, communicating only with specific secured non-internet IPs and (for user data processing) via a very limited data API. Your web service treats these as a black box and receives back encrypted data (end-to-end encryption from its perspective), and other than at these very few points, your other systems and users only ever see encrypted user data.)
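If it helps, here's a toy sketch of that kind of "black box": an internal service exposing only /encrypt and /decrypt, answering only allowlisted internal addresses, and never returning key material. The addresses, port, and single demo key are placeholders; a real service would look keys up per user/record:

```python
# Toy sketch of a "black box" crypto service: two verbs only, allowlisted peers
# only, keys never leave the box. Addresses and the demo key are placeholders.
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

ALLOWED_PEERS = {"10.0.1.5", "10.0.1.6"}        # hypothetical web-tier IPs
DEMO_DEK = AESGCM.generate_key(bit_length=256)  # placeholder for a per-user DEK lookup

class CryptoBox(BaseHTTPRequestHandler):
    def do_POST(self):
        if (self.client_address[0] not in ALLOWED_PEERS
                or self.path not in ("/encrypt", "/decrypt")):
            self.send_error(403)                # refuse (and log) anything else
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        aead = AESGCM(DEMO_DEK)
        try:
            if self.path == "/encrypt":
                nonce = os.urandom(12)
                out = nonce + aead.encrypt(nonce, body, None)
            else:
                out = aead.decrypt(body[:12], body[12:], None)
        except Exception:
            self.send_error(400)
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    # In practice, bind to an internal-only interface/network, not localhost.
    HTTPServer(("127.0.0.1", 8443), CryptoBox).serve_forever()
```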
Don't forget to pay a bit of attention to routine risks as well (backups, power/hardware/connectivity issues, sudden loss of key people).
Put these together and it starts to get a bit harder for things to go "so catastrophically wrong" as to cause serious business damage.
What I am hoping to convey is, don't just think of encryption as your "solve-it-all". Look at the pathways that would allow an insider or outsider to get user data either when it's stored/modified, or retrieved/used, or in transit. Consider what can be done to strip those pathways down to their minimum (less is easier to secure and to notice patterns). Consider what you can do to make it hard to get the data they want from any other point than a very few points, and how to secure those very few points. Consider how to better detect bulk downloads or slow exfiltration. Layer it, and don't trust single points of security failure any more than you would single points of hardware failure. It won't be perfect but it's an approach that will pay off.
This kind of thinking has the potential benefit that even if your DB server is penetrated or internal encryption/decryption credentials are accessed, useful user data in bulk still can't be obtained or exfiltrated so easily. It adds a hurdle, and conveniently it doesn't need to be a very expensive or complicated one. It's almost the principle of the thing.