Criteria for Selecting an HSM
Some technical factors that may be relevant:
- Performance - across whatever matters for your application (if any): encryption/decryption/key generation/signing, symmetric, asymmetric, EC, ...
- Scale:
  - Is there a limit to the number of keys it supports, and could that limit be a problem?
  - How easy is it to add another HSM when your application becomes more demanding (size, speed, geographic distribution...)?
- Redundancy - when one HSM breaks, how much does that impact your operations, and how easy is it to replace without loss of service?
- Backups - how easy are they to automate and restore? Do you need to independently protect the backup's confidentiality and/or integrity, or does the product ensure that? How likely are you to end up in a position where you've irrecoverably lost your data (how many factors need to be lost or forgotten, how many HSMs need to die, etc.)?
- API support:
  - MS CAPI/CNG (easy to program from a Windows environment);
  - JCA (easy to develop for using Java; what version is supported?);
  - PKCS#11 (and a recent version? Wide support across applications, though it comes with known security issues);
  - vendor proprietary (probably the most flexible/powerful/secure-if-you-know-what-you're-doing, but it increases the cost of moving to another vendor); whilst C is probably a given, does it have bindings for your preferred language?
  - a related note: is there guidance on integration with your application (e.g. DBMS, OS services)?
- OS / hardware support
- Management options - what GUI / command line tools are there for doing management tasks - i.e. anything that you do infrequently enough to not want to automate (key generation?; authentication factor management?). Do your admins need to be physically present to commission the device or perform additional tasks after commissioning?
- Programmability - most of your development will likely be on the other end of one of the APIs, but sometimes it is useful to be able to write applications that run on the device for greater flexibility or speed (see Thomas' answer)
- Physical security - how resistant to direct physical attack does your solution need to be (bearing in mind not just the HSM but the whole solution)? If for whatever reason you decide it is particularly important (your HSM is exposed but your clients aren't, or disclosure of the keys is far worse than merely being able to use the keys for nefarious purposes - ref DigiNotar?) then you might want to look for active tamper detection and response, not just passive tamper resistance and evidence.
- Logical security model - can malicious entities on the network abuse your HSM? Malicious processes on the host PC?
- Algorithms - does the HSM support the crypto you want to use (primitives, modes of operation and parameters e.g. curves, key sizes)?
- Authentication options - passwords; quorums; n-factors; smartcards; OTP; ... You should probably at least be looking for something that can require a configurable quorum size of token+password authenticated users before allowing operations using a key.
- Policy options - you might want to be able to define policies such as controlling whether: keys can be exported from the HSM (wrapped or unencrypted); a key can only be used for signing/encryption/decryption/...; authentication is required for signing but not verifying; etc.
- Audit capability - including both HSM-like operations (generated key X, signed something with key Y) and handling crashes (ref g3k's comment). How easy is it going to be to integrate the logs into something like Splunk (sane log format, syslog/SNMP/other network-accessible - or at least non-proprietary - output)?
- Form factor:
  - network attached (for larger scale deployments, particularly where multiple applications/servers/clients need to make use of the keys);
  - desktop (for individual use; performance, availability and scalability not a big concern but cost is, especially good if your solution requires lots of people needing direct access to an HSM);
  - PCI(-Express) (cheaper than network attached; more effort involved in making available to multiple applications);
  - USB token (easy server upgrade; cheap and slow (and easy to steal!));
  - PC card (as per desktop, but good for laptop users; PC cards are pretty dead now).
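As a rough illustration of the authentication and policy options listed above, here is a minimal Python sketch of how a per-key policy with a token+password quorum might be evaluated. All class, field and function names are hypothetical and do not correspond to any vendor's API; real devices express similar ideas through their own mechanisms (PKCS#11, for instance, uses key attributes such as `CKA_SIGN` and `CKA_EXTRACTABLE`).

```python
from dataclasses import dataclass

# Hypothetical policy model for illustration only; not any vendor's API.
@dataclass(frozen=True)
class KeyPolicy:
    allowed_ops: frozenset                            # e.g. {"sign", "verify"}
    exportable: bool = False                          # may the key leave the HSM (wrapped)?
    quorum: int = 2                                   # operators needed for protected ops
    auth_free_ops: frozenset = frozenset({"verify"})  # ops that need no login

@dataclass(frozen=True)
class Operator:
    name: str
    has_token: bool      # presented their smartcard/token
    password_ok: bool    # and entered the matching password

    def authenticated(self) -> bool:
        # Both factors are required (token + password).
        return self.has_token and self.password_ok

def authorize(policy: KeyPolicy, op: str, operators: list) -> bool:
    """Decide whether `op` may be performed with a key under `policy`."""
    if op == "export":
        if not policy.exportable:
            return False
    elif op not in policy.allowed_ops:
        return False
    if op in policy.auth_free_ops:
        return True
    # Count distinct operators who passed both factors.
    authed = {o.name for o in operators if o.authenticated()}
    return len(authed) >= policy.quorum

signing_key = KeyPolicy(allowed_ops=frozenset({"sign", "verify"}))
alice = Operator("alice", has_token=True, password_ok=True)
bob = Operator("bob", has_token=True, password_ok=True)
carol = Operator("carol", has_token=True, password_ok=False)

print(authorize(signing_key, "verify", []))             # True: no auth needed
print(authorize(signing_key, "sign", [alice, carol]))   # False: only 1 of 2 authenticated
print(authorize(signing_key, "sign", [alice, bob]))     # True: quorum of 2 met
print(authorize(signing_key, "decrypt", [alice, bob]))  # False: op not allowed
print(authorize(signing_key, "export", [alice, bob]))   # False: key not exportable
```

Keeping the quorum check separate from the list of permitted operations reflects the split above between authentication options (who must be present) and policy options (what the key may be used for).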
Some non-technical factors:
- Certifications - do you need any, or do you want any because they give you confidence in the product's security? Ignoring what you need for regulatory reasons:
  - FIPS 140-2 provides useful confirmation that the NIST-approved algorithms work and have run-time known answer tests (check the Security Policy to see which algorithms are approved), but don't put much stock in it as evidence that the product is otherwise secure. My rule of thumb is that Level 3 hardware security means people with only a couple of minutes' access to the device will be hard pressed to compromise it. FIPS 140-2 Level 3 is the de facto baseline certification for HSMs, so be wary if a product doesn't have it (though that's not to say you need to use it in a FIPS-compliant way).
  - Common Criteria evaluations are flexible in the assurance they provide: read the Security Target! There are no decent HSM Protection Profiles yet, so at the least you're going to have to read the Security Problem Definition (threats and assumptions) before you have an idea what the evaluation is providing.
  - PCI-HSM will be useful if you're in the relevant (payments) industry.
- Aside from certifications, how does the vendor look on security? Having CC EAL4 certs is a good starting point, but remember Win2k has those too... Do they make convincing noises about supply chain integrity, a Secure Software Development Lifecycle, ISO 2700x, or something like The Open Group's Trusted Technology Provider Framework?
- Do you like the vendor's policy on disclosure?
- Support (options, reputation, available in your language)
- Services - if you have a complex requirement, it might be advantageous to have the vendor involved in your configuration/programming.
- Documentation:
  - High level documentation - HSMs are complex general purpose products that can require somewhat involved management; good documentation is important to allow you to develop a secure and workable process around them (see Thomas' answer for more discussion).
  - API documentation - good coverage, preferably including good examples of common (and complex) tasks.
- Cost (units + maintenance)
- Lead time
- Vendor patch policy / frequency (+support for and ease of firmware upgrade)
- Country of design and/or manufacture - you might be a Government or company that particularly (dis)trusts certain countries
- Vendor stability - are they likely to be around to support the product for as long as you're going to be using it?
- What is the vendor's product roadmap, does it hold anything of value for you, and will you have access to the future versions via firmware upgrade?
- How good the swag was that you got off of them at RSA
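To make the FIPS 140-2 point about run-time known answer tests (KATs) concrete: a KAT runs an algorithm on a fixed input and compares the result with a published expected output, and the module only offers algorithms whose tests pass. Below is a minimal Python sketch using the well-known SHA-1 and SHA-256 test vectors for the message "abc"; the function names are hypothetical, and a real module runs such checks in firmware at power-up rather than in host code.

```python
import hashlib

# Published test vectors for the message "abc" (FIPS 180 examples).
KNOWN_ANSWERS = {
    "sha1": "a9993e364706816aba3e25717850c26c9cd0d89d",
    "sha256": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

def self_test() -> dict:
    """Run each known answer test; return {algorithm: passed}."""
    results = {}
    for alg, expected in KNOWN_ANSWERS.items():
        actual = hashlib.new(alg, b"abc").hexdigest()
        results[alg] = (actual == expected)
    return results

def enabled_algorithms() -> list:
    # Only algorithms whose KAT passed are made available to callers.
    return sorted(alg for alg, ok in self_test().items() if ok)

print(enabled_algorithms())  # ['sha1', 'sha256']
```

A passing KAT shows the implementation computes the right answer; as noted above, it says nothing about whether the rest of the product is secure.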
There are probably many more.
The SANS Institute has a good introductory paper describing why you might want an HSM, the positive attributes it (should) have and some of the downsides.
It seems an HSM vendor agrees with most of this list, and produced their own (unattributed) version of it.
An HSM will not avoid complexity; rather, it will add quite a lot of complexity to the whole system.
What HSMs do best is key storage: the key is in the HSM and never leaves it. However, you still have to worry about the key life cycle. With a "software" key, stored in a file or in the entrails of the operating system, backups are a vulnerability (you don't want many copies of the key floating around). With an HSM, this vulnerability is avoided, but backups become a major headache: losing the key is also a major risk, especially for encryption (if you lose the encryption key, you lose the data). So the first thing to look at in an HSM is its backup procedures. I have some experience with Thales (nCipher) HSMs, which do it like this: the keys are actually stored as encrypted files (which can be saved just like any other file), and the key that decrypts them can be rebuilt, within a new HSM, from a quorum of administrator smart cards.
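The general idea of rebuilding a key from a quorum of administrator cards can be illustrated with Shamir's secret sharing, a standard k-of-n threshold scheme: any k shares recover the secret, while k-1 shares reveal nothing about it. This is a minimal Python sketch of that generic technique only, not a description of how Thales/nCipher actually implement their card sets; the prime and the `split`/`combine` names are choices made for this example.

```python
import secrets

# Mersenne prime 2^521 - 1: larger than any 256-bit secret; all math is mod P.
P = 2**521 - 1

def split(secret: int, k: int, n: int) -> list:
    """Split `secret` into n shares, any k of which recover it."""
    if not (0 <= secret < P and 1 <= k <= n):
        raise ValueError("bad parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]

    def f(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % P
        return y

    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares: list) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

# Example: a 256-bit key protected by a 3-of-5 card set.
key = secrets.randbits(256)
cards = split(key, k=3, n=5)
assert combine(cards[:3]) == key                    # any three cards work
assert combine([cards[0], cards[2], cards[4]]) == key
# Two cards yield an unrelated value (with overwhelming probability).
```

Choosing k and n is exactly the backup trade-off discussed above: up to n - k cards can be lost without losing the key, but any k colluding (or stolen) cards can rebuild it.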
HSMs rarely do bulk symmetric encryption. It does not make much sense, actually, to do symmetric encryption with an HSM: you use encryption because the data is confidential. Logically, if the need for secrecy is such that the symmetric key must not leave the HSM, then the data itself should not leave it either. Also, symmetric encryption means that encryption and decryption use the same key: if that key is in the HSM, then both encryption and decryption have to go through it.
HSMs are better used with hybrid encryption: the HSM stores and uses the private key of an asymmetric encryption system; when data is to be encrypted, whoever has the data generates a random symmetric key K, encrypts the data with K, and encrypts K with the public key corresponding to the HSM-stored private key. To decrypt, only the encrypted K goes to the HSM, which recovers K; the bulk decryption happens outside. In that sense, HSMs operate as (oversized, overpriced) smart cards.
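The hybrid pattern can be sketched in Python with only the standard library. To stay self-contained, the sketch uses deliberately insecure stand-ins: textbook (unpadded) RSA over two small Mersenne primes for the HSM's key pair, and a SHA-256 counter-mode keystream with no authentication as the symmetric cipher. The `ToyHSM` class and helper names are hypothetical; the point is only that the private exponent stays inside the object and just the short wrapped key crosses the boundary. A real system would use a vetted library with proper padding and an authenticated cipher (e.g. RSA-OAEP and AES-GCM).

```python
import hashlib
import secrets

# Toy RSA parameters: two Mersenne primes. Far too small and unpadded;
# for illustration only.
P, Q = 2**89 - 1, 2**127 - 1
N = P * Q
E = 65537

class ToyHSM:
    """Holds the private exponent; exposes only unwrap()."""

    def __init__(self):
        self._d = pow(E, -1, (P - 1) * (Q - 1))  # never leaves the object

    def unwrap(self, wrapped_key: int) -> bytes:
        k = pow(wrapped_key, self._d, N)
        return k.to_bytes(16, "big")

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ s for b, s in zip(data[i:i + 32], block))
    return bytes(out)

def encrypt(public_n: int, public_e: int, data: bytes):
    """Anyone with the public key: random K, encrypt data with K, wrap K."""
    k = secrets.token_bytes(16)
    wrapped = pow(int.from_bytes(k, "big"), public_e, public_n)
    return wrapped, keystream_xor(k, data)

def decrypt(hsm: ToyHSM, wrapped: int, ciphertext: bytes) -> bytes:
    """Only the wrapped key goes to the HSM; bulk work stays outside."""
    k = hsm.unwrap(wrapped)
    return keystream_xor(k, ciphertext)

hsm = ToyHSM()
wrapped, ct = encrypt(N, E, b"attack at dawn")
assert decrypt(hsm, wrapped, ct) == b"attack at dawn"
```

In this design the device performs one private-key operation per message regardless of message size, so its asymmetric throughput, not its bulk cipher speed, is what limits the system.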
Of course, there is another extreme, in which you fit your entire application within the HSM. This requires a programmable HSM, and that's a completely different context. Thales HSMs allow that as an option (it's called "CodeSafe" and "SEE"), which they don't give away for free... and don't expect to run traditional code on it. HSMs have crypto accelerators, but they are otherwise fairly limited embedded systems (think a 60 MHz ARM CPU at best: HSM shielding is at odds with heat dissipation). You can fit relatively complex code in an HSM (one that allows it), but it is a specific programming effort. Also, some HSMs don't allow it at all.
Though HSMs are expensive, the biggest cost of an HSM is operations: they entail a lot of procedures for installation, configuration, operation, restoration and retirement. You will need people. My main criterion would therefore be: procedures. A good HSM will come with a detailed usage manual which describes how things should be done. It's not the hardware that matters, but how you use it.
Certifications, like EAL 4+ or FIPS 140-2 Level 3, may be required for regulatory purposes. You rarely choose whether you need them or not; that's a requirement of the intended usage context. Obtaining such a certification is a very long and expensive process, so you won't do it yourself. On the other hand, you might want to broaden your shopping area: if HSMs are mainly big smart cards, smart cards might be usable in lieu of an HSM. A 20 EUR smart card can be FIPS 140-2 Level 3; it will compute only one RSA-2048 decryption per second instead of 500, but that may be sufficient for you.
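To put those throughput figures in perspective, a back-of-envelope calculation over one day (86,400 seconds), assuming the device is kept busy continuously:

```latex
\begin{aligned}
\text{smart card: } & 1~\text{op/s} \times 86\,400~\text{s} = 86\,400~\text{RSA-2048 decryptions/day} \\
\text{HSM: } & 500~\text{op/s} \times 86\,400~\text{s} = 43\,200\,000~\text{RSA-2048 decryptions/day}
\end{aligned}
```

A card handling about 3,600 decryptions per hour is plenty if your peak load is a few hundred private-key operations per hour, but not if you need several per second sustained.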