How are passwords stolen from companies if they only store hashes?

There are two common failings, over and above letting the databases or files get stolen in the first place.

Unfortunately, and against all security recommendations, many systems still store plain text passwords.

Hashed passwords are technically not reversible, but as others have pointed out, an attacker can hash millions of password guesses and then simply look for matches. In practice, tables of precomputed passwords and hashes (rainbow tables) are widely available and are used to look up matches. A good rainbow table can match a high percentage of hashes in a fraction of a second per hash.
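As a rough illustration of the lookup step, here is a minimal sketch assuming unsalted SHA-256 and a tiny made-up candidate list. The names `CANDIDATES`, `LOOKUP` and `crack` are hypothetical. A true rainbow table stores hash chains to trade computation for storage instead of keeping every pair, so this plain dictionary only shows the matching idea.

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Unsalted SHA-256 of a password, hex encoded (insecure, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical candidate list; real lists contain millions of entries.
CANDIDATES = ["123456", "password", "letmein", "qwerty"]

# Precompute hash -> password once; the table is reusable against any unsalted leak.
LOOKUP = {sha256_hex(p): p for p in CANDIDATES}

def crack(stolen_hashes):
    """Return {hash: recovered password} for every stolen hash present in the table."""
    return {h: LOOKUP[h] for h in stolen_hashes if h in LOOKUP}

# Simulated leak: two weak passwords and one that is not in the candidate list.
leak = [sha256_hex("letmein"), sha256_hex("qwerty"), sha256_hex("c0rrect-h0rse-9f2!")]
recovered = crack(leak)
print(recovered)
```

Only the two weak passwords are recovered; the third hash is simply not in the table.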

Using a salt (an extra non-secret extension of the password) in the hash prevents the use of pre-computed rainbow tables.

Most attackers depend on rainbow tables. Computing their own hash set is certainly possible, but it's extremely time-consuming (months or longer), so it's generally the plain unsalted hash that's vulnerable.

Using a salt stops rainbow tables, and a high round count (hashing the hash of the hash, many times over) can stretch a brute-force attack from months to years or longer. Most institutions simply don't implement this level of security.
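A minimal sketch of salting plus a high round count, using PBKDF2 from Python's standard library as one possible way to do the repeated hashing. The iteration count and the helper names `hash_password` and `verify_password` are assumptions for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed round count; tune it to your own hardware

def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# The same password stored twice gets two different salts and two different digests,
# so a precomputed table of unsalted hashes does not help.
salt_a, hash_a = hash_password("letmein")
salt_b, hash_b = hash_password("letmein")
print(hash_a != hash_b, verify_password("letmein", salt_a, hash_a))
```

The salt is stored next to the digest; it does not need to be secret.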

When you hear that passwords got stolen, sometimes companies will report it even if only hashed passwords were stolen. This is so you can take action in case the hashes are cracked. Unfortunately, there are still companies that store their passwords incorrectly. For example, if you search for the RockYou password breach, you'll find that they were storing their passwords in clear text, which means the passwords were compromised as soon as they were stolen. In other cases, such as the Adobe password breach, the passwords were encrypted rather than hashed, and the encryption was mishandled. Other times, companies hash their passwords but use insecure hashing algorithms or don't salt them properly. In short, if a company follows recommended password storage methods, the passwords should in theory be safe in their hashed form, but a good company will still inform its customers of the breach. However, there are plenty of examples where companies did not store passwords correctly, which led to them being cracked quite quickly.

You hash a large number of potential passwords*, then check whether each output matches any hashes from the stolen password database. Brute force cracking is feasible because people do not usually choose highly unpredictable passwords.

When a password database is stolen, the stolen material includes all the information necessary to do offline cracking. (It's simply a guess and check process. Other methods may be available with less secure hashing or password storage methods.)

* If salts are used, then the cracker must consider those too. If each account uses a unique salt, crackers can't simply target everyone by hashing every candidate password once. If multiple accounts are being targeted, each candidate password has to be hashed once per salt. If password hashes are unsalted or all use the same salt, untargeted attacks are a lot easier: you only need to hash a candidate password once to find every user who had that password. Salts also defeat attempts to save cracking effort by precomputing hashes. Salts DO NOT reduce the number of hashes that need to be evaluated if only one account is being targeted. Those nuances aside, the basis of password cracking remains a guess-and-check process.
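The effect of salts on the attacker's workload can be sketched by counting hash evaluations. This is a toy model assuming a single round of SHA-256 over salt plus password; the account names, salts and helper functions are all made up for the example, and the salts are fixed strings only to keep the output reproducible.

```python
import hashlib

def salted_hash(password: str, salt: bytes) -> bytes:
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def make_accounts(passwords, salt_for):
    """Build {user: (salt, stored hash)} using salt_for(user) to pick each salt."""
    return {u: (salt_for(u), salted_hash(pw, salt_for(u))) for u, pw in passwords.items()}

def crack_all(accounts, candidates):
    """Try every candidate against every account; return (recovered, hash evaluations)."""
    cache = {}        # one entry per distinct (salt, candidate) pair
    evaluations = 0
    recovered = {}
    for user, (salt, stored) in accounts.items():
        for candidate in candidates:
            key = (salt, candidate)
            if key not in cache:  # a shared salt lets earlier work be reused
                cache[key] = salted_hash(candidate, salt)
                evaluations += 1
            if cache[key] == stored:
                recovered[user] = candidate
    return recovered, evaluations

passwords = {"alice": "letmein", "bob": "password", "carol": "Xq7#vT2p!mZr"}
candidates = ["123456", "password", "letmein"]

shared = make_accounts(passwords, lambda user: b"shared-salt")
unique = make_accounts(passwords, lambda user: b"salt-" + user.encode("utf-8"))

shared_found, shared_evals = crack_all(shared, candidates)
unique_found, unique_evals = crack_all(unique, candidates)
print(shared_evals, unique_evals)  # 3 evaluations with a shared salt, 9 with unique salts
```

The same weak passwords fall in both cases; unique salts only multiply the cost of attacking many accounts at once.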

Hashing passwords with a preimage-resistant function, given a sufficiently unpredictable input (an inhumanly strong password), is enough to make it practically impossible to recover the password.

However, most people don't choose such passwords in the real world, so a stolen database of hashes is potentially as worrying as a list of unhashed passwords for a large subset of users on a typical website.

If the password cracker finds a candidate password whose hash matches the one stored in the database, then they will have recovered the original (weak) password.

Alternatively, if a hash function is not preimage resistant (including when the output of the hash is too short) a guess-and-check procedure may produce false positives. (Alternative passwords not identical to the original.)

The accounts of users from the company with the data breach are still vulnerable because these passwords will unlock a user's account, even if they aren't identical to the original password. (The server has no way to tell if it's the original password. The hash still matches the one in the stolen database in this case.)
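The false-positive case can be demonstrated with a deliberately weak hash. The sketch below assumes a made-up scheme that keeps only the first 16 bits of SHA-256; `weak_hash` and `find_alternative` are hypothetical names. With so few output bits, a search over a few tens of thousands of guesses is expected to find a different password that the server would accept.

```python
import hashlib
from itertools import count

def weak_hash(password: str) -> bytes:
    """Deliberately weak: only the first 2 bytes (16 bits) of SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).digest()[:2]

original = "correct horse battery staple"
stored = weak_hash(original)  # what the leaked database would contain

def find_alternative(stored_hash: bytes, original_password: str) -> str:
    """Guess-and-check until some other string produces the same short hash."""
    for i in count():
        guess = f"guess{i}"
        if guess != original_password and weak_hash(guess) == stored_hash:
            return guess

alternative = find_alternative(stored, original)
# The server only compares hashes, so it cannot tell the two passwords apart.
print(alternative != original, weak_hash(alternative) == stored)
```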

Don't intentionally use an insecure hash function, of course. Even then, it's still possible to infer the original password or narrow down the number of possibilities, which would still make users who reuse passwords on other websites extra vulnerable.

There are other ways passwords can get stolen which don't stem from a copy of a database of password hashes getting leaked. Plaintext login information could be logged. (By observing unencrypted/decrypted network traffic, by hacking and rewriting server code, or utilizing client side bugs, for example.) Then that log can be exfiltrated.

And of course some companies might not have been using secure password verification schemes in the first place or may leak the plaintext through a bug [1, 2].

Despite the alternative explanations for some instances of mass password breaches, recovering plaintext passwords from password hashes is common and effective. It is not so effective, however, that 100% of the hashes in every large database leak will be recovered.

If passwords are processed with a cryptographic hash function, then users with extremely strong passwords do not need to be as worried as typical users. (But most people overestimate password strength and their own cleverness.) After spending significant resources to crack 99% of the hashes, it probably isn't worthwhile or practical to crack the last 1%. But strong passwords are no good if passwords aren't hashed.

Developers should use a password stretching algorithm. These algorithms simply make password hashing more expensive, for both legitimate users and password crackers. Argon2, the winner of the 2015 Password Hashing Competition, is currently the best password stretching algorithm, especially on Intel/ARM CPUs. Argon2 specifically can go a very long way toward reducing the fraction of hashes that get cracked. (Weak passwords will still be crackable.)
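To give a feel for what stretching costs, here is a rough sketch. Argon2 is not part of Python's standard library, so this uses `hashlib.scrypt`, a different memory-hard function, purely as a stand-in. The parameters (`n`, `r`, `p`, `maxmem`) are illustrative assumptions, and `hashlib.scrypt` requires a Python build linked against a recent OpenSSL.

```python
import hashlib
import os
import time

def fast_hash(password: bytes, salt: bytes) -> bytes:
    """A single round of SHA-256: cheap for users, and just as cheap for crackers."""
    return hashlib.sha256(salt + password).digest()

def stretched_hash(password: bytes, salt: bytes) -> bytes:
    """scrypt with assumed parameters; needs roughly 16 MiB of memory per guess."""
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

def time_once(fn, password: bytes, salt: bytes) -> float:
    """Wall-clock seconds for one hash evaluation."""
    start = time.perf_counter()
    fn(password, salt)
    return time.perf_counter() - start

salt = os.urandom(16)
fast_seconds = time_once(fast_hash, b"letmein", salt)
slow_seconds = time_once(stretched_hash, b"letmein", salt)
print(f"sha256: {fast_seconds:.6f}s  scrypt: {slow_seconds:.6f}s")
```

Every guess an attacker makes pays the same per-hash cost, which is why stretching shrinks the fraction of hashes that can be cracked within a given budget.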