Is Torrent safe for sharing legitimate files (file verification)? Does it use SHA1 or SHA256?
BitTorrent uses a method called chunking, in which files are divided into pieces, typically 64 KB to 2 MB each. Each piece is hashed, and the hashes (along with the piece size) are stored in the torrent's metadata (the small .torrent file, or the metadata you receive via DHT). That, along with the info_hash (the SHA-1 hash of the metadata's info dictionary, which identifies the torrent), makes BitTorrent quite resistant to intentional tampering (poisoning). SHA-1 is used both for the info_hash and to verify the chunks.
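To illustrate how the info_hash ties the piece hashes together, here is a minimal sketch in Python. It assumes a single-file torrent in the original (v1) format. The helper names bencode, build_info and info_hash are made up for this example, and the bencoder only handles the few types needed here. Treat it as an illustration, not a full client implementation.

```python
import hashlib


def bencode(value):
    """Minimal bencoder (illustrative): ints, bytes/str, lists, dicts."""
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = [(k.encode() if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_info(data, name, piece_length):
    """Build a single-file 'info' dictionary (hypothetical helper)."""
    # 'pieces' is the concatenation of the 20-byte SHA-1 digest of every piece.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


def info_hash(info):
    """SHA-1 over the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


info = build_info(b"hello world" * 1000, "demo.bin", 4096)
print(info_hash(info))
```

Because the piece hashes are part of the data that gets hashed, changing even one byte of the file changes a piece hash and therefore the info_hash, so a tampered torrent is effectively a different torrent.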
Researchers at the University of Southern California have conducted a study on the subject:
We discover that BitTorrent is most resistant to content poisoning.
...
Because the index file is distributed outside of the P2P file-sharing system, each chunk can be verified with a reliable hash contained in the metadata. This verification provides BitTorrent protocol with high resistance to content poisoning.
Short answer
For the most part, yes, but there are some theoretical concerns.
Long answer
BitTorrent divides the file into "pieces". The torrent file contains a list with a SHA1 hash of each piece. Data that does not match the hash in the torrent file will be discarded, so each piece of the file you end up with will have the same SHA1 hash as the corresponding piece of the file used to create the torrent.
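The check described above can be sketched roughly as follows. This is a simplified illustration rather than real client code: the 256 KiB piece size and the function names piece_hashes and accept_piece are assumptions chosen for the example.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size for this example


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Return the SHA-1 digest of each piece, as the torrent creator would."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def accept_piece(index, payload, expected_hashes):
    """Keep a downloaded piece only if its SHA-1 matches the torrent's hash."""
    return hashlib.sha1(payload).digest() == expected_hashes[index]


# The "good" file used to create the torrent (1 MiB of non-repeating data).
original = b"".join(i.to_bytes(4, "big") for i in range(262144))
hashes = piece_hashes(original)

good = original[:PIECE_LENGTH]
evil = bytes([good[0] ^ 0xFF]) + good[1:]  # same piece with one byte flipped

print(accept_piece(0, good, hashes))  # the genuine piece passes the check
print(accept_piece(0, evil, hashes))  # the modified piece is discarded
```

A peer that sends modified data is caught at the piece level, so the only way to slip "evil" data past this check is to produce different data with the same SHA-1 hash.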
So the question becomes whether SHA1 is secure enough to protect against the substitution of "evil" piece data in place of "good" piece data. There are two scenarios to consider:
The attacker has no control over the "good" piece data. In this case, the attacker would need a preimage attack on SHA1.
The attacker has the ability to both predict and exert some control over the "good" piece data. For example, they may be asked to provide a new version of a file to be included in a new version of the torrent with either no other modifications from a previous version or modifications that the attacker can predict. In this case, they can likely make do with some form of collision attack.
A preimage attack on SHA1 seems unlikely in the foreseeable future. A collision attack seems far more likely. A "plain" collision attack is likely of limited utility outside of contrived scenarios, but a "distinct chosen prefix" collision attack (which is harder than a plain collision attack but much easier than a preimage attack) is far more powerful, as it allows the "good" data to look pretty innocuous (the attacker only has to hide a small block of random-looking data in it somewhere) while the bad data could be arbitrarily bad.
Is a collision attack a major threat? That depends. If the files from which the torrent is made are being collected together by hand by a trusted party, it's pretty unlikely: it would be very difficult for an attacker to predict exactly what the person assembling the collection is going to include before submitting their own file. On the other hand, if automation is involved in the creation of the torrents and the attacker can influence the automation in a fairly direct manner, the risk from a collision attack is much higher.