Scalability of 'sort -u' for gigantic files
The `sort` that you find on Linux comes from the coreutils package and implements an external R-way merge sort. It splits the data into chunks that fit in memory, sorts them, stores them on disk, and then merges them. The chunk sorting is done in parallel if the machine has the processors for it.
So if there is a limit, it is the free disk space that `sort` can use to store the temporary files it has to merge, combined with the space needed for the result.
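As a sketch, the relevant resources can be steered explicitly; the flags below are documented GNU coreutils `sort` options, while the file names and sizes are placeholders for illustration:

```shell
# Create a small sample input with duplicate lines (stand-in for a huge file).
printf 'banana\napple\nbanana\ncherry\napple\n' > input.txt

# Resource controls for the external merge sort:
#   -S 1M         cap the in-memory buffer; chunks beyond it spill to temp files
#   -T DIR        directory for the temporary chunk files (must have free space)
#   --parallel=2  sort chunks with up to 2 threads
#   -u            drop duplicate lines during the merge phase
sort -S 1M -T "${TMPDIR:-/tmp}" --parallel=2 -u input.txt -o output.txt

cat output.txt   # prints: apple, banana, cherry (one per line)
```

Pointing `-T` at a filesystem with enough free space is the practical fix when a gigantic `sort -u` fails with "No space left on device" even though the input itself fits elsewhere.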