Sort --parallel isn't parallelizing
sort doesn't create a thread unless it needs to, and for small files the overhead just isn't worth it. Unfortunately, sort treats a pipe like a small file. To feed enough data to 24 threads, you need to tell sort to use a large internal buffer with -S (sort does that automatically when given large files directly). This is something we should improve upstream (at least in the documentation). So you'll want something like:
(export LC_ALL=C; grep -E <files> | sort -S1G --parallel=24 -u | wc -m)
Note that I've set LC_ALL=C for all of the processes, since they'll all benefit with this data.
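As a rough sanity check, here is a minimal sketch that confirms the explicit buffer and thread settings only change how sort works, not what it outputs. It assumes GNU coreutils (for -S and --parallel) and awk. The gen function, the 64M buffer and the thread count of 4 are made-up values for illustration, not tuned recommendations.

```shell
# Sketch only: assumes GNU sort (for -S and --parallel), plus seq and awk.
export LC_ALL=C

# Hypothetical input generator: 200000 lines containing many duplicates.
gen() { seq 1 200000 | awk '{ print ($1 * 7919) % 50000 }'; }

# Piped input, letting sort pick its own (small) buffer.
default_count=$(gen | sort -u | wc -l)

# Piped input with an explicit buffer size and thread count.
tuned_count=$(gen | sort -S64M --parallel=4 -u | wc -l)

echo "default=$default_count tuned=$tuned_count"
```

Both counts should match. To see any difference in speed you'd need far more data than this, and you'd time each pipeline separately.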
BTW you can monitor the sort threads with something like:
watch -n.1 ps -C sort -L -o pcpu