Performance of recursion vs accumulator style
The answer will depend on the details of the Racket system. Here's my take on it.
There are two major differences between the naturally recursive version and the accumulator version. First, the accumulator version is written in a fashion that allows tail-call optimization. This helps make the accumulator version faster, as fewer stack frames need to be created. But that is the opposite of what is discussed in HtDP and that you've seen on your work computer.
The other difference is the order of multiplication. The naturally recursive version multiples the numbers from 1 to 20 in ascending order, that is
((((1 * 2) * 3) * … * 19) * 20)
The accumulator version multiplies the same numbers in descending order, that is
((((20 * 19) * 18) * … * 2) * 1)
Mathematically, these are the same, and the two factorial functions will give the same result. Nonetheless, this difference can matter. In particular, at any intermediate multiplication, the intermediate result will be larger for the latter calculation than for the former calculation.
The factorial of 20 is a big number. It won't fit in a 32 bit integer. That means that racket will need to use an arbitrary length integer (a "bignum") to represent the answer, and some of the intermediate results. Arbitrary precision arithmetic, including multiplication involving bignums, is slower than fixed precision arithmetic.
Since the intermediate results in the accumulator version are always larger than for the naturally recursive version, the accumulator version will require a bignum earlier than the recursive version. In short, while both versions require the same number of multiplications, the accumulator version requires more arbitrary precision multiplications. This makes the accumulator version slower. Apparently, the additional cost of the arithmetic outweighs the saving from reducing the number of stack frames.
So why wouldn't the same trend show up on your home computer? You said it was an Intel iMac, so it is probably a 64 bit system. While 20! is a big number, it is small compared to what will fit in a 64 bit integer, so your home computer isn't doing any arbitrary precision arithmetic and the order doesn't matter. HtDP is old enough that it would have used a 32 bit system, as would Windows XP on your work computer.
More useful to explore the differences would be a function that calculates the product of a list of numbers, either
(define (product numlist)
(* (car numlist) (product (cdr numlist)))
or an accumulator version. You could then feed in the numbers in either ascending or descending order, independent of whether you're using a naturally recursive or accumulator-based approach.
I don't know the inners of the Racket compiler, but I'll speculate.
Tail calls are generally more expensive than normal calls (this is true in .NET, up to 7 times slower), but in some cases the tail call can be eliminated and it ends up being a while(1) { ... }
C-style loop, hence there will be no extra calls to make, just a simply local jump, effectively eliminating procedure application overhead.