Why do stacks typically grow downwards?
One possible reason might be that it simplifies alignment. If you place a local variable on the stack which must be placed on a 4-byte boundary, you can simply subtract the size of the object from the stack pointer, and then zero out the two lower bits to get a properly aligned address. If the stack grows upwards, ensuring alignment becomes a bit trickier.
Stanley Mazor (4004 and 8080 architect) explains how stack growth direction was chosen for 8080 (and eventually for 8086) in "Intel Microprocessors: 8008 to 8086":
The stack pointer was chosen to run "downhill" (with the stack advancing toward lower memory) to simplify indexing into the stack from the user's program (positive indexing) and to simplify displaying the contents of the stack from a front panel.
As to the historic rationale, I can't say for certain (because I didn't design them). My thoughts on the matter are that early CPUs got their original program counter set to 0 and it was a natural desire to start the stack at the other end and grow downwards, since their code naturally grows upward.
As an aside, note that this setting of the program counter to 0 on reset is not the case for all early CPUs. For example, the Motorola 6809 would fetch the program counter from addresses
0xfffe/f
so you could start running at an arbitrary location, depending on what was supplied at that address (usually, but by no means limited to, ROM).
One of the first things some historical systems would do would be to scan memory from the top until it found a location that would read back the same value written, so that it would know the actual RAM installed (e.g., a z80 with 64K address space didn't necessarily have 64K or RAM, in fact 64K would have been massive in my early days). Once it found the top actual address, it would set the stack pointer appropriately and could then start calling subroutines. This scanning would generally be done by the CPU running code in ROM as part of start-up.
With regard to the stacks growth, not all of them grow downwards, see this answer for details.
One good explanation I heard was that some machines in the past could only have unsigned offsets, so you'd want to the stack to grow downward so you could hit your locals without having to lose the extra instruction to fake a negative offset.