Recommendation for default settings for unused pins on an STM32 (ARM Cortex M3) - pull up/pull down?
This answer is not STM32 specific but is based on experience and many such discussions over many (many) years. Others can add to this - it covers the main points (I think) but may not be complete.
It's encouraging to see someone asking these simple but fundamental questions and showing an awareness of how such "little things" can 'gang agley' in real life.
ie "If the micro does not initialise properly ..." really reads " ... when the micro does not initialise properly ..." :-) - and it's obvious that you realise this.
So:
Use of external pullup or pulldown is essential for those really keen on getting a well defined result. This is the single biggest must-do here. All the rest is a bonus. ie Setting to inputs with internal pullxxx is a compromise which will almost always work.
BUT if "almost always" is not good enough for your design then you need external pull xxxs.
Pullup or pulldown does not seem to give an overwhelmingly better result. It may vary between ICs but can be determined from the data sheet. All things being equal (as they may be) I'd favour pull-down as there is a potential for lower leakage currents to device-external circuitry - but this is liable to be minimal in a conformally coated PCB and/or a benign environment.
You may wish to look at startup action if you really care. eg a pulled up pin will start low and transit high at some stage. A pulled down pin will probably stay low throughout. This is probably not important but is mentioned for completeness.
ESD susceptibility will be device specific, quite likely symmetric, and on average over many processors probably favours pull down, as drivers tend to sink better than source if asymmetric. If you care a lot about ESD then you may wish to use low outputs with pull downs - as a low impedance path will (probably) offer better ESD protection. But if you care a lot about ESD you will want to design for it in other ways and not rely on in-IC protection as your main protection.
Re question 3 - external pullxxxs are desirable, but it seems safe to use values which are at the limiting high end of proper design and then use internal xxxs in parallel if desired. However, as internal pull xxxs often have a 2:1 range of effective resistance, you can get the largest R and smallest current by using external only. What you of course want to avoid is external pull ups with internal pull downs or vice versa - but that's unlikely to be an issue.
When I say " ... limiting high end of proper design ... " I mean just that and not "past the limiting ...". ie the pin will have a specified value of resistance which allows the worst case Vin spec to be met. A larger resistor may take less current in the resistor but may start to very slightly turn on the internal switch. ie there may be a tradeoff between pulldown resistor current and lowest overall current, as leakage current (which will be extremely small) raises the input voltage seen by the internal driver and whispers it on very slightly.
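To put rough numbers on the "limiting high end" idea, here is a minimal sketch of the arithmetic, assuming the only current through the resistor is the pin's worst-case input leakage. The function names are made up for illustration, and the real V_IL, V_IH and leakage figures have to come from your part's data sheet.

```c
/* Largest pull-down that still holds the pin at or below V_IL(max) when
   the worst-case input leakage flows through it: R = V_IL(max) / I_leak(max). */
double max_pulldown_ohms(double vil_max_v, double i_leak_max_a)
{
    return vil_max_v / i_leak_max_a;
}

/* Same idea for a pull-up: the drop across the resistor must leave the
   pin at or above V_IH(min) at the lowest supply voltage. */
double max_pullup_ohms(double vdd_min_v, double vih_min_v, double i_leak_max_a)
{
    return (vdd_min_v - vih_min_v) / i_leak_max_a;
}

/* Effective resistance when an external pull is used in parallel with an
   internal one pulling the same way. */
double parallel_ohms(double r_ext_ohms, double r_int_ohms)
{
    return (r_ext_ohms * r_int_ohms) / (r_ext_ohms + r_int_ohms);
}
```

With, say, V_IL(max) = 0.8 V and 1 uA of worst-case leakage (illustrative numbers only, not from any data sheet), the pull-down bound works out to 800 kohm. That is the limit, not a target to go past.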
If you use eg pulldown you may then find it lower power to set the pin to output and drive it low, but this is an option that can be decided on in due course.
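As a rough sketch of the two firmware options discussed above (input with internal pull, or output driven low): this assumes an STM32F1-style port, where each pin has a 4-bit field in CRL/CRH and the ODR bit selects pull-up or pull-down in input-pull mode. Other STM32 families use a different register layout, so check the reference manual for your part. The struct is a host-side stand-in so the logic can be tried off-target, and all names here are hypothetical; on real hardware you would use the vendor header's port definition instead.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for one GPIO port (STM32F1-style layout assumed). */
typedef struct {
    uint32_t CRL;  /* config fields for pins 0..7, 4 bits per pin  */
    uint32_t CRH;  /* config fields for pins 8..15, 4 bits per pin */
    uint32_t ODR;  /* output data; picks pull direction in input-pull mode */
} gpio_model_t;

#define CFG_INPUT_PULL  0x8u  /* CNF=10, MODE=00: input with pull-up/pull-down */
#define CFG_OUT_PP_2MHZ 0x2u  /* CNF=00, MODE=10: push-pull output, 2 MHz */

/* Write one pin's 4-bit config field without disturbing the other pins. */
void set_pin_cfg(gpio_model_t *port, unsigned pin, uint32_t cfg)
{
    uint32_t *cr = (pin < 8u) ? &port->CRL : &port->CRH;
    unsigned shift = (pin & 7u) * 4u;
    *cr = (*cr & ~((uint32_t)0xFu << shift)) | (cfg << shift);
}

/* Unused pin as input with the internal pull enabled (pull_up != 0 selects pull-up). */
void unused_pin_input_pull(gpio_model_t *port, unsigned pin, int pull_up)
{
    if (pull_up)
        port->ODR |= (uint32_t)1u << pin;
    else
        port->ODR &= ~((uint32_t)1u << pin);
    set_pin_cfg(port, pin, CFG_INPUT_PULL);
}

/* Unused pin as an output driven low; the ODR bit is cleared first so the
   driver never briefly drives high when the mode changes. */
void unused_pin_drive_low(gpio_model_t *port, unsigned pin)
{
    port->ODR &= ~((uint32_t)1u << pin);
    set_pin_cfg(port, pin, CFG_OUT_PP_2MHZ);
}
```

Either way this only takes effect once the firmware runs, which is why the external resistor is still what defines the pin during reset and startup.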
Almost an aside - NEVER allow protection diodes to handle "any significant currents" at any stage during operation. Allowing them to do so can lead to totally inexplicable processor action. The less the current the lower the chance of things going wrong - and the harder to find it when they do.
What are you optimizing for? Cost optimization dictates that you set unused pins to outputs. Reliability optimization dictates that all pin levels are defined, even in the short period before the firmware has the chance to set unused pins to what it deems appropriate.
I once had to check the reliability calculations of a processor board. It was well designed, with decoupling caps all over the place and pull-to-whatever resistors on all I/O pins. The reliability engineer took out his handbook, added the failure rates of all components involved, and ended up with a figure that was dominated by the failure rates of the passive components. That figure was higher than the requirement, so we had a problem. Remove those resistors, and the figure would be OK. But at that proposal the electrical engineers started to shout in anger (rightly so, IMO). I don't recall how the story ended; I think we went to the client and asked for dispensation to omit the failure rates of the resistors from the calculation, on the grounds that they carried no significant current.
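For anyone who hasn't seen this kind of calculation, here is a minimal sketch of a parts-count style sum, assuming failure rates are expressed in FIT (failures per 10^9 device-hours) and simply added up. The struct and function names are hypothetical, and any numbers you feed it have to come from your own handbook; I'm not reproducing the figures from that project.

```c
#include <stddef.h>

/* One line of a hypothetical bill of materials for the reliability sum. */
typedef struct {
    const char *name;      /* part description                    */
    unsigned    count;     /* how many of this part on the board  */
    double      fit_each;  /* failure rate per part, in FIT       */
} part_t;

/* Parts-count style total: sum of (quantity x failure rate) over all parts. */
double board_fit(const part_t *parts, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += parts[i].count * parts[i].fit_each;
    return total;
}

/* FIT is failures per 1e9 hours, so MTBF in hours is 1e9 / FIT. */
double fit_to_mtbf_hours(double fit)
{
    return 1e9 / fit;
}
```

Because every part adds to the total no matter how lightly it is stressed, a few dozen pull resistors can easily outweigh the processor itself in a sum like this, which is exactly the problem we ran into.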