Why are conditionally executed instructions not present in later ARM instruction sets?
The general claim is that modern systems have better branch predictors and much more advanced compilers, so the cost of conditional execution in instruction encoding space is no longer justified.
This is from the ARMv8 Instruction Set Overview:
The A64 instruction set does not include the concept of predicated or conditional execution. Benchmarking shows that modern branch predictors work well enough that predicated execution of instructions does not offer sufficient benefit to justify its significant use of opcode space, and its implementation cost in advanced implementations.
And it continues:
A very small set of “conditional data processing” instructions are provided. These instructions are unconditionally executed but use the condition flags as an extra input to the instruction. This set has been shown to be beneficial in situations where conditional branches predict poorly, or are otherwise inefficient.
Another paper titled Trading Conditional Execution for More Registers on ARM Processors claims:
... conditional execution takes up precious instruction space as conditions are encoded into a 4-bit condition code selector on every 32-bit ARM instruction. Besides, only small percentages of instructions are actually conditionalized in modern embedded applications, and conditional execution might not even lead to performance improvement on modern embedded processors.
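To make the "4-bit condition code selector" concrete, here is a minimal C sketch that pulls the condition field out of a 32-bit ARM (A32) instruction word. The field sits in bits [31:28]; the helper names a32_condition and a32_condition_name are made up for illustration and are not part of any ARM toolchain.

```c
#include <stdint.h>

/* Extract the 4-bit condition field, bits [31:28], of an A32 instruction word.
   (Hypothetical helper name, for illustration only.) */
static inline unsigned a32_condition(uint32_t insn) {
    return (insn >> 28) & 0xFu;
}

/* Map the condition field to its mnemonic suffix.
   0xE is AL (always); 0xF was NV originally and was later reused
   for unconditional instructions. */
static const char *a32_condition_name(uint32_t insn) {
    static const char *const names[16] = {
        "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
        "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"
    };
    return names[a32_condition(insn)];
}
```

For example, 0xE1A00000 (MOV r0, r0) carries condition 0xE (AL), while 0x01A00000 is the same instruction as MOVEQ, with condition 0x0. Most instructions in real code carry AL, which is the point the paper makes about those four bits being mostly wasted.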
One of the reasons is instruction encoding.
In Thumb, you cannot squeeze four more bits into the tight 16-bit encoding when there isn't even enough room for the high bit of each register operand (3 bits in total for a three-operand instruction), so most instructions are reduced to a subset of only 8 registers. Note that Thumb-2 has a separate IT instruction (IT, ITE, ITT and so on) for selecting the conditions for up to the next 4 instructions. You can't store the condition in the instruction itself, though, for the reason stated above.
For AArch64 the number of registers has been doubled compared to 32-bit ARM, so each register field needs one more bit, and again there are no spare bits for the 3 new high bits of a three-operand instruction. If you want to keep the old encoding then you must "borrow" them either from the narrow 12-bit immediate or from the 4-bit condition. 12 bits is already small compared to other RISC architectures such as MIPS, and shrinking it further would make things worse, so removing the condition is the better choice. Because branch prediction has become more and more advanced, this isn't much of a problem. It also makes implementing out-of-order execution easier, because there's one less thing to rename and keep track of.
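The bit-budget argument above can be checked with a little arithmetic. This is only a sketch: the function names are invented for illustration, and it assumes a plain three-operand instruction in which each register field grows independently.

```c
/* Number of bits needed to name one of register_count registers
   (hypothetical helper; assumes register_count >= 1). */
static unsigned register_field_bits(unsigned register_count) {
    unsigned bits = 0;
    while ((1u << bits) < register_count) {
        bits++;
    }
    return bits;
}

/* Extra encoding bits needed when the register file grows from old_regs
   to new_regs and the instruction names `operands` registers. */
static unsigned extra_bits_needed(unsigned old_regs, unsigned new_regs,
                                  unsigned operands) {
    return operands * (register_field_bits(new_regs) -
                       register_field_bits(old_regs));
}
```

With 16 registers a field is 4 bits, with 32 register encodings it is 5, so extra_bits_needed(16, 32, 3) comes out to 3. Going the other way, Thumb's 8-register subset gives 3-bit fields, which is where the 3 missing high bits come from.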
It's somewhat misleading to say that conditional execution is not present in ARMv8. The issue is to understand why you would want to skip executing some instructions. Perhaps in the early ARM days the actual non-execution of instructions mattered (for power or whatever), but today the significance of this feature is that it lets you avoid branches for small dumb jumps, for example code like a = (b > 0 ? 1 : 2). This sort of thing is more common than you might imagine. Conceptually it covers things like MAX/MIN or ABS (though some CPUs may have dedicated instructions for these particular tasks).
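As a rough illustration of these "small dumb jumps", here are the patterns written as plain C. The function names are mine. An optimizing compiler targeting AArch64 will typically turn each of these into a compare followed by a conditional-select style instruction rather than a branch, but that depends on the compiler and flags, so treat it as an expectation and check the generated assembly.

```c
/* The example from the text: a = (b > 0 ? 1 : 2). */
int select_one_or_two(int b) {
    return b > 0 ? 1 : 2;
}

/* MAX and MIN: one compare, then pick one of two values. */
int max_int(int x, int y) {
    return x > y ? x : y;
}

int min_int(int x, int y) {
    return x < y ? x : y;
}

/* ABS, returning unsigned so that INT_MIN does not overflow. */
unsigned abs_magnitude(int x) {
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}
```

In every case both possible results are cheap to compute, so selecting between them is usually cheaper than risking a mispredicted branch.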
In ARMv8, while there are no general conditionally executed instructions, there are a few instructions that perform the specific task I am describing, namely letting you avoid branching for short dumb jumps. CSEL is the most obvious example, though there are others (e.g. conditional compare, which conditionally sets the condition flags) that handle other common patterns (in that case, C short-circuited expression evaluation).
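Here is a small C model of what those two instructions do, as I understand them. It is a sketch, not a definition: only the Z flag is modelled, the function names are invented, and the assembly in the comments is the kind of sequence a compiler might emit, not guaranteed output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of CSEL Rd, Rn, Rm, cond: the instruction always executes and
   reads both inputs; the condition only chooses which one is written. */
static inline int64_t csel_model(bool cond, int64_t rn, int64_t rm) {
    return cond ? rn : rm;
}

/* A short-circuited C expression. */
bool both_match(int a, int b) {
    return a == 0 && b == 5;
}

/* The same test with no branch, modelling only the Z flag:
     cmp  w0, #0
     ccmp w1, #5, #0, eq   ; if EQ: compare w1 with 5, else flags := 0
     cset w0, eq                                                       */
bool both_match_ccmp_model(int a, int b) {
    bool z = (a == 0);            /* cmp: Z set when a == 0 */
    z = z ? (b == 5) : false;     /* ccmp: second compare, or Z cleared */
    return z;                     /* cset: materialize EQ as 0 or 1 */
}
```

The conditional compare plays the role of the && here: when the first test fails, it forces the flags to a value that makes the final condition false, so no branch is needed to skip the second test.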
IMHO what ARM has done here is what makes the most sense. They've extracted the part of conditional execution that remains valuable on modern CPUs (avoiding many branches) while changing the details of the implementation to match the micro-architecture of modern CPUs.