Why is inlining considered faster than a function call?

There is no call and no stack activity, which certainly saves a few CPU cycles. On modern CPUs, code locality also matters: a call can stall the instruction pipeline and, if the called code is not already in the instruction cache, force the CPU to wait while it is fetched from memory. This matters a lot in tight loops, since main memory is much slower than a modern CPU.

However, don't worry about inlining if your code is only being called a few times in your application. Worry, a lot, if it's being called millions of times while the user waits for answers!


Aside from the fact that there's no call (and therefore no associated expenses, like parameter preparation before the call and cleanup after the call), there's another significant advantage of inlining. When the function body is inlined, its body can be re-interpreted in the specific context of the caller. This might immediately allow the compiler to further reduce and optimize the code.

For one simple example, this function

void foo(bool b) {
  if (b) {
    // something
  }
  else {
    // something else
  }
}

will require actual branching if called as a non-inlined function

foo(true);
...
foo(false);

However, if the above calls are inlined, the compiler will immediately be able to eliminate the branching. Essentially, in the above case inlining allows the compiler to treat the function argument as a compile-time constant (when the argument passed at the call site actually is one), something that is generally not possible with non-inlined functions.
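To make the branch-elimination point concrete, here is a minimal sketch. The names `pick`, `pick_true_by_hand` and `pick_false_by_hand` are made up for illustration, and whether a given compiler actually performs the folding depends on the compiler and the optimization level, so treat this as a picture of what the optimizer is allowed to do rather than a guarantee.

```cpp
#include <cassert>

// Hypothetical stand-in for foo(): the branch depends only on the argument b.
inline int pick(bool b, int x) {
    if (b) {
        return x * 2;    // "something"
    } else {
        return x + 100;  // "something else"
    }
}

// Once pick(true, x) is inlined, `if (true)` can be folded away, leaving
// code equivalent to this hand-written specialization.
inline int pick_true_by_hand(int x) { return x * 2; }

// Likewise for pick(false, x).
inline int pick_false_by_hand(int x) { return x + 100; }

// Sanity check: the specialized versions behave like the general one.
inline void check_specializations(int x) {
    assert(pick(true, x) == pick_true_by_hand(x));
    assert(pick(false, x) == pick_false_by_hand(x));
}
```

If you want to see what your own compiler does, compare the generated assembly for a caller of `pick(true, x)` with and without optimization enabled.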

However, it is not even remotely limited to that. In general, the optimization opportunities enabled by inlining are significantly more far-reaching. For another example, when the function body is inlined into the specific caller's context, the compiler will in general be able to propagate the aliasing relationships known in the calling code into the inlined function code, thus making it possible to optimize the function's code better.
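A small sketch of the aliasing case, under the assumption of a made-up function `add_twice` (the callers `distinct_objects` and `same_object` are hypothetical too). Compiled on its own, `add_twice` has to allow for both pointers referring to the same `int`; once it is inlined into a caller that passes two distinct locals, the compiler can see they do not alias.

```cpp
#include <cassert>

// Hypothetical function: adds *src to *dst twice.
// In isolation the compiler must assume dst and src may point to the same
// int, so *src has to be reloaded after the first store to *dst.
inline void add_twice(int* dst, const int* src) {
    *dst += *src;
    *dst += *src;
}

// Here the two objects are distinct locals. After inlining, the compiler can
// prove there is no aliasing and may compute total + 2 * step directly.
inline int distinct_objects(int total, int step) {
    add_twice(&total, &step);
    return total;
}

// Here the pointers really do alias, so the reload matters:
// the first += doubles the value and the second doubles it again.
inline int same_object(int value) {
    add_twice(&value, &value);
    return value;
}

// Sanity check showing the two callers get different, caller-specific results.
inline void check_aliasing_examples() {
    assert(distinct_objects(10, 3) == 16);  // 10 + 3 + 3
    assert(same_object(3) == 12);           // 3 -> 6 -> 12
}
```

The second caller is why the out-of-line version cannot simply be compiled as "add two times `*src`": that shortcut is only valid where the compiler knows the pointers are distinct, which is exactly the knowledge inlining provides.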

Again, the possible examples are numerous, all of them stemming from the basic fact that inlined calls are immersed into the specific caller's context, thus enabling various cross-context optimizations which would not be possible with non-inlined calls. With inlining you basically get many individual versions of your original function, each one tailored and optimized for its specific caller context. The price of that is, obviously, the potential danger of code bloat, but if used correctly, it can provide noticeable performance benefits.


"A few pushes and a jump to call a function, is there really that much overhead?"

It depends on the function.

If the body of the function is just one machine code instruction, the call and return overhead can be several hundred percent. Say the call takes 6 times as long as the instruction itself: that's 500% overhead. Then if your program consists of nothing but a gazillion calls to that function, without inlining you've increased the running time by 500%.

However, inlining can also have a detrimental effect in the other direction, e.g. because code that would fit in one page of memory without inlining no longer does.

So the answer, as always when it comes to optimization, is: first of all, MEASURE.
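As a rough idea of what "measure" could look like, here is a minimal timing sketch. Everything in it is an assumption for illustration: the `NOINLINE` macro relies on compiler-specific extensions (`__attribute__((noinline))` on GCC/Clang, `__declspec(noinline)` on MSVC), and `step_inlinable`, `step_noinline`, `time_loop` and the `Timing` struct are made-up names. It is not a rigorous benchmark (no warm-up, no repeated runs), just a starting point.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Compiler-specific way to forbid inlining (an extension, not standard C++).
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Tiny function body the compiler is free to inline.
inline std::uint64_t step_inlinable(std::uint64_t acc, std::uint64_t i) {
    return acc + (i ^ (acc >> 3));
}

// Same body, but the compiler is told to keep it as a real call.
NOINLINE std::uint64_t step_noinline(std::uint64_t acc, std::uint64_t i) {
    return acc + (i ^ (acc >> 3));
}

struct Timing {
    std::uint64_t result;  // returned so the loop cannot be optimized away
    double seconds;        // wall-clock time spent in the loop
};

// Runs `step` in a tight loop and reports how long the loop took.
template <typename F>
Timing time_loop(F step, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc = step(acc, i);
    }
    const auto stop = std::chrono::steady_clock::now();
    return Timing{acc, std::chrono::duration<double>(stop - start).count()};
}

inline Timing time_inlinable(std::uint64_t iterations) {
    return time_loop(
        [](std::uint64_t a, std::uint64_t i) { return step_inlinable(a, i); },
        iterations);
}

inline Timing time_noinline(std::uint64_t iterations) {
    return time_loop(
        [](std::uint64_t a, std::uint64_t i) { return step_noinline(a, i); },
        iterations);
}
```

Call `time_inlinable(n)` and `time_noinline(n)` with a large `n` in an optimized build, check that both `result` values match, and compare the `seconds` fields over several runs. The numbers will vary by machine and compiler, which is the whole point of measuring.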