Why is creating an array with inline initialization so slow?
Static array initializers are implemented a bit differently. The compiler embeds the array data in the assembly, in a generated class named something like <PrivateImplementationDetails>....
The array data is stored as raw bytes in a special location inside the assembly; at run time it is loaded from there and passed to RuntimeHelpers.InitializeArray to initialize the array.
Note that if you use Reflector to view the compiled code as C#, you won't see any of what I'm describing here. You'll need to look at the IL view in Reflector or a similar decompiler.
[MethodImpl(MethodImplOptions.InternalCall), SecuritySafeCritical, __DynamicallyInvokable]
public static extern void InitializeArray(Array array, RuntimeFieldHandle fldHandle);
You can see this is implemented inside the CLR (it is marked as InternalCall), which then maps it to COMArrayInfo::InitializeArray (ecall.cpp in the SSCLI).
FCIntrinsic("InitializeArray", COMArrayInfo::InitializeArray, CORINFO_INTRINSIC_InitializeArray)
COMArrayInfo::InitializeArray (which lives in comarrayinfo.cpp) is the magical method that initializes the array with the values from the bytes embedded in the assembly.
I'm not sure why this takes so long to complete; I don't have a good explanation. My guess is that it is because the data has to be pulled from the physical assembly, but I'm not sure. You can dig into these methods yourself. At the very least, this should show you that the code is not compiled into what you see in your source.
You can use tools like ILDasm and Dumpbin to find out more about this, and of course you can download the SSCLI.
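If you would rather poke at this from code than from ILDasm, reflection can list what the compiler embedded. This is only a minimal sketch: the class and method names (PrivateDetailsDump, MakeInline, ListEmbeddedFields) are made up for illustration, and it assumes the compiler emitted a type whose name starts with <PrivateImplementationDetails>, which depends on the compiler version and on the initializer being large enough to be embedded.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class PrivateDetailsDump
{
    // An inline initializer, present so the compiler has some data to embed.
    public static int[] MakeInline()
    {
        return new int[] { 1, 2, 3, 4 };
    }

    // Returns "FieldType FieldName" for every static field of any
    // compiler-generated holder type found in the assembly.
    // The list is empty if the compiler emitted no such type (assumption:
    // the holder's name starts with "<PrivateImplementationDetails>").
    public static List<string> ListEmbeddedFields(Assembly assembly)
    {
        var result = new List<string>();
        const BindingFlags flags =
            BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic;

        foreach (Type type in assembly.GetTypes())
        {
            if (!type.Name.StartsWith("<PrivateImplementationDetails>", StringComparison.Ordinal))
            {
                continue;
            }
            foreach (FieldInfo field in type.GetFields(flags))
            {
                result.Add(field.FieldType.Name + " " + field.Name);
            }
        }
        return result;
    }

    public static void Main()
    {
        MakeInline();
        List<string> fields = ListEmbeddedFields(typeof(PrivateDetailsDump).Assembly);
        if (fields.Count == 0)
        {
            Console.WriteLine("No embedded initializer fields found.");
        }
        foreach (string line in fields)
        {
            Console.WriteLine(line);
        }
    }
}
```

The exact type and field names printed vary between compiler versions, so treat the output as something to explore rather than something to rely on.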
FWIW: I got this information from a Pluralsight course by Bart De Smet.
First of all, profiling at the C# level gives us nothing, since it only shows which C# line takes longest to execute, which is of course the inline array initialization. But for the sport:
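For anyone who wants to reproduce a comparison of this kind, here is a minimal Stopwatch sketch. It is not the measurement from this answer: the names (InitBenchmark, LoopInit, InlineInit, TimeMs) and the iteration count are assumptions chosen for illustration, and the numbers will depend heavily on the runtime version and on Debug versus Release builds.

```csharp
using System;
using System.Diagnostics;

public static class InitBenchmark
{
    // "Standard" initialization: allocate, then fill in a loop.
    public static int[] LoopInit()
    {
        int[] arr = new int[4];
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = i + 1;
        }
        return arr;
    }

    // Inline initialization: the form under investigation.
    public static int[] InlineInit()
    {
        return new int[] { 1, 2, 3, 4 };
    }

    // Runs the factory 'iterations' times and returns elapsed milliseconds.
    // The checksum keeps the arrays observable and sanity-checks their contents.
    public static long TimeMs(Func<int[]> create, int iterations)
    {
        long checksum = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            checksum += create()[3];
        }
        sw.Stop();
        if (checksum != 4L * iterations)
        {
            throw new InvalidOperationException("Unexpected array contents.");
        }
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000000;

        // Warm-up so that JIT compilation is not part of the timing.
        TimeMs(LoopInit, 1000);
        TimeMs(InlineInit, 1000);

        Console.WriteLine("loop:   " + TimeMs(LoopInit, iterations) + " ms");
        Console.WriteLine("inline: " + TimeMs(InlineInit, iterations) + " ms");
    }
}
```

Run it once as a Debug build and once as a Release build, outside the debugger, since the difference between the two is part of what is being examined here.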
Now that we have seen the expected results, let's look at the code at the IL level and see what differs between the initializations of the two arrays:
First of all we will look at the standard array initialization:
Everything looks good, the loop is doing exactly what we expect with no noticeable overhead.
Now let's take a look at the inline array initialization:
- The first two lines create an array of size 4.
- The third line duplicates the reference to the newly created array on the evaluation stack.
- The last line sets the array local to the array that was just created.
Now we will focus on the 2 remaining lines:
The first line (L_001B) loads a compile-time type whose type name is __StaticArrayInitTypeSize=16; its field name is 1456763F890A84558F99AFA687C36B9037697848, and it lives inside a class named <PrivateImplementationDetails> in the root namespace. If we look at this field, we see that it contains the entire desired array, encoded as bytes:
.field assembly static initonly valuetype <PrivateImplementationDetails>/__StaticArrayInitTypeSize=16 1456763F890A84558F99AFA687C36B9037697848 = ((01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00))
The second line calls a method that fills in the empty array we have just created (in L_0060), using this compile-time type as the source of the data.
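To make the byte blob in the field above less abstract, here is a small sketch that decodes those same 16 bytes by hand. It only imitates the effect of the copy; it is not how the CLR implements InitializeArray. The names BlobDecode, Blob and Decode are made up for this example, and it assumes a little-endian machine.

```csharp
using System;

public static class BlobDecode
{
    // The same 16 bytes as in the .field declaration above.
    public static readonly byte[] Blob =
    {
        0x01, 0x00, 0x00, 0x00,
        0x02, 0x00, 0x00, 0x00,
        0x03, 0x00, 0x00, 0x00,
        0x04, 0x00, 0x00, 0x00
    };

    // Allocates an empty int[] and block-copies the raw bytes into it,
    // which mirrors "create the array, then fill it from embedded data".
    public static int[] Decode(byte[] blob)
    {
        int[] arr = new int[blob.Length / sizeof(int)];
        Buffer.BlockCopy(blob, 0, arr, 0, arr.Length * sizeof(int));
        return arr;
    }

    public static void Main()
    {
        // On a little-endian machine this prints: 1, 2, 3, 4
        Console.WriteLine(string.Join(", ", Decode(Blob)));
    }
}
```

The "=16" in __StaticArrayInitTypeSize=16 matches the blob length: four 4-byte ints.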
If we try to look at this method's code we will see that it is implemented within the CLR:
[MethodImpl(MethodImplOptions.InternalCall), SecuritySafeCritical, __DynamicallyInvokable]
public static extern void InitializeArray(Array array, RuntimeFieldHandle fldHandle);
So either we need to find its source code in the published CLR sources, which I couldn't find for this method, or we can debug at the assembly level. Since I am having trouble with Visual Studio and its assembly view right now, let's try another approach and look at the memory writes for each array initialization.
Starting with the loop initialization: at the beginning we can see an empty int[] being initialized (in the picture, 0x724a3c88, shown in little-endian, is the type of int[] and 0x00000004 is the size of the array; then we can see 16 bytes of zeros).
When the array is initialized we can see that the memory is filled with the same type and size indicators, only it also has the numbers 0 to 3 in it:
When the loop iterates, we can see that the next array (marked in red) is allocated right after our first array (not marked), which also implies that each array consumes 16 + type + size + padding = 16 + 4 + 4 + 4 = 28 bytes in this 32-bit process:
Doing the same for the inline initializer, we can see that after the array is initialized the heap contains other objects besides our array; this probably comes from within the System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray method, since the array pointer and the compile-time-type token are loaded on the evaluation stack and not on the heap (lines L_001B and L_0020 in the IL code):
Allocating the next array with the inline array initializer now shows that it is placed 64 bytes after the beginning of the first array!
So the inline array initializer is slower for at least a few reasons:
- Much more memory is allocated (unwanted memory from within the CLR).
- There is a method call overhead in addition to the array constructor.
- Also, if the CLR allocates memory beyond the array itself, it is probably doing other unnecessary work as well.
Now for the difference between Debug and Release in the inline array initializer:
If you inspect the assembly code of the debug version, it looks like this:
00952E46 B9 42 5D FF 71 mov ecx,71FF5D42h //The pointer to the array.
00952E4B BA 04 00 00 00 mov edx,4 //The desired size of the array.
00952E50 E8 D7 03 F7 FF call 008C322C //Array constructor.
00952E55 89 45 90 mov dword ptr [ebp-70h],eax //The result array (here the memory is an empty array but arr cannot be viewed in the debug yet).
00952E58 B9 E4 0E D7 00 mov ecx,0D70EE4h //The token of the compilation-time-type.
00952E5D E8 43 EF FE 72 call 73941DA5 //At first I thought this was the System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray method, but this is where the junk memory is added, so I guess it is part of the token-loading process for the compilation-time-type.
00952E62 89 45 8C mov dword ptr [ebp-74h],eax
00952E65 8D 45 8C lea eax,[ebp-74h]
00952E68 FF 30 push dword ptr [eax]
00952E6A 8B 4D 90 mov ecx,dword ptr [ebp-70h]
00952E6D E8 81 ED FE 72 call 73941BF3 //System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray method.
00952E72 8B 45 90 mov eax,dword ptr [ebp-70h] //Here the result array is complete
00952E75 89 45 B4 mov dword ptr [ebp-4Ch],eax
On the other hand, the code for the release version looks like this:
003A2DEF B9 42 5D FF 71 mov ecx,71FF5D42h //The pointer to the array.
003A2DF4 BA 04 00 00 00 mov edx,4 //The desired size of the array.
003A2DF9 E8 2E 04 F6 FF call 0030322C //Array constructor.
003A2DFE 83 C0 08 add eax,8
003A2E01 8B F8 mov edi,eax
003A2E03 BE 5C 29 8C 00 mov esi,8C295Ch
003A2E08 F3 0F 7E 06 movq xmm0,mmword ptr [esi]
003A2E0C 66 0F D6 07 movq mmword ptr [edi],xmm0
003A2E10 F3 0F 7E 46 08 movq xmm0,mmword ptr [esi+8]
003A2E15 66 0F D6 47 08 movq mmword ptr [edi+8],xmm0
The optimization makes it impossible to view the memory of arr in the debugger, since the local at the IL level is never set.
As you can see, this version uses movq, which is, for that matter, the fastest way to copy the memory of the compilation-time-type into the initialized array: it copies a QWORD twice (each QWORD holds 2 ints together!), which is exactly the content of our array, 16 bytes.
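The "2 ints in one QWORD" point can be illustrated in C# as well. The sketch below reads the 16-byte blob as two 64-bit values and splits each one back into two ints, which is roughly the data movement the two movq pairs perform. It is an illustration only, not what the JIT emits from C#; the names QwordCopy and CopyAsTwoQwords are hypothetical, and the code assumes a little-endian machine.

```csharp
using System;

public static class QwordCopy
{
    // Reads 16 bytes as two 64-bit values (two "QWORD moves") and
    // unpacks each into its low and high 32-bit halves.
    public static int[] CopyAsTwoQwords(byte[] blob)
    {
        if (blob == null || blob.Length != 16)
        {
            throw new ArgumentException("Expected exactly 16 bytes.", "blob");
        }

        long first = BitConverter.ToInt64(blob, 0);   // bytes 0..7  -> ints 0 and 1
        long second = BitConverter.ToInt64(blob, 8);  // bytes 8..15 -> ints 2 and 3

        return new int[]
        {
            (int)first, (int)(first >> 32),
            (int)second, (int)(second >> 32)
        };
    }

    public static void Main()
    {
        byte[] blob =
        {
            0x01, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00,
            0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00
        };

        // On a little-endian machine this prints: 1, 2, 3, 4
        Console.WriteLine(string.Join(", ", CopyAsTwoQwords(blob)));
    }
}
```

Two 8-byte moves cover the whole array, so no call into the runtime is needed at all, which fits the release listing above.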