Why is creating an array with inline initialization so slow?

Static array initializers are implemented a bit differently. The compiler stores the array's bits in the assembly in an embedded class named something like <PrivateImplementationDetails>....

That is, the array data is stored as raw bits in a special location inside the assembly; at run time it is loaded from there, and RuntimeHelpers.InitializeArray is called to initialize the array.

Note that if you use Reflector to view the compiled code as C#, you won't notice any of what I'm describing here. You need to look at the IL view in Reflector or a similar decompiling tool.

[MethodImpl(MethodImplOptions.InternalCall), SecuritySafeCritical, __DynamicallyInvokable]
public static extern void InitializeArray(Array array, RuntimeFieldHandle fldHandle);

You can see that this is implemented in the CLR (it is marked as InternalCall), and it maps to COMArrayInfo::InitializeArray (ecall.cpp in sscli).

FCIntrinsic("InitializeArray", COMArrayInfo::InitializeArray, CORINFO_INTRINSIC_InitializeArray)

COMArrayInfo::InitializeArray (which lives in comarrayinfo.cpp) is the magical method that initializes the array with the values from the bits embedded in the assembly.
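Conceptually, the two steps described above (allocate, then fill from embedded bits) amount to a raw block copy into a freshly allocated array. Here is a minimal C# sketch of that idea, assuming a little-endian machine. EmbeddedBlob and InitializeFromBlob are hypothetical stand-ins for the embedded field and the native method; this is not the CLR's actual implementation.

```csharp
using System;

static class BlockInitSketch
{
    // Hypothetical stand-in for the bytes the compiler embeds in the assembly:
    // the values 1, 2, 3, 4 as little-endian 32-bit integers.
    static readonly byte[] EmbeddedBlob =
    {
        1, 0, 0, 0,  2, 0, 0, 0,  3, 0, 0, 0,  4, 0, 0, 0
    };

    // Hypothetical helper: fills an already allocated array with one raw copy.
    public static void InitializeFromBlob(int[] target, byte[] blob)
    {
        if (blob.Length != target.Length * sizeof(int))
            throw new ArgumentException("Blob size must match the array size.");
        Buffer.BlockCopy(blob, 0, target, 0, blob.Length);
    }

    public static int[] Create()
    {
        int[] arr = new int[4];                 // step 1: allocate a zeroed array
        InitializeFromBlob(arr, EmbeddedBlob);  // step 2: copy the embedded bytes in
        return arr;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Create()));
    }
}
```

The point of the sketch is only that the array is filled in place by a second call after allocation, instead of element by element in user code.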

I'm not sure why this takes so long to complete; I don't have a good explanation for that. My guess is that it is because it goes and pulls the data from the physical assembly, but I'm not sure. You can dig into the methods yourself. At least this gives you some idea that the code isn't compiled to what you see in your source.

You can use tools like ILDasm and Dumpbin to find out more about this, and of course you can download sscli.

FWIW: I got this information from a Pluralsight course by Bart De Smet.

First of all, profiling at the C# level will tell us nothing, since it only shows the C# line that takes the longest to execute, which is of course the inline array initialization. But for sport:

Profiling Results
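For anyone who wants to reproduce the comparison, a rough Stopwatch harness along the following lines should do. It is only a sketch: the iteration count and the names LoopInit, InlineInit and Measure are my own assumptions, not the code from the question, and the absolute numbers will depend on the runtime and on Debug versus Release.

```csharp
using System;
using System.Diagnostics;

static class InitBenchmark
{
    const int Iterations = 10000000; // assumed count, adjust as needed

    static long checksum; // keeps the created arrays observably used

    // Standard initialization: allocate, then assign each element in a loop.
    public static int[] LoopInit()
    {
        int[] arr = new int[4];
        for (int i = 0; i < arr.Length; i++)
            arr[i] = i;
        return arr;
    }

    // Inline initialization: the form the question is about.
    public static int[] InlineInit()
    {
        return new int[] { 0, 1, 2, 3 };
    }

    // Returns elapsed milliseconds for Iterations calls of the given factory.
    // The delegate call adds the same overhead to both variants.
    public static long Measure(Func<int[]> create)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
            checksum += create()[3];
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Measure(LoopInit);   // warm-up so JIT time is not measured
        Measure(InlineInit);
        Console.WriteLine("Loop:   " + Measure(LoopInit) + " ms");
        Console.WriteLine("Inline: " + Measure(InlineInit) + " ms");
    }
}
```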

Now that we have seen the expected results, let's observe the code at the IL level and try to see what differs between the initializations of the two arrays:

First of all we will look at the standard array initialization:

Everything looks good, the loop is doing exactly what we expect with no noticeable overhead.
Now let's take a look at the inline array initialization:
- The first 2 lines create an array of size 4.
- The third line duplicates the generated array's pointer onto the evaluation stack.
- The last line sets the array local to the array that was just created.

Now we will focus on the 2 remaining lines:

The first line (L_001B) loads a compile-time type whose type name is __StaticArrayInitTypeSize=16 and whose field name is 1456763F890A84558F99AFA687C36B9037697848; it lives inside a class named <PrivateImplementationDetails> in the root namespace. If we look at this field, we see that it contains the entire desired array, encoded as bytes:

.field assembly static initonly valuetype <PrivateImplementationDetails>/__StaticArrayInitTypeSize=16 1456763F890A84558F99AFA687C36B9037697848 = ((01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00))
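To make the byte layout concrete, here is a small sketch that decodes those 16 bytes as little-endian 32-bit integers. The Decode helper is hypothetical and exists only for illustration; the byte values are copied from the field declaration above.

```csharp
using System;

static class FieldDataDecoder
{
    // Hypothetical helper: reads consecutive little-endian 32-bit integers.
    public static int[] Decode(byte[] data)
    {
        int[] result = new int[data.Length / 4];
        for (int i = 0; i < result.Length; i++)
        {
            int o = i * 4;
            // The least significant byte comes first in little-endian order.
            result[i] = data[o]
                      | (data[o + 1] << 8)
                      | (data[o + 2] << 16)
                      | (data[o + 3] << 24);
        }
        return result;
    }

    static void Main()
    {
        byte[] fieldData =
        {
            0x01, 0x00, 0x00, 0x00,  0x02, 0x00, 0x00, 0x00,
            0x03, 0x00, 0x00, 0x00,  0x04, 0x00, 0x00, 0x00
        };
        Console.WriteLine(string.Join(", ", Decode(fieldData))); // prints "1, 2, 3, 4"
    }
}
```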

The second line calls a method that fills the empty array we have just created in L_0060, using the data from this compile-time type.

If we try to look at this method's code we will see that it is implemented within the CLR:

[MethodImpl(MethodImplOptions.InternalCall), SecuritySafeCritical, __DynamicallyInvokable]
public static extern void InitializeArray(Array array, RuntimeFieldHandle fldHandle);

So either we need to find its source code in the published CLR sources, which I couldn't find for this method, or we can debug at the assembly level. Since I am having trouble with my Visual Studio right now, and with its assembly view in particular, let's try another approach and look at the memory writes for each array initialization.

Starting with the loop initialization, at the beginning we can see that an empty int[] is initialized (in the picture, 0x724a3c88, seen in little-endian, is the type of int[] and 0x00000004 is the size of the array; then we can see 16 bytes of zeros).

Empty Array Memory

When the array is initialized we can see that the memory is filled with the same type and size indicators, only it also has the numbers 0 to 3 in it:

Initialized Array Memory

When the loop iterates we can see that the next array (marked in red) is allocated right after our first array (not marked), which also implies that each array consumes 16 + type + size + padding = 19 bytes:

New Array

Repeating the same process for the inline initializer, we can see that after the array is initialized the heap contains other objects besides our array. This probably comes from within the System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray method, since the array pointer and the compile-time type token are loaded onto the evaluation stack and not onto the heap (lines L_001B and L_0020 in the IL code):

Inline Array Initialization

Now allocating the next array with the inline array initializer shows us that the next array is allocated 64 bytes after the beginning of the first array!

2 Inline Initialized Arrays

So the inline array initializer is slower for at least a few reasons:

- Much more memory is allocated (unwanted memory from within the CLR).
- There is a method call overhead in addition to the array constructor.
- If the CLR allocates more memory than just the array, it probably also performs some other unnecessary actions.

Now for the difference between Debug and Release in the inline array initializer:

If you inspect the assembly code of the debug version, it looks like this:

00952E46 B9 42 5D FF 71       mov         ecx,71FF5D42h  //The pointer to the array.
00952E4B BA 04 00 00 00       mov         edx,4  //The desired size of the array.
00952E50 E8 D7 03 F7 FF       call        008C322C  //Array constructor.
00952E55 89 45 90             mov         dword ptr [ebp-70h],eax  //The result array (here the memory is an empty array, but arr cannot be viewed in the debugger yet).
00952E58 B9 E4 0E D7 00       mov         ecx,0D70EE4h  //The token of the compile-time type.
00952E5D E8 43 EF FE 72       call        73941DA5  //At first I thought this was the RuntimeHelpers.InitializeArray method, but this is the part where the junk memory is added, so I guess it is part of the token loading process for the compile-time type.
00952E62 89 45 8C             mov         dword ptr [ebp-74h],eax
00952E65 8D 45 8C             lea         eax,[ebp-74h]  
00952E68 FF 30                push        dword ptr [eax]  
00952E6A 8B 4D 90             mov         ecx,dword ptr [ebp-70h]  
00952E6D E8 81 ED FE 72       call        73941BF3  //System.Runtime.CompilerServices.RuntimeHelpers.InitializeArray method.
00952E72 8B 45 90             mov         eax,dword ptr [ebp-70h]  //Here the result array is complete  
00952E75 89 45 B4             mov         dword ptr [ebp-4Ch],eax

On the other hand, the code for the release version looks like this:

003A2DEF B9 42 5D FF 71       mov         ecx,71FF5D42h  //The pointer to the array.
003A2DF4 BA 04 00 00 00       mov         edx,4  //The desired size of the array.
003A2DF9 E8 2E 04 F6 FF       call        0030322C  //Array constructor.
003A2DFE 83 C0 08             add         eax,8  
003A2E01 8B F8                mov         edi,eax  
003A2E03 BE 5C 29 8C 00       mov         esi,8C295Ch  
003A2E08 F3 0F 7E 06          movq        xmm0,mmword ptr [esi]  
003A2E0C 66 0F D6 07          movq        mmword ptr [edi],xmm0  
003A2E10 F3 0F 7E 46 08       movq        xmm0,mmword ptr [esi+8]  
003A2E15 66 0F D6 47 08       movq        mmword ptr [edi+8],xmm0

The release optimization makes it impossible to view the memory of arr, since the local at the IL level is never set. As you can see, this version uses movq, which for this purpose is the fastest way to copy the memory of the compile-time type into the initialized array: it copies a QWORD (2 ints together!) twice, which is exactly the content of our array, 16 bytes in total.
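To illustrate the "2 ints together" remark, the sketch below packs each pair of ints into one 64-bit value and copies the four ints as two such values. The names Pack and CopyAsTwoQwords are hypothetical, and this only mimics the effect of the two movq pairs; it is not what the JIT actually emits.

```csharp
using System;

static class QwordCopySketch
{
    // Packs two 32-bit values into one 64-bit value. In little-endian memory
    // the first int occupies the low half of the QWORD.
    public static ulong Pack(int low, int high)
    {
        unchecked
        {
            return ((ulong)(uint)high << 32) | (uint)low;
        }
    }

    // Copies four ints as two 64-bit moves instead of four 32-bit ones.
    public static int[] CopyAsTwoQwords(int[] source)
    {
        ulong first = Pack(source[0], source[1]);   // bytes 0..7
        ulong second = Pack(source[2], source[3]);  // bytes 8..15

        int[] target = new int[4];
        unchecked
        {
            target[0] = (int)first;          // low 32 bits
            target[1] = (int)(first >> 32);  // high 32 bits
            target[2] = (int)second;
            target[3] = (int)(second >> 32);
        }
        return target;
    }

    static void Main()
    {
        Console.WriteLine(Pack(1, 2).ToString("X16")); // prints "0000000200000001"
        Console.WriteLine(string.Join(", ", CopyAsTwoQwords(new int[] { 1, 2, 3, 4 })));
    }
}
```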
