What is metadata in .NET?

If you're familiar with .NET Reflection you can think of metadata as "the data that Reflection accesses". Each .NET assembly stores information about what types and methods it contains, the attributes on those methods, etc. It wouldn't need to store that just to run the code (native EXEs don't have that kind of information), but it needs it for other purposes, like enforcing declarative security and enabling Reflection.

So metadata is "something physical", but most of it is automatically generated from the code you write. Adding attributes to your classes or methods is probably the only way you can directly change metadata. In particular, your source code comments will not be stored in the assembly as metadata (or in any other way).

The Wikipedia page on this is pretty good: http://en.wikipedia.org/wiki/.NET_metadata

Edit: No, metadata is not like comments. It is simply "data about the code", which is not part of the code itself (not needed to run the program). It's not like the HTML metadata at all. An example of metadata is the fact that the assembly contains a class named "MyClass" and that class contains a method named "DoSomething" with certain parameters, etc. So it's nothing mysterious - just "obvious" stuff mainly.
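
To make the "MyClass / DoSomething" example concrete, here is a minimal sketch that rebuilds a method signature purely from metadata via Reflection. The parameter names and types, and the MetadataDemo helper, are made up for illustration; only the names MyClass and DoSomething come from the text above.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class; the parameters are invented for this illustration.
public class MyClass
{
    public void DoSomething(int count, string label) { }
}

public static class MetadataDemo
{
    // Builds a readable signature from metadata only; the method is never invoked.
    public static string Describe(MethodInfo method)
    {
        var parameters = method.GetParameters()
            .Select(p => p.ParameterType.Name + " " + p.Name);
        return method.ReturnType.Name + " " + method.DeclaringType.Name + "." + method.Name
            + "(" + string.Join(", ", parameters) + ")";
    }

    public static void Main()
    {
        MethodInfo method = typeof(MyClass).GetMethod("DoSomething");
        Console.WriteLine(Describe(method));
    }
}
```

Everything printed here (type name, method name, parameter names and types) comes from the assembly's metadata, not from running DoSomething.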


Since others already provided great explanatory answers, I'll just mention how you can view metadata yourself.

In your Microsoft SDK directory (most likely some variation of C:\Program Files\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools) there's a program called ildasm.exe. It's a simple disassembler that lets you view compiled .NET binaries.

You can build a very simple console application and use ildasm.exe to view the compiled contents. The View/MetaInfo/Show! command (or simply Ctrl + M) will display the metadata, so you can see what it looks like. Here is part of the metadata from an application that prints Hello to the console:

TypeDef #1 (02000002)
-------------------------------------------------------
TypDefName: Program  (02000002)
Flags     : [Public] [AutoLayout] [Class] [AnsiClass] [BeforeFieldInit](00100001)
Extends   : 01000001 [TypeRef] System.Object
Method #1 (06000001) [ENTRYPOINT]
-------------------------------------------------------
    MethodName: Main (06000001)
    Flags     : [Public] [Static] [HideBySig] [ReuseSlot]  (00000096)
    RVA       : 0x00002050
    ImplFlags : [IL] [Managed]  (00000000)
    CallCnvntn: [DEFAULT]
    ReturnType: Void
    1 Arguments
        Argument #1:  SZArray String
    1 Parameters
        (1) ParamToken : (08000001) Name : args flags: [none] (00000000)

Here you can see a type definition (Program) and one of its methods (Main), which takes a single input argument and returns void. This is naturally only part of the metadata; even for the simplest programs there's a lot more.
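
The hexadecimal numbers in the dump (02000002, 06000001) are metadata tokens, and you can also read them from code. Below is a small sketch, assuming a console program whose class is named Program as in the dump; TableOf is a helper name I made up. The exact token values depend on how the assembly was compiled, so the sketch only checks the top byte, which identifies the metadata table.

```csharp
using System;
using System.Reflection;

public static class Program
{
    // The top byte of a metadata token identifies the table it points into.
    // (TableOf is a hypothetical helper, not a framework API.)
    public static int TableOf(int token)
    {
        return (int)((uint)token >> 24);
    }

    public static void Main(string[] args)
    {
        Type type = typeof(Program);
        MethodInfo main = type.GetMethod("Main");

        // Prints the same kind of tokens ildasm shows in parentheses.
        Console.WriteLine("Type   {0}: token {1:X8}", type.Name, type.MetadataToken);
        Console.WriteLine("Method {0}: token {1:X8}", main.Name, main.MetadataToken);

        // 0x02 is the TypeDef table and 0x06 the MethodDef table.
        Console.WriteLine("Type table:   0x{0:X2}", TableOf(type.MetadataToken));
        Console.WriteLine("Method table: 0x{0:X2}", TableOf(main.MetadataToken));
    }
}
```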


This is a great and comprehensive article about metadata in .NET. Take a look at it; I hope it will clear up many things. It has a link to a page explaining how metadata is used at runtime.

Reflection in .NET is a very powerful concept, and it is based on reading the metadata stored along with the actual code.


Metadata is the part of the information from the source code that is stored in a special section of the assembly when it is compiled. It is really an implementation detail of how assemblies are structured. For typical C# application development you don't really need to know about this. It is mostly relevant if you develop developer tools.

The term "metadata" is somewhat misleading. Assembly metadata includes stuff from the code like constants and string literals which is not really metadata in the usual sense of the word. A more correct term would perhaps be non-executable data.

When C# is compiled into an assembly, the compilation output is separated into two sections: the IL, which is the actual executable code in bytecode format, and the "metadata", which is all the other stuff: type, interface, and member declarations, method signatures, constants, external dependencies and so on.

Take this program:

using System;

class Program
{
    public static void Main(string[] args)
    {
        var x = 2 + 2;
        Console.WriteLine("Hello World!");
    }
}

When this program is compiled into an assembly, it is separated into metadata and IL. The metadata contains these declarations (represented in a language-independent binary format):

class Program
{
    public static void Main(string[] args);
}

Furthermore, the metadata contains the string literal "Hello World!", and the information that the assembly references System.Console.WriteLine in mscorlib.dll (on the .NET Framework).

Only this part gets compiled into IL:

var x = 2 + 2;
Console.WriteLine("Hello World!");

With the caveat that the method reference and the literal string are represented in the IL as metadata tokens, that is, pointers into the metadata. Conversely, each method declaration in the metadata has a pointer into the IL, to the code which implements the method body.
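
You can see this cross-referencing from code. The following is a rough sketch, not how a real tool would do it: it starts from a method's metadata, follows it to the raw IL bytes, finds the ldstr instruction (opcode 0x72) and resolves the token that follows it back to the string stored in the metadata. IlDemo, SayHello and FirstStringLiteral are names I made up, and the byte scan is deliberately naive; it only works because the method is tiny.

```csharp
using System;
using System.Reflection;

public static class IlDemo
{
    public static void SayHello()
    {
        Console.WriteLine("Hello World!");
    }

    // Returns the string referenced by the first ldstr instruction, or null if none is found.
    // Naive scan: a real tool would decode every opcode instead of searching for a byte value.
    public static string FirstStringLiteral(MethodInfo method)
    {
        // Metadata -> IL: the method body is located through the method's metadata.
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        const byte Ldstr = 0x72;

        for (int i = 0; i + 4 < il.Length; i++)
        {
            if (il[i] == Ldstr)
            {
                // IL -> metadata: the 4 bytes after the opcode are a little-endian token.
                int token = il[i + 1] | (il[i + 2] << 8) | (il[i + 3] << 16) | (il[i + 4] << 24);
                return method.Module.ResolveString(token);
            }
        }
        return null;
    }

    public static void Main()
    {
        MethodInfo method = typeof(IlDemo).GetMethod("SayHello");
        Console.WriteLine(FirstStringLiteral(method));
    }
}
```

On a regular JIT-based runtime this should print the literal without ever calling SayHello. GetMethodBody may return null in trimmed or ahead-of-time compiled builds, so treat this as an illustration only.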

So it comes down to a way to separate the executable (imperative) IL code from the non-executable (declarative) parts.

Why is this separation useful? Because it allows tools to extract and use the metadata without having to actually execute any of the IL. For example, Visual Studio is able to provide code completion for members defined in an assembly just by reading the metadata. The compiler can check that methods called from other assemblies actually exist, that the parameters match, and so on.
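
As a sketch of how such a tool might work, the System.Reflection.Metadata library can read the metadata tables straight from the file without loading the assembly or running any of its IL. MetadataLister and ListMethods are made-up names, and I'm assuming the library is available (it ships with modern .NET; on the .NET Framework it is a NuGet package).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.PortableExecutable;

public static class MetadataLister
{
    // Lists "Type.Method" names by reading metadata tables only.
    // The assembly is opened as a plain file; none of its code is executed.
    public static List<string> ListMethods(string assemblyPath)
    {
        var result = new List<string>();
        using (FileStream stream = File.OpenRead(assemblyPath))
        using (var peReader = new PEReader(stream))
        {
            MetadataReader reader = peReader.GetMetadataReader();
            foreach (TypeDefinitionHandle typeHandle in reader.TypeDefinitions)
            {
                TypeDefinition type = reader.GetTypeDefinition(typeHandle);
                string typeName = reader.GetString(type.Name);
                foreach (MethodDefinitionHandle methodHandle in type.GetMethods())
                {
                    MethodDefinition method = reader.GetMethodDefinition(methodHandle);
                    result.Add(typeName + "." + reader.GetString(method.Name));
                }
            }
        }
        return result;
    }

    public static void Main(string[] args)
    {
        // Assumption: with no argument, inspect this program's own assembly file.
        // Assembly.Location can be empty in single-file deployments.
        string path = args.Length > 0 ? args[0] : typeof(MetadataLister).Assembly.Location;
        foreach (string name in ListMethods(path))
        {
            Console.WriteLine(name);
        }
    }
}
```

Pass the path of any managed DLL or EXE as the first argument to list its types and methods.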

Tags: C#, .NET, Metadata