C library to read EXE version from Linux?
The version of the file is in the VS_FIXEDFILEINFO struct, but you have to find it in the executable data. There are two ways of doing what you want:
- Search for the VS_VERSION_INFO signature in the file and read the VS_FIXEDFILEINFO struct directly.
- Find the .rsrc section, parse the resource tree, find the RT_VERSION resource, parse it and extract the VS_FIXEDFILEINFO data.
The first one is easier, but the signature can also turn up by chance in the wrong place. Moreover, the other data you ask for (product name, description, etc.) is not in this structure, so I'll try to explain how to obtain the data the hard way.
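For comparison, here is a minimal sketch of the easy way. It assumes the whole file is already in memory, and instead of the VS_VERSION_INFO name it scans for the dwSignature value that opens every VS_FIXEDFILEINFO (0xFEEF04BD). ScanFixedVersion and rd32 are names made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Scan buf for the VS_FIXEDFILEINFO signature (0xFEEF04BD).
 * On the first match, store the four file version numbers in ver[]
 * and return 1. Return 0 if the signature is not found.
 * Nothing proves that a match really is a version resource. */
int ScanFixedVersion(const unsigned char *buf, size_t size, unsigned ver[4])
{
    size_t i;
    /* We read the fields at offsets 0, 8 and 12, so we need 16 bytes. */
    for (i = 0; i + 16 <= size; ++i)
    {
        if (rd32(buf + i) != 0xFEEF04BDu)
            continue;
        uint32_t ms = rd32(buf + i + 8);   /* dwFileVersionMS */
        uint32_t ls = rd32(buf + i + 12);  /* dwFileVersionLS */
        ver[0] = ms >> 16;
        ver[1] = ms & 0xFFFF;
        ver[2] = ls >> 16;
        ver[3] = ls & 0xFFFF;
        return 1;
    }
    return 0;
}
```

This only yields the numeric version. The strings (product name, description, etc.) still need the resource walk described in the rest of this answer.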
The PE format is a bit convoluted so I'm pasting the code piece by piece, with comments, and with minimum error checking. I'll write a simple function that dumps the data to the standard output. Writing it as a proper function is left as an exercise to the reader :)
Note that I will be using offsets in the buffer instead of mapping the structs directly to avoid portability problems related to the alignment or padding of the struct fields. Anyway, I've annotated the type of the structs used (see include file winnt.h for details).
First the headers and a few useful declarations. They should be self-explanatory:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
typedef uint32_t DWORD;
typedef uint16_t WORD;
typedef uint8_t BYTE;
#define READ_BYTE(p) (((unsigned char*)(p))[0])
#define READ_WORD(p) ((((unsigned char*)(p))[0]) | ((((unsigned char*)(p))[1]) << 8))
//the DWORD casts avoid shifting a promoted int into its sign bit
#define READ_DWORD(p) (((DWORD)((unsigned char*)(p))[0]) | ((DWORD)((unsigned char*)(p))[1] << 8) | \
((DWORD)((unsigned char*)(p))[2] << 16) | ((DWORD)((unsigned char*)(p))[3] << 24))
#define PAD(x) (((x) + 3) & 0xFFFFFFFC)
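If you prefer functions to macros, the same readers could be written as below. This is only a sketch: read_le16, read_le32 and pad4 are hypothetical names, and the rest of this answer keeps using the macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from any (possibly unaligned) address. */
static inline uint16_t read_le16(const void *p)
{
    const unsigned char *b = p;
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Read a little-endian 32-bit value from any (possibly unaligned) address. */
static inline uint32_t read_le32(const void *p)
{
    const unsigned char *b = p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Round x up to the next multiple of 4. */
static inline size_t pad4(size_t x)
{
    return (x + 3) & ~(size_t)3;
}
```

Unlike the macros, these evaluate their argument only once, and they give the same result on big-endian hosts because the bytes are assembled by hand.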
Then a function that finds the Version resource in the executable image (no size checks).
const char *FindVersion(const char *buf)
{
The first structure in the EXE is the MZ header (for compatibility with MS-DOS).
//buf is a IMAGE_DOS_HEADER
if (READ_WORD(buf) != 0x5A4D) //MZ signature
return NULL;
The only interesting field in the MZ header is the offset of the PE header. The PE header is the real thing.
//pe is a IMAGE_NT_HEADERS32
const char *pe = buf + READ_DWORD(buf + 0x3C);
if (READ_WORD(pe) != 0x4550) //PE signature
return NULL;
Actually, the PE header is quite boring. We want the COFF header, which has all the symbolic data.
//coff is a IMAGE_FILE_HEADER
const char *coff = pe + 4;
We just need the following fields from this one.
WORD numSections = READ_WORD(coff + 2);
WORD optHeaderSize = READ_WORD(coff + 16);
if (numSections == 0 || optHeaderSize == 0)
return NULL;
The optional header is actually mandatory in an EXE and it comes right after the COFF header. The magic is different for 32-bit and 64-bit Windows executables. I'm assuming 32 bits from here on.
//optHeader is a IMAGE_OPTIONAL_HEADER32
const char *optHeader = coff + 20;
if (READ_WORD(optHeader) != 0x10b) //Optional header magic (32 bits)
return NULL;
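As an aside, 64-bit executables (PE32+) use the magic 0x20b, and their data directory array starts at offset 112 of the optional header instead of 96. A sketch of how both formats could be told apart follows. DataDirOffset is a hypothetical helper that the code in this answer does not use.

```c
/* optHeader points to the start of the optional header.
 * Returns the offset of the IMAGE_DATA_DIRECTORY array inside it,
 * or 0 if the magic is not one of the two known values. */
static unsigned DataDirOffset(const unsigned char *optHeader)
{
    unsigned magic = optHeader[0] | (optHeader[1] << 8);
    switch (magic)
    {
    case 0x10b: return 96;   /* PE32  (IMAGE_OPTIONAL_HEADER32) */
    case 0x20b: return 112;  /* PE32+ (IMAGE_OPTIONAL_HEADER64) */
    default:    return 0;
    }
}
```

With it, dataDir would be computed as optHeader + DataDirOffset(optHeader). The section table and the resource tree have the same layout in both formats, so the rest of the walk should not need changes.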
Here comes the interesting part: we want to find the resources section. It has two parts: 1. the section data, 2. the section metadata.
The data location is in a table at the end of the optional header, and each section has a well-known index in this table. The resource section is at index 2, so we obtain the virtual address (VA) of the resource section with:
//dataDir is an array of IMAGE_DATA_DIRECTORY
const char *dataDir = optHeader + 96;
DWORD vaRes = READ_DWORD(dataDir + 8*2);
//secTable is an array of IMAGE_SECTION_HEADER
const char *secTable = optHeader + optHeaderSize;
To get the section metadata we need to iterate the section table looking for a section named .rsrc.
int i;
for (i = 0; i < numSections; ++i)
{
//sec is a IMAGE_SECTION_HEADER*
const char *sec = secTable + 40*i;
char secName[9];
memcpy(secName, sec, 8);
secName[8] = 0;
if (strcmp(secName, ".rsrc") != 0)
continue;
The section struct has two relevant members: the VA of the section and the offset of the section into the file (also the size of the section, but I'm not checking it!):
DWORD vaSec = READ_DWORD(sec + 12);
const char *raw = buf + READ_DWORD(sec + 20);
Now the offset in the file that corresponds to the vaRes VA we got before is easy.
const char *resSec = raw + (vaRes - vaSec);
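If you do want the size check I'm skipping here, the conversion could be wrapped in a helper like the one below. It is a sketch under the assumption that the file size is known. RvaToFileOffset and rd32 are made-up names, and the offsets are those of IMAGE_SECTION_HEADER.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from an unaligned byte pointer. */
static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* sec points to a 40-byte IMAGE_SECTION_HEADER.
 * Returns 1 and stores the file offset in *out if rva falls inside the
 * raw data of the section and inside the file. Returns 0 otherwise. */
int RvaToFileOffset(const unsigned char *sec, uint32_t rva,
                    size_t fileSize, size_t *out)
{
    uint32_t vaSec   = rd32(sec + 12); /* VirtualAddress */
    uint32_t rawSize = rd32(sec + 16); /* SizeOfRawData */
    uint32_t rawPtr  = rd32(sec + 20); /* PointerToRawData */

    if (rva < vaSec || rva - vaSec >= rawSize)
        return 0;
    size_t off = (size_t)rawPtr + (rva - vaSec);
    if (off >= fileSize)
        return 0;
    *out = off;
    return 1;
}
```

The same helper would also cover the last step of the function, where the VA of the version data is turned into a pointer.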
This is a pointer to the resource data. All the individual resources are set up in the form of a tree, with 3 levels: 1) type of resource, 2) identifier of resource, 3) language of resource. For the version we will get the very first one of the correct type.
First, we have a resource directory (for the type of resource). We get the number of entries in the directory, both named and unnamed, and iterate:
WORD numNamed = READ_WORD(resSec + 12);
WORD numId = READ_WORD(resSec + 14);
int j;
for (j = 0; j < numNamed + numId; ++j)
{
For each resource entry we get the type of the resource and discard it if it is not the RT_VERSION constant (16).
//resSec is a IMAGE_RESOURCE_DIRECTORY followed by an array
// of IMAGE_RESOURCE_DIRECTORY_ENTRY
const char *res = resSec + 16 + 8 * j;
DWORD name = READ_DWORD(res);
if (name != 16) //RT_VERSION
continue;
If it is an RT_VERSION entry, we move on to the next resource directory in the tree:
DWORD offs = READ_DWORD(res + 4);
if ((offs & 0x80000000) == 0) //is a dir resource?
return NULL;
//verDir is another IMAGE_RESOURCE_DIRECTORY and
// IMAGE_RESOURCE_DIRECTORY_ENTRY array
const char *verDir = resSec + (offs & 0x7FFFFFFF);
Then we go on to the next directory level. We don't care about the ID of this one:
numNamed = READ_WORD(verDir + 12);
numId = READ_WORD(verDir + 14);
if (numNamed == 0 && numId == 0)
return NULL;
res = verDir + 16;
offs = READ_DWORD(res + 4);
if ((offs & 0x80000000) == 0) //is a dir resource?
return NULL;
The third level has the language of the resource. We don't care either, so just grab the first one:
//and yet another IMAGE_RESOURCE_DIRECTORY, etc.
verDir = resSec + (offs & 0x7FFFFFFF);
numNamed = READ_WORD(verDir + 12);
numId = READ_WORD(verDir + 14);
if (numNamed == 0 && numId == 0)
return NULL;
res = verDir + 16;
offs = READ_DWORD(res + 4);
if ((offs & 0x80000000) != 0) //is a dir resource?
return NULL;
verDir = resSec + offs;
And we get to the real resource, well, actually a struct that contains the location and size of the real resource, but we don't care about the size.
DWORD verVa = READ_DWORD(verDir);
That's the VA of the version resource, which is easily converted into a pointer.
const char *verPtr = raw + (verVa - vaSec);
return verPtr;
And done! If not found, return NULL.
}
return NULL;
}
return NULL;
}
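The three directory levels all share the same layout, so the lookup by ID could be factored out. The following is a sketch of such a helper, not a tested replacement for the function above. FindResEntry, rd16 and rd32 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

static unsigned rd16(const unsigned char *p)
{
    return p[0] | (p[1] << 8);
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* dir points to an IMAGE_RESOURCE_DIRECTORY (16 bytes) followed by its
 * array of 8-byte IMAGE_RESOURCE_DIRECTORY_ENTRY items.
 * Returns the entry whose integer ID equals id, or NULL if there is none. */
const unsigned char *FindResEntry(const unsigned char *dir, uint32_t id)
{
    unsigned total = rd16(dir + 12) + rd16(dir + 14); /* named + ID entries */
    unsigned i;
    for (i = 0; i < total; ++i)
    {
        const unsigned char *entry = dir + 16 + 8 * i;
        /* Named entries have the top bit set, so they never match a small id. */
        if (rd32(entry) == id)
            return entry;
    }
    return NULL;
}
```

For the first level you would call it with 16 (RT_VERSION). At the other two levels the code above simply takes the first entry, which is dir + 16.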
Now that the version resource is found, we have to parse it. It is actually a tree (what else) of "name" / "value" pairs. Some values are well known and those are what you are looking for; just run a few tests and you will find out which ones.
NOTE: All strings are stored in UNICODE (UTF-16) but my sample code does the dumb conversion into ASCII. Also, no checks for overflow.
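If the dumb conversion is not good enough (product names with accents, for example), a converter to UTF-8 with a bounded output buffer could look like this sketch. Utf16leToUtf8 is a made-up name, and replacing unpaired surrogates with '?' is just one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert the zero-terminated UTF-16LE string at src into UTF-8 in dst.
 * Writes at most dstSize bytes, including the terminating 0, and returns
 * the number of bytes written, not counting the terminator.
 * Output is truncated if it does not fit. */
size_t Utf16leToUtf8(const unsigned char *src, char *dst, size_t dstSize)
{
    size_t n = 0;
    if (dstSize == 0)
        return 0;
    for (;;)
    {
        uint32_t c = (uint32_t)src[0] | ((uint32_t)src[1] << 8);
        src += 2;
        if (c == 0)
            break;
        if (c >= 0xD800 && c <= 0xDBFF)            /* high surrogate */
        {
            uint32_t lo = (uint32_t)src[0] | ((uint32_t)src[1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                src += 2;
            }
            else
                c = '?';
        }
        else if (c >= 0xDC00 && c <= 0xDFFF)       /* stray low surrogate */
            c = '?';

        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (n + need >= dstSize)
            break;                                  /* keep room for the 0 */
        if (need == 1)
            dst[n++] = (char)c;
        else if (need == 2)
        {
            dst[n++] = (char)(0xC0 | (c >> 6));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
        else if (need == 3)
        {
            dst[n++] = (char)(0xE0 | (c >> 12));
            dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
        else
        {
            dst[n++] = (char)(0xF0 | (c >> 18));
            dst[n++] = (char)(0x80 | ((c >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[n] = 0;
    return n;
}
```

Note that it still trusts the input to be zero-terminated, so it does not replace a bounds check against the size of the resource.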
The function takes the pointer to the version resource and the offset into this memory (0 for starters) and returns the number of bytes analyzed.
int PrintVersion(const char *version, int offs)
{
First of all, the offset has to be a multiple of 4.
offs = PAD(offs);
Then we get the properties of the version tree node.
WORD len = READ_WORD(version + offs);
offs += 2;
WORD valLen = READ_WORD(version + offs);
offs += 2;
WORD type = READ_WORD(version + offs);
offs += 2;
The name of the node is a Unicode zero-terminated string.
char info[200];
int i;
for (i=0; i < 200; ++i)
{
WORD c = READ_WORD(version + offs);
offs += 2;
info[i] = c;
if (!c)
break;
}
More padding, if necessary:
offs = PAD(offs);
If type is not 0, then the value is a string.
if (type != 0) //TEXT
{
char value[200];
for (i=0; i < valLen; ++i)
{
WORD c = READ_WORD(version + offs);
offs += 2;
value[i] = c;
}
value[i] = 0;
printf("info <%s>: <%s>\n", info, value);
}
Else, if the name is VS_VERSION_INFO, then it is a VS_FIXEDFILEINFO struct. Otherwise it is binary data.
else
{
if (strcmp(info, "VS_VERSION_INFO") == 0)
{
I'm just printing the version of the file and product, but you can find the other fields of this struct easily. Beware of the mixed endian order.
//fixed is a VS_FIXEDFILEINFO
const char *fixed = version + offs;
WORD fileA = READ_WORD(fixed + 10);
WORD fileB = READ_WORD(fixed + 8);
WORD fileC = READ_WORD(fixed + 14);
WORD fileD = READ_WORD(fixed + 12);
WORD prodA = READ_WORD(fixed + 18);
WORD prodB = READ_WORD(fixed + 16);
WORD prodC = READ_WORD(fixed + 22);
WORD prodD = READ_WORD(fixed + 20);
printf("\tFile: %d.%d.%d.%d\n", fileA, fileB, fileC, fileD);
printf("\tProd: %d.%d.%d.%d\n", prodA, prodB, prodC, prodD);
}
offs += valLen;
}
Now do the recursive call to print the full tree.
while (offs < len)
offs = PrintVersion(version, offs);
And some more padding before returning.
return PAD(offs);
}
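If you want to do something other than printing, the header parsing could be separated from the output. The sketch below only decodes one node header and locates its value. VersionNode, ReadNode and rd16 are hypothetical names, and there are still no bounds checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type describing one node of the version tree. */
typedef struct {
    unsigned len;       /* wLength: size of the whole node, in bytes */
    unsigned valLen;    /* wValueLength */
    unsigned type;      /* wType: 0 = binary, otherwise text */
    size_t   keyOffs;   /* offset of the UTF-16 name */
    size_t   valueOffs; /* offset of the value, aligned to 4 bytes */
} VersionNode;

static unsigned rd16(const unsigned char *p)
{
    return p[0] | (p[1] << 8);
}

/* Decode the header of the node that starts at offs.
 * offs must already be a multiple of 4. */
VersionNode ReadNode(const unsigned char *version, size_t offs)
{
    VersionNode n;
    n.len     = rd16(version + offs);
    n.valLen  = rd16(version + offs + 2);
    n.type    = rd16(version + offs + 4);
    n.keyOffs = offs + 6;

    size_t p = n.keyOffs;
    while (rd16(version + p) != 0)  /* skip the name */
        p += 2;
    p += 2;                         /* and its terminator */
    n.valueOffs = (p + 3) & ~(size_t)3;
    return n;
}
```

For the root node the name is VS_VERSION_INFO (15 characters), so the value, which is the VS_FIXEDFILEINFO struct, starts at offset 40.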
Finally, as a bonus, a main function.
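int main(int argc, char **argv)
{
    struct stat st;
    if (argc < 2)
    {
        fprintf(stderr, "Usage: %s FILE\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) < 0)
    {
        perror(argv[1]);
        return 1;
    }
    char *buf = malloc(st.st_size);
    FILE *f = fopen(argv[1], "rb"); //binary mode, it matters outside POSIX
    if (!buf || !f)
    {
        perror(argv[1]);
        return 2;
    }
    //a short read would leave part of the buffer uninitialized
    if (fread(buf, 1, st.st_size, f) != (size_t)st.st_size)
    {
        fprintf(stderr, "%s: short read\n", argv[1]);
        return 3;
    }
    fclose(f);
    const char *version = FindVersion(buf);
    if (!version)
        printf("No version\n");
    else
        PrintVersion(version, 0);
    free(buf);
    return 0;
}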
I've tested it with a few random EXEs and it seems to work just fine.
I know pev is a tool on Ubuntu that allows you to see this information, along with a lot of other PE header info. I also know it's written in C. Maybe you'll want to have a look at it. A bit from its history section in the docs:
pev has born in 2010 from a simple need: a program to find out the version (File Version) of a PE32 file and that could be run in Linux. This version number is stored in Resources (.rsrc) section but at the time we've decided to simply search for the string in the whole binary, without any optimization.
Later on we've decided to parse the PE32 file until reach .rsrc section and get the File Version field. In order to do that, we realized we must parse the entire file and we thought if we could print out all the fields and values as well...
Until version 0.40, pev was an unique program for parse the PE headers and sections (now readpe is responsible for this). In version 0.50 we focused on malware analysis and splitted pev into various programs beyond a library, called libpe. Currently all pev programs use libpe.