How can the content of a file refer to its own MD5?
'face2face' is only 9 characters, i.e. 36 bits, since each hexadecimal character encodes 4 bits. It suffices to generate many pictures with some internal variations (subtle variations that do not affect the graphical output) and hash them all until the target string is obtained. Since we are looking for a 36-bit pattern and accept that pattern wherever it appears in the 32-character output (32 - 9 + 1 = 24 possible positions), the average number of pictures to produce and hash will be about 2^36/24, i.e. about 2.86 billion. Since a basic desktop PC can compute many millions of MD5 hashes per second, this should take less than an hour with decently optimized code.
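The estimate and the brute-force search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expected_attempts` and `find_variant` are hypothetical helper names, and the appended `#<counter>` suffix is a made-up stand-in for whatever "subtle variation" the real picture format allows (an unused metadata field, for instance). A practical search for the full 9-character target would need much faster, parallel code than this loop.

```python
import hashlib
import itertools


def expected_attempts(target: str, digest_hex_len: int = 32) -> float:
    """Rough expected number of hashes before `target` shows up in the hex digest.

    Each hex character carries 4 bits, and the target may start at any of
    (digest_hex_len - len(target) + 1) positions. Overlaps between positions
    are ignored, which is fine for an order-of-magnitude estimate.
    """
    positions = digest_hex_len - len(target) + 1
    return 2 ** (4 * len(target)) / positions


def find_variant(base: bytes, target: str) -> tuple[bytes, str]:
    """Append a varying suffix to `base` until its MD5 hex digest contains `target`.

    The b"#<counter>" suffix is a hypothetical placeholder for a variation
    that does not change how the picture renders.
    """
    for counter in itertools.count():
        candidate = base + b"#" + str(counter).encode()
        digest = hashlib.md5(candidate).hexdigest()
        if target in digest:
            return candidate, digest
    raise AssertionError("unreachable: itertools.count() never ends")


print(f"{expected_attempts('face2face'):.3e}")  # 2.863e+09, i.e. 2^36/24
```

With a short target such as `find_variant(b"picture", "ab")` the loop returns almost immediately; each extra hex character multiplies the expected work by 16.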
This has nothing to do with the known weaknesses of MD5 with regard to collisions. The same could be done with SHA-1 or SHA-256.
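To illustrate that the approach is hash-agnostic, here is a small sketch (hypothetical helper names) that applies the same estimate to any `hashlib` algorithm. Longer digests actually offer more positions for the pattern: SHA-1 gives 40 hex characters (32 positions) and SHA-256 gives 64 (56 positions), so the expected work is slightly lower than with MD5.

```python
import hashlib


def contains_target(data: bytes, target: str, algorithm: str = "md5") -> bool:
    """Return True if the hex digest of `data` under `algorithm` contains `target`."""
    return target in hashlib.new(algorithm, data).hexdigest()


def expected_attempts_for(target: str, algorithm: str = "md5") -> float:
    """Rough expected number of trials, derived from the algorithm's digest size."""
    digest_hex_len = 2 * hashlib.new(algorithm).digest_size  # 2 hex chars per byte
    positions = digest_hex_len - len(target) + 1
    return 2 ** (4 * len(target)) / positions


for name in ("md5", "sha1", "sha256"):
    # md5 2.863e+09, sha1 2.147e+09, sha256 1.227e+09
    print(name, f"{expected_attempts_for('face2face', name):.3e}")
```

Only the digest length changes the numbers; nothing here relies on a cryptographic weakness of the hash function.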
This has already been discussed in this question.