Identifying the author(s) of a piece of malware
There are a number of different techniques, depending on the skill level of the malware author:
- Embedded metadata - compiled programs can contain details about their authors. This is most commonly seen in legitimate programs, and shows up on the Details tab if you open the file's properties in Windows. Attackers who are out for fame might well put identifying details in these fields.
- Accidental embedding - compilers will often include details on the compiler flags used, which may well include paths to source files. If the source file was in `/users/evilbob/malware`, you can make a pretty good guess that evil bob wrote it. There are ways to turn off these inclusions, but everyone makes mistakes sometimes.
- Common code - malware authors are like any other programmer, and will reuse useful bits of code from previous work. It is sometimes possible to spot that a section of compiled code matches a previously discovered section of code so closely that it seems probable that the same source code was used for each. If that is the case, you can deduce that the second author had access to the code from the first, or may be the same person.
- Common toolchain - if a developer tends to use Visual Studio, it would be unexpected to see their code turning up compiled with GCC. If they use a specific packer, it would be strange to see them using a different packer. It's not perfect, but it could suggest a distinction.
- Common techniques - similar to the above, coders often have specific patterns of coding. People are unlikely to switch patterns, so if some compiled code couldn't have been generated from a particular coding style, you can make a reasonable guess that it wasn't written by someone who is only known to use that style. This is much easier with interpreted languages, as seeing consistent use of, say, `for` loops rather than `while` loops is easier than spotting the differences between the compiled output of each (modern compilers may well reduce them to exactly the same set of instructions).
- Malware origin - where did it come from? Does it have any text in specific languages, or typos which suggest a particular background? (e.g. `colour` would suggest that the author wasn't American, `generale` might suggest someone used to writing in a Romance language such as French or Italian)
None of these is enough on its own to determine an author, but combined, they might suggest a common author with previous malware, or even with other known code (e.g. from open-source projects).
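As a rough illustration of two of the checks above (accidental embedding and common code), here is a minimal Python sketch. The function names, the path pattern, and the sample bytes are all made up for the example. Comparing raw byte n-grams is a crude stand-in for real code-similarity analysis, which would normally work on disassembled functions rather than raw bytes.

```python
import re

# Home-directory style paths such as /users/evilbob/... or C:\Users\evilbob\...
# This pattern is an illustrative assumption; real build paths vary widely.
USER_PATH = re.compile(rb"(?i)(?:[a-z]:)?[\\/](?:users|home)[\\/]([a-z0-9._-]+)[\\/]")

def usernames_from_paths(data: bytes) -> set[str]:
    """Return user names found in home-directory paths embedded in raw bytes."""
    return {m.group(1).decode("ascii").lower() for m in USER_PATH.finditer(data)}

def ngram_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the sets of n-byte sequences in a and b (0.0 to 1.0)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

# Made-up bytes standing in for part of a compiled binary.
sample = b"\x00\x01/users/evilbob/malware/main.c\x00C:\\Users\\EvilBob\\src\\x.pdb\x00"
print(usernames_from_paths(sample))
```

A high similarity score or a recovered user name is only a hint. As with everything else here, it needs to be combined with other evidence, and embedded paths can be planted deliberately.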
Matthew's answer was excellent. There are a few other ways as well.
- Not a whole lot of malware authors are all that bright. For example, you can open a lot of executables in Notepad and look for string data. I've seen countless authors who simply put their email address, server name, username, and password inside the program as plain strings, and they literally show up in Notepad.
- Reverse-engineering the malware, for cases where the author obfuscated the strings so that the step above doesn't work.
- Finding the address which the malware connects to, and investigating anyone behind it. If it's a specific type of malware that infects a lot of machines, the developer is probably already known to begin with. If not, track it to the source. There are data trails everywhere.
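The "open it in Notepad" step can be automated. The sketch below pulls runs of printable ASCII out of a binary, much like the Unix `strings` tool, and then looks for anything shaped like an email address. The function names, the minimum length of 6, and the sample bytes (including the example.com address and password) are assumptions made up for illustration. It will find nothing if the author encoded or encrypted the strings.

```python
import re

def printable_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of at least min_len printable ASCII characters found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Deliberately loose email pattern; good enough for triage, not for validation.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_emails(data: bytes) -> set[str]:
    """Return email-like substrings found in the printable strings of data."""
    return {e for s in printable_strings(data) for e in EMAIL.findall(s)}

# Made-up bytes standing in for an executable with hard-coded credentials.
sample = b"\x00\x13MZ\x90\x00smtp.example.com\x00user=bob@example.com;pass=hunter2\x00\xff"
print(printable_strings(sample))
print(find_emails(sample))
```

For a real file you would read the bytes with something like `open(path, "rb").read()` and pass them in. Server names and addresses recovered this way are a starting point for the last item above.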