Metamorphic Code Examples
Publicly available metamorphic code samples are limited by several factors:
1) Expertise: Metamorphic coding is an extremely advanced technique in computer programming. The number of programmers capable of writing coherent, clean metamorphic code suitable for sampling is very small.
2) Financial Incentives: Metamorphic coding has limited use in commercial applications. As a result, most programmers with sufficient skill to create metamorphic code have no professional exposure to, or incentive to learn, metamorphic coding techniques.
3) Legitimacy: Metamorphic coding has significant applications in potent virus creation. Hence any responsible professional who created metamorphic code would have ethical issues with freely distributing samples, as an amateur hacker might use the code to enhance a malicious attack. Conversely, any hacker competent enough to create metamorphic code would have no incentive to advertise that skill: should one of their attacks be uncovered, they would be on a very short list of suspects based on competency.
4) Secrecy: Lastly, and probably the most realistic reason metamorphic code is so difficult to find: any programmer who demonstrates competency in metamorphic programming, and is not apprehended by authorities for cyber crimes, is likely to be recruited by a government security agency, private security firm, or anti-virus company, and the programmer's subsequent research and knowledge is then subject to a non-disclosure agreement to maintain a competitive edge.
Why only C/C++ examples?
You mention finding only C/C++ code examples of poly/metamorphic programming and inferred that only languages close to the hardware can be poly/metamorphic. This is true for the strictest definitions of poly/metamorphic code. Interpreted languages can have poly/metamorphic behavior but rely on a statically compiled interpreter to execute, hence a large portion of the 'run-time signature' is not mutable. Only compiled low-level languages offer the computational flexibility to have a highly mutable 'run-time signature.'
Here is some 'polymorphic' PHP code I wrote. PHP being an interpreted language and not a compiled language makes true polymorphism impossible.
PHP Code:
<?php
// Program's functional execution section
system("echo Hello World!!\\n");
// mutable malicious payload goes here (if you were devious)
// Program's polymorphic engine section
recombinate();
?>
<?php
function recombinate() {
$file = __FILE__; //assigns file path to $file using magic constant
$contents = file_get_contents($file); //gets file contents as string
$fileLines = explode("\n", $contents); //splits into file lines as string array
$varLine = $fileLines[2]; //extracts third file line as string
$charArr = str_split($varLine); //splits third line into char array
$augStr = augmentStr($charArr); //recursively augments char array
$newLine = implode("",$augStr); //rebuilds char array into string
$fileLines[2] = $newLine; //inserts string back into array
$newContents = implode("\n",$fileLines); //rebuilds array into single string
file_put_contents($file,$newContents); //writes out augmented file
sleep(1); //let the CPU rest
$pid = pcntl_fork(); //forks process
if($pid) { //if in parent:
exit(0); //exit parent process
} //WARNING: creates 'Zombie' child process
else { //else in child process
system("nohup php -f " .$file . " 2> /dev/null"); //executes augmented file
exit(0); //exits child process
}
}
function augmentStr($inArr) {
if (mt_rand(0,6) < 5) { //determines mutability
/*$startIndex & $endIndex define mutable parts of file line as Xs
* system("echo XXXXX ... XXXXX\\n");
* 01234567890123 -7654321
*/
$startIndex = 13;
$endIndex = count($inArr)-7;
$targetIndex = mt_rand($startIndex,$endIndex); //choose mutable index
$inArr[$targetIndex] = getSafeChar(mt_rand(0,62)); //mutate index
$inArr = augmentStr($inArr); //recurse
}
return $inArr;
}
function getSafeChar($inNum) { //cannot use escaped characters
$outChar; //must be a standard PHP char
if ($inNum >= 0 && $inNum <= 9 ) { $outChar = chr($inNum + 48); }
else if ($inNum >= 10 && $inNum <= 35) { $outChar = chr($inNum + 55); }
else if ($inNum >= 36 && $inNum <= 61) { $outChar = chr($inNum + 61); }
else if ($inNum == 62) { $outChar = " "; }
else { $outChar = " "; }
return $outChar;
}
?>
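WARNING: The script keeps re-launching itself as a background process (strictly an orphan rather than a zombie, since the parent exits first). Know how to find and kill that process before running the code.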
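For comparison, the index arithmetic used by augmentStr and getSafeChar can be sketched in C. This is only an illustration and not a port of the script: safe_char, mutate_at, MUT_START and MUT_TAIL are names I made up, and the line layout is assumed to match the system("echo ...") line above.

```c
#include <string.h>

#define MUT_START 13  /* assumed: first index after  system("echo  */
#define MUT_TAIL   7  /* assumed: last mutable index is length - MUT_TAIL */

/* Maps 0..61 onto '0'-'9', 'A'-'Z', 'a'-'z'; anything else becomes a space. */
char safe_char(int n) {
    if (n >= 0 && n <= 9)   return (char)('0' + n);
    if (n >= 10 && n <= 35) return (char)('A' + (n - 10));
    if (n >= 36 && n <= 61) return (char)('a' + (n - 36));
    return ' ';
}

/* Overwrites line[idx] only when idx lies inside the mutable window.
   Returns 1 if the line was changed, 0 otherwise. */
int mutate_at(char *line, size_t idx, int n) {
    size_t len = strlen(line);
    if (len < MUT_START + MUT_TAIL) return 0;               /* window is empty */
    if (idx < MUT_START || idx > len - MUT_TAIL) return 0;  /* outside window */
    line[idx] = safe_char(n);
    return 1;
}
```

Keeping the window check separate from the character mapping makes it easy to see that the command prefix and the trailing \\n"); are never touched.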
Information Finding Techniques:
This article contains more specific information than Wikipedia. It does not, however, contain true source code. If you would like my advice: although it is highly unlikely that you will find sample source code, you may be able to find sufficient academic documentation to create your own metamorphic code. Consider this to start (Google Scholar):
When reading academic articles/papers, be sure to look at the sources at the end of the document, as these sources may also have valuable information.
Best of luck in your quest for knowledge!
Below is an example of what I believe would classify as metamorphic code written in C. I'm afraid I don't have a great deal of experience writing portable C code, so it may require some modification to compile on other platforms (I'm using an old version of Borland on Windows). Also, it relies on the target platform being x86 since it involves some machine code generation. In theory it should compile on any x86 OS though.
How it works
Each time the program is run, it generates a randomly modified copy of itself, with a different filename. It also prints out a list of offsets that have been modified so you can see it actually doing something.
The modification process is very simplistic. The source code is just interspersed with sequences of assembly instructions that effectively do nothing. When the program is run, it finds these sequences and randomly replaces them with different code (which obviously also does nothing).
Hardcoding a list of offsets obviously isn't realistic for something that other people need to be able to compile, so the sequences are generated in a way that makes them easy to identify in a search through the object code, hopefully without matching any false positives.
Each sequence starts with a push operation on a certain register, a set of instructions that modify that register, and then a pop operation to restore the register to its initial value. To keep things simple, in the original source all of the sequences are just PUSH EAX, eight NOPs, and POP EAX. In all subsequent generations of the app, though, the sequences will be entirely random.
Explaining the code
I've split the code up into multiple parts so I can try to explain it step by step. If you want to compile it yourself, you'll just need to join all the parts together.
First some fairly standard includes:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
Next we have defines for various x86 opcodes. These will typically be combined with other values to generate a full instruction. For example, the PUSH define (0x50) by itself is PUSH EAX, but you can derive the values for other registers by adding an offset in the range 0 to 7. Same thing for POP and MOV.
#define PUSH 0x50
#define POP 0x58
#define MOV 0xB8
#define NOP 0x90
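To make the "add an offset" rule concrete, here is a small sketch. The enum and the helper names (push_op, pop_op, mov_imm_op) are my own and not part of the program; the register numbering is the standard x86 one.

```c
/* Standard x86 32-bit register numbers. */
enum x86_reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

#define PUSH 0x50
#define POP  0x58
#define MOV  0xB8

/* One-byte PUSH r32 / POP r32 opcodes, and the opcode byte of MOV r32, imm32. */
unsigned char push_op(enum x86_reg r)    { return (unsigned char)(PUSH + r); }
unsigned char pop_op(enum x86_reg r)     { return (unsigned char)(POP + r); }
unsigned char mov_imm_op(enum x86_reg r) { return (unsigned char)(MOV + r); }
```

So push_op(EBX) gives 0x53 (PUSH EBX), pop_op(ECX) gives 0x59 (POP ECX), and mov_imm_op(EDI) gives 0xBF, which must be followed by four immediate bytes.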
The next six defines are the prefix bytes (the first bytes) of several two-byte opcodes. The second byte encodes the operands and will be explained in more detail later.
#define ADD 0x01
#define AND 0x21
#define XOR 0x31
#define OR 0x09
#define SBB 0x19
#define SUB 0x29
const unsigned char prefixes[] = { ADD,AND,XOR,OR,SBB,SUB,0 };
JUNK is a macro that inserts our sequence of junk operations anywhere we want in the code. As I explained before, it's initially just writing out PUSH EAX, NOP, and POP EAX. JUNKLEN is the number of NOPs in that sequence - not the full length of the sequence.
And in case you're not aware, __emit__ is a pseudo-function that injects literal values directly into the object code. I suspect it may be something you need to port if you're using a different compiler.
#define JUNK __emit__(PUSH,NOP,NOP,NOP,NOP,NOP,NOP,NOP,NOP,POP)
#define JUNKLEN 8
Some global variables where our code will be loaded. Global variables are bad, but I'm not a particularly good coder.
unsigned char *code;
int codelen;
Next we have a simple function that will read our object code into memory. I never free that memory because I just don't care.
Notice the JUNK macro calls inserted at random points. You're going to see a lot more of these throughout the code. You can insert them almost anywhere, but if you're using a real C compiler (as opposed to C++) it'll complain if you try to put them before or in-between variable declarations.
void readcode(const char *filename) {
FILE *fp = fopen(filename, "rb"); JUNK;
fseek(fp, 0L, SEEK_END); JUNK;
codelen = ftell(fp);
code = malloc(codelen); JUNK;
fseek(fp, 0L, SEEK_SET);
fread(code, codelen, 1, fp); JUNK;
fclose(fp);
}
Another simple function to write the application out again after it has been modified. For the new filename we just replace the last character of the original filename with a digit that is incremented each time. No attempt is made to check whether the file already exists and that we're not overwriting a crucial piece of the operating system.
void writecode(const char *filename) {
FILE *fp;
int lastoffset = strlen(filename)-1;
char lastchar = filename[lastoffset];
char *newfilename = strdup(filename); JUNK;
lastchar = '0'+(isdigit(lastchar)?(lastchar-'0'+1)%10:0);
newfilename[lastoffset] = lastchar;
fp = fopen(newfilename, "wb"); JUNK;
fwrite(code, codelen, 1, fp); JUNK;
fclose(fp);
free(newfilename);
}
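The filename rule is easy to get wrong, so here it is in isolation as a sketch. The function name next_generation_name is made up for illustration, and unlike writecode it checks for an empty name and a failed allocation.

```c
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Returns a newly allocated copy of filename whose last character is replaced
   by the next generation digit: a digit d becomes (d + 1) % 10, anything else
   becomes '0'. The caller frees the result. Returns NULL on failure. */
char *next_generation_name(const char *filename) {
    size_t len = strlen(filename);
    unsigned char last;
    char *out;
    if (len == 0) return NULL;
    out = malloc(len + 1);
    if (out == NULL) return NULL;
    memcpy(out, filename, len + 1);
    last = (unsigned char)filename[len - 1];
    out[len - 1] = (char)('0' + (isdigit(last) ? (last - '0' + 1) % 10 : 0));
    return out;
}
```

For example, "meta.exe" becomes "meta.ex0", then "meta.ex1", and after "meta.ex9" the name wraps back to "meta.ex0", which is why older generations eventually get overwritten.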
This next function writes out a random instruction for our junk sequence. The reg parameter represents the register we're working with - what will be pushed and popped at either end of the sequence. The offset is the offset in the code where the instruction will be written. And space gives the number of bytes we have left in our sequence.
Depending on how much space we have, we may be restricted to which instructions we can write out, otherwise we choose at random whether it's a NOP, MOV or one of the others. NOP is just a single byte. MOV is five bytes: our MOV opcode (with the reg parameter added), and 4 random bytes representing the number moved into the register.
For the two byte sequences, the first is just one of our prefixes chosen at random. The second is a byte in the range 0xC0 to 0xFF where the least significant 3 bits represent the primary register - i.e. that must be set to the value of our reg parameter.
int writeinstruction(unsigned reg, int offset, int space) {
if (space < 2) {
code[offset] = NOP; JUNK;
return 1;
}
else if (space < 5 || rand()%2 == 0) {
code[offset] = prefixes[rand()%6]; JUNK;
code[offset+1] = 0xC0 + rand()%8*8 + reg; JUNK;
return 2;
}
else {
code[offset] = MOV+reg; JUNK;
*(short*)(code+offset+1) = rand();
*(short*)(code+offset+3) = rand(); JUNK;
return 5;
}
}
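The second byte written above is what the x86 manuals call a ModR/M byte with both operands in registers. As a sketch, with helper names that are mine rather than part of the program, the encoding and decoding look like this:

```c
/* Builds the second byte of a register-to-register instruction:
   top two bits 11 (both operands are registers), middle three bits the
   source register, low three bits the destination (primary) register. */
unsigned char modrm_reg_reg(unsigned src, unsigned dst) {
    return (unsigned char)(0xC0 | ((src & 7) << 3) | (dst & 7));
}

/* Field extractors for a byte produced by modrm_reg_reg. */
unsigned modrm_dst(unsigned char b)        { return b & 7; }
unsigned modrm_src(unsigned char b)        { return (b >> 3) & 7; }
int      modrm_is_reg_reg(unsigned char b) { return (b & 0xC0) == 0xC0; }
```

For instance, the bytes 0x01 0xCB decode as ADD EBX, ECX: modrm_reg_reg(1, 3) is 0xCB, with ECX (1) as the source and EBX (3) as the register being modified. That is why only the low three bits have to match reg, while the source bits can be anything.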
Now we have the equivalent function for reading back one of these instructions. Assuming we've already identified the reg from the PUSH and POP operations at either end of the sequence, this function can attempt to validate whether the instruction at the given offset is one of our junk operations and that the primary register matches the given reg parameter.
If it finds a valid match, it returns the instruction length, otherwise it returns zero.
int readinstruction(unsigned reg, int offset) {
unsigned c1 = code[offset];
if (c1 == NOP)
return 1; JUNK;
if (c1 == MOV+reg)
return 5; JUNK;
if (c1 != 0 && strchr((const char *)prefixes, c1)) { /* c1 == 0 would match the array's terminator */
unsigned c2 = code[offset+1]; JUNK;
if (c2 >= 0xC0 && c2 <= 0xFF && (c2&7) == reg)
return 2; JUNK;
} JUNK;
return 0;
}
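If you want to experiment with the decoder without touching a real executable, a bounds-checked variant that works on any byte buffer might look like the sketch below. The function name junk_insn_len and the two_byte_ops table are my own; the opcode values are the same ones defined earlier.

```c
#include <stddef.h>

#define NOP 0x90
#define MOV 0xB8

/* ADD, AND, XOR, OR, SBB, SUB in their "op r/m32, r32" form. */
static const unsigned char two_byte_ops[] = { 0x01, 0x21, 0x31, 0x09, 0x19, 0x29 };

/* Returns the length of the junk instruction at buf[off] that targets reg,
   or 0 if the bytes are not a recognised junk instruction or would run
   past the end of the buffer. */
size_t junk_insn_len(const unsigned char *buf, size_t len, size_t off, unsigned reg) {
    size_t k;
    if (off >= len) return 0;
    if (buf[off] == NOP) return 1;
    if (buf[off] == MOV + reg) return (off + 5 <= len) ? 5 : 0;
    for (k = 0; k < sizeof two_byte_ops; k++) {
        if (buf[off] == two_byte_ops[k]) {
            if (off + 2 > len) return 0;
            return (buf[off + 1] >= 0xC0 && (buf[off + 1] & 7) == reg) ? 2 : 0;
        }
    }
    return 0;
}
```

Using an explicit table and length also avoids the strchr trick, which relies on the opcode list being a zero-terminated string.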
This next function is the main loop that searches for and replaces the junk sequences. It starts by looking for a PUSH opcode followed, after eight intervening bytes (or whatever JUNKLEN was set to), by a POP opcode on the same register.
void replacejunk(void) {
int i, j, inc, space;
srand(time(NULL)); JUNK;
for (i = 0; i < codelen-JUNKLEN-2; i++) {
unsigned start = code[i];
unsigned end = code[i+JUNKLEN+1];
unsigned reg = start-PUSH;
if (start < PUSH || start >= PUSH+8) continue; JUNK;
if (end != POP+reg) continue; JUNK;
If the register turns out to be ESP, we can safely skip it because we'll never use ESP in our generated code (stack operations on ESP need special consideration that isn't worth the effort).
if (reg == 4) continue; /* register 4 is ESP */
Once we've matched a likely looking PUSH and POP combination, we then try to read the instructions in-between. If we successfully match the length of bytes we're expecting, we consider that a match that can be replaced.
j = 0; JUNK;
while (inc = readinstruction(reg,i+1+j)) j += inc;
if (j != JUNKLEN) continue; JUNK;
We then pick one of 7 registers at random (as explained before we don't consider ESP), and write out the PUSH and POP operations for that register at either end of the sequence.
reg = rand()%7; JUNK;
reg += (reg >= 4);
code[i] = PUSH+reg; JUNK;
code[i+JUNKLEN+1] = POP+reg; JUNK;
Then all we need to do is fill in the space in-between using our writeinstruction function.
space = JUNKLEN;
j = 0; JUNK;
while (space) {
inc = writeinstruction(reg,i+1+j,space); JUNK;
j += inc;
space -= inc; JUNK;
}
And here's where we display the offset that we just patched.
printf("%d\n",i); JUNK;
}
}
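Two details of that loop are worth pulling out on their own: the PUSH/POP framing test and the trick for choosing a register other than ESP. The sketch below uses helper names of my own (frame_reg, skip_esp) and works on an arbitrary buffer rather than the program's global code array.

```c
#include <stddef.h>

#define PUSH    0x50
#define POP     0x58
#define ESP_REG 4

/* Maps 0..6 onto the register numbers 0..7 with ESP (4) left out,
   the same arithmetic as: reg = rand()%7; reg += (reg >= 4); */
unsigned skip_esp(unsigned n) { return n + (n >= ESP_REG); }

/* Returns the register framing a candidate sequence at offset i, or -1 if
   buf[i] and buf[i + junklen + 1] are not a matching PUSH/POP pair on a
   usable register. The bytes in between still need to be validated. */
int frame_reg(const unsigned char *buf, size_t len, size_t i, size_t junklen) {
    unsigned reg;
    if (i + junklen + 1 >= len) return -1;            /* POP byte out of range */
    if (buf[i] < PUSH || buf[i] >= PUSH + 8) return -1;
    reg = (unsigned)(buf[i] - PUSH);
    if (reg == ESP_REG) return -1;
    if (buf[i + junklen + 1] != POP + reg) return -1;
    return (int)reg;
}
```

With a ten-byte buffer holding 0x50, eight NOPs and 0x58, frame_reg(buf, 10, 0, 8) returns 0 (EAX), and skip_esp turns the values 0 to 6 into 0, 1, 2, 3, 5, 6, 7.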
Finally we have the main function. This just calls the functions previously described. We read in the code, replace the junk, then write it out again. The argv[0] argument contains the application filename.
int main(int argc, char* argv[]) {
readcode(argv[0]); JUNK;
replacejunk(); JUNK;
writecode(argv[0]); JUNK;
return 0;
}
And that's all there is to it.
Some final notes
When running this code, obviously you need to make sure the user has the appropriate permissions to write out a file in the same location as the original code. Then once the new file has been generated, you'll typically need to rename it if you're on a system where the file extension is important, or set its execute attributes if that is needed.
Finally, I suspect you may want to run the generated code through a debugger rather than just executing it directly and hoping for the best. I found that if I copied the generated file over the original executable, the debugger was happy to let me step through it while still viewing the original source code. Then whenever you get to a point in the code that says JUNK, you can pop into the assembly view and look at the code that has been generated.
Anyway, I hope my explanations have been reasonably clear, and this was the kind of example you were looking for. If you have any questions, feel free to ask in the comments.
Bonus update
As a bonus, I thought I'd also include an example of metamorphic code in a scripting language. This is quite different from the C example, since in this case we need to mutate the source code, rather than the binary executable, which is a little easier I think.
For this example, I've made extensive use of PHP's goto statement. Every line starts with a label, and ends with a goto pointing to the label of the following line. That way each line is essentially self-contained, and we can happily shuffle them and still have the program work exactly as before.
Conditions and loop structures are a little more complicated, but they just need to be rewritten in the form of a condition that jumps to one of two different labels. I've included comment markers in the code where the loops would be to try and make it easier to follow.
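Since C also has goto, the same one-label-per-line idea can be illustrated there. This is just a sketch of the control-flow rewrite, not of the self-shuffling itself; the function sum_to and its labels are invented for the example.

```c
/* Sums 1..n. Every statement carries its own label and ends with a goto
   (or a return), so the physical order of the labelled lines does not
   matter. They are deliberately listed out of execution order here. */
int sum_to(int n) {
    int i = 0, total = 0;
    goto s1;
s4: total += i; goto s5;                     /* loop body */
s2: total = 0; goto s3;
s1: i = 1; goto s2;                          /* entry point */
s5: i++; goto s3;
s3: if (i <= n) goto s4; else goto s6;       /* loop condition */
s6: return total;
}
```

The labelled lines can be rearranged freely (as long as every path still ends in a goto or return) and sum_to(10) still returns 55, because execution order is carried entirely by the labels.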
Example code on ideone.com
All the code does is echo the shuffled copy of itself, so you can easily test it on ideone just by cutting and pasting the output back into the source field and running it again.
If you wanted it to mutate even more, it would be fairly easy to do something like replace all the labels and variables with a different set of random strings every time the code was run. But I thought it best to try and keep things as simple as possible. These examples are just meant to demonstrate the concept - we're not actually trying to avoid detection. :)
This answer is not finished, I will continue to expand on it over time, until this question's answer is complete
Scripted Example - PHP
I have made my own copy of the PHP script James Holderness provided, so that I could see for myself, through demonstration, how a metamorphic script could work. A full write-up of the code is here: http://null.53bits.co.uk/index.php?page=php-goto-replicator
In short, when the script is executed it copies itself to a new file with a random file name and with the lines of code in a new random order. It then forks a new process that executes the new copy, and the original process exits. The running script is now a copy of the original with a random file name and its lines in a different order. The process is perpetual: reorder and replicate, then execute a new instance (process) and exit the previous one.
I aimed to extend James Holderness's PHP answer a little, into a working self-replicating and morphing code example.
This is the raw PHP code I have come up with:
<?php goto a01;
a01: $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'; goto a02;
a02: $randomString = __DIR__."/"; goto a03;
a03: $i = 0; goto a04;
a04: if ($i < 10) goto a05; else goto a07;
a05: $randomString .= $characters[rand(0, strlen($characters) - 1)]; goto a06;
a06: $i++; goto a04;
a07: $randomString .= ".php"; goto a08;
a08: $ARGS=Array("-f",$randomString); goto a09;
a09: $handle_out = fopen("$randomString", "w"); goto l01;
l01: $filename = __FILE__; goto l02;
l02: $contents = file_get_contents($filename); goto l03;
l03: $lines = explode("\n",$contents); goto l04;
l04: $collection = array(); goto l05;
l05: $pattern = '%^[^:]+:.*goto [^;]+;$%'; goto l06;
l06: $i = 0; goto l07;
l07: if ($i < count($lines)-1) goto l08; else goto l23;
l08: $line = $lines[$i]; goto l09;
l09: $line = trim($line); goto l10;
l10: if (substr($line,0,2) != '//') goto l11; else goto l22;
l11: if (preg_match($pattern, $line) === 1) goto l12; else goto l13;
l12: $collection[] = $line; goto l22;
l13: shuffle($collection); goto l14;
l14: $j = 0; goto l15;
l15: if ($j < count($collection)) goto l16; else goto l19;
l16: echo $collection[$j]."\n"; goto l17;
l17: fwrite($handle_out, $collection[$j]."\n"); goto l18;
l18: $j++; goto l15;
l19: $collection = array(); goto l20;
l20: fwrite($handle_out, $line."\n"); goto l21;
l21: echo $line."\n"; goto l22;
l22: $i++; goto l07;
l23: fclose($handle_out); goto f01;
f01: $pid = pcntl_fork(); goto f02;
f02: if ($pid == -1) goto f03; else goto f04;
f03: die("Could not fork a new child\n"); goto f03;
f04: if ($pid) goto f05; else goto f06;
f05: exit(0); goto f05;
f06: $sid = posix_setsid(); goto f07;
f07: if ($sid < 0) goto f08; else goto f09;
f08: die("Child posix_setsid error\n"); goto f08;
f09: sleep(10); goto f10;
f10: pcntl_exec(PHP_BINARY, $ARGS);
l24: exit(0); goto l24;
?>