Smallest possible runnable Mach-O executable

The smallest runnable Mach-O file has to be at least 0x1000 bytes. Because of an XNU limitation, the file has to be at least PAGE_SIZE bytes long. See xnu-4570.1.46/bsd/kern/mach_loader.c, around line 1600.

However, if we don't count that padding and only count the meaningful payload, then the minimal file size runnable on macOS is 0xA4 bytes.

It has to start with a mach_header (or a fat_header / mach_header_64, but those are bigger).

struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

Its size is 0x1C bytes.
magic has to be MH_MAGIC.
I'll be using CPU_TYPE_X86 for cputype, since it's an x86_32 executable.
filetype has to be MH_EXECUTE for an executable; ncmds and sizeofcmds depend on the load commands and have to be valid.
flags aren't that important, and the field is too small to be worth reusing for anything else.
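
To make the field layout concrete, here is a minimal Python sketch that packs the header with the struct module. The numeric values are read off the xxd dump at the end of this post; the constant names are assumed to match <mach-o/loader.h>, and the two command sizes (0x38 and 0x50) are the ones derived further down.

```python
import struct

# Values as they appear in the hex dump; names assumed from <mach-o/loader.h>.
MH_MAGIC = 0xFEEDFACE
CPU_TYPE_X86 = 7
CPU_SUBTYPE_I386_ALL = 3
MH_EXECUTE = 2

# Seven little-endian 32-bit fields: magic, cputype, cpusubtype, filetype,
# ncmds, sizeofcmds, flags.
MACH_HEADER = struct.Struct("<IiiIIII")

header = MACH_HEADER.pack(
    MH_MAGIC,
    CPU_TYPE_X86,
    CPU_SUBTYPE_I386_ALL,
    MH_EXECUTE,
    2,            # ncmds: one LC_SEGMENT plus one LC_UNIXTHREAD
    0x38 + 0x50,  # sizeofcmds: combined size of those two commands
    0x01200000,   # flags, copied from the dump
)

print(hex(len(header)))
print(header.hex())
```

The packed bytes should line up with the first 0x1C bytes of the dump.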

Next come the load commands. The header has to lie entirely within one mapping with R-X rights -- again, an XNU limitation.
We'd also need to place our code in some R-X mapping, so this is fine.
For that we need a segment_command.

Let's look at the definition.

struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};

cmd has to be LC_SEGMENT, and cmdsize has to be sizeof(struct segment_command) => 0x38.
segname contents don't matter, and we'll use that later.

vmaddr has to be a valid address (I'll use 0x1000), vmsize has to be valid and a multiple of PAGE_SIZE, fileoff has to be 0, and filesize has to be smaller than the size of the file but at least large enough to cover the mach_header (I've used sizeof(header) + header.sizeofcmds).

initprot has to be VM_PROT_READ | VM_PROT_EXECUTE, and maxprot has to include at least those bits. maxprot usually also has VM_PROT_WRITE (the dump below uses 7 for maxprot and 5 for initprot).
nsects is 0, since we don't really need any sections and they would only add to the size. I've set flags to 0.
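
As a sketch of the choices above, the segment command can be packed like this in Python. make_segment is a hypothetical helper, not part of any library; the constant values are assumed from the Mach-O headers and agree with the dump at the end of the post.

```python
import struct

LC_SEGMENT = 0x1
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags -- all little-endian, no padding.
SEGMENT_COMMAND = struct.Struct("<II16sIIIIiiII")


def make_segment(segname: bytes, filesize: int) -> bytes:
    """Pack an LC_SEGMENT with no sections (hypothetical helper)."""
    return SEGMENT_COMMAND.pack(
        LC_SEGMENT,
        SEGMENT_COMMAND.size,  # cmdsize: no section structs follow
        segname,               # free-form; padded with NULs to 16 bytes
        0x1000,                # vmaddr
        0x1000,                # vmsize: one page
        0,                     # fileoff
        filesize,              # sizeof(header) + sizeofcmds
        VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE,  # maxprot (7)
        VM_PROT_READ | VM_PROT_EXECUTE,                  # initprot (5)
        0,                     # nsects
        0,                     # flags
    )


segment = make_segment(b"\0" * 16, 0xA4)
print(hex(len(segment)))
```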

Now, we need to execute some code. There are two load commands for that: entry_point_command and thread_command.
entry_point_command doesn't suit us: see xnu-4570.1.46/bsd/kern/mach_loader.c, around line 1977:

1977    /* kernel does *not* use entryoff from LC_MAIN.  Dyld uses it. */
1978    result->needs_dynlinker = TRUE;
1979    result->using_lcmain = TRUE;

So, using it would require getting dyld to work, and that means we'd need __LINKEDIT, an empty symtab_command and dysymtab_command, a dylinker_command and a dyld_info_command. Overkill for the "smallest" file.

So, we'll use thread_command, specifically LC_UNIXTHREAD, since it also sets up the stack, which we'll need.

struct thread_command {
    uint32_t    cmd;        /* LC_THREAD or  LC_UNIXTHREAD */
    uint32_t    cmdsize;    /* total size of this command */
    /* uint32_t flavor         flavor of thread state */
    /* uint32_t count          count of uint32_t's in thread state */
    /* struct XXX_thread_state state   thread state for this flavor */
    /* ... */
};

cmd is going to be LC_UNIXTHREAD, and cmdsize will be 0x50 (see below).
flavor is x86_THREAD_STATE32, and count is x86_THREAD_STATE32_COUNT (0x10).

Now the thread state. We need x86_thread_state32_t, aka _STRUCT_X86_THREAD_STATE32:

#define _STRUCT_X86_THREAD_STATE32  struct __darwin_i386_thread_state
_STRUCT_X86_THREAD_STATE32
{
    unsigned int    __eax;
    unsigned int    __ebx;
    unsigned int    __ecx;
    unsigned int    __edx;
    unsigned int    __edi;
    unsigned int    __esi;
    unsigned int    __ebp;
    unsigned int    __esp;
    unsigned int    __ss;
    unsigned int    __eflags;
    unsigned int    __eip;
    unsigned int    __cs;
    unsigned int    __ds;
    unsigned int    __es;
    unsigned int    __fs;
    unsigned int    __gs;
};

So, it is indeed 16 uint32_t's, which are loaded into the corresponding registers before the thread is started.

Adding up the header, segment command and thread command gives us 0x1C + 0x38 + 0x50 = 0xA4 bytes.
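
The size arithmetic can be checked with a short Python sketch. make_thread is a hypothetical helper; the register order follows the struct above, the values of LC_UNIXTHREAD (5) and x86_THREAD_STATE32 (1) are taken from the dump at the end of the post, and all registers are left at zero here because their real values are chosen later.

```python
import struct

LC_UNIXTHREAD = 0x5
x86_THREAD_STATE32 = 1

# Register order as in _STRUCT_X86_THREAD_STATE32.
REGISTERS = ("eax", "ebx", "ecx", "edx", "edi", "esi", "ebp", "esp",
             "ss", "eflags", "eip", "cs", "ds", "es", "fs", "gs")

# cmd, cmdsize, flavor, count, followed by the 16 register values.
THREAD_COMMAND = struct.Struct("<IIII16I")


def make_thread(**regs: int) -> bytes:
    """Pack an LC_UNIXTHREAD; unspecified registers default to 0."""
    state = [regs.get(name, 0) for name in REGISTERS]
    return THREAD_COMMAND.pack(
        LC_UNIXTHREAD, THREAD_COMMAND.size,
        x86_THREAD_STATE32, len(REGISTERS), *state)


thread = make_thread()
total = 0x1C + 0x38 + len(thread)  # header + segment command + thread command
print(hex(len(thread)), hex(total))
```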

Now, time to craft the payload.
Let's say we want it to print "Hi Frand" and exit(0).

Syscall convention for macOS x86_32:

  • arguments passed on the stack, pushed right-to-left
  • stack 16-byte aligned (note: 8-byte alignment seems to be fine)
  • syscall number in the eax register
  • call by interrupt

See more about syscalls on macOS here.

So, knowing that, here's our payload in assembly:

push   ebx          #; push chars 5-8
push   eax          #; push chars 1-4
xor    eax, eax     #; zero eax
mov    edi, esp     #; preserve string address on stack
push   0x8          #; 3rd param for write -- length
push   edi          #; 2nd param for write -- address of bytes
push   0x1          #; 1st param for write -- fd (stdout)
push   eax          #; extra dword on top of the args (return-address slot); also keeps the stack aligned
mov    al, 0x4      #; write syscall number
#; --- 14 bytes at this point ---
int    0x80         #; syscall
push   0x0          #; 1st param for exit -- exit code
mov    al, 0x1      #; exit syscall number
push   eax          #; extra dword on top of the args, as above
int    0x80         #; syscall

Notice the marker line before the first int 0x80.
segname can be anything, remember? So we can put our payload in it. However, it's only 16 bytes, and we need a bit more.
So, at 14 bytes we'll place a jmp.

Another "free" space is the thread state registers.
We can put anything in most of them, so the rest of our payload goes there.

Also, we place our string in __eax and __ebx, since that's shorter than mov'ing it into the registers.

So, we can use __ecx, __edx and __edi to fit the rest of our payload. The distance from the end of segment_cmd.segname to thread_cmd.state.__ecx is 0x38 bytes, so the last two bytes of segname hold a short jump with that displacement: EB38, which disassembles as jmp 0x3a when the target is shown relative to the start of the 2-byte jmp.

So, our assembled payload is 53 50 31C0 89E7 6A08 57 6A01 50 B004 for the first part, EB38 for the jmp, and CD80 6A00 B001 50 CD80 for the second part.

And the last step -- setting __eip. Our file is loaded at 0x1000 (remember vmaddr), and the payload starts at offset 0x24, so __eip is 0x1024.
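
The offsets used above can be re-derived with a few lines of Python. This is only a sanity-check sketch; the variable names are made up for illustration, and the sizes are the ones computed earlier in the post.

```python
MACH_HEADER_SIZE = 0x1C
SEGMENT_COMMAND_SIZE = 0x38
VMADDR = 0x1000

part1 = bytes.fromhex("53 50 31C0 89E7 6A08 57 6A01 50 B004")
part2 = bytes.fromhex("CD80 6A00 B001 50 CD80")

# segname follows the cmd and cmdsize fields of the segment command.
segname_off = MACH_HEADER_SIZE + 8
segname_end = segname_off + 16

# __ecx follows cmd, cmdsize, flavor, count, __eax and __ebx.
thread_off = MACH_HEADER_SIZE + SEGMENT_COMMAND_SIZE
ecx_off = thread_off + 4 * 4 + 2 * 4

# A short jmp (opcode EB) takes a displacement measured from the byte
# after the instruction, which here is exactly the end of segname.
rel8 = ecx_off - segname_end
jmp = bytes([0xEB, rel8])

segname = part1 + jmp
assert len(segname) == 16      # first part plus jmp fills segname exactly
assert len(part2) <= 3 * 4     # second part fits in __ecx, __edx, __edi

eip = VMADDR + segname_off
print(hex(segname_off), hex(ecx_off), jmp.hex(), hex(eip))
```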

Here's the xxd dump of the resulting file:

00000000: cefa edfe 0700 0000 0300 0000 0200 0000  ................
00000010: 0200 0000 8800 0000 0000 2001 0100 0000  .......... .....
00000020: 3800 0000 5350 31c0 89e7 6a08 576a 0150  8...SP1...j.Wj.P
00000030: b004 eb38 0010 0000 0010 0000 0000 0000  ...8............
00000040: a400 0000 0700 0000 0500 0000 0000 0000  ................
00000050: 0000 0000 0500 0000 5000 0000 0100 0000  ........P.......
00000060: 1000 0000 4869 2046 7261 6e64 cd80 6a00  ....Hi Frand..j.
00000070: b001 50cd 8000 0000 0000 0000 0000 0000  ..P.............
00000080: 0000 0000 0000 0000 0000 0000 2410 0000  ............$...
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000                                ....

Pad it with anything up to 0x1000 bytes, chmod +x and run :)
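
For convenience, here is a Python sketch that rebuilds the file from the dump above, pads it with zero bytes and marks it executable. The output name hi_frand and the helper write_executable are hypothetical, and the result is only expected to run on a macOS version that still executes 32-bit x86 binaries.

```python
import os
import stat

# Hex columns copied from the xxd dump above.
DUMP = """
cefa edfe 0700 0000 0300 0000 0200 0000
0200 0000 8800 0000 0000 2001 0100 0000
3800 0000 5350 31c0 89e7 6a08 576a 0150
b004 eb38 0010 0000 0010 0000 0000 0000
a400 0000 0700 0000 0500 0000 0000 0000
0000 0000 0500 0000 5000 0000 0100 0000
1000 0000 4869 2046 7261 6e64 cd80 6a00
b001 50cd 8000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 2410 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000
"""

PAGE_SIZE = 0x1000

payload = bytes.fromhex("".join(DUMP.split()))
image = payload.ljust(PAGE_SIZE, b"\0")  # pad up to one page


def write_executable(path: str, data: bytes) -> None:
    """Write data to path and set the executable bits (like chmod +x)."""
    with open(path, "wb") as f:
        f.write(data)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    write_executable("hi_frand", image)  # hypothetical output name
```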

P.S. About x86_64 -- 64-bit binaries are required to have __PAGEZERO (any mapping with VM_PROT_NONE protection covering the page at 0x0). IIRC they [Apple] didn't make it required in 32-bit mode only because some legacy software didn't have it and they were afraid to break it.


28 Bytes, Pre-compiled.

Below is a formatted hex dump of the Mach-O binary.

00 00 00 00 FF FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
|---------| |---------| |---------| |---------| |---------| |---------| |---------/
|           |           |           |           |           |           +---------- uint32_t        flags;          // Once again redundant, no flags for safety.
|           |           |           |           |           +---------------------- uint32_t        sizeofcmds;     // Size of the commands. Not sure the specifics for this, yet it doesn't particularly matter when there are 0 commands. 0 is used for safety.
|           |           |           |           +---------------------------------- uint32_t        ncmds;          // Number of commands this library provides. 0, this is a redundant library.
|           |           |           +---------------------------------------------- uint32_t        filetype;       // Once again, documentation is lacking in this department, yet I don't think it particularly matters for our useless library.
|           |           +---------------------------------------------------------- cpu_subtype_t   cpusubtype;     // Like cputype, this suggests what systems this can run on. Here, 0 is ANY.
|           +---------------------------------------------------------------------- cpu_type_t      cputype;        // Defines what cpus this can run on, I guess. -1 is ANY. This library is definitely cross system compatible.
+---------------------------------------------------------------------------------- uint32_t        magic;          // This number seems to be provided by the compiling system, as I lack a system to compile Mach-O, I can't retrieve the actual value for this. But it will always be 4 bytes. (On 32bit systems)

It consists entirely of the header and needs neither the data nor the cmds. This is, by nature, the smallest Mach-O binary possible. It might not run correctly on any conceivable hardware, but it matches the specification.

I'd supply the actual file, but it entirely consists of unprintable characters.
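
Since the bytes are unprintable, a small Python sketch can generate them instead. It packs the seven header fields exactly as laid out in the dump above, including the placeholder 0 for magic, and assumes little-endian byte order.

```python
import struct

# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags
header = struct.pack("<IiiIIII", 0, -1, 0, 0, 0, 0, 0)

print(len(header))
print(" ".join(f"{b:02X}" for b in header))
```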