How does native android code written for ARM run on x86?
Yes, ARM native code runs on Intel x86 Android devices using an emulation feature named Houdini.
This library reads ARM instructions on the fly and converts them to equivalent x86 instructions. This is why many apps work as-is on x86 without the developer having to build an x86 version of their native libraries.
You can also include separate native code for each architecture. I am not sure how Netflix does it, but if you open the APK you can see /lib/armeabi-v7a/, so I assume there can also be a folder like /lib/x86/.
Edit: I just checked the Amazon Shopping app, and it has native code for both ARM and x86, so maybe that is how Netflix does it too.
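An APK is a ZIP archive, so you can check which ABIs an app ships native code for by listing its lib/<abi>/ entries. This is a minimal sketch that assumes the usual lib/<abi>/<name>.so layout. The helper names `group_native_libs` and `native_libs_in_apk` are made up for illustration, and the APK path is supplied on the command line.

```python
import sys
import zipfile
from collections import defaultdict


def group_native_libs(entry_names):
    """Group .so entries by ABI, assuming the usual lib/<abi>/<name>.so layout."""
    libs = defaultdict(list)
    for name in entry_names:
        parts = name.split("/")
        if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
            libs[parts[1]].append(parts[2])
    return {abi: sorted(names) for abi, names in libs.items()}


def native_libs_in_apk(apk_path):
    """List the native libraries of an APK (an APK is just a ZIP archive)."""
    with zipfile.ZipFile(apk_path) as apk:
        return group_native_libs(apk.namelist())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python list_abis.py some-app.apk
    for abi, names in native_libs_in_apk(sys.argv[1]).items():
        print(abi, names)
```

If only lib/armeabi-v7a/ shows up, an x86 device has to run that code through a translator such as Houdini.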
The Android Studio 3 emulator uses QEMU as a backend: https://en.wikipedia.org/wiki/QEMU
QEMU is arguably the leading open source cross-architecture emulator. It is GPL software, and supports many more architectures besides x86 and ARM.
Android then adds a UI layer on top of QEMU, and possibly some patches, but the core is in upstream QEMU.
QEMU uses a technique called binary translation to achieve reasonably fast emulation: https://en.wikipedia.org/wiki/Binary_translation
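Binary translation converts blocks of guest ARM instructions into equivalent host x86 instructions, caches the translated code and reuses it, instead of decoding each ARM instruction again every time it runs.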
Therefore, to understand the details, the best way is to:
- read QEMU source code: https://github.com/qemu/qemu
- study binary translation in general, possibly write your own toy implementation
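As a starting point for such a toy implementation, here is a minimal sketch of dynamic binary translation. The guest ISA (tuples such as ("add", rd, rn, rm)) is made up for illustration and is not real ARM, and the "host code" is Python source compiled with exec() rather than x86 machine code. It only mirrors the general idea: translate guest code up to the next branch once, cache that translation by guest PC, and keep guest registers in an array in host memory.

```python
# Toy dynamic binary translator. The guest ISA is hypothetical (not real ARM),
# and the generated "host code" is Python compiled with exec().
PROGRAM = [
    ("mov", 0, 0),      # 0: x0 = 0            (running sum)
    ("mov", 1, 5),      # 1: x1 = 5            (loop counter)
    ("add", 0, 0, 1),   # 2: x0 = x0 + x1
    ("subi", 1, 1, 1),  # 3: x1 = x1 - 1
    ("cbnz", 1, 2),     # 4: if x1 != 0, branch to 2
    ("halt",),          # 5: stop
]


def _compile(lines):
    """Turn generated source lines into a callable host function."""
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


def translate_block(program, pc):
    """Translate guest code from pc up to the next branch into one host function."""
    lines = ["def block(regs):"]
    while True:
        op, *args = program[pc]
        if op == "mov":
            rd, imm = args
            lines.append(f"    regs[{rd}] = {imm}")
        elif op == "add":
            rd, rn, rm = args
            lines.append(f"    regs[{rd}] = regs[{rn}] + regs[{rm}]")
        elif op == "subi":
            rd, rn, imm = args
            lines.append(f"    regs[{rd}] = regs[{rn}] - {imm}")
        elif op == "cbnz":
            rn, target = args
            # A branch ends the block: return the next guest pc to the dispatcher.
            lines.append(f"    return {target} if regs[{rn}] != 0 else {pc + 1}")
            return _compile(lines)
        elif op == "halt":
            lines.append("    return None")
            return _compile(lines)
        else:
            raise ValueError(f"unknown guest opcode {op!r}")
        pc += 1


def run(program):
    """Dispatcher loop: look up (or create) the translation for pc, then run it."""
    regs = [0] * 32  # guest registers live in host memory, not in host registers
    cache = {}       # guest pc -> translated block
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = cache[pc] = translate_block(program, pc)
        pc = block(regs)
    return regs, len(cache)
```

Running `run(PROGRAM)` leaves 15 in x0 after translating only three blocks: the loop body starting at guest pc 2 is translated once and then reused on every iteration, which is where the speedup over re-decoding comes from.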
Theory
- CPUs are "Turing complete" (up to memory limits)
- CPUs have a simple deterministic behavior that can be simulated with finite memory Turing machines
Therefore, it is clear that any CPU can emulate any CPU given enough memory.
The hard question is how to do that fast.
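The most direct way to act on that argument is a fetch-decode-execute interpreter: keep the guest registers in an array and apply one guest instruction at a time. This sketch uses a made-up three-operand ISA (not real ARM encodings) and masks results to 64 bits like AArch64 general-purpose registers. It is correct but slow, because every instruction is decoded again each time it executes, and avoiding that repeated work is what binary translation is for.

```python
MASK64 = (1 << 64) - 1  # AArch64 general-purpose registers are 64 bits wide

# Hypothetical guest program (not real ARM encodings): computes 5! into x0.
FACTORIAL = [
    ("movi", 0, 1),     # 0: x0 = 1
    ("movi", 1, 5),     # 1: x1 = 5
    ("mul", 0, 0, 1),   # 2: x0 = x0 * x1
    ("subi", 1, 1, 1),  # 3: x1 = x1 - 1
    ("cbnz", 1, 2),     # 4: if x1 != 0, branch to 2
    ("halt",),          # 5: stop
]


def interpret(program, max_steps=10_000):
    """Fetch, decode and execute one guest instruction at a time."""
    regs = [0] * 32
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]  # fetch + decode, repeated on every single step
        if op == "movi":
            rd, imm = args
            regs[rd] = imm & MASK64
        elif op == "mul":
            rd, rn, rm = args
            regs[rd] = (regs[rn] * regs[rm]) & MASK64
        elif op == "subi":
            rd, rn, imm = args
            regs[rd] = (regs[rn] - imm) & MASK64  # wraps around like hardware
        elif op == "cbnz":
            rn, target = args
            if regs[rn] != 0:
                pc = target
                continue
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown guest opcode {op!r}")
        pc += 1
    raise RuntimeError("step limit reached")
```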
Practice: QEMU user mode simulation
QEMU has a userland mode that makes it very easy to play with userland ARM code on your x86 machine to see what is happening, as long as your guest and host run the same OS.
In this mode, binary translation takes care of the basic instructions, and system calls are simply forwarded to the host system calls.
For example, for an AArch64 Linux guest on a Linux host, with a freestanding (no glibc) hello world:
main.S
.text
.global _start
_start:
asm_main_after_prologue:
    /* write */
    mov x0, 1
    adr x1, msg
    ldr x2, =len
    mov x8, 64
    svc 0
    /* exit */
    mov x0, 0
    mov x8, 93
    svc 0
msg:
    .ascii "hello syscall v8\n"
len = . - msg
Then assemble and run as:
sudo apt-get install qemu-user gcc-aarch64-linux-gnu
aarch64-linux-gnu-as -o main.o main.S
aarch64-linux-gnu-ld -o main.out main.o
qemu-aarch64 main.out
and it outputs the expected:
hello syscall v8
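The two svc 0 instructions in main.S are where user mode hands control to the host kernel. Here is a rough sketch of that forwarding under simplifying assumptions: guest memory is a flat bytearray, guest registers are a Python list, and only the two syscalls main.S uses are handled (number in x8, arguments in x0 to x2, write = 64, exit = 93). `forward_syscall` and `GuestExit` are made-up names. QEMU's real linux-user code also has to convert structure layouts and error numbers between guest and host where they differ.

```python
import errno
import os

# AArch64 Linux syscall numbers used by main.S (the number is passed in x8).
AARCH64_SYS_WRITE = 64
AARCH64_SYS_EXIT = 93
MASK64 = (1 << 64) - 1


class GuestExit(Exception):
    """Raised when the guest calls exit; carries the guest's exit status."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def forward_syscall(regs, memory):
    """Handle a guest `svc 0` by calling the matching host function.

    regs:   list of 32 guest registers (hypothetical layout, x0 at index 0)
    memory: bytearray standing in for guest memory (guest address == index)
    """
    number = regs[8]
    if number == AARCH64_SYS_WRITE:
        fd, addr, length = regs[0], regs[1], regs[2]
        data = bytes(memory[addr:addr + length])  # copy the buffer out of guest memory
        try:
            regs[0] = os.write(fd, data)          # result goes back into x0
        except OSError as exc:
            regs[0] = -exc.errno & MASK64         # Linux convention: -errno in x0
    elif number == AARCH64_SYS_EXIT:
        raise GuestExit(regs[0])
    else:
        regs[0] = -errno.ENOSYS & MASK64          # unsupported syscall
```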
You can even run ARM programs compiled against the C standard library, and step-debug them with GDB. See this concrete example: How to single step ARM assembly in GDB on QEMU?
Since we are talking about binary translation, we can also enable some logging to see the exact translation that QEMU is doing:
qemu-aarch64 -d in_asm,out_asm main.out
Here:
- in_asm refers to the ARM guest input assembly
- out_asm refers to the x86 host generated assembly that gets run
The output contains:
----------------
IN:
0x0000000000400078: d2800020 mov x0, #0x1
0x000000000040007c: 100000e1 adr x1, #+0x1c (addr 0x400098)
0x0000000000400080: 58000182 ldr x2, pc+48 (addr 0x4000b0)
0x0000000000400084: d2800808 mov x8, #0x40
0x0000000000400088: d4000001 svc #0x0
OUT: [size=105]
0x5578d016b428: mov -0x8(%r14),%ebp
0x5578d016b42c: test %ebp,%ebp
0x5578d016b42e: jne 0x5578d016b482
0x5578d016b434: mov $0x1,%ebp
0x5578d016b439: mov %rbp,0x40(%r14)
0x5578d016b43d: mov $0x400098,%ebp
0x5578d016b442: mov %rbp,0x48(%r14)
0x5578d016b446: mov $0x4000b0,%ebp
0x5578d016b44b: mov 0x0(%rbp),%rbp
0x5578d016b44f: mov %rbp,0x50(%r14)
0x5578d016b453: mov $0x40,%ebp
0x5578d016b458: mov %rbp,0x80(%r14)
0x5578d016b45f: mov $0x40008c,%ebp
0x5578d016b464: mov %rbp,0x140(%r14)
0x5578d016b46b: mov %r14,%rdi
0x5578d016b46e: mov $0x2,%esi
0x5578d016b473: mov $0x56000000,%edx
0x5578d016b478: mov $0x1,%ecx
0x5578d016b47d: callq 0x5578cfdfe130
0x5578d016b482: mov $0x7f8af0565013,%rax
0x5578d016b48c: jmpq 0x5578d016b416
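So in the IN section we see our handwritten ARM assembly code, and in the OUT section we see the generated x86 assembly. Note that the guest registers are not mapped onto host registers: each write to x0, x1, x2 and x8 becomes a store into QEMU's guest CPU state, addressed through %r14 (offsets 0x40, 0x48, 0x50 and 0x80), and the svc becomes a call into QEMU itself rather than a host system call instruction.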
Tested in Ubuntu 16.04 amd64, QEMU 2.5.0, binutils 2.26.1.
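The hex word printed next to each IN address is the raw A64 instruction encoding, and decoding it is the first thing a translator does. As a hedged illustration, this toy decoder handles only the four encodings that appear in that IN block; it is not QEMU's decoder, and its output format only loosely copies the disassembly shown. The decoded offsets are consistent with the trace: 0x40007c + 0x1c = 0x400098 for the adr, and 0x400080 + 48 = 0x4000b0 for the ldr.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_a64(word):
    """Decode the few A64 encodings from the IN block; None for anything else."""
    # MOVZ, 64-bit: sf=1 opc=10 100101 | hw | imm16 | Rd
    if word & 0xFF800000 == 0xD2800000:
        hw = (word >> 21) & 0x3
        imm16 = (word >> 5) & 0xFFFF
        return f"mov x{word & 0x1F}, #{imm16 << (16 * hw):#x}"
    # ADR: op=0 | immlo | 10000 | immhi | Rd   (signed byte offset from the PC)
    if word & 0x9F000000 == 0x10000000:
        immlo = (word >> 29) & 0x3
        immhi = (word >> 5) & 0x7FFFF
        offset = sign_extend((immhi << 2) | immlo, 21)
        return f"adr x{word & 0x1F}, #{offset:+#x}"
    # LDR (literal), 64-bit: 01011000 | imm19 | Rt   (signed word offset from the PC)
    if word & 0xFF000000 == 0x58000000:
        offset = sign_extend((word >> 5) & 0x7FFFF, 19) * 4
        return f"ldr x{word & 0x1F}, pc{offset:+d}"
    # SVC: 11010100 000 | imm16 | 000 01
    if word & 0xFFE0001F == 0xD4000001:
        return f"svc #{(word >> 5) & 0xFFFF:#x}"
    return None


for word in (0xD2800020, 0x100000E1, 0x58000182, 0xD2800808, 0xD4000001):
    print(f"{word:08x}  {decode_a64(word)}")
```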
QEMU full system emulation
When you boot Android in QEMU, however, it is of course not running a userland binary, but rather doing full system simulation, where the actual Linux kernel and all devices run inside the simulation.
Full system simulation is more accurate, but a bit slower, and you need to provide a kernel and a disk image to QEMU.
To try that out, have a look at the following setups:
- build AOSP from source and run it on QEMU: How to compile the Android AOSP kernel and test it with the Android Emulator?
- build a minimal beautiful Linux system with Buildroot and run it with QEMU: How to download the Torvalds Linux Kernel master, (re)compile it, and boot it with QEMU?
- build and run baremetal code on QEMU: https://github.com/cirosantilli/linux-kernel-module-cheat/tree/79b35fb395f9f7f7621609186931408fe2f79881#baremetal-setup-getting-started
KVM
If you run Android x86 on QEMU on an x86 host, you will notice that it is much faster.
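The reason is that QEMU can then use KVM, a Linux kernel feature that uses the CPU's hardware virtualization extensions to run the guest instructions directly on the host, with no translation. This requires the guest and the host to have the same instruction set.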
If you happen to have a powerful ARM machine (still rare as of 2019), you can also run ARM on ARM with KVM much faster.
For this reason, if you are on an x86 host, I recommend that you stick to x86 simulation of AOSP, as mentioned at: How to compile the Android AOSP kernel and test it with the Android Emulator?, unless you really need to touch something low level.
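Whether that speedup is available depends on the host. A quick way to check on Linux is to see whether /dev/kvm exists and is accessible to your user, and, on x86, whether the CPU advertises the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. The helper names below are made up for this sketch; the device path and flag names are the standard Linux ones.

```python
import os


def kvm_usable(dev="/dev/kvm"):
    """True if the KVM device node exists and this user can open it read/write."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


def x86_virt_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in x86 /proc/cpuinfo text.

    vmx = Intel VT-x, svm = AMD-V. ARM kernels report CPU features differently,
    so this helper is only meaningful on x86 hosts.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    return sorted(flags & {"vmx", "svm"})


if __name__ == "__main__":
    print("/dev/kvm usable:", kvm_usable())
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("virtualization flags:", x86_virt_flags(f.read()))
```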