ARM assembly: auto-increment register on store
For both loads and stores you use post-indexed addressing:
ldr r0,[r1],#4
str r0,[r2],#4
Whatever you put at the end (4 in this case) is added to the base register (r1 in the ldr example, r2 in the str example) after the register has been used as the address. The updated value is written back to the base register as part of the same instruction. It is very much like
unsigned int a,*b,*c;
...
a = *b++;
*c++ = a;
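To make the ordering concrete, here is a small C model of the two post-indexed instructions. The machine type, the ldr_post/str_post names and the word-indexed memory array are hypothetical, invented only for illustration. The point it shows is that the old base value is used as the address and the immediate is added to the base afterwards.

```c
#include <stdint.h>

/* Hypothetical toy machine: 16 words (64 bytes) of "RAM".
   Byte addresses are divided by 4, so only word-aligned addresses make
   sense here, and there is no bounds checking. */
typedef struct {
    uint32_t mem[16];
} machine;

/* Models: ldr rt, [rn], #imm
   The load uses the OLD value of rn, then rn += imm (writeback). */
static uint32_t ldr_post(machine *m, uint32_t *rn, int32_t imm)
{
    uint32_t value = m->mem[*rn / 4];
    *rn += (uint32_t)imm;   /* unsigned wraparound also handles negative imm */
    return value;
}

/* Models: str rt, [rn], #imm
   The store uses the OLD value of rn, then rn += imm (writeback). */
static void str_post(machine *m, uint32_t *rn, uint32_t rt, int32_t imm)
{
    m->mem[*rn / 4] = rt;
    *rn += (uint32_t)imm;
}
```

Running `r0 = ldr_post(&m, &r1, 4); str_post(&m, &r2, r0, 4);` twice with r1 = 0 and r2 = 32 copies the words at byte addresses 0 and 4 to 32 and 36 and leaves r1 = 8 and r2 = 40, just like two iterations of `*c++ = *b++`.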
EDIT: you need to look at the disassembly to see what is going on, if anything. I am using the latest CodeSourcery (now Sourcery CodeBench Lite) toolchain from Mentor Graphics:
arm-none-linux-gnueabi-gcc (Sourcery CodeBench Lite 2011.09-70) 4.6.1
#include <stdio.h>
int main ()
{
    int out[]={0, 0};
    asm volatile (
        "mov r0, #1 \n\t"
        "str r0, [%0], #4 \n\t"
        "add r0, r0, #1 \n\t"
        "str r0, [%0] \n\t"
        :: "r"(out)
        : "r0" );
    printf("%d %d\n", out[0], out[1]);
    return 0;
}
arm-none-linux-gnueabi-gcc str.c -O2 -o str.elf
arm-none-linux-gnueabi-objdump -D str.elf > str.list
00008380 <main>:
8380: e92d4010 push {r4, lr}
8384: e3a04000 mov r4, #0
8388: e24dd008 sub sp, sp, #8
838c: e58d4000 str r4, [sp]
8390: e58d4004 str r4, [sp, #4]
8394: e1a0300d mov r3, sp
8398: e3a00001 mov r0, #1
839c: e4830004 str r0, [r3], #4
83a0: e2800001 add r0, r0, #1
83a4: e5830000 str r0, [r3]
83a8: e59f0014 ldr r0, [pc, #20] ; 83c4 <main+0x44>
83ac: e1a01004 mov r1, r4
83b0: e1a02004 mov r2, r4
83b4: ebffffe5 bl 8350 <_init+0x20>
83b8: e1a00004 mov r0, r4
83bc: e28dd008 add sp, sp, #8
83c0: e8bd8010 pop {r4, pc}
83c4: 0000854c andeq r8, r0, ip, asr #10
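So the
sub sp, sp, #8
allocates 8 bytes of stack for the two local ints out[0] and out[1], and
mov r4,#0
str r4,[sp]
str r4,[sp,#4]
zeroes them because they are initialized to zero. Then mov r3, sp puts the address of out into r3 (the register GCC picked for %0), and then comes the inline assembly: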
8398: e3a00001 mov r0, #1
839c: e4830004 str r0, [r3], #4
83a0: e2800001 add r0, r0, #1
83a4: e5830000 str r0, [r3]
And then the call to printf (bl 8350 <_init+0x20> is printf's PLT stub):
83a8: e59f0014 ldr r0, [pc, #20] ; 83c4 <main+0x44>
83ac: e1a01004 mov r1, r4
83b0: e1a02004 mov r2, r4
83b4: ebffffe5 bl 8350 <_init+0x20>
Now it is clear why it didn't work: you didn't declare out as volatile. The asm statement only receives the address of out as an input, so nothing tells the compiler that the array contents change, and it has no reason to go back to RAM for the values of out[0] and out[1] before calling printf. It knows r4 already holds the value of both (zero), and there is so little code in this function that r4 never had to be evicted and reused, so it simply copied r4 into r1 and r2 for the printf.
If you change it to be volatile
volatile int out[]={0, 0};
Then you should get the desired result:
83a8: e59f0014 ldr r0, [pc, #20] ; 83c4 <main+0x44>
83ac: e59d1000 ldr r1, [sp]
83b0: e59d2004 ldr r2, [sp, #4]
83b4: ebffffe5 bl 8350 <_init+0x20>
Now the argument setup for printf reads out[0] and out[1] back from RAM (the stack slots at sp and sp+4).
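Declaring out volatile works, but it forces every access to out through memory. A sketch of an alternative, assuming GCC-style extended asm, is to describe the asm's effects in its constraints instead. Two things are missing from the original asm statement: the post-indexed str writes back to %0, which GCC does not allow for an input-only operand (it was harmless here only because r3 was dead afterwards), and the asm writes memory the compiler is never told about. Binding a copy of the pointer with "+r" and adding a "memory" clobber addresses both. The store_pair name and the plain-C fallback for other targets are hypothetical additions so the sketch builds anywhere; the asm path assumes ARM or Thumb-2 state.

```c
/* Hypothetical helper: stores 1 into out[0] and 2 into out[1] with the same
   instructions as the question, but with constraints that describe
   everything the asm does. */
static void store_pair(int *out)
{
    int *p = out;  /* the asm post-increments its base register, so give it a copy */
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm__ volatile (
        "mov r0, #1       \n\t"
        "str r0, [%0], #4 \n\t"   /* out[0] = 1, then p += 4 bytes */
        "add r0, r0, #1   \n\t"
        "str r0, [%0]     \n\t"   /* out[1] = 2 */
        : "+r"(p)                 /* p is read AND written (writeback) */
        :
        : "r0", "memory");        /* asm clobbers r0 and writes memory */
#else
    /* Assumed fallback for non-ARM (or Thumb-1) builds: same stores in C. */
    *p++ = 1;
    *p = 2;
#endif
}
```

With `int out[] = {0, 0}; store_pair(out); printf("%d %d\n", out[0], out[1]);` this should print 1 2 without volatile, because the "memory" clobber makes the compiler reload out[] after the asm even if store_pair is inlined into main.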