Build interpreter for non-existent language

Ruby, 67 lines of regex substitutions

I decided to write the interpreter in regex, while sticking to efficient algorithms.

I could have gone for plain bytes, but using symbols makes the code more readable in my opinion. Of course, if we could pack two instructions in one byte...

Concatenation of negative values results in ten's complement behavior, reflecting the internal representation.

Division is integer division and the remainder is never negative.

subs = [
  # stack expansion
  [/^ ?([$iv*\/+\-^%dtsz.])/,  ' 0 \1'  ],
  [/^ (\d+ [$*\/+\-^%tsz])/,   ' 0 \1'  ],
  [/^ ((\d+ ){2,3}z)/,         ' 0 \1'  ],
  [/ (0|9)\1+/,                ' \1'    ],
  # concatenation
  [/ (\d+) (?:0+|9+)(\d+) \$/, ' \1\2 ' ], 
  [/ (\d+) (0|9) \$/,          ' \1\2 ' ],
  # swaps
  [/ ((?:\d+ )*)(\d+) </,      ' \2 \1' ],
  [/ (\d+)((?: \d+)*) >/,      '\2 \1 ' ],
  [/ (\d+) (\d+) s/,           ' \2 \1 '],
  [/ (\d+ \d+) (\d+ \d+) z/,   ' \2 \1 '],
  # dups
  [/ (\d+) d/,                 ' \1 \1 '],
  [/ (\d+ \d+) t/,             ' \1 \1 '],
  # pop
  [/ (\d+) \./,                ' '      ],

  # increment / decrement
  [/ (\d+) i/, ' \1I '], [/ (\d+) v/, ' \1V '],
  *(%w[0I 1I 2I 3I 4I 5I 6I 7I 8I 9I].zip [*?1..?9, 'I0']),
  *(%w[0V 1V 2V 3V 4V 5V 6V 7V 8V 9V].zip ['V9', *?0..?8]), 
  [' 1', ' 01'], [' 8', ' 98'], [' I', ' '], [' V', ' '],
  # addition, subtraction
  [/ (\d+) (\d+) \+/,                ' \1P \2P '       ], #init addition
  [/ (\d+) (\d+) \-/,                ' \1S \2S '       ], #init subtraction
  [/ ([PS](\d)\w*) (\d+[PS]\w*) /,   ' \2\1 \3 '       ], #sign extend left
  [/ (\d+[PS]\w*) ([PS](\d)\w*) /,   ' \1 \3\2 '       ], #sign extend right
  [/ (\d*)(\d)P(\S*) (\d*)0P(0*) /,  ' \1P\2\3 \4P0\5 '], #advance addition
  [/ (\d*)(\d)S(\S*) (\d*)0S(0*) /,  ' \1S\2\3 \4S0\5 '], #advance subtraction
  [/ (\d+)P(\S*) (\d*[1-5])P(0*) /,  ' \1IP\2 \3VP\4 ' ], #transfer left
  [/ (\d+)P(\S*) (\d*[6-9])P(0*) /,  ' \1VP\2 \3IP\4 ' ], #transfer right
  [/ (\d+)S(\S*) (\d*[1-5])S(0*) /,  ' \1VS\2 \3VS\4 ' ], #decrement both
  [/ (\d+)S(\S*) (\d*[6-9])S(0*) /,  ' \1IS\2 \3IS\4 ' ], #increment both
  [/ [PS](\S+) [PS]0+ /,             ' \1 '            ], #finish 

  # digitwise negation
  *(%w[9N 8N 7N 6N 5N 4N 3N 2N 1N 0N].zip [*'N0'..'N9']),
  #multiplication and division by 2
  *([*'H0'..'H9'].zip %w[0H 0F 1H 1F 2H 2F 3H 3F 4H 4F]),
  *([*'F0'..'F9'].zip %w[5H 5F 6H 6F 7H 7F 8H 8F 9H 9F]),  
  *(%w[0T 1T 2T 3T 4T 5T 6T 7T 8T 9T].zip %w[T0 T2 T4 T6 T8 TI0 TI2 TI4 TI6 TI8]), 
  ['H ', ' '], [' T', ' '],

  # sign correction for */%
  [/ (\d+) (9\d*) ([*\/%])/, ' \1NI \2NI \3'], [' N', ' '],
  # multiplication
  [/ (0+ \d+|\d+ 0+) \*/,     ' 0 '          ], #multiplication by zero
  [/ (\d+) (0\d*[02468]) \*/, ' \1T H\2 *'   ], #multiplication by an even number
  [/ (\d+) (0\d*[13579]) \*/, ' \1 \1 \2V *+'], #multiplication by an odd number
  # division / modulo
  [?/, 'r.'], [?%, 'rs.'],
  [/ (0|9)(\d*) (0\d+) r/,           ' \3 0 \1D\2 '          ], #init division
  [/ (\d+) (\d+) (0\d*)D(\d*) /,     ' \1 \2I \3SD\4 \1S '   ], #subtract divisor
  [/ (\d+) (\d+) (9\d*)D(\d)(\d*) /, ' \1 \2V0 \3P\4D\5 \1P '], #add divisor and advance
  [/ (\d+) (\d+) (9\d*)D /,          ' \2V \3P \1P '         ], #add divisor and finish  

  #exponentiation
  [/ \d+ 0+ \^/,             ' 01 '          ], # case: zeroth power
  [/ 9\d+ 9+ \^/,            ' 9 '           ], # case: reciprocal of negative
  [/ \d+ 9\d+ \^/,           ' 0 '           ], # case: high negative power
  [/ 0\d+ 9\d+ \^/,          ' 0 '           ], # case: reciprocal of positive
  [/ (\d+) 0+1 \^/,          ' \1 '          ], # case: power of one
  [/ (\d+) (\d*[02468]) \^/, ' \1 \1 *H\2 ^' ], # case: even exponent
  [/ (\d+) (\d*[13579]) \^/, ' \1 \2V ^\1 *' ], # case: odd exponent
]                                   

x = gets.tr '^$iv*/+\-^%><dtsz.', ''
until x =~ /^ (\d+ )*$/
  subs.each do |sub|
    x.sub!(*sub) # && (puts x; sleep 0.1)
  end
end

As for the bonus round, the shortest solution I've come up with (13 characters) is a clean solution:

iistisii$<$<$

x86 assembly (on Win32)

“SPEED!” seems to be hugely important here, and we all know nothing beats assembly language in that regard. So, let’s do that in assembly!

This is an implementation of the language in x86 assembly language (in NASM syntax), with the numbers stored and interpreted as unsigned 32-bit integers, using the native x86 stack directly. Stack underflow, and overflow during any arithmetic operation (or division by zero) is a runtime error, terminating the program with an error message.

        global _start

        extern _GetCommandLineA@0
        extern _GetStdHandle@4
        extern _CreateFileA@28
        extern _GetFileSize@8
        extern _LocalAlloc@8
        extern _ReadFile@20
        extern _CloseHandle@4
        extern _WriteFile@20

section .text

; ---------------------------------------------------------------------------------------
; Initialization
; ---------------------------------------------------------------------------------------

_start:
        ; Retrieve command line
        CALL _GetCommandLineA@0

        ; Skip argv[0]
        MOV ESI, EAX
        XOR EAX, EAX
skipuntilspace:
        MOV AL, [ESI]
        INC ESI
        TEST EAX, EAX
        JE missingparam
        CMP EAX, ' '
        JNE skipuntilspace
        INC ESI

        ; Open the file
        PUSH 0
        PUSH 80h
        PUSH 3
        PUSH 0
        PUSH 1
        PUSH 80000000h
        PUSH ESI
        CALL _CreateFileA@28
        CMP EAX, -1
        JE  cannotopenfile

        ; Get its size
        PUSH EAX
        PUSH 0
        PUSH EAX
        CALL _GetFileSize@8

        PUSH EAX

        ; Allocate memory buffer
        PUSH EAX
        PUSH 0
        CALL _LocalAlloc@8
        TEST EAX, EAX
        MOV ESI, EAX
        JZ outofmemory

        POP ECX
        POP EAX
        PUSH EAX

        ; Store end-of-program pointer
        MOV [programend], ESI
        ADD [programend], ECX

        ; Read the file contents
        PUSH 0
        PUSH buff
        PUSH ECX
        PUSH ESI
        PUSH EAX
        CALL _ReadFile@20
        TEST EAX, EAX
        JZ cannotopenfile

        ; Close the file
        CALL _CloseHandle@4

; ---------------------------------------------------------------------------------------
; Main loop of the interpreter
; ---------------------------------------------------------------------------------------

        ; Store the end of stack into EBP
        MOV EBP, ESP

        ; Push an initial 0 onto the stack
        XOR EAX, EAX
        PUSH EAX

mainloop:
        ; Load the next opcode, if not end of program
        XOR EAX, EAX
        CMP ESI, [programend]
        MOV AL, [ESI]
        JAE endloop
        LEA ESI, [ESI+1]

        ; Check if the opcode is valid
        CMP EAX, (maxop - opcodetable) / 8
        JA  fault_invalidopcode

        ; Check for required stack space
        MOV ECX, [opcodetable + 8 * EAX + 4]
        LEA EDI, [ESP + ECX]
        CMP EDI, EBP
        JA  fault_stackunderflow

        ; Jump to the respective opcode handler
        MOV EAX, [opcodetable + 8 * EAX]
        JMP EAX

; ---------------------------------------------------------------------------------------
; Implementation of the specific operations
; ---------------------------------------------------------------------------------------

        ; ************** CAT 0000 (0): Concatenate (Combine top two numbers in a stack as if they were a string. ex: 12,5 -> 125)
op_concatenate:
        POP EBX
        POP EAX
        MOV ECX, EAX
        MOV EDI, 10
concat_loop:
        XOR EDX, EDX
        SHL EBX, 1
        DIV EDI
        LEA EBX, [4 * EBX + EBX]
        TEST EAX, EAX
        JNZ concat_loop

        ADD EBX, ECX
        PUSH EBX
        JMP mainloop

        ; ************** INC 0001 (1): Increment (Add 1 to the number on the top of the stack)
op_increment:
        POP EAX
        ADD EAX, 1
        PUSH EAX
        JNC mainloop
        JMP fault_intoverflow

        ; ************** DEC 0010 (2): Decrement (Subtract one from the number at the top of the stack)
op_decrement:
        POP EAX
        SUB EAX, 1
        PUSH EAX
        JNC mainloop
        JMP fault_intoverflow

        ; ************** MUL 0011 (3): Multiply (Multiply the top two numbers in the stack)
op_multiply:
        POP EAX
        POP EDX
        MUL EDX
        TEST EDX, EDX
        PUSH EAX
        JZ mainloop
        JMP fault_intoverflow

        ; ************** DIV 0100 (4): Divide (Divide the 2nd-to-top number by the top number on the stack)
op_divide:
        POP ECX
        TEST ECX, ECX
        POP EAX
        JZ fault_dividebyzero
        XOR EDX, EDX
        DIV ECX
        PUSH EAX
        JMP mainloop

        ; ************** MOD 0101 (5): Add (Add the top two numbers on the stack)
op_add:
        POP EAX
        ADD [ESP], EAX
        JNC mainloop
        JMP fault_intoverflow

        ; ************** SUB 0110 (6): Subtract (Subtract the top number on the stack from the one below it)
op_subtract:
        POP EAX
        SUB [ESP], EAX
        JNC mainloop
        JMP fault_intoverflow

        ; ************** EXP 0111 (7): Exponent (Calculate the second-to-top number to the power of the top number)
op_exponent:
        POP ECX
        POP EBX
        MOV EAX, 1
exploop:
        TEST ECX, 1
        JZ expnomult
        MUL EBX
        TEST EDX, EDX
        JNZ fault_intoverflow
expnomult:
        SHR ECX, 1
        JZ expdone
        XCHG EAX, EBX
        MUL EAX
        TEST EDX, EDX
        XCHG EAX, EBX
        JZ exploop
        JMP fault_intoverflow
expdone:
        PUSH EAX
        JMP mainloop

        ; ************** MOD 1000 (8): Modulus: (Find the second-to-top number modulo the top one)
op_modulus:
        POP ECX
        TEST ECX, ECX
        POP EAX
        JZ fault_dividebyzero
        XOR EDX, EDX
        IDIV ECX
        PUSH EDX
        JMP mainloop

        ; ************** ROR 1001 (9): Rotate Right (Shift the stack down one. The number on the bottom is now on the top)
op_rotright:
        MOV EAX, [EBP - 4]
        LEA ECX, [EBP - 4]
        SUB ECX, ESP
        MOV EDX, ESI
        SHR ECX, 2
        LEA EDI, [EBP - 4]
        LEA ESI, [EBP - 8]
        STD
        REP MOVSD
        MOV [ESP], EAX
        CLD
        MOV ESI, EDX
        JMP mainloop

        ; ************** ROL 1010 (A): Rotate Left (Shift the stack up one. The number on the top is now on the bottom)
op_rotleft:
        MOV EAX, [ESP]
        LEA ECX, [EBP - 4]
        SUB ECX, ESP
        MOV EDX, ESI
        SHR ECX, 2
        LEA ESI, [ESP + 4]
        MOV EDI, ESP
        REP MOVSD
        MOV [EBP - 4], EAX
        MOV ESI, EDX
        JMP mainloop

        ; ************** DUP 1011 (B): Duplicate (Copy the top number so that it appears twice. ex: 4,1 becomes 4,1,1)
op_duplicate:
        PUSH DWORD [ESP]
        JMP mainloop

        ; ************** DU2 1100 (C): Double Duplicate (Copy the top two numbers on the stack. ex: 4,1,2 becomes 4,1,2,1,2)
op_dblduplicate:
        PUSH DWORD [ESP+4]
        PUSH DWORD [ESP+4]
        JMP mainloop

        ; ************** SWP 1101 (D): Swap (Swap the top two numbers on the stack. ex: 4,1,2 becomes 4,2,1)
op_swap:
        POP EAX
        POP EDX
        PUSH EAX
        PUSH EDX
        JMP mainloop

        ; ************** SW2 1110 (E): Double Swap (Swap the top two numbers with two below them.ex: 1,2,3,4,5 becomes 1,4,5,2,3)
op_dblswap:
        POP EAX
        POP EBX
        POP ECX
        POP EDX
        PUSH EBX
        PUSH EAX
        PUSH EDX
        PUSH ECX
        JMP mainloop

        ; ************** POP 1111 (F): Delete/Pop (Remove the number at the top of the stack)
op_pop:
        POP EAX
        JMP mainloop


; ---------------------------------------------------------------------------------------
; End of the program: print out the resulting stack and exit
; ---------------------------------------------------------------------------------------

endloop:
        MOV ESI, ESP

printloop:
        CMP ESI, EBP
        JNB exit
        MOV EAX, [ESI]
        MOV EBX, ESI
        PUSH EBX
        CALL printnum
        POP EBX
        LEA ESI, [EBX + 4]
        JMP printloop

exit:
        MOV ESP, EBP
        ;POP EAX
        XOR EAX, EAX
        RET


; ---------------------------------------------------------------------------------------
; Faults
; ---------------------------------------------------------------------------------------

fault_invalidopcode:
        MOV EAX, err_invalidopcode
        JMP fault

fault_stackunderflow:
        MOV EAX, err_stackunderflow
        JMP fault

fault_dividebyzero:
        MOV EAX, err_dividebyzero
        JMP fault

fault_intoverflow:
        MOV EAX, err_intoverflow
        JMP fault

fault:
        CALL print
        MOV EAX, crlf
        CALL print

        MOV ESP, EBP
        MOV EAX, 1
        RET


missingparam:
        MOV EAX, err_missingparameter
        JMP fault

cannotopenfile:
        MOV EAX, err_cannotopenfile
        JMP fault

outofmemory:
        MOV EAX, err_outofmemory
        JMP fault

; ---------------------------------------------------------------------------------------
; Helper functions
; ---------------------------------------------------------------------------------------

printnum:
        MOV EBX, 10
        CALL printnumrec
        MOV EAX, crlf
        JMP print

printnumrec:
        PUSH EAX
        PUSH EDX
        XOR EDX, EDX
        DIV EBX
        TEST EAX, EAX
        JZ printnumend
        CALL printnumrec
printnumend:
        MOV EAX, EDX
        CALL printdigit
        POP EDX
        POP EAX
        RET


printdigit:
        ADD EAX, '0'
        MOV [printbuff], EAX
        MOV EAX, printbuff
        JMP print


print:
        MOV  ESI, EAX
        PUSH 0
        PUSH buff
        CALL strlen
        PUSH EAX
        PUSH ESI
        PUSH -11
        CALL _GetStdHandle@4
        PUSH EAX
        CALL _WriteFile@20
        RET

strlen:
        XOR ECX, ECX
strlen_loop:
        CMP BYTE [ESI+ECX], 0
        JE strlen_end
        LEA ECX, [ECX+1]
        JMP strlen_loop
strlen_end:
        MOV EAX, ECX
        RET


; ---------------------------------------------------------------------------------------
; Data
; ---------------------------------------------------------------------------------------

section .data

; Table of opcode handlers and required stack space (in bytes, i.e. 4*operands)
opcodetable:
        DD op_concatenate, 8
        DD op_increment, 4
        DD op_decrement, 4
        DD op_multiply, 8
        DD op_divide, 8
        DD op_add, 8
        DD op_subtract, 8
        DD op_exponent, 8
        DD op_modulus, 8
        DD op_rotright, 0
        DD op_rotleft, 0
        DD op_duplicate, 4
        DD op_dblduplicate, 8
        DD op_swap, 8
        DD op_dblswap, 16
        DD op_pop, 4
maxop:

crlf                    DB 13, 10, 0
err_invalidopcode       DB "Invalid opcode", 0
err_stackunderflow      DB "Stack underflow", 0
err_dividebyzero        DB "Division by zero", 0
err_intoverflow         DB "Integer overflow", 0

err_missingparameter:   DB "Missing parameter: Use nexlang file.bin", 0
err_cannotopenfile:     DB "Unable to open input file", 0
err_outofmemory:        DB "Not enough memory", 0

section .bss

programend      RESD 1
printbuff       RESD 1
buff            RESD 1

To compile this, use something like

nasm.exe -fwin32 nexlang.asm
ld -o nexlang.exe -e _start nexlang.obj -s -lkernel32

The program receives the name of the binary file containing the program on the command line (e.g. nexlang.exe testprg.bin). When finished, it prints the final contents of the stack to standard output in a human-readable format.

To help with testing, save the following into nex.def:

%define CAT DB 00h
%define INC DB 01h
%define DEC DB 02h
%define MUL DB 03h
%define DIV DB 04h
%define ADD DB 05h
%define SUB DB 06h
%define EXP DB 07h
%define MOD DB 08h
%define ROR DB 09h
%define ROL DB 0Ah
%define DUP DB 0Bh
%define DU2 DB 0Ch
%define SWP DB 0Dh
%define SW2 DB 0Eh
%define POP DB 0Fh

And then write your NEX (“non-existing”, as named in the question title) programs using the above-defined mnemonics, and compile with something like

nasm.exe -p nex.def -o prg.bin prg.nex

E.g. for the original test case, use the following prg.nex:

INC     ; 1
INC     ; 2
INC     ; 3
INC     ; 4
DUP     ; 4 4
DU2     ; 4 4 4 4
ADD     ; 8 4 4
DU2     ; 8 4 8 4 4
ADD     ; 12 8 4 4
DUP     ; 12 12 8 4 4
ROR     ; 4 12 12 8 4
ADD     ; 16 12 8 4

And finally, for the “2014” challenge, use the following 14-byte NEX program:

DUP     ; 0 0
DUP     ; 0 0 0
INC     ; 1 0 0
INC     ; 2 0 0
SWP     ; 0 2 0
CAT     ; 20 0
SWP     ; 0 20
INC     ; 1 20
DUP     ; 1 1 20
INC     ; 2 1 20
INC     ; 3 1 20
INC     ; 4 1 20
CAT     ; 14 20
CAT     ; 2014

GolfScript, 64 chars

OK, so I decided to try and golf this. And what better language for golfing than GolfScript?

Conveniently, GolfScript itself is already a stack-based language with single-byte commands, and, as it happens, 11 out of your 16 commands map directly to built-in GolfScript commands. So all I really need to do to interpret your language is to implement the remaining five commands in GolfScript and build a translation table:

0\{'`+~
)
(
*
/
+
-
?
%
](+~
])\~
.
1$1$
\
[@]\+~\
;'n%=~}/]-1%`

The code looks kind of spread out, because I'm using newlines as delimiters for the translation table. The initial 0\ pushes a zero onto the stack and moves it below the input program. The { }/ loop, comprising most of the code, takes the input program off the stack and iterates the loop body over each of its characters, and the final ]-1%` collects the stack into an array, reverses it (because your sample output starts from the top of the stack) and stringifies it.

The loop body starts with a 16-line single-quoted string. n% splits this string at line breaks, = looks up the substring corresponding to the input character, and ~ evaluates the substring as GolfScript code.

Finally, here are the GolfScript implementations of the 16 commands:

  • 0 = `+~: concatenate two numbers as strings
  • 1 = ): increment
  • 2 = (: decrement
  • 3 = *: multiply
  • 4 = /: divide
  • 5 = +: add
  • 6 = -: subtract
  • 7 = ?: raise to power
  • 8 = %: modulus
  • 9 = ](+~: rotate stack right
  • A = ])\~: rotate stack left
  • B = .: duplicate
  • C = 1$1$: double duplicate
  • D = \: swap
  • E = [@]\+~\: double swap
  • F = ;: pop

I'm kind of unhappy with the double swap — it's ugly and much longer than any of the other commands. It feels like there ought to be a better way, but if so, I haven't found it yet. Still, at least it works.

For example, running the program above on the input (given as a GolfScript / Ruby / Perl / Python / etc. double-quoted string):

"\x01\x01\x0B\x0C\x05\x0C\x05\x0B\x09\x05"

yields the output:

[8 6 4 2]

Edit: I managed to save two more chars, for a total of 62 chars, using a more compact encoding of the translation table. However, it kind of sacrifices readability:

0\{(')(*/+-?%'1/'](+~
])\~
.
1$1$
\
[@]\+~\
;
`+~'n/+=~}/]-1%`

Notable features of this version include the ( at the beginning of the loop, which shifts the command indices from 0..15 to -1..14 so that I can put the long sequence of single-character commands from 1 to 8 at the beginning of the table. This allows me to store them in a separate string and eliminate the eight newlines delimiting them; alas, the extra complexity costs me six characters elsewhere.